Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Ric Wheeler

Zach Brown wrote:

The second use case is to look at the physical layout of blocks on disk
for a specific file, use Mark Lord's write_long patches to inject a disk
error and then read that file to make sure that we are handling disk IO
errors correctly.  A bit obscure, but really quite useful.


Hmm, yeah, that's interesting.


It would be even better if we could poke holes in metadata, etc, but 
this gives us a reasonable test case.





We have also used FIBMAP a few times to try and map an observed IO error
back to a file. Really slow and painful to do, but should work on any
file system when a better method is not supported.


We're getting off of this FIBMAP topic, but this interests me.  Can we
explore this a little?  How did you find out about the error without
having a file to associate with it?  Drive scrubbing, or some such?

- z


Vladimir extended debugreiserfs to do an optimized reverse mapping scan 
of the disk sector to file/metadata/etc. Definitely worth having that 
ability for any file system.


We also do drive scrubbing, looking for bad sectors. The list of those 
sectors is fed into the reverse mapping code to enable us to gage the 
impact of the IO errors, start recovering the user files, etc.


The scrub code we use takes advantage of the read-verify command (to 
avoid data transfer from the drive to the page cache).


ric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Zach Brown

> The second use case is to look at the physical layout of blocks on disk
> for a specific file, use Mark Lord's write_long patches to inject a disk
> error and then read that file to make sure that we are handling disk IO
> errors correctly.  A bit obscure, but really quite useful.

Hmm, yeah, that's interesting.

> We have also used FIBMAP a few times to try and map an observed IO error
> back to a file. Really slow and painful to do, but should work on any
> file system when a better method is not supported.

We're getting off of this FIBMAP topic, but this interests me.  Can we
explore this a little?  How did you find out about the error without
having a file to associate with it?  Drive scrubbing, or some such?

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Ric Wheeler

Zach Brown wrote:

Can you clarify what you mean above with an example?  I don't really
follow.


Sure, take 'tar' as an example.  It'll read files in the order that
their names are returned from directory listing.  This can produce bad
IO patterns because the order in which the file names are returned
doesn't match the order of the file's blocks on disk.  (htree, I'm
looking at you!)

People have noticed that tar-like loads can be sped up greatly just by
sorting the files by their inode number as returned by stat(), never
mind the file blocks themselves.  One example of this is Chris Mason's
'acp'.

  http://oss.oracle.com/~mason/acp/

The logical extension of that is to use FIBMAP to find the order of file
blocks on disk and then doing IO on blocks in sorted order.  It'd take
work to write an app that does this reliably, sure.

In this use the application doesn't actually care what the absolute
numbers are.  It cares about their ordering.  File systems would be able
to chose whatever scheme they wanted for the actual values of the
results from a FIBMAP-alike as long as the sorting resulted in the right
IO patterns.

Arguing that this use is significant enough to justify an addition to
the file system API is a stretch.  I'm just sharing the observation.

- z


I use FIBMAP support for a few different things.

The first is to exactly the case that you describe above where we can 
use the first block of a file extracted by FIBMAP to produce an optimal 
sorting for the read order.  My testing showed that the cost of the 
extra fibmap was not too high compared to the speedup, but it was not a 
huge gain over the speedup gained when the read was done in inode sorted 
order.


The second use case is to look at the physical layout of blocks on disk 
for a specific file, use Mark Lord's write_long patches to inject a disk 
error and then read that file to make sure that we are handling disk IO 
errors correctly.  A bit obscure, but really quite useful.


We have also used FIBMAP a few times to try and map an observed IO error 
back to a file. Really slow and painful to do, but should work on any 
file system when a better method is not supported.



ric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Ric Wheeler

Zach Brown wrote:

Can you clarify what you mean above with an example?  I don't really
follow.


Sure, take 'tar' as an example.  It'll read files in the order that
their names are returned from directory listing.  This can produce bad
IO patterns because the order in which the file names are returned
doesn't match the order of the file's blocks on disk.  (htree, I'm
looking at you!)

People have noticed that tar-like loads can be sped up greatly just by
sorting the files by their inode number as returned by stat(), never
mind the file blocks themselves.  One example of this is Chris Mason's
'acp'.

  http://oss.oracle.com/~mason/acp/

The logical extension of that is to use FIBMAP to find the order of file
blocks on disk and then doing IO on blocks in sorted order.  It'd take
work to write an app that does this reliably, sure.

In this use the application doesn't actually care what the absolute
numbers are.  It cares about their ordering.  File systems would be able
to chose whatever scheme they wanted for the actual values of the
results from a FIBMAP-alike as long as the sorting resulted in the right
IO patterns.

Arguing that this use is significant enough to justify an addition to
the file system API is a stretch.  I'm just sharing the observation.

- z


I use FIBMAP support for a few different things.

The first is to exactly the case that you describe above where we can 
use the first block of a file extracted by FIBMAP to produce an optimal 
sorting for the read order.  My testing showed that the cost of the 
extra fibmap was not too high compared to the speedup, but it was not a 
huge gain over the speedup gained when the read was done in inode sorted 
order.


The second use case is to look at the physical layout of blocks on disk 
for a specific file, use Mark Lord's write_long patches to inject a disk 
error and then read that file to make sure that we are handling disk IO 
errors correctly.  A bit obscure, but really quite useful.


We have also used FIBMAP a few times to try and map an observed IO error 
back to a file. Really slow and painful to do, but should work on any 
file system when a better method is not supported.



ric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Zach Brown

 The second use case is to look at the physical layout of blocks on disk
 for a specific file, use Mark Lord's write_long patches to inject a disk
 error and then read that file to make sure that we are handling disk IO
 errors correctly.  A bit obscure, but really quite useful.

Hmm, yeah, that's interesting.

 We have also used FIBMAP a few times to try and map an observed IO error
 back to a file. Really slow and painful to do, but should work on any
 file system when a better method is not supported.

We're getting off of this FIBMAP topic, but this interests me.  Can we
explore this a little?  How did you find out about the error without
having a file to associate with it?  Drive scrubbing, or some such?

- z
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Ric Wheeler

Zach Brown wrote:

The second use case is to look at the physical layout of blocks on disk
for a specific file, use Mark Lord's write_long patches to inject a disk
error and then read that file to make sure that we are handling disk IO
errors correctly.  A bit obscure, but really quite useful.


Hmm, yeah, that's interesting.


It would be even better if we could poke holes in metadata, etc, but 
this gives us a reasonable test case.





We have also used FIBMAP a few times to try and map an observed IO error
back to a file. Really slow and painful to do, but should work on any
file system when a better method is not supported.


We're getting off of this FIBMAP topic, but this interests me.  Can we
explore this a little?  How did you find out about the error without
having a file to associate with it?  Drive scrubbing, or some such?

- z


Vladimir extended debugreiserfs to do an optimized reverse mapping scan 
of the disk sector to file/metadata/etc. Definitely worth having that 
ability for any file system.


We also do drive scrubbing, looking for bad sectors. The list of those 
sectors is fed into the reverse mapping code to enable us to gage the 
impact of the IO errors, start recovering the user files, etc.


The scrub code we use takes advantage of the read-verify command (to 
avoid data transfer from the drive to the page cache).


ric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown

> But, we shouldn't inflict all of this on fibmap/fiemapwe'll get
> lost trying to make the one true interface for all operations.
> 
> For grouping operations on files, I think a read_tree syscall with
> hints for what userland will do (read, stat, delete, list
> filenames), and a better cookie than readdir should do it.

Agreed, on both points.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown

> Can you clarify what you mean above with an example?  I don't really
> follow.

Sure, take 'tar' as an example.  It'll read files in the order that
their names are returned from directory listing.  This can produce bad
IO patterns because the order in which the file names are returned
doesn't match the order of the file's blocks on disk.  (htree, I'm
looking at you!)

People have noticed that tar-like loads can be sped up greatly just by
sorting the files by their inode number as returned by stat(), never
mind the file blocks themselves.  One example of this is Chris Mason's
'acp'.

  http://oss.oracle.com/~mason/acp/

The logical extension of that is to use FIBMAP to find the order of file
blocks on disk and then doing IO on blocks in sorted order.  It'd take
work to write an app that does this reliably, sure.

In this use the application doesn't actually care what the absolute
numbers are.  It cares about their ordering.  File systems would be able
to chose whatever scheme they wanted for the actual values of the
results from a FIBMAP-alike as long as the sorting resulted in the right
IO patterns.

Arguing that this use is significant enough to justify an addition to
the file system API is a stretch.  I'm just sharing the observation.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Andreas Dilger
On Oct 29, 2007  12:16 -0700, Mike Waychison wrote:
> Chris Mason wrote:
>> Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
>> was a way to indicate your-data-is-here-but-isn't-alone.  But that's
>> more of a feature for the FIEMAP stuff.
>
> I hadn't heard of FIEMAP, so I went back and read the thread from 
> April/May.  It seems that this is a much better approach than introducing a 
> FIBMAP64.
>
> What ever happened with this proposal?

We made a patch for ext4 that we need to update and push upstream.  I've
just resent the spec we used in a separate email (attached to old thread)
for reference.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Chris Mason
On Mon, 29 Oct 2007 12:18:22 -0700
Mike Waychison <[EMAIL PROTECTED]> wrote:

> Zach Brown wrote:
> >>> And another of my pet peeves with ->bmap is that it uses 0 to
> >>> mean "sparse" which causes a conflict on NTFS at least as block
> >>> zero is part of the $Boot system file so it is a real, valid
> >>> block...  NTFS uses -1 to denote sparse blocks internally.
> >> Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if
> >> there was a way to indicate your-data-is-here-but-isn't-alone.
> >> But that's more of a feature for the FIEMAP stuff.
> > 
> > And maybe we can step back and see what the callers of FIBMAP are
> > doing with the results they're getting.
> > 
> > One use is to discover the order in which to read file data that
> > will result in efficient IO.
> > 
> > If we had an interface specifically for this use case then perhaps a
> > sparse block would be better reported as the position of the inode
> > relative to other data blocks.  Maybe the inode block number in
> > ext* land.
> > 
> 
> Can you clarify what you mean above with an example?  I don't really
> follow.

This is a larger topic of helping userland optimize access to groups of
files.  For example, during a readdir if we knew the next step was to
delete all the files found, we could do one top of readahead (or even
ordering the returned values).

If we knew the next step would be to read all the files found, a
different type of readahead would be useful.

But, we shouldn't inflict all of this on fibmap/fiemapwe'll get
lost trying to make the one true interface for all operations.

For grouping operations on files, I think a read_tree syscall with
hints for what userland will do (read, stat, delete, list
filenames), and a better cookie than readdir should do it.

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Mike Waychison

Zach Brown wrote:
And another of my pet peeves with ->bmap is that it uses 0 to mean  
"sparse" which causes a conflict on NTFS at least as block zero is  
part of the $Boot system file so it is a real, valid block...  NTFS  
uses -1 to denote sparse blocks internally.

Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
was a way to indicate your-data-is-here-but-isn't-alone.  But that's
more of a feature for the FIEMAP stuff.


And maybe we can step back and see what the callers of FIBMAP are doing
with the results they're getting.

One use is to discover the order in which to read file data that will
result in efficient IO.

If we had an interface specifically for this use case then perhaps a
sparse block would be better reported as the position of the inode
relative to other data blocks.  Maybe the inode block number in ext* land.



Can you clarify what you mean above with an example?  I don't really follow.

Thanks,

Mike Waychison


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Mike Waychison

Chris Mason wrote:

On Sat, 27 Oct 2007 18:57:06 +0100
Anton Altaparmakov <[EMAIL PROTECTED]> wrote:


Hi,

->bmap is ugly and horrible!  If you have to do this at the very
least please cause ->bmap64 to be able to return error values in case
the file system failed to get the information or indeed such
information does not exist as is the case for compressed and
encrypted files for example and also for small files that are inside
the on-disk inode (NTFS resident files and reiserfs packed tails are
examples of this).

And another of my pet peeves with ->bmap is that it uses 0 to mean  
"sparse" which causes a conflict on NTFS at least as block zero is  
part of the $Boot system file so it is a real, valid block...  NTFS  
uses -1 to denote sparse blocks internally.


Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
was a way to indicate your-data-is-here-but-isn't-alone.  But that's
more of a feature for the FIEMAP stuff.



I hadn't heard of FIEMAP, so I went back and read the thread from 
April/May.  It seems that this is a much better approach than 
introducing a FIBMAP64.


What ever happened with this proposal?

Mike Waychison
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown

>> And another of my pet peeves with ->bmap is that it uses 0 to mean  
>> "sparse" which causes a conflict on NTFS at least as block zero is  
>> part of the $Boot system file so it is a real, valid block...  NTFS  
>> uses -1 to denote sparse blocks internally.
> 
> Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
> was a way to indicate your-data-is-here-but-isn't-alone.  But that's
> more of a feature for the FIEMAP stuff.

And maybe we can step back and see what the callers of FIBMAP are doing
with the results they're getting.

One use is to discover the order in which to read file data that will
result in efficient IO.

If we had an interface specifically for this use case then perhaps a
sparse block would be better reported as the position of the inode
relative to other data blocks.  Maybe the inode block number in ext* land.

This use is interesting to me because it can be useful for any file
system, particularly networked file systems.

- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Chris Mason
On Sat, 27 Oct 2007 18:57:06 +0100
Anton Altaparmakov <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> ->bmap is ugly and horrible!  If you have to do this at the very
> least please cause ->bmap64 to be able to return error values in case
> the file system failed to get the information or indeed such
> information does not exist as is the case for compressed and
> encrypted files for example and also for small files that are inside
> the on-disk inode (NTFS resident files and reiserfs packed tails are
> examples of this).
> 
> And another of my pet peeves with ->bmap is that it uses 0 to mean  
> "sparse" which causes a conflict on NTFS at least as block zero is  
> part of the $Boot system file so it is a real, valid block...  NTFS  
> uses -1 to denote sparse blocks internally.

Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
was a way to indicate your-data-is-here-but-isn't-alone.  But that's
more of a feature for the FIEMAP stuff.

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Chris Mason
On Sat, 27 Oct 2007 18:57:06 +0100
Anton Altaparmakov [EMAIL PROTECTED] wrote:

 Hi,
 
 -bmap is ugly and horrible!  If you have to do this at the very
 least please cause -bmap64 to be able to return error values in case
 the file system failed to get the information or indeed such
 information does not exist as is the case for compressed and
 encrypted files for example and also for small files that are inside
 the on-disk inode (NTFS resident files and reiserfs packed tails are
 examples of this).
 
 And another of my pet peeves with -bmap is that it uses 0 to mean  
 sparse which causes a conflict on NTFS at least as block zero is  
 part of the $Boot system file so it is a real, valid block...  NTFS  
 uses -1 to denote sparse blocks internally.

Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
was a way to indicate your-data-is-here-but-isn't-alone.  But that's
more of a feature for the FIEMAP stuff.

-chris
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown

 And another of my pet peeves with -bmap is that it uses 0 to mean  
 sparse which causes a conflict on NTFS at least as block zero is  
 part of the $Boot system file so it is a real, valid block...  NTFS  
 uses -1 to denote sparse blocks internally.
 
 Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
 was a way to indicate your-data-is-here-but-isn't-alone.  But that's
 more of a feature for the FIEMAP stuff.

And maybe we can step back and see what the callers of FIBMAP are doing
with the results they're getting.

One use is to discover the order in which to read file data that will
result in efficient IO.

If we had an interface specifically for this use case then perhaps a
sparse block would be better reported as the position of the inode
relative to other data blocks.  Maybe the inode block number in ext* land.

This use is interesting to me because it can be useful for any file
system, particularly networked file systems.

- z
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Mike Waychison

Chris Mason wrote:

On Sat, 27 Oct 2007 18:57:06 +0100
Anton Altaparmakov [EMAIL PROTECTED] wrote:


Hi,

-bmap is ugly and horrible!  If you have to do this at the very
least please cause -bmap64 to be able to return error values in case
the file system failed to get the information or indeed such
information does not exist as is the case for compressed and
encrypted files for example and also for small files that are inside
the on-disk inode (NTFS resident files and reiserfs packed tails are
examples of this).

And another of my pet peeves with -bmap is that it uses 0 to mean  
sparse which causes a conflict on NTFS at least as block zero is  
part of the $Boot system file so it is a real, valid block...  NTFS  
uses -1 to denote sparse blocks internally.


Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
was a way to indicate your-data-is-here-but-isn't-alone.  But that's
more of a feature for the FIEMAP stuff.



I hadn't heard of FIEMAP, so I went back and read the thread from 
April/May.  It seems that this is a much better approach than 
introducing a FIBMAP64.


What ever happened with this proposal?

Mike Waychison
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Mike Waychison

Zach Brown wrote:
And another of my pet peeves with -bmap is that it uses 0 to mean  
sparse which causes a conflict on NTFS at least as block zero is  
part of the $Boot system file so it is a real, valid block...  NTFS  
uses -1 to denote sparse blocks internally.

Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
was a way to indicate your-data-is-here-but-isn't-alone.  But that's
more of a feature for the FIEMAP stuff.


And maybe we can step back and see what the callers of FIBMAP are doing
with the results they're getting.

One use is to discover the order in which to read file data that will
result in efficient IO.

If we had an interface specifically for this use case then perhaps a
sparse block would be better reported as the position of the inode
relative to other data blocks.  Maybe the inode block number in ext* land.



Can you clarify what you mean above with an example?  I don't really follow.

Thanks,

Mike Waychison


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown

 Can you clarify what you mean above with an example?  I don't really
 follow.

Sure, take 'tar' as an example.  It'll read files in the order that
their names are returned from directory listing.  This can produce bad
IO patterns because the order in which the file names are returned
doesn't match the order of the file's blocks on disk.  (htree, I'm
looking at you!)

People have noticed that tar-like loads can be sped up greatly just by
sorting the files by their inode number as returned by stat(), never
mind the file blocks themselves.  One example of this is Chris Mason's
'acp'.

  http://oss.oracle.com/~mason/acp/

The logical extension of that is to use FIBMAP to find the order of file
blocks on disk and then doing IO on blocks in sorted order.  It'd take
work to write an app that does this reliably, sure.

In this use the application doesn't actually care what the absolute
numbers are.  It cares about their ordering.  File systems would be able
to chose whatever scheme they wanted for the actual values of the
results from a FIBMAP-alike as long as the sorting resulted in the right
IO patterns.

Arguing that this use is significant enough to justify an addition to
the file system API is a stretch.  I'm just sharing the observation.

- z
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Andreas Dilger
On Oct 29, 2007  12:16 -0700, Mike Waychison wrote:
 Chris Mason wrote:
 Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if there
 was a way to indicate your-data-is-here-but-isn't-alone.  But that's
 more of a feature for the FIEMAP stuff.

 I hadn't heard of FIEMAP, so I went back and read the thread from 
 April/May.  It seems that this is a much better approach than introducing a 
 FIBMAP64.

 What ever happened with this proposal?

We made a patch for ext4 that we need to update and push upstream.  I've
just resent the spec we used in a separate email (attached to old thread)
for reference.

Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown

 But, we shouldn't inflict all of this on fibmap/fiemapwe'll get
 lost trying to make the one true interface for all operations.
 
 For grouping operations on files, I think a read_tree syscall with
 hints for what userland will do (read, stat, delete, list
 filenames), and a better cookie than readdir should do it.

Agreed, on both points.

- z
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Chris Mason
On Mon, 29 Oct 2007 12:18:22 -0700
Mike Waychison [EMAIL PROTECTED] wrote:

 Zach Brown wrote:
  And another of my pet peeves with -bmap is that it uses 0 to
  mean sparse which causes a conflict on NTFS at least as block
  zero is part of the $Boot system file so it is a real, valid
  block...  NTFS uses -1 to denote sparse blocks internally.
  Reiserfs and Btrfs also use 0 to mean packed.  It would be nice if
  there was a way to indicate your-data-is-here-but-isn't-alone.
  But that's more of a feature for the FIEMAP stuff.
  
  And maybe we can step back and see what the callers of FIBMAP are
  doing with the results they're getting.
  
  One use is to discover the order in which to read file data that
  will result in efficient IO.
  
  If we had an interface specifically for this use case then perhaps a
  sparse block would be better reported as the position of the inode
  relative to other data blocks.  Maybe the inode block number in
  ext* land.
  
 
 Can you clarify what you mean above with an example?  I don't really
 follow.

This is a larger topic of helping userland optimize access to groups of
files.  For example, during a readdir if we knew the next step was to
delete all the files found, we could do one top of readahead (or even
ordering the returned values).

If we knew the next step would be to read all the files found, a
different type of readahead would be useful.

But, we shouldn't inflict all of this on fibmap/fiemapwe'll get
lost trying to make the one true interface for all operations.

For grouping operations on files, I think a read_tree syscall with
hints for what userland will do (read, stat, delete, list
filenames), and a better cookie than readdir should do it.

-chris
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread H. Peter Anvin

Mike Waychison wrote:

The following series is meant to clean up FIBMAP paths with the eventual goal 
of allowing users to be able to FIBMAP their data.


Keep in mind FIBMAP is currently extremely expensive on some 
filesystems, e.g. ext3.  Therefore, additional filesystem-level work 
would have to be done in order for this not become a DoS issue.


-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread Szabolcs Szakacsits

On Sat, 27 Oct 2007, Anton Altaparmakov wrote:

> And another of my pet peeves with ->bmap is that it uses 0 to mean "sparse"
> which causes a conflict on NTFS at least as block zero is part of the $Boot
> system file so it is a real, valid block...  NTFS uses -1 to denote sparse
> blocks internally.

In practice, the meaning of 0 is file system [driver] dependent. For 
example in case of NTFS-3G it means that the block is sparse or the file is 
encrypted or compressed, or resident, or it's the $Boot file, or an error 
happened.

Thankfully the widely used FIBMAP users (swapon and the ever less used 
lilo) are only interested in the non-zero values and they report an error 
if the driver returns 0 for some reason. Which is perfectly ok since both 
swaping and Linux booting would fail using a sparse, encrypted, compressed, 
resident, or the NTFS $Boot file. 

But in real, both swap files and lilo work fine with NTFS if the needed 
files were created the way these softwares expect. If not then swapon or 
lilo will catch and report the file creation error.

Afair, somebody is doing (has done?) an indeed much needed, better 
alternative. Bmap is legacy, thank you Mike for maintaining it.

Szaka

--
NTFS-3G Lead Developer:  http://ntfs-3g.org
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread Anton Altaparmakov

Hi,

->bmap is ugly and horrible!  If you have to do this at the very least  
please cause ->bmap64 to be able to return error values in case the  
file system failed to get the information or indeed such information  
does not exist as is the case for compressed and encrypted files for  
example and also for small files that are inside the on-disk inode  
(NTFS resident files and reiserfs packed tails are examples of this).


And another of my pet peeves with ->bmap is that it uses 0 to mean  
"sparse" which causes a conflict on NTFS at least as block zero is  
part of the $Boot system file so it is a real, valid block...  NTFS  
uses -1 to denote sparse blocks internally.


Best regards,

Anton

On 27 Oct 2007, at 00:37, Mike Waychison wrote:

The following series is meant to clean up FIBMAP paths with the  
eventual goal of allowing users to be able to FIBMAP their data.


I'm sending this as an RFC as I've only tested this on a x86_64  
kernel with a 32bit binary on ext2 and I've noticed a couple  
ext2_warnings already.


I'm unsure of the locking in [4/6] fix_race_with_truncate.patch.   
Any help here would greatly be appreciated.


The last patch, [6/6] drop_cap_sys_rawio_for_fibmap.patch, is of  
course, not to be applied until any remaining issues are fixed :)


Thanks,

Mike Waychison



--
Anton Altaparmakov  (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread Anton Altaparmakov

Hi,

-bmap is ugly and horrible!  If you have to do this at the very least  
please cause -bmap64 to be able to return error values in case the  
file system failed to get the information or indeed such information  
does not exist as is the case for compressed and encrypted files for  
example and also for small files that are inside the on-disk inode  
(NTFS resident files and reiserfs packed tails are examples of this).


And another of my pet peeves with -bmap is that it uses 0 to mean  
sparse which causes a conflict on NTFS at least as block zero is  
part of the $Boot system file so it is a real, valid block...  NTFS  
uses -1 to denote sparse blocks internally.


Best regards,

Anton

On 27 Oct 2007, at 00:37, Mike Waychison wrote:

The following series is meant to clean up FIBMAP paths with the  
eventual goal of allowing users to be able to FIBMAP their data.


I'm sending this as an RFC as I've only tested this on a x86_64  
kernel with a 32bit binary on ext2 and I've noticed a couple  
ext2_warnings already.


I'm unsure of the locking in [4/6] fix_race_with_truncate.patch.   
Any help here would greatly be appreciated.


The last patch, [6/6] drop_cap_sys_rawio_for_fibmap.patch, is of  
course, not to be applied until any remaining issues are fixed :)


Thanks,

Mike Waychison



--
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread Szabolcs Szakacsits

On Sat, 27 Oct 2007, Anton Altaparmakov wrote:

 And another of my pet peeves with -bmap is that it uses 0 to mean sparse
 which causes a conflict on NTFS at least as block zero is part of the $Boot
 system file so it is a real, valid block...  NTFS uses -1 to denote sparse
 blocks internally.

In practice, the meaning of 0 is file system [driver] dependent. For 
example in case of NTFS-3G it means that the block is sparse or the file is 
encrypted or compressed, or resident, or it's the $Boot file, or an error 
happened.

Thankfully the widely used FIBMAP users (swapon and the ever less used 
lilo) are only interested in the non-zero values and they report an error 
if the driver returns 0 for some reason. Which is perfectly ok since both 
swaping and Linux booting would fail using a sparse, encrypted, compressed, 
resident, or the NTFS $Boot file. 

But in real, both swap files and lilo work fine with NTFS if the needed 
files were created the way these softwares expect. If not then swapon or 
lilo will catch and report the file creation error.

Afair, somebody is doing (has done?) an indeed much needed, better 
alternative. Bmap is legacy, thank you Mike for maintaining it.

Szaka

--
NTFS-3G Lead Developer:  http://ntfs-3g.org
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread H. Peter Anvin

Mike Waychison wrote:

The following series is meant to clean up FIBMAP paths with the eventual goal 
of allowing users to be able to FIBMAP their data.


Keep in mind FIBMAP is currently extremely expensive on some 
filesystems, e.g. ext3.  Therefore, additional filesystem-level work 
would have to be done in order for this not become a DoS issue.


-hpa

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 0/6][RFC] Cleanup FIBMAP

2007-10-26 Thread Mike Waychison
The following series is meant to clean up FIBMAP paths with the eventual goal 
of allowing users to be able to FIBMAP their data.

I'm sending this as an RFC as I've only tested this on a x86_64 kernel with a 
32bit binary on ext2 and I've noticed a couple ext2_warnings already.

I'm unsure of the locking in [4/6] fix_race_with_truncate.patch.  Any help here 
would greatly be appreciated.

The last patch, [6/6] drop_cap_sys_rawio_for_fibmap.patch, is of course, not to 
be applied until any remaining issues are fixed :)

Thanks,

Mike Waychison
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 0/6][RFC] Cleanup FIBMAP

2007-10-26 Thread Mike Waychison
The following series is meant to clean up FIBMAP paths with the eventual goal 
of allowing users to be able to FIBMAP their data.

I'm sending this as an RFC as I've only tested this on a x86_64 kernel with a 
32bit binary on ext2 and I've noticed a couple ext2_warnings already.

I'm unsure of the locking in [4/6] fix_race_with_truncate.patch.  Any help here 
would greatly be appreciated.

The last patch, [6/6] drop_cap_sys_rawio_for_fibmap.patch, is of course, not to 
be applied until any remaining issues are fixed :)

Thanks,

Mike Waychison
--
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/