Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, 28 November 2007 20:27:11 -0700, Andreas Dilger wrote: > On Nov 29, 2007 00:48 +0100, Jörn Engel wrote: > > I didn't follow the discussion much, since it didn't appear to suit > > logfs too well. In a nutshell, logfs is purely block-based, prepends > > every block with a header, may compress blocks and packs them as tightly > > as possible (byte alignment). > > The FIEMAP interface in some ways well suited to your needs, because it > uses byte offsets instead of FIBMAP, which uses block offsets. I now see > that one hole in the specification is that your physical extent is not the > same length as the logical extent that the data represents. > > I'm not sure that is in itself a reason not to use FIEMAP. There is already > a provision for logical extents that do not map directly to the physical > layout (i.e. FIEMAP_EXTENT_NO_DIRECT flag on the extent). In the case of > logfs, the extent fe_length would represent the logical length of the data > that the extent covers, fe_offset is the start of the physical extent, and > FIEMAP_EXTENT_NO_DIRECT indicates that this extent cannot be directly mapped. > This is useful for applications like LILO that would otherwise assume the > physical offset can be used to access the data from the block device. That could work. There really isn't a reason for any application to mess with physical locations when dealing with logfs. > SEEK_HOLE/SEEK_DATA only provides a fraction of the useful information > that FIEMAP does. It won't give users or developers any information about > the on disk layout (which is quite important for knowing if allocation > algorithms are good). It has the advantage of being easy to use. My completely untested attempt at a copy loop is just 14 lines. Add some error handling and it should still be quite small. off_t data_ofs, hole_ofs; long count; for (data_ofs = hole_ofs = 0; ; ) { if (data_ofs >= hole_ofs) { data_ofs = llseek(in_fd, data_ofs, SEEK_DATA); hole_ofs = llseek(in_fd, data_ofs, SEEK_HOLE); } count = splice(in_fd, &data_ofs, out_fd, &data_ofs, hole_ofs - data_ofs, 0); if (count == 0) break; data_ofs += count; } And when trying to optimize away some of the system calls, my personal preference would be to teach splice about seek_hole and seek_data and just splice the complete file range in one go. Jörn -- "Security vulnerabilities are here to stay." -- Scott Culp, Manager of the Microsoft Security Response Center, 2001 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Nov 29, 2007 00:48 +0100, Jörn Engel wrote: > I didn't follow the discussion much, since it didn't appear to suit > logfs too well. In a nutshell, logfs is purely block-based, prepends > every block with a header, may compress blocks and packs them as tightly > as possible (byte alignment). The FIEMAP interface in some ways well suited to your needs, because it uses byte offsets instead of FIBMAP, which uses block offsets. I now see that one hole in the specification is that your physical extent is not the same length as the logical extent that the data represents. I'm not sure that is in itself a reason not to use FIEMAP. There is already a provision for logical extents that do not map directly to the physical layout (i.e. FIEMAP_EXTENT_NO_DIRECT flag on the extent). In the case of logfs, the extent fe_length would represent the logical length of the data that the extent covers, fe_offset is the start of the physical extent, and FIEMAP_EXTENT_NO_DIRECT indicates that this extent cannot be directly mapped. This is useful for applications like LILO that would otherwise assume the physical offset can be used to access the data from the block device. It would still provide very useful information about the layout of files on disk for filefrag, and if tar or cp used FIEMAP they could know to read the data from the start up to the end of an extent to avoid waiting for a seek in the middle of the IO, and of course skipping holes during copy. A very smart tar might even FIEMAP a whole bunch of files and then read the extents in physical block offset order to reduce seeking. I don't think most applications will care about the actual physical location of an extent, so much as the relative locations of extents within a file and between files. > Maybe the "MAP" part fooled me to believe FIEMAP would also expose > physical location of extends on the medium. But reading the proposal > again, I am unsure about that part. If physical locations are exposed, > SEEK_HOLE/SEEK_DATA is significantly more elegant for logfs. If not, > FIEMAP could be useful. SEEK_HOLE/SEEK_DATA only provides a fraction of the useful information that FIEMAP does. It won't give users or developers any information about the on disk layout (which is quite important for knowing if allocation algorithms are good). Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, 28 November 2007 16:06:25 -0800, Nicholas Miell wrote: > > I'd have to reread the original proposal, but I remember FIEMAP as being > a generalized way of getting information about a files extents. I think > the original proposal only dealt with mapping file offsets to physical > extents, but IIRC the interface was flexible enough to implement a > "where are the holes" request. > > Regardless, SEEK_HOLE/SEEK_DATA being a better suited interface for the > needs of logfs doesn't make it the best interface for that need. I'm not sure we are looking for the best interface. A significant improvement without significant drawbacks would be enough for me. Having to provide "physical extends" complete with device offsets would be a significant drawback. Even if such an interface may have greater advantages under ideal circumstances. Jörn -- He who knows others is wise. He who knows himself is enlightened. -- Lao Tsu - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Thu, 2007-11-29 at 00:48 +0100, Jörn Engel wrote: > On Wed, 28 November 2007 16:39:59 -0700, Andreas Dilger wrote: > > On Nov 28, 2007 14:56 -0800, Nicholas Miell wrote: > > > > > > type: one of EXTENT_TYPE_HOLE, EXTENT_TYPE_DATA, EXTENT_TYPE_EXTENTS, > > > EXTENT_TYPE_COMPRESSED, EXTENT_TYPE_UNCOMPRESSED etc. > > > > This is what FIEMAP is supposed to do. We wrote a spec and implemented > > a prototype for ext4, but haven't had time to make it generic to move > > the large part of the code into the VFS. If someone wanted to take that > > up, it would be much appreciated. > > > > See "[RFC] add FIEMAP ioctl to efficiently map file allocation" in > > linux-fsdevel for details on this interface. > > I didn't follow the discussion much, since it didn't appear to suit > logfs too well. In a nutshell, logfs is purely block-based, prepends > every block with a header, may compress blocks and packs them as tightly > as possible (byte alignment). > > Maybe the "MAP" part fooled me to believe FIEMAP would also expose > physical location of extends on the medium. But reading the proposal > again, I am unsure about that part. If physical locations are exposed, > SEEK_HOLE/SEEK_DATA is significantly more elegant for logfs. If not, > FIEMAP could be useful. > > Jörn > I'd have to reread the original proposal, but I remember FIEMAP as being a generalized way of getting information about a files extents. I think the original proposal only dealt with mapping file offsets to physical extents, but IIRC the interface was flexible enough to implement a "where are the holes" request. Regardless, SEEK_HOLE/SEEK_DATA being a better suited interface for the needs of logfs doesn't make it the best interface for that need. -- Nicholas Miell <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, 28 November 2007 16:39:59 -0700, Andreas Dilger wrote: > On Nov 28, 2007 14:56 -0800, Nicholas Miell wrote: > > > > type: one of EXTENT_TYPE_HOLE, EXTENT_TYPE_DATA, EXTENT_TYPE_EXTENTS, > > EXTENT_TYPE_COMPRESSED, EXTENT_TYPE_UNCOMPRESSED etc. > > This is what FIEMAP is supposed to do. We wrote a spec and implemented > a prototype for ext4, but haven't had time to make it generic to move > the large part of the code into the VFS. If someone wanted to take that > up, it would be much appreciated. > > See "[RFC] add FIEMAP ioctl to efficiently map file allocation" in > linux-fsdevel for details on this interface. I didn't follow the discussion much, since it didn't appear to suit logfs too well. In a nutshell, logfs is purely block-based, prepends every block with a header, may compress blocks and packs them as tightly as possible (byte alignment). Maybe the "MAP" part fooled me to believe FIEMAP would also expose physical location of extends on the medium. But reading the proposal again, I am unsure about that part. If physical locations are exposed, SEEK_HOLE/SEEK_DATA is significantly more elegant for logfs. If not, FIEMAP could be useful. Jörn -- Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest. -- Rob Pike - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, 2007-11-28 at 18:33 -0500, Josef Bacik wrote: > On Wed, Nov 28, 2007 at 02:56:54PM -0800, Nicholas Miell wrote: > > > > On Wed, 2007-11-28 at 15:02 -0500, Josef Bacik wrote: > > > Hello, > > > > > > This is my first pass at implementing SEEK_HOLE/SEEK_DATA. This has been > > > in > > > solaris for about a year now, and is described here > > > > > > http://docs.sun.com/app/docs/doc/819-2241/lseek-2?l=en&a=view&q=SEEK_HOLE > > > http://blogs.sun.com/roller/page/bonwick?entry=seek_hole_and_seek_data > > > > > > I've added a file operation to allow filesystems to override the default > > > seek_hole_data function, which just loops through bmap looking for either > > > a hole > > > or data. I've tested this and it seems to work well. I ran my testcase > > > on a > > > solaris box to make sure I got consistent results (I just ran my test > > > script on > > > the solaris box, I haven't looked at any of their code in case thats a > > > concern). > > > All comments welcome. Thank you, > > > > > > Josef > > > > I stand by my belief that SEEK_HOLE/SEEK_DATA is a lousy interface. > > > > It abuses the seek operation to become a query operation, it requires a > > total number of system calls proportional to the number holes+data and > > it isn't general enough for other similar uses (e.g. total number of > > contiguous extents, compressed extents, offline extents, extents > > currently shared with other inodes, extents embedded in the inode > > (tails), etc.) > > > > Something like the following would be much better: > > > > int getfilextents(int fd, off_t offset, int type, size_t *length, struct > > extent *extents) > > > > with > > > > int fd: open file > > > > offset: offset in file to start reporting extents > > > > type: one of EXTENT_TYPE_HOLE, EXTENT_TYPE_DATA, EXTENT_TYPE_EXTENTS, > > EXTENT_TYPE_COMPRESSED, EXTENT_TYPE_UNCOMPRESSED etc. > > > > length: in/out parameter, on entry contains length of extents array, on > > exit contains number of valid entries in the extents array or total > > number of extents remaining in the file, whichever is larger > > > > extents: array of struct extent { off_t offset; off_t length }, only > > updated if non-NULL > > > > Making the type parameter a bitmask and adding a type member to struct > > extent could be useful so that multiple types of extents could be > > reported at once could be useful, too. (But you end up with weird cases > > like data extents overlapping with compressed extents.) > > > > Actually, now that I've searched my mailbox, Andreas Dilger's FIEMAP > > proposal is pretty much what I suggest here and is certainly superior to > > Sun's SEEK_HOLE/SEEK_DATA. > > > > Agreed, however in speaking hch and others the consensus was FIEMAP was good, > however there was no reason why SEEK_HOLE/SEEK_DATA shouldn't also be > implemented, and then at some point down the road when a generic FIEMAP is in > place either change the SEEK_HOLE/SEEK_DATA implementation to try to use > FIEMAP > by default and then fall back on bmap if it has to, or some other such > operation. I'm cool with passing on this implementation in preference for > FIEMAP, but given the discussion I had earlier this week with some of the > other > fs people the general thought was go ahead and do this for now. > Well, there's no demand specifically for SEEK_HOLE/SEEK_DATA[1] and the interface is ugly, so killing it before it spreads beyond Solaris seems like a good idea to me. OTOH, if you implement SEEK_HOLE/SEEK_DATA, nobody is going to bother using the good interface if SEEK_HOLE/ SEEK_DATA is the only portable interface. [1] The only user appears to be Joerg Schilling. http://www.google.com/codesearch?hl=en&q=+%5CWSEEK_(DATA%7CHOLE)&sa=N&as_case=y -- Nicholas Miell <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Nov 28, 2007 15:02 -0500, Josef Bacik wrote: > +static loff_t generic_seek_hole_data(struct file *file, loff_t offset, > + int origin) > +{ > + loff_t retval = -ENXIO; > + struct inode *inode = file->f_mapping->host; > + sector_t block, found_block; > + sector_t last_block = (i_size_read(inode) - 1) >> inode->i_blkbits; > + int want = (origin == SEEK_HOLE) ? 0 : 1; > + > + for (block = offset >> inode->i_blkbits; block <= last_block; block++) { > + found_block = bmap(inode, block); > + > + if (!!found_block == want) { > + retval = block << inode->i_blkbits; > + break; > + } > + } The problem with this is that it doesn't know anything about the filesystem, and might spin for ages in a tight loop for a large sparse or dense file. Almost any fs would have to implement ->seek_hole_data() for this to be useful, at which point we might as well just report -EINVAL to the application or glibc (since this needs to be handled anyways). Also, what about network filesystems that don't support bmap? They will show up as entirely holes, since bmap() returns 0 if ->bmap is not supported. That could lead to serious data loss, if e.g. a filesystem is being backed up with a SEEK_DATA-aware tool. In light of this, it makes much more sense to only support this functionality if the filesystem explicitly adds it, instead of pretending that it works when there are sharp pointy things in the dark corners. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, Nov 28, 2007 at 01:56:49PM -0800, Brad Boyer wrote: > > I have a few comments. Some of them are inline with the code. > > The one generic thing is that the Sun documentation specifically > says to check pathconf(_PC_MIN_HOLE_SIZE) on the filesystem to see > if it supports this request. Have you looked into adding this to > the Linux version of pathconf? I haven't tried to look at the latest > glibc, but 2.3.2 doesn't appear to have it. If we do have it, this > request should really go to the kernel to ask the individual > filesystem. However, it looks like pathconf is entirely implemented > in glibc. Should we add a system call or some other way to pass > pathconf requests into the kernel? I know pathconf currently returns > some inaccurate results in some cases due to this limitation. There > are some existing ones like _PC_LINK_MAX that glibc tries to guess > the correct result based on statfs and can get wrong in any > non-standard setup. > Since it appears pathconf doesn't actually interface with the kernel I think that its not really worth chasing down. I don't think every fs maintainer is really going to want to chase down a glibc person in order to change what pathconf returns for their particular fs. > > On Wed, Nov 28, 2007 at 03:02:07PM -0500, Josef Bacik wrote: > > diff --git a/fs/read_write.c b/fs/read_write.c > > index ea1f94c..cf61e1e 100644 > > --- a/fs/read_write.c > > +++ b/fs/read_write.c > > @@ -31,10 +31,32 @@ const struct file_operations generic_ro_fops = { > > > > EXPORT_SYMBOL(generic_ro_fops); > > > > +static loff_t generic_seek_hole_data(struct file *file, loff_t offset, > > +int origin) > > +{ > > + loff_t retval = -ENXIO; > > + struct inode *inode = file->f_mapping->host; > > + sector_t block, found_block; > > + sector_t last_block = (i_size_read(inode) - 1) >> inode->i_blkbits; > > + int want = (origin == SEEK_HOLE) ? 0 : 1; > > + > > + for (block = offset >> inode->i_blkbits; block <= last_block; block++) { > > + found_block = bmap(inode, block); > > + > > + if (!!found_block == want) { > > + retval = block << inode->i_blkbits; > > + break; > > + } > > + } > > + > > + return retval; > > +} > > + > > It looks like this function can return an offset that is before the > one passed in as an argument due to losing the lower bits. I think > it needs to do somthing more like this in the loop initializer: > > block = (offset + (1 << inode->i_blkbits) - 1) >> inode->i_blkbits; > > By adding 1 block if it wasn't already on an even block, this assures > us that it won't go backwards from the original offset while allowing > someone to get a block that really does start exactly on the offset. > > > loff_t generic_file_llseek(struct file *file, loff_t offset, int origin) > > { > > long long retval; > > struct inode *inode = file->f_mapping->host; > > + loff_t (*fn)(struct file *, loff_t, int); > > > > mutex_lock(&inode->i_mutex); > > switch (origin) { > > @@ -43,15 +65,24 @@ loff_t generic_file_llseek(struct file *file, loff_t > > offset, int origin) > > break; > > case SEEK_CUR: > > offset += file->f_pos; > > + break; > > + case SEEK_HOLE: > > + case SEEK_DATA: > > + fn = generic_seek_hole_data; > > + if (file->f_op->seek_hole_data) > > + fn = file->f_op->seek_hole_data; > > + offset = fn(file, offset, origin); > > } > > retval = -EINVAL; > > if (offset>=0 && offset<=inode->i_sb->s_maxbytes) { > > - if (offset != file->f_pos) { > > + if (offset != file->f_pos && origin != SEEK_HOLE) { > > Why is SEEK_HOLE special in this case? The description of SEEK_HOLE > and SEEK_DATA in the Sun document would imply to me that they should > be handled the same. > This was my interpretation of the man page, however Joern looked at what solaris does and it seems that SEEK_HOLE does update f_pos, so I will change this. > > file->f_pos = offset; > > file->f_version = 0; > > } > > retval = offset; > > - } > > + } else if (offset < 0) > > + retval = offset; > > + > > mutex_unlock(&inode->i_mutex); > > return retval; Thanks much for your comments, Josef - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Nov 28, 2007 14:56 -0800, Nicholas Miell wrote: > I stand by my belief that SEEK_HOLE/SEEK_DATA is a lousy interface. > > It abuses the seek operation to become a query operation, it requires a > total number of system calls proportional to the number holes+data and > it isn't general enough for other similar uses (e.g. total number of > contiguous extents, compressed extents, offline extents, extents > currently shared with other inodes, extents embedded in the inode > (tails), etc.) > > Something like the following would be much better: > > int getfilextents(int fd, off_t offset, int type, size_t *length, struct > extent *extents) > > with > > int fd: open file > > offset: offset in file to start reporting extents > > type: one of EXTENT_TYPE_HOLE, EXTENT_TYPE_DATA, EXTENT_TYPE_EXTENTS, > EXTENT_TYPE_COMPRESSED, EXTENT_TYPE_UNCOMPRESSED etc. This is what FIEMAP is supposed to do. We wrote a spec and implemented a prototype for ext4, but haven't had time to make it generic to move the large part of the code into the VFS. If someone wanted to take that up, it would be much appreciated. See "[RFC] add FIEMAP ioctl to efficiently map file allocation" in linux-fsdevel for details on this interface. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, Nov 28, 2007 at 02:56:54PM -0800, Nicholas Miell wrote: > > On Wed, 2007-11-28 at 15:02 -0500, Josef Bacik wrote: > > Hello, > > > > This is my first pass at implementing SEEK_HOLE/SEEK_DATA. This has been in > > solaris for about a year now, and is described here > > > > http://docs.sun.com/app/docs/doc/819-2241/lseek-2?l=en&a=view&q=SEEK_HOLE > > http://blogs.sun.com/roller/page/bonwick?entry=seek_hole_and_seek_data > > > > I've added a file operation to allow filesystems to override the default > > seek_hole_data function, which just loops through bmap looking for either a > > hole > > or data. I've tested this and it seems to work well. I ran my testcase on > > a > > solaris box to make sure I got consistent results (I just ran my test > > script on > > the solaris box, I haven't looked at any of their code in case thats a > > concern). > > All comments welcome. Thank you, > > > > Josef > > I stand by my belief that SEEK_HOLE/SEEK_DATA is a lousy interface. > > It abuses the seek operation to become a query operation, it requires a > total number of system calls proportional to the number holes+data and > it isn't general enough for other similar uses (e.g. total number of > contiguous extents, compressed extents, offline extents, extents > currently shared with other inodes, extents embedded in the inode > (tails), etc.) > > Something like the following would be much better: > > int getfilextents(int fd, off_t offset, int type, size_t *length, struct > extent *extents) > > with > > int fd: open file > > offset: offset in file to start reporting extents > > type: one of EXTENT_TYPE_HOLE, EXTENT_TYPE_DATA, EXTENT_TYPE_EXTENTS, > EXTENT_TYPE_COMPRESSED, EXTENT_TYPE_UNCOMPRESSED etc. > > length: in/out parameter, on entry contains length of extents array, on > exit contains number of valid entries in the extents array or total > number of extents remaining in the file, whichever is larger > > extents: array of struct extent { off_t offset; off_t length }, only > updated if non-NULL > > Making the type parameter a bitmask and adding a type member to struct > extent could be useful so that multiple types of extents could be > reported at once could be useful, too. (But you end up with weird cases > like data extents overlapping with compressed extents.) > > Actually, now that I've searched my mailbox, Andreas Dilger's FIEMAP > proposal is pretty much what I suggest here and is certainly superior to > Sun's SEEK_HOLE/SEEK_DATA. > Agreed, however in speaking hch and others the consensus was FIEMAP was good, however there was no reason why SEEK_HOLE/SEEK_DATA shouldn't also be implemented, and then at some point down the road when a generic FIEMAP is in place either change the SEEK_HOLE/SEEK_DATA implementation to try to use FIEMAP by default and then fall back on bmap if it has to, or some other such operation. I'm cool with passing on this implementation in preference for FIEMAP, but given the discussion I had earlier this week with some of the other fs people the general thought was go ahead and do this for now. Josef - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
On Wed, 2007-11-28 at 15:02 -0500, Josef Bacik wrote: > Hello, > > This is my first pass at implementing SEEK_HOLE/SEEK_DATA. This has been in > solaris for about a year now, and is described here > > http://docs.sun.com/app/docs/doc/819-2241/lseek-2?l=en&a=view&q=SEEK_HOLE > http://blogs.sun.com/roller/page/bonwick?entry=seek_hole_and_seek_data > > I've added a file operation to allow filesystems to override the default > seek_hole_data function, which just loops through bmap looking for either a > hole > or data. I've tested this and it seems to work well. I ran my testcase on a > solaris box to make sure I got consistent results (I just ran my test script > on > the solaris box, I haven't looked at any of their code in case thats a > concern). > All comments welcome. Thank you, > > Josef I stand by my belief that SEEK_HOLE/SEEK_DATA is a lousy interface. It abuses the seek operation to become a query operation, it requires a total number of system calls proportional to the number holes+data and it isn't general enough for other similar uses (e.g. total number of contiguous extents, compressed extents, offline extents, extents currently shared with other inodes, extents embedded in the inode (tails), etc.) Something like the following would be much better: int getfilextents(int fd, off_t offset, int type, size_t *length, struct extent *extents) with int fd: open file offset: offset in file to start reporting extents type: one of EXTENT_TYPE_HOLE, EXTENT_TYPE_DATA, EXTENT_TYPE_EXTENTS, EXTENT_TYPE_COMPRESSED, EXTENT_TYPE_UNCOMPRESSED etc. length: in/out parameter, on entry contains length of extents array, on exit contains number of valid entries in the extents array or total number of extents remaining in the file, whichever is larger extents: array of struct extent { off_t offset; off_t length }, only updated if non-NULL Making the type parameter a bitmask and adding a type member to struct extent could be useful so that multiple types of extents could be reported at once could be useful, too. (But you end up with weird cases like data extents overlapping with compressed extents.) Actually, now that I've searched my mailbox, Andreas Dilger's FIEMAP proposal is pretty much what I suggest here and is certainly superior to Sun's SEEK_HOLE/SEEK_DATA. -- Nicholas Miell <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
I have a few comments. Some of them are inline with the code. The one generic thing is that the Sun documentation specifically says to check pathconf(_PC_MIN_HOLE_SIZE) on the filesystem to see if it supports this request. Have you looked into adding this to the Linux version of pathconf? I haven't tried to look at the latest glibc, but 2.3.2 doesn't appear to have it. If we do have it, this request should really go to the kernel to ask the individual filesystem. However, it looks like pathconf is entirely implemented in glibc. Should we add a system call or some other way to pass pathconf requests into the kernel? I know pathconf currently returns some inaccurate results in some cases due to this limitation. There are some existing ones like _PC_LINK_MAX that glibc tries to guess the correct result based on statfs and can get wrong in any non-standard setup. Brad Boyer [EMAIL PROTECTED] On Wed, Nov 28, 2007 at 03:02:07PM -0500, Josef Bacik wrote: > diff --git a/fs/read_write.c b/fs/read_write.c > index ea1f94c..cf61e1e 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -31,10 +31,32 @@ const struct file_operations generic_ro_fops = { > > EXPORT_SYMBOL(generic_ro_fops); > > +static loff_t generic_seek_hole_data(struct file *file, loff_t offset, > + int origin) > +{ > + loff_t retval = -ENXIO; > + struct inode *inode = file->f_mapping->host; > + sector_t block, found_block; > + sector_t last_block = (i_size_read(inode) - 1) >> inode->i_blkbits; > + int want = (origin == SEEK_HOLE) ? 0 : 1; > + > + for (block = offset >> inode->i_blkbits; block <= last_block; block++) { > + found_block = bmap(inode, block); > + > + if (!!found_block == want) { > + retval = block << inode->i_blkbits; > + break; > + } > + } > + > + return retval; > +} > + It looks like this function can return an offset that is before the one passed in as an argument due to losing the lower bits. I think it needs to do somthing more like this in the loop initializer: block = (offset + (1 << inode->i_blkbits) - 1) >> inode->i_blkbits; By adding 1 block if it wasn't already on an even block, this assures us that it won't go backwards from the original offset while allowing someone to get a block that really does start exactly on the offset. > loff_t generic_file_llseek(struct file *file, loff_t offset, int origin) > { > long long retval; > struct inode *inode = file->f_mapping->host; > + loff_t (*fn)(struct file *, loff_t, int); > > mutex_lock(&inode->i_mutex); > switch (origin) { > @@ -43,15 +65,24 @@ loff_t generic_file_llseek(struct file *file, loff_t > offset, int origin) > break; > case SEEK_CUR: > offset += file->f_pos; > + break; > + case SEEK_HOLE: > + case SEEK_DATA: > + fn = generic_seek_hole_data; > + if (file->f_op->seek_hole_data) > + fn = file->f_op->seek_hole_data; > + offset = fn(file, offset, origin); > } > retval = -EINVAL; > if (offset>=0 && offset<=inode->i_sb->s_maxbytes) { > - if (offset != file->f_pos) { > + if (offset != file->f_pos && origin != SEEK_HOLE) { Why is SEEK_HOLE special in this case? The description of SEEK_HOLE and SEEK_DATA in the Sun document would imply to me that they should be handled the same. > file->f_pos = offset; > file->f_version = 0; > } > retval = offset; > - } > + } else if (offset < 0) > + retval = offset; > + > mutex_unlock(&inode->i_mutex); > return retval; > } - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH] Implement SEEK_HOLE/SEEK_DATA
Hello, This is my first pass at implementing SEEK_HOLE/SEEK_DATA. This has been in solaris for about a year now, and is described here http://docs.sun.com/app/docs/doc/819-2241/lseek-2?l=en&a=view&q=SEEK_HOLE http://blogs.sun.com/roller/page/bonwick?entry=seek_hole_and_seek_data I've added a file operation to allow filesystems to override the default seek_hole_data function, which just loops through bmap looking for either a hole or data. I've tested this and it seems to work well. I ran my testcase on a solaris box to make sure I got consistent results (I just ran my test script on the solaris box, I haven't looked at any of their code in case thats a concern). All comments welcome. Thank you, Josef diff --git a/fs/read_write.c b/fs/read_write.c index ea1f94c..cf61e1e 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -31,10 +31,32 @@ const struct file_operations generic_ro_fops = { EXPORT_SYMBOL(generic_ro_fops); +static loff_t generic_seek_hole_data(struct file *file, loff_t offset, +int origin) +{ + loff_t retval = -ENXIO; + struct inode *inode = file->f_mapping->host; + sector_t block, found_block; + sector_t last_block = (i_size_read(inode) - 1) >> inode->i_blkbits; + int want = (origin == SEEK_HOLE) ? 0 : 1; + + for (block = offset >> inode->i_blkbits; block <= last_block; block++) { + found_block = bmap(inode, block); + + if (!!found_block == want) { + retval = block << inode->i_blkbits; + break; + } + } + + return retval; +} + loff_t generic_file_llseek(struct file *file, loff_t offset, int origin) { long long retval; struct inode *inode = file->f_mapping->host; + loff_t (*fn)(struct file *, loff_t, int); mutex_lock(&inode->i_mutex); switch (origin) { @@ -43,15 +65,24 @@ loff_t generic_file_llseek(struct file *file, loff_t offset, int origin) break; case SEEK_CUR: offset += file->f_pos; + break; + case SEEK_HOLE: + case SEEK_DATA: + fn = generic_seek_hole_data; + if (file->f_op->seek_hole_data) + fn = file->f_op->seek_hole_data; + offset = fn(file, offset, origin); } retval = -EINVAL; if (offset>=0 && offset<=inode->i_sb->s_maxbytes) { - if (offset != file->f_pos) { + if (offset != file->f_pos && origin != SEEK_HOLE) { file->f_pos = offset; file->f_version = 0; } retval = offset; - } + } else if (offset < 0) + retval = offset; + mutex_unlock(&inode->i_mutex); return retval; } diff --git a/include/linux/fs.h b/include/linux/fs.h index b3ec4a4..a55d68e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -30,7 +30,9 @@ #define SEEK_SET 0 /* seek relative to beginning of file */ #define SEEK_CUR 1 /* seek relative to current file position */ #define SEEK_END 2 /* seek relative to end of file */ -#define SEEK_MAX SEEK_END +#define SEEK_HOLE 3 /* seek relative to the next hole */ +#define SEEK_DATA 4 /* seek relative to the next block with data */ +#define SEEK_MAX SEEK_DATA /* And dynamically-tunable limits and defaults: */ struct files_stat_struct { @@ -1163,6 +1165,7 @@ typedef int (*read_actor_t)(read_descriptor_t *, struct page *, unsigned long, u struct file_operations { struct module *owner; loff_t (*llseek) (struct file *, loff_t, int); + loff_t (*seek_hole_data) (struct file *, loff_t, int); ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t); - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html