Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 06:03:50PM -0800, Nate Diller wrote: > i had a patch integrating the iodesc idea, but after some thought, had > decided to call it struct file_io. That name reflects the fact that > it's doing I/O in arbitrary lengths with byte offsets, and struct > file_io *fio contrasts well with struct bio (block_io). I also had > used the field ->nbytes instead of ->count, to clarify the difference > between segment iterators, segment offsets, and absolute bytecount. struct file_io sounds rather ugly to me, I don't know why. And it's really user I/O so we could call it struct uio (historical punt intended) :) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 06:03:50PM -0800, Nate Diller wrote: i had a patch integrating the iodesc idea, but after some thought, had decided to call it struct file_io. That name reflects the fact that it's doing I/O in arbitrary lengths with byte offsets, and struct file_io *fio contrasts well with struct bio (block_io). I also had used the field -nbytes instead of -count, to clarify the difference between segment iterators, segment offsets, and absolute bytecount. struct file_io sounds rather ugly to me, I don't know why. And it's really user I/O so we could call it struct uio (historical punt intended) :) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
What I have there is not actually a full-blown file io descriptor, because there is no file or offset. It is just an iovec iterator (so maybe I should rename it to iov_iter, rather than iodesc). I think it might be a nice idea to keep this iov_iter as a standalone structure, and it could be embedded into a struct file_io? Yeah, maybe. I'm not sure we need something as generic as a "file_io" struct. To recap, I've hoped for the expression of the state of an iovec array with a richer structure to avoid the multiple walks of the array at different parts of the kernel that previously only had access to the raw iovec * and size_t count. Stuff like the alignment checks in __blockdev_direct_IO() and the pages_in_io accounting in direct_io_worker(). I imagined building up the state in this 'iodesc' structure as we first copied and verified the structure from userspace. (say in rw_copy_check_uvector()). If as we copied we, say, stored in the bits of the buffer and length fields then by the time we got to __blockdev_direct_IO() we'd just test the bits for misalignment instead of iterating over the array again. It starts to get gross as some paths currently modify the kernel copy of the array as they process it :/. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
What I have there is not actually a full-blown file io descriptor, because there is no file or offset. It is just an iovec iterator (so maybe I should rename it to iov_iter, rather than iodesc). I think it might be a nice idea to keep this iov_iter as a standalone structure, and it could be embedded into a struct file_io? Yeah, maybe. I'm not sure we need something as generic as a file_io struct. To recap, I've hoped for the expression of the state of an iovec array with a richer structure to avoid the multiple walks of the array at different parts of the kernel that previously only had access to the raw iovec * and size_t count. Stuff like the alignment checks in __blockdev_direct_IO() and the pages_in_io accounting in direct_io_worker(). I imagined building up the state in this 'iodesc' structure as we first copied and verified the structure from userspace. (say in rw_copy_check_uvector()). If as we copied we, say, stored in the bits of the buffer and length fields then by the time we got to __blockdev_direct_IO() we'd just test the bits for misalignment instead of iterating over the array again. It starts to get gross as some paths currently modify the kernel copy of the array as they process it :/. - z - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 06:03:50PM -0800, Nate Diller wrote: > On 2/8/07, Nick Piggin <[EMAIL PROTECTED]> wrote: > >On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote: > >> On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: > >> > Add an iterator data structure to operate over an iovec. Add usercopy > >> > operators needed by generic_file_buffered_write, and convert that > >function > >> > over. > >> > >> iovec_iterator is an awfully long and not very descriptive name. > >> In past discussions we named this thingy iodesc and wanted to pass it > >> down all the I/O path, including the file operations. > > > >Hi Christoph, > > > >Sure I think it would be a good idea to shorten the name. And yes, although > >I just construct the iterator to pass into perform_write, I think it should > >make sense to go much further up the call stack instead of passing all > >those > >args around. iodesc seems like a fine name, so I'll use that unless > >anyone objects. > > i had a patch integrating the iodesc idea, but after some thought, had > decided to call it struct file_io. That name reflects the fact that > it's doing I/O in arbitrary lengths with byte offsets, and struct > file_io *fio contrasts well with struct bio (block_io). I also had > used the field ->nbytes instead of ->count, to clarify the difference > between segment iterators, segment offsets, and absolute bytecount. The field name is a good suggestion. What I have there is not actually a full-blown file io descriptor, because there is no file or offset. It is just an iovec iterator (so maybe I should rename it to iov_iter, rather than iodesc). I think it might be a nice idea to keep this iov_iter as a standalone structure, and it could be embedded into a struct file_io? Thanks, Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On 2/8/07, Nick Piggin <[EMAIL PROTECTED]> wrote: On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote: > On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: > > Add an iterator data structure to operate over an iovec. Add usercopy > > operators needed by generic_file_buffered_write, and convert that function > > over. > > iovec_iterator is an awfully long and not very descriptive name. > In past discussions we named this thingy iodesc and wanted to pass it > down all the I/O path, including the file operations. Hi Christoph, Sure I think it would be a good idea to shorten the name. And yes, although I just construct the iterator to pass into perform_write, I think it should make sense to go much further up the call stack instead of passing all those args around. iodesc seems like a fine name, so I'll use that unless anyone objects. i had a patch integrating the iodesc idea, but after some thought, had decided to call it struct file_io. That name reflects the fact that it's doing I/O in arbitrary lengths with byte offsets, and struct file_io *fio contrasts well with struct bio (block_io). I also had used the field ->nbytes instead of ->count, to clarify the difference between segment iterators, segment offsets, and absolute bytecount. FYI, the patch is still in the works and would convert the whole file I/O stack to use the new structure. I would like to base it off of this work as well if this makes it into -mm (as I think it should) NATE - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote: > On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: > > Add an iterator data structure to operate over an iovec. Add usercopy > > operators needed by generic_file_buffered_write, and convert that function > > over. > > iovec_iterator is an awfully long and not very descriptive name. > In past discussions we named this thingy iodesc and wanted to pass it > down all the I/O path, including the file operations. Hi Christoph, Sure I think it would be a good idea to shorten the name. And yes, although I just construct the iterator to pass into perform_write, I think it should make sense to go much further up the call stack instead of passing all those args around. iodesc seems like a fine name, so I'll use that unless anyone objects. Thanks, Nick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
Hi Nick, On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: > Add an iterator data structure to operate over an iovec. Add usercopy > operators needed by generic_file_buffered_write, and convert that function > over. I really like this iterator structure - it brings together some fields that seem to get passed around seperately, even though they're linked. Also IMHO, it makes things in filemap.c easer to read... --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: > Add an iterator data structure to operate over an iovec. Add usercopy > operators needed by generic_file_buffered_write, and convert that function > over. iovec_iterator is an awfully long and not very descriptive name. In past discussions we named this thingy iodesc and wanted to pass it down all the I/O path, including the file operations. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 1/3] fs: add an iovec iterator
Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. include/linux/fs.h | 32 mm/filemap.c | 132 ++--- mm/filemap.h | 103 - 3 files changed, 137 insertions(+), 130 deletions(-) Index: linux-2.6/include/linux/fs.h === --- linux-2.6.orig/include/linux/fs.h +++ linux-2.6/include/linux/fs.h @@ -395,6 +395,38 @@ struct page; struct address_space; struct writeback_control; +struct iovec_iterator { + const struct iovec *iov; + unsigned long nr_segs; + size_t iov_offset; + size_t count; +}; + +size_t iovec_iterator_copy_from_user_atomic(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes); +size_t iovec_iterator_copy_from_user(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes); +void iovec_iterator_advance(struct iovec_iterator *i, size_t bytes); +int iovec_iterator_fault_in_readable(struct iovec_iterator *i); + +static inline void iovec_iterator_init(struct iovec_iterator *i, + const struct iovec *iov, unsigned long nr_segs, + size_t count, size_t written) +{ + i->iov = iov; + i->nr_segs = nr_segs; + i->iov_offset = 0; + i->count = count + written; + + iovec_iterator_advance(i, written); +} + +static inline size_t iovec_iterator_count(struct iovec_iterator *i) +{ + return i->count; +} + + struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -30,7 +30,7 @@ #include #include #include -#include "filemap.h" +#include /* for BUG_ON(!in_atomic()) only */ #include "internal.h" /* @@ -1679,8 +1679,7 @@ int remove_suid(struct dentry *dentry) } EXPORT_SYMBOL(remove_suid); -size_t -__filemap_copy_from_user_iovec_inatomic(char *vaddr, +static size_t __iovec_copy_from_user_inatomic(char *vaddr, const struct iovec *iov, size_t base, size_t bytes) { size_t copied = 0, left = 0; @@ -1703,6 +1702,98 @@ __filemap_copy_from_user_iovec_inatomic( } /* + * Copy as much as we can into the page and return the number of bytes which + * were sucessfully copied. If a fault is encountered then return the number of + * bytes which were copied. + */ +size_t iovec_iterator_copy_from_user_atomic(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes) +{ + char *kaddr; + size_t copied; + + BUG_ON(!in_atomic()); + kaddr = kmap_atomic(page, KM_USER0); + if (likely(i->nr_segs == 1)) { + int left; + char __user *buf = i->iov->iov_base + i->iov_offset; + left = __copy_from_user_inatomic_nocache(kaddr + offset, + buf, bytes); + copied = bytes - left; + } else { + copied = __iovec_copy_from_user_inatomic(kaddr + offset, + i->iov, i->iov_offset, bytes); + } + kunmap_atomic(kaddr, KM_USER0); + + return copied; +} + +/* + * This has the same sideeffects and return value as + * iovec_iterator_copy_from_user_atomic(). + * The difference is that it attempts to resolve faults. + * Page must not be locked. + */ +size_t iovec_iterator_copy_from_user(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes) +{ + char *kaddr; + size_t copied; + + kaddr = kmap(page); + if (likely(i->nr_segs == 1)) { + int left; + char __user *buf = i->iov->iov_base + i->iov_offset; + left = __copy_from_user_nocache(kaddr + offset, buf, bytes); + copied = bytes - left; + } else { + copied = __iovec_copy_from_user_inatomic(kaddr + offset, + i->iov, i->iov_offset, bytes); + } + kunmap(page); + return copied; +} + +static void __iovec_iterator_advance_iov(struct iovec_iterator *i, size_t bytes) +{ + if (likely(i->nr_segs == 1)) { + i->iov_offset += bytes; + } else { + const struct iovec *iov = i->iov; + size_t base = i->iov_offset; + + while (bytes) { + int copy = min(bytes, iov->iov_len - base); + + bytes -= copy; + base += copy; + if (iov->iov_len == base) { +
[patch 1/3] fs: add an iovec iterator
Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. include/linux/fs.h | 32 mm/filemap.c | 132 ++--- mm/filemap.h | 103 - 3 files changed, 137 insertions(+), 130 deletions(-) Index: linux-2.6/include/linux/fs.h === --- linux-2.6.orig/include/linux/fs.h +++ linux-2.6/include/linux/fs.h @@ -395,6 +395,38 @@ struct page; struct address_space; struct writeback_control; +struct iovec_iterator { + const struct iovec *iov; + unsigned long nr_segs; + size_t iov_offset; + size_t count; +}; + +size_t iovec_iterator_copy_from_user_atomic(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes); +size_t iovec_iterator_copy_from_user(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes); +void iovec_iterator_advance(struct iovec_iterator *i, size_t bytes); +int iovec_iterator_fault_in_readable(struct iovec_iterator *i); + +static inline void iovec_iterator_init(struct iovec_iterator *i, + const struct iovec *iov, unsigned long nr_segs, + size_t count, size_t written) +{ + i-iov = iov; + i-nr_segs = nr_segs; + i-iov_offset = 0; + i-count = count + written; + + iovec_iterator_advance(i, written); +} + +static inline size_t iovec_iterator_count(struct iovec_iterator *i) +{ + return i-count; +} + + struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -30,7 +30,7 @@ #include linux/security.h #include linux/syscalls.h #include linux/cpuset.h -#include filemap.h +#include linux/hardirq.h /* for BUG_ON(!in_atomic()) only */ #include internal.h /* @@ -1679,8 +1679,7 @@ int remove_suid(struct dentry *dentry) } EXPORT_SYMBOL(remove_suid); -size_t -__filemap_copy_from_user_iovec_inatomic(char *vaddr, +static size_t __iovec_copy_from_user_inatomic(char *vaddr, const struct iovec *iov, size_t base, size_t bytes) { size_t copied = 0, left = 0; @@ -1703,6 +1702,98 @@ __filemap_copy_from_user_iovec_inatomic( } /* + * Copy as much as we can into the page and return the number of bytes which + * were sucessfully copied. If a fault is encountered then return the number of + * bytes which were copied. + */ +size_t iovec_iterator_copy_from_user_atomic(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes) +{ + char *kaddr; + size_t copied; + + BUG_ON(!in_atomic()); + kaddr = kmap_atomic(page, KM_USER0); + if (likely(i-nr_segs == 1)) { + int left; + char __user *buf = i-iov-iov_base + i-iov_offset; + left = __copy_from_user_inatomic_nocache(kaddr + offset, + buf, bytes); + copied = bytes - left; + } else { + copied = __iovec_copy_from_user_inatomic(kaddr + offset, + i-iov, i-iov_offset, bytes); + } + kunmap_atomic(kaddr, KM_USER0); + + return copied; +} + +/* + * This has the same sideeffects and return value as + * iovec_iterator_copy_from_user_atomic(). + * The difference is that it attempts to resolve faults. + * Page must not be locked. + */ +size_t iovec_iterator_copy_from_user(struct page *page, + struct iovec_iterator *i, unsigned long offset, size_t bytes) +{ + char *kaddr; + size_t copied; + + kaddr = kmap(page); + if (likely(i-nr_segs == 1)) { + int left; + char __user *buf = i-iov-iov_base + i-iov_offset; + left = __copy_from_user_nocache(kaddr + offset, buf, bytes); + copied = bytes - left; + } else { + copied = __iovec_copy_from_user_inatomic(kaddr + offset, + i-iov, i-iov_offset, bytes); + } + kunmap(page); + return copied; +} + +static void __iovec_iterator_advance_iov(struct iovec_iterator *i, size_t bytes) +{ + if (likely(i-nr_segs == 1)) { + i-iov_offset += bytes; + } else { + const struct iovec *iov = i-iov; + size_t base = i-iov_offset; + + while (bytes) { + int copy = min(bytes, iov-iov_len - base); + + bytes -= copy; + base += copy; + if
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. iovec_iterator is an awfully long and not very descriptive name. In past discussions we named this thingy iodesc and wanted to pass it down all the I/O path, including the file operations. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
Hi Nick, On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. I really like this iterator structure - it brings together some fields that seem to get passed around seperately, even though they're linked. Also IMHO, it makes things in filemap.c easer to read... --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote: On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. iovec_iterator is an awfully long and not very descriptive name. In past discussions we named this thingy iodesc and wanted to pass it down all the I/O path, including the file operations. Hi Christoph, Sure I think it would be a good idea to shorten the name. And yes, although I just construct the iterator to pass into perform_write, I think it should make sense to go much further up the call stack instead of passing all those args around. iodesc seems like a fine name, so I'll use that unless anyone objects. Thanks, Nick - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On 2/8/07, Nick Piggin [EMAIL PROTECTED] wrote: On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote: On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. iovec_iterator is an awfully long and not very descriptive name. In past discussions we named this thingy iodesc and wanted to pass it down all the I/O path, including the file operations. Hi Christoph, Sure I think it would be a good idea to shorten the name. And yes, although I just construct the iterator to pass into perform_write, I think it should make sense to go much further up the call stack instead of passing all those args around. iodesc seems like a fine name, so I'll use that unless anyone objects. i had a patch integrating the iodesc idea, but after some thought, had decided to call it struct file_io. That name reflects the fact that it's doing I/O in arbitrary lengths with byte offsets, and struct file_io *fio contrasts well with struct bio (block_io). I also had used the field -nbytes instead of -count, to clarify the difference between segment iterators, segment offsets, and absolute bytecount. FYI, the patch is still in the works and would convert the whole file I/O stack to use the new structure. I would like to base it off of this work as well if this makes it into -mm (as I think it should) NATE - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] fs: add an iovec iterator
On Thu, Feb 08, 2007 at 06:03:50PM -0800, Nate Diller wrote: On 2/8/07, Nick Piggin [EMAIL PROTECTED] wrote: On Thu, Feb 08, 2007 at 07:49:53PM +, Christoph Hellwig wrote: On Thu, Feb 08, 2007 at 02:07:24PM +0100, Nick Piggin wrote: Add an iterator data structure to operate over an iovec. Add usercopy operators needed by generic_file_buffered_write, and convert that function over. iovec_iterator is an awfully long and not very descriptive name. In past discussions we named this thingy iodesc and wanted to pass it down all the I/O path, including the file operations. Hi Christoph, Sure I think it would be a good idea to shorten the name. And yes, although I just construct the iterator to pass into perform_write, I think it should make sense to go much further up the call stack instead of passing all those args around. iodesc seems like a fine name, so I'll use that unless anyone objects. i had a patch integrating the iodesc idea, but after some thought, had decided to call it struct file_io. That name reflects the fact that it's doing I/O in arbitrary lengths with byte offsets, and struct file_io *fio contrasts well with struct bio (block_io). I also had used the field -nbytes instead of -count, to clarify the difference between segment iterators, segment offsets, and absolute bytecount. The field name is a good suggestion. What I have there is not actually a full-blown file io descriptor, because there is no file or offset. It is just an iovec iterator (so maybe I should rename it to iov_iter, rather than iodesc). I think it might be a nice idea to keep this iov_iter as a standalone structure, and it could be embedded into a struct file_io? Thanks, Nick - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/