Re: [PATCH v3] UDF: Add support for O_DIRECT
On Wed 05-09-12 14:55:33, Jan Kara wrote: > On Wed 05-09-12 14:05:20, Jan Kara wrote: > > On Tue 04-09-12 16:11:32, Ian Abbott wrote: > > > >>1. Small files stored in the ICB (inode control block?): just return 0 > > > >>from the new udf_adinicb_direct_IO() handler to fall back to buffered > > > >>I/O. For direct writes, there is a "gotcha" to deal with when > > > >>generic_file_direct_write() in mm/filemap.c invalidates the pages. In > > > >>the udf_adinicb_writepage() handler, only part of the page data will be > > > >>valid and the rest will be zeroed out, so only copy the valid part into > > > >>the ICB. (This is actually a bit inefficient as udf_adinicb_write_end() > > > >>will have already copied the data into the ICB once, but it's pretty > > > >>likely that the file will grow to the point where its data can no longer > > > >>be stored in the ICB and will be moved to a different area of the file > > > >>system. At that point, a different direct_IO handler will be used - see > > > >>below.) > > > > Sorry, I didn't quite get this. What is the problem with copying all > > > > the > > > >data to inode in udf_adinicb_writepage() as it is now? > > > > > > Part of the good data in the ICB outside the range being addressed > > > would get overwritten by zeroes. This can be tested by creating a > > > UDF filesystem with 4KiB blocks and with small files stored in the > > > ICB, backed by a block device with 512 byte sectors. Create a 2KiB > > > file with random (or non-zero) data on the file system so that its > > > data gets stored in the ICB. Then open the file for writing without > > > truncation and with the O_DIRECT flag set, write 512 bytes at some > > > 512 byte offset within the 2KiB file and close it. If you then > > > hexdump the file, you'll find some of the old random data has been > > > zeroed out. > > But don't you fall back to buffered IO for files in ICB? So then no > > zeroing should happen? > Oh, I've tested things now and the bug is in buffered write as well! > It has nothing to do with direct IO. We cannot use simple_write_begin() for > UDF when the file is in ICB. I'll write a proper fix. Fix sent. Please resend your patch without that writepage() change which shouldn't be needed now. Thanks. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] UDF: Add support for O_DIRECT
On Wed 05-09-12 14:05:20, Jan Kara wrote: > On Tue 04-09-12 16:11:32, Ian Abbott wrote: > > >>1. Small files stored in the ICB (inode control block?): just return 0 > > >>from the new udf_adinicb_direct_IO() handler to fall back to buffered > > >>I/O. For direct writes, there is a "gotcha" to deal with when > > >>generic_file_direct_write() in mm/filemap.c invalidates the pages. In > > >>the udf_adinicb_writepage() handler, only part of the page data will be > > >>valid and the rest will be zeroed out, so only copy the valid part into > > >>the ICB. (This is actually a bit inefficient as udf_adinicb_write_end() > > >>will have already copied the data into the ICB once, but it's pretty > > >>likely that the file will grow to the point where its data can no longer > > >>be stored in the ICB and will be moved to a different area of the file > > >>system. At that point, a different direct_IO handler will be used - see > > >>below.) > > > Sorry, I didn't quite get this. What is the problem with copying all the > > >data to inode in udf_adinicb_writepage() as it is now? > > > > Part of the good data in the ICB outside the range being addressed > > would get overwritten by zeroes. This can be tested by creating a > > UDF filesystem with 4KiB blocks and with small files stored in the > > ICB, backed by a block device with 512 byte sectors. Create a 2KiB > > file with random (or non-zero) data on the file system so that its > > data gets stored in the ICB. Then open the file for writing without > > truncation and with the O_DIRECT flag set, write 512 bytes at some > > 512 byte offset within the 2KiB file and close it. If you then > > hexdump the file, you'll find some of the old random data has been > > zeroed out. > But don't you fall back to buffered IO for files in ICB? So then no > zeroing should happen? Oh, I've tested things now and the bug is in buffered write as well! It has nothing to do with direct IO. We cannot use simple_write_begin() for UDF when the file is in ICB. I'll write a proper fix. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] UDF: Add support for O_DIRECT
On Tue 04-09-12 16:11:32, Ian Abbott wrote: > On 2012-09-04 15:39, Jan Kara wrote: > > Hello, > > > > first, you have my address wrong (you had suze instead of suse) which is > >why I wasn't getting your email and not replying (missed the patch in LKML > >traffic). Second, it's good to CC also linux-fsdevel for UDF related > >matters (I tend to use that for UDF announcements etc. so people caring > >about UDF can watch there and don't have to read high-volume LKML). > > Oops, sorry about the misspelling. Also, I've noted the > linux-fsdevel for future (I was just following what it said in > MAINTAINERS). I see. Actually I thought linux-fsdevel is inherited from the default of fs/ directory. But now I tried get_maintainer.pl script and I can see that it's not. I'll update MAINTAINERS. > >On Tue 04-09-12 10:49:39, Ian Abbott wrote: > >>Add support for the O_DIRECT flag. There are two cases to deal with: > > Out of curiosity, do you have a use for this feature or is it mostly > >academic interest? > > I'm planning to use it for an embedded project that needs to stream > large files off a CompactFlash card, but the data doesn't need to be > in the buffer cache as its only read once, and the system has very > limited memory bandwidth so I can't afford the the extra copy. The > old version of this project only supported FAT, but that limited the > file size to about 4GiB. The filesystem needs to be something > reasonably Windows-friendly, at least for adding the files to the > CompactFlash card in the first place. OK, that sounds reasonably. > >>1. Small files stored in the ICB (inode control block?): just return 0 > >>from the new udf_adinicb_direct_IO() handler to fall back to buffered > >>I/O. For direct writes, there is a "gotcha" to deal with when > >>generic_file_direct_write() in mm/filemap.c invalidates the pages. In > >>the udf_adinicb_writepage() handler, only part of the page data will be > >>valid and the rest will be zeroed out, so only copy the valid part into > >>the ICB. (This is actually a bit inefficient as udf_adinicb_write_end() > >>will have already copied the data into the ICB once, but it's pretty > >>likely that the file will grow to the point where its data can no longer > >>be stored in the ICB and will be moved to a different area of the file > >>system. At that point, a different direct_IO handler will be used - see > >>below.) > > Sorry, I didn't quite get this. What is the problem with copying all the > >data to inode in udf_adinicb_writepage() as it is now? > > Part of the good data in the ICB outside the range being addressed > would get overwritten by zeroes. This can be tested by creating a > UDF filesystem with 4KiB blocks and with small files stored in the > ICB, backed by a block device with 512 byte sectors. Create a 2KiB > file with random (or non-zero) data on the file system so that its > data gets stored in the ICB. Then open the file for writing without > truncation and with the O_DIRECT flag set, write 512 bytes at some > 512 byte offset within the 2KiB file and close it. If you then > hexdump the file, you'll find some of the old random data has been > zeroed out. But don't you fall back to buffered IO for files in ICB? So then no zeroing should happen? Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] UDF: Add support for O_DIRECT
On 2012-09-04 16:11, Ian Abbott wrote: On 2012-09-04 15:39, Jan Kara wrote: On Tue 04-09-12 10:49:39, Ian Abbott wrote: Add support for the O_DIRECT flag. There are two cases to deal with: Out of curiosity, do you have a use for this feature or is it mostly academic interest? I'm planning to use it for an embedded project that needs to stream large files off a CompactFlash card, but the data doesn't need to be in the buffer cache as its only read once, and the system has very limited memory bandwidth so I can't afford the the extra copy. The old version of this project only supported FAT, but that limited the file size to about 4GiB. The filesystem needs to be something reasonably Windows-friendly, at least for adding the files to the CompactFlash card in the first place. Actually, remembering back (the old project was about 3 years ago), the main reason for using O_DIRECT was it was causing too much memory fragmentation on my MMU-less embedded system. That and the extra overhead of managing the buffer cache for data that was only read once. -- -=( Ian Abbott @ MEV Ltd.E-mail: )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] UDF: Add support for O_DIRECT
On 2012-09-04 15:39, Jan Kara wrote: Hello, first, you have my address wrong (you had suze instead of suse) which is why I wasn't getting your email and not replying (missed the patch in LKML traffic). Second, it's good to CC also linux-fsdevel for UDF related matters (I tend to use that for UDF announcements etc. so people caring about UDF can watch there and don't have to read high-volume LKML). Oops, sorry about the misspelling. Also, I've noted the linux-fsdevel for future (I was just following what it said in MAINTAINERS). On Tue 04-09-12 10:49:39, Ian Abbott wrote: Add support for the O_DIRECT flag. There are two cases to deal with: Out of curiosity, do you have a use for this feature or is it mostly academic interest? I'm planning to use it for an embedded project that needs to stream large files off a CompactFlash card, but the data doesn't need to be in the buffer cache as its only read once, and the system has very limited memory bandwidth so I can't afford the the extra copy. The old version of this project only supported FAT, but that limited the file size to about 4GiB. The filesystem needs to be something reasonably Windows-friendly, at least for adding the files to the CompactFlash card in the first place. 1. Small files stored in the ICB (inode control block?): just return 0 from the new udf_adinicb_direct_IO() handler to fall back to buffered I/O. For direct writes, there is a "gotcha" to deal with when generic_file_direct_write() in mm/filemap.c invalidates the pages. In the udf_adinicb_writepage() handler, only part of the page data will be valid and the rest will be zeroed out, so only copy the valid part into the ICB. (This is actually a bit inefficient as udf_adinicb_write_end() will have already copied the data into the ICB once, but it's pretty likely that the file will grow to the point where its data can no longer be stored in the ICB and will be moved to a different area of the file system. At that point, a different direct_IO handler will be used - see below.) Sorry, I didn't quite get this. What is the problem with copying all the data to inode in udf_adinicb_writepage() as it is now? Part of the good data in the ICB outside the range being addressed would get overwritten by zeroes. This can be tested by creating a UDF filesystem with 4KiB blocks and with small files stored in the ICB, backed by a block device with 512 byte sectors. Create a 2KiB file with random (or non-zero) data on the file system so that its data gets stored in the ICB. Then open the file for writing without truncation and with the O_DIRECT flag set, write 512 bytes at some 512 byte offset within the 2KiB file and close it. If you then hexdump the file, you'll find some of the old random data has been zeroed out. -- -=( Ian Abbott @ MEV Ltd.E-mail: )=- -=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] UDF: Add support for O_DIRECT
Hello, first, you have my address wrong (you had suze instead of suse) which is why I wasn't getting your email and not replying (missed the patch in LKML traffic). Second, it's good to CC also linux-fsdevel for UDF related matters (I tend to use that for UDF announcements etc. so people caring about UDF can watch there and don't have to read high-volume LKML). On Tue 04-09-12 10:49:39, Ian Abbott wrote: > Add support for the O_DIRECT flag. There are two cases to deal with: Out of curiosity, do you have a use for this feature or is it mostly academic interest? > 1. Small files stored in the ICB (inode control block?): just return 0 > from the new udf_adinicb_direct_IO() handler to fall back to buffered > I/O. For direct writes, there is a "gotcha" to deal with when > generic_file_direct_write() in mm/filemap.c invalidates the pages. In > the udf_adinicb_writepage() handler, only part of the page data will be > valid and the rest will be zeroed out, so only copy the valid part into > the ICB. (This is actually a bit inefficient as udf_adinicb_write_end() > will have already copied the data into the ICB once, but it's pretty > likely that the file will grow to the point where its data can no longer > be stored in the ICB and will be moved to a different area of the file > system. At that point, a different direct_IO handler will be used - see > below.) Sorry, I didn't quite get this. What is the problem with copying all the data to inode in udf_adinicb_writepage() as it is now? Honza > 2. Larger files, not stored in the ICB: nothing special here. Just call > blockdev_direct_IO() from our new udf_direct_IO() handler and tidy up > any blocks instantiated outside i_size on error. This is pretty > standard. Factor error handling code out of udf_write_begin() into new > function udf_write_failed() so it can also be called by udf_direct_IO(). > > Also change the whitespace in udf_aops and udf_adinicb_aops to make them > a bit neater. > > Signed-off-by: Ian Abbott > --- > v2: Rework error handling in udf_direct_IO to avoid calling deprecated > vmtruncate(). > v3: Rebased to next-20120904. > --- > fs/udf/file.c | 29 + > fs/udf/inode.c | 52 > 2 files changed, 61 insertions(+), 20 deletions(-) > > diff --git a/fs/udf/file.c b/fs/udf/file.c > index 7f3f7ba..f5f9ddd 100644 > --- a/fs/udf/file.c > +++ b/fs/udf/file.c > @@ -34,6 +34,7 @@ > #include > #include > #include > +#include > #include > > #include "udf_i.h" > @@ -64,12 +65,23 @@ static int udf_adinicb_writepage(struct page *page, > struct inode *inode = page->mapping->host; > char *kaddr; > struct udf_inode_info *iinfo = UDF_I(inode); > + loff_t start, end, page_start, page_offset; > > BUG_ON(!PageLocked(page)); > > kaddr = kmap(page); > - memcpy(iinfo->i_ext.i_data + iinfo->i_lenEAttr, kaddr, inode->i_size); > - mark_inode_dirty(inode); > + /* The beginning and/or end of the page data is likely to be invalid > + * for O_DIRECT, so only copy the valid part. */ > + page_start = (loff_t)page->index << PAGE_CACHE_SHIFT; > + start = max(page_start, wbc->range_start); > + end = min3(page_start + (loff_t)PAGE_CACHE_SIZE - 1, > +wbc->range_end, inode->i_size - 1); > + if (likely(start <= end)) { > + page_offset = start - page_start; > + memcpy(iinfo->i_ext.i_data + iinfo->i_lenEAttr + start, > +kaddr + page_offset, (end + 1 - start)); > + mark_inode_dirty(inode); > + } > SetPageUptodate(page); > kunmap(page); > unlock_page(page); > @@ -95,11 +107,20 @@ static int udf_adinicb_write_end(struct file *file, > return simple_write_end(file, mapping, pos, len, copied, page, fsdata); > } > > +static ssize_t udf_adinicb_direct_IO(int rw, struct kiocb *iocb, > + const struct iovec *iov, > + loff_t offset, unsigned long nr_segs) > +{ > + /* Fallback to buffered I/O. */ > + return 0; > +} > + > const struct address_space_operations udf_adinicb_aops = { > .readpage = udf_adinicb_readpage, > .writepage = udf_adinicb_writepage, > - .write_begin = simple_write_begin, > - .write_end = udf_adinicb_write_end, > + .write_begin= simple_write_begin, > + .write_end = udf_adinicb_write_end, > + .direct_IO = udf_adinicb_direct_IO, > }; > > static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec > *iov, > diff --git a/fs/udf/inode.c b/fs/udf/inode.c > index 1a0588e..b905448 100644 > --- a/fs/udf/inode.c > +++ b/fs/udf/inode.c > @@ -95,6 +95,22 @@ void udf_evict_inode(struct inode *inode) > } > } > > +static void udf_write_failed(struct address_space *mapping, loff_t to) > +{ > +
[PATCH v3] UDF: Add support for O_DIRECT
Add support for the O_DIRECT flag. There are two cases to deal with: 1. Small files stored in the ICB (inode control block?): just return 0 from the new udf_adinicb_direct_IO() handler to fall back to buffered I/O. For direct writes, there is a "gotcha" to deal with when generic_file_direct_write() in mm/filemap.c invalidates the pages. In the udf_adinicb_writepage() handler, only part of the page data will be valid and the rest will be zeroed out, so only copy the valid part into the ICB. (This is actually a bit inefficient as udf_adinicb_write_end() will have already copied the data into the ICB once, but it's pretty likely that the file will grow to the point where its data can no longer be stored in the ICB and will be moved to a different area of the file system. At that point, a different direct_IO handler will be used - see below.) 2. Larger files, not stored in the ICB: nothing special here. Just call blockdev_direct_IO() from our new udf_direct_IO() handler and tidy up any blocks instantiated outside i_size on error. This is pretty standard. Factor error handling code out of udf_write_begin() into new function udf_write_failed() so it can also be called by udf_direct_IO(). Also change the whitespace in udf_aops and udf_adinicb_aops to make them a bit neater. Signed-off-by: Ian Abbott --- v2: Rework error handling in udf_direct_IO to avoid calling deprecated vmtruncate(). v3: Rebased to next-20120904. --- fs/udf/file.c | 29 + fs/udf/inode.c | 52 2 files changed, 61 insertions(+), 20 deletions(-) diff --git a/fs/udf/file.c b/fs/udf/file.c index 7f3f7ba..f5f9ddd 100644 --- a/fs/udf/file.c +++ b/fs/udf/file.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include "udf_i.h" @@ -64,12 +65,23 @@ static int udf_adinicb_writepage(struct page *page, struct inode *inode = page->mapping->host; char *kaddr; struct udf_inode_info *iinfo = UDF_I(inode); + loff_t start, end, page_start, page_offset; BUG_ON(!PageLocked(page)); kaddr = kmap(page); - memcpy(iinfo->i_ext.i_data + iinfo->i_lenEAttr, kaddr, inode->i_size); - mark_inode_dirty(inode); + /* The beginning and/or end of the page data is likely to be invalid +* for O_DIRECT, so only copy the valid part. */ + page_start = (loff_t)page->index << PAGE_CACHE_SHIFT; + start = max(page_start, wbc->range_start); + end = min3(page_start + (loff_t)PAGE_CACHE_SIZE - 1, + wbc->range_end, inode->i_size - 1); + if (likely(start <= end)) { + page_offset = start - page_start; + memcpy(iinfo->i_ext.i_data + iinfo->i_lenEAttr + start, + kaddr + page_offset, (end + 1 - start)); + mark_inode_dirty(inode); + } SetPageUptodate(page); kunmap(page); unlock_page(page); @@ -95,11 +107,20 @@ static int udf_adinicb_write_end(struct file *file, return simple_write_end(file, mapping, pos, len, copied, page, fsdata); } +static ssize_t udf_adinicb_direct_IO(int rw, struct kiocb *iocb, +const struct iovec *iov, +loff_t offset, unsigned long nr_segs) +{ + /* Fallback to buffered I/O. */ + return 0; +} + const struct address_space_operations udf_adinicb_aops = { .readpage = udf_adinicb_readpage, .writepage = udf_adinicb_writepage, - .write_begin = simple_write_begin, - .write_end = udf_adinicb_write_end, + .write_begin= simple_write_begin, + .write_end = udf_adinicb_write_end, + .direct_IO = udf_adinicb_direct_IO, }; static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov, diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 1a0588e..b905448 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -95,6 +95,22 @@ void udf_evict_inode(struct inode *inode) } } +static void udf_write_failed(struct address_space *mapping, loff_t to) +{ + struct inode *inode = mapping->host; + struct udf_inode_info *iinfo = UDF_I(inode); + loff_t isize = inode->i_size; + + if (to > isize) { + truncate_pagecache(inode, to, isize); + if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB) { + down_write(&iinfo->i_data_sem); + udf_truncate_extents(inode); + up_write(&iinfo->i_data_sem); + } + } +} + static int udf_writepage(struct page *page, struct writeback_control *wbc) { return block_write_full_page(page, udf_get_block, wbc); @@ -124,21 +140,24 @@ static int udf_write_begin(struct file *file, struct address_space *mapping, int ret; ret = block_write_begin(mapping, pos, len, flags, pagep, udf_get_block); - if (unlikely(