RE: [PATCH] aio: fix kernel bug when page is temporally busy
Zach Brown wrote: > It will not return until kiocbSetKicked() is called, > and that is only called from kick_iocb(). There is no test or wait of Kicked in considered for (;;) aio_write() loop. Zach Brown wrote: >> The proposed patch does not crate this bug if any. > Right, and I said that in the mail you're quoting. ... > You're introducing other bugs with the patch. Could you list other introduced bugs? >> It is interesting that I've not seen any EIOCBQUEUED returned >> to aio_run_iocb() during 5 hours aiostress running. > What arguments are you running aio-stress with? -EIOCBQUEUED is only > used for O_DIRECT I wrote in the vary first mail that the panic is appearing in random write O_DIRECT aio-stress running. Other aio-stress modes where tested after patching as well. > and then only in certain circumstances. Looking closely into sources we can see that EIOCBQUEUED never may be returned to aio_run_iocb(). include/linux/aio.h says * If ki_retry returns -EIOCBRETRY ... Could you point source line which "returns -EIOCBRETRY"? Leonid -Original Message- From: Zach Brown [mailto:[EMAIL PROTECTED] Sent: Friday, February 16, 2007 3:01 AM To: Ananiev, Leonid I Cc: Ken Chen; [EMAIL PROTECTED]; Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Chris Mason Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On Feb 15, 2007, at 3:32 PM, Ananiev, Leonid I wrote: >>> If EIOCBRETRY then generic_file_aio_write() will be recalled for the >>> same iocb. >> Only if kick_iocb() is called. It won't be called if i_i_p2_r() was >> the only thing to return -EIOCBRETRY. > It is not need to call kick_iocb() > for generic_file_aio_write() calling. > It is recalled without any wakeup waiting: > for (;;) { > ret = filp->f_op->aio_write(&kiocb, &iov, 1, > kiocb.ki_pos); > if (ret != -EIOCBRETRY) > break; > wait_on_retry_sync_kiocb(&kiocb); > } > Note: wait_on_retry_sync_kiocb() does not wait. Yes it does. It will not return until kiocbSetKicked() is called, and that is only called from kick_iocb(). >>> It overwrites -EIOCBQUEUED > Do you mean that there is one more kernel bug which > overwrites -EIOCBQUEUED by any errno or number of bytes and this > new value is returned to caller as an IO result > while IO is not finished yet. > > The proposed patch does not crate this bug if any. Right, and I said that in the mail you're quoting. > It actually fixes a kernel panic bag when iocb.users count becomes > incorrect. The bag " Kernel BUG at fs/aio.c:509" is there because > aio_run_iocb() have not a chance to differ real EIO and > EIO which is actually means EAGAYN or EIOCBRETRY. Yes, I understand the bug you're trying to fix. You're introducing other bugs with the patch. It will not be merged. > It is interesting that I've not seen any EIOCBQUEUED returned > to aio_run_iocb() during 5 hours aiostress running. What arguments are you running aio-stress with? -EIOCBQUEUED is only used for O_DIRECT, and then only in certain circumstances. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Feb 15, 2007, at 3:32 PM, Ananiev, Leonid I wrote: If EIOCBRETRY then generic_file_aio_write() will be recalled for the same iocb. Only if kick_iocb() is called. It won't be called if i_i_p2_r() was the only thing to return -EIOCBRETRY. It is not need to call kick_iocb() for generic_file_aio_write() calling. It is recalled without any wakeup waiting: for (;;) { ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos); if (ret != -EIOCBRETRY) break; wait_on_retry_sync_kiocb(&kiocb); } Note: wait_on_retry_sync_kiocb() does not wait. Yes it does. It will not return until kiocbSetKicked() is called, and that is only called from kick_iocb(). It overwrites -EIOCBQUEUED Do you mean that there is one more kernel bug which overwrites -EIOCBQUEUED by any errno or number of bytes and this new value is returned to caller as an IO result while IO is not finished yet. The proposed patch does not crate this bug if any. Right, and I said that in the mail you're quoting. It actually fixes a kernel panic bag when iocb.users count becomes incorrect. The bag " Kernel BUG at fs/aio.c:509" is there because aio_run_iocb() have not a chance to differ real EIO and EIO which is actually means EAGAYN or EIOCBRETRY. Yes, I understand the bug you're trying to fix. You're introducing other bugs with the patch. It will not be merged. It is interesting that I've not seen any EIOCBQUEUED returned to aio_run_iocb() during 5 hours aiostress running. What arguments are you running aio-stress with? -EIOCBQUEUED is only used for O_DIRECT, and then only in certain circumstances. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
>> If EIOCBRETRY then generic_file_aio_write() will be recalled for the >> same iocb. > Only if kick_iocb() is called. It won't be called if i_i_p2_r() was > the only thing to return -EIOCBRETRY. It is not need to call kick_iocb() for generic_file_aio_write() calling. It is recalled without any wakeup waiting: for (;;) { ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos); if (ret != -EIOCBRETRY) break; wait_on_retry_sync_kiocb(&kiocb); } Note: wait_on_retry_sync_kiocb() does not wait. That is for dio. For aio iocb generic_file_aio_write() call is required from ki_run_list while next io_submit or read_events() is called. So when an IO hang may happen? >> It overwrites -EIOCBQUEUED Do you mean that there is one more kernel bug which overwrites -EIOCBQUEUED by any errno or number of bytes and this new value is returned to caller as an IO result while IO is not finished yet. The proposed patch does not crate this bug if any. It actually fixes a kernel panic bag when iocb.users count becomes incorrect. The bag " Kernel BUG at fs/aio.c:509" is there because aio_run_iocb() have not a chance to differ real EIO and EIO which is actually means EAGAYN or EIOCBRETRY. I'me sure the patch changes source code in correct direction: to differentiate that two kinds of EIOs. > Have you read the giant comment over the definition of struct kiocb > in include/linux/aio.h? I have read. But compiler has not: it did not create an object code for * If ki_retry returns -EIOCBRETRY ... >>> This can lead to reference count confusion. >> But just reference count confusion was deleted by patch. Isn't it? >Sorry, I don't understand what you're trying to ask here. One of reference count iocb.users confusion is deleted by the patch. I'm not sure that there is other bag. At least I have not see IO hang while testing. It is interesting that I've not seen any EIOCBQUEUED returned to aio_run_iocb() during 5 hours aiostress running. Does it mean that EIOCBQUEUED is always reset and never returned? Leonid -Original Message- From: Zach Brown [mailto:[EMAIL PROTECTED] Sent: Thursday, February 15, 2007 10:23 PM To: Ananiev, Leonid I Cc: Ken Chen; [EMAIL PROTECTED]; Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Chris Mason Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On Feb 15, 2007, at 11:11 AM, Ananiev, Leonid I wrote: >> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be >> called. This can lead to operations hanging > > If EIOCBRETRY then generic_file_aio_write() will be recalled for the > same iocb. Only if kick_iocb() is called. It won't be called if i_i_p2_r() was the only thing to return -EIOCBRETRY. > >> It overwrites -EIOCBQUEUED, leading to an aio_complete() while a >> retry is happening. > > EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call: Not by fs/aio.c, but *by the place that originated -EIOCBQUEUED*. Later. After IO has completed. see fs/direct-io.c:dio_bio_end_aio(). This is what -EIOCBQUEUED means! It's a promise to call aio_complete () in the future. Have you read the giant comment over the definition of struct kiocb in include/linux/aio.h? >> This can lead to reference count confusion. > But just reference count confusion was deleted by patch. Isn't it? Sorry, I don't understand what you're trying to ask here. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
-Original Message- From: Zach Brown [mailto:[EMAIL PROTECTED] Sent: Thursday, February 15, 2007 10:23 PM To: Ananiev, Leonid I Cc: Ken Chen; [EMAIL PROTECTED]; Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Chris Mason Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On Feb 15, 2007, at 11:11 AM, Ananiev, Leonid I wrote: >> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be >> called. This can lead to operations hanging > > If EIOCBRETRY then generic_file_aio_write() will be recalled for the > same iocb. Only if kick_iocb() is called. It won't be called if i_i_p2_r() was the only thing to return -EIOCBRETRY. > >> It overwrites -EIOCBQUEUED, leading to an aio_complete() while a >> retry is happening. > > EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call: Not by fs/aio.c, but *by the place that originated -EIOCBQUEUED*. Later. After IO has completed. see fs/direct-io.c:dio_bio_end_aio(). This is what -EIOCBQUEUED means! It's a promise to call aio_complete () in the future. Have you read the giant comment over the definition of struct kiocb in include/linux/aio.h? >> This can lead to reference count confusion. > But just reference count confusion was deleted by patch. Isn't it? Sorry, I don't understand what you're trying to ask here. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Feb 15, 2007, at 11:11 AM, Ananiev, Leonid I wrote: It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be called. This can lead to operations hanging If EIOCBRETRY then generic_file_aio_write() will be recalled for the same iocb. Only if kick_iocb() is called. It won't be called if i_i_p2_r() was the only thing to return -EIOCBRETRY. It overwrites -EIOCBQUEUED, leading to an aio_complete() while a retry is happening. EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call: Not by fs/aio.c, but *by the place that originated -EIOCBQUEUED*. Later. After IO has completed. see fs/direct-io.c:dio_bio_end_aio(). This is what -EIOCBQUEUED means! It's a promise to call aio_complete () in the future. Have you read the giant comment over the definition of struct kiocb in include/linux/aio.h? This can lead to reference count confusion. But just reference count confusion was deleted by patch. Isn't it? Sorry, I don't understand what you're trying to ask here. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be > called. This can lead to operations hanging If EIOCBRETRY then generic_file_aio_write() will be recalled for the same iocb. > It overwrites -EIOCBQUEUED, leading to an aio_complete() while a > retry is happening. EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call: if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) { aio_complete(iocb, ret, 0); > This can lead to reference count confusion. But just reference count confusion was deleted by patch. Isn't it? Leonid -Original Message- From: Zach Brown [mailto:[EMAIL PROTECTED] Sent: Thursday, February 15, 2007 9:25 PM To: Ananiev, Leonid I Cc: Ken Chen; [EMAIL PROTECTED]; Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Chris Mason Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy > If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch > "aio: fix kernel bug when page is temporally busy" Sorry Leonid, this patch is not safe. It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be called. This can lead to operations hanging, both AIO and calls that come through do_sync_{read,write}. It overwrites -EIOCBQUEUED, leading to an aio_complete() while a retry is happening. This can lead to reference count confusion. Double-frees, referencing freed memory, that kind of thing. This isn't a new problem. The current code that overwrites with -EIO has this problem. But moving to -EIOCBRETRY does introduce new behaviour of aio_complete() and the retry path racing. I'll have a candidate patch to address the problem of EIO being raised on the way back up from a path which has returned -EIOCBQUEUED. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch "aio: fix kernel bug when page is temporally busy" Sorry Leonid, this patch is not safe. It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be called. This can lead to operations hanging, both AIO and calls that come through do_sync_{read,write}. It overwrites -EIOCBQUEUED, leading to an aio_complete() while a retry is happening. This can lead to reference count confusion. Double-frees, referencing freed memory, that kind of thing. This isn't a new problem. The current code that overwrites with -EIO has this problem. But moving to -EIOCBRETRY does introduce new behaviour of aio_complete() and the retry path racing. I'll have a candidate patch to address the problem of EIO being raised on the way back up from a path which has returned -EIOCBQUEUED. - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
Ken Chen wrote > It might shut up kernel > panic by eliminate double calls to aio_complete(), but it will > silently introduce data corruption. I had got kernel panic after an hour of aiostress running. After patching I have not got aiostress massage "verify error, file %s offset %Lu contents (offset:bad:good):\n" during 5 hour aiostress running with 'verify' option. Looking closely into aiostress.c ftp://ftp.suse.com/pub/people/mason/utils/aio-stress.c we can see that this program may write in random mode THE SAME contents to the same file offset asynchronously from different buffers and do not cure about it. Does Ken consider that kernel panic is the best way to prevent data corruption in such kind of programs? > So any error value returned from invalidate_inode_pages2_range() has > to be taken seriously in the direct IO submit path instead of dropping > it to the floor. If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch "aio: fix kernel bug when page is temporally busy" sets then do_sync_read/write() will not drop IO submit but will retry it: for (;;) { ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos); if (ret != -EIOCBRETRY) break; wait_on_retry_sync_kiocb(&kiocb); } And do_sync_read/write() will not return EIO if page is busy as it does now, before patching. Ken Chen wrote: > I also think the original patch is wrong. What do you mean saying 'also'? Leonid - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
> Please, tell us what problem this is fixing so that we can look into > alternative solutions. This patch fixes a kernel panic "Kernel BUG at fs/aio.c:509" http://marc.theaimsgroup.com/?l=linux-kernel&m=117031052517746&w=2 First of all it was found that the kernel panic happens after IO error reporting. But later it was found that the actual reason is not in real IO error but in a busy page. EIO is returned if page has unstable state while IO completion result is processing. If aio_run_iocb() as well as in do_sync_read/write() get EIO result it is considered as IO is finished and IO control block could be free; result is reported to caller. I suppose that aio_run_iocb() as well as in do_sync_read/write() should have a chance to differ real EIO and EIO which is actually means EAGAYN. That functions test for EIOCBRETRY meaning EAGAYN purpose. do_sync_read/write() retries IO as well as aio: for (;;) { ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos); if (ret != -EIOCBRETRY) break; wait_on_retry_sync_kiocb(&kiocb); } By the way EIOCBRETRY is never actually set in the kernel. So what is the purpose of previous lines? More details about EIO returning: aio_complete: res=-5 iocb=81002b38f080 ki_users=2 [] invalidate_inode_pages2_range+0x236/0x26b [] ext3_direct_IO+0x16c/0x19e [] ext3_get_block+0x0/0xe2 [] generic_file_direct_IO+0xb9/0xcf [] generic_file_direct_write+0x67/0x10e [] __generic_file_aio_write_nolock+0x2d6/0x3fe [] __switch_to+0x26f/0x27e [] thread_return+0x0/0xd8 [] generic_file_aio_write+0x67/0xc7 [] ext3_file_write+0x0/0x8f [] ext3_file_write+0x16/0x8f [] ext3_file_write+0x0/0x8f [] aio_rw_vect_retry+0x72/0x14f [] aio_run_iocb+0xe6/0x187 [] io_submit_one+0x296/0x2e3 [] sys_io_submit+0x9b/0x108 [] default_wake_function+0x0/0xe [] system_call+0x7e/0x83 Call Trace: [] aio_complete+0x5c/0x18c [] finished_one_bio+0xac/0xf3 [] dio_bio_complete+0x95/0xaa [] dio_bio_end_aio+0x20/0x25 [] __end_that_request_first+0x10e/0x3fd [] scsi_end_request+0x27/0xc9 [] scsi_io_completion+0xf0/0x2c5 [] ahd_done+0x537/0x580 [] sd_rw_intr+0x182/0x1ad [] blk_done_softirq+0x5c/0x6a [] __do_softirq+0x55/0xc3 [] ack_apic_level+0x37/0x4b [] call_softirq+0x1c/0x28 [] do_softirq+0x2c/0x7d [] do_IRQ+0xa8/0xc8 [] mwait_idle+0x0/0x20 [] ret_from_intr+0x0/0xa [] mwait_idle_with_hints+0x44/0x45 [] mwait_idle+0xc/0x20 [] cpu_idle+0x8b/0xb0 [] start_secondary+0x462/0x471 -Original Message- From: Andrew Morton [mailto:[EMAIL PROTECTED] Sent: Thursday, February 15, 2007 6:31 AM To: Ananiev, Leonid I Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On Wed, 14 Feb 2007 20:51:33 +0300 "Ananiev, Leonid I" <[EMAIL PROTECTED]> wrote: > Fix kernel bug when IO page is temporally busy: > invalidate_inode_pages2_range() returns EIOCBRETRY but not EIO. > invalidate_inode_pages2() returns EIO as earlier. > > Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> > --- > --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 > +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.0 -0800 > @@ -366,7 +366,7 @@ static int do_launder_page(struct addres > * Any pages which are found to be mapped into pagetables are unmapped prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2_range(struct address_space *mapping, > pgoff_t start, pgoff_t end) > @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct > } > ret = do_launder_page(mapping, page); > if (ret == 0 && !invalidate_complete_page2(mapping, page)) > - ret = -EIO; > + ret = -EIOCBRETRY; > unlock_page(page); > } > pagevec_release(&pvec); > @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages > */ > int invalidate_inode_pages2(struct address_space *mapping) > { > - return invalidate_inode_pages2_range(mapping, 0, -1); > + int ret = invalidate_inode_pages2_range(mapping, 0, -1); > + return (ret < 0)?-EIO:ret; > } > EXPORT_SYMBOL_GPL(invalidate_inode_pages2); If someone later uses invalidate_inode_pages2_range() elsewhere, they're going to need to know to convert -EIOCBRETRY into -EIO, if they weren't called by aio. Or something. Please, tell us what problem this is fixing so that we can look into alternative solutions. For example, one acceptable-but-ugly solution might be to do: static inline int invalidate_inode_page
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Wed, 14 Feb 2007 20:51:33 +0300 "Ananiev, Leonid I" <[EMAIL PROTECTED]> wrote: > Fix kernel bug when IO page is temporally busy: > invalidate_inode_pages2_range() returns EIOCBRETRY but not EIO. > invalidate_inode_pages2() returns EIO as earlier. > > Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> > --- > --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 > +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.0 -0800 > @@ -366,7 +366,7 @@ static int do_launder_page(struct addres > * Any pages which are found to be mapped into pagetables are unmapped prior > to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2_range(struct address_space *mapping, > pgoff_t start, pgoff_t end) > @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct > } > ret = do_launder_page(mapping, page); > if (ret == 0 && !invalidate_complete_page2(mapping, > page)) > - ret = -EIO; > + ret = -EIOCBRETRY; > unlock_page(page); > } > pagevec_release(&pvec); > @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages > */ > int invalidate_inode_pages2(struct address_space *mapping) > { > - return invalidate_inode_pages2_range(mapping, 0, -1); > + int ret = invalidate_inode_pages2_range(mapping, 0, -1); > + return (ret < 0)?-EIO:ret; > } > EXPORT_SYMBOL_GPL(invalidate_inode_pages2); If someone later uses invalidate_inode_pages2_range() elsewhere, they're going to need to know to convert -EIOCBRETRY into -EIO, if they weren't called by aio. Or something. Please, tell us what problem this is fixing so that we can look into alternative solutions. For example, one acceptable-but-ugly solution might be to do: static inline int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) { return __invalidate_inode_pages2_range(mapping, start, end, 0); } static inline int invalidate_inode_pages2_range_for_aio(struct address_space *mapping, pgoff_t start, pgoff_t end) { return __invalidate_inode_pages2_range(mapping, start, end, 1); } and to then use invalidate_inode_pages2_range_for_aio() from the appropriate callsite. But without a complete description of the bug which this is fixing, it's hard to say how practical such an approach would be. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
Andrew, You wrote on Friday, February 09, 2007 8:53 AM > invalidate_inode_pages2() has other callers. I suspect with this change > we'll end up leaking EIOCBRETRY back to userspace. The path is modified so that invalidate_inode_pages2() returns EIO as earlier. could you consider modified patch The patch against 2.6.20. Long story: The kernel panic is happening after hours of AIO benchmark running in mcp. First of all it was found that the kernel panic happens if IO error is reported. But later it was found that the actual reason is not in real IO error but in a busy page. While the current CPU tests if IO is completed it happens that another CPU at the same time processes IO completion in soft_irq. The considered buffer page is busy now by second CPU and invalidate_inode_pages2_range() returns EIO in this case. First CPU reports EIO to caller ; completes IO and frees control block in aio_complete(). Second CPU frees the same control block once more. The patch makes invalidate_inode_pages2_range() to return EIOCBRETRY which is tested just in aio_run_iocb(). It retries IO competition check if EIOCBRETRY is got. EIOCBRETRY is tested in do_sync_read/write() functions as well. And direct IO competition will be retested "instead of dropping it to the floor". >From Leonid Ananiev Fix kernel bug when IO page is temporally busy: invalidate_inode_pages2_range() returns EIOCBRETRY but not EIO. invalidate_inode_pages2() returns EIO as earlier. Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> --- --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.0 -0800 @@ -366,7 +366,7 @@ static int do_launder_page(struct addres * Any pages which are found to be mapped into pagetables are unmapped prior to * invalidation. * - * Returns -EIO if any pages could not be invalidated. + * Returns -EIOCBRETRY if any pages could not be invalidated. */ int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct } ret = do_launder_page(mapping, page); if (ret == 0 && !invalidate_complete_page2(mapping, page)) - ret = -EIO; + ret = -EIOCBRETRY; unlock_page(page); } pagevec_release(&pvec); @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages */ int invalidate_inode_pages2(struct address_space *mapping) { - return invalidate_inode_pages2_range(mapping, 0, -1); + int ret = invalidate_inode_pages2_range(mapping, 0, -1); + return (ret < 0)?-EIO:ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
Andrew, You wrote on Friday, February 09, 2007 8:53 AM > invalidate_inode_pages2() has other callers. I suspect with this change > we'll end up leaking EIOCBRETRY back to userspace. The path is modified so that invalidate_inode_pages2() returns EIO as earlier. could you consider modified patch The patch against 2.6.20. Long story: The kernel panic is happening after hours of AIO benchmark running in mcp. First of all it was found that the kernel panic happens if IO error is reported. But later it was found that the actual reason is not in real IO error but in a busy page. While the current CPU tests if IO is completed it happens that another CPU at the same time processes IO completion in soft_irq. The considered buffer page is busy now by second CPU and invalidate_inode_pages2_range() returns EIO in this case. First CPU reports EIO to caller ; completes IO and frees control block in aio_complete(). Second CPU frees the same control block once more. The patch makes invalidate_inode_pages2_range() to return EIOCBRETRY which is tested just in aio_run_iocb(). It retries IO competition check if EIOCBRETRY is got. EIOCBRETRY is tested in do_sync_read/write() functions as well. And direct IO competition will be retested "instead of dropping it to the floor". >From Leonid Ananiev Fix kernel bug when IO page is temporally busy: invalidate_inode_pages2_range() returns EIOCBRETRY but not EIO. invalidate_inode_pages2() returns EIO as earlier. Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> --- --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.0 -0800 @@ -366,7 +366,7 @@ static int do_launder_page(struct addres * Any pages which are found to be mapped into pagetables are unmapped prior to * invalidation. * - * Returns -EIO if any pages could not be invalidated. + * Returns -EIOCBRETRY if any pages could not be invalidated. */ int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct } ret = do_launder_page(mapping, page); if (ret == 0 && !invalidate_complete_page2(mapping, page)) - ret = -EIO; + ret = -EIOCBRETRY; unlock_page(page); } pagevec_release(&pvec); @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages */ int invalidate_inode_pages2(struct address_space *mapping) { - return invalidate_inode_pages2_range(mapping, 0, -1); + int ret = invalidate_inode_pages2_range(mapping, 0, -1); + return (ret < 0)?-EIO:ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
Ken Chen writes: > So any error value returned from invalidate_inode_pages2_range() has > to be taken seriously in the direct IO submit path instead of dropping > it to the floor. If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch "aio: fix kernel bug when page is temporally busy" serts then do_sync_read/write() will not drop IO submit but will retry it: for (;;) { ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos); if (ret != -EIOCBRETRY) break; wait_on_retry_sync_kiocb(&kiocb); } And do_sync_read/write() will not return EIO if page is busy as it does now, befor patching. Curretly do_sync_read/write() tests for EIOCBRETRY. But EIOCBRETRY is not set ever. Leonid - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
> If invalidate_inode_pages2_range() says it can not invalidate pages, > while dio to the same file offset range is in flight, something is > really wrong there. If invalidate_inode_pages2_range() says it can not invalidate pages It means that soft_irq does completing IO now on other cpu. Next retry() call in aio_run_iocb() will see the IO well completed. The patch is updated: invalidate_inode_pages2() returns EIO as earlier for nfs and other. But invalidate_inode_pages2_range() returns EIOCBRETRY for aio and dio. The patch against 2.6.20. >From Leonid Ananiev Fix kernel bug when IO page is temporally busy: invalidate_inode_pages2_range() returns EIOCBRETRY but not EIO. Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> --- --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.0 -0800 @@ -366,7 +366,7 @@ static int do_launder_page(struct addres * Any pages which are found to be mapped into pagetables are unmapped prior to * invalidation. * - * Returns -EIO if any pages could not be invalidated. + * Returns -EIOCBRETRY if any pages could not be invalidated. */ int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct } ret = do_launder_page(mapping, page); if (ret == 0 && !invalidate_complete_page2(mapping, page)) - ret = -EIO; + ret = -EIOCBRETRY; unlock_page(page); } pagevec_release(&pvec); @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages */ int invalidate_inode_pages2(struct address_space *mapping) { - return invalidate_inode_pages2_range(mapping, 0, -1); + int ret = invalidate_inode_pages2_range(mapping, 0, -1); + return (ret < 0)?-EIO:ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
> If invalidate_inode_pages2_range() says it can not invalidate pages, > while dio to the same file offset range is in flight, something is > really wrong there. If invalidate_inode_pages2_range() says it can not invalidate pages It means that soft_irq does completing now on other cpu. Next retry() will see a IO well completed. Leonid -Original Message- From: Ken Chen [mailto:[EMAIL PROTECTED] Sent: Saturday, February 10, 2007 9:05 PM To: Ananiev, Leonid I Cc: [EMAIL PROTECTED]; Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Zach Brown; Chris Mason; Badari Pulavarty Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On 2/9/07, Ananiev, Leonid I <[EMAIL PROTECTED]> wrote: > I have used EIOCBRETRY in the patch to minimize source code modification > only. > [...] > A lot of errno's have different meaning in different functions or > contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant > set. I also think the original patch is wrong. It might shut up kernel panic by eliminate double calls to aio_complete(), but it will silently introduce data corruption. If invalidate_inode_pages2_range() says it can not invalidate pages, while dio to the same file offset range is in flight, something is really wrong there. In generic_file_direct_IO, the function explicitly flushes all dirty pages and wait on them before submits DIO. So any error value returned from invalidate_inode_pages2_range() has to be taken seriously in the direct IO submit path instead of dropping it to the floor. - Ken - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
But if page is busy as invalidate_inode_pages2_range() says because of bh_count>0 then aio_complet() is not called from aio_run_iocb() and next retry() will get the iocb result. Leonid -Original Message- From: Ken Chen [mailto:[EMAIL PROTECTED] Sent: Saturday, February 10, 2007 9:05 PM To: Ananiev, Leonid I Cc: [EMAIL PROTECTED]; Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Zach Brown; Chris Mason; Badari Pulavarty Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On 2/9/07, Ananiev, Leonid I <[EMAIL PROTECTED]> wrote: > I have used EIOCBRETRY in the patch to minimize source code modification > only. > [...] > A lot of errno's have different meaning in different functions or > contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant > set. I also think the original patch is wrong. It might shut up kernel panic by eliminate double calls to aio_complete(), but it will silently introduce data corruption. If invalidate_inode_pages2_range() says it can not invalidate pages, while dio to the same file offset range is in flight, something is really wrong there. In generic_file_direct_IO, the function explicitly flushes all dirty pages and wait on them before submits DIO. So any error value returned from invalidate_inode_pages2_range() has to be taken seriously in the direct IO submit path instead of dropping it to the floor. - Ken - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On 2/9/07, Ananiev, Leonid I <[EMAIL PROTECTED]> wrote: I have used EIOCBRETRY in the patch to minimize source code modification only. [...] A lot of errno's have different meaning in different functions or contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant set. I also think the original patch is wrong. It might shut up kernel panic by eliminate double calls to aio_complete(), but it will silently introduce data corruption. If invalidate_inode_pages2_range() says it can not invalidate pages, while dio to the same file offset range is in flight, something is really wrong there. In generic_file_direct_IO, the function explicitly flushes all dirty pages and wait on them before submits DIO. So any error value returned from invalidate_inode_pages2_range() has to be taken seriously in the direct IO submit path instead of dropping it to the floor. - Ken - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Feb 9, 2007, at 6:05 AM, Suparna Bhattacharya wrote: On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote: On Fri, 9 Feb 2007, Andrew Morton wrote: @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, do_generic_file_read(filp,ppos,&desc,file_read_actor); retval += desc.written; if (desc.error) { - retval = retval ?: desc.error; + retval = desc.error; I was worried about this too. blocking. The high level AIO code (see aio_rw_vect_rety) has the ability to handle this. I had missed this, and yeah, that's some level of comfort. But I'm not convinced we can guarantee that's safe. The positive return code that aio_rw_vect_retry() sees is telling it that some IO has completed and, arguably, that no more IO is in flight. If we return partial progress from generic_file_aio_read() while we have an iocb in a wait queue then we are adding yet another invariant. That while an iocb is pending from a previous call down the call chain, we can't return a non-aio negative error. Doing so would cause fs/aio.c to complete while there is still an iocb on a wait queue from a previous retry attempt. Right? I also noticed something just now while poking around these paths to see if I could get the start of generic_file_aio_read() to fail when it had previously succeeded. What's to stop another task from racing to set O_DIRECT between retries? That sounds like a pretty hilarious way to get a read retry to fail due to buffer misalignment while a previously buffered instance of it is still in flight. Hi-larious. In thinking about this a discussing it with Chris a bit, I wonder if the right fix isn't to refuse changing O_DIRECT via setfl() once any IO paths have started on the filp. Something like: filp->frozen_flags |= O_DIRECT at the start of paths and check it in setfl()? Are we similarly worried about O_APPEND? - z - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
>>noone seems to be setting the error values to EIOCBRETRY > We'd want retries to act only on the remaining bytes which haven't been > transferred as yet, so even in the EIOCBRETRY... OK. But any variable is not set to EIOCBRETRY in 2.6.14, 2.6.19, 2.6.20. Is it correct? Leonid -Original Message- From: Suparna Bhattacharya [mailto:[EMAIL PROTECTED] Sent: Friday, February 09, 2007 2:06 PM To: Jiri Kosina Cc: Andrew Morton; Ananiev, Leonid I; linux-kernel@vger.kernel.org; linux-aio; Zach Brown; Chris Mason; Badari Pulavarty; Jan Kara Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote: > On Fri, 9 Feb 2007, Andrew Morton wrote: > > > > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, > > > do_generic_file_read(filp,ppos,&desc,file_read_actor); > > > retval += desc.written; > > > if (desc.error) { > > > - retval = retval ?: desc.error; > > > + retval = desc.error; > > > break; > > > } > > Nope. On error the read() syscall must return the number of bytes which > > were successfully read. > > You are right. > > In current mainline this even is not a problem, because noone seems to be > setting the error values to EIOCBRETRY. But it still stinks a bit, as > there are tests for EIOCBRETRY. We'd want retries to act only on the remaining bytes which haven't been transferred as yet, so even in the EIOCBRETRY case the right thing to do is to return the number of bytes that were successfully read without blocking. The high level AIO code (see aio_rw_vect_rety) has the ability to handle this. So this is still correct. Is there a real bug that you are seeing here ? Regards Suparna > > -- > Jiri Kosina > > -- > To unsubscribe, send a message with 'unsubscribe linux-aio' in > the body to [EMAIL PROTECTED] For more info on Linux AIO, > see: http://www.kvack.org/aio/ > Don't email: mailto:"[EMAIL PROTECTED]">[EMAIL PROTECTED] -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote: > On Fri, 9 Feb 2007, Andrew Morton wrote: > > > > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const > > > struct iovec *iov, > > > do_generic_file_read(filp,ppos,&desc,file_read_actor); > > > retval += desc.written; > > > if (desc.error) { > > > - retval = retval ?: desc.error; > > > + retval = desc.error; > > > break; > > > } > > Nope. On error the read() syscall must return the number of bytes which > > were successfully read. > > You are right. > > In current mainline this even is not a problem, because noone seems to be > setting the error values to EIOCBRETRY. But it still stinks a bit, as > there are tests for EIOCBRETRY. We'd want retries to act only on the remaining bytes which haven't been transferred as yet, so even in the EIOCBRETRY case the right thing to do is to return the number of bytes that were successfully read without blocking. The high level AIO code (see aio_rw_vect_rety) has the ability to handle this. So this is still correct. Is there a real bug that you are seeing here ? Regards Suparna > > -- > Jiri Kosina > > -- > To unsubscribe, send a message with 'unsubscribe linux-aio' in > the body to [EMAIL PROTECTED] For more info on Linux AIO, > see: http://www.kvack.org/aio/ > Don't email: mailto:"[EMAIL PROTECTED]">[EMAIL PROTECTED] -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, 9 Feb 2007, Andrew Morton wrote: > > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const > > struct iovec *iov, > > do_generic_file_read(filp,ppos,&desc,file_read_actor); > > retval += desc.written; > > if (desc.error) { > > - retval = retval ?: desc.error; > > + retval = desc.error; > > break; > > } > Nope. On error the read() syscall must return the number of bytes which > were successfully read. You are right. In current mainline this even is not a problem, because noone seems to be setting the error values to EIOCBRETRY. But it still stinks a bit, as there are tests for EIOCBRETRY. -- Jiri Kosina - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, 9 Feb 2007 10:54:36 +0100 (CET) Jiri Kosina <[EMAIL PROTECTED]> wrote: > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct > iovec *iov, > do_generic_file_read(filp,ppos,&desc,file_read_actor); > retval += desc.written; > if (desc.error) { > - retval = retval ?: desc.error; > + retval = desc.error; > break; > } Nope. On error the read() syscall must return the number of bytes which were successfully read. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, 9 Feb 2007, Ananiev, Leonid I wrote: > I have used EIOCBRETRY in the patch to minimize source code modification > only. It is notable that EIOCBRETRY is never set in kernel, but tested > only. There is indeed something strange in aio in 2.6.20 (and maybe older) - for example do_generic_mapping_read() gets stuck inside lock_page() in the situation when the page is not up-to-date, instead of returning properl error to aio_run_cb(). Or does it work in some other way and I get it wrong? -- Jiri Kosina - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, 9 Feb 2007, Ananiev, Leonid I wrote: > Fix "Kernel BUG at fs/aio.c:509". Return EIOCBRETRY but not EIO if page > is busy. I am currently also hunting in this area, and I think there is something strange in generic_file_aio_read(). This seems strange: for (seg = 0; seg < nr_segs; seg++) { read_descriptor_t desc; desc.written = 0; desc.arg.buf = iov[seg].iov_base; desc.count = iov[seg].iov_len; if (desc.count == 0) continue; desc.error = 0; do_generic_file_read(filp,ppos,&desc,file_read_actor); retval += desc.written; if (desc.error) { retval = retval ?: desc.error; break; } } So this code propagates desc.error back to caller _only if_ all of the previous segments had desc.written == 0. Which is bogus - we need to propagate -EIOCBRETRY as soon as any of the do_generic_file_read() returned it, so that the caller is aware of requests being queued. I think something like this would be appropriate [PATCH] AIO: handle return value from do_generic_file_read() properly generic_file_aio_read() doesn't always propagate error return value from do_generic_file_read(). Namely handling of -EIOCBRETRY is important, because aio_run_iocb() relies on this return value being always properly propagated to it. Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> --- mm/filemap.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 8332c77..5ce76ce 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, do_generic_file_read(filp,ppos,&desc,file_read_actor); retval += desc.written; if (desc.error) { - retval = retval ?: desc.error; + retval = desc.error; break; } } -- Jiri Kosina - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
I have used EIOCBRETRY in the patch to minimize source code modification only. It is notable that EIOCBRETRY is never set in kernel, but tested only. EAGAIN - means that you may want to recall the same function with the same argument. But user have not to recall it just now. He may want to free or process some other his resources before recall for success. EIOCBRETRY - means that you may not recall the same function with the same argument because an argument has no sense already in cuurnt context. The result is in processing already. You may want to recall much higher level or external function to get the common result. It is means that inernal caller should ignore or let through errno to its caller but do not recall the function which has returned this errno. A lot of errno's have different meaning in different functions or contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant set. Leonid -Original Message- From: Suparna Bhattacharya [mailto:[EMAIL PROTECTED] Sent: Friday, February 09, 2007 10:16 AM To: Ananiev, Leonid I Cc: Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Zach Brown; Chris Mason; Badari Pulavarty Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy On Fri, Feb 09, 2007 at 08:41:41AM +0300, Ananiev, Leonid I wrote: > > invalidate_inode_pages2() has other callers. I suspect with this > change > > we'll end up leaking EIOCBRETRY back to userspace. > > EIOCBRETRY is used and caught already in do_sync_read() and > do_sync_readv_writev(). > > Below fixed patch against kernel 2.6.20. I agree with Andrew, this doesn't look like the right solution. EIOCBRETRY isn't the same as EAGAIN or "try again later". EIOCBRETRY means "I have queued up an action to notify (wakeup) the callback to retry the operation once data is ready". Regards Suparna > > >From Leonid Ananiev > > Fix kernel bug when IO page is temporally busy: > invalidate_inode_pages2() returns EIOCBRETRY but not EIO.. > > Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> > --- > --- linux-2.6.20/mm/truncate.c2007-02-04 10:44:54.0 -0800 > +++ linux-2.6.20p/mm/truncate.c 2007-02-08 11:38:11.0 -0800 > @@ -366,7 +366,7 @@ static int do_launder_page(struct addres > * Any pages which are found to be mapped into pagetables are unmapped > prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2_range(struct address_space *mapping, > pgoff_t start, pgoff_t end) > @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct > } > ret = do_launder_page(mapping, page); > if (ret == 0 && > !invalidate_complete_page2(mapping, page)) > - ret = -EIO; > + ret = -EIOCBRETRY; > unlock_page(page); > } > pagevec_release(&pvec); > @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages > * Any pages which are found to be mapped into pagetables are unmapped > prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2(struct address_space *mapping) > { > > -- > To unsubscribe, send a message with 'unsubscribe linux-aio' in > the body to [EMAIL PROTECTED] For more info on Linux AIO, > see: http://www.kvack.org/aio/ > Don't email: mailto:"[EMAIL PROTECTED]">[EMAIL PROTECTED] -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, Feb 09, 2007 at 08:41:41AM +0300, Ananiev, Leonid I wrote: > > invalidate_inode_pages2() has other callers. I suspect with this > change > > we'll end up leaking EIOCBRETRY back to userspace. > > EIOCBRETRY is used and caught already in do_sync_read() and > do_sync_readv_writev(). > > Below fixed patch against kernel 2.6.20. I agree with Andrew, this doesn't look like the right solution. EIOCBRETRY isn't the same as EAGAIN or "try again later". EIOCBRETRY means "I have queued up an action to notify (wakeup) the callback to retry the operation once data is ready". Regards Suparna > > >From Leonid Ananiev > > Fix kernel bug when IO page is temporally busy: > invalidate_inode_pages2() returns EIOCBRETRY but not EIO.. > > Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> > --- > --- linux-2.6.20/mm/truncate.c2007-02-04 10:44:54.0 -0800 > +++ linux-2.6.20p/mm/truncate.c 2007-02-08 11:38:11.0 -0800 > @@ -366,7 +366,7 @@ static int do_launder_page(struct addres > * Any pages which are found to be mapped into pagetables are unmapped > prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2_range(struct address_space *mapping, > pgoff_t start, pgoff_t end) > @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct > } > ret = do_launder_page(mapping, page); > if (ret == 0 && > !invalidate_complete_page2(mapping, page)) > - ret = -EIO; > + ret = -EIOCBRETRY; > unlock_page(page); > } > pagevec_release(&pvec); > @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages > * Any pages which are found to be mapped into pagetables are unmapped > prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2(struct address_space *mapping) > { > > -- > To unsubscribe, send a message with 'unsubscribe linux-aio' in > the body to [EMAIL PROTECTED] For more info on Linux AIO, > see: http://www.kvack.org/aio/ > Don't email: mailto:"[EMAIL PROTECTED]">[EMAIL PROTECTED] -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, 9 Feb 2007 08:41:41 +0300 "Ananiev, Leonid I" <[EMAIL PROTECTED]> wrote: > > > invalidate_inode_pages2() has other callers. I suspect with this > change > > we'll end up leaking EIOCBRETRY back to userspace. > > EIOCBRETRY is used and caught already in do_sync_read() and > do_sync_readv_writev(). To pick one example: nfs_follow_link ->nfs_revalidate_mapping_nolock ->nfs_invalidate_mapping_nolock ->invalidate_inode_pages2 so that, I assume, affects open(), unlink(), etc. > Below fixed patch against kernel 2.6.20. The tab->spaces issue is fixed, but it's still all wordwrapped. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] aio: fix kernel bug when page is temporally busy
> invalidate_inode_pages2() has other callers. I suspect with this change > we'll end up leaking EIOCBRETRY back to userspace. EIOCBRETRY is used and caught already in do_sync_read() and do_sync_readv_writev(). Below fixed patch against kernel 2.6.20. >From Leonid Ananiev Fix kernel bug when IO page is temporally busy: invalidate_inode_pages2() returns EIOCBRETRY but not EIO.. Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> --- --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 +++ linux-2.6.20p/mm/truncate.c 2007-02-08 11:38:11.0 -0800 @@ -366,7 +366,7 @@ static int do_launder_page(struct addres * Any pages which are found to be mapped into pagetables are unmapped prior to * invalidation. * - * Returns -EIO if any pages could not be invalidated. + * Returns -EIOCBRETRY if any pages could not be invalidated. */ int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t start, pgoff_t end) @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct } ret = do_launder_page(mapping, page); if (ret == 0 && !invalidate_complete_page2(mapping, page)) - ret = -EIO; + ret = -EIOCBRETRY; unlock_page(page); } pagevec_release(&pvec); @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages * Any pages which are found to be mapped into pagetables are unmapped prior to * invalidation. * - * Returns -EIO if any pages could not be invalidated. + * Returns -EIOCBRETRY if any pages could not be invalidated. */ int invalidate_inode_pages2(struct address_space *mapping) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] aio: fix kernel bug when page is temporally busy
On Fri, 9 Feb 2007 07:29:31 +0300 "Ananiev, Leonid I" <[EMAIL PROTECTED]> wrote: > >From Leonid Ananiev > > Fix "Kernel BUG at fs/aio.c:509". Return EIOCBRETRY but not EIO if page > is busy. > > Signed-off-by: Leonid Ananiev <[EMAIL PROTECTED]> > > "Kernel BUG at fs/aio.c:509" > http://marc.theaimsgroup.com/?l=linux-kernel&m=117031052517746&w=2 > is present in 2.6.20 as well as 2.6.19. > The investigation shows that the bug's panic occurs when aio_complete() > is called with argument ret=EIO which is returned from > invalidate_inode_pages2_range() because the page is temporally busy. > Call Trace: > [] invalidate_inode_pages2_range+0x236/0x26b > [] ext3_direct_IO+0x16c/0x19e > [] ext3_get_block+0x0/0xe2 > [] generic_file_direct_IO+0xb9/0xcf > [] generic_file_direct_write+0x67/0x10e > [] __generic_file_aio_write_nolock+0x2d6/0x3fe > [] __switch_to+0x26f/0x27e > [] thread_return+0x0/0xd8 > [] generic_file_aio_write+0x67/0xc7 > [] ext3_file_write+0x0/0x8f > [] ext3_file_write+0x16/0x8f > [] ext3_file_write+0x0/0x8f > [] aio_rw_vect_retry+0x72/0x14f > [] aio_run_iocb+0xe6/0x187 > [] io_submit_one+0x296/0x2e3 > [] sys_io_submit+0x9b/0x108 > [] default_wake_function+0x0/0xe > [] system_call+0x7e/0x83 > > As a result aio_complete() is called while it should not be. > When IO is finished aio_complete() is called once more > and iocb->ki_users becomes negative. > > Taking into account aio.c lines in aio_run_iocb() > if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) { > aio_complete(iocb, ret, 0); > > proposed patch makes function invalidate_inode_pages2_range() > to return EIOCBRETRY but not EIO if page is busy. > The patch is tested on 2.6.20. > > --- linux-2.6.20/mm/truncate.c 2007-02-04 10:44:54.0 -0800 > +++ linux-2.6.20p/mm/truncate.c 2007-02-08 11:38:11.0 -0800 > @@ -366,7 +366,7 @@ static int do_launder_page(struct addres > * Any pages which are found to be mapped into pagetables are unmapped > prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2_range(struct address_space *mapping, > pgoff_t start, pgoff_t end) > @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct > } > ret = do_launder_page(mapping, page); > if (ret == 0 && > !invalidate_complete_page2(mapping, page)) > - ret = -EIO; > + ret = -EIOCBRETRY; > unlock_page(page); > } > pagevec_release(&pvec); > @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages > * Any pages which are found to be mapped into pagetables are unmapped > prior to > * invalidation. > * > - * Returns -EIO if any pages could not be invalidated. > + * Returns -EIOCBRETRY if any pages could not be invalidated. > */ > int invalidate_inode_pages2(struct address_space *mapping) > { The patch is wildly wordwrapped and has had it tabs replaced with spaces. invalidate_inode_pages2() has other callers. I suspect with this change we'll end up leaking EIOCBRETRY back to userspace. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/