Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:
> > This now does:
> >
> >  CPU 1                                 CPU 2
> >  -----                                 -----
> >  (flags = 0)
> >  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> >  (flags = 3)
> >  [...]

We can still add here:

                                          (fetch flags)

> >  xchg(&flags, IRQ_WORK_BUSY)
> >  (flags = 2)
> >  func()
> >                                        oflags = cmpxchg(&flags, flags, nflags);

Then the cmpxchg() would fail, and oflags would be 2

> >                                        (sees flags = 2)
> >                                        if (flags & IRQ_WORK_PENDING)

This should be:

	if (oflags & IRQ_WORK_PENDING)

> >                                        (not true)
> >                                        (loop)
> >
> >  cmpxchg(flags, 2, 0);
> >  (flags = 2)

This should be: (flags = 0)

As we described the previous cmpxchg() as failing, flags would still be
2 before this cmpxchg(), and this one would succeed.

-- Steve

> >                                        flags = 3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/31 Steven Rostedt :
> On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:
>> nflags = 1 | 3
>> nflags = 2 | 3
>> In both cases the result would be the same. If I am right, then
>> wouldn't this operation be redundant?
>
> Right. Actually we could change the new loop to:
>
>	for (;;) {
>		oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
>		if (oflags == flags)
>			break;
>		if (oflags & IRQ_WORK_PENDING)
>			return false;
>		flags = oflags;
>		cpu_relax();
>	}

We could. But I wanted to keep the code able to handle new flags in the
future (such as IRQ_WORK_LAZY).

> Frederic,
>
> Would you like to add my explanation to your change log? You can add the
> entire thing, which I think would explain a lot to people.

It's indeed a very clear explanation. I'll put that in the changelog,
thanks!
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:
> > /*
> >  * Claim the entry so that no one else will poke at it.
> >  */
> > static bool irq_work_claim(struct irq_work *work)
> > {
> >	unsigned long flags, nflags;
> >
> >	for (;;) {
> >		flags = work->flags;
> >		if (flags & IRQ_WORK_PENDING)
> >			return false;
> >		nflags = flags | IRQ_WORK_FLAGS;
> nflags = 1 | 3
> nflags = 2 | 3
> In both cases the result would be the same. If I am right, then wouldn't
> this operation be redundant?

Right. Actually we could change the new loop to:

	for (;;) {
		oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
		if (oflags == flags)
			break;
		if (oflags & IRQ_WORK_PENDING)
			return false;
		flags = oflags;
		cpu_relax();
	}

> >		if (cmpxchg(&work->flags, flags, nflags) == flags)
> >			break;
> >		cpu_relax();
> >	}
> >
> >	return true;
> > }
> >
> >
> > This now does:
> >
> >  CPU 1                                 CPU 2
> >  -----                                 -----
> >  (flags = 0)
> >  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> >  (flags = 3)
> >  [...]
> >  xchg(&flags, IRQ_WORK_BUSY)
> >  (flags = 2)
> >  func()
> >                                        oflags = cmpxchg(&flags, flags, nflags);
> >                                        (sees flags = 2)
> >                                        if (flags & IRQ_WORK_PENDING)
> >                                        (not true)
> >                                        (loop)
> >
> >  cmpxchg(flags, 2, 0);
> >  (flags = 2)
> >                                        flags = 3
> >
> >
> > With both patch 1 and 2 you fixed the bug.
> This is the best explanation anyone can put forward for this problem.

Frederic,

Would you like to add my explanation to your change log? You can add the
entire thing, which I think would explain a lot to people.

Thanks,

-- Steve
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Tue, 2012-10-30 at 14:45 -0400, Steven Rostedt wrote:
> On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:
> > > CPU 0                                 CPU 1
> > >
> > > data = something                      flags = IRQ_WORK_BUSY
> > > smp_mb() (implicit with cmpxchg       smp_mb()
> > >          on flags in claim)           execute_work (sees data from CPU 0)
> > > try to claim
> > >
> > As I understand without the memory barrier proposed by you the
> > situation would be as below:
> > CPU 0                                 CPU 1
> >
> > data = something                      flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
> >          on flags in claim)
> > _success_ in claiming and goes
>
> Correct, because it would see the stale value of flags.
>
> > ahead and execute the work (wrong?)
> > cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag == IRQ_WORK_BUSY
> >
> > Am I right?
>
> right.
>
> > Probably a stupid question. Why do we return the bool from
> > irq_work_queue when no one bothers to check the return value?
> > Wouldn't it be better if this function were void, as used by the
> > users of this function, or am I looking at the wrong code.
>
> Not a stupid question, as I was asking that to myself just earlier
> today. But forgot to mention it as well. Especially, because it makes it
> look like there's a bug in the code. Maybe someday someone will care if
> their work was finished by itself, or some other CPU.
>
> Probably should just nix the return value.

Can I send a patch to fix this?

> -- Steve
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Tue, 2012-10-30 at 22:23 -0400, Steven Rostedt wrote:
> On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> > 2012/10/30 anish kumar :
> > > As I understand without the memory barrier proposed by you the
> > > situation would be as below:
> > > CPU 0                                 CPU 1
> > >
> > > data = something                      flags = IRQ_WORK_BUSY
> > > smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
> > >          on flags in claim)
> > > _success_ in claiming and goes
> > > ahead and execute the work (wrong?)
> > > cmpxchg cause flag to IRQ_WORK_BUSY
> > >
> > > Now knows the flag == IRQ_WORK_BUSY
> > >
> > > Am I right?
> >
> > (Adding Paul in Cc because I'm again confused with memory barriers)
> >
> > Actually what I had in mind is rather that CPU 0 fails its claim
> > because it's not seeing the IRQ_WORK_BUSY flag as it should:
> >
> > CPU 0                                 CPU 1
> >
> > data = something                      flags = IRQ_WORK_BUSY
> > cmpxchg() for claim                   execute_work (sees data from CPU 0)
> >
> > CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> > value in a non-atomic way.
> >
> > Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> > It seems we can't reliably use memory barriers here because we would
> > be in the following case:
> >
> > CPU 0                                 CPU 1
> >
> > store(work data)                      store(flags)
> > smp_mb()                              smp_mb()
> > load(flags)                           load(work data)
> >
> > On top of this barrier pairing, we can't make the assumption that, for
> > example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> > the flags stored in CPU 1.
> >
> > So now I wonder if cmpxchg() can give us more confidence:
>
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
> /*
>  * Claim the entry so that no one else will poke at it.
>  */
> static bool irq_work_claim(struct irq_work *work)
> {
>	unsigned long flags, nflags;
>
>	for (;;) {
>		flags = work->flags;
>		if (flags & IRQ_WORK_PENDING)
>			return false;
>		nflags = flags | IRQ_WORK_FLAGS;

nflags = 1 | 3
nflags = 2 | 3
In both cases the result would be the same. If I am right, then wouldn't
this operation be redundant?

>		if (cmpxchg(&work->flags, flags, nflags) == flags)
>			break;
>		cpu_relax();
>	}
>
>	return true;
> }
>
> and
>
>	llnode = llist_del_all(this_list);
>	while (llnode != NULL) {
>		work = llist_entry(llnode, struct irq_work, llnode);
>
>		llnode = llist_next(llnode);
>
>		/*
>		 * Clear the PENDING bit, after this point the @work
>		 * can be re-used.
>		 */
>		work->flags = IRQ_WORK_BUSY;
>		work->func(work);
>		/*
>		 * Clear the BUSY bit and return to the free state if
>		 * no-one else claimed it meanwhile.
>		 */
>		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
>	}
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
>  CPU 1                                 CPU 2
>  -----                                 -----
>  (flags = 0)
>  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
>  (flags = 3)
>  [...]
>                                        if (flags & IRQ_WORK_PENDING)
>                                                return false
>  flags = IRQ_WORK_BUSY
>  (flags = 2)
>  func()
>
> The above shows the normal case where CPU 2 doesn't need to queue work,
> because it's going to be done for it by CPU 1. But...
>
>  CPU 1                                 CPU 2
>  -----                                 -----
>  (flags = 0)
>  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
>  (flags = 3)
>  [...]
>  flags = IRQ_WORK_BUSY
>  (flags = 2)
>  func()
>                                        (sees flags = 3)
>                                        if (flags & IRQ_WORK_PENDING)
>                                                return false
>  cmpxchg(flags, 2, 0);
>  (flags = 0)
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/31 Steven Rostedt :
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
> /*
>  * Claim the entry so that no one else will poke at it.
>  */
> static bool irq_work_claim(struct irq_work *work)
> {
>	unsigned long flags, nflags;
>
>	for (;;) {
>		flags = work->flags;
>		if (flags & IRQ_WORK_PENDING)
>			return false;
>		nflags = flags | IRQ_WORK_FLAGS;
>		if (cmpxchg(&work->flags, flags, nflags) == flags)
>			break;
>		cpu_relax();
>	}
>
>	return true;
> }
>
> and
>
>	llnode = llist_del_all(this_list);
>	while (llnode != NULL) {
>		work = llist_entry(llnode, struct irq_work, llnode);
>
>		llnode = llist_next(llnode);
>
>		/*
>		 * Clear the PENDING bit, after this point the @work
>		 * can be re-used.
>		 */
>		work->flags = IRQ_WORK_BUSY;
>		work->func(work);
>		/*
>		 * Clear the BUSY bit and return to the free state if
>		 * no-one else claimed it meanwhile.
>		 */
>		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
>	}
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
>  CPU 1                                 CPU 2
>  -----                                 -----
>  (flags = 0)
>  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
>  (flags = 3)
>  [...]
>                                        if (flags & IRQ_WORK_PENDING)
>                                                return false
>  flags = IRQ_WORK_BUSY
>  (flags = 2)
>  func()
>
> The above shows the normal case where CPU 2 doesn't need to queue work,
> because it's going to be done for it by CPU 1. But...
>
>  CPU 1                                 CPU 2
>  -----                                 -----
>  (flags = 0)
>  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
>  (flags = 3)
>  [...]
>  flags = IRQ_WORK_BUSY
>  (flags = 2)
>  func()
>                                        (sees flags = 3)
>                                        if (flags & IRQ_WORK_PENDING)
>                                                return false
>  cmpxchg(flags, 2, 0);
>  (flags = 0)
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I showed above it still can!
>
>  CPU 1                                 CPU 2
>  -----                                 -----
>  (flags = 0)
>  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
>  (flags = 3)
>  [...]
>                                        (fetch flags)
>  xchg(&flags, IRQ_WORK_BUSY)
>  (flags = 2)
>  func()
>                                        (sees flags = 3)
>                                        if (flags & IRQ_WORK_PENDING)
>                                                return false
>  cmpxchg(flags, 2, 0);
>  (flags = 0)
>
> Even with the update of xchg(), if CPU2 fetched the flags before CPU1
> did the xchg, then it would still lose out. But that's where your
> previous patch comes in that does:
>
>	flags = work->flags & ~IRQ_WORK_PENDING;
>	for (;;) {
>		nflags = flags | IRQ_WORK_FLAGS;
>		oflags = cmpxchg(&work->flags, flags, nflags);
>		if (oflags == flags)
>			break;
>		if (oflags & IRQ_WORK_PENDING)
>			return false;
>		flags = oflags;
>		cpu_relax();
>	}
>
> This now does:
>
>  CPU 1                                 CPU 2
>  -----                                 -----
>  (flags = 0)
>  cmpxchg(flags, 0, IRQ_WORK_FLAGS)
>  (flags = 3)
>  [...]
>  xchg(&flags, IRQ_WORK_BUSY)
>  (flags = 2)
>  func()
>                                        oflags = cmpxchg(&flags, flags, nflags);
>                                        (sees flags = 2)
>                                        if (flags & IRQ_WORK_PENDING)
>                                        (not true)
>                                        (loop)
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> 2012/10/30 anish kumar :
> > As I understand without the memory barrier proposed by you the
> > situation would be as below:
> > CPU 0                                 CPU 1
> >
> > data = something                      flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
> >          on flags in claim)
> > _success_ in claiming and goes
> > ahead and execute the work (wrong?)
> > cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag == IRQ_WORK_BUSY
> >
> > Am I right?
>
> (Adding Paul in Cc because I'm again confused with memory barriers)
>
> Actually what I had in mind is rather that CPU 0 fails its claim
> because it's not seeing the IRQ_WORK_BUSY flag as it should:
>
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> cmpxchg() for claim                   execute_work (sees data from CPU 0)
>
> CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> value in a non-atomic way.
>
> Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> It seems we can't reliably use memory barriers here because we would
> be in the following case:
>
> CPU 0                                 CPU 1
>
> store(work data)                      store(flags)
> smp_mb()                              smp_mb()
> load(flags)                           load(work data)
>
> On top of this barrier pairing, we can't make the assumption that, for
> example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> the flags stored in CPU 1.
>
> So now I wonder if cmpxchg() can give us more confidence:

More confidence over what? The xchg()? They are equivalent (wrt memory
barriers).

Here's the issue that currently exists. Let's look at the code:

/*
 * Claim the entry so that no one else will poke at it.
 */
static bool irq_work_claim(struct irq_work *work)
{
	unsigned long flags, nflags;

	for (;;) {
		flags = work->flags;
		if (flags & IRQ_WORK_PENDING)
			return false;
		nflags = flags | IRQ_WORK_FLAGS;
		if (cmpxchg(&work->flags, flags, nflags) == flags)
			break;
		cpu_relax();
	}

	return true;
}

and

	llnode = llist_del_all(this_list);
	while (llnode != NULL) {
		work = llist_entry(llnode, struct irq_work, llnode);

		llnode = llist_next(llnode);

		/*
		 * Clear the PENDING bit, after this point the @work
		 * can be re-used.
		 */
		work->flags = IRQ_WORK_BUSY;
		work->func(work);
		/*
		 * Clear the BUSY bit and return to the free state if
		 * no-one else claimed it meanwhile.
		 */
		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
	}

The irq_work_claim() will only queue its work if it's not already
pending. If it is pending, then someone is going to process it anyway.
But once we start running the function, new work needs to be processed
again.

Thus we have:

 CPU 1                                 CPU 2
 -----                                 -----
 (flags = 0)
 cmpxchg(flags, 0, IRQ_WORK_FLAGS)
 (flags = 3)
 [...]
                                       if (flags & IRQ_WORK_PENDING)
                                               return false
 flags = IRQ_WORK_BUSY
 (flags = 2)
 func()

The above shows the normal case where CPU 2 doesn't need to queue work,
because it's going to be done for it by CPU 1. But...

 CPU 1                                 CPU 2
 -----                                 -----
 (flags = 0)
 cmpxchg(flags, 0, IRQ_WORK_FLAGS)
 (flags = 3)
 [...]
 flags = IRQ_WORK_BUSY
 (flags = 2)
 func()
                                       (sees flags = 3)
                                       if (flags & IRQ_WORK_PENDING)
                                               return false
 cmpxchg(flags, 2, 0);
 (flags = 0)

Here, because we did not do a memory barrier after
flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
and missed the opportunity. Now if you had this fix with the xchg() as
you have in your patch, then CPU 2 would not see the stale flags.
Except, with the code I showed above it still can!

 CPU 1                                 CPU 2
 -----                                 -----
 (flags = 0)
 cmpxchg(flags, 0, IRQ_WORK_FLAGS)
 (flags = 3)
 [...]
                                       (fetch flags)
 xchg(&flags, IRQ_WORK_BUSY)
 (flags = 2)
 func()
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/30 anish kumar :
> As I understand without the memory barrier proposed by you the
> situation would be as below:
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
>          on flags in claim)
> _success_ in claiming and goes
> ahead and execute the work (wrong?)
> cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag == IRQ_WORK_BUSY
>
> Am I right?

(Adding Paul in Cc because I'm again confused with memory barriers)

Actually what I had in mind is rather that CPU 0 fails its claim
because it's not seeing the IRQ_WORK_BUSY flag as it should:

CPU 0                                 CPU 1

data = something                      flags = IRQ_WORK_BUSY
cmpxchg() for claim                   execute_work (sees data from CPU 0)

CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
value in a non-atomic way.

Also, while browsing Paul's perfbook, I realize my changelog is buggy.
It seems we can't reliably use memory barriers here because we would
be in the following case:

CPU 0                                 CPU 1

store(work data)                      store(flags)
smp_mb()                              smp_mb()
load(flags)                           load(work data)

On top of this barrier pairing, we can't make the assumption that, for
example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
the flags stored in CPU 1.

So now I wonder if cmpxchg() can give us more confidence:

CPU 0                                 CPU 1

store(work data)                      xchg(flags, IRQ_WORK_BUSY)
cmpxchg(flags, IRQ_WORK_FLAGS)        load(work data)

Can I make this assumption?

- If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
  then CPU 1 will execute the work and see our data.

At least cmpxchg / xchg pair orders correctly to ensure somebody will
execute our work. Now probably some locking is needed from the work
function itself if it's not per cpu.

> Probably a stupid question. Why do we return the bool from
> irq_work_queue when no one bothers to check the return value?
> Wouldn't it be better if this function were void, as used by the
> users of this function, or am I looking at the wrong code.

No idea. Probably Peter had plans there.
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:
> > CPU 0                                 CPU 1
> >
> > data = something                      flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg       smp_mb()
> >          on flags in claim)           execute_work (sees data from CPU 0)
> > try to claim
> >
> As I understand without the memory barrier proposed by you the
> situation would be as below:
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
>          on flags in claim)
> _success_ in claiming and goes

Correct, because it would see the stale value of flags.

> ahead and execute the work (wrong?)
> cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag == IRQ_WORK_BUSY
>
> Am I right?

right.

> Probably a stupid question. Why do we return the bool from
> irq_work_queue when no one bothers to check the return value?
> Wouldn't it be better if this function were void, as used by the
> users of this function, or am I looking at the wrong code.

Not a stupid question, as I was asking that to myself just earlier
today. But forgot to mention it as well. Especially, because it makes it
look like there's a bug in the code. Maybe someday someone will care if
their work was finished by itself, or some other CPU.

Probably should just nix the return value.

-- Steve
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
> The IRQ_WORK_BUSY flag is set right before we execute the
> work. Once this flag value is set, the work enters a
> claimable state again.
>
> This is necessary because if we want to enqueue a work but we
> fail the claim, we want to ensure that the CPU where that work
> is still pending will see and handle the data we expected the
> work to compute.
>
> This might not work as expected though because IRQ_WORK_BUSY
> isn't set atomically. By the time a CPU fails a work claim,
> this work may well have been already executed by the CPU where
> it was previously pending.
>
> Due to the lack of appropriate memory barrier, the IRQ_WORK_BUSY
> flag value may not be visible by the CPU trying to claim while
> the work is executing, and that until we clear the busy bit in
> the work flags using cmpxchg() that implies the full barrier.
>
> One solution could involve a full barrier between setting
> IRQ_WORK_BUSY flag and the work execution. This way we
> ensure that the work execution site sees the expected data
> and the claim site sees the IRQ_WORK_BUSY:
>
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       smp_mb()
>          on flags in claim)           execute_work (sees data from CPU 0)
> try to claim
>
As I understand without the memory barrier proposed by you the situation
would be as below:
CPU 0                                 CPU 1

data = something                      flags = IRQ_WORK_BUSY
smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
         on flags in claim)
_success_ in claiming and goes
ahead and execute the work (wrong?)
cmpxchg cause flag to IRQ_WORK_BUSY

Now knows the flag == IRQ_WORK_BUSY

Am I right?

Probably a stupid question. Why do we return the bool from irq_work_queue
when no one bothers to check the return value? Wouldn't it be better if
this function were void, as used by the users of this function, or am I
looking at the wrong code.

> As a shortcut, let's just use xchg() that implies a full memory
> barrier.
>
> Signed-off-by: Frederic Weisbecker
> Cc: Peter Zijlstra
> Cc: Ingo Molnar
> Cc: Thomas Gleixner
> Cc: Andrew Morton
> Cc: Steven Rostedt
> Cc: Paul Gortmaker
> ---
>  kernel/irq_work.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 764240a..ea79365 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,9 +130,12 @@ void irq_work_run(void)
>
>		/*
>		 * Clear the PENDING bit, after this point the @work
> -		 * can be re-used.
> +		 * can be re-used. Use xchg to force ordering against
> +		 * data to process, such that if claiming fails on
> +		 * another CPU, we see and handle the data it wants
> +		 * us to process on the work.
>		 */
> -		work->flags = IRQ_WORK_BUSY;
> +		xchg(&work->flags, IRQ_WORK_BUSY);
>		work->func(work);
>		/*
>		 * Clear the BUSY bit and return to the free state if
> --
> 1.7.5.4
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Tue, 2012-10-30 at 16:35 +0100, Frederic Weisbecker wrote:
> The IRQ_WORK_BUSY flag is set right before we execute the
> work. Once this flag value is set, the work enters a
> claimable state again.
>
> This is necessary because if we want to enqueue a work but we
> fail the claim, we want to ensure that the CPU where that work
> is still pending will see and handle the data we expected the
> work to compute.
>
> This might not work as expected though because IRQ_WORK_BUSY
> isn't set atomically. By the time a CPU fails a work claim,
> this work may well have been already executed by the CPU where
> it was previously pending.
>
> Due to the lack of an appropriate memory barrier, the IRQ_WORK_BUSY
> flag value may not be visible to the CPU trying to claim while
> the work is executing, until we clear the busy bit in
> the work flags using the cmpxchg() that implies the full barrier.
>
> One solution could involve a full barrier between setting the
> IRQ_WORK_BUSY flag and the work execution. This way we
> ensure that the work execution site sees the expected data
> and the claim site sees the IRQ_WORK_BUSY:
>
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       smp_mb()
>           on flags in claim)          execute_work (sees data from CPU 0)
> try to claim
>
> As a shortcut, let's just use xchg() that implies a full memory
> barrier.
>
> Signed-off-by: Frederic Weisbecker
> Cc: Peter Zijlstra
> Cc: Ingo Molnar
> Cc: Thomas Gleixner
> Cc: Andrew Morton
> Cc: Steven Rostedt
> Cc: Paul Gortmaker

Reviewed-by: Steven Rostedt

-- Steve

> ---
>  kernel/irq_work.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 764240a..ea79365 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,9 +130,12 @@ void irq_work_run(void)
>
>  		/*
>  		 * Clear the PENDING bit, after this point the @work
> -		 * can be re-used.
> +		 * can be re-used. Use xchg to force ordering against
> +		 * data to process, such that if claiming fails on
> +		 * another CPU, we see and handle the data it wants
> +		 * us to process on the work.
>  		 */
> -		work->flags = IRQ_WORK_BUSY;
> +		xchg(&work->flags, IRQ_WORK_BUSY);
>  		work->func(work);
>  		/*
>  		 * Clear the BUSY bit and return to the free state if
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:
>          CPU 0                                 CPU 1
>
>          data = something                      flags = IRQ_WORK_BUSY
>          smp_mb() (implicit with cmpxchg       smp_mb()
>                    on flags in claim)          execute_work (sees data from CPU 0)
>          try to claim
>
> As I understand it, without the memory barrier proposed by you the
> situation would be as below:
>
>          CPU 0                                 CPU 1
>
>          data = something                      flags = IRQ_WORK_BUSY
>          smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
>                    on flags in claim)
>          _success_ in claiming and goes

Correct, because it would see the stale value of flags.

>          ahead and executes the work (wrong?)
>                                                cmpxchg sets flag to IRQ_WORK_BUSY
>
> Now knows the flag == IRQ_WORK_BUSY
>
> Am I right?

Right.

> Probably a stupid question. Why do we return a bool from
> irq_work_queue() when no one bothers to check the return value?
> Wouldn't it be better if this function were void, as its users treat
> it, or am I looking at the wrong code?

Not a stupid question, as I was asking that to myself just earlier
today, but forgot to mention it as well. Especially because it makes it
look like there's a bug in the code.

Maybe someday someone will care if their work was finished by itself,
or by some other CPU. Probably we should just nix the return value.

-- Steve
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/30 anish kumar anish198519851...@gmail.com:
> As I understand it, without the memory barrier proposed by you the
> situation would be as below:
>
>          CPU 0                                 CPU 1
>
>          data = something                      flags = IRQ_WORK_BUSY
>          smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
>                    on flags in claim)
>          _success_ in claiming and goes
>          ahead and executes the work (wrong?)
>                                                cmpxchg sets flag to IRQ_WORK_BUSY
>
> Now knows the flag == IRQ_WORK_BUSY
>
> Am I right?

(Adding Paul in Cc because I'm again confused with memory barriers)

Actually what I had in mind is rather that CPU 0 fails its claim
because it's not seeing the IRQ_WORK_BUSY flag as it should:

        CPU 0                                 CPU 1

        data = something                      flags = IRQ_WORK_BUSY
        cmpxchg() for claim                   execute_work (sees data from CPU 0)

CPU 0 should see IRQ_WORK_BUSY, but it may not, because CPU 1 sets this
value in a non-atomic way.

Also, while browsing Paul's perfbook, I realize my changelog is buggy.
It seems we can't reliably use memory barriers here because we would be
in the following case:

        CPU 0                                 CPU 1

        store(work data)                      store(flags)
        smp_mb()                              smp_mb()
        load(flags)                           load(work data)

On top of this barrier pairing, we can't make the assumption that, for
example, if CPU 1 sees the work data stored by CPU 0 then CPU 0 sees
the flags stored by CPU 1.

So now I wonder if cmpxchg() can give us more confidence:

        CPU 0                                 CPU 1

        store(work data)                      xchg(flags, IRQ_WORK_BUSY)
        cmpxchg(flags, IRQ_WORK_FLAGS)        load(work data)

Can I make this assumption?

- If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
  then CPU 1 will execute the work and see our data.

At least the cmpxchg() / xchg() pair orders correctly to ensure
somebody will execute our work. Now probably some locking is needed
from the work function itself if it's not per cpu.

> Probably a stupid question. Why do we return a bool from
> irq_work_queue() when no one bothers to check the return value?
> Wouldn't it be better if this function were void, as its users treat
> it, or am I looking at the wrong code?

No idea.
Probably Peter had plans there.
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> 2012/10/30 anish kumar anish198519851...@gmail.com:
> > As I understand it, without the memory barrier proposed by you the
> > situation would be as below:
> >
> >          CPU 0                                 CPU 1
> >
> >          data = something                      flags = IRQ_WORK_BUSY
> >          smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
> >                    on flags in claim)
> >          _success_ in claiming and goes
> >          ahead and executes the work (wrong?)
> >                                                cmpxchg sets flag to IRQ_WORK_BUSY
> >
> > Now knows the flag == IRQ_WORK_BUSY
> >
> > Am I right?
>
> (Adding Paul in Cc because I'm again confused with memory barriers)
>
> Actually what I had in mind is rather that CPU 0 fails its claim
> because it's not seeing the IRQ_WORK_BUSY flag as it should:
>
>         CPU 0                                 CPU 1
>
>         data = something                      flags = IRQ_WORK_BUSY
>         cmpxchg() for claim                   execute_work (sees data from CPU 0)
>
> CPU 0 should see IRQ_WORK_BUSY, but it may not, because CPU 1 sets
> this value in a non-atomic way.
>
> Also, while browsing Paul's perfbook, I realize my changelog is
> buggy. It seems we can't reliably use memory barriers here because we
> would be in the following case:
>
>         CPU 0                                 CPU 1
>
>         store(work data)                      store(flags)
>         smp_mb()                              smp_mb()
>         load(flags)                           load(work data)
>
> On top of this barrier pairing, we can't make the assumption that,
> for example, if CPU 1 sees the work data stored by CPU 0 then CPU 0
> sees the flags stored by CPU 1.
>
> So now I wonder if cmpxchg() can give us more confidence:

More confidence over what? The xchg()? They are equivalent (wrt memory
barriers).

Here's the issue that currently exists. Let's look at the code:

	/*
	 * Claim the entry so that no one else will poke at it.
	 */
	static bool irq_work_claim(struct irq_work *work)
	{
		unsigned long flags, nflags;

		for (;;) {
			flags = work->flags;
			if (flags & IRQ_WORK_PENDING)
				return false;
			nflags = flags | IRQ_WORK_FLAGS;
			if (cmpxchg(&work->flags, flags, nflags) == flags)
				break;
			cpu_relax();
		}

		return true;
	}

and

	llnode = llist_del_all(this_list);
	while (llnode != NULL) {
		work = llist_entry(llnode, struct irq_work, llnode);

		llnode = llist_next(llnode);

		/*
		 * Clear the PENDING bit, after this point the @work
		 * can be re-used.
		 */
		work->flags = IRQ_WORK_BUSY;
		work->func(work);
		/*
		 * Clear the BUSY bit and return to the free state if
		 * no-one else claimed it meanwhile.
		 */
		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
	}

The irq_work_claim() will only queue its work if it's not already
pending. If it is pending, then someone is going to process it anyway.
But once we start running the function, new work needs to be processed
again.

Thus we have:

	CPU 1					CPU 2
	-----					-----
	(flags = 0)
	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
	(flags = 3)
						[...]
						if (flags & IRQ_WORK_PENDING)
							return false
	flags = IRQ_WORK_BUSY
	(flags = 2)
	func()

The above shows the normal case where CPU 2 doesn't need to queue work,
because it's going to be done for it by CPU 1. But...

	CPU 1					CPU 2
	-----					-----
	(flags = 0)
	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
	(flags = 3)
						[...]
	flags = IRQ_WORK_BUSY
	(flags = 2)
	func()
						(sees flags = 3)
						if (flags & IRQ_WORK_PENDING)
							return false
	cmpxchg(flags, 2, 0);
	(flags = 0)

Here, because we did not do a memory barrier after
flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
and missed the opportunity.

Now if you had this fix with the xchg() as you have in your patch, then
CPU 2 would not see the stale flags. Except, with the code I showed
above it still can!

	CPU 1					CPU 2
	-----					-----
	(flags = 0)
	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
	(flags = 3)
						[...]
						(fetch flags)
	xchg(flags, IRQ_WORK_BUSY)
	(flags = 2)
	func()
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/31 Steven Rostedt rost...@goodmis.org:
> More confidence over what? The xchg()? They are equivalent (wrt
> memory barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
> 	/*
> 	 * Claim the entry so that no one else will poke at it.
> 	 */
> 	static bool irq_work_claim(struct irq_work *work)
> 	{
> 		unsigned long flags, nflags;
>
> 		for (;;) {
> 			flags = work->flags;
> 			if (flags & IRQ_WORK_PENDING)
> 				return false;
> 			nflags = flags | IRQ_WORK_FLAGS;
> 			if (cmpxchg(&work->flags, flags, nflags) == flags)
> 				break;
> 			cpu_relax();
> 		}
>
> 		return true;
> 	}
>
> and
>
> 	llnode = llist_del_all(this_list);
> 	while (llnode != NULL) {
> 		work = llist_entry(llnode, struct irq_work, llnode);
>
> 		llnode = llist_next(llnode);
>
> 		/*
> 		 * Clear the PENDING bit, after this point the @work
> 		 * can be re-used.
> 		 */
> 		work->flags = IRQ_WORK_BUSY;
> 		work->func(work);
> 		/*
> 		 * Clear the BUSY bit and return to the free state if
> 		 * no-one else claimed it meanwhile.
> 		 */
> 		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
> 	}
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it
> anyway. But once we start running the function, new work needs to be
> processed again.
>
> Thus we have:
>
> 	CPU 1					CPU 2
> 	-----					-----
> 	(flags = 0)
> 	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> 	(flags = 3)
> 						[...]
> 						if (flags & IRQ_WORK_PENDING)
> 							return false
> 	flags = IRQ_WORK_BUSY
> 	(flags = 2)
> 	func()
>
> The above shows the normal case where CPU 2 doesn't need to queue
> work, because it's going to be done for it by CPU 1. But...
>
> 	CPU 1					CPU 2
> 	-----					-----
> 	(flags = 0)
> 	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> 	(flags = 3)
> 						[...]
> 	flags = IRQ_WORK_BUSY
> 	(flags = 2)
> 	func()
> 						(sees flags = 3)
> 						if (flags & IRQ_WORK_PENDING)
> 							return false
> 	cmpxchg(flags, 2, 0);
> 	(flags = 0)
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its
> work, and missed the opportunity.
>
> Now if you had this fix with the xchg() as you have in your patch,
> then CPU 2 would not see the stale flags. Except, with the code I
> showed above it still can!
>
> 	CPU 1					CPU 2
> 	-----					-----
> 	(flags = 0)
> 	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> 	(flags = 3)
> 						[...]
> 						(fetch flags)
> 	xchg(flags, IRQ_WORK_BUSY)
> 	(flags = 2)
> 	func()
> 						(sees flags = 3)
> 						if (flags & IRQ_WORK_PENDING)
> 							return false
> 	cmpxchg(flags, 2, 0);
> 	(flags = 0)
>
> Even with the update of xchg(), if CPU 2 fetched the flags before
> CPU 1 did the xchg, then it would still lose out. But that's where
> your previous patch comes in that does:
>
> 	flags = work->flags & ~IRQ_WORK_PENDING;
> 	for (;;) {
> 		nflags = flags | IRQ_WORK_FLAGS;
> 		oflags = cmpxchg(&work->flags, flags, nflags);
> 		if (oflags == flags)
> 			break;
> 		if (oflags & IRQ_WORK_PENDING)
> 			return false;
> 		flags = oflags;
> 		cpu_relax();
> 	}
>
> This now does:
>
> 	CPU 1					CPU 2
> 	-----					-----
> 	(flags = 0)
> 	cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> 	(flags = 3)
> 						[...]
> 	xchg(flags, IRQ_WORK_BUSY)
> 	(flags = 2)
> 	func()
> 						oflags = cmpxchg(flags, flags, nflags);
> 						(sees flags = 2)
> 						if (flags & IRQ_WORK_PENDING)
> 						(not true)
> 						(loop)
> 	cmpxchg(flags, 2, 0);
> 	(flags = 2)
> 						flags = 3