On Thu, 3 Feb 2005 22:12:24 +1100 Herbert Xu <[EMAIL PROTECTED]> wrote:
> This paradigm is repeated throughout the kernel. I bet the > same race can be found in a lot of those places. So we really > need to sit down and audit them one by one or else come up with > a magical solution apart from disabling SMP :) Ok. I'm commenting now considering Anton's atomic_t rules. Anton, please read down below, I think there is a hole in the PPC memory barriers used for atomic ops on SMP. I don't see what changes are needed anywhere given those rules. Olaf says the problem shows up on PPC SMP system, and since Anton knows the proper rules we hopefully can safely assume he implemented them correctly on PPC :-) I thought for a moment that the atomic_read() might be an issue, and I'd really hate to kill that optimization. But I can't see how it is. Let us restate Olaf's original guess as to the problematic sequence of events: cpu 0 cpu 1 skb_get(skb) unlock(neigh) lock(neigh) __skb_unlink(skb) kfree_skb(sb) kfree_skb(skb) First, __skb_unlink(skb) does an unlocked queue unlink, and these memory operations may have their visibility freely reordered by the processor. However, cpu 1 will see the refcount at 2, so it will execute: atomic_dec_and_test(&skb->users) which has the implicit memory barriers, as Anton stated. This means that the cpu will make the __skb_unlink(skb) results visible globally before the decrement. Now the kfree_skb() on cpu 0 executes, the atomic_read() sees it at 1, we do the __kfree_skb() and since the __skb_unlink() has been made visible before the decrement on the count to "1" the BUG() should not trigger in __kfree_skb(). This all assumes, again, that PPC implements these things properly. Let's take a look (Anton, start reading here). My understanding of PPC memory barriers, wrt. the physical memory coherency domain, is as follows: sync ! All previous read/write execute before all subsequent read/write lwsync ! All previous reads execute before all subsequent read/write eieio ! All previous writes execute before all subsequent read/write isync ! All previous memory operations and instructions execute and ! reach global visibility before any subsequent instructions execute What guarentees does isync really make about "read" reordering around the atomic increment? Any descrepencies here would account for the case Olaf observed. All the atomic ops returning values on PPC do this on SMP: eieio atomic_op() isync At a minimum, it seems that the eieio is not strong enough a memory barrier. It is defined to order previous writes against future memory operations, but we also need to strictly order previous reads as well don't we? Also, if my understanding of isync is not correct, that could have implications as well. I guess for performance reasons, ppc doesn't use "sync" both before and after the atomic ops requiring ordering. But I suspect that might be what is actually needed for proper conformity to the atomic_t memory ordering rules. Anton? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/