On Mon, 17 Nov 2014, Linus Torvalds wrote: > On Mon, Nov 17, 2014 at 2:43 PM, Thomas Gleixner <t...@linutronix.de> wrote: > > So a combo of both (Jens and yours) might do the trick. Patch below. > > Yeah, I guess that would work. The important part is that *if* > somebody really reuses the csd, we'd better have a release barrier > (which csd_unlock() does, although badly - but this probably isn't > that performance-critical) *before* we call the function, because > otherwise there's no real serialization for the reuse.
Indeed. > Of course, most of these things are presumably always per-cpu data > structures, so the whole worry about "csd" being accessed from > different CPU's probably doesn't even exist, and this all works fine > as-is anyway, even in the presense of odd memory ordering issues. > > Judging from Jens' later email, it looks like we simply don't need > this code at all any more, though, and we could just revert the > commit. Right. Reverting it is the proper solution for now. Though we should really think about the async seperation later. It makes a lot of sense. > NOTE! I don't think this actually has anything to do with the actual > problem that Dave saw. I just reacted to that WARN_ON() when I was > looking at the code, and it made me go "that looks extremely > suspicious". One thing I was looking into today is the increased use of irq_work which uses IPIs as well. Not sure whether that's related, but I it's not from my radar yet. But the possible compiler wreckage (or exposed kernel wreckage) is frightening in several aspects ... > Particularly on x86, with strong memory ordering, I don't think that > any random accesses to 'csd' after the call to 'csd->func()' could > actually matter. I just felt very nervous about the claim that > somebody can reuse the csd immediately, that smelled bad to me from a > *conceptual* standpoint, even if I suspect it works perfectly fine in > practice. > > Anyway, I've found *another* race condition, which (again) doesn't > actually seem to be an issue on x86. > > In particular, "csd_lock()" does things pretty well, in that it does a > smp_mb() after setting the lock bit, so certainly nothing afterwards > will leak out of that locked region. > > But look at csd_lock_wait(). It just does > > while (csd->flags & CSD_FLAG_LOCK) > cpu_relax(); > > and basically there's no memory barriers there. Now, on x86, this is a > non-issue, since all reads act as an acquire, but at least in *theory* > we have this completely unordered read going on. So any subsequent > memory oeprations (ie after the return from generic_exec_single() > could in theory see data from *before* the read. True. > So that whole kernel/smp.c locking looks rather dubious. The smp_mb() > in csd_lock() is overkill (a "smp_store_release()" should be > sufficient), and I think that the read of csd->flags in csd_unlock() > should be a smp_load_acquire(). > > Again, none of this has anything to do with Dave's problem. The memory > ordering issues really cannot be an issue on x86, I'm just saying that > there's code there that makes me a bit uncomfortable. Right you are and we should fix it asap. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/