On Mon, Oct 09, 2017 at 10:16:37AM +0200, Peter Zijlstra wrote: > On Sat, Oct 07, 2017 at 11:28:57AM -0700, Paul E. McKenney wrote: > > But if you are saying that it would be good to have wait_for_completion() > > and complete() directly modeled at some point, no argument. In addition, > > I hope that the memory model is applied to other tools that analyze kernel > > code. > > > > I'm not sure I got the point across; so I'll try once more. Without > > > providing this ordering the completion would be fundamentally broken. It > > > _must_ provide this ordering. > > > > OK, I now understand what you are getting at, and I do very much like > > that guarantee. > > Right, so maybe we should update the completion comments a bit to call > out this property, because I'm not sure its there.
That would be very good! > Also, with this, I think the smp_store_release() / smp_load_acquire() is > a perfectly fine abstraction of it, I don't think the model needs to be > taught about the completion interface. Agreed, if we pull too much stuff into the model it might well become difficult to maintain in and of itself. So we do need to choose carefully based on model performance and ease of use. For a performance example, note that locking can be easily modeled with xchg_acquire() for lock acquisition and smp_store_release() for lock release. But here is the resulting performance compared to directly modeling locking within the cat code: xchg_acquire() and # Processes smp_store_release() Direct modeling 2 0.061 s 0.007 s 3 3.542 s 0.039 s 4 569.189 s 0.397 s 5 > 113,853.000 s 5.187 s Combinatorial explosion is a real pain... The problem is that the xchg_acquire() / smp_store_release() approach requires the herd7 tool to consider and then reject all the scenarios where the lock critical sections overlap. In contrast, when direct modeling, those overlapping critical-section scenarios are implicitly rejected by the cat-code that models locking. Don't get me wrong, there is still combinatorial explosion, as you can see from the rightmost column of numbers. After all, the herd7 tool still needs to consider all possible lock-acquisition orders. But the explosion is much less explosive. ;-) Given that we expect locking to be very heavily used, it makes a lot of sense to directly model it, which we (mostly Alan, Andrea, and Luc) have done. Direct modeling of RCU gets similar speedups over emulation, plus the fact that accurate emulation of RCU requires some surprisingly unnatural acts. ("Ghost" accesses unaffected by normal memory-ordering primitives, and "ghost-buster memory barriers" that affect both "ghost" and normal accesses. Plus the transformation is complex, so much so that I broke down and wrote a script, which was used to validate the early versions of the RCU model.) In constrast, I have no reason to believe that direct modeling could make wait_for_completion() / complete() go any faster than the equivalent setup using smp_store_release() / smp_load_acquire(), that is, I don't see anything that could be optimized away byt cat-code. So the only reason to directly model wait_for_completion() / complete() would be to get rid of the need to add the extra term to the "exists" clause. And agreed, we should wait until people are actually being significantly inconvenienced by this, which I do not expect any time soon. > > > Why not? In what sort of cases does it go wobbly? > > > > For one, when it conflicts with maintainability. For example, it would > > probably be OK for some of RCU's rcu_node ->lock acquisitions to skip the > > smp_mb__after_unlock_lock() invocations. But those are slowpaths, and the > > small speedup on only one architecture is just not worth the added pain. > > Especially given the nice wrapper functions that you provided. > > > > But of course if this were instead (say) rcu_read_lock() or common-case > > rcu_read_unlock(), I would be willing to undergo much more pain. On the > > other hand, for that exact reason, that common-case code path doesn't > > acquire locks in the first place. ;-) > > Ah, so for models I would go absolutely minimal; it helps understand > what the strict requirements are and where we over-provide etc.. Agreed, we should be careful how much we promise lest we get locked into needlessly heavyweight implementations. > For actual code you're entirely right, there's no point in trying to be > cute with the rcu-node locks. Simplicity rules etc.. Hear, hear! ;-) Thanx, Paul