On Thu, Sep 06, 2012 at 04:36:02PM +0200, Peter Zijlstra wrote: > On Thu, 2012-08-30 at 11:18 -0700, Paul E. McKenney wrote: > > From: "Paul E. McKenney" <paul.mcken...@linaro.org> > > > > The current quiescent-state detection algorithm is needlessly > > complex. > > Heh! Be careful, we might be led into believing all this RCU is actually > really rather simple and this complexity is a bug on your end ;-)
Actually, the smallest "toy" implementation of RCU is only about 20 lines of code -- and on a mythical sequentially consistent system, it would be smaller still. Of course, the Linux kernel implementation if somewhat larger. Something about wanting scalability above a few tens of CPUs, real-time response (also on huge numbers of CPUs), energy-efficient handling of dyntick-idle mode, detection of stalled CPUs, and so on. ;-) > > It records the grace-period number corresponding to > > the quiescent state at the time of the quiescent state, which > > works, but it seems better to simply erase any record of previous > > quiescent states at the time that the CPU notices the new grace > > period. This has the further advantage of removing another piece > > of RCU for which lockless reasoning is required. > > So why didn't you do that from the start? :-) Because I was slow and stupid! ;-) > That is, I'm curious to know some history, why was it so and what led > you to this insight? I had historically (as in for almost 20 years now) used a counter to track grace periods. Now these are in theory subject to integer overflow, but DYNIX/ptx was non-preemptible, so the general line of reasoning was that anything that might stall long enough for even a 32-bit grace-period counter to overflow would necessarily stall grace periods, thus preventing overflow. Of course, the advent of CONFIG_PREEMPT in the Linux kernel invalidated this assumption, but for most uses, if the grace-period counter overflows, you have waited way more than a grace period, so who cares? Then combination of TREE_RCU and dyntick-idle came along, and it became necessary to more carefully associate quiescent states with the corresponding grace period. Now here overflow is dangerous, because it can result in associating an ancient quiescent state with the current grace period. But my attitude was that if you have a task preempted for more than one year, getting soft-lockup warnings every two minutes during that time, well, you got what you deserved. And even then at very low probability. However, formal validation software (such as Promela) do not take kindly to free-running counters. The usual trick is to use a much narrower counter. But that would mean that any attempted mechanical validation would give a big fat false positive on the counter used to associate quiescent states with grace periods. Because I have a long-term goal of formally validating RCU is it sits in the Linux kernel, that counter had to go. And I do believe that the result is easier for humans to understand, so it is all to the good. This time, at least. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/