On Mon, Nov 26, 2007 at 06:39:58PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 26, 2007 at 02:48:08PM -0800, James Huang wrote:
> >
> > > -----Original Message-----
> > > From: James Huang [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, November 26, 2007 2:21 PM
> > > To: James Huang
> > > Subject: Fw: __rcu_process_callbacks() in Linux 2.6
> > >
> > > ----- Forwarded Message ----
> > > From: Manfred Spraul <[EMAIL PROTECTED]>
> > > To: James Huang <[EMAIL PROTECTED]>
> > > Cc: Paul E. McKenney <[EMAIL PROTECTED]>; linux-[EMAIL PROTECTED]
> > > Sent: Monday, November 26, 2007 10:28:37 AM
> > > Subject: __rcu_process_callbacks() in Linux 2.6
> > >
> > > Hi James,
> > >
> > > If I understand the issue correctly, then the race is:
> > >
> > > step 1: cpu 1: starts a new rcu batch (i.e. rcp->cur++, smp_mb)
> > >
> > > step 2: cpu 2: completes the quiet state
> > > step 3: cpu 2: reads pointer 0x123 (ptr to a rcu protected struct)
> > >
> > > step 4: cpu 3: call_rcu(0x123): rcu protected struct added to rdp->nxtlist
> > > step 5: cpu 3: moves a new batch into rdp->curlist, rdp->batch = rcp->cur+1.
> > > xxxxxxxxxxxxxxx Problem: where is the smp_rmb() that guarantees that
> > > xxxxxxxxxxxxxxx update to rcp->cur from step 1 is seen by cpu 3?
> > > step 6: cpu 3: completes quiet state
> > > step 7: cpu 3: struct 0x123 destroyed
> > >
> > > step 8: cpu 2: accesses pointer 0x123, but the struct is already destroyed
> > >
> > > James: Is that the race?
> >
> > [James Huang]
> >
> > Yes, this is the race condition that I am concerned about.
> >
> > > I agree with Paul, there are smp_rmb's on cpu 3 between Step 1 and Step 5:
> > > Either the test_and_set_bit in tasklet_action for rcu_process_callback
> > > if step 4 happens before the tasklet or somewhere in the irq handler
> > > path if step 4 happens in an irq handler that interrupted
> > > rcu_process_callback.
> > >
> > > Thus theoretically no additional smp_rmb() should be necessary.
> > > What is missing is proper documentation.
> >
> > [James Huang]
> >
> > Is it true that an smp_rmb() before a read operation (say from variable
> > X) will guarantee that the read will always retrieve the most "current"
> > value of X?  I cannot find such a guarantee in atomic_ops.txt or
> > memory-barriers.txt under Linux's documentation directory.  What is
> > described in both documents is relative ordering, e.g.
> >
> >     CPU1              CPU2
> >     ------            ------
> >     write X = x1
> >     smp_wmb()
> >     write Y = y1
> >
> >                       read Y
> >                       smp_rmb()
> >                       read X
> >
> > Then CPU2 will read X with a value of x1 if it reads Y with a value of
> > y1.
> >
> > Please point me to the right section in the document if smp_rmb() does
> > provide such a guarantee.
>
> You are correct, smp_rmb() is about ordering rather than about any sort
> of immediacy.  For one thing, it can be quite difficult to say exactly what
> the most "current" version of X might be at a given point in time from
> the viewpoint of a given CPU -- the different CPUs might well disagree as
> to what the "current" version is for a while (though they are guaranteed
> to come to agreement).
>
> > Thanks,
> > -- James Huang
> >
> > > I'm analyzing the code right now:
> > > Is it really true that typically a cpu only completes data in every
> > > other rcu cycle? I.e. that most structures are stored in the rcu
> > > callback list until two quiet states happened?
>
> That is correct.  This does mean that we should be able to leverage
> locking primitives and memory barriers executed from the scheduling
> clock interrupt.
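The smp_wmb()/smp_rmb() pairing in the CPU1/CPU2 table above can be approximated in userspace. What follows is only a sketch under stated assumptions, not kernel code: the two macros are hypothetical stand-ins built on C11 fences (which are somewhat stronger than the kernel primitives), and cpu1(), cpu2() and run_message_passing() are illustrative names. It shows ordering only: CPU2 sees X == 1 because it first saw Y == 1, not because the barrier fetched a "current" value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Assumption: C11 fences as rough userspace stand-ins for the kernel
 * barriers.  They are stronger than smp_wmb()/smp_rmb(), not identical. */
#define smp_wmb() atomic_thread_fence(memory_order_release)
#define smp_rmb() atomic_thread_fence(memory_order_acquire)

static atomic_int X;
static atomic_int Y;

/* CPU1 column of the table: write X, barrier, write Y. */
static void *cpu1(void *unused)
{
	(void)unused;
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	smp_wmb();
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	return NULL;
}

/* CPU2 column: read Y (here: wait until it is 1), barrier, read X. */
static void *cpu2(void *seen_x)
{
	while (atomic_load_explicit(&Y, memory_order_relaxed) != 1)
		; /* nothing says how soon Y becomes visible */
	smp_rmb();
	*(int *)seen_x = atomic_load_explicit(&X, memory_order_relaxed);
	return NULL;
}

/* Runs both sides once and returns the value of X that CPU2 observed. */
int run_message_passing(void)
{
	pthread_t t1, t2;
	int seen_x = -1;
	int rc;

	atomic_store(&X, 0);
	atomic_store(&Y, 0);
	rc = pthread_create(&t2, NULL, cpu2, &seen_x);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, cpu1, NULL);
	assert(rc == 0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return seen_x;
}
```

Calling run_message_passing() from a test program should return 1 on every run. Removing either barrier makes a return value of 0 permissible on weakly ordered hardware, although a given machine may never exhibit it.
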
And I managed to get some time on a 64-CPU POWER5+ system.  Been running
rcutorture for about 2.5 hours without a failure (128 reader processes)
running through not quite 1.5M RCU updates.  Of course, this is not proof
that the Classic RCU implementation works, but it should provide some
reassurance.  I will keep it running until I get kicked off (probably
rather soon).

						Thanx, Paul

> > > I've tried to track the values of rcp->cur and rdp->batch.
> > > If next_pending is set, then cpu_quiet() immediately starts
> > > the next rcu cycle and a cpu cannot both complete the currently
> > > pending rcu callbacks and add new callbacks to the next cycle,
> > > thus a cpu only takes part in every other rcu cycle.
> > >
> > > The oocalc file is at
> > > http://www.colorfullife.com/~manfred/rcu.ods
> > > http://www.colorfullife.com/~manfred/rcu.pdf
> > >
> > > Is that analysis correct? Perhaps the whole code should be rewritten?
>
> I believe that the sequencing in the spreadsheet is correct (and thank
> you very much for going through it!!!), but it seems to be silent on
> memory-barrier issues.
>
> I also believe that Gautham's new CPU-hotplug setup will make
> it possible to simplify the code quite a bit.  And given that the
> grace-period-detection code is not on any sort of hot code path, it should
> be possible to use a less-aggressive design, perhaps one using straight
> locking to guard the shared structures.  Also, we are working in the
> -rt implementation on a scheme that allows CPUs to stay asleep through
> a grace period without the heavy overhead that is otherwise required to
> interact with them.  The trick is to maintain a per-CPU counter that is
> incremented on each entry and exit to low-power state.  But I would like
> to get this right in -rt before trying it in Classic RCU.
> ;-)
>
> 						Thanx, Paul
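For readers who want a picture of the per-CPU counter trick mentioned above, here is a minimal userspace sketch. It is not the -rt code. The message only says that a counter is incremented on each entry to and exit from low-power state; the even/odd convention, the snapshot comparison, and every name below (lowpower_ctr, lowpower_enter(), cpu_passed_quiescent(), NR_CPUS_SKETCH) are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

/* Assumed convention: even value = CPU running, odd value = CPU in
 * low-power state.  Every entry and every exit increments the counter. */
static atomic_uint lowpower_ctr[NR_CPUS_SKETCH];

/* Called by a CPU as it enters low-power state (even -> odd). */
void lowpower_enter(int cpu)
{
	atomic_fetch_add(&lowpower_ctr[cpu], 1);
}

/* Called by a CPU as it leaves low-power state (odd -> even). */
void lowpower_exit(int cpu)
{
	atomic_fetch_add(&lowpower_ctr[cpu], 1);
}

/* Grace-period start: record the counter without waking the CPU. */
unsigned int lowpower_snapshot(int cpu)
{
	return atomic_load(&lowpower_ctr[cpu]);
}

/*
 * Later check: the CPU needs no further attention if it was asleep at
 * snapshot time (odd value, so it could not have been inside a read-side
 * critical section) or if the counter has moved since (it has passed
 * through low-power state in the meantime).
 */
bool cpu_passed_quiescent(int cpu, unsigned int snap)
{
	unsigned int cur = atomic_load(&lowpower_ctr[cpu]);

	return (snap & 1u) != 0 || cur != snap;
}
```

The point of the design is that the grace-period machinery only reads the counter, so a sleeping CPU is never interrupted. The default sequentially consistent atomics are used here for simplicity; the barriers a real implementation would need are exactly the kind of detail this thread is about.
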