* Daniel Walker ([EMAIL PROTECTED]) wrote:
> On Tue, 2007-02-27 at 11:02 -0500, Mathieu Desnoyers wrote:
> > * Daniel Walker ([EMAIL PROTECTED]) wrote:
> > > On Tue, 2007-02-27 at 02:38 -0500, Mathieu Desnoyers wrote:
> > >
> > > >
> > > > I am concerned about the automatic fallback to the PIT when no other
> > > > clock source is available. A clocksource read would be atomic when TSC
> > > > or HPET are available, but would fall back on PIT otherwise. There
> > > > should be some way to specify that a caller is only interested in atomic
> > > > clock sources (if none are available, the call should simply return an
> > > > error, or 0).
> > > >
> > >
> > > I'm not sure what you mean by using the RCU
> >
> > The original proposal of this thread uses an RCU (read-copy-update) style
> > update of the previous 64-bit counter: it swaps a pointer (atomically)
> > upon update by incrementing a word-sized counter that is used, by the
> > reader, to get the offset in the array (with a modulo operation) for the
> > current readable data and as a way to detect incorrect reads of
> > overwritten information (we re-read the word-sized counter after having
> > read the data structure to make sure it has not been incremented. If we
> > detect an increment, we redo the whole operation).
>
> I didn't see RCU at all in your original message, so I'm not sure how
> you propose to use it .. My understanding of the RCU was that it
> couldn't be used from interrupt context, that could be totally wrong so
> I'll let you explain how you planned to use it.
>

1 - I do not plan to use the rcupdate.h API, because it is oriented
towards allocating/freeing data structures after a quiescent state. I
don't need that. I only want to have a 64-bit data structure that is
valid for reading, with atomic update.

Therefore, I keep an array of two 64-bit structures. At all times, one
is used as the "readable" value and the other as the "writeable" one.
The roles are exchanged at each update. The word-sized counter is used
to select the current read and write pointers through a mask, and is
also used to detect bad reads (if a read is preempted, and then we have
2 updates, the reader could read a bad value without knowing it).

By keeping a word-sized counter of the number of updates, we have 32 (or
64) bits (depending on the architecture) before the wrap-around, which
should not happen even in the far future.

> > > > I still think that an RCU style update mechanism would be a good way to
> > > > fix the current clocksource read issue. Another, slower and non NMI
> > > > safe way to do this would be with a read seqlock and with IRQ disabling.
> > >
> > > , but the pit clocksource
> > > does disable interrupts with a spin_lock_irqsave().
> > >
> >
> > When I say "clocksource read issue", I am talking about
> > race between the function you proposed earlier, which you say is used in
> > -rt kernels for latency tracing (get_monotonic_cycles), and HPET and TSC
> > "last cycles" updates.
>
> Right .. You said that regular interrupts would cause this non-atomic
> 64-bit update race , but the pit disabled interrupts, and the
> last_cycles update is done with interrupts off .. So I think we're back
> to only the NMI case ..
>
> Did you have another scenario ?
>

__get_nsec_offset :
  reads clock->cycle_last. Should be called with xtime_lock held.
  (ok so far, but see below)

change_clocksource
  clock->cycle_last = now; (non-atomic 64-bit update. Not protected by
  any lock ?)
  -> this would race with __get_nsec_offset ?

update_wall_time
  Called from timer interrupt.
  Holds xtime_lock and has a priority higher than other interrupts.
  Other clock->cycle_last updates are protected by
  write_seqlock_irqsave.

get_monotonic_cycles (as you proposed, in -rt kernels) :
  reads clock->cycle_last. Not protected by any read seqlock and does
  not disable interrupts.
  Races with change_clocksource, update_wall_time and all other time
  update functions.

For instance, if someone uses get_monotonic_cycles in process context
and the timer interrupt fires update_wall_time right in the middle of
the two 32-bit reads, the value will be wrong.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
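
For illustration, the two-slot scheme described in point 1 above can be
sketched in user-space C11 roughly as follows. This is only a sketch
under assumptions, not the actual patch: the names (cycles64,
dbuf_counter, dbuf_init, dbuf_update, dbuf_read) are hypothetical, a
single updater is assumed (for example, only the timer interrupt
writes), and default sequentially consistent C11 atomics stand in for
the explicit memory barriers that kernel code would use.

```c
#include <stdatomic.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the kernel clocksource code. */

/* A 64-bit value kept as two 32-bit halves, modelling a machine that
 * cannot load or store 64 bits in a single atomic access. */
struct cycles64 {
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

struct dbuf_counter {
    atomic_ulong nr_updates;   /* word-sized update count; low bit selects the readable slot */
    struct cycles64 slot[2];   /* one readable slot, one writeable slot */
};

static void dbuf_init(struct dbuf_counter *c, uint64_t initial)
{
    atomic_init(&c->nr_updates, 0);
    for (int i = 0; i < 2; i++) {
        atomic_init(&c->slot[i].lo, (uint32_t)initial);
        atomic_init(&c->slot[i].hi, (uint32_t)(initial >> 32));
    }
}

/* Updater side. ASSUMPTION: only one context ever calls this. */
static void dbuf_update(struct dbuf_counter *c, uint64_t value)
{
    unsigned long n = atomic_load(&c->nr_updates);
    struct cycles64 *w = &c->slot[(n + 1) & 1];   /* the slot readers are not using */

    atomic_store(&w->lo, (uint32_t)value);
    atomic_store(&w->hi, (uint32_t)(value >> 32));

    /* Publishing the new count swaps the readable and writeable roles. */
    atomic_store(&c->nr_updates, n + 1);
}

/* Reader side: takes no lock and does not disable interrupts. */
static uint64_t dbuf_read(struct dbuf_counter *c)
{
    unsigned long before, after;
    uint32_t lo, hi;

    do {
        before = atomic_load(&c->nr_updates);
        struct cycles64 *r = &c->slot[before & 1];
        lo = atomic_load(&r->lo);
        hi = atomic_load(&r->hi);
        /* Re-read the count: if it moved, the slot may have been
         * overwritten under us, so redo the whole operation. */
        after = atomic_load(&c->nr_updates);
    } while (after != before);

    return ((uint64_t)hi << 32) | lo;
}
```

The slot a reader is looking at is only overwritten by the second update
after the reader sampled the count, and by then the count has already
changed, so the re-check catches the "preempted, then 2 updates" case
mentioned above. Since the reader never takes a lock, it should also be
usable from a context that interrupted the updater, which is the
property the seqlock alternative lacks.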