On Mon, Jan 30, 2017 at 02:54:51PM +0100, Stanislaw Gruszka wrote:
> On Sat, Jan 28, 2017 at 04:28:13PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 28, 2017 at 12:57:40PM +0100, Stanislaw Gruszka wrote:
> > > On 32 bit architectures a 64bit store/load is not atomic and, if not
> > > protected, 64bit variables can be mangled. I do not see any protection
> > > (lock) between the utime/stime store and load in the patch, and it
> > > seems that the {u,s}time store and load can be performed at the same
> > > time. Though the problem is very improbable, it can still happen, at
> > > least theoretically, when the lower and upper 32 bits are changed at
> > > the same time, i.e. the process {u,s}time gets near a multiple of
> > > 2**32 nsec (approx. 4 sec) and the 64bit {u,s}time is stored and
> > > loaded at the same time on different cpus. As said, this is a very
> > > improbable situation, but it could eventually happen on long lived
> > > processes.
> >
> > "Improbable situation" doesn't apply to Linux. With millions (billions?)
> > of machines using it, a rare issue in the core turns into likely to
> > happen somewhere on the planet every second.
> >
> > So it's definitely a race we want to consider. Note it goes beyond the
> > scope of this patchset, as the issue was already there before, since
> > cputime_t can already map to u64 on 32 bits systems upstream. But this
> > patchset definitely extends the issue to all 32 bits configs.
> >
> > kcpustat has the same issue upstream. It is made of u64 on all configs.
>
> I would like to add what the possible consequences are if a value gets
> mangled. For sum_exec_runtime, utime and stime we could get wrong values
> on cpu-clock related syscalls like clock_gettime() or clock_nanosleep(),
> and cpu-clock timers like timer_create(CLOCK_PROCESS_CPUTIME_ID) can be
> triggered before or long after the expected time. For kcpustat this
> means wrong values read by procfs and 3 drivers (cpufreq, appldata,
> macintosh).
Yep, all agreed.

> > > I'm considering fixing the problem of possible sum_exec_runtime
> > > mangling by using prev_sum_exec_runtime:
> > >
> > > u64 read_sum_exec_runtime(struct task_struct *t)
> > > {
> > > 	u64 ns, prev_ns;
> > >
> > > 	do {
> > > 		prev_ns = READ_ONCE(t->se.prev_sum_exec_runtime);
> > > 		ns = READ_ONCE(t->se.sum_exec_runtime);
> > > 	} while (ns < prev_ns || ns > (prev_ns + U32_MAX));
> > >
> > > 	return ns;
> > > }
> > >
> > > This should work based on the fact that prev_sum_exec_runtime and
> > > sum_exec_runtime are not modified and stored at the same time, so
> > > only one of those variables can be mangled. Though I need to think
> > > about the correctness of that a bit more.
> >
> > I'm not sure that would be enough. READ_ONCE prevents reordering by
> > the compiler but not by the CPU. You'd need memory barriers between
> > the reads and writes of prev_ns and ns.
>
> It will not be enough. This was _supposed_ to work based on the fact
> that sum_exec_runtime and prev_sum_exec_runtime are not written at the
> same time, i.e. only one variable can be mangled while the other one
> already sits in memory. However, "not written at the same time" is the
> weak part of the reasoning. Even though those variables are stored in
> different parts of the code (sum_exec_runtime in update_curr() and
> prev_sum_exec_runtime in set_next_entity()), we cannot assume the store
> of one variable is finished before the other one starts.

Right.

> > 	WRITE ns	READ prev_ns
> > 	smp_wmb()	smp_rmb()
> > 	WRITE prev_ns	READ ns
> > 	smp_wmb()	smp_rmb()
> >
> > It seems to be the only way to make sure that at least one of the
> > reads (prev_ns or ns) is correct.

Well, reading that again, I'm not 100% sure about the correctness
guarantee. But it might work.

> I think you are right, but it seems on most code paths we have this
> scenario:
>
> 	WRITE ns	READ prev_ns
> 	smp_wmb()	smp_rmb()
> 	WRITE prev_ns	READ ns
>
> and we already have an smp_wmb() after the write of sum_exec_runtime,
> in update_min_vruntime().

You still need a second barrier after the second write and read (or
before the first write and read, which is the same thing) to ensure
that if you read a mangled version of ns, prev_ns is ok.

Still, I think u64_stats_sync is less trouble and more reliable.

Thanks.
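
For reference, a minimal sketch of the u64_stats_sync pattern mentioned
above (the API lives in include/linux/u64_stats_sync.h). The
cputime_stats container and the function names here are hypothetical,
made up for illustration; they are not the actual scheduler structures:

	#include <linux/u64_stats_sync.h>

	/* Hypothetical container; init the seqcount with u64_stats_init(). */
	struct cputime_stats {
		u64 utime;
		u64 stime;
		struct u64_stats_sync sync;	/* seqcount on 32-bit, empty on 64-bit */
	};

	/* Writer side: writers must already be serialized against each other. */
	static void account_times(struct cputime_stats *s, u64 ut, u64 st)
	{
		u64_stats_update_begin(&s->sync);
		s->utime += ut;
		s->stime += st;
		u64_stats_update_end(&s->sync);
	}

	/* Reader side: lockless, retries if it raced with a writer. */
	static void read_times(struct cputime_stats *s, u64 *ut, u64 *st)
	{
		unsigned int seq;

		do {
			seq = u64_stats_fetch_begin(&s->sync);
			*ut = s->utime;
			*st = s->stime;
		} while (u64_stats_fetch_retry(&s->sync, seq));
	}

On 64-bit kernels the seqcount compiles away and these become plain
loads and stores; on 32-bit the reader loops until it sees a consistent
snapshot, so no torn 64-bit value can escape.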
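And for comparison, a sketch of the double barrier pairing discussed
above, with the second smp_wmb()/smp_rmb() pair included. This only
illustrates the proposed ordering; it is not the actual
update_curr()/set_next_entity() code, and the thread itself leaves the
correctness guarantee of this scheme in doubt:

	#include <linux/compiler.h>	/* READ_ONCE()/WRITE_ONCE() */
	#include <linux/limits.h>	/* U32_MAX */
	#include <asm/barrier.h>	/* smp_wmb()/smp_rmb() */

	/* Frequent update path (cf. update_curr()): advance ns. */
	static void update_ns(u64 *ns_p, u64 delta)
	{
		WRITE_ONCE(*ns_p, READ_ONCE(*ns_p) + delta);
		smp_wmb();	/* order the ns store before a later prev_ns store */
	}

	/* Snapshot path (cf. set_next_entity()): prev_ns = ns. */
	static void snapshot_prev_ns(u64 *ns_p, u64 *prev_ns_p)
	{
		WRITE_ONCE(*prev_ns_p, READ_ONCE(*ns_p));
		smp_wmb();	/* the "second barrier"; pairs with the reader's
				 * second smp_rmb() */
	}

	/*
	 * Reader: load prev_ns, then ns, in the mirror order. The retry
	 * condition is the one from the proposal above: it assumes ns never
	 * advances by more than U32_MAX between prev_ns snapshots, so at
	 * most one of the two loads can be mangled.
	 */
	static u64 read_ns_pair(u64 *ns_p, u64 *prev_ns_p)
	{
		u64 ns, prev_ns;

		do {
			prev_ns = READ_ONCE(*prev_ns_p);
			smp_rmb();	/* order prev_ns load before ns load */
			ns = READ_ONCE(*ns_p);
			smp_rmb();	/* pairs with the writer's second smp_wmb() */
		} while (ns < prev_ns || ns > prev_ns + U32_MAX);

		return ns;
	}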