On Tue, Jul 04, 2017 at 02:40:03PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 04, 2017 at 02:21:50PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 04, 2017 at 12:13:09PM +0200, Ingo Molnar wrote:
> > > 
> > > This code on the other hand:
> > > 
> > >   sa->last_update_time += delta << 10;
> > > 
> > > ... in essence creates a whole new absolute clock value that slowly but 
> > > surely is 
> > > drifting away from the real rq->clock, because 'delta' is always rounded 
> > > down to 
> > > the nearest 1024 ns boundary, so we accumulate the 'remainder' losses.
> > > 
> > > That is because:
> > > 
> > >         delta >>= 10;
> > >   ...
> > >         sa->last_update_time += delta << 10;
> > > 
> > > Given enough time, ->last_update_time can drift a long way, and this 
> > > delta:
> > > 
> > >   delta = now - sa->last_update_time;
> > > 
> > > ... becomes meaningless AFAICS, because it's essentially two different 
> > > clocks that 
> > > get compared.
> > 
> > Thing is, once you drift over 1023 (ns) your delta increases and you
> > catch up again.
> > 
> > 
> > 
> >  A  B     C       D          E  F
> >  |  |     |       |          |  |
> >  +----+----+----+----+----+----+----+----+----+----+----+
> > 
> > 
> > A: now = 0
> >    sa->last_update_time = 0
> >    delta := (now - sa->last_update_time) >> 10 = 0
> > 
> > B: now = 614                                (+614)
> >    delta = (614 - 0) >> 10 = 0
> >    sa->last_update_time += 0                (0)
> >    sa->last_update_time = now & ~1023       (0)
> > 
> > C: now = 1843                               (+1229)
> >    delta = (1843 - 0) >> 10 = 1
> >    sa->last_update_time += 1024             (1024)
> >    sa->last_update_time = now & ~1023       (1024)
> > 
> > 
> > D: now = 3481                               (+1638)
> >    delta = (3481 - 1024) >> 10 = 2
> >    sa->last_update_time += 2048             (3072)
> >    sa->last_update_time = now & ~1023       (3072)
> > 
> > E: now = 5734                               (+2253)
> >    delta = (5734 - 3072) = 2
> >    sa->last_update_time += 2048             (5120)
> >    sa->last_update_time = now & ~1023       (5120)
> > 
> > F: now = 6348                               (+614)
> >    delta = (6348 - 5120) >> 10 = 1
> >    sa->last_update_time += 1024             (6144)
> >    sa->last_update_time = now & ~1023       (6144)
> > 
> > 
> > 
> > And you'll see that both are identical, and that both D and F have
> > gotten a spill from sub-chunk accounting.
> 
> 
> Where the two approaches differ is when we have different modifications
> to sa->last_update_time (and we do).
> 
> The differential (+=) one does not mandate initial value of
> ->last_update_time has the bottom 9 bits cleared. It will simply
> continue from wherever.
> 
> The absolute (&) one however mandates that ->last_update_time always has
> the bottom few bits 0, otherwise we can 'gain' time. The first iteration
> will clear those bits and we'll then double account them.
> 
> It so happens that we have an explicit assign in migrate
> (attach_entity_load_avg / set_task_rq_fair). And on negative delta. In
> all those cases we use the immediate 'now' value, no clearing of bottom
> bits.
> 
> The differential should work fine with that, the absolute one has double
> accounting issues in that case.
> 
> So it would be very good to find what exactly causes Josef's workload to
> get 'fixed'.

Sorry let me experiment some more, like I said this is one of 4 patches I need
to actually fix my workload, and I've tested so many iterations of this problem
that I may be _thinking_ it affects things but it really doesn't.  I'll re-test
with the normal code and the other 3 patches in place and see if things are ok.
Thanks,

Josef

Reply via email to