Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-19 Thread Yuyang Du
On Wed, May 18, 2016 at 09:41:20AM +0100, Matt Fleming wrote:
> On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote:
> > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> > > So, if the code looks like the following, either now or in the future,
> > > 
> > > static void __schedule(bool preempt)
> > > {
> > >   ...
> > >   /* Clear RQCF_ACT_SKIP */
> > >   rq->clock_update_flags = 0;
> > >   ...
> > >   delta = rq_clock();
> > > }
> >  
> > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
> > you clear everything.
> 
> That was sloppy on my part but intentional because that's what the
> code looks like in tip/sched/core today.
 
Sure, rq->clock_update_flags = 0 is itself all right, just say what you do.

> It was purely meant to demonstrate that setting RQCF_UPDATE just
> because RQCF_ACT_SKIP is set does not make sense. You can replace the
> clearing line with the correct bit masking operation.
 
I don't understand how you demonstrated that does not make sense. Anways,
you sort it out.

> But I get it, the pseudo-code was confusing. I'll send out a v2.
>
> > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but
> > actually it does not matter), of course you will get a warning...
> 
> Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch.
> 
> > In addition, it looks like multiple skips are possible, so:
>  
> I'm not sure what you mean, could you elaborate?
> 
> > update_rq_clock() {
> > rq->clock_update_flags |= RQCF_UPDATE;
> > 
> > ...
> > }
> > 
> > instead of clearing the skip flag there.
> 
> Huh? RQCF_*_SKIP are not cleared in update_rq_clock().

Yeah, I previously cleared the SKIP bit, which is not right, so I corrected.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-19 Thread Yuyang Du
On Wed, May 18, 2016 at 09:41:20AM +0100, Matt Fleming wrote:
> On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote:
> > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> > > So, if the code looks like the following, either now or in the future,
> > > 
> > > static void __schedule(bool preempt)
> > > {
> > >   ...
> > >   /* Clear RQCF_ACT_SKIP */
> > >   rq->clock_update_flags = 0;
> > >   ...
> > >   delta = rq_clock();
> > > }
> >  
> > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
> > you clear everything.
> 
> That was sloppy on my part but intentional because that's what the
> code looks like in tip/sched/core today.
 
Sure, rq->clock_update_flags = 0 is itself all right, just say what you do.

> It was purely meant to demonstrate that setting RQCF_UPDATE just
> because RQCF_ACT_SKIP is set does not make sense. You can replace the
> clearing line with the correct bit masking operation.
 
I don't understand how you demonstrated that does not make sense. Anways,
you sort it out.

> But I get it, the pseudo-code was confusing. I'll send out a v2.
>
> > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but
> > actually it does not matter), of course you will get a warning...
> 
> Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch.
> 
> > In addition, it looks like multiple skips are possible, so:
>  
> I'm not sure what you mean, could you elaborate?
> 
> > update_rq_clock() {
> > rq->clock_update_flags |= RQCF_UPDATE;
> > 
> > ...
> > }
> > 
> > instead of clearing the skip flag there.
> 
> Huh? RQCF_*_SKIP are not cleared in update_rq_clock().

Yeah, I previously cleared the SKIP bit, which is not right, so I corrected.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-18 Thread Matt Fleming
On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote:
> On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> > So, if the code looks like the following, either now or in the future,
> > 
> > static void __schedule(bool preempt)
> > {
> > ...
> > /* Clear RQCF_ACT_SKIP */
> > rq->clock_update_flags = 0;
> > ...
> > delta = rq_clock();
> > }
>  
> Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
> you clear everything.

That was sloppy on my part but intentional because that's what the
code looks like in tip/sched/core today.

It was purely meant to demonstrate that setting RQCF_UPDATE just
because RQCF_ACT_SKIP is set does not make sense. You can replace the
clearing line with the correct bit masking operation.

But I get it, the pseudo-code was confusing. I'll send out a v2.

> And if you clear the RQCF_UPDATE also (maybe you shouldn't, but
> actually it does not matter), of course you will get a warning...

Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch.

> In addition, it looks like multiple skips are possible, so:
 
I'm not sure what you mean, could you elaborate?

> update_rq_clock() {
> rq->clock_update_flags |= RQCF_UPDATE;
> 
> ...
> }
> 
> instead of clearing the skip flag there.

Huh? RQCF_*_SKIP are not cleared in update_rq_clock().


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-18 Thread Matt Fleming
On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote:
> On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> > So, if the code looks like the following, either now or in the future,
> > 
> > static void __schedule(bool preempt)
> > {
> > ...
> > /* Clear RQCF_ACT_SKIP */
> > rq->clock_update_flags = 0;
> > ...
> > delta = rq_clock();
> > }
>  
> Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
> you clear everything.

That was sloppy on my part but intentional because that's what the
code looks like in tip/sched/core today.

It was purely meant to demonstrate that setting RQCF_UPDATE just
because RQCF_ACT_SKIP is set does not make sense. You can replace the
clearing line with the correct bit masking operation.

But I get it, the pseudo-code was confusing. I'll send out a v2.

> And if you clear the RQCF_UPDATE also (maybe you shouldn't, but
> actually it does not matter), of course you will get a warning...

Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch.

> In addition, it looks like multiple skips are possible, so:
 
I'm not sure what you mean, could you elaborate?

> update_rq_clock() {
> rq->clock_update_flags |= RQCF_UPDATE;
> 
> ...
> }
> 
> instead of clearing the skip flag there.

Huh? RQCF_*_SKIP are not cleared in update_rq_clock().


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-17 Thread Yuyang Du
On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> So, if the code looks like the following, either now or in the future,
> 
> static void __schedule(bool preempt)
> {
>   ...
>   /* Clear RQCF_ACT_SKIP */
>   rq->clock_update_flags = 0;
>   ...
>   delta = rq_clock();
> }
 
Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
you clear everything. And if you clear the RQCF_UPDATE also (maybe you
shouldn't, but actually it does not matter), of course you will get
a warning...

In addition, it looks like multiple skips are possible, so:

update_rq_clock() {
rq->clock_update_flags |= RQCF_UPDATE;

...
}

instead of clearing the skip flag there.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-17 Thread Yuyang Du
On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> So, if the code looks like the following, either now or in the future,
> 
> static void __schedule(bool preempt)
> {
>   ...
>   /* Clear RQCF_ACT_SKIP */
>   rq->clock_update_flags = 0;
>   ...
>   delta = rq_clock();
> }
 
Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
you clear everything. And if you clear the RQCF_UPDATE also (maybe you
shouldn't, but actually it does not matter), of course you will get
a warning...

In addition, it looks like multiple skips are possible, so:

update_rq_clock() {
rq->clock_update_flags |= RQCF_UPDATE;

...
}

instead of clearing the skip flag there.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-17 Thread Matt Fleming
On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote:
> On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote:
> > 
> > No because if someone calls rq_clock() immediately after __schedule(),
> > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
> > should trigger a warning since the clock has not actually been
> > updated.
> 
> Well, I don't know how concurrent it can be, but aren't both update
> and read synchronized by rq->lock? So I don't understand the latter
> case, and the former should be addressed (missing its own update?).

I'm not talking about concurrency; when I said "someone" above, I was
referring to code.

So, if the code looks like the following, either now or in the future,

static void __schedule(bool preempt)
{
...
/* Clear RQCF_ACT_SKIP */
rq->clock_update_flags = 0;
...
delta = rq_clock();
}

I would expect to see a warning triggered, because we've read the rq
clock outside of the code area where we know it's safe to do so
without a clock update.

The solution for that bug may be as simple as rearranging the code,

delta = rq_clock();
...
rq->clock_update_flags = 0;

but we definitely want to catch such bugs in the first instance.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-17 Thread Matt Fleming
On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote:
> On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote:
> > 
> > No because if someone calls rq_clock() immediately after __schedule(),
> > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
> > should trigger a warning since the clock has not actually been
> > updated.
> 
> Well, I don't know how concurrent it can be, but aren't both update
> and read synchronized by rq->lock? So I don't understand the latter
> case, and the former should be addressed (missing its own update?).

I'm not talking about concurrency; when I said "someone" above, I was
referring to code.

So, if the code looks like the following, either now or in the future,

static void __schedule(bool preempt)
{
...
/* Clear RQCF_ACT_SKIP */
rq->clock_update_flags = 0;
...
delta = rq_clock();
}

I would expect to see a warning triggered, because we've read the rq
clock outside of the code area where we know it's safe to do so
without a clock update.

The solution for that bug may be as simple as rearranging the code,

delta = rq_clock();
...
rq->clock_update_flags = 0;

but we definitely want to catch such bugs in the first instance.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-16 Thread Yuyang Du
On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote:
> > >  
> > > - rq->clock_skip_update = 0;
> > > + /* Clear ACT, preserve everything else */
> > > + rq->clock_update_flags ^= RQCF_ACT_SKIP;
> > 
> > The comment says "Clear ACT", but this is really xor, and I am not sure
> > this is even what you want.
>  
> Urgh, you're right. I'm not sure what I was thinking when I wrote
> that.

It happens, ;)
 
> > In addition, would it be simpler to do this?
> > 
> > update_rq_clock()
> > if (flags & RQCF_ACT_SKIP)
> > flags <<= 1; /* effective skip is an update */
> > return;
> > 
> > flags = RQCF_UPDATED;
> 
> No because if someone calls rq_clock() immediately after __schedule(),
> or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
> should trigger a warning since the clock has not actually been
> updated.

Well, I don't know how concurrent it can be, but aren't both update
and read synchronized by rq->lock? So I don't understand the latter
case, and the former should be addressed (missing its own update?).


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-16 Thread Yuyang Du
On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote:
> > >  
> > > - rq->clock_skip_update = 0;
> > > + /* Clear ACT, preserve everything else */
> > > + rq->clock_update_flags ^= RQCF_ACT_SKIP;
> > 
> > The comment says "Clear ACT", but this is really xor, and I am not sure
> > this is even what you want.
>  
> Urgh, you're right. I'm not sure what I was thinking when I wrote
> that.

It happens, ;)
 
> > In addition, would it be simpler to do this?
> > 
> > update_rq_clock()
> > if (flags & RQCF_ACT_SKIP)
> > flags <<= 1; /* effective skip is an update */
> > return;
> > 
> > flags = RQCF_UPDATED;
> 
> No because if someone calls rq_clock() immediately after __schedule(),
> or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
> should trigger a warning since the clock has not actually been
> updated.

Well, I don't know how concurrent it can be, but aren't both update
and read synchronized by rq->lock? So I don't understand the latter
case, and the former should be addressed (missing its own update?).


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-16 Thread Matt Fleming
On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote:
> Hi Matt,
> 
> Thanks for Ccing me.
> 
> I am indeed interested, because I recently encountered an rq clock
> issue, which is that the clock jumps about 200ms when I was
> experimenting the "flat util hierarchy" patches, which really annoyed
> me, and I had to stop to figure out what is wrong (but haven't yet
> figured out ;))
> 
> First, this patchset does not solve my problem, but never mind, by
> reviewing your patches, I have some comments:
 
Thanks for the review. One gap that this patch series doesn't address
is that some callers of update_rq_clock() do not pin rq->lock, which
makes the diagnostic checks useless in that case.

I plan on handling that next, but I wanted to get this series out as
soon as possible for review.

> On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
> >  
> > -   rq->clock_skip_update = 0;
> > +   /* Clear ACT, preserve everything else */
> > +   rq->clock_update_flags ^= RQCF_ACT_SKIP;
> 
> The comment says "Clear ACT", but this is really xor, and I am not sure
> this is even what you want.
 
Urgh, you're right. I'm not sure what I was thinking when I wrote
that.

> In addition, would it be simpler to do this?
> 
> update_rq_clock()
>   if (flags & RQCF_ACT_SKIP)
>   flags <<= 1; /* effective skip is an update */
>   return;
> 
>   flags = RQCF_UPDATED;

No because if someone calls rq_clock() immediately after __schedule(),
or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
should trigger a warning since the clock has not actually been
updated.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-16 Thread Matt Fleming
On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote:
> Hi Matt,
> 
> Thanks for Ccing me.
> 
> I am indeed interested, because I recently encountered an rq clock
> issue, which is that the clock jumps about 200ms when I was
> experimenting the "flat util hierarchy" patches, which really annoyed
> me, and I had to stop to figure out what is wrong (but haven't yet
> figured out ;))
> 
> First, this patchset does not solve my problem, but never mind, by
> reviewing your patches, I have some comments:
 
Thanks for the review. One gap that this patch series doesn't address
is that some callers of update_rq_clock() do not pin rq->lock, which
makes the diagnostic checks useless in that case.

I plan on handling that next, but I wanted to get this series out as
soon as possible for review.

> On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
> >  
> > -   rq->clock_skip_update = 0;
> > +   /* Clear ACT, preserve everything else */
> > +   rq->clock_update_flags ^= RQCF_ACT_SKIP;
> 
> The comment says "Clear ACT", but this is really xor, and I am not sure
> this is even what you want.
 
Urgh, you're right. I'm not sure what I was thinking when I wrote
that.

> In addition, would it be simpler to do this?
> 
> update_rq_clock()
>   if (flags & RQCF_ACT_SKIP)
>   flags <<= 1; /* effective skip is an update */
>   return;
> 
>   flags = RQCF_UPDATED;

No because if someone calls rq_clock() immediately after __schedule(),
or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
should trigger a warning since the clock has not actually been
updated.


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-15 Thread Yuyang Du
Hi Matt,

Thanks for Ccing me.

I am indeed interested, because I recently encountered an rq clock
issue, which is that the clock jumps about 200ms when I was
experimenting the "flat util hierarchy" patches, which really annoyed
me, and I had to stop to figure out what is wrong (but haven't yet
figured out ;))

First, this patchset does not solve my problem, but never mind, by
reviewing your patches, I have some comments:

On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
>  
> - rq->clock_skip_update = 0;
> + /* Clear ACT, preserve everything else */
> + rq->clock_update_flags ^= RQCF_ACT_SKIP;

The comment says "Clear ACT", but this is really xor, and I am not sure
this is even what you want.

In addition, would it be simpler to do this?

update_rq_clock()
if (flags & RQCF_ACT_SKIP)
flags <<= 1; /* effective skip is an update */
return;

flags = RQCF_UPDATED;

Thanks,
Yuyang


Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-15 Thread Yuyang Du
Hi Matt,

Thanks for Ccing me.

I am indeed interested, because I recently encountered an rq clock
issue, which is that the clock jumps about 200ms when I was
experimenting the "flat util hierarchy" patches, which really annoyed
me, and I had to stop to figure out what is wrong (but haven't yet
figured out ;))

First, this patchset does not solve my problem, but never mind, by
reviewing your patches, I have some comments:

On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
>  
> - rq->clock_skip_update = 0;
> + /* Clear ACT, preserve everything else */
> + rq->clock_update_flags ^= RQCF_ACT_SKIP;

The comment says "Clear ACT", but this is really xor, and I am not sure
this is even what you want.

In addition, would it be simpler to do this?

update_rq_clock()
if (flags & RQCF_ACT_SKIP)
flags <<= 1; /* effective skip is an update */
return;

flags = RQCF_UPDATED;

Thanks,
Yuyang