Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Wed, May 18, 2016 at 09:41:20AM +0100, Matt Fleming wrote: > On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote: > > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > > > So, if the code looks like the following, either now or in the future, > > > > > > static void __schedule(bool preempt) > > > { > > > ... > > > /* Clear RQCF_ACT_SKIP */ > > > rq->clock_update_flags = 0; > > > ... > > > delta = rq_clock(); > > > } > > > > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, > > you clear everything. > > That was sloppy on my part but intentional because that's what the > code looks like in tip/sched/core today. Sure, rq->clock_update_flags = 0 is itself all right, just say what you do. > It was purely meant to demonstrate that setting RQCF_UPDATE just > because RQCF_ACT_SKIP is set does not make sense. You can replace the > clearing line with the correct bit masking operation. I don't understand how you demonstrated that does not make sense. Anways, you sort it out. > But I get it, the pseudo-code was confusing. I'll send out a v2. > > > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but > > actually it does not matter), of course you will get a warning... > > Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch. > > > In addition, it looks like multiple skips are possible, so: > > I'm not sure what you mean, could you elaborate? > > > update_rq_clock() { > > rq->clock_update_flags |= RQCF_UPDATE; > > > > ... > > } > > > > instead of clearing the skip flag there. > > Huh? RQCF_*_SKIP are not cleared in update_rq_clock(). Yeah, I previously cleared the SKIP bit, which is not right, so I corrected.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Wed, May 18, 2016 at 09:41:20AM +0100, Matt Fleming wrote: > On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote: > > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > > > So, if the code looks like the following, either now or in the future, > > > > > > static void __schedule(bool preempt) > > > { > > > ... > > > /* Clear RQCF_ACT_SKIP */ > > > rq->clock_update_flags = 0; > > > ... > > > delta = rq_clock(); > > > } > > > > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, > > you clear everything. > > That was sloppy on my part but intentional because that's what the > code looks like in tip/sched/core today. Sure, rq->clock_update_flags = 0 is itself all right, just say what you do. > It was purely meant to demonstrate that setting RQCF_UPDATE just > because RQCF_ACT_SKIP is set does not make sense. You can replace the > clearing line with the correct bit masking operation. I don't understand how you demonstrated that does not make sense. Anways, you sort it out. > But I get it, the pseudo-code was confusing. I'll send out a v2. > > > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but > > actually it does not matter), of course you will get a warning... > > Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch. > > > In addition, it looks like multiple skips are possible, so: > > I'm not sure what you mean, could you elaborate? > > > update_rq_clock() { > > rq->clock_update_flags |= RQCF_UPDATE; > > > > ... > > } > > > > instead of clearing the skip flag there. > > Huh? RQCF_*_SKIP are not cleared in update_rq_clock(). Yeah, I previously cleared the SKIP bit, which is not right, so I corrected.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote: > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > > So, if the code looks like the following, either now or in the future, > > > > static void __schedule(bool preempt) > > { > > ... > > /* Clear RQCF_ACT_SKIP */ > > rq->clock_update_flags = 0; > > ... > > delta = rq_clock(); > > } > > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, > you clear everything. That was sloppy on my part but intentional because that's what the code looks like in tip/sched/core today. It was purely meant to demonstrate that setting RQCF_UPDATE just because RQCF_ACT_SKIP is set does not make sense. You can replace the clearing line with the correct bit masking operation. But I get it, the pseudo-code was confusing. I'll send out a v2. > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but > actually it does not matter), of course you will get a warning... Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch. > In addition, it looks like multiple skips are possible, so: I'm not sure what you mean, could you elaborate? > update_rq_clock() { > rq->clock_update_flags |= RQCF_UPDATE; > > ... > } > > instead of clearing the skip flag there. Huh? RQCF_*_SKIP are not cleared in update_rq_clock().
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote: > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > > So, if the code looks like the following, either now or in the future, > > > > static void __schedule(bool preempt) > > { > > ... > > /* Clear RQCF_ACT_SKIP */ > > rq->clock_update_flags = 0; > > ... > > delta = rq_clock(); > > } > > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, > you clear everything. That was sloppy on my part but intentional because that's what the code looks like in tip/sched/core today. It was purely meant to demonstrate that setting RQCF_UPDATE just because RQCF_ACT_SKIP is set does not make sense. You can replace the clearing line with the correct bit masking operation. But I get it, the pseudo-code was confusing. I'll send out a v2. > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but > actually it does not matter), of course you will get a warning... Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch. > In addition, it looks like multiple skips are possible, so: I'm not sure what you mean, could you elaborate? > update_rq_clock() { > rq->clock_update_flags |= RQCF_UPDATE; > > ... > } > > instead of clearing the skip flag there. Huh? RQCF_*_SKIP are not cleared in update_rq_clock().
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > So, if the code looks like the following, either now or in the future, > > static void __schedule(bool preempt) > { > ... > /* Clear RQCF_ACT_SKIP */ > rq->clock_update_flags = 0; > ... > delta = rq_clock(); > } Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, you clear everything. And if you clear the RQCF_UPDATE also (maybe you shouldn't, but actually it does not matter), of course you will get a warning... In addition, it looks like multiple skips are possible, so: update_rq_clock() { rq->clock_update_flags |= RQCF_UPDATE; ... } instead of clearing the skip flag there.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote: > So, if the code looks like the following, either now or in the future, > > static void __schedule(bool preempt) > { > ... > /* Clear RQCF_ACT_SKIP */ > rq->clock_update_flags = 0; > ... > delta = rq_clock(); > } Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it, you clear everything. And if you clear the RQCF_UPDATE also (maybe you shouldn't, but actually it does not matter), of course you will get a warning... In addition, it looks like multiple skips are possible, so: update_rq_clock() { rq->clock_update_flags |= RQCF_UPDATE; ... } instead of clearing the skip flag there.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote: > On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote: > > > > No because if someone calls rq_clock() immediately after __schedule(), > > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we > > should trigger a warning since the clock has not actually been > > updated. > > Well, I don't know how concurrent it can be, but aren't both update > and read synchronized by rq->lock? So I don't understand the latter > case, and the former should be addressed (missing its own update?). I'm not talking about concurrency; when I said "someone" above, I was referring to code. So, if the code looks like the following, either now or in the future, static void __schedule(bool preempt) { ... /* Clear RQCF_ACT_SKIP */ rq->clock_update_flags = 0; ... delta = rq_clock(); } I would expect to see a warning triggered, because we've read the rq clock outside of the code area where we know it's safe to do so without a clock update. The solution for that bug may be as simple as rearranging the code, delta = rq_clock(); ... rq->clock_update_flags = 0; but we definitely want to catch such bugs in the first instance.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote: > On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote: > > > > No because if someone calls rq_clock() immediately after __schedule(), > > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we > > should trigger a warning since the clock has not actually been > > updated. > > Well, I don't know how concurrent it can be, but aren't both update > and read synchronized by rq->lock? So I don't understand the latter > case, and the former should be addressed (missing its own update?). I'm not talking about concurrency; when I said "someone" above, I was referring to code. So, if the code looks like the following, either now or in the future, static void __schedule(bool preempt) { ... /* Clear RQCF_ACT_SKIP */ rq->clock_update_flags = 0; ... delta = rq_clock(); } I would expect to see a warning triggered, because we've read the rq clock outside of the code area where we know it's safe to do so without a clock update. The solution for that bug may be as simple as rearranging the code, delta = rq_clock(); ... rq->clock_update_flags = 0; but we definitely want to catch such bugs in the first instance.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote: > > > > > > - rq->clock_skip_update = 0; > > > + /* Clear ACT, preserve everything else */ > > > + rq->clock_update_flags ^= RQCF_ACT_SKIP; > > > > The comment says "Clear ACT", but this is really xor, and I am not sure > > this is even what you want. > > Urgh, you're right. I'm not sure what I was thinking when I wrote > that. It happens, ;) > > In addition, would it be simpler to do this? > > > > update_rq_clock() > > if (flags & RQCF_ACT_SKIP) > > flags <<= 1; /* effective skip is an update */ > > return; > > > > flags = RQCF_UPDATED; > > No because if someone calls rq_clock() immediately after __schedule(), > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we > should trigger a warning since the clock has not actually been > updated. Well, I don't know how concurrent it can be, but aren't both update and read synchronized by rq->lock? So I don't understand the latter case, and the former should be addressed (missing its own update?).
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote: > > > > > > - rq->clock_skip_update = 0; > > > + /* Clear ACT, preserve everything else */ > > > + rq->clock_update_flags ^= RQCF_ACT_SKIP; > > > > The comment says "Clear ACT", but this is really xor, and I am not sure > > this is even what you want. > > Urgh, you're right. I'm not sure what I was thinking when I wrote > that. It happens, ;) > > In addition, would it be simpler to do this? > > > > update_rq_clock() > > if (flags & RQCF_ACT_SKIP) > > flags <<= 1; /* effective skip is an update */ > > return; > > > > flags = RQCF_UPDATED; > > No because if someone calls rq_clock() immediately after __schedule(), > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we > should trigger a warning since the clock has not actually been > updated. Well, I don't know how concurrent it can be, but aren't both update and read synchronized by rq->lock? So I don't understand the latter case, and the former should be addressed (missing its own update?).
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote: > Hi Matt, > > Thanks for Ccing me. > > I am indeed interested, because I recently encountered an rq clock > issue, which is that the clock jumps about 200ms when I was > experimenting the "flat util hierarchy" patches, which really annoyed > me, and I had to stop to figure out what is wrong (but haven't yet > figured out ;)) > > First, this patchset does not solve my problem, but never mind, by > reviewing your patches, I have some comments: Thanks for the review. One gap that this patch series doesn't address is that some callers of update_rq_clock() do not pin rq->lock, which makes the diagnostic checks useless in that case. I plan on handling that next, but I wanted to get this series out as soon as possible for review. > On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote: > > > > - rq->clock_skip_update = 0; > > + /* Clear ACT, preserve everything else */ > > + rq->clock_update_flags ^= RQCF_ACT_SKIP; > > The comment says "Clear ACT", but this is really xor, and I am not sure > this is even what you want. Urgh, you're right. I'm not sure what I was thinking when I wrote that. > In addition, would it be simpler to do this? > > update_rq_clock() > if (flags & RQCF_ACT_SKIP) > flags <<= 1; /* effective skip is an update */ > return; > > flags = RQCF_UPDATED; No because if someone calls rq_clock() immediately after __schedule(), or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we should trigger a warning since the clock has not actually been updated.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote: > Hi Matt, > > Thanks for Ccing me. > > I am indeed interested, because I recently encountered an rq clock > issue, which is that the clock jumps about 200ms when I was > experimenting the "flat util hierarchy" patches, which really annoyed > me, and I had to stop to figure out what is wrong (but haven't yet > figured out ;)) > > First, this patchset does not solve my problem, but never mind, by > reviewing your patches, I have some comments: Thanks for the review. One gap that this patch series doesn't address is that some callers of update_rq_clock() do not pin rq->lock, which makes the diagnostic checks useless in that case. I plan on handling that next, but I wanted to get this series out as soon as possible for review. > On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote: > > > > - rq->clock_skip_update = 0; > > + /* Clear ACT, preserve everything else */ > > + rq->clock_update_flags ^= RQCF_ACT_SKIP; > > The comment says "Clear ACT", but this is really xor, and I am not sure > this is even what you want. Urgh, you're right. I'm not sure what I was thinking when I wrote that. > In addition, would it be simpler to do this? > > update_rq_clock() > if (flags & RQCF_ACT_SKIP) > flags <<= 1; /* effective skip is an update */ > return; > > flags = RQCF_UPDATED; No because if someone calls rq_clock() immediately after __schedule(), or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we should trigger a warning since the clock has not actually been updated.
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
Hi Matt, Thanks for Ccing me. I am indeed interested, because I recently encountered an rq clock issue, which is that the clock jumps about 200ms when I was experimenting the "flat util hierarchy" patches, which really annoyed me, and I had to stop to figure out what is wrong (but haven't yet figured out ;)) First, this patchset does not solve my problem, but never mind, by reviewing your patches, I have some comments: On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote: > > - rq->clock_skip_update = 0; > + /* Clear ACT, preserve everything else */ > + rq->clock_update_flags ^= RQCF_ACT_SKIP; The comment says "Clear ACT", but this is really xor, and I am not sure this is even what you want. In addition, would it be simpler to do this? update_rq_clock() if (flags & RQCF_ACT_SKIP) flags <<= 1; /* effective skip is an update */ return; flags = RQCF_UPDATED; Thanks, Yuyang
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
Hi Matt, Thanks for Ccing me. I am indeed interested, because I recently encountered an rq clock issue, which is that the clock jumps about 200ms when I was experimenting the "flat util hierarchy" patches, which really annoyed me, and I had to stop to figure out what is wrong (but haven't yet figured out ;)) First, this patchset does not solve my problem, but never mind, by reviewing your patches, I have some comments: On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote: > > - rq->clock_skip_update = 0; > + /* Clear ACT, preserve everything else */ > + rq->clock_update_flags ^= RQCF_ACT_SKIP; The comment says "Clear ACT", but this is really xor, and I am not sure this is even what you want. In addition, would it be simpler to do this? update_rq_clock() if (flags & RQCF_ACT_SKIP) flags <<= 1; /* effective skip is an update */ return; flags = RQCF_UPDATED; Thanks, Yuyang