[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-08 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9523 )

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..

KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

This changes the timing for the collection of stack samples into the
metrics log to avoid two issues:

1) Inflexible configuration: in many cases we might want to sample
stacks more or less frequently than metrics. The previous implementation
used the same schedule for both.

2) Sampling bias: if we collect stack samples on a fixed schedule, we
risk unintended correlation with background tasks that might run on a
similar fixed schedule. For example, if we collect stacks exactly once a
minute, and the user has some monitoring software which polls the HTTP
server once a minute for some data, we might either line up perfectly
with the polling (in which case we'd overestimate its impact) or line of
perfectly away from the polling (in which case we'd never see its
effects).

This patch changes the wakeups for stacks and metrics to be decoupled.
In addition, it adds randomness to the stack sampling. The configuration
now represents the mean sampling rate rather than an exact sampling
rate.

I manually ran a master with -diagnostics-log-stack-traces-interval-ms=2000
and verified that the stack sample times were between 0 and 4 seconds apart
while the metric dumps were still exactly one minute apart. I also plan
to modify some local test clusters to configure this to be relatively
frequent to try to suss out any races or bugs which might occur in a
real workload.

Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Reviewed-on: http://gerrit.cloudera.org:8080/9523
Tested-by: Todd Lipcon 
Reviewed-by: Adar Dembo 
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
2 files changed, 77 insertions(+), 49 deletions(-)

Approvals:
  Todd Lipcon: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9523 )

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Mar 2018 18:49:42 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-08 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-08 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9523 )

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Mar 2018 18:10:14 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-07 Thread Todd Lipcon (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9523

to look at the new patch set (#3).

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..

KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

This changes the timing for the collection of stack samples into the
metrics log to avoid two issues:

1) Inflexible configuration: in many cases we might want to sample
stacks more or less frequently than metrics. The previous implementation
used the same schedule for both.

2) Sampling bias: if we collect stack samples on a fixed schedule, we
risk unintended correlation with background tasks that might run on a
similar fixed schedule. For example, if we collect stacks exactly once a
minute, and the user has some monitoring software which polls the HTTP
server once a minute for some data, we might either line up perfectly
with the polling (in which case we'd overestimate its impact) or line of
perfectly away from the polling (in which case we'd never see its
effects).

This patch changes the wakeups for stacks and metrics to be decoupled.
In addition, it adds randomness to the stack sampling. The configuration
now represents the mean sampling rate rather than an exact sampling
rate.

I manually ran a master with -diagnostics-log-stack-traces-interval-ms=2000
and verified that the stack sample times were between 0 and 4 seconds apart
while the metric dumps were still exactly one minute apart. I also plan
to modify some local test clusters to configure this to be relatively
frequent to try to suss out any races or bugs which might occur in a
real workload.

Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
2 files changed, 77 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/9523/3
--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9523 )

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.h
File src/kudu/server/diagnostics_log.h:

http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.h@59
PS2, Line 59:   enum WakeupType {
> enum class is the new hotness.
Done


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@75
PS2, Line 75: If this is set to "
:  "a non-positive number, stack traces will be not be 
periodically logged.
> How about making this a uint and using 0 to disable?
Done


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@169
PS2, Line 169: Random rng(GetRandomSeed32());
> Why create a new PRNG each time?
seemed easier than making it a class member, and this isn't perf critical 
enough that the seed computation matters much


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@182
PS2, Line 182:   __builtin_unreachable();
> Fancy. If you're feeling generous, there are quite a few places in the code
I suppose it coudl be. This isn't the first use of it, though. It's not 
supported by MSVC but I think we have so many Linux-isms that this is the least 
of our problems :) FWIW Clang on windows is also now pretty feasible (Chrome 
just switched to it)


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@204
PS2, Line 204: } else if (MonoTime::Now() > next_log) {
> >= is more correct, no?
Done


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@208
PS2, Line 208:   wakeups.emplace(ComputeNextWakeup(what), what);
> what what what
Ack ack ack


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@214
PS2, Line 214: // Unlock the mutex while actually logging metrics or stacks 
since it's somewhat
> This is much cleaner, thanks for the cleanup.
Ack


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@222
PS2, Line 222:   WARN_NOT_OK(LogStacks(reason), "Unable to collect stacks 
to diagnostics log");
> So error no longer feds into the next logging time? Didn't find it useful?
yea, at first I tried to maintain it, but then it just added complexity and I 
couldn't really see the value relative to the extra 5-10 lines of code



--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:49:37 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-07 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9523 )

Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.h
File src/kudu/server/diagnostics_log.h:

http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.h@59
PS2, Line 59:   enum WakeupType {
enum class is the new hotness.


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@75
PS2, Line 75: If this is set to "
:  "a non-positive number, stack traces will be not be 
periodically logged.
How about making this a uint and using 0 to disable?


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@169
PS2, Line 169: Random rng(GetRandomSeed32());
Why create a new PRNG each time?


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@182
PS2, Line 182:   __builtin_unreachable();
Fancy. If you're feeling generous, there are quite a few places in the codebase 
where we add a comment like // unreachable instead of actually using this, and 
they could be fixed up.

Also, should this be abstracted away by something in gutil/port.h? Or do you 
expect it to be the same everywhere, including in compilers we might not 
support yet (like MSVC)?


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@204
PS2, Line 204: } else if (MonoTime::Now() > next_log) {
>= is more correct, no?


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@208
PS2, Line 208:   wakeups.emplace(ComputeNextWakeup(what), what);
what what what


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@214
PS2, Line 214: // Unlock the mutex while actually logging metrics or stacks 
since it's somewhat
This is much cleaner, thanks for the cleanup.


http://gerrit.cloudera.org:8080/#/c/9523/2/src/kudu/server/diagnostics_log.cc@222
PS2, Line 222:   WARN_NOT_OK(LogStacks(reason), "Unable to collect stacks 
to diagnostics log");
So error no longer feds into the next logging time? Didn't find it useful?



--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Wed, 07 Mar 2018 20:22:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

2018-03-06 Thread Todd Lipcon (Code Review)
Hello Mike Percy, Adar Dembo,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9523

to review the following change.


Change subject: KUDU-2297 (part 6): enable periodic stack sampling by default, 
avoid bias
..

KUDU-2297 (part 6): enable periodic stack sampling by default, avoid bias

This changes the timing for the collection of stack samples into the
metrics log to avoid two issues:

1) Inflexible configuration: in many cases we might want to sample
stacks more or less frequently than metrics. The previous implementation
used the same schedule for both.

2) Sampling bias: if we collect stack samples on a fixed schedule, we
risk unintended correlation with background tasks that might run on a
similar fixed schedule. For example, if we collect stacks exactly once a
minute, and the user has some monitoring software which polls the HTTP
server once a minute for some data, we might either line up perfectly
with the polling (in which case we'd overestimate its impact) or line of
perfectly away from the polling (in which case we'd never see its
effects).

This patch changes the wakeups for stacks and metrics to be decoupled.
In addition, it adds randomness to the stack sampling. The configuration
now represents the mean sampling rate rather than an exact sampling
rate.

I manually ran a master with -diagnostics-log-stack-traces-interval-ms=2000
and verified that the stack sample times were between 0 and 4 seconds apart
while the metric dumps were still exactly one minute apart. I also plan
to modify some local test clusters to configure this to be relatively
frequent to try to suss out any races or bugs which might occur in a
real workload.

Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
---
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
2 files changed, 77 insertions(+), 49 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/9523/1
--
To view, visit http://gerrit.cloudera.org:8080/9523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I538eb59f95a695fa2da50f24ed82148715b15e44
Gerrit-Change-Number: 9523
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy