[
https://issues.apache.org/jira/browse/GEODE-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bill Burcham updated GEODE-9002:
--------------------------------
Description:
Linux performance icon Brendan Gregg advocates the
[USE|http://www.brendangregg.com/usemethod.html] method of performance
analysis: Utilization, Saturation, and Errors.
When it comes to CPU, Geode captures a number of _utilization_ statistics. Some
are direct, like LinuxSystemStats cpuIdle and cpuActive. Others are indirect,
like:
* DistributionStats
** heartbeatsSent: you may see a gap in the every-five-seconds heartbeats
* StatSampler
** delayDuration: you may see a rise when CPU is scarce
** sampleCount: you may see an interruption in the regular once-per-second
sampling
* (G1GC collector)
** (various memory utilization statistics may indicate memory pressure which
in turn can give rise to long GC pauses)
* LinuxSystemStats
** cpuSteal: indicating that the virtualization environment has not given the
VM its share of CPU
But utilization statistics alone can't tell you when a resource (like CPU) is
_saturated_, i.e. when demand is higher than the servicing ability. If you're
just looking at utilization metrics, then a saturated system might look a lot
like a system just below saturation. In order to tell the difference,
saturation metrics are needed.
In the case of CPU, there is a conceptual queue in front of each processor.
Tasks (operating system threads) that are ready to run enter a queue and,
after some delay, are given a time slice on an actual physical CPU.
You might think that Geode's LinuxSystemStats loadAverage1, loadAverage5, and
loadAverage15 fit this bill. Those statistics do provide some saturation
information. The problem is that they conflate CPU with I/O and other things
(see [Linux Load Averages: Solving the
Mystery|http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html]).
A better, more specific measure of CPU saturation is available through
statistics exposed via the /proc/schedstat virtual file.
When this ticket is complete, there will be a new statistic type called
LinuxThreadScheduler, with four associated statistics gathered directly from
/proc/schedstat or derived from data gathered from it:
* runningTimeNanos: sum of all time spent running by tasks on this processor
in nanoseconds
* queuedTimeNanos: sum of all time spent waiting to run by tasks on this
processor in nanoseconds
* tasksScheduledCount: # of tasks (not necessarily unique) given to the
processor
* meanTaskQueuedTimeNanos: average time that a ready-to-run task waited for a
CPU, since the last sample, in nanoseconds
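The derived statistic can be illustrated with a minimal sketch. It assumes that
meanTaskQueuedTimeNanos is the change in queuedTimeNanos divided by the change
in tasksScheduledCount between two consecutive samples. The class and method
names are hypothetical and are not taken from the Geode code base.

```java
/** Hypothetical holder for one sample of the cumulative per-CPU counters. */
final class SchedStatSample {
  final long runningTimeNanos;
  final long queuedTimeNanos;
  final long tasksScheduledCount;

  SchedStatSample(long runningTimeNanos, long queuedTimeNanos, long tasksScheduledCount) {
    this.runningTimeNanos = runningTimeNanos;
    this.queuedTimeNanos = queuedTimeNanos;
    this.tasksScheduledCount = tasksScheduledCount;
  }

  /**
   * Mean time a ready-to-run task waited for this CPU between the previous
   * sample and this one. Returns 0 when no tasks were scheduled in between
   * (assumed convention, so that the division is always defined).
   */
  long meanTaskQueuedTimeNanosSince(SchedStatSample previous) {
    long tasks = tasksScheduledCount - previous.tasksScheduledCount;
    if (tasks <= 0) {
      return 0L;
    }
    long queued = queuedTimeNanos - previous.queuedTimeNanos;
    return Math.max(0L, queued / tasks);
  }
}
```

For example, if queuedTimeNanos grows by 2000 while tasksScheduledCount grows by
40 between two samples, the mean queued time for that interval is 50
nanoseconds per task.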
One "statistic" will be gathered for each CPU. So a Geode process running on a
two-CPU system will capture two statistics, called "cpu0" and "cpu1", each of
this new type.
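As a rough sketch of how the per-CPU values might be pulled out of
/proc/schedstat, the parser below assumes the schedstat version 15 layout, in
which each "cpuN" line carries nine counters and the last three are running
time, waiting time, and timeslice count. The class name and the array-based
return type are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser; not part of Geode. */
final class SchedStatParser {
  /**
   * Maps each CPU name ("cpu0", "cpu1", ...) to
   * {runningTimeNanos, queuedTimeNanos, tasksScheduledCount}.
   */
  static Map<String, long[]> parse(String schedstatText) {
    Map<String, long[]> perCpu = new LinkedHashMap<>();
    for (String line : schedstatText.split("\n")) {
      String[] fields = line.trim().split("\\s+");
      // Keep only per-CPU lines; "version", "timestamp" and "domainN" lines are skipped.
      if (!fields[0].matches("cpu\\d+") || fields.length < 10) {
        continue;
      }
      // Assumed layout: counters 7, 8 and 9 follow the CPU name in fields[0].
      perCpu.put(fields[0], new long[] {
          Long.parseLong(fields[7]),
          Long.parseLong(fields[8]),
          Long.parseLong(fields[9])});
    }
    return perCpu;
  }
}
```

On a Linux host running Java 11 or later, the input text could come from
Files.readString(Path.of("/proc/schedstat")). A two-CPU system would then yield
two map entries, keyed "cpu0" and "cpu1".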
By default, Geode will not gather these new statistics. A TBD Java system
property will be used to enable gathering the new LinuxThreadScheduler
statistic.
was:
Linux performance icon Brendan Gregg advocates the
[USE|http://www.brendangregg.com/usemethod.html] method of performance
analysis: Utilization Saturation and Errors.
When it comes to CPU, Geode captures a number of _utilization_ statistics. Some
are direct like LinuxSystemStats cpuIdle and cpuActive. Others are indirect
like:
But utilization statistics alone can't tell you when a resource (like CPU) is
_saturated_, i.e. when demand is higher than the servicing ability. If you're
just looking at utilization metrics, then a saturated system might look a lot
like a system just below saturation. In order to tell the difference,
saturation metrics are needed.
In the case of CPU, there is a conceptual queue in front of each processor.
Tasks (operating system threads) that are ready to run, enter a queue, and
after some delay, are given a time slice by an actual physical CPU.
You might think that Geode's LinuxSystemStats loadAverage1 and 5 and 15, might
fit this bill. Those statistics do provide some saturation information. The
problem is, they conflate CPU with I/O and other things (see [Linux Load
Averages: Solving the
Mystery|[http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html].)]
A better, more specific measure of CPU saturation is available through
statistics exposed via the /proc/schedstat virtual file.
When this ticket is complete, there will be a new statistic type called
LinuxThreadScheduler, with three associated statistics gathered directly from
/proc/schedstat or derived from data gathered from it:
* runningTimeNanos: sum of all time spent running by tasks on this processor
in nanoseconds
* queuedTimeNanos: sum of all time spent waiting to run by tasks on this
processor in nanoseconds
* tasksScheduledCount: # of tasks (not necessarily unique) given to the
processor
* meanTaskQueuedTimeNanos: average time that a ready-to-run task waited for a
CPU, since the last sample, in nanoseconds
One "statistic" will be gathered for each CPU. So a Geode process running on a
two-CPU system will capture two statistics, called "cpu0", "cpu1", each of this
new type.
By default Geode will not gather these new statistics. A TBD Java system
property will be used to enable gathering the new LinuxThreadScheduler
statistic.
> Add Statistic for /proc/schedstat
> ---------------------------------
>
> Key: GEODE-9002
> URL: https://issues.apache.org/jira/browse/GEODE-9002
> Project: Geode
> Issue Type: New Feature
> Components: statistics
> Reporter: Bill Burcham
> Assignee: Bill Burcham
> Priority: Major
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.3.4#803005)