[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263500#comment-17263500
 ] 

Walter Underwood commented on SOLR-15056:
-----------------------------------------

Thanks, I'll incorporate these changes about core. I'm still learning the 
internals.

I simplified the wording throughout the documentation. This should make it 
easier to read for non-native speakers of English.

[~atri] 2. I understand that the CPU usage metric is not available on a few 
JVMs. It is available in free JVMs such as Amazon Corretto. Anyone running a 
cluster under heavy load would probably find it worth switching to one of those 
JVMs.

[~atri] 3. My comment about 50-95% in the original report was based on a 
misreading of the documentation, sorry.

[~atri] 4. getSystemLoadAverage() is not normalized to the number of 
processors. It is an unbounded number. From the documentation: "The system load 
average is the sum of the number of runnable entities queued to the available 
processors and the number of runnable entities running on the available 
processors averaged over a period of time." A threshold on that value is an 
absolute count, so the right setting depends on the number of processors. If I 
vertically scale an AWS instance to one with more processors, I would also need 
to update all of the circuit breaker configs!

https://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage()
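
To make the scaling point concrete, here is a minimal sketch that divides the 
load average by the processor count. The class name LoadPerCpu and the method 
perCpu() are made up for illustration; this is not Solr code, and it is only 
one possible way to normalize the value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration only; not part of Solr.
class LoadPerCpu {
    // Divide the raw load average by the CPU count.
    // A negative load average means the platform does not provide one,
    // so the "unavailable" signal is passed through as -1.0.
    static double perCpu(double loadAverage, int cpus) {
        if (loadAverage < 0 || cpus <= 0) {
            return -1.0;
        }
        return loadAverage / cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // unbounded; negative if unavailable
        int cpus = os.getAvailableProcessors();
        System.out.printf("load=%.2f cpus=%d perCpu=%.2f%n",
                load, cpus, perCpu(load, cpus));
    }
}
```

With the numbers from the issue description, a load average of 8 works out to 
0.25 per CPU on a 32 CPU host and 4.0 per CPU on a 2 CPU host, so one fixed 
threshold on the raw value cannot suit both.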

Load average has been an unbounded number for as long as I've been using Unix, 
forty years.

In my experience, load average only gets large after the system has hit 100% 
CPU. It is not very useful for predicting overload. I'm glad to keep it, but it 
needs a more accurate name and description.
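
For comparison, a rough sketch of reading a bounded CPU utilization value with 
a fallback for JVMs that do not provide it. The class CpuUtilization, the 
method names, and the 0.75 threshold are assumptions for illustration, not the 
contents of the attached patch. It also assumes the com.sun.management 
extension classes are present at compile time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sketch for illustration only; not the Solr implementation.
class CpuUtilization {
    // True only when the reading is valid (non-negative) and at or above
    // the threshold. Both values are fractions in [0.0, 1.0].
    static boolean shouldTrip(double cpuLoad, double threshold) {
        return cpuLoad >= 0.0 && cpuLoad >= threshold;
    }

    // Returns the recent system-wide CPU usage in [0.0, 1.0], or -1.0 when
    // the platform bean does not implement the com.sun.management extension.
    // The method itself may also return a negative value when the usage
    // is not available.
    static double systemCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0;
    }

    public static void main(String[] args) {
        double load = systemCpuLoad();
        // 0.75 is an arbitrary example threshold.
        System.out.println("cpu=" + load + " trip=" + shouldTrip(load, 0.75));
    }
}
```

Because the value is a fraction of total CPU capacity, the same threshold keeps 
its meaning when the host gains or loses processors.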

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> -----------------------------------------------------------------------
>
>                 Key: SOLR-15056
>                 URL: https://issues.apache.org/jira/browse/SOLR-15056
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 8.7
>            Reporter: Walter Underwood
>            Priority: Major
>              Labels: Metrics
>         Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes running or waiting to run. It is effectively 
> unbounded. I've seen it as high as 50 to 100. It is not bounded by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCpuLoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> > in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> > during the recent period of time observed, while a value of 1.0 means that 
> > all CPUs were actively running 100% of the time during the recent period 
> > being observed. All values betweens 0.0 and 1.0 are possible depending of 
> > the activities going on in the system. If the system recent cpu usage is not 
> > available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org