[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263245#comment-17263245
 ] 

Atri Sharma commented on SOLR-15056:
------------------------------------

Hi Walter,

Thanks for tackling this, and apologies for the delayed response.

I took a look at the patch, and here are my comments:

1. The majority of the changes in the patch make CircuitBreaker aware of the 
SolrCore in use. That creates a cyclic dependency, since SolrCore owns 
CircuitBreakerManager, which in turn owns the CircuitBreakers. The API also 
becomes ugly, so please rework this change.

2. If you look at the original discussion around this metric, the com.sun 
package was not used because it is implementation-specific. I agree that it 
is a cleaner metric to use, but we cannot depend on something that isn't 
available on certain VMs.

3. I would like to understand your assertion about the 50-95% range. That 
range applies specifically to the JVM heap usage based circuit breaker, not 
the CPU circuit breaker. Maybe the documentation needs to be clarified?

4. OperatingSystemMXBean.getSystemLoadAverage() is a good metric because it 
takes the number of CPUs and the current process queue length into account, 
averaging it over time. I would want to keep this metric unless it does not 
work for a specific OS (Windows, like you said). Maybe we can use 
OperatingSystemMXBean.getSystemCpuLoad() as the fallback mechanism if the 
first one returns a negative value?
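
To make the fallback idea in point 4 concrete, here is a rough sketch. It is 
only an illustration under assumptions, not code from the patch or from Solr: 
the class name CpuMetricProbe and its methods are hypothetical. It reads 
getSystemCpuLoad() through reflection so that nothing links against com.sun.* 
at compile time, and it reports a negative value when neither metric is 
available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical helper for illustration only; not part of Solr or the patch.
final class CpuMetricProbe {

  // Prefer the load average; fall back to the CPU load when it is negative.
  // Returns -1.0 when neither value is available.
  static double pick(double loadAverage, double systemCpuLoad) {
    if (loadAverage >= 0.0) {
      return loadAverage;
    }
    return systemCpuLoad >= 0.0 ? systemCpuLoad : -1.0;
  }

  // Looks up com.sun.management.OperatingSystemMXBean at runtime and calls
  // getSystemCpuLoad() on it. Returns -1.0 if the interface is missing or
  // the platform bean does not implement it.
  static double systemCpuLoad(OperatingSystemMXBean os) {
    try {
      Class<?> sunBean = Class.forName("com.sun.management.OperatingSystemMXBean");
      if (!sunBean.isInstance(os)) {
        return -1.0;
      }
      Method m = sunBean.getMethod("getSystemCpuLoad");
      return (Double) m.invoke(os);
    } catch (ReflectiveOperationException | RuntimeException e) {
      return -1.0;
    }
  }

  // Reads both metrics from the platform bean and applies the fallback rule.
  static double currentValue() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    return pick(os.getSystemLoadAverage(), systemCpuLoad(os));
  }

  public static void main(String[] args) {
    System.out.println("metric = " + currentValue());
  }
}
```

One caveat with this sketch: the two metrics are on different scales (the 
load average is unbounded, while the CPU load is in [0.0, 1.0]), so the 
threshold check would need to know which one it received.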

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> -----------------------------------------------------------------------
>
>                 Key: SOLR-15056
>                 URL: https://issues.apache.org/jira/browse/SOLR-15056
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 8.7
>            Reporter: Walter Underwood
>            Priority: Major
>              Labels: Metrics
>         Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bounded by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCpuLoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> > in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> > during the recent period of time observed, while a value of 1.0 means that 
> > all CPUs were actively running 100% of the time during the recent period 
> > being observed. All values betweens 0.0 and 1.0 are possible depending of 
> > the activities going on in the system. If the system recent cpu usage is not 
> > available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
