[ https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263245#comment-17263245 ]
Atri Sharma commented on SOLR-15056:
------------------------------------

Hi Walter,

Thanks for tackling this and apologies for the delay in response.

I took a look at the patch and here are my comments:

1. The majority of the changes in the patch are to make CircuitBreaker aware of the SolrCore that is being used – which is a cyclic dependency, since SolrCore owns CircuitBreakerManager, which in turn owns CircuitBreakers. The API also becomes ugly, so please rework this change.

2. If you look at the original discussion around this metric, the com.sun package wasn't used because it is a specific implementation. I agree that it is a cleaner metric to use, but we cannot depend on something that isn't available on certain VMs.

3. I am curious to understand your assertion about the 50-95% range – that is specifically for the JVM heap usage based circuit breaker, not the CPU circuit breaker. Maybe the documentation needs to be clarified?

4. OperatingSystemMXBean.getSystemLoadAverage() is a good metric because it takes the number of CPUs and the current process queue length into account, averaging it over time. I would want to keep this metric unless it does not work for a specific OS (Windows, like you said). Maybe we can use OperatingSystemMXBean.getSystemCPULoad() as the fallback mechanism if the first one returns -1?

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> -----------------------------------------------------------------------
>
>                 Key: SOLR-15056
>                 URL: https://issues.apache.org/jira/browse/SOLR-15056
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: metrics
>    Affects Versions: 8.7
>            Reporter: Walter Underwood
>            Priority: Major
>              Labels: Metrics
>         Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered
> by a CPU utilization metric that goes from 0% to 100%.
> But the code uses the metric OperatingSystemMXBean.getSystemLoadAverage().
> That is an average of the count of processes waiting to run. It is
> effectively unbounded. I've seen it as high as 50 to 100. It is not bound
> by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double
> > in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle
> > during the recent period of time observed, while a value of 1.0 means that
> > all CPUs were actively running 100% of the time during the recent period
> > being observed. All values betweens 0.0 and 1.0 are possible depending of
> > the activities going on in the system. If the system recent cpu usage is not
> > available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the
> memory and CPU circuit breakers.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
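
For illustration, the fallback floated in comment 4 (use the load average, and fall back to the system CPU load when the load average is reported as negative) might look roughly like the sketch below. This is not the SOLR-15056 patch or Solr's CircuitBreaker code: the class and helper names (CpuMetricSketch, chooseMetric, perCpu, systemCpuLoadIfAvailable) are hypothetical. It assumes the method is spelled getSystemCpuLoad(), as in the Javadoc URL quoted above, and it looks the com.sun.management interface up reflectively so that nothing depends on that package at compile time. The perCpu helper is an added assumption that only illustrates the "8 on 32 CPUs versus 8 on 2 CPUs" point from the description.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical sketch; names are illustrative and not taken from Solr.
class CpuMetricSketch {

    // Prefer the load average; fall back to CPU load when the load
    // average is unavailable (reported as a negative value).
    static double chooseMetric(double loadAverage, double cpuLoad) {
        return loadAverage >= 0.0 ? loadAverage : cpuLoad;
    }

    // Load average divided by CPU count, so that 8 on 32 CPUs and
    // 8 on 2 CPUs are no longer the same number.
    static double perCpu(double loadAverage, int cpus) {
        return loadAverage / cpus;
    }

    // Reads getSystemCpuLoad() through the com.sun.management interface
    // if this VM provides it; returns -1.0 otherwise.
    static double systemCpuLoadIfAvailable(OperatingSystemMXBean os) {
        try {
            Class<?> sunBean =
                Class.forName("com.sun.management.OperatingSystemMXBean");
            if (!sunBean.isInstance(os)) {
                return -1.0;
            }
            Method m = sunBean.getMethod("getSystemCpuLoad");
            return (Double) m.invoke(os);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return -1.0; // not available on this VM
        }
    }

    static double currentCpuMetric() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return chooseMetric(os.getSystemLoadAverage(),
                            systemCpuLoadIfAvailable(os));
    }

    public static void main(String[] args) {
        System.out.println("metric = " + currentCpuMetric());
    }
}
```

Note that the two metrics are on different scales (the load average is unbounded, the CPU load is in [0.0,1.0]), so a real threshold check would need to know which of the two it was given.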