[ https://issues.apache.org/jira/browse/SOLR-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767226#comment-17767226 ]

Alex Deparvu commented on SOLR-16986:
-------------------------------------

Very interesting. I was playing with something like this but never managed to 
clean it up for a proposal.

The aggregation part is interesting. I would have thought per-node CPU is good 
for understanding 'local' problems (sudden spikes, uneven load distribution), 
but I don't see how global aggregation would help more. Not against it, just 
curious.

Just a few random thoughts (you are probably well aware of most of this 
already):
* Collecting thread user time (`ThreadMXBean.getThreadUserTime`) is also 
useful. CPU time is user-mode plus system-mode time, so diffing the two gives 
more detail: total CPU, user-mode time and system-mode time.
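* Solr uses thread pools, so this collection needs to happen inside a given 
execution (read the counters when the task starts and when it finishes on the 
pooled thread); otherwise the metrics mix in work from unrelated requests and 
are not useful.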
* Along the same lines, I was also playing with per-thread allocated memory. 
Yes, there are a few hoops you need to go through (it is not part of the 
standard `java.lang.management.ThreadMXBean`; HotSpot exposes it through the 
JDK-specific `com.sun.management.ThreadMXBean`), but when the metric is 
available it looks interesting (just an example: 
https://github.com/scala/compiler-benchmark/blob/f7d789fbada662ed76d351a4ab5fe34b200ec770/compilation/src/main/scala/scala/tools/nsc/ExtendedThreadMxBean.java#L5)
* It is very easy to expose this new data as a `MetricSet` under the 
`Group.jvm` metrics registry, but the Prometheus exporter needs some updates 
to include it.
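A minimal sketch of the first three points, assuming a HotSpot-style JVM: the task is wrapped so the counters are read at the start and end of its own execution on the pooled thread, system-mode time is derived as CPU time minus user time, and allocated bytes come from the JDK-specific `com.sun.management.ThreadMXBean` only when it reports support. `CpuMeasuringTask`, `Usage` and `Measured` are hypothetical names, not Solr APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.function.Supplier;

// Hypothetical helper (not a Solr API): measures one task's CPU time, user time,
// derived system time and allocated bytes on the thread that actually runs it.
final class CpuMeasuringTask {

  /** Usage for one execution, in nanoseconds / bytes; -1 when the JVM cannot report it. */
  static final class Usage {
    final long cpuNanos, userNanos, systemNanos, allocatedBytes;

    Usage(long cpuNanos, long userNanos, long systemNanos, long allocatedBytes) {
      this.cpuNanos = cpuNanos;
      this.userNanos = userNanos;
      this.systemNanos = systemNanos;
      this.allocatedBytes = allocatedBytes;
    }
  }

  /** The task's result paired with the usage measured while it ran. */
  static final class Measured<T> {
    final T value;
    final Usage usage;

    Measured(T value, Usage usage) {
      this.value = value;
      this.usage = usage;
    }
  }

  private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

  private CpuMeasuringTask() {}

  /** Allocated bytes of the current thread via the JDK-specific extension, or -1. */
  private static long currentThreadAllocatedBytes() {
    if (MX instanceof com.sun.management.ThreadMXBean) {
      com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) MX;
      if (hs.isThreadAllocatedMemorySupported() && hs.isThreadAllocatedMemoryEnabled()) {
        return hs.getThreadAllocatedBytes(Thread.currentThread().getId());
      }
    }
    return -1L;
  }

  /**
   * Wraps a task so the before/after readings are taken inside its own execution.
   * Reading them on the submitting thread would measure the caller instead, and a
   * pooled thread's running totals include every earlier task it executed.
   */
  static <T> Supplier<Measured<T>> wrap(Supplier<T> task) {
    return () -> {
      boolean cpuOk = MX.isCurrentThreadCpuTimeSupported() && MX.isThreadCpuTimeEnabled();
      long cpu0 = cpuOk ? MX.getCurrentThreadCpuTime() : -1L;
      long user0 = cpuOk ? MX.getCurrentThreadUserTime() : -1L;
      long alloc0 = currentThreadAllocatedBytes();

      T value = task.get();

      long alloc1 = currentThreadAllocatedBytes();
      long cpu = cpuOk ? MX.getCurrentThreadCpuTime() - cpu0 : -1L;
      long user = cpuOk ? MX.getCurrentThreadUserTime() - user0 : -1L;
      // CPU time = user-mode + system-mode time, so the difference approximates system
      // time; the two clocks are read at slightly different instants, hence the clamp.
      long system = cpuOk ? Math.max(0L, cpu - user) : -1L;
      long alloc = (alloc0 >= 0 && alloc1 >= 0) ? alloc1 - alloc0 : -1L;
      return new Measured<>(value, new Usage(cpu, user, system, alloc));
    };
  }
}
```

Usage could look like `CompletableFuture.supplyAsync(CpuMeasuringTask.wrap(work), executor).join().usage.cpuNanos`. Returning -1 rather than throwing keeps the wrapper usable on JVMs where one of the counters is unsupported or disabled.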


> Measure and aggregate thread CPU time in distributed search
> -----------------------------------------------------------
>
>                 Key: SOLR-16986
>                 URL: https://issues.apache.org/jira/browse/SOLR-16986
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Priority: Major
>
> Solr responses include "QTime", which in retrospect might have been better 
> named "elapsedTime". We propose adding a "cpuTime" here to return the amount 
> of time consumed by 
> ManagementFactory.getThreadMXBean().[getThreadCpuTime|https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/ThreadMXBean.html]().
> Unlike QTime, this will need to be aggregated across distributed requests. 
> This work item will only do the aggregation work for distributed search, 
> although it could be extended for other scenarios in future work items.
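A minimal sketch of the aggregation step, assuming each shard reports its own `cpuTime` in milliseconds with its response (the `ShardResult` and `Aggregate` types and the millisecond unit are hypothetical, not the proposal's actual wire format). The reasoning behind "unlike QTime": elapsed times of shards queried in parallel overlap, so the coordinator's own elapsed time already covers them, whereas CPU time spent on other nodes is additive and only those nodes know it. The coordinator therefore sums its own thread's CPU time with every shard-reported value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code: combine the coordinator's own CPU time with the
// cpuTime values that shards report for their part of a distributed search.
final class DistributedCpuTime {

  /** One shard's reported CPU time, in milliseconds (assumed unit). */
  static final class ShardResult {
    final String shard;
    final long cpuTimeMs;

    ShardResult(String shard, long cpuTimeMs) {
      this.shard = shard;
      this.cpuTimeMs = cpuTimeMs;
    }
  }

  /** Request-wide total plus the per-shard breakdown it was built from. */
  static final class Aggregate {
    final long totalCpuMs;
    final Map<String, Long> perShardCpuMs;

    Aggregate(long totalCpuMs, Map<String, Long> perShardCpuMs) {
      this.totalCpuMs = totalCpuMs;
      this.perShardCpuMs = perShardCpuMs;
    }
  }

  private DistributedCpuTime() {}

  /** CPU time used so far by the calling thread, in ns; -1 if unsupported or disabled. */
  static long currentThreadCpuNanos() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
      return -1L;
    }
    return mx.getCurrentThreadCpuTime();
  }

  /** Sums the coordinator's own CPU time (e.g. merging results) with every shard value. */
  static Aggregate aggregate(long coordinatorCpuMs, List<ShardResult> shardResults) {
    Map<String, Long> perShard = new LinkedHashMap<>();
    long total = coordinatorCpuMs;
    for (ShardResult r : shardResults) {
      // The same shard can answer several requests for one search, so accumulate.
      perShard.merge(r.shard, r.cpuTimeMs, Long::sum);
      total += r.cpuTimeMs;
    }
    return new Aggregate(total, Collections.unmodifiableMap(perShard));
  }
}
```

When CPU time measurement is supported, the coordinator's share could be taken as the difference of two `currentThreadCpuNanos()` readings around its own work, converted with `TimeUnit.NANOSECONDS.toMillis`. Keeping the per-shard map next to the total is one way to serve both views raised in the comment above: the sum answers what the whole request cost, while the breakdown still shows uneven load across nodes.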



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
