[ https://issues.apache.org/jira/browse/CASSANDRA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800887#comment-15800887 ]

Maxime Fouilleul commented on CASSANDRA-13096:
----------------------------------------------

We use the jmx prometheus exporter 
(https://github.com/prometheus/jmx_exporter). 

From the README:
{quote}
Note that the scraper always processes all mBeans, even if they're not exported.
{quote}

We use the default configuration from the examples
(https://github.com/prometheus/jmx_exporter/blob/master/example_configs/cassandra.yml):
{code}
---
lowercaseOutputLabelNames: true
lowercaseOutputName: true
rules:
- pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
  name: cassandra_$1_$3
  labels:
    address: "$2"
- pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?, name=(\S*)><>(Count|Value)
  name: cassandra_$1_$5
  labels:
    "$1": "$4"
    "$2": "$3"
{code}
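
For what it's worth, here is a minimal sketch in plain JMX of what each scrape effectively does: enumerate every mBean under org.apache.cassandra.metrics and read every readable attribute. This is not the exporter's actual code, just the same sequence of JMX calls with a timer around it, and it assumes Cassandra's default JMX port 7199 on localhost with authentication disabled. Running it before and after clearsnapshot on a problematic node should show the same 20+ second difference we see on exporter scrapes.
{code}
import java.util.Set;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ScrapeTimer {
    public static void main(String[] args) throws Exception {
        // Assumption: default unauthenticated JMX endpoint on localhost:7199.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName pattern = new ObjectName("org.apache.cassandra.metrics:*");
            long start = System.nanoTime();
            // Enumerate all metric mBeans, then read every readable attribute,
            // which is roughly what one exporter scrape triggers.
            Set<ObjectName> names = conn.queryNames(pattern, null);
            for (ObjectName name : names) {
                for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                    if (!attr.isReadable()) continue;
                    try {
                        conn.getAttribute(name, attr.getName());
                    } catch (Exception e) {
                        // Some attributes throw on read; skip them like the scraper does.
                    }
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("Read %d mBeans in %d ms%n", names.size(), elapsedMs);
        }
    }
}
{code}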

We hit a bug recently (https://issues.apache.org/jira/browse/CASSANDRA-11594) 
that looks like the same kind of issue; maybe it can help...

Thanks for your help. 

> Snapshots slow down jmx scraping
> --------------------------------
>
>                 Key: CASSANDRA-13096
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13096
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Maxime Fouilleul
>         Attachments: CPU Load.png, Clear Snapshots.png, JMX Scrape Duration.png
>
>
> Hello,
> We are scraping the jmx metrics through a prometheus exporter and we noticed 
> that some nodes became really slow to answer (more than 20 seconds). After 
> some investigation we did not find any hardware problem or overload issue on 
> these "slow" nodes. It happens on different clusters, some with only a few 
> gigabytes of data, and it does not seem to be related to a specific version 
> either, as it happens on 2.1, 2.2 and 3.0 nodes. 
> After some unsuccessful attempts, one of our ideas was to clear the snapshots 
> remaining on one problematic node:
> {code}
> nodetool clearsnapshot
> {code}
> And the magic happened... as you can see in the attached diagrams, the second 
> we cleared the snapshots, the CPU activity dropped immediately and the time 
> to scrape the jmx metrics went from 20+ seconds to near-instantaneous...
> Can you enlighten us on this issue? Once again, it appears on all three of 
> our 2.1, 2.2 and 3.0 versions, with different data volumes, and it is not 
> systematically linked to the snapshots, as we have some nodes with the same 
> snapshot volume that are doing fine.


