Maxime Fouilleul created CASSANDRA-13096:
--------------------------------------------

             Summary: Snapshots slow down JMX scraping
                 Key: CASSANDRA-13096
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13096
             Project: Cassandra
          Issue Type: Bug
            Reporter: Maxime Fouilleul
         Attachments: Capture d’écran 2017-01-04 à 15.52.47.png, Capture 
d’écran 2017-01-04 à 15.53.23.png, Capture d’écran 2017-01-04 à 17.02.01.png

Hello,

We are scraping JMX metrics through a Prometheus exporter, and we noticed 
that some nodes became very slow to respond (more than 20 seconds). After some 
investigation, we did not find any hardware problems or overload issues on 
these "slow" nodes. It happens on different clusters, some with only a few 
gigabytes of data, and it does not seem to be related to a specific version 
either, as it happens on 2.1, 2.2 and 3.0 nodes. 

After several unsuccessful attempts, one of our ideas was to clear the 
snapshots remaining on one problematic node:

{code}
nodetool clearsnapshot
{code}

And then the magic happened: as you can see in the attached diagrams, the 
second we cleared the snapshots, CPU activity dropped immediately and the time 
to scrape the JMX metrics went from more than 20 seconds to nearly 
instantaneous.
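
For anyone trying to reproduce this, the before/after comparison can be 
sketched roughly as follows. This is only a minimal sketch: the `time_cmd` 
helper is hypothetical, and the exporter URL in the usage comments is an 
assumption that must be replaced with the real scrape endpoint. 
`nodetool listsnapshots` is used only to see what exists before clearing.

```shell
# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds.
time_cmd() {
  _start=$(date +%s)
  "$@" >/dev/null 2>&1
  _end=$(date +%s)
  echo $(( _end - _start ))
}

# Usage (not run here; the URL is an assumed exporter endpoint):
#   time_cmd curl -s http://localhost:7070/metrics   # scrape time before
#   nodetool listsnapshots                           # inspect existing snapshots
#   nodetool clearsnapshot                           # remove all snapshots
#   time_cmd curl -s http://localhost:7070/metrics   # scrape time after
```

Whole-second resolution is coarse, but it is enough to tell a scrape of 
more than 20 seconds from a near-instant one.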

Can you enlighten us on this issue? Once again, it appears on all three of our 
versions (2.1, 2.2 and 3.0) and with different data volumes, and it is not 
systematically linked to snapshots, as we have some nodes with the same volume 
of snapshots that are running fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
