Stanislav Vishnevskiy created CASSANDRA-13687:
-------------------------------------------------

             Summary: Abnormal heap growth and long GC during repair.
                 Key: CASSANDRA-13687
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13687
             Project: Cassandra
          Issue Type: Bug
            Reporter: Stanislav Vishnevskiy
         Attachments: 3.0.14.png, 3.0.9.png

We recently upgraded from 3.0.9 to 3.0.14 to get the fix from CASSANDRA-13004.

Sadly, on 3 of the last 7 nights we have had to wake up because Cassandra died on 
us. We currently don't have any data to help reproduce this, but since there 
aren't many commits between the 2 versions, the cause might be obvious.

Basically, we trigger a parallel incremental repair from a single node every 
night at 1AM. That node will sometimes start allocating heavily, keeping the 
heap maxed out and triggering GC. Some of these GC pauses last up to 2 minutes. 
This effectively takes down the whole cluster, because requests to this node 
time out.

The only workaround we currently have is to drain the node and restart the 
repair; the second attempt has worked fine every time.

I attached heap charts from 3.0.9 and 3.0.14.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)