Hinted Handoff - related OOM
----------------------------

                 Key: CASSANDRA-3624
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3624
             Project: Cassandra
          Issue Type: Bug
            Reporter: Marcus Eriksson


One of our nodes had collected alot of hints for another node, so when the dead 
node came back and the row mutations were read back from disk, the node died 
with an OOM-exception (and kept dying after restart, even with increased heap 
(from 8G to 12G)). The heap dump contained alot of SuperColumns and our 
application does not use those (but HH does). 

I'm guessing that each mutation is big so that PAGE_SIZE*<mutation_size> does 
not fit in memory (will check this tomorrow)

A simple fix (if my assumption above is correct) would be to reduce the 
PAGE_SIZE in HintedHandOffManager.java to something like 10 (or even 1?) to 
reduce the memory pressure. The performance hit would be small since we are 
doing the hinted handoff throttle delay sleep before sending every *mutation* 
anyway (not every page), thoughts?

If anyone runs in to the same problem, I got the node started again by simply 
removing the HintsColumnFamily* files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to