[ https://issues.apache.org/jira/browse/CASSANDRA-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Radim Kolar updated CASSANDRA-3624:
-----------------------------------
    Comment: was deleted

(was: I have this problem too, but I do not have large rows; I have a huge number of small rows (max 180 bytes serialized).)

> Hinted Handoff - related OOM
> ----------------------------
>
>                 Key: CASSANDRA-3624
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3624
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Marcus Eriksson
>            Assignee: Jonathan Ellis
>              Labels: hintedhandoff
>             Fix For: 1.0.7
>
>         Attachments: 3624.txt
>
>
> One of our nodes had collected a lot of hints for another node, so when the
> dead node came back and the row mutations were read back from disk, the node
> died with an OOM exception (and kept dying after restart, even with the heap
> increased from 8G to 12G). The heap dump contained a lot of SuperColumns, and
> our application does not use those (but HH does).
> I'm guessing that each mutation is big, so that PAGE_SIZE*<mutation_size> does
> not fit in memory (will check this tomorrow).
> A simple fix (if my assumption above is correct) would be to reduce the
> PAGE_SIZE in HintedHandOffManager.java to something like 10 (or even 1?) to
> reduce the memory pressure. The performance hit would be small, since we are
> doing the hinted handoff throttle delay sleep before sending every *mutation*
> anyway (not every page). Thoughts?
> If anyone runs into the same problem, I got the node started again by simply
> removing the HintsColumnFamily* files.
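
The reporter's reasoning about PAGE_SIZE can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions and is not the code in HintedHandOffManager.java: the class `HintPagingSketch`, its methods, and the sizes used (250 hints of 1 KiB each, page sizes of 100 and 10) are all hypothetical. It shows that the bytes buffered at once scale with page size times mutation size, while the number of throttle sleeps depends only on the number of mutations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; NOT Cassandra's HintedHandOffManager.
public class HintPagingSketch {

    // Worst-case bytes buffered for one page: pageSize * mutationBytes.
    static long peakPageBytes(int pageSize, long mutationBytes) {
        return Math.multiplyExact((long) pageSize, mutationBytes);
    }

    // Builds `count` fake serialized mutations of `size` bytes each.
    static List<byte[]> makeHints(int count, int size) {
        List<byte[]> hints = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            hints.add(new byte[size]);
        }
        return hints;
    }

    // Replays hints page by page, sleeping before every mutation
    // (not every page). Returns {peak bytes buffered, sleeps performed}.
    static long[] replay(List<byte[]> hints, int pageSize, long throttleMillis) {
        long peak = 0;
        long sleeps = 0;
        for (int start = 0; start < hints.size(); start += pageSize) {
            int end = Math.min(start + pageSize, hints.size());
            // Stands in for reading one page of mutations back from disk.
            List<byte[]> page = new ArrayList<>(hints.subList(start, end));
            long buffered = 0;
            for (byte[] mutation : page) {
                buffered += mutation.length;
            }
            peak = Math.max(peak, buffered);
            for (byte[] mutation : page) {
                try {
                    Thread.sleep(throttleMillis); // per-mutation throttle delay
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return new long[] {peak, sleeps};
                }
                sleeps++;
                // A real implementation would send `mutation` to the target node here.
            }
        }
        return new long[] {peak, sleeps};
    }

    public static void main(String[] args) {
        List<byte[]> hints = makeHints(250, 1024); // hypothetical workload
        for (int pageSize : new int[] {100, 10}) {
            long[] result = replay(hints, pageSize, 0L);
            System.out.println("pageSize=" + pageSize
                    + " peakBytes=" + result[0]
                    + " sleeps=" + result[1]);
        }
    }
}
```

In this model a smaller page lowers only the peak buffer, and the sleep count is unchanged, which matches the claim that the performance hit should be small. The sketch ignores the per-page read overhead, which a real measurement would need to include.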