[ https://issues.apache.org/jira/browse/CASSANDRA-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581324#comment-14581324 ]
Alan Boudreault commented on CASSANDRA-9573:
--------------------------------------------

I tried deleting all my JNA jars and adding "-Dcassandra.boot_without_jna=true", but I get the following error. I'll check more tomorrow morning.

{code}
java.lang.NoClassDefFoundError: com/sun/jna/Native
    at org.apache.cassandra.utils.memory.MemoryUtil.allocate(MemoryUtil.java:82) ~[main/:na]
    at org.apache.cassandra.io.util.Memory.<init>(Memory.java:74) ~[main/:na]
    at org.apache.cassandra.io.util.SafeMemory.<init>(SafeMemory.java:32) ~[main/:na]
    at org.apache.cassandra.io.compress.CompressionMetadata$Writer.<init>(CompressionMetadata.java:274) ~[main/:na]
    at org.apache.cassandra.io.compress.CompressionMetadata$Writer.open(CompressionMetadata.java:288) ~[main/:na]
    at org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:73) ~[main/:na]
    at org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:167) ~[main/:na]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.<init>(BigTableWriter.java:75) ~[main/:na]
    at org.apache.cassandra.io.sstable.format.big.BigFormat$WriterFactory.open(BigFormat.java:107) ~[main/:na]
    at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:84) ~[main/:na]
    at org.apache.cassandra.db.Memtable$FlushRunnable.createFlushWriter(Memtable.java:419) ~[main/:na]
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:366) ~[main/:na]
    at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:351) ~[main/:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
    at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1121) ~[main/:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_76]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_76]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_76]
Caused by: java.lang.ClassNotFoundException: com.sun.jna.Native
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_76]
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_76]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_76]
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_76]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_76]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_76]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_76]
    ... 19 common frames omitted
{code}

> OOM when loading compressed sstables (system.hints)
> ---------------------------------------------------
>
>                 Key: CASSANDRA-9573
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9573
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.2.0 rc2
>
>         Attachments: hs_err_pid11243.log, java-hints-issue-2015-06-09.snapshot, system.log, yourkit.ss.tar.gz
>
>
> [~andrew.tolbert] discovered an issue while running endurance tests on 2.2. A
> node was not able to start and was killed by the OOM killer.
> Briefly, Cassandra uses an excessive amount of memory when loading compressed
> sstables (off-heap?). We initially saw the issue with system.hints
> before knowing it was related to compression. system.hints uses lz4
> compression by default. If we have an sstable of, say, 8-10G, Cassandra will be
> killed by the OOM killer after 1-2 minutes. I can reproduce the bug
> every time locally.
> * The issue also happens if we have 10G of data split into 13MB sstables.
> * I can reproduce the issue if I put a lot of data in the system.hints table.
> * I cannot reproduce the issue with a standard table using the same
> compression (LZ4). Something seems to be different when it's hints?
> You won't see anything in the node's system.log, but you'll see this in
> /var/log/syslog.log:
> {code}
> Out of memory: Kill process 30777 (java) score 600 or sacrifice child
> {code}
> The issue was introduced in this commit but is not related to the
> performance issue in CASSANDRA-9240:
> https://github.com/apache/cassandra/commit/aedce5fc6ba46ca734e91190cfaaeb23ba47a846
> The core dump and some yourkit snapshots are attached. I am not
> sure you will be able to get useful information from them.
> core dump: http://dl.alanb.ca/core.tar.gz
> Not sure if this is related, but all dumps and snapshots point to
> EstimatedHistogramReservoir ... and we can see many
> javax.management.InstanceAlreadyExistsException:
> org.apache.cassandra.metrics:... exceptions in system.log before it hangs
> and then crashes.
> To reproduce the issue:
> 1. Create a cluster of 3 nodes.
> 2. Start the whole cluster.
> 3. Shut down node2 and node3.
> 4. Write 10-15G of data on node1 with replication factor 3. You should see a
> lot of hints.
> 5. Stop node1.
> 6. Start node2 and node3.
> 7. Start node1; it should OOM.
> //cc [~tjake] [~benedict] [~andrew.tolbert]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
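
The NoClassDefFoundError in the comment above is raised at MemoryUtil.allocate, so that code path references com.sun.jna.Native even though the jars were removed and "-Dcassandra.boot_without_jna=true" was set. As a minimal sketch only (this is not Cassandra's code; the class JnaProbe and the method isClassPresent are hypothetical names introduced for illustration), a startup check could probe the classpath for JNA before anything touches it, so a missing jar is reported up front instead of failing later during a flush:

```java
// Hypothetical helper, not part of Cassandra: probes the classpath for a class
// without initializing it, so a missing JNA jar is detected before first use.
final class JnaProbe {
    private JnaProbe() {}

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers.
            Class.forName(className, false, JnaProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reads the same system property used in the comment above.
        boolean bootWithoutJna = Boolean.getBoolean("cassandra.boot_without_jna");
        boolean jnaPresent = isClassPresent("com.sun.jna.Native");

        if (jnaPresent) {
            System.out.println("JNA found on classpath");
        } else if (bootWithoutJna) {
            System.out.println("JNA missing; boot_without_jna is set, callers must avoid com.sun.jna.*");
        } else {
            System.out.println("JNA missing and boot_without_jna is not set");
        }
    }
}
```

Running it with and without the JNA jar on the classpath (and with the -D flag toggled) shows which of the three cases a given setup falls into. Whether Cassandra's off-heap allocation can actually work without JNA is a separate question that this sketch does not answer.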