[jira] [Commented] (CASSANDRA-8719) Using thrift HSHA with offheap_objects appears to corrupt data
[ https://issues.apache.org/jira/browse/CASSANDRA-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344007#comment-14344007 ]

Karl Mueller commented on CASSANDRA-8719:
-----------------------------------------

Is it possible to have this issue on 2.0.10?

Using thrift HSHA with offheap_objects appears to corrupt data
--------------------------------------------------------------

                Key: CASSANDRA-8719
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8719
            Project: Cassandra
         Issue Type: Bug
         Components: Core
           Reporter: Randy Fradin
           Assignee: Benedict
            Fix For: 2.1.3
        Attachments: 8719.txt, repro8719.sh

Copying my comment from CASSANDRA-6285 to a new issue, since that issue is long closed and I'm not sure whether they are related... I am getting this exception using Thrift HSHA in 2.1.0:

{quote}
INFO [CompactionExecutor:8] 2015-01-26 13:32:51,818 CompactionTask.java (line 138) Compacting [SSTableReader(path='/tmp/cass_test/cassandra/TestCassandra/data/test_ks/test_cf-1c45da40a58911e4826751fbbc77b187/test_ks-test_cf-ka-2-Data.db'), SSTableReader(path='/tmp/cass_test/cassandra/TestCassandra/data/test_ks/test_cf-1c45da40a58911e4826751fbbc77b187/test_ks-test_cf-ka-1-Data.db')]
INFO [CompactionExecutor:8] 2015-01-26 13:32:51,890 ColumnFamilyStore.java (line 856) Enqueuing flush of compactions_in_progress: 212 (0%) on-heap, 20 (0%) off-heap
INFO [MemtableFlushWriter:8] 2015-01-26 13:32:51,892 Memtable.java (line 326) Writing Memtable-compactions_in_progress@1155018639(0 serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:8] 2015-01-26 13:32:51,896 Memtable.java (line 360) Completed flushing /tmp/cass_test/cassandra/TestCassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-2-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1422296630707, position=430226)
ERROR [CompactionExecutor:8] 2015-01-26 13:32:51,906 CassandraDaemon.java (line 166) Exception in thread Thread[CompactionExecutor:8,1,RMI Runtime]
java.lang.RuntimeException: Last written key DecoratedKey(131206587314004820534098544948237170809, 80010001000c62617463685f6d757461746500) >= current key DecoratedKey(14775611966645399672119169777260659240, 726f776b65793030385f31343232323937313537353835) writing into /tmp/cass_test/cassandra/TestCassandra/data/test_ks/test_cf-1c45da40a58911e4826751fbbc77b187/test_ks-test_cf-tmp-ka-3-Data.db
	at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:172) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:196) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:110) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:177) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:235) ~[apache-cassandra-2.1.0.jar:2.1.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_40]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_40]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_40]
	at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
{quote}

I don't think it's caused by CASSANDRA-8211, because it happens during the first compaction between the first two SSTables flushed from an initially empty column family. Also, I've only been able to reproduce it when using both *hsha* for the rpc server and *offheap_objects* for memtable allocation; if I switch either to sync or to offheap_buffers or heap_buffers, I cannot reproduce the problem. Under the same circumstances I'm also pretty sure I've seen incorrect data returned to a client multiget_slice request before any SSTables had been flushed, so I presume
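The two keys in the RuntimeException above are hex-encoded byte strings, and decoding them shows why this looks like a raw RPC buffer leaking into the memtable. A quick sketch (the reading of the first key as a Thrift binary-protocol CALL header is my interpretation, not stated in the ticket):

```python
# Decode the two DecoratedKey payloads from the stack trace above.
last_written = bytes.fromhex("80010001000c62617463685f6d757461746500")
current = bytes.fromhex("726f776b65793030385f31343232323937313537353835")

# The bogus "last written" key embeds the Thrift method name
# "batch_mutate" -- it looks like an RPC request frame, not a row key.
print(b"batch_mutate" in last_written)   # True

# The "current" key is an ordinary application row key.
print(current.decode("ascii"))           # rowkey008_1422297157585
```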
[jira] [Commented] (CASSANDRA-8719) Using thrift HSHA with offheap_objects appears to corrupt data
[ https://issues.apache.org/jira/browse/CASSANDRA-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344077#comment-14344077 ]

Karl Mueller commented on CASSANDRA-8719:
-----------------------------------------

OK, thanks. I am seeing corruption in 2.0.10, but I'm not sure yet whether it's inside Cassandra or outside it.
[jira] [Created] (CASSANDRA-8330) Confusing Message: ConfigurationException: Found system keyspace files, but they couldn't be loaded!
Karl Mueller created CASSANDRA-8330:
------------------------------------

            Summary: Confusing Message: ConfigurationException: Found system keyspace files, but they couldn't be loaded!
                Key: CASSANDRA-8330
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8330
            Project: Cassandra
         Issue Type: Bug
        Environment: cassandra 2.0.10
           Reporter: Karl Mueller
           Priority: Minor

I restarted a node which was not responding to cqlsh. It produced this error:

INFO [SSTableBatchOpen:3] 2014-11-17 16:36:50,388 SSTableReader.java (line 223) Opening /data2/data-cassandra/system/local/system-local-jb-304 (133 bytes)
INFO [SSTableBatchOpen:2] 2014-11-17 16:36:50,388 SSTableReader.java (line 223) Opening /data2/data-cassandra/system/local/system-local-jb-305 (80 bytes)
INFO [main] 2014-11-17 16:36:50,393 AutoSavingCache.java (line 114) reading saved cache /data2/cache-cassandra/system-local-KeyCache-b.db
ERROR [main] 2014-11-17 16:36:50,543 CassandraDaemon.java (line 265) Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Found system keyspace files, but they couldn't be loaded!
	at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:554)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)

After deleting the cache, I still got this error:

INFO 16:41:43,718 Opening /data2/data-cassandra/system/local/system-local-jb-304 (133 bytes)
INFO 16:41:43,718 Opening /data2/data-cassandra/system/local/system-local-jb-305 (80 bytes)
ERROR 16:41:43,877 Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Found system keyspace files, but they couldn't be loaded!
	at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:554)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)

I think the node had possibly corrupted one of the files while it was in a bad state. That would be impossible to replicate, so I don't think the underlying bug itself is that actionable. What I did find very confusing was the error message: there's nothing to indicate what the problem actually is. Is it a corrupt file? A valid file with bad information in it? A reference to something that doesn't exist? I fixed it by deleting the system keyspace and starting the node with its token, but many people wouldn't know to do that at all.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
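The fix described above can be sketched roughly as follows. All paths, service commands, and the token placeholder are examples, not taken from the ticket; adapt them to your install, and treat this as an outline rather than a procedure:

```
# Rough sketch of the recovery described above (hypothetical paths).

# 1. Record this node's token from a healthy node before touching anything.
nodetool -h <healthy-node> ring | grep <this-node-ip>

# 2. Stop the node and move the local system keyspace aside
#    (back it up, don't delete outright).
sudo service cassandra stop
mv /data2/data-cassandra/system /data2/data-cassandra/system.bak

# 3. Pin the node to its old token in cassandra.yaml so it rejoins
#    with the same ownership:
#        initial_token: <token recorded in step 1>

# 4. Restart; the system keyspace is rebuilt from scratch.
sudo service cassandra start
```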
[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183674#comment-14183674 ]

Karl Mueller commented on CASSANDRA-8177:
-----------------------------------------

Serial repairs are also terrible for us in 2.0.10; parallel is better.

sequential repair is much more expensive than parallel repair
-------------------------------------------------------------

                Key: CASSANDRA-8177
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8177
            Project: Cassandra
         Issue Type: Bug
           Reporter: Sean Bridges
           Assignee: Yuki Morishita
        Attachments: cassc-week.png, iostats.png

This is with 2.0.10. The attached graph shows IO read/write throughput (as measured with iostat) when doing repairs. The large hump on the left is a sequential repair of one node; the two much smaller peaks on the right are parallel repairs. This is a 3-node cluster using vnodes (I know vnodes on small clusters aren't recommended). Cassandra reports a load of 40 gigs. We noticed a similar problem with a larger cluster.
[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183757#comment-14183757 ]

Karl Mueller commented on CASSANDRA-8177:
-----------------------------------------

{quote}
Sequential repair is meant to be used where validation compaction on all replicas will impact overall cluster performance. If parallel repair does the job, then sticking with it is fine.
{quote}

Why on earth is serial repair the default then?? Parallel is a better default!
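For anyone landing on this thread: in the 2.0.x nodetool, sequential is the default and parallel repair has to be requested explicitly. A usage sketch (flag spelling per the 2.0-era tool; worth double-checking against `nodetool help` on your version):

```
# Sequential (the 2.0.x default): replicas validate one at a time,
# using snapshots, so the load is spread out over a longer window.
nodetool repair my_keyspace

# Parallel: all replicas run validation compaction at once -- faster
# overall, but a bigger simultaneous IO hit across the cluster.
nodetool repair -par my_keyspace
```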
[jira] [Commented] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181487#comment-14181487 ]

Karl Mueller commented on CASSANDRA-7966:
-----------------------------------------

No, I saw this on multiple nodes on two separate clusters.

1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
-----------------------------------------------------------------------------------

                Key: CASSANDRA-7966
                URL: https://issues.apache.org/jira/browse/CASSANDRA-7966
            Project: Cassandra
         Issue Type: Bug
        Environment: JDK 1.7
           Reporter: Karl Mueller
           Assignee: Marcus Eriksson
           Priority: Minor
        Attachments: dev-cass00-log.txt

This happened on a new node when starting 2.0.10 after 1.2.18, with a complete upgradesstables run:

{noformat}
INFO 15:31:11,532 Enqueuing flush of Memtable-compactions_in_progress@1366724594(0/0 serialized/live bytes, 1 ops)
INFO 15:31:11,532 Writing Memtable-compactions_in_progress@1366724594(0/0 serialized/live bytes, 1 ops)
INFO 15:31:11,547 Completed flushing /data2/data-cassandra/system/compactions_in_progress/system-compactions_in_progress-jb-10-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1410993002452, position=164409)
ERROR 15:31:11,550 Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.IllegalArgumentException
	at java.nio.Buffer.limit(Buffer.java:267)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36)
	at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:112)
	at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
	at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:150)
	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
	at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
	at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
	at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
	at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
	at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
	at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:143)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
{noformat}
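For context on the trace above: ByteBufferUtil.readBytesWithShortLength reads a 2-byte length prefix and then slices that many bytes out of the buffer. If the decoded length runs past the end of the buffer (i.e. the on-disk bytes are not a valid composite column name), the internal Buffer.limit() call throws the bare IllegalArgumentException seen here. A small Python sketch of that decoding logic, my reconstruction rather than the actual Java source:

```python
import struct

def read_bytes_with_short_length(buf: bytes, offset: int) -> bytes:
    """Sketch of the composite-component decoding that fails in the
    trace: a 2-byte big-endian length prefix, then that many bytes."""
    (length,) = struct.unpack_from(">H", buf, offset)
    end = offset + 2 + length
    if end > len(buf):
        # The Java code hits the equivalent case inside Buffer.limit(),
        # which is why the exception carries no message: the decoded
        # length overruns the remaining bytes.
        raise ValueError(f"component length {length} overruns buffer")
    return buf[offset + 2:end]

# A well-formed component decodes cleanly:
print(read_bytes_with_short_length(b"\x00\x03abc\x00", 0))  # b'abc'
```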
[jira] [Updated] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Mueller updated CASSANDRA-7966:
------------------------------------

    Attachment: dev-cass00-log.txt

The log from a bit before the exception. I don't think there is much of interest before or after it; things seem normal afterwards.
[jira] [Commented] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174036#comment-14174036 ]

Karl Mueller commented on CASSANDRA-7966:
-----------------------------------------

Yes, I run upgradesstables before every x.y upgrade.
[jira] [Commented] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174044#comment-14174044 ]

Karl Mueller commented on CASSANDRA-7966:
-----------------------------------------

No, I haven't done it. I wasn't aware running upgradesstables *after* an upgrade was standard practice :)
[jira] [Commented] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174121#comment-14174121 ] Karl Mueller commented on CASSANDRA-7966:

OK thanks - let me know if there's more info needed :)

1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

Key: CASSANDRA-7966
URL: https://issues.apache.org/jira/browse/CASSANDRA-7966
Project: Cassandra
Issue Type: Bug
Environment: JDK 1.7
Reporter: Karl Mueller
Assignee: Marcus Eriksson
Priority: Minor

This happened on a new node when starting 2.0.10 after 1.2.18 with a complete upgradesstables run:

{noformat}
INFO 15:31:11,532 Enqueuing flush of Memtable-compactions_in_progress@1366724594(0/0 serialized/live bytes, 1 ops)
INFO 15:31:11,532 Writing Memtable-compactions_in_progress@1366724594(0/0 serialized/live bytes, 1 ops)
INFO 15:31:11,547 Completed flushing /data2/data-cassandra/system/compactions_in_progress/system-compactions_in_progress-jb-10-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1410993002452, position=164409)
ERROR 15:31:11,550 Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.IllegalArgumentException
	at java.nio.Buffer.limit(Buffer.java:267)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36)
	at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:112)
	at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
	at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:150)
	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
	at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
	at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
	at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
	at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
	at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
	at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:143)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
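The trace above bottoms out in Buffer.limit, reached through ByteBufferUtil.readBytesWithShortLength: the two-byte length prefix of a serialized cell name claims more bytes than the buffer actually holds, which is the classic symptom of corrupt or misaligned on-disk data. A minimal sketch of that failure mode (a simplified hypothetical re-implementation, not the actual Cassandra source):

```java
import java.nio.ByteBuffer;

public class ShortLengthDemo {
    // Mirrors the readBytesWithShortLength pattern: a 2-byte unsigned
    // length prefix followed by that many bytes of payload.
    static ByteBuffer readBytesWithShortLength(ByteBuffer bb) {
        int length = bb.getShort() & 0xFFFF;       // unsigned short prefix
        ByteBuffer copy = bb.duplicate();
        copy.limit(copy.position() + length);      // IllegalArgumentException if the
                                                   // claimed length overruns the buffer
        bb.position(bb.position() + length);
        return copy;
    }

    public static void main(String[] args) {
        // A "corrupt" cell name: the prefix claims 500 bytes, but only 3 follow.
        ByteBuffer bad = ByteBuffer.allocate(5);
        bad.putShort((short) 500).put(new byte[3]).flip();
        try {
            readBytesWithShortLength(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: claimed length overruns buffer");
        }
    }
}
```

A well-formed prefix reads cleanly; only a prefix larger than the remaining bytes trips Buffer.limit, matching the bare IllegalArgumentException in the report.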
[jira] [Created] (CASSANDRA-8024) No boot finished or ready message anymore upon startup completion to CLI
Karl Mueller created CASSANDRA-8024:

Summary: No boot finished or ready message anymore upon startup completion to CLI
Key: CASSANDRA-8024
URL: https://issues.apache.org/jira/browse/CASSANDRA-8024
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 2.0.10
Reporter: Karl Mueller
Priority: Trivial

This is trivial, but cassandra logs the following to its log:

{noformat}
...
INFO [main] 2014-09-29 23:10:35,793 CassandraDaemon.java (line 575) No gossip backlog; proceeding
INFO [main] 2014-09-29 23:10:35,979 Server.java (line 156) Starting listening for CQL clients on kaos-cass00.sv.walmartlabs.com/10.93.12.10:9042...
INFO [main] 2014-09-29 23:10:36,048 ThriftServer.java (line 99) Using TFramedTransport with a max frame size of 15728640 bytes.
{noformat}

However, on the command line I only see:

{noformat}
INFO 23:10:30,005 Compacted 4 sstables to [/data2/data-cassandra/system/compactions_in_progress/system-compactions_in_progress-jb-67,]. 1,333 bytes to 962 (~72% of original) in 32ms = 0.028670MB/s. 15 total partitions merged to 12. Partition merge counts were {1:11, 2:2, }
INFO 23:10:35,793 No gossip backlog; proceeding
{noformat}

It would be nice if the "Starting listening for..." message, or some other startup-complete message, went to the command line STDOUT. There used to be one, I think, but there isn't anymore.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
Karl Mueller created CASSANDRA-7966:

Summary: 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
Key: CASSANDRA-7966
URL: https://issues.apache.org/jira/browse/CASSANDRA-7966
Project: Cassandra
Issue Type: Bug
Environment: JDK 1.7
Reporter: Karl Mueller
Priority: Minor

This happened on a new node when starting 2.0.10 after 1.2.18 with a complete upgradesstables run (log output and stack trace identical to the copy quoted above).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7966) 1.2.18 - 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138271#comment-14138271 ] Karl Mueller commented on CASSANDRA-7966:

It's not a new node. This is a very old cluster that's been migrated since the 0.6.x days. It was running 1.2.18, and I'm upgrading it to 2.0.10. upgradesstables was run on every node in it using 1.2.18.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7507) OOM creates unreliable state - die instantly better
Karl Mueller created CASSANDRA-7507:

Summary: OOM creates unreliable state - die instantly better
Key: CASSANDRA-7507
URL: https://issues.apache.org/jira/browse/CASSANDRA-7507
Project: Cassandra
Issue Type: New Feature
Reporter: Karl Mueller
Priority: Minor

I had a cassandra node run OOM. My heap had enough headroom; there was just something that was either a bug or some unfortunate burst of short-term memory utilization. This resulted in the following error:

{noformat}
WARN [StorageServiceShutdownHook] 2014-06-30 09:38:38,251 StorageProxy.java (line 1713) Some hints were not written before shutdown. This is not supposed to happen. You should (a) run repair, and (b) file a bug report
{noformat}

There are no other messages of relevance besides the OOM error about 90 minutes earlier.

My (limited) understanding of the JVM and Cassandra says that when it goes OOM, it will attempt to signal cassandra to shut down cleanly. The problem, in my view, is that in an OOM situation nothing is guaranteed anymore. I believe it's impossible to reliably shut down cleanly at this point, and therefore it's wrong to even try. Yes, ideally things could be written out, flushed to disk, memory messages written, other nodes notified, etc., but is there any reason to believe any of those steps could happen? Would happen? Couldn't bad data be written to disk at this point rather than good data? Some network messages delivered, but not others?

I think Cassandra should have the option (and possibly the default) to kill itself immediately and hard when the OOM condition happens, and not rely on the Java-based clean shutdown process. Cassandra already handles recovery from unclean shutdown, and it's not a big deal. My node, for example, kept running in a sort-of-alive state for 90 minutes, during which who knows what it was doing or not doing.

I don't know enough about the JVM and its options to know the best exact implementation of "die instantly on OOM", but it should be possible either with some flags or with a C library (which doesn't rely on Java memory to do something for which that memory may not be available!).

Short version: a kill -9 of all C* processes in that instance, without needing more Java memory, when OOM is raised.

-- This message was sent by Atlassian JIRA (v6.2#6252)
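For what it's worth, HotSpot does expose a hook along these lines: -XX:OnOutOfMemoryError runs an external command when the first OutOfMemoryError is thrown, and the command is forked by the VM itself rather than executed through Java code, so it does not depend on further heap allocations succeeding. A sketch of what this could look like as cassandra-env.sh additions (hypothetical configuration, not something the ticket or Cassandra ships; flag availability depends on the JVM version in use):

```shell
# Hypothetical additions to cassandra-env.sh.
# Hard-kill the JVM on the first OutOfMemoryError; %p expands to the VM's pid.
JVM_OPTS="$JVM_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""

# On much later JDKs (8u92+), the VM can terminate itself without forking:
# JVM_OPTS="$JVM_OPTS -XX:+ExitOnOutOfMemoryError"
```

This matches the ticket's "kill -9 without needing more Java memory" request, with the caveat that OnOutOfMemoryError fires on the first OOM thrown, which may be earlier or later than the point where the node becomes a zombie.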
[jira] [Commented] (CASSANDRA-7507) OOM creates unreliable state - die instantly better
[ https://issues.apache.org/jira/browse/CASSANDRA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054320#comment-14054320 ] Karl Mueller commented on CASSANDRA-7507:

We're on 1.2.16 at present.

I'm not so much concerned about something causing the cassandra node to run out of memory as about how it handles the situation of being out of memory. I think that, due to the unreliability of Java allocations in a JVM OOM, cassandra should not attempt to do anything - not even shut down cleanly. It would be better if it just died immediately. Or possibly make this behavior an option, for those of us who would rather have a node be down than broken/dying/zombie/corrupting itself.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-7507) OOM creates unreliable state - die instantly better
[ https://issues.apache.org/jira/browse/CASSANDRA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054433#comment-14054433 ] Karl Mueller edited comment on CASSANDRA-7507 at 7/8/14 2:23 AM:

if a bug can cause the clean exit after OOM to fail as expected, then isn't it considered a problem? I guess if I'm considering the value of a clean exit versus possibly staying up, being in a weird state, or not writing the right data to disk, I would always prefer it to die without worrying about a clean exit. As I said, in my opinion, Cassandra already handles dying unexpectedly fine - there's no need to handle it cleanly when there's any risk. If there's no risk of something like 7133 happening (or a similar bug), then sure, clean exit is sensible, but that's clearly not guaranteed. Replaying some logs and then flushing is not a big deal compared to potentially bad data, zombie states, etc. - in my view, at least.

was (Author: kmueller):
if a bug can cause the clean exit after OOM to fail as expected, then isn't it considered a problem? I guess if I'm considering the value of a clean exit versus possibly staying up or being in a weird state, I would always prefer it to die without worrying about a clean exit. As I said, in my opinion, Cassandra already handles dying unexpectedly fine - there's no need to handle it cleanly when there's any risk. If there's no risk of something like 7133 happening (or a similar bug), then sure, clean exit is sensible, but that's clearly not guaranteed. Replaying some logs and then flushing is not a big deal compared to potentially bad data, zombie states, etc. - in my view, at least.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7507) OOM creates unreliable state - die instantly better
[ https://issues.apache.org/jira/browse/CASSANDRA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054433#comment-14054433 ] Karl Mueller commented on CASSANDRA-7507:

if a bug can cause the clean exit after OOM to fail as expected, then isn't it considered a problem? I guess if I'm considering the value of a clean exit versus possibly staying up or being in a weird state, I would always prefer it to die without worrying about a clean exit. As I said, in my opinion, Cassandra already handles dying unexpectedly fine - there's no need to handle it cleanly when there's any risk. If there's no risk of something like 7133 happening (or a similar bug), then sure, clean exit is sensible, but that's clearly not guaranteed. Replaying some logs and then flushing is not a big deal compared to potentially bad data, zombie states, etc. - in my view, at least.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7094) Keyspace name quoting is handled inconsistently and strangely
Karl Mueller created CASSANDRA-7094:

Summary: Keyspace name quoting is handled inconsistently and strangely
Key: CASSANDRA-7094
URL: https://issues.apache.org/jira/browse/CASSANDRA-7094
Project: Cassandra
Issue Type: Bug
Components: Tools
Environment: cassandra 1.2.16
Reporter: Karl Mueller
Priority: Trivial

Keyspaces whose names start with capital letters (and perhaps other things) sometimes require double quotes and sometimes do not. For example, describe works without quotes:

{noformat}
cqlsh> describe keyspace ProductGenomeLocal;
CREATE KEYSPACE ProductGenomeLocal WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '3' };
USE ProductGenomeLocal;
[...]
{noformat}

But use will not:

{noformat}
cqlsh> use ProductGenomeLocal;
Bad Request: Keyspace 'productgenomelocal' does not exist
{noformat}

It seems that quotes should only really be necessary when there are spaces or other symbols that need to be quoted. At the least, the acceptance or rejection of quotes should be consistent.

Other minor annoyance: tab expansion works in use and describe with quotes, but will not work in either without quotes.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-5989) java.lang.OutOfMemoryError: Requested array size exceeds VM limit
[ https://issues.apache.org/jira/browse/CASSANDRA-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917686#comment-13917686 ] Karl Mueller commented on CASSANDRA-5989:

I certainly am not using Gora.

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Key: CASSANDRA-5989
URL: https://issues.apache.org/jira/browse/CASSANDRA-5989
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 1.2.8, Oracle Java(TM) SE Runtime Environment (build 1.7.0_25-b15), RHEL6
Reporter: Karl Mueller

This occurred in one of our nodes today. I don't yet have any helpful information on what was going on beforehand - the logs don't have anything I could see that's tied to it for sure. A few things happened in the logs beforehand: a little bit of standard GC, a bunch of status-logger entries 10 minutes before the crash, and a few nodes going up and down in the gossip.

{noformat}
ERROR [Thrift:7495] 2013-09-03 11:01:12,486 CassandraDaemon.java (line 192) Exception in thread Thread[Thrift:7495,5,main]
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.util.Arrays.copyOf(Arrays.java:2271)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
	at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
	at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:163)
	at org.apache.cassandra.thrift.TBinaryProtocol.writeBinary(TBinaryProtocol.java:69)
	at org.apache.cassandra.thrift.Column.write(Column.java:579)
	at org.apache.cassandra.thrift.CqlRow.write(CqlRow.java:439)
	at org.apache.cassandra.thrift.CqlResult.write(CqlResult.java:602)
	at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.write(Cassandra.java:37895)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:34)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
{noformat}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6549) java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy
Karl Mueller created CASSANDRA-6549:

Summary: java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy
Key: CASSANDRA-6549
URL: https://issues.apache.org/jira/browse/CASSANDRA-6549
Project: Cassandra
Issue Type: Bug
Environment: Sun JDK 1.7, cassandra 1.2.13
Reporter: Karl Mueller

We have been getting many of these since upgrading to 1.2.13:

{noformat}
ERROR [Thrift:3141] 2014-01-03 13:21:34,909 CustomTThreadPoolServer.java (line 217) Error occurred during processing of message.
java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy
	at org.apache.cassandra.db.ConsistencyLevel.localQuorumFor(ConsistencyLevel.java:93)
	at org.apache.cassandra.db.ConsistencyLevel.blockFor(ConsistencyLevel.java:114)
	at org.apache.cassandra.service.ReadCallback.<init>(ReadCallback.java:65)
	at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:880)
	at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:816)
	at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:108)
	at org.apache.cassandra.thrift.CassandraServer.internal_get(CassandraServer.java:413)
	at org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:443)
	at org.apache.cassandra.thrift.Cassandra$Processor$get.getResult(Cassandra.java:3399)
	at org.apache.cassandra.thrift.Cassandra$Processor$get.getResult(Cassandra.java:3387)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
{noformat}

We are running SimpleStrategy. If a client is using an invalid consistency level, it should not cause errors on the server. Is this related to CASSANDRA-6238?

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
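The failing frame, ConsistencyLevel.localQuorumFor, casts the keyspace's replication strategy to NetworkTopologyStrategy without checking its runtime type, so any LOCAL_QUORUM request against a SimpleStrategy keyspace blows up server-side. A stripped-down sketch of the pattern (hypothetical simplified classes, not the actual Cassandra source):

```java
public class LocalQuorumDemo {
    // Stand-ins for Cassandra's replication strategy hierarchy.
    static class AbstractReplicationStrategy { int replicationFactor = 3; }
    static class SimpleStrategy extends AbstractReplicationStrategy { }
    static class NetworkTopologyStrategy extends AbstractReplicationStrategy {
        int replicationFactorFor(String dc) { return replicationFactor; }
    }

    // Mirrors the bug: the cast assumes NTS, with no instanceof check.
    static int localQuorumFor(AbstractReplicationStrategy strategy, String dc) {
        return ((NetworkTopologyStrategy) strategy).replicationFactorFor(dc) / 2 + 1;
    }

    public static void main(String[] args) {
        // Works for NTS: quorum of RF=3 is 2.
        System.out.println(localQuorumFor(new NetworkTopologyStrategy(), "DC1")); // prints 2

        // But a SimpleStrategy keyspace reaches the same cast and throws.
        try {
            localQuorumFor(new SimpleStrategy(), "DC1");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report");
        }
    }
}
```

The later fix direction discussed in CASSANDRA-6545 was to validate or degrade LOCAL_QUORUM for SimpleStrategy instead of letting the cast surface as a server error.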
[jira] [Commented] (CASSANDRA-6545) LOCAL_QUORUM still doesn't work with SimpleStrategy but don't throw a meaningful error message anymore
[ https://issues.apache.org/jira/browse/CASSANDRA-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861961#comment-13861961 ] Karl Mueller commented on CASSANDRA-6545:

We were also bitten by this in my duplicate issue, CASSANDRA-6549.

LOCAL_QUORUM still doesn't work with SimpleStrategy but don't throw a meaningful error message anymore

Key: CASSANDRA-6545
URL: https://issues.apache.org/jira/browse/CASSANDRA-6545
Project: Cassandra
Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Alex Liu
Fix For: 1.2.14

It seems it was the intent of CASSANDRA-6238 originally, though I've tracked it to the commit of CASSANDRA-6309 (f7efaffadace3e344eeb4a1384fa72c73d8422b0 to be precise), but in any case, ConsistencyLevel.validateForWrite does not reject LOCAL_QUORUM when SimpleStrategy is used anymore, yet ConsistencyLevel.blockFor definitively casts the strategy to NTS for LOCAL_QUORUM (in localQuorumFor(), to be precise). Which results in a ClassCastException, as reported by https://datastax-oss.atlassian.net/browse/JAVA-241.

Note that while we're at it, I tend to agree with Aleksey's comment on CASSANDRA-6238: why not make EACH_QUORUM == QUORUM for SimpleStrategy too?

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (CASSANDRA-6433) snapshot race with compaction causes missing link error
Karl Mueller created CASSANDRA-6433:
---------------------------------------

             Summary: snapshot race with compaction causes missing link error
                 Key: CASSANDRA-6433
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6433
             Project: Cassandra
          Issue Type: Bug
         Environment: EL6, Oracle Java 1.7.40
            Reporter: Karl Mueller
            Priority: Minor

Cassandra 1.2.11

When trying to snapshot, I encountered this error. It appears that snapshot doesn't lock the sstable list in a keyspace, which can cause a race condition with compaction. (I think it's compaction, at least.)

[cassandra@dev-cass00 ~]$ cas cluster snap pre-1.2.12
*** dev-cass01 (1) ***
Nodetool command snapshot -t pre-1.2.12 failed!
Output: Requested creating snapshot for: all keyspaces
Exception in thread "main" java.lang.RuntimeException: Tried to hard link to file that does not exist /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
	at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:72)
	at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1095)
	at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1567)
	at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1612)
	at org.apache.cassandra.db.Table.snapshot(Table.java:194)
	at org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2233)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)

-- This message was sent by Atlassian JIRA (v6.1#6144)
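The reported failure (building a snapshot by hard-linking each sstable component) can be reproduced in miniature with java.nio.file: if the source file disappears between the sstable listing and the link attempt, or was never there, the link call fails exactly as in the trace above. This is an illustrative sketch, not Cassandra's API: `snapshotComponent` is a hypothetical helper, and skipping the vanished file is only one possible way to handle it.

```java
import java.io.IOException;
import java.nio.file.*;

public class HardLinkRace {
    // Illustrative helper (not Cassandra's code): hard-link one sstable
    // component into a snapshot directory, tolerating the source file
    // vanishing between the listing and the link attempt.
    static boolean snapshotComponent(Path component, Path snapshotDir) throws IOException {
        try {
            Files.createLink(snapshotDir.resolve(component.getFileName()), component);
            return true;
        } catch (NoSuchFileException vanished) {
            // Source was removed (or never existed) after we decided to
            // snapshot it; without this catch we fail like the report above.
            return false;
        }
    }

    // Exercises both cases and reports the results as "existing,missing".
    static String demo() {
        try {
            Path data = Files.createTempDirectory("data");
            Path snap = Files.createTempDirectory("snap");
            Path summary = Files.createFile(data.resolve("cf-ic-4-Summary.db"));
            boolean present = snapshotComponent(summary, snap);                // source exists: linked
            boolean missing = snapshotComponent(data.resolve("cf-ic-4-Index.db"), snap); // never created
            return present + "," + missing;
        } catch (IOException e) {
            return "error:" + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // "true,false"
    }
}
```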
[jira] [Commented] (CASSANDRA-6433) snapshot race with compaction causes missing link error
[ https://issues.apache.org/jira/browse/CASSANDRA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837003#comment-13837003 ]

Karl Mueller commented on CASSANDRA-6433:
-----------------------------------------

Update: this doesn't appear to be related to a race during the snapshot. This file appears to be missing entirely. Got the error in a 2nd snapshot attempt, and:

[cassandra@dev-cass01 ~]$ ls -la /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
[jira] [Commented] (CASSANDRA-6433) snapshot race with compaction causes missing link error
[ https://issues.apache.org/jira/browse/CASSANDRA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837005#comment-13837005 ]

Karl Mueller commented on CASSANDRA-6433:
-----------------------------------------

Bah, short paste:

[cassandra@dev-cass01 ~]$ ls -la /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
ls: cannot access /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db: No such file or directory

I need to restart it so I can snapshot it and run my upgrade soon.
[jira] [Created] (CASSANDRA-6435) nodetool outputs xss and jamm errors in 1.2.12
Karl Mueller created CASSANDRA-6435:
---------------------------------------

             Summary: nodetool outputs xss and jamm errors in 1.2.12
                 Key: CASSANDRA-6435
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6435
             Project: Cassandra
          Issue Type: Bug
            Reporter: Karl Mueller
            Priority: Minor

Since 1.2.12, just running nodetool produces this output. Probably this is related to CASSANDRA-6273. It's unclear to me whether jamm is actually not being loaded, but clearly nodetool should not be producing this output, which likely comes from cassandra-env.sh.

[cassandra@dev-cass00 cassandra]$ /data2/cassandra/bin/nodetool ring
xss =  -ea -javaagent:/data2/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms14G -Xmx14G -Xmn1G -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
==========
Address      Rack   Status  State   Load       Owns    Token
                                                       170141183460469231731687303715884105727
10.93.15.10  rack1  Up      Normal  123.82 GB  20.00%  34028236692093846346337460743176821145
10.93.15.11  rack1  Up      Normal  124 GB     20.00%  68056473384187692692674921486353642290
10.93.15.12  rack1  Up      Normal  123.97 GB  20.00%  102084710076281539039012382229530463436
10.93.15.13  rack1  Up      Normal  124.03 GB  20.00%  136112946768375385385349842972707284581
10.93.15.14  rack1  Up      Normal  123.93 GB  20.00%  170141183460469231731687303715884105727

ERROR 16:20:01,408 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
[jira] [Commented] (CASSANDRA-6433) snapshot race with compaction causes missing link error
[ https://issues.apache.org/jira/browse/CASSANDRA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837136#comment-13837136 ]

Karl Mueller commented on CASSANDRA-6433:
-----------------------------------------

Yes, very likely.
[jira] [Commented] (CASSANDRA-6435) nodetool outputs xss and jamm errors in 1.2.12
[ https://issues.apache.org/jira/browse/CASSANDRA-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837280#comment-13837280 ]

Karl Mueller commented on CASSANDRA-6435:
-----------------------------------------

If this helps:

[root@dev-cass00 ~]# java -version
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
[jira] [Commented] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin
[ https://issues.apache.org/jira/browse/CASSANDRA-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792400#comment-13792400 ]

Karl Mueller commented on CASSANDRA-6092:
-----------------------------------------

This doesn't make sense. You change a compaction strategy; it should start to take effect. You shouldn't have to do anything else. This is a bug, plain and simple.

For one thing, the users most likely to want leveled compaction are people like me who compact nightly to get rid of old update rows. We're the most likely ones to have a single sstable. This is not a bizarre corner case, but basic functionality!

Leveled Compaction after ALTER TABLE creates pending but does not actually begin
--------------------------------------------------------------------------------

                 Key: CASSANDRA-6092
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6092
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra 1.2.10, Oracle Java 1.7.0_u40, RHEL6.4
            Reporter: Karl Mueller
            Assignee: Daniel Meyer

Running Cassandra 1.2.10. N=5, RF=3.

On this Column Family (ProductGenomeDev/Node), it's been major compacted into a single, large sstable. There's no activity on the table at the time of the ALTER command. I changed it to Leveled Compaction with the command below:

cqlsh:ProductGenomeDev> alter table Node with compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

Log entries confirm the change happened:

[...] column_metadata={},compactionStrategyClass=class org.apache.cassandra.db.compaction.LeveledCompactionStrategy,compactionStrategyOptions={sstable_size_in_mb=160} [...]

nodetool compactionstats shows pending compactions, but there's no activity:

pending tasks: 750

12 hours later, nothing has happened; the same number is pending. The expectation would be that compactions proceed immediately, converting everything to Leveled Compaction as soon as the ALTER TABLE command goes through.

I tried a simple write into the CF, and then flushed the nodes. This kicked off compaction on 3 nodes (RF=3):

cqlsh:ProductGenomeDev> insert into Node (key, column1, value) values ('test123', 'test123', 'test123');
cqlsh:ProductGenomeDev> select * from Node where key = 'test123';

 key     | column1 | value
---------+---------+---------
 test123 | test123 | test123

cqlsh:ProductGenomeDev> delete from Node where key = 'test123';

After a flush on every node, now I see:

[cassandra@dev-cass00 ~]$ cas exec nt compactionstats
*** dev-cass00 (0) ***
pending tasks: 750
Active compaction remaining time : n/a
*** dev-cass04 (0) ***
pending tasks: 752
  compaction type   keyspace           column family   completed    total          unit    progress
       Compaction   ProductGenomeDev   Node            341881       643290447928   bytes   0.53%
Active compaction remaining time : n/a
*** dev-cass01 (0) ***
pending tasks: 750
Active compaction remaining time : n/a
*** dev-cass02 (0) ***
pending tasks: 751
  compaction type   keyspace           column family   completed    total          unit    progress
       Compaction   ProductGenomeDev   Node            3374975141   642764512481   bytes   0.53%
Active compaction remaining time : n/a
*** dev-cass03 (0) ***
pending tasks: 751
  compaction type   keyspace           column family   completed    total          unit    progress
       Compaction   ProductGenomeDev   Node            3591320948   643017643573   bytes   0.56%
Active compaction remaining time : n/a

After inserting and deleting more columns, enough that all nodes had new data, and flushing, compactions are now proceeding on all nodes.
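The behavior described above — a non-zero pending estimate that only turns into running compactions once a flush produces a new sstable — can be modeled in a few lines. This is an illustrative toy, not Cassandra's actual scheduler: the assumption (consistent with the report) is that background compaction is only submitted when the sstable set changes, so ALTER TABLE alone recomputes the pending estimate without submitting any work.

```java
public class CompactionTrigger {
    // Toy model of the reported behavior (not Cassandra's scheduler):
    // ALTER TABLE recomputes the pending estimate, but compaction is
    // only submitted in response to a newly flushed sstable.
    int pendingTasks = 0;
    boolean running = false;

    void alterStrategy()    { pendingTasks = 750; }                   // estimate recomputed...
    void flushNewSSTable()  { if (pendingTasks > 0) running = true; } // ...but work starts here

    // Reports "pending/running" after ALTER and again after a flush.
    static String demo() {
        CompactionTrigger cf = new CompactionTrigger();
        cf.alterStrategy();
        String afterAlter = cf.pendingTasks + "/" + cf.running;  // nothing runs yet
        cf.flushNewSSTable();
        String afterFlush = cf.pendingTasks + "/" + cf.running;  // a flush kicks it off
        return afterAlter + " " + afterFlush;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // "750/false 750/true"
    }
}
```

Under this model the reporter's workaround (insert, delete, flush on every node) works precisely because it forces a new sstable onto each node.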
[jira] [Commented] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin
[ https://issues.apache.org/jira/browse/CASSANDRA-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791980#comment-13791980 ]

Karl Mueller commented on CASSANDRA-6092:
-----------------------------------------

I'll try this work-around. A lot easier than what I did by inserting, deleting, and flushing data!
[jira] [Created] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin
Karl Mueller created CASSANDRA-6092:
---------------------------------------

             Summary: Leveled Compaction after ALTER TABLE creates pending but does not actually begin
                 Key: CASSANDRA-6092
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6092
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra 1.2.10, Oracle Java 1.7.0_u40, RHEL6.4
            Reporter: Karl Mueller

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin
[ https://issues.apache.org/jira/browse/CASSANDRA-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Mueller updated CASSANDRA-6092:
------------------------------------
    Since Version: 1.2.10
[jira] [Created] (CASSANDRA-5989) java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Karl Mueller created CASSANDRA-5989:
---
Summary: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Key: CASSANDRA-5989
URL: https://issues.apache.org/jira/browse/CASSANDRA-5989
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 1.2.8, Oracle Java(TM) SE Runtime Environment (build 1.7.0_25-b15), RHEL6
Reporter: Karl Mueller

This occurred in one of our nodes today. I don't yet have any helpful information on what was going on beforehand - the logs contain nothing I can tie to it for sure. A few things happened in the logs beforehand: a little standard GC, a bunch of status-logger entries 10 minutes before the crash, and a few nodes going up and down in gossip.

ERROR [Thrift:7495] 2013-09-03 11:01:12,486 CassandraDaemon.java (line 192) Exception in thread Thread[Thrift:7495,5,main]
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.util.Arrays.copyOf(Arrays.java:2271)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
	at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
	at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:163)
	at org.apache.cassandra.thrift.TBinaryProtocol.writeBinary(TBinaryProtocol.java:69)
	at org.apache.cassandra.thrift.Column.write(Column.java:579)
	at org.apache.cassandra.thrift.CqlRow.write(CqlRow.java:439)
	at org.apache.cassandra.thrift.CqlResult.write(CqlResult.java:602)
	at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.write(Cassandra.java:37895)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:34)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
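The stack trace points at ByteArrayOutputStream.grow: Thrift's framed transport buffers the whole response in a single byte array that doubles on growth, and HotSpot refuses array allocations slightly below Integer.MAX_VALUE. The sketch below (an illustration, not Cassandra or JDK source; the 32-byte start and the MAX_ARRAY_SIZE cutoff mirror common JDK behavior but are assumptions here) shows how one oversized CQL result can trigger a doubling request beyond the VM limit:

```java
// Sketch of ByteArrayOutputStream-style doubling growth (simplified, not the
// actual JDK grow()/hugeCapacity() code). A single large Thrift frame makes
// the buffer double past the JVM's maximum array length.
public class ArrayGrowthSketch {
    // HotSpot rejects array allocations above roughly this size.
    static final long MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Capacity reached by repeatedly doubling a 32-byte buffer until it can
    // hold `needed` bytes, as the stream's growth policy does.
    static long capacityFor(long needed) {
        long cap = 32;
        while (cap < needed) {
            cap <<= 1; // double on each growth step
        }
        return cap;
    }

    public static void main(String[] args) {
        long needed = 1_300_000_000L; // a hypothetical ~1.3 GB serialized CQL result
        long cap = capacityFor(needed);
        // The final doubling overshoots the VM's array-size limit:
        System.out.println(cap > MAX_ARRAY_SIZE);
    }
}
```

So the error is less about total heap exhaustion than about one oversized allocation: a buffer just over 1 GiB asks for more than 2 GiB on its next growth step. Paging the query or capping result size avoids it.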
[jira] [Created] (CASSANDRA-5990) Hinted Handoff: java.lang.ArithmeticException: / by zero
Karl Mueller created CASSANDRA-5990:
---
Summary: Hinted Handoff: java.lang.ArithmeticException: / by zero
Key: CASSANDRA-5990
URL: https://issues.apache.org/jira/browse/CASSANDRA-5990
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 1.2.8, Oracle Java 1.7.0_25-b15, RHEL6
Reporter: Karl Mueller
Priority: Minor

This node was down for a few hours. When I brought it back up, I saw this error in the logs. I'm not sure whether it was receiving or sending hinted handoffs.

INFO [HintedHandoff:1] 2013-09-09 14:41:04,020 HintedHandOffManager.java (line 292) Started hinted handoff for host: 42bba02f-3088-4be1-8cb2-748a6f15e15d with IP: /10.93.12.14
ERROR [HintedHandoff:1] 2013-09-09 14:41:04,024 CassandraDaemon.java (line 192) Exception in thread Thread[HintedHandoff:1,1,main]
java.lang.ArithmeticException: / by zero
	at org.apache.cassandra.db.HintedHandOffManager.calculatePageSize(HintedHandOffManager.java:441)
	at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:299)
	at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:278)
	at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
	at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:497)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
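A plausible reading of the trace: calculatePageSize derives a hint page size by dividing a target batch size by the mean size of stored hint columns, so a mean of zero (e.g. a hints column family with no live statistics after the outage) divides by zero. The sketch below is a hypothetical simplification, not the actual Cassandra method; the constant and names are illustrative. It shows the failure and the obvious clamp-style guard:

```java
// Hypothetical simplification of a page-size calculation that divides by a
// mean column count (NOT the real HintedHandOffManager code). A zero mean
// reproduces the ArithmeticException; clamping the divisor avoids it.
public class PageSizeSketch {
    static final int TARGET_BATCH = 512; // illustrative target, not a Cassandra constant

    // Unguarded version: throws ArithmeticException when meanColumnCount == 0.
    static int calculatePageSizeUnsafe(int meanColumnCount) {
        return TARGET_BATCH / meanColumnCount;
    }

    // Guarded version: a zero mean falls back to a divisor of 1.
    static int calculatePageSize(int meanColumnCount) {
        return TARGET_BATCH / Math.max(1, meanColumnCount);
    }

    public static void main(String[] args) {
        System.out.println(calculatePageSize(16)); // normal case
        System.out.println(calculatePageSize(0));  // no exception with the guard
    }
}
```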
[jira] [Commented] (CASSANDRA-5068) CLONE - Once a host has been hinted to, log messages for it repeat every 10 mins even if no hints are delivered
[ https://issues.apache.org/jira/browse/CASSANDRA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557574#comment-13557574 ]
Karl Mueller commented on CASSANDRA-5068:

I'm also seeing this, running 1.1.8 :)

CLONE - Once a host has been hinted to, log messages for it repeat every 10 mins even if no hints are delivered
---
Key: CASSANDRA-5068
URL: https://issues.apache.org/jira/browse/CASSANDRA-5068
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.6, 1.2.0
Environment: cassandra 1.1.6, java 1.6.0_30
Reporter: Peter Haggerty
Assignee: Brandon Williams
Priority: Minor
Labels: hinted, hintedhandoff, phantom

We have 0-row hinted handoffs every 10 minutes like clockwork. This impacts our ability to monitor the cluster by adding persistent noise to the handoff metric. Previous mentions of this issue are here: http://www.mail-archive.com/user@cassandra.apache.org/msg25982.html

The hinted handoffs can be scrubbed away with "nodetool -h 127.0.0.1 scrub system HintsColumnFamily", but they return anywhere from a few minutes to multiple hours later. They started to appear after an upgrade to 1.1.6 and haven't gone away despite rolling cleanups, rolling restarts, multiple rounds of scrubbing, etc.

A few things we've noticed about the handoffs:
1. The phantom handoff endpoint changes after a non-zero handoff comes through.
2. Sometimes a non-zero handoff will be immediately followed by an off-schedule phantom handoff to the endpoint the phantom had been using before.
3. The sstable2json output seems to include multiple sub-sections for each handoff with the same deletedAt information.

The phantom handoff endpoint changes after a non-zero handoff comes through:
INFO [HintedHandoff:1] 2012-12-11 06:57:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
INFO [HintedHandoff:1] 2012-12-11 07:07:35,092 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
INFO [HintedHandoff:1] 2012-12-11 07:07:37,915 HintedHandOffManager.java (line 392) Finished hinted handoff of 1058 rows to endpoint /10.10.10.2
INFO [HintedHandoff:1] 2012-12-11 07:17:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
INFO [HintedHandoff:1] 2012-12-11 07:27:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2

Sometimes a non-zero handoff will be immediately followed by an off-schedule phantom handoff to the endpoint the phantom had been using before:
INFO [HintedHandoff:1] 2012-12-12 21:47:39,335 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
INFO [HintedHandoff:1] 2012-12-12 21:57:39,335 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
INFO [HintedHandoff:1] 2012-12-12 22:07:43,319 HintedHandOffManager.java (line 392) Finished hinted handoff of 1416 rows to endpoint /10.10.10.4
INFO [HintedHandoff:1] 2012-12-12 22:07:43,320 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
INFO [HintedHandoff:1] 2012-12-12 22:17:39,357 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
INFO [HintedHandoff:1] 2012-12-12 22:27:39,337 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4

The first few entries from one of the json files:
{ 0aaa: {
  ccf5dc203a2211e2e154da71a9bb: { deletedAt: -9223372036854775808, subColumns: [] },
  ccf603303a2211e2e154da71a9bb: { deletedAt: -9223372036854775808,
subColumns: [] },
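The deletedAt value in that json, -9223372036854775808, is Long.MIN_VALUE, which Cassandra uses as the "no deletion recorded" sentinel for a row's markedForDeleteAt timestamp. My reading (an assumption from the sentinel value, not confirmed in the ticket) is that those repeated sub-sections are live hint rows rather than repeated deletions:

```java
// Sketch: interpreting the deletedAt values from the sstable2json dump.
// NO_DELETION_TIME mirrors Cassandra's Long.MIN_VALUE sentinel; the method
// name here is illustrative, not the actual Cassandra API.
public class DeletionTimeSketch {
    static final long NO_DELETION_TIME = Long.MIN_VALUE;

    // A row carries a tombstone only if markedForDeleteAt was ever set.
    static boolean hasTombstone(long markedForDeleteAt) {
        return markedForDeleteAt != NO_DELETION_TIME;
    }

    public static void main(String[] args) {
        long deletedAt = -9223372036854775808L; // value seen in the json dump
        System.out.println(hasTombstone(deletedAt)); // false: no deletion recorded
    }
}
```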
[jira] [Created] (CASSANDRA-5088) Major compaction IOException in 1.1.8
Karl Mueller created CASSANDRA-5088:
---
Summary: Major compaction IOException in 1.1.8
Key: CASSANDRA-5088
URL: https://issues.apache.org/jira/browse/CASSANDRA-5088
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.1.8
Reporter: Karl Mueller

Upgraded 1.1.6 to 1.1.8. Now I'm trying to do a major compaction, and seeing this:

ERROR [CompactionExecutor:129] 2012-12-22 10:33:44,217 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:129,1,RMI Runtime]
java.io.IOError: java.io.IOException: Bad file descriptor
	at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:65)
	at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:195)
	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:298)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Bad file descriptor
	at sun.nio.ch.FileDispatcher.preClose0(Native Method)
	at sun.nio.ch.FileDispatcher.preClose(FileDispatcher.java:59)
	at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:96)
	at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
	at java.io.FileInputStream.close(FileInputStream.java:258)
	at org.apache.cassandra.io.compress.CompressedRandomAccessReader.close(CompressedRandomAccessReader.java:131)
	at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:121)
	at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
	at java.io.RandomAccessFile.close(RandomAccessFile.java:541)
	at org.apache.cassandra.io.util.RandomAccessReader.close(RandomAccessReader.java:224)
	at org.apache.cassandra.io.compress.CompressedRandomAccessReader.close(CompressedRandomAccessReader.java:130)
	at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:89)
	at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:61)
	... 9 more
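The nested close() calls in the trace (MergeIterator → SSTableScanner → reader → file channel) suggest the same underlying descriptor being closed along two paths, which is exactly when preClose reports "Bad file descriptor". One defensive pattern, sketched here as a generic wrapper rather than the actual fix that went into Cassandra, is to make close() idempotent so only the first caller reaches the descriptor:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Generic idempotent-close wrapper (illustrative pattern, not Cassandra's
// fix): only the first close() reaches the delegate, so a second close along
// another code path cannot hit an already-released file descriptor.
public class IdempotentCloser implements Closeable {
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public IdempotentCloser(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void close() throws IOException {
        if (closed.compareAndSet(false, true)) {
            delegate.close(); // first caller wins; later calls are no-ops
        }
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger closes = new AtomicInteger();
        IdempotentCloser c = new IdempotentCloser(closes::incrementAndGet);
        c.close();
        c.close(); // would double-close the descriptor without the guard
        System.out.println(closes.get()); // 1
    }
}
```

Note that Closeable's own contract already asks implementations to tolerate repeated close() calls; wrapping an implementation that doesn't is a workaround, not a substitute for fixing the double-close.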
[jira] [Commented] (CASSANDRA-5088) Major compaction IOException in 1.1.8
[ https://issues.apache.org/jira/browse/CASSANDRA-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538905#comment-13538905 ]
Karl Mueller commented on CASSANDRA-5088:

What would be helpful to debug? The same environment worked fine on 1.1.6 - I didn't try 1.1.7. I'm running a somewhat old JDK - I could try upgrading it.
[jira] [Commented] (CASSANDRA-4802) Regular startup log has confusing Bootstrap/Replace/Move completed! without bootstrap, replace, or move
[ https://issues.apache.org/jira/browse/CASSANDRA-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476509#comment-13476509 ]
Karl Mueller commented on CASSANDRA-4802:

Bootstrap means something specific in Cassandra - it implies some data has streamed in. I think "Startup completed" would be great. If there IS a bootstrap/replace/move, then the message ought to specify which has happened and that the node is ready now (if that's easy to do) :)

Regular startup log has confusing Bootstrap/Replace/Move completed! without bootstrap, replace, or move
---
Key: CASSANDRA-4802
URL: https://issues.apache.org/jira/browse/CASSANDRA-4802
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.12
Environment: RHEL6, JDK1.6
Reporter: Karl Mueller
Assignee: Vijay
Priority: Trivial

A regular startup completes successfully, but it ends with a confusing message:

INFO 15:19:29,137 Bootstrap/Replace/Move completed! Now serving reads.

This happens despite no bootstrap, replace, or move. While purely cosmetic, it makes you wonder what the node just did - did it just bootstrap?! It should simply read something like "Startup completed! Now serving reads" unless the node actually performed one of the actions in the message.

Complete log at the end:
INFO 15:13:30,522 Log replay complete, 6274 replayed mutations
INFO 15:13:30,527 Cassandra version: 1.0.12
INFO 15:13:30,527 Thrift API version: 19.20.0
INFO 15:13:30,527 Loading persisted ring state
INFO 15:13:30,541 Starting up server gossip
INFO 15:13:30,542 Enqueuing flush of Memtable-LocationInfo@1828864224(29/36 serialized/live bytes, 1 ops)
INFO 15:13:30,543 Writing Memtable-LocationInfo@1828864224(29/36 serialized/live bytes, 1 ops)
INFO 15:13:30,550 Completed flushing /data2/data-cassandra/system/LocationInfo-hd-274-Data.db (80 bytes)
INFO 15:13:30,563 Starting Messaging Service on port 7000
INFO 15:13:30,571 Using saved token 31901471898837980949691369446728269823
INFO 15:13:30,572 Enqueuing flush of Memtable-LocationInfo@294410307(53/66 serialized/live bytes, 2 ops)
INFO 15:13:30,573 Writing Memtable-LocationInfo@294410307(53/66 serialized/live bytes, 2 ops)
INFO 15:13:30,579 Completed flushing /data2/data-cassandra/system/LocationInfo-hd-275-Data.db (163 bytes)
INFO 15:13:30,581 Node kaos-cass02.xxx/1.2.3.4 state jump to normal
INFO 15:13:30,598 Bootstrap/Replace/Move completed! Now serving reads.
INFO 15:13:30,600 Will not load MX4J, mx4j-tools.jar is not in the classpath
[jira] [Created] (CASSANDRA-4802) Regular startup log has confusing Bootstrap/Replace/Move completed! without bootstrap, replace, or move
Karl Mueller created CASSANDRA-4802:
---
Summary: Regular startup log has confusing Bootstrap/Replace/Move completed! without bootstrap, replace, or move
Key: CASSANDRA-4802
URL: https://issues.apache.org/jira/browse/CASSANDRA-4802
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.12
Environment: RHEL6, JDK1.6
Reporter: Karl Mueller
Priority: Trivial
[jira] [Commented] (CASSANDRA-4446) nodetool drain sometimes doesn't mark commitlog fully flushed
[ https://issues.apache.org/jira/browse/CASSANDRA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471960#comment-13471960 ]
Karl Mueller commented on CASSANDRA-4446:

Also seeing this in an upgrade from 1.0.xx to 1.1.15:

INFO 16:29:17,486 completed pre-loading (3 keys) key cache.
INFO 16:29:17,495 Replaying /data2/commit-cassandra/CommitLog-1349727956484.log
INFO 16:29:17,503 Replaying /data2/commit-cassandra/CommitLog-1349727956484.log
INFO 16:29:18,495 GC for ParNew: 3506 ms for 4 collections, 1963062320 used; max is 17095983104
INFO 16:29:18,498 Finished reading /data2/commit-cassandra/CommitLog-1349727956484.log
INFO 16:29:18,499 Log replay complete, 0 replayed mutations

This is a standard upgrade process which includes a drain.

nodetool drain sometimes doesn't mark commitlog fully flushed
---
Key: CASSANDRA-4446
URL: https://issues.apache.org/jira/browse/CASSANDRA-4446
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.10
Environment: ubuntu 10.04 64bit, Linux HOSTNAME 2.6.32-345-ec2 #48-Ubuntu SMP Wed May 2 19:29:55 UTC 2012 x86_64 GNU/Linux, Sun JVM, cassandra 1.0.10 installed from the apache deb
Reporter: Robert Coli
Attachments: cassandra.1.0.10.replaying.log.after.exception.during.drain.txt

I recently wiped a customer's QA cluster. I drained each node and verified that they were drained. When I restarted the nodes, I saw the commitlog replay create a memtable and then flush it. I have attached a sanitized log snippet from a representative node at the time. It appears to show the following:
1) Drain begins
2) Drain triggers flush
3) Flush triggers compaction
4) StorageService logs DRAINED message
5) compaction thread excepts
6) on restart, same CF creates a memtable
7) and then flushes it [1]

The columnfamily involved in the replay in 7) is the CF for which the compaction thread excepted in 5). This seems to suggest a timing issue whereby the exception in 5) prevents the flush in 3) from marking all the segments flushed, causing them to replay after restart. In case it might be relevant, I did an online change of compaction strategy from Leveled to SizeTiered during the uptime period preceding this drain.

[1] Isn't commitlog replay supposed not to automatically trigger a flush in modern cassandra?
[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401681#comment-13401681 ]
Karl Mueller commented on CASSANDRA-4347:

Brandon, since you can reproduce, do you still want the logs? I think I still have them if needed.

IP change of node requires assassinate to really remove old IP
---
Key: CASSANDRA-4347
URL: https://issues.apache.org/jira/browse/CASSANDRA-4347
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.10
Environment: RHEL6, 64bit
Reporter: Karl Mueller
Assignee: Brandon Williams
Priority: Minor
Attachments: LocationInfo-hd-279-Data.db, dev-cass-post-assassinate-gossipinfo.txt, kaos-cass00-gossipinfo-postmove.txt, kaos-cass03-gossipinfo-postmove.txt

In changing the IP addresses of nodes one-by-one, the node successfully moves itself and its token. Everything works properly. However, the node which had its IP changed (but NOT other nodes in the ring) continues to have some type of state associated with the old IP and produces log messages like this:

INFO [GossipStage:1] 2012-06-15 15:25:01,490 Gossiper.java (line 838) Node /10.12.9.157 is now part of the cluster
INFO [GossipStage:1] 2012-06-15 15:25:01,490 Gossiper.java (line 804) InetAddress /10.12.9.157 is now UP
INFO [GossipStage:1] 2012-06-15 15:25:01,491 StorageService.java (line 1017) Nodes /10.12.9.157 and dev-cass01.sv.walmartlabs.com/10.93.15.11 have the same token 113427455640312821154458202477256070484. Ignoring /10.12.9.157
INFO [GossipTasks:1] 2012-06-15 15:25:11,373 Gossiper.java (line 818) InetAddress /10.12.9.157 is now dead.
INFO [GossipTasks:1] 2012-06-15 15:25:32,380 Gossiper.java (line 632) FatClient /10.12.9.157 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-06-15 15:26:32,490 Gossiper.java (line 838) Node /10.12.9.157 is now part of the cluster
INFO [GossipStage:1] 2012-06-15 15:26:32,491 Gossiper.java (line 804) InetAddress /10.12.9.157 is now UP
INFO [GossipStage:1] 2012-06-15 15:26:32,491 StorageService.java (line 1017) Nodes /10.12.9.157 and dev-cass01.sv.walmartlabs.com/10.93.15.11 have the same token 113427455640312821154458202477256070484. Ignoring /10.12.9.157
INFO [GossipTasks:1] 2012-06-15 15:26:42,402 Gossiper.java (line 818) InetAddress /10.12.9.157 is now dead.
INFO [GossipTasks:1] 2012-06-15 15:27:03,410 Gossiper.java (line 632) FatClient /10.12.9.157 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-06-15 15:28:04,533 Gossiper.java (line 838) Node /10.12.9.157 is now part of the cluster

Other nodes do NOT have the old IP showing up in logs. It's only the node that moved. The old IP doesn't show up in ring anywhere or in any other fashion. The cluster seems to be fully operational, so I think it's just a cleanup issue.
[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401685#comment-13401685 ]
Karl Mueller commented on CASSANDRA-4347:

My log's more special! ;) Just kidding. My opinion on the urgency of the bug depends on how long 1.0.x will be around. It's an annoying, in-your-face sort of bug, but it doesn't seem to cause any problem beyond a lot of bad log entries. Still, I could see people running into it and then having to find the workaround. Perhaps in the interim some kind of log message could be added suggesting assassinate? It should be easy to detect: "Oh, there are two IPs for this one token. Is one old?"
[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401700#comment-13401700 ]
Karl Mueller commented on CASSANDRA-4347:

Actually, this morning I started to see the same messages again, approximately 3 days later. Related to https://issues.apache.org/jira/browse/CASSANDRA-2961 somehow? Some people on IRC thought so, maybe. Assassinate is NOT removing them successfully anymore.
[jira] [Updated] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Mueller updated CASSANDRA-4347:
Attachment: LocationInfo-hd-279-Data.db

LocationInfo file attached from after the node is re-IP'd and rejoins the cluster. This is in the problem state. I also have system snapshots of before the move and after the assassinate, as well as of a node that isn't moving (same snapshots), if you want them.
[jira] [Updated] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Mueller updated CASSANDRA-4347:

Attachment: kaos-cass03-gossipinfo-postmove.txt
            kaos-cass00-gossipinfo-postmove.txt

This is the gossipinfo from two points of view; both postmove.txt files are from after the node has changed IPs. kaos-cass00 is the node which moved IPs: the old IP is 10.12.8.97, the new IP is 10.93.12.10. kaos-cass03 is a node which did not move; its IP, if needed, is 10.12.8.87. I also have the gossipinfo from after the assassinate if needed.
[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396215#comment-13396215 ]

Karl Mueller commented on CASSANDRA-4347:

You mean before I did the assassinate? All of the nodes at this point are post-assassinate. I'm attaching the gossipinfo from the 3-node cluster in its current state, which is showing some old IPs. (I thought assassinate went cross-cluster?) I'm moving another cluster this week, and I'll try to grab a gossipinfo and the system tables during the transition from that set. I expect it will have the same issues.
[jira] [Updated] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Mueller updated CASSANDRA-4347:

Attachment: dev-cass-post-assassinate-gossipinfo.txt

This file contains gossipinfo from the 3-node cluster we already moved, after assassinate has run on each node for its own old IP. The new IPs are all 10.93.15.xx and the old IPs are all 10.12.x.x. The old IPs are:

dev-cass00 - 10.12.9.160
dev-cass01 - 10.12.9.157
dev-cass02 - 10.12.9.33

I believe dev-cass00 has restarted since the assassinate, but the others haven't. The new IPs are:

dev-cass00 - 10.93.15.10
dev-cass01 - 10.93.15.11
dev-cass02 - 10.93.15.12
[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396251#comment-13396251 ]

Karl Mueller commented on CASSANDRA-4347:

OK, I'll grab one this week when we do the move. I assume you want the LocationInfo CF, or do you want the entire system keyspace?
[jira] [Created] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP
Karl Mueller created CASSANDRA-4347:
---
Summary: IP change of node requires assassinate to really remove old IP
Key: CASSANDRA-4347
URL: https://issues.apache.org/jira/browse/CASSANDRA-4347
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.10
Environment: RHEL6, 64bit
Reporter: Karl Mueller
Priority: Minor
[jira] [Commented] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files
[ https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263492#comment-13263492 ]

Karl Mueller commented on CASSANDRA-4182:

Yes, it maxes one CPU. That's what one CPU can do.

multithreaded compaction very slow with large single data file and a few tiny data files

Key: CASSANDRA-4182
URL: https://issues.apache.org/jira/browse/CASSANDRA-4182
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.9
Environment: Redhat Sun JDK 1.6.0_20-b02
Reporter: Karl Mueller

Turning on multithreaded compaction makes compaction take nearly twice as long in our environment, which includes one very large SSTable and a few smaller ones, relative to either 0.8.x with MT turned off or 1.0.x with MT turned off. compaction_throughput_mb_per_sec is set to 0. We currently compact about 500 GB of data nightly due to overwrites. (LevelDB-style compaction will probably be enabled on the busy CFs once 1.0.x is rolled out completely.)

The time it takes to do the compaction:

451m13.284s (multithreaded)
273m58.740s (multithreaded disabled)

Our nodes run on SSDs and therefore have high read and write rates available to them. The primary CF they're compacting right now, with most of the data, is localized to one very large file (~300+ GB) and a few tiny files (1-10 GB), since the CF has become far less active. I would expect multithreaded compaction to be no worse than single-threaded compaction, or perhaps to cost more CPU for the same performance, but it's half the speed with the same CPU usage, or more. I have two graphs available from testing 2 or 3 compactions which demonstrate some interesting characteristics. 1.0.9 was installed on the 21st with MT turned on. Prior data is 0.8.7 with MT turned off, but 1.0.9 with MT turned off seems to perform as well as 0.8.7.

http://www.xney.com/temp/cass-irq.png (interrupts)
http://www.xney.com/temp/cass-iostat.png (io bandwidth of disks)

These show a large increase in rescheduling interrupts and only half the bandwidth used on the disks. I suspect some kind of thread thrashing.
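The symptom in the graphs (more rescheduling interrupts, less disk bandwidth) is consistent with per-unit thread coordination costing more than the parallelism saves. A toy cost model can illustrate the effect; all numbers here are made up for demonstration and this is not Cassandra's compaction code:

```python
# Toy model: single-threaded vs multithreaded compaction time when
# coordination (handoff between producer/consumer threads) adds a fixed
# cost per unit of work. Purely illustrative; parameters are hypothetical.

def single_threaded(units, work_per_unit):
    # One thread does all the work with no coordination overhead.
    return units * work_per_unit

def multithreaded(units, work_per_unit, threads, handoff_per_unit):
    # Work divides across threads, but every unit pays a handoff cost
    # (queueing and wakeups -> the rescheduling interrupts in the graphs).
    return units * work_per_unit / threads + units * handoff_per_unit

units = 1_000_000
st = single_threaded(units, work_per_unit=1.0)
mt = multithreaded(units, work_per_unit=1.0, threads=4, handoff_per_unit=1.5)
# When handoff cost exceeds the per-unit work saved, MT is slower
# despite using 4 threads -- matching the direction of the timings above.
```

Under these (hypothetical) parameters the multithreaded path comes out ~1.75x slower, the same direction as the observed 451m vs 274m timings, though the real ratio depends on actual per-row costs.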
[jira] [Commented] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files
[ https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263907#comment-13263907 ]

Karl Mueller commented on CASSANDRA-4182:

Yes, I figure it's pretty much a worst-case scenario. I didn't expect it to be any faster than single-threaded, possibly a bit slower or taking more CPU. However, it's a LOT slower (~80% slower). I'd be happy if it were the same speed as single-threaded for the worst case, even with more CPU.
[jira] [Updated] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files
[ https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Mueller updated CASSANDRA-4182:

Priority: Major (was: Minor)

Changing the priority on this to Major since it's actually a significant problem. The workaround is to not use multithreaded compaction, but this will impact a mixed deployment of classic compactions and the new leveldb. If I'm wrong about it, feel free to change it back, of course. :)
[jira] [Created] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files
Karl Mueller created CASSANDRA-4182:
---
Summary: multithreaded compaction very slow with large single data file and a few tiny data files
Key: CASSANDRA-4182
URL: https://issues.apache.org/jira/browse/CASSANDRA-4182
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0.9
Environment: Redhat Sun JDK 1.6.0_20-b02
Reporter: Karl Mueller
Priority: Minor
[jira] Commented: (CASSANDRA-2353) JMX call StorageService.Operations.getNaturalEndpoints returns an NPE
[ https://issues.apache.org/jira/browse/CASSANDRA-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008666#comment-13008666 ]

Karl Mueller commented on CASSANDRA-2353:

I don't have 0.7.4 in my environment, but when I upgrade to .4 or .5 I'll post it. I think driftx on IRC had a newer version.

JMX call StorageService.Operations.getNaturalEndpoints returns an NPE
-
Key: CASSANDRA-2353
URL: https://issues.apache.org/jira/browse/CASSANDRA-2353
Project: Cassandra
Issue Type: Bug
Components: API
Affects Versions: 0.7.0
Reporter: Karl Mueller
Priority: Minor
Fix For: 0.7.5

The JMX operation StorageService.Operations.getNaturalEndpoints in cassandra.db always returns an NPE.
[jira] Created: (CASSANDRA-2353) JMX call StorageService.Operations.getNaturalEndpoints returns an NPE
JMX call StorageService.Operations.getNaturalEndpoints returns an NPE
-
Key: CASSANDRA-2353
URL: https://issues.apache.org/jira/browse/CASSANDRA-2353
Project: Cassandra
Issue Type: Bug
Components: API
Affects Versions: 0.7.0
Reporter: Karl Mueller
Priority: Minor

The JMX operation StorageService.Operations.getNaturalEndpoints in cassandra.db always returns an NPE.
[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
[ https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983513#action_12983513 ]

Karl Mueller commented on CASSANDRA-1932:

Yes, I agree it was user error. -karl

NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
-
Key: CASSANDRA-1932
URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
Fix For: 0.7.1

ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.NegativeArraySizeException
    at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
    at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
    at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
    at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
    at org.apache.cassandra.db.Table.getRow(Table.java:384)
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
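A NegativeArraySizeException at the top of a deserializer is the classic signature of a length prefix read from bytes in an unexpected format: the reader interprets unrelated (or differently versioned) bytes as a signed 32-bit size, gets a negative value, and tries to allocate an array with it. A minimal sketch of that failure mode, as an illustrative analogue only (not the actual BloomFilterSerializer code):

```python
import struct

# Sketch of the failure mode behind a NegativeArraySizeException: a
# deserializer reads a 4-byte big-endian signed length prefix (like Java's
# DataInput.readInt) and allocates that many elements. Bytes written in a
# different on-disk format can decode to a negative "length".

def deserialize_bitset(buf):
    (count,) = struct.unpack_from(">i", buf, 0)  # signed 32-bit, big-endian
    if count < 0:
        # Python analogue of Java's NegativeArraySizeException on `new long[count]`.
        raise ValueError(f"negative array size: {count}")
    return [0] * count

good = struct.pack(">i", 3)       # a valid 3-element length prefix
bad = b"\xff\xff\xff\xf6"         # same bytes read as signed int -> -10
```

This fits the discussion below: SSTables written by one branch's format being read by another's deserializer would hit exactly this kind of mismatch.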
[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
[ https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977180#action_12977180 ] Karl Mueller commented on CASSANDRA-1932:

Not sure. I have all of the -f-*.db files from running it saved, but I had to remove them from the active Cassandra cluster. (I ran the 0.7.1 branch by mistake, as I hadn't realized there was a 0.7.0 branch.) What would be helpful?

NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

Key: CASSANDRA-1932
URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
Fix For: 0.7.1

ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.NegativeArraySizeException
	at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
	at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
	at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
	at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
	at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
	at org.apache.cassandra.db.Table.getRow(Table.java:384)
	at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
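A NegativeArraySizeException at a deserialize call like the one above typically means a length prefix read off disk was corrupt and was passed straight to an array allocation. The sketch below is hypothetical (class and method names are illustrative, not Cassandra's actual BloomFilterSerializer) and shows the defensive pattern: validate the on-disk count before allocating, so corruption surfaces as an IOException instead of a NegativeArraySizeException deep in the read path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch, not Cassandra's actual serializer code.
public class SafeBitSetReader {
    // Reads a length-prefixed long[] (e.g. a bloom filter's bit words),
    // rejecting negative or implausible lengths instead of letting
    // "new long[count]" throw NegativeArraySizeException.
    static long[] read(DataInputStream in) throws IOException {
        int count = in.readInt();
        if (count < 0 || count > (1 << 26))   // arbitrary sanity cap for the sketch
            throw new IOException("corrupt bloom filter: word count " + count);
        long[] words = new long[count];
        for (int i = 0; i < count; i++)
            words[i] = in.readLong();
        return words;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(-5);                     // simulate a corrupt on-disk length prefix
        try {
            read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            System.out.println("no error");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a clean prefix the same reader returns the words unchanged; only the corrupt-length path is rejected.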
[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
[ https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977553#action_12977553 ] Karl Mueller commented on CASSANDRA-1932:

They were all -f- versions. -e- versions worked fine.
[jira] Created: (CASSANDRA-1931) Internal error processing insert java.lang.AssertionError at org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:219)
Internal error processing insert java.lang.AssertionError at org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:219)

Key: CASSANDRA-1931
URL: https://issues.apache.org/jira/browse/CASSANDRA-1931
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.1
Environment: Linux Fedora 12 x86_64
Reporter: Karl Mueller

ERROR [pool-1-thread-137] 2011-01-03 18:22:21,751 Cassandra.java (line 2960) Internal error processing insert
java.lang.AssertionError
	at org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:219)
	at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:174)
	at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:412)
	at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:349)
	at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2952)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
[jira] Created: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

Key: CASSANDRA-1932
URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
[ https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977136#action_12977136 ] Karl Mueller commented on CASSANDRA-1932:

I do have a very large # of rows (150-200MM on many nodes)
[jira] Created: (CASSANDRA-1883) NPE in get_slice quorum read
NPE in get_slice quorum read

Key: CASSANDRA-1883
URL: https://issues.apache.org/jira/browse/CASSANDRA-1883
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.0 rc 2
Environment: Linux Fedora 12 x86_64
Reporter: Karl Mueller
Priority: Minor

Getting this NPE as of the 2010-12-17 0.7 trunk. Some data may be corrupt somewhere on a node. It could be a null key somewhere.

ERROR [pool-1-thread-28] 2010-12-18 12:53:20,411 Cassandra.java (line 2707) Internal error processing get_slice
java.lang.NullPointerException
	at org.apache.cassandra.service.DigestMismatchException.<init>(DigestMismatchException.java:30)
	at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:92)
	at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:43)
	at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:91)
	at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:362)
	at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
	at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:128)
	at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:225)
	at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:301)
	at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:263)
	at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:2699)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
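The trace is notable in that the NullPointerException fires inside DigestMismatchException's own constructor, which suggests a field used to build the exception message (plausibly the row key the reporter suspects is null) was dereferenced without a guard. The sketch below is purely illustrative, with hypothetical names rather than Cassandra's actual classes, and shows the defensive pattern for message construction:

```java
// Hypothetical sketch; names are illustrative, not Cassandra's actual code.
public class NullSafeMismatch {
    // Building an exception message from a possibly-null key must not
    // itself throw NPE, or the real error (the digest mismatch) is masked.
    static String describeMismatch(Object key, String digest1, String digest2) {
        return "Mismatch for key " + (key == null ? "<null>" : key)
               + " (" + digest1 + " vs " + digest2 + ")";
    }

    public static void main(String[] args) {
        // A null key produces a readable message instead of an NPE.
        System.out.println(describeMismatch(null, "ab", "cd"));
    }
}
```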
[jira] Commented: (CASSANDRA-1883) NPE in get_slice quorum read
[ https://issues.apache.org/jira/browse/CASSANDRA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972872#action_12972872 ] Karl Mueller commented on CASSANDRA-1883:

Additional information: one of the SSD raid0s went bad recently. This may have produced weird data for one cassandra node.
[jira] Created: (CASSANDRA-1884) sstable2json / sstablekeys should verify key order in -Data and -Index files
sstable2json / sstablekeys should verify key order in -Data and -Index files

Key: CASSANDRA-1884
URL: https://issues.apache.org/jira/browse/CASSANDRA-1884
Project: Cassandra
Issue Type: Improvement
Components: Tools
Affects Versions: 0.7.0
Reporter: Karl Mueller
Priority: Minor

Some Cassandra users use sstable2json and sstablekeys to check -Data and -Index files for integrity. For example, if compaction fails, you can find out which files are causing it to fail because they're corrupt. One type of corruption that can occur in both -Data and -Index files is keys getting out of order. (This shouldn't happen, but it can.) Cassandra catches this error during compaction, but neither tool caught it. This small patch simply raises an IOException during export if it finds out-of-order keys. Further work could make this optional via a command-line switch: it may be useful to export the data to JSON even though it's out of order, then replay it manually or have another script re-order it.
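The check the ticket describes can be sketched as follows. This is a hypothetical illustration, not the attached patch: while streaming keys out of a -Data or -Index file, compare each key with its predecessor and fail fast with an IOException on the first out-of-order pair.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the verification described in CASSANDRA-1884,
// not the actual sstable2json/sstablekeys code.
public class KeyOrderCheck {
    // Throws on the first key that sorts before its predecessor.
    static void verifyOrder(List<byte[]> keys) throws IOException {
        byte[] prev = null;
        for (byte[] key : keys) {
            if (prev != null && compare(prev, key) > 0)
                throw new IOException("keys out of order: " + Arrays.toString(key)
                                      + " sorts before " + Arrays.toString(prev));
            prev = key;
        }
    }

    // Unsigned lexicographic comparison, the usual ordering for raw key bytes.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        try {
            // {1}, {3}, {2}: the last key violates the order.
            verifyOrder(Arrays.asList(new byte[]{1}, new byte[]{3}, new byte[]{2}));
            System.out.println("in order");
        } catch (IOException e) {
            System.out.println("corrupt: " + e.getMessage());
        }
    }
}
```

The ticket's proposed command-line switch would simply skip the throw and keep exporting, leaving re-ordering to a later pass.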
[jira] Updated: (CASSANDRA-1883) NPE in get_slice quorum read
[ https://issues.apache.org/jira/browse/CASSANDRA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Mueller updated CASSANDRA-1883:

Attachment: digestmismatch-debug.txt

This is a debug output from a node with this NPE happening around the same time. If you need more from the log, I have the rest of it available.