[jira] [Commented] (CASSANDRA-8719) Using thrift HSHA with offheap_objects appears to corrupt data

2015-03-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344007#comment-14344007
 ] 

Karl Mueller commented on CASSANDRA-8719:
-

Is it possible to have this issue on 2.0.10? 


 Using thrift HSHA with offheap_objects appears to corrupt data
 --

 Key: CASSANDRA-8719
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8719
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Randy Fradin
Assignee: Benedict
 Fix For: 2.1.3

 Attachments: 8719.txt, repro8719.sh


 Copying my comment from CASSANDRA-6285 to a new issue since that issue is 
 long closed and I'm not sure if they are related...
 I am getting this exception using Thrift HSHA in 2.1.0:
 {quote}
  INFO [CompactionExecutor:8] 2015-01-26 13:32:51,818 CompactionTask.java 
 (line 138) Compacting 
 [SSTableReader(path='/tmp/cass_test/cassandra/TestCassandra/data/test_ks/test_cf-1c45da40a58911e4826751fbbc77b187/test_ks-test_cf-ka-2-Data.db'),
  
 SSTableReader(path='/tmp/cass_test/cassandra/TestCassandra/data/test_ks/test_cf-1c45da40a58911e4826751fbbc77b187/test_ks-test_cf-ka-1-Data.db')]
  INFO [CompactionExecutor:8] 2015-01-26 13:32:51,890 ColumnFamilyStore.java 
 (line 856) Enqueuing flush of compactions_in_progress: 212 (0%) on-heap, 20 
 (0%) off-heap
  INFO [MemtableFlushWriter:8] 2015-01-26 13:32:51,892 Memtable.java (line 
 326) Writing Memtable-compactions_in_progress@1155018639(0 serialized bytes, 
 1 ops, 0%/0% of on/off-heap limit)
  INFO [MemtableFlushWriter:8] 2015-01-26 13:32:51,896 Memtable.java (line 
 360) Completed flushing 
 /tmp/cass_test/cassandra/TestCassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-2-Data.db
  (42 bytes) for commitlog position ReplayPosition(segmentId=1422296630707, 
 position=430226)
 ERROR [CompactionExecutor:8] 2015-01-26 13:32:51,906 CassandraDaemon.java 
 (line 166) Exception in thread Thread[CompactionExecutor:8,1,RMI Runtime]
 java.lang.RuntimeException: Last written key 
 DecoratedKey(131206587314004820534098544948237170809, 
 80010001000c62617463685f6d757461746500) >= current key 
 DecoratedKey(14775611966645399672119169777260659240, 
 726f776b65793030385f31343232323937313537353835) writing into 
 /tmp/cass_test/cassandra/TestCassandra/data/test_ks/test_cf-1c45da40a58911e4826751fbbc77b187/test_ks-test_cf-tmp-ka-3-Data.db
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:172)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:196) 
 ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:110)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:177)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
 ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:235)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_40]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_40]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_40]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_40]
 at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
 {quote}
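
The RuntimeException above is an ordering invariant tripping: compaction merges its inputs in token order, so every partition key appended to the new sstable must sort strictly after the previous one. A simplified sketch of that kind of guard (paraphrased, not the exact 2.1.0 source):

{noformat}
// Simplified sketch of an append-order guard like SSTableWriter.beforeAppend:
// keys must arrive in strictly increasing order, or the output sstable is corrupt.
private DecoratedKey lastWrittenKey;

private void beforeAppend(DecoratedKey decoratedKey)
{
    if (lastWrittenKey != null && lastWrittenKey.compareTo(decoratedKey) >= 0)
        throw new RuntimeException("Last written key " + lastWrittenKey
                                   + " >= current key " + decoratedKey);
    lastWrittenKey = decoratedKey;
}
{noformat}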
 I don't think it's caused by CASSANDRA-8211, because it happens during the 
 first compaction that takes place between the first 2 SSTables to get flushed 
 from an initially empty column family.
 Also, I've only been able to reproduce it when using both *hsha* for the rpc 
 server and *offheap_objects* for memtable allocation. If I switch either to 
 sync or to offheap_buffers or heap_buffers then I cannot reproduce the 
 problem. Also under the same circumstances I'm pretty sure I've seen 
 incorrect data being returned to a client multiget_slice request before any 
 SSTables had been flushed yet, so I presume 

[jira] [Commented] (CASSANDRA-8719) Using thrift HSHA with offheap_objects appears to corrupt data

2015-03-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344077#comment-14344077
 ] 

Karl Mueller commented on CASSANDRA-8719:
-

OK, thanks. I am seeing corruption in 2.0.10, but I'm not sure yet whether it's 
in cassandra or outside cassandra.


[jira] [Created] (CASSANDRA-8330) Confusing Message: ConfigurationException: Found system keyspace files, but they couldn't be loaded!

2014-11-17 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-8330:
---

 Summary: Confusing Message: ConfigurationException: Found system 
keyspace files, but they couldn't be loaded!
 Key: CASSANDRA-8330
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8330
 Project: Cassandra
  Issue Type: Bug
 Environment: cassandra 2.0.10
Reporter: Karl Mueller
Priority: Minor


I restarted a node which was not responding to cqlsh. It produced this error:

 INFO [SSTableBatchOpen:3] 2014-11-17 16:36:50,388 SSTableReader.java (line 
223) Opening /data2/data-cassandra/system/local/system-local-jb-304 (133 bytes)
 INFO [SSTableBatchOpen:2] 2014-11-17 16:36:50,388 SSTableReader.java (line 
223) Opening /data2/data-cassandra/system/local/system-local-jb-305 (80 bytes)
 INFO [main] 2014-11-17 16:36:50,393 AutoSavingCache.java (line 114) reading 
saved cache /data2/cache-cassandra/system-local-KeyCache-b.db
ERROR [main] 2014-11-17 16:36:50,543 CassandraDaemon.java (line 265) Fatal 
exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Found system keyspace 
files, but they couldn't be loaded!
at 
org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:554)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)

After deleting the cache, I still got this error:

 INFO 16:41:43,718 Opening 
/data2/data-cassandra/system/local/system-local-jb-304 (133 bytes)
 INFO 16:41:43,718 Opening 
/data2/data-cassandra/system/local/system-local-jb-305 (80 bytes)
ERROR 16:41:43,877 Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Found system keyspace 
files, but they couldn't be loaded!
at 
org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:554)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)



I think the node possibly corrupted one of the files while it was in a bad 
state. This would be impossible to replicate, so I don't think the actual bug 
itself is that helpful to chase.

What I did find very confusing was the error message. There's nothing to 
indicate what the problem is! Is it a corrupt file? A valid file with bad 
information in it? Referencing something that doesn't exist?! 

I fixed it by deleting the system keyspace and starting it with its token, but 
many people wouldn't know to do that at all.
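
A hedged sketch of that recovery path (paths and token are illustrative placeholders, not taken from this report):

{noformat}
# with the node stopped, move the damaged system keyspace aside
mv /data2/data-cassandra/system /data2/data-cassandra/system.broken
# pin the node's previous token in cassandra.yaml so it rejoins in place:
#   initial_token: <token recorded earlier, e.g. from 'nodetool ring'>
# then restart; the node rebuilds its system keyspace on boot
{noformat}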






[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair

2014-10-24 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183674#comment-14183674
 ] 

Karl Mueller commented on CASSANDRA-8177:
-

serial repairs are also terrible for us in 2.0.10

parallel is better

 sequential repair is much more expensive than parallel repair
 -

 Key: CASSANDRA-8177
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8177
 Project: Cassandra
  Issue Type: Bug
Reporter: Sean Bridges
Assignee: Yuki Morishita
 Attachments: cassc-week.png, iostats.png


 This is with 2.0.10
 The attached graph shows io read/write throughput (as measured with iostat) 
 when doing repairs.
 The large hump on the left is a sequential repair of one node.  The two much 
 smaller peaks on the right are parallel repairs.
 This is a 3 node cluster using vnodes (I know vnodes on small clusters isn't 
 recommended).  Cassandra reports load of 40 gigs.
 We noticed a similar problem with a larger cluster.





[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair

2014-10-24 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183757#comment-14183757
 ] 

Karl Mueller commented on CASSANDRA-8177:
-

{quote}
Sequential repair is meant to be used where validation compaction on all 
replicas will impact overall cluster performance. If parallel repair does the 
job, then sticking with it is fine.
{quote}

Why on earth is serial repair the default then??  Parallel is a better default!
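
For context, a sketch of how the two modes are requested with 2.0's nodetool (keyspace name is an example):

{noformat}
nodetool repair my_keyspace        # 2.0 default: sequential (snapshot-based) repair
nodetool repair -par my_keyspace   # parallel: validation runs on all replicas at once
{noformat}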



[jira] [Commented] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-10-23 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181487#comment-14181487
 ] 

Karl Mueller commented on CASSANDRA-7966:
-

No, I saw this on multiple nodes on two separate clusters


 1.2.18 -> 2.0.10 upgrade compactions_in_progress: 
 java.lang.IllegalArgumentException
 

 Key: CASSANDRA-7966
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7966
 Project: Cassandra
  Issue Type: Bug
 Environment: JDK 1.7
Reporter: Karl Mueller
Assignee: Marcus Eriksson
Priority: Minor
 Attachments: dev-cass00-log.txt


 This happened on a new node when starting 2.0.10 after 1.2.18 with complete 
 upgradesstables run:
 {noformat}
  INFO 15:31:11,532 Enqueuing flush of 
 Memtable-compactions_in_progress@1366724594(0/0 serialized/live bytes, 1 ops)
  INFO 15:31:11,532 Writing Memtable-compactions_in_progress@1366724594(0/0 
 serialized/live bytes, 1 ops)
  INFO 15:31:11,547 Completed flushing 
 /data2/data-cassandra/system/compactions_in_progress/system-compactions_in_progress-jb-10-Data.db
  (42 bytes) for commitlog position ReplayPosition(segmentId=1410993002452, 
 position=164409)
 ERROR 15:31:11,550 Exception in thread Thread[CompactionExecutor:36,1,main]
 java.lang.IllegalArgumentException
 at java.nio.Buffer.limit(Buffer.java:267)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36)
 at 
 org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:112)
 at 
 org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:116)
 at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:150)
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:85)
 at 
 org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
 at 
 org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
 at 
 org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
 at 
 org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
 at 
 org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:143)
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 {noformat}
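
A standalone sketch (not Cassandra's actual code) of the failure mode at the top of this trace: composite cell names are parsed as a sequence of unsigned-short-length-prefixed components, so a length prefix that claims more bytes than remain in the buffer pushes Buffer.limit() past capacity, and limit() throws IllegalArgumentException:

{noformat}
import java.nio.ByteBuffer;

public class ShortLengthDemo
{
    // Mimics a readBytesWithShortLength-style helper: 2-byte length, then bytes.
    static ByteBuffer readBytesWithShortLength(ByteBuffer bb)
    {
        int length = bb.getShort() & 0xFFFF;   // unsigned 2-byte length prefix
        ByteBuffer copy = bb.duplicate();
        copy.limit(copy.position() + length);  // throws if length > bytes available
        bb.position(bb.position() + length);
        return copy;
    }

    public static void main(String[] args)
    {
        // Claims a 100-byte component, but only 2 bytes were actually written.
        ByteBuffer bad = (ByteBuffer) ByteBuffer.allocate(4).putShort((short) 100).flip();
        readBytesWithShortLength(bad);         // java.lang.IllegalArgumentException
    }
}
{noformat}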





[jira] [Updated] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-10-20 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-7966:

Attachment: dev-cass00-log.txt

Log from a bit before the exception. I don't think there is much of interest 
before or after it; things seem normal afterwards.



[jira] [Commented] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-10-16 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174036#comment-14174036
 ] 

Karl Mueller commented on CASSANDRA-7966:
-

Yes, I run upgradesstables before every x.y upgrade



[jira] [Commented] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-10-16 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174044#comment-14174044
 ] 

Karl Mueller commented on CASSANDRA-7966:
-

No, I haven't done it. I wasn't aware running upgradesstables *after* an 
upgrade was standard practice :)



[jira] [Commented] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-10-16 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174121#comment-14174121
 ] 

Karl Mueller commented on CASSANDRA-7966:
-

OK thanks - let me know if there's more info needed :)



[jira] [Created] (CASSANDRA-8024) No boot finished or ready message anymore upon startup completion to CLI

2014-09-30 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-8024:
---

 Summary: No boot finished or ready message anymore upon startup 
completion to CLI
 Key: CASSANDRA-8024
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8024
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 2.0.10
Reporter: Karl Mueller
Priority: Trivial


This is trivial, but cassandra logs the following to its log:

 ...
 INFO [main] 2014-09-29 23:10:35,793 CassandraDaemon.java (line 575) No gossip 
backlog; proceeding
 INFO [main] 2014-09-29 23:10:35,979 Server.java (line 156) Starting listening 
for CQL clients on kaos-cass00.sv.walmartlabs.com/10.93.12.10:9042...
 INFO [main] 2014-09-29 23:10:36,048 ThriftServer.java (line 99) Using 
TFramedTransport with a max frame size of 15728640 bytes.


However, on the command line I only see:

 INFO 23:10:30,005 Compacted 4 sstables to 
[/data2/data-cassandra/system/compactions_in_progress/system-compactions_in_progress-jb-67,].
  1,333 bytes to 962 (~72% of original) in 32ms = 0.028670MB/s.  15 total 
partitions merged to 12.  Partition merge counts were {1:11, 2:2, }
 INFO 23:10:35,793 No gossip backlog; proceeding


It would be nice if the "Starting listening for..." line, or some other 
startup-complete message, went to the command line STDOUT. There used to be 
one, I think, but there isn't anymore.
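
For reference, the CLI lines above match the console appender that 2.0's stock conf/log4j-server.properties defines (sketched from memory, so treat the details as an assumption); the rolling file appender R carries the fuller pattern with thread and source line:

{noformat}
log4j.rootLogger=INFO,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n
{noformat}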






[jira] [Created] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-09-17 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-7966:
---

 Summary: 1.2.18 -> 2.0.10 upgrade compactions_in_progress: 
java.lang.IllegalArgumentException
 Key: CASSANDRA-7966
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7966
 Project: Cassandra
  Issue Type: Bug
 Environment: JDK 1.7

Reporter: Karl Mueller
Priority: Minor





[jira] [Commented] (CASSANDRA-7966) 1.2.18 -> 2.0.10 upgrade compactions_in_progress: java.lang.IllegalArgumentException

2014-09-17 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138271#comment-14138271
 ] 

Karl Mueller commented on CASSANDRA-7966:
-

It's not a new node. This is a very old cluster that's been migrated since the 
0.6.x days. It was running 1.2.18, and I'm upgrading it to 2.0.10.

upgradesstables was run on every node in it using 1.2.18



[jira] [Created] (CASSANDRA-7507) OOM creates unreliable state - die instantly better

2014-07-07 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-7507:
---

 Summary: OOM creates unreliable state - die instantly better
 Key: CASSANDRA-7507
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7507
 Project: Cassandra
  Issue Type: New Feature
Reporter: Karl Mueller
Priority: Minor


I had a cassandra node run OOM. My heap had enough headroom; there was just 
something which was either a bug or some unfortunate amount of short-term 
memory utilization. This resulted in the following error:

 WARN [StorageServiceShutdownHook] 2014-06-30 09:38:38,251 StorageProxy.java 
(line 1713) Some hints were not written before shutdown.  This is not supposed 
to happen.  You should (a) run repair, and (b) file a bug report

There are no other messages of relevance besides the OOM error about 90 minutes 
earlier.

My (limited) understanding of the JVM and Cassandra says that when it goes OOM, 
it will attempt to signal cassandra to shut down cleanly. The problem, in my 
view, is that with an OOM situation, nothing is guaranteed anymore. I believe 
it's impossible to reliably cleanly shut down at this point, and therefore 
it's wrong to even try. 

Yes, ideally things could be written out, flushed to disk, memory messages 
written, other nodes notified, etc. but why is there any reason to believe any 
of those steps could happen? Would happen? Couldn't bad data be written at this 
point to disk rather than good data? Some network messages delivered, but not 
others?

I think Cassandra should have the option (and possibly default) to kill itself 
immediately and hard when the OOM condition happens, rather than relying on the 
java-based clean shutdown process. Cassandra already handles recovery from 
unclean shutdown, and it's not a big deal. My node, for example, sat in a 
sort-of-alive state for 90 minutes, doing who knows what.

I don't know enough about the JVM and its options to know the best exact 
implementation of "die instantly on OOM", but it should be possible either with 
some flags or a C library (something that doesn't rely on java memory, which it 
may not be able to get!)

Short version: a kill -9 of all C* processes in that instance without needing 
more java memory, when OOM is raised
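
For what it's worth, a hedged sketch of JVM options (e.g. appended in cassandra-env.sh) that approximate die-on-OOM; note the JDK 7 mechanism forks an external command, which is itself not guaranteed to get memory:

{noformat}
# JDK 7: run an external command when the JVM throws OutOfMemoryError
JVM_OPTS="$JVM_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
# Later JDKs (8u92+) add built-in flags that need no fork:
#   -XX:+ExitOnOutOfMemoryError   -XX:+CrashOnOutOfMemoryError
{noformat}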





[jira] [Commented] (CASSANDRA-7507) OOM creates unreliable state - die instantly better

2014-07-07 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054320#comment-14054320
 ] 

Karl Mueller commented on CASSANDRA-7507:
-

We're 1.2.16 at present

I'm not so much concerned about something causing the cassandra node to run out 
of memory as about how it handles being out of memory. I think that, due to the 
unreliability of java allocations in a JVM OOM, cassandra should not attempt to 
do anything - even cleanly shut down. It would be better if it just died 
immediately.

Or possibly make this behavior an option, for those of us who would rather have 
a node be down than broken/dying/zombie/corrupting itself. 



[jira] [Comment Edited] (CASSANDRA-7507) OOM creates unreliable state - die instantly better

2014-07-07 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054433#comment-14054433
 ] 

Karl Mueller edited comment on CASSANDRA-7507 at 7/8/14 2:23 AM:
-

if a bug can cause the clean exit after OOM to fail to work as expected, then 
isn't that considered a problem?

I guess if I'm considering the value of a clean exit versus possibly staying 
up, being in a weird state, or not writing the right data to disk, I would 
always prefer it to die without worrying about a clean exit. As I said, in my 
opinion, Cassandra already handles dying unexpectedly fine - there's no need to 
handle it cleanly when there's any risk. 

If there's no risk of something like 7133 happening (or a similar bug), then 
sure, clean exit is sensible, but that's clearly not guaranteed. Replaying some 
logs and then flushing is not a big deal compared to potentially bad data, 
zombie states, etc. - in my view, at least.



was (Author: kmueller):
if a bug can cause the clean exit after OOM to fail to work as expected, then 
isn't that considered a problem?

I guess if I'm considering the value of a clean exit versus possibly staying 
up or being in a weird state, I would always prefer it to die without worrying 
about a clean exit. As I said, in my opinion, Cassandra already handles dying 
unexpectedly fine - there's no need to handle it cleanly when there's any risk. 

If there's no risk of something like 7133 happening (or a similar bug), then 
sure, clean exit is sensible, but that's clearly not guaranteed. Replaying some 
logs and then flushing is not a big deal compared to potentially bad data, 
zombie states, etc. - in my view, at least.




[jira] [Commented] (CASSANDRA-7507) OOM creates unreliable state - die instantly better

2014-07-07 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054433#comment-14054433
 ] 

Karl Mueller commented on CASSANDRA-7507:
-

if a bug can cause the clean exit after OOM to fail to work as expected, then 
isn't that considered a problem?

I guess if I'm considering the value of a clean exit versus possibly staying 
up or being in a weird state, I would always prefer it to die without worrying 
about a clean exit. As I said, in my opinion, Cassandra already handles dying 
unexpectedly fine - there's no need to handle it cleanly when there's any risk. 

If there's no risk of something like 7133 happening (or a similar bug), then 
sure, clean exit is sensible, but that's clearly not guaranteed. Replaying some 
logs and then flushing is not a big deal compared to potentially bad data, 
zombie states, etc. - in my view, at least.




[jira] [Created] (CASSANDRA-7094) Keyspace name quoting is handled inconsistently and strangely

2014-04-25 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-7094:
---

 Summary: Keyspace name quoting is handled inconsistently and 
strangely 
 Key: CASSANDRA-7094
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7094
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: cassandra 1.2.16
Reporter: Karl Mueller
Priority: Trivial


Keyspaces whose names start with capital letters (and perhaps other cases) 
sometimes require double quotes and sometimes do not.

For example, describe works without quotes:

cqlsh> describe keyspace ProductGenomeLocal;

CREATE KEYSPACE ProductGenomeLocal WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '3'
};

USE ProductGenomeLocal;
[...]

But use will not:

cqlsh> use ProductGenomeLocal;
Bad Request: Keyspace 'productgenomelocal' does not exist

It seems that quotes should only really be necessary when there are spaces 
or other symbols that need quoting. (The Bad Request above is consistent 
with the unquoted name being lowercased to 'productgenomelocal', which makes 
describe's leniency the surprising part.)

At the least, the acceptance or rejection of quotes should be consistent.

Other minor annoyance: tab completion works in use and describe with quotes, 
but will not work in either without quotes.
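For what it's worth, the behavior matches CQL's identifier rule as I 
understand it: unquoted identifiers are case-folded to lower case, while 
double-quoted ones keep their case. A toy Java sketch of that rule (the 
resolve helper is mine, not cqlsh code):

import java.util.Locale;

public class CqlIdentifiers {
    // Hypothetical helper mirroring the CQL identifier rule as I understand it.
    static String resolve(String typed) {
        if (typed.length() >= 2 && typed.startsWith("\"") && typed.endsWith("\"")) {
            return typed.substring(1, typed.length() - 1); // quoted: case preserved
        }
        return typed.toLowerCase(Locale.ROOT); // unquoted: folded to lower case
    }

    public static void main(String[] args) {
        System.out.println(resolve("ProductGenomeLocal"));     // productgenomelocal
        System.out.println(resolve("\"ProductGenomeLocal\"")); // ProductGenomeLocal
    }
}

Under that rule, use ProductGenomeLocal looks up 'productgenomelocal' and 
fails exactly as the error message says, which makes describe's tolerance 
the odd one out.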




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5989) java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-03-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917686#comment-13917686
 ] 

Karl Mueller commented on CASSANDRA-5989:
-

I certainly am not using Gora.

 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 -

 Key: CASSANDRA-5989
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5989
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 1.2.8
 Oracle Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
 RHEL6
Reporter: Karl Mueller

 This occurred in one of our nodes today. I don't yet have any helpful 
 information on what was going on beforehand - the logs don't have anything 
 I could see that's tied to it for sure.
 A few things happened in the logs beforehand: a little bit of standard GC, 
 a bunch of status-logger entries 10 minutes before the crash, and a few 
 nodes going up and down in gossip.
 ERROR [Thrift:7495] 2013-09-03 11:01:12,486 CassandraDaemon.java (line 192) 
 Exception in thread Thread[Thrift:7495,5,main]
 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at 
 java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at 
 org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:163)
 at 
 org.apache.cassandra.thrift.TBinaryProtocol.writeBinary(TBinaryProtocol.java:69)
 at org.apache.cassandra.thrift.Column.write(Column.java:579)
 at org.apache.cassandra.thrift.CqlRow.write(CqlRow.java:439)
 at org.apache.cassandra.thrift.CqlResult.write(CqlResult.java:602)
 at 
 org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.write(Cassandra.java:37895)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:34)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-6549) java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy

2014-01-03 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-6549:
---

 Summary: java.lang.ClassCastException: 
org.apache.cassandra.locator.SimpleStrategy cannot be cast to 
org.apache.cassandra.locator.NetworkTopologyStrategy
 Key: CASSANDRA-6549
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6549
 Project: Cassandra
  Issue Type: Bug
 Environment: Sun JDK 1.7
cassandra 1.2.13
Reporter: Karl Mueller


Getting many of these since upgrading to 1.2.13:

ERROR [Thrift:3141] 2014-01-03 13:21:34,909 CustomTThreadPoolServer.java (line 
217) Error occurred during processing of message.
java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy 
cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy
at 
org.apache.cassandra.db.ConsistencyLevel.localQuorumFor(ConsistencyLevel.java:93)
at 
org.apache.cassandra.db.ConsistencyLevel.blockFor(ConsistencyLevel.java:114)
at 
org.apache.cassandra.service.ReadCallback.<init>(ReadCallback.java:65)
at 
org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:880)
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:816)
at 
org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:108)
at 
org.apache.cassandra.thrift.CassandraServer.internal_get(CassandraServer.java:413)
at 
org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:443)
at 
org.apache.cassandra.thrift.Cassandra$Processor$get.getResult(Cassandra.java:3399)
at 
org.apache.cassandra.thrift.Cassandra$Processor$get.getResult(Cassandra.java:3387)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

We are running SimpleStrategy. Even if a client requests a consistency level 
that's invalid for that strategy, it should not cause errors on the server.

Is this related to CASSANDRA-6238 ?
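The obvious shape of a fix is to validate before casting; a minimal, 
self-contained sketch (class and method names are mine, not Cassandra's 
actual code):

interface ReplicationStrategy {}

class SimpleStrategy implements ReplicationStrategy {}

class NetworkTopologyStrategy implements ReplicationStrategy {
    int replicasInDc(String dc) { return 3; } // stand-in for a real DC lookup
}

public class LocalQuorumGuard {
    // Reject unsupported strategies cleanly instead of blind-casting, which
    // is what turns a client mistake into a server-side ClassCastException.
    static int localQuorumFor(ReplicationStrategy strategy, String dc) {
        if (!(strategy instanceof NetworkTopologyStrategy))
            throw new IllegalArgumentException(
                "LOCAL_QUORUM requires NetworkTopologyStrategy, not "
                + strategy.getClass().getSimpleName());
        NetworkTopologyStrategy nts = (NetworkTopologyStrategy) strategy;
        return nts.replicasInDc(dc) / 2 + 1; // quorum = floor(rf/2) + 1
    }

    public static void main(String[] args) {
        System.out.println(localQuorumFor(new NetworkTopologyStrategy(), "dc1")); // 2
        localQuorumFor(new SimpleStrategy(), "dc1"); // IllegalArgumentException
    }
}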




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6545) LOCAL_QUORUM still doesn't work with SimpleStrategy but don't throw a meaningful error message anymore

2014-01-03 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861961#comment-13861961
 ] 

Karl Mueller commented on CASSANDRA-6545:
-

We were also bitten by this; see my duplicate issue CASSANDRA-6549.

 LOCAL_QUORUM still doesn't work with SimpleStrategy but don't throw a 
 meaningful error message anymore
 --

 Key: CASSANDRA-6545
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6545
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Alex Liu
 Fix For: 1.2.14


 It seems this was the intent of CASSANDRA-6238 originally, though I've 
 tracked it to the commit for CASSANDRA-6309 
 (f7efaffadace3e344eeb4a1384fa72c73d8422b0 to be precise). In any case, 
 ConsistencyLevel.validateForWrite no longer rejects LOCAL_QUORUM when 
 SimpleStrategy is used, yet ConsistencyLevel.blockFor unconditionally casts 
 the strategy to NTS for LOCAL_QUORUM (in localQuorumFor(), to be precise). 
 This results in a ClassCastException, as reported in 
 https://datastax-oss.atlassian.net/browse/JAVA-241.
 Note that while we're at it, I tend to agree with Aleksey's comment on 
 CASSANDRA-6238: why not make EACH_QUORUM == QUORUM for SimpleStrategy too?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (CASSANDRA-6433) snapshot race with compaction causes missing link error

2013-12-02 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-6433:
---

 Summary: snapshot race with compaction causes missing link error
 Key: CASSANDRA-6433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6433
 Project: Cassandra
  Issue Type: Bug
 Environment: EL6
Oracle Java 1.7.40
Reporter: Karl Mueller
Priority: Minor


Cassandra 1.2.11

When trying to snapshot, I encountered this error. It appears that snapshot 
doesn't lock the sstable list in a keyspace, which can cause a race 
condition with compaction. (I think it's compaction, at least.)

[cassandra@dev-cass00 ~]$ cas cluster snap pre-1.2.12
*** dev-cass01 (1) ***
 
Nodetool command snapshot -t pre-1.2.12 failed!
 
Output:
 
Requested creating snapshot for: all keyspaces
Exception in thread "main" java.lang.RuntimeException: Tried to hard link to 
file that does not exist 
/data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
at 
org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:72)
at 
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1095)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1567)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1612)
at org.apache.cassandra.db.Table.snapshot(Table.java:194)
at 
org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2233)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
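The failure mode itself is easy to reproduce outside Cassandra; here is a 
minimal sketch (file names are mine) of hard-linking a file that a 
concurrent process has already deleted, which is effectively what the 
snapshot/compaction race does:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class HardLinkRace {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("sstable-component", ".db");
        Path link = src.resolveSibling("snapshot-" + src.getFileName());

        Files.delete(src); // simulate compaction removing the component first

        try {
            Files.createLink(link, src); // what snapshotting amounts to per file
        } catch (NoSuchFileException e) {
            // JDK-level analogue of "Tried to hard link to file that does not exist"
            System.out.println("hard link failed, source is gone: " + e.getFile());
        }
    }
}

Taking a consistent view of the sstable list (or retrying against a fresh 
list) before linking is the kind of synchronization this report is asking 
for.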



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6433) snapshot race with compaction causes missing link error

2013-12-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837003#comment-13837003
 ] 

Karl Mueller commented on CASSANDRA-6433:
-

Update: this doesn't appear to be related to a race during the snapshot. This 
file appears to be missing entirely.

Got the error in a 2nd snapshot attempt, and:

[cassandra@dev-cass01 ~]$ ls -la 
/data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db


 snapshot race with compaction causes missing link error
 ---

 Key: CASSANDRA-6433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6433
 Project: Cassandra
  Issue Type: Bug
 Environment: EL6
 Oracle Java 1.7.40
Reporter: Karl Mueller
Priority: Minor

 Cassandra 1.2.11
 When trying to snapshot, I encountered this error. It appears that snapshot 
 doesn't lock the sstable list in a keyspace, which can cause a race 
 condition with compaction. (I think it's compaction, at least.)
 [cassandra@dev-cass00 ~]$ cas cluster snap pre-1.2.12
 *** dev-cass01 (1) ***
  
 Nodetool command snapshot -t pre-1.2.12 failed!
  
 Output:
  
 Requested creating snapshot for: all keyspaces
 Exception in thread "main" java.lang.RuntimeException: Tried to hard link to 
 file that does not exist 
 /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
 at 
 org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:72)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1095)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1567)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1612)
 at org.apache.cassandra.db.Table.snapshot(Table.java:194)
 at 
 org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2233)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
 at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
 at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
 at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
 at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
 at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
 at sun.rmi.transport.Transport$1.run(Transport.java:177)
 at sun.rmi.transport.Transport$1.run(Transport.java:174)
 at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
 at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
 at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
 at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 

[jira] [Commented] (CASSANDRA-6433) snapshot race with compaction causes missing link error

2013-12-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837005#comment-13837005
 ] 

Karl Mueller commented on CASSANDRA-6433:
-

Bah, short paste:

[cassandra@dev-cass01 ~]$ ls -la 
/data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
ls: cannot access 
/data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db:
 No such file or directory

I need to restart it so I can snapshot it and run my upgrade soon. 

 snapshot race with compaction causes missing link error
 ---

 Key: CASSANDRA-6433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6433
 Project: Cassandra
  Issue Type: Bug
 Environment: EL6
 Oracle Java 1.7.40
Reporter: Karl Mueller
Priority: Minor

 Cassandra 1.2.11
 When trying to snapshot, I encountered this error. It appears that snapshot 
 doesn't lock the sstable list in a keyspace, which can cause a race 
 condition with compaction. (I think it's compaction, at least.)
 [cassandra@dev-cass00 ~]$ cas cluster snap pre-1.2.12
 *** dev-cass01 (1) ***
  
 Nodetool command snapshot -t pre-1.2.12 failed!
  
 Output:
  
 Requested creating snapshot for: all keyspaces
 Exception in thread "main" java.lang.RuntimeException: Tried to hard link to 
 file that does not exist 
 /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
 at 
 org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:72)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1095)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1567)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1612)
 at org.apache.cassandra.db.Table.snapshot(Table.java:194)
 at 
 org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2233)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
 at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
 at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
 at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
 at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
 at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
 at sun.rmi.transport.Transport$1.run(Transport.java:177)
 at sun.rmi.transport.Transport$1.run(Transport.java:174)
 at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
 at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
 at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
 at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 

[jira] [Created] (CASSANDRA-6435) nodetool outputs xss and jamm errors in 1.2.12

2013-12-02 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-6435:
---

 Summary: nodetool outputs xss and jamm errors in 1.2.12
 Key: CASSANDRA-6435
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6435
 Project: Cassandra
  Issue Type: Bug
Reporter: Karl Mueller
Priority: Minor


Since 1.2.12, just running nodetool produces the output below. This is 
probably related to CASSANDRA-6273.

It's unclear to me whether jamm is actually failing to load, but nodetool 
clearly should not be emitting this output, which likely comes from 
cassandra-env.sh

[cassandra@dev-cass00 cassandra]$ /data2/cassandra/bin/nodetool ring
xss =  -ea -javaagent:/data2/cassandra/bin/../lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms14G -Xmx14G -Xmn1G 
-XX:+HeapDumpOnOutOfMemoryError -Xss256k
Note: Ownership information does not include topology; for complete 
information, specify a keyspace

Datacenter: datacenter1
==
Address  RackStatus State   LoadOwns
Token

170141183460469231731687303715884105727
10.93.15.10  rack1   Up Normal  123.82 GB   20.00%  
34028236692093846346337460743176821145
10.93.15.11  rack1   Up Normal  124 GB  20.00%  
68056473384187692692674921486353642290
10.93.15.12  rack1   Up Normal  123.97 GB   20.00%  
102084710076281539039012382229530463436
10.93.15.13  rack1   Up Normal  124.03 GB   20.00%  
136112946768375385385349842972707284581
10.93.15.14  rack1   Up Normal  123.93 GB   20.00%  
170141183460469231731687303715884105727

ERROR 16:20:01,408 Unable to initialize MemoryMeter (jamm not specified as 
javaagent).  This means Cassandra will be unable to measure object sizes 
accurately and may consequently OOM.




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6433) snapshot race with compaction causes missing link error

2013-12-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837136#comment-13837136
 ] 

Karl Mueller commented on CASSANDRA-6433:
-

Yes, very likely.

 snapshot race with compaction causes missing link error
 ---

 Key: CASSANDRA-6433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6433
 Project: Cassandra
  Issue Type: Bug
 Environment: EL6
 Oracle Java 1.7.40
Reporter: Karl Mueller
Priority: Minor

 Cassandra 1.2.11
 When trying to snapshot, I encountered this error. It appears that snapshot 
 doesn't lock the sstable list in a keyspace, which can cause a race 
 condition with compaction. (I think it's compaction, at least.)
 [cassandra@dev-cass00 ~]$ cas cluster snap pre-1.2.12
 *** dev-cass01 (1) ***
  
 Nodetool command snapshot -t pre-1.2.12 failed!
  
 Output:
  
 Requested creating snapshot for: all keyspaces
 Exception in thread "main" java.lang.RuntimeException: Tried to hard link to 
 file that does not exist 
 /data2/data-cassandra/csprocessor/csprocessor/csprocessor-csprocessor-ic-4-Summary.db
 at 
 org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:72)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1095)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1567)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1612)
 at org.apache.cassandra.db.Table.snapshot(Table.java:194)
 at 
 org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2233)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
 at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
 at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
 at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
 at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
 at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
 at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
 at sun.rmi.transport.Transport$1.run(Transport.java:177)
 at sun.rmi.transport.Transport$1.run(Transport.java:174)
 at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
 at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
 at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
 at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6435) nodetool outputs xss and jamm errors in 1.2.12

2013-12-02 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837280#comment-13837280
 ] 

Karl Mueller commented on CASSANDRA-6435:
-

If this helps:

[root@dev-cass00 ~]# java -version
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)


 nodetool outputs xss and jamm errors in 1.2.12
 --

 Key: CASSANDRA-6435
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6435
 Project: Cassandra
  Issue Type: Bug
Reporter: Karl Mueller
Assignee: Brandon Williams
Priority: Minor

 Since 1.2.12, just running nodetool produces the output below. This is 
 probably related to CASSANDRA-6273.
 It's unclear to me whether jamm is actually failing to load, but nodetool 
 clearly should not be emitting this output, which likely comes from 
 cassandra-env.sh
 [cassandra@dev-cass00 cassandra]$ /data2/cassandra/bin/nodetool ring
 xss =  -ea -javaagent:/data2/cassandra/bin/../lib/jamm-0.2.5.jar 
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms14G -Xmx14G -Xmn1G 
 -XX:+HeapDumpOnOutOfMemoryError -Xss256k
 Note: Ownership information does not include topology; for complete 
 information, specify a keyspace
 Datacenter: datacenter1
 ==
 Address  RackStatus State   LoadOwns
 Token
 
 170141183460469231731687303715884105727
 10.93.15.10  rack1   Up Normal  123.82 GB   20.00%  
 34028236692093846346337460743176821145
 10.93.15.11  rack1   Up Normal  124 GB  20.00%  
 68056473384187692692674921486353642290
 10.93.15.12  rack1   Up Normal  123.97 GB   20.00%  
 102084710076281539039012382229530463436
 10.93.15.13  rack1   Up Normal  124.03 GB   20.00%  
 136112946768375385385349842972707284581
 10.93.15.14  rack1   Up Normal  123.93 GB   20.00%  
 170141183460469231731687303715884105727
 ERROR 16:20:01,408 Unable to initialize MemoryMeter (jamm not specified as 
 javaagent).  This means Cassandra will be unable to measure object sizes 
 accurately and may consequently OOM.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin

2013-10-11 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792400#comment-13792400
 ] 

Karl Mueller commented on CASSANDRA-6092:
-

This doesn't make sense. You change a compaction strategy; it should start to 
take effect. You shouldn't have to do anything else. This is a bug, plain and 
simple.

For one thing, the users most likely to want leveled compaction are people 
like me who compact nightly to get rid of old update rows - and we're the 
most likely ones to have a single sstable.

This is not a bizarre corner case; it's basic functionality!


 Leveled Compaction after ALTER TABLE creates pending but does not actually 
 begin
 

 Key: CASSANDRA-6092
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6092
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 1.2.10
 Oracle Java 1.7.0_u40
 RHEL6.4
Reporter: Karl Mueller
Assignee: Daniel Meyer

 Running Cassandra 1.2.10.  N=5, RF=3
 On this Column Family (ProductGenomeDev/Node), it's been major compacted into 
 a single, large sstable.
 There's no activity on the table at the time of the ALTER command. I changed 
 it to Leveled Compaction with the command below.
 cqlsh:ProductGenomeDev> alter table Node with compaction = { 'class' : 
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 Log entries confirm the change happened.
 [...]column_metadata={},compactionStrategyClass=class 
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy,compactionStrategyOptions={sstable_size_in_mb=160}
  [...]
 nodetool compactionstats shows pending compactions, but there's no activity:
 pending tasks: 750
 12 hours later, still nothing has happened - same number pending. The 
 expectation would be that compactions would proceed immediately to convert 
 everything to Leveled Compaction as soon as the ALTER TABLE command runs.
 I try a simple write into the CF, and then flush the nodes. This kicks off 
 compaction on 3 nodes. (RF=3)
 cqlsh:ProductGenomeDev> insert into Node (key, column1, value) values 
 ('test123', 'test123', 'test123');
 cqlsh:ProductGenomeDev> select * from Node where key = 'test123';
  key | column1 | value
 -+-+-
  test123 | test123 | test123
 cqlsh:ProductGenomeDev> delete from Node where key = 'test123';
 After a flush on every node, now I see:
 [cassandra@dev-cass00 ~]$ cas exec nt compactionstats
 *** dev-cass00 (0) ***
 pending tasks: 750
 Active compaction remaining time :n/a
 *** dev-cass04 (0) ***
 pending tasks: 752
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  341881
 643290447928 bytes 0.53%
 Active compaction remaining time :n/a
 *** dev-cass01 (0) ***
 pending tasks: 750
 Active compaction remaining time :n/a
 *** dev-cass02 (0) ***
 pending tasks: 751
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  3374975141
 642764512481 bytes 0.53%
 Active compaction remaining time :n/a
 *** dev-cass03 (0) ***
 pending tasks: 751
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  3591320948
 643017643573 bytes 0.56%
 Active compaction remaining time :n/a
 After inserting and deleting more columns, enough that all nodes have new 
 data, and flushing, now compactions are proceeding on all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin

2013-10-10 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791980#comment-13791980
 ] 

Karl Mueller commented on CASSANDRA-6092:
-

I'll try this work-around. A lot easier than what I did: inserting, deleting, 
and flushing data!

 Leveled Compaction after ALTER TABLE creates pending but does not actually 
 begin
 

 Key: CASSANDRA-6092
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6092
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 1.2.10
 Oracle Java 1.7.0_u40
 RHEL6.4
Reporter: Karl Mueller
Assignee: Daniel Meyer

 Running Cassandra 1.2.10.  N=5, RF=3
 On this Column Family (ProductGenomeDev/Node), it's been major compacted into 
 a single, large sstable.
 There's no activity on the table at the time of the ALTER command. I changed 
 it to Leveled Compaction with the command below.
 cqlsh:ProductGenomeDev> alter table Node with compaction = { 'class' : 
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 Log entries confirm the change happened.
 [...]column_metadata={},compactionStrategyClass=class 
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy,compactionStrategyOptions={sstable_size_in_mb=160}
  [...]
 nodetool compactionstats shows pending compactions, but there's no activity:
 pending tasks: 750
 12 hours later, still nothing has happened - same number pending. The 
 expectation would be that compactions would proceed immediately to convert 
 everything to Leveled Compaction as soon as the ALTER TABLE command runs.
 I try a simple write into the CF, and then flush the nodes. This kicks off 
 compaction on 3 nodes. (RF=3)
 cqlsh:ProductGenomeDev> insert into Node (key, column1, value) values 
 ('test123', 'test123', 'test123');
 cqlsh:ProductGenomeDev> select * from Node where key = 'test123';
  key | column1 | value
 -+-+-
  test123 | test123 | test123
 cqlsh:ProductGenomeDev> delete from Node where key = 'test123';
 After a flush on every node, now I see:
 [cassandra@dev-cass00 ~]$ cas exec nt compactionstats
 *** dev-cass00 (0) ***
 pending tasks: 750
 Active compaction remaining time :n/a
 *** dev-cass04 (0) ***
 pending tasks: 752
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  341881
 643290447928 bytes 0.53%
 Active compaction remaining time :n/a
 *** dev-cass01 (0) ***
 pending tasks: 750
 Active compaction remaining time :n/a
 *** dev-cass02 (0) ***
 pending tasks: 751
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  3374975141
 642764512481 bytes 0.53%
 Active compaction remaining time :n/a
 *** dev-cass03 (0) ***
 pending tasks: 751
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  3591320948
 643017643573 bytes 0.56%
 Active compaction remaining time :n/a
 After inserting and deleting more columns, enough that all nodes have new 
 data, and flushing, now compactions are proceeding on all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin

2013-09-24 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-6092:
---

 Summary: Leveled Compaction after ALTER TABLE creates pending but 
does not actually begin
 Key: CASSANDRA-6092
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6092
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 1.2.10
Oracle Java 1.7.0_u40
RHEL6.4
Reporter: Karl Mueller


Running Cassandra 1.2.10.  N=5, RF=3

On this Column Family (ProductGenomeDev/Node), it's been major compacted into a 
single, large sstable.

There's no activity on the table at the time of the ALTER command. I changed it 
to Leveled Compaction with the command below.

cqlsh:ProductGenomeDev> alter table Node with compaction = { 'class' : 
'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

Log entries confirm the change happened.

[...]column_metadata={},compactionStrategyClass=class 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy,compactionStrategyOptions={sstable_size_in_mb=160}
 [...]

nodetool compactionstats shows pending compactions, but there's no activity:

pending tasks: 750

12 hours later, still nothing has happened - same number pending. The 
expectation would be that compactions would proceed immediately to convert 
everything to Leveled Compaction as soon as the ALTER TABLE command runs.

I try a simple write into the CF, and then flush the nodes. This kicks off 
compaction on 3 nodes. (RF=3)

cqlsh:ProductGenomeDev> insert into Node (key, column1, value) values 
('test123', 'test123', 'test123');
cqlsh:ProductGenomeDev> select * from Node where key = 'test123';

 key | column1 | value
-+-+-
 test123 | test123 | test123

cqlsh:ProductGenomeDev> delete from Node where key = 'test123';


After a flush on every node, now I see:

[cassandra@dev-cass00 ~]$ cas exec nt compactionstats
*** dev-cass00 (0) ***
pending tasks: 750
Active compaction remaining time :n/a
*** dev-cass04 (0) ***
pending tasks: 752
  compaction typekeyspace   column family   completed   
total  unit  progress
   CompactionProductGenomeDevNode  341881
643290447928 bytes 0.53%
Active compaction remaining time :n/a
*** dev-cass01 (0) ***
pending tasks: 750
Active compaction remaining time :n/a
*** dev-cass02 (0) ***
pending tasks: 751
  compaction typekeyspace   column family   completed   
total  unit  progress
   CompactionProductGenomeDevNode  3374975141
642764512481 bytes 0.53%
Active compaction remaining time :n/a
*** dev-cass03 (0) ***
pending tasks: 751
  compaction typekeyspace   column family   completed   
total  unit  progress
   CompactionProductGenomeDevNode  3591320948
643017643573 bytes 0.56%
Active compaction remaining time :n/a



After inserting and deleting more columns, enough that all nodes have new data, 
and flushing, now compactions are proceeding on all nodes.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-6092) Leveled Compaction after ALTER TABLE creates pending but does not actually begin

2013-09-24 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-6092:


Since Version: 1.2.10

 Leveled Compaction after ALTER TABLE creates pending but does not actually 
 begin
 

 Key: CASSANDRA-6092
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6092
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 1.2.10
 Oracle Java 1.7.0_u40
 RHEL6.4
Reporter: Karl Mueller

 Running Cassandra 1.2.10.  N=5, RF=3
 On this Column Family (ProductGenomeDev/Node), it's been major compacted into 
 a single, large sstable.
 There's no activity on the table at the time of the ALTER command. I changed 
 it to Leveled Compaction with the command below.
 cqlsh:ProductGenomeDev> alter table Node with compaction = { 'class' : 
 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
 Log entries confirm the change happened.
 [...]column_metadata={},compactionStrategyClass=class 
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy,compactionStrategyOptions={sstable_size_in_mb=160}
  [...]
 nodetool compactionstats shows pending compactions, but there's no activity:
 pending tasks: 750
 12 hours later, still nothing has happened - same number pending. The 
 expectation would be that compactions would proceed immediately to convert 
 everything to Leveled Compaction as soon as the ALTER TABLE command runs.
 I try a simple write into the CF, and then flush the nodes. This kicks off 
 compaction on 3 nodes. (RF=3)
 cqlsh:ProductGenomeDev> insert into Node (key, column1, value) values 
 ('test123', 'test123', 'test123');
 cqlsh:ProductGenomeDev> select * from Node where key = 'test123';
  key | column1 | value
 -+-+-
  test123 | test123 | test123
 cqlsh:ProductGenomeDev> delete from Node where key = 'test123';
 After a flush on every node, now I see:
 [cassandra@dev-cass00 ~]$ cas exec nt compactionstats
 *** dev-cass00 (0) ***
 pending tasks: 750
 Active compaction remaining time :n/a
 *** dev-cass04 (0) ***
 pending tasks: 752
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  341881
 643290447928 bytes 0.53%
 Active compaction remaining time :n/a
 *** dev-cass01 (0) ***
 pending tasks: 750
 Active compaction remaining time :n/a
 *** dev-cass02 (0) ***
 pending tasks: 751
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  3374975141
 642764512481 bytes 0.53%
 Active compaction remaining time :n/a
 *** dev-cass03 (0) ***
 pending tasks: 751
   compaction typekeyspace   column family   completed 
   total  unit  progress
CompactionProductGenomeDevNode  3591320948
 643017643573 bytes 0.56%
 Active compaction remaining time :n/a
 After inserting and deleting more columns, enough that all nodes have new 
 data, and flushing, now compactions are proceeding on all nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-5989) java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2013-09-09 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-5989:
---

 Summary: java.lang.OutOfMemoryError: Requested array size exceeds 
VM limit
 Key: CASSANDRA-5989
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5989
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 1.2.8
Oracle Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
RHEL6

Reporter: Karl Mueller


This occurred in one of our nodes today. I don't yet have any helpful 
information on what was going on beforehand - the logs don't have anything I 
could see that's tied to it for sure.

A few things happened in the logs beforehand: a little bit of standard GC, a 
bunch of status-logger entries 10 minutes before the crash, and a few nodes 
going up and down in gossip.


ERROR [Thrift:7495] 2013-09-03 11:01:12,486 CassandraDaemon.java (line 192) 
Exception in thread Thread[Thrift:7495,5,main]
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at 
org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
at 
org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:163)
at 
org.apache.cassandra.thrift.TBinaryProtocol.writeBinary(TBinaryProtocol.java:69)
at org.apache.cassandra.thrift.Column.write(Column.java:579)
at org.apache.cassandra.thrift.CqlRow.write(CqlRow.java:439)
at org.apache.cassandra.thrift.CqlResult.write(CqlResult.java:602)
at 
org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.write(Cassandra.java:37895)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:34)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
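For context on the message itself: on HotSpot, "Requested array size exceeds 
VM limit" means the requested array length is over the VM's cap (just under 
Integer.MAX_VALUE), independent of how much heap is free - and a doubling 
buffer like ByteArrayOutputStream can ask for such a length when serializing 
a very large response. A minimal sketch:

public class ArrayLimitDemo {
    public static void main(String[] args) {
        // On typical HotSpot JVMs this throws
        //   java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        // regardless of -Xmx, because the length itself exceeds the VM's cap.
        byte[] b = new byte[Integer.MAX_VALUE];
        System.out.println(b.length);
    }
}

Read that way, the trace above suggests a single enormous Thrift response 
being buffered, not a heap that was merely too small.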

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-5990) Hinted Handoff: java.lang.ArithmeticException: / by zero

2013-09-09 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-5990:
---

 Summary: Hinted Handoff: java.lang.ArithmeticException: / by zero
 Key: CASSANDRA-5990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5990
 Project: Cassandra
  Issue Type: Bug
 Environment: cassandra 1.2.8
Oracle Java 1.7.0_25-b15
RHEL6
Reporter: Karl Mueller
Priority: Minor


This node was down for a few hours. When bringing it back up, I saw this 
error in the logs. I'm not sure whether it was receiving or sending hinted 
handoffs.

 INFO [HintedHandoff:1] 2013-09-09 14:41:04,020 HintedHandOffManager.java (line 
292) Started hinted handoff for host: 42bba02f-3088-4be1-8cb2-748a6f15e15d with 
IP: /10.93.12.14
ERROR [HintedHandoff:1] 2013-09-09 14:41:04,024 CassandraDaemon.java (line 192) 
Exception in thread Thread[HintedHandoff:1,1,main]
java.lang.ArithmeticException: / by zero
at 
org.apache.cassandra.db.HintedHandOffManager.calculatePageSize(HintedHandOffManager.java:441)
at 
org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:299)
at 
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:278)
at 
org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
at 
org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:497)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
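I haven't read calculatePageSize, but an ArithmeticException there points at 
an integer division whose denominator - presumably some mean row or hint 
size pulled from statistics - can be zero. A hedged sketch of the shape of a 
guard (names are mine, not Cassandra's):

public class PageSizeMath {
    // Hypothetical: derive a page size from a byte budget and an observed
    // mean entry size that may legitimately be zero (e.g. empty statistics).
    static int pageSize(int byteBudget, int meanEntrySize) {
        if (meanEntrySize <= 0)
            return 1; // fall back to a minimal page instead of dividing by zero
        return Math.max(1, byteBudget / meanEntrySize);
    }

    public static void main(String[] args) {
        System.out.println(pageSize(1024 * 1024, 0));   // 1 - no ArithmeticException
        System.out.println(pageSize(1024 * 1024, 512)); // 2048
    }
}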

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5068) CLONE - Once a host has been hinted to, log messages for it repeat every 10 mins even if no hints are delivered

2013-01-18 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557574#comment-13557574
 ] 

Karl Mueller commented on CASSANDRA-5068:
-

I'm also seeing this, running 1.1.8 :)

 CLONE - Once a host has been hinted to, log messages for it repeat every 10 
 mins even if no hints are delivered
 ---

 Key: CASSANDRA-5068
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5068
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.6, 1.2.0
 Environment: cassandra 1.1.6
 java 1.6.0_30
Reporter: Peter Haggerty
Assignee: Brandon Williams
Priority: Minor
  Labels: hinted, hintedhandoff, phantom

 We have 0 row hinted handoffs every 10 minutes like clockwork. This impacts 
 our ability to monitor the cluster by adding persistent noise in the handoff 
 metric.
 Previous mentions of this issue are here:
 http://www.mail-archive.com/user@cassandra.apache.org/msg25982.html
 The hinted handoffs can be scrubbed away with
 nodetool -h 127.0.0.1 scrub system HintsColumnFamily
 but they return anywhere from a few minutes to multiple hours later.
 These started to appear after an upgrade to 1.1.6 and haven't gone away 
 despite rolling cleanups, rolling restarts, multiple rounds of scrubbing, etc.
 A few things we've noticed about the handoffs:
 1. The phantom handoff endpoint changes after a non-zero handoff comes through
 2. Sometimes a non-zero handoff will be immediately followed by an off 
 schedule phantom handoff to the endpoint the phantom had been using before
 3. The sstable2json output seems to include multiple sub-sections for each 
 handoff with the same deletedAt information.
 The phantom handoff endpoint changes after a non-zero handoff comes through:
  INFO [HintedHandoff:1] 2012-12-11 06:57:35,093 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
  INFO [HintedHandoff:1] 2012-12-11 07:07:35,092 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
  INFO [HintedHandoff:1] 2012-12-11 07:07:37,915 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 1058 rows to endpoint /10.10.10.2
  INFO [HintedHandoff:1] 2012-12-11 07:17:35,093 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
  INFO [HintedHandoff:1] 2012-12-11 07:27:35,093 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
 Sometimes a non-zero handoff will be immediately followed by an off 
 schedule phantom handoff to the endpoint the phantom had been using before:
  INFO [HintedHandoff:1] 2012-12-12 21:47:39,335 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
  INFO [HintedHandoff:1] 2012-12-12 21:57:39,335 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
  INFO [HintedHandoff:1] 2012-12-12 22:07:43,319 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 1416 rows to endpoint /10.10.10.4
  INFO [HintedHandoff:1] 2012-12-12 22:07:43,320 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
  INFO [HintedHandoff:1] 2012-12-12 22:17:39,357 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
  INFO [HintedHandoff:1] 2012-12-12 22:27:39,337 HintedHandOffManager.java 
 (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
 The first few entries from one of the json files:
 {
 "0aaa": {
 "ccf5dc203a2211e2e154da71a9bb": {
 "deletedAt": -9223372036854775808, 
 "subColumns": []
 }, 
 "ccf603303a2211e2e154da71a9bb": {
 "deletedAt": -9223372036854775808, 
 "subColumns": []
 }, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-5088) Major compaction IOException in 1.1.8

2012-12-22 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-5088:
---

 Summary: Major compaction IOException in 1.1.8
 Key: CASSANDRA-5088
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5088
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.8
Reporter: Karl Mueller


Upgraded from 1.1.6 to 1.1.8.

Now I'm trying to do a major compaction, and seeing this:

ERROR [CompactionExecutor:129] 2012-12-22 10:33:44,217 
AbstractCassandraDaemon.java (line 135) Exception in thread 
Thread[CompactionExecutor:129,1,RMI Runtime]
java.io.IOError: java.io.IOException: Bad file descriptor
at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:65)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:195)
at 
org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:298)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Bad file descriptor
at sun.nio.ch.FileDispatcher.preClose0(Native Method)
at sun.nio.ch.FileDispatcher.preClose(FileDispatcher.java:59)
at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:96)
at 
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
at java.io.FileInputStream.close(FileInputStream.java:258)
at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.close(CompressedRandomAccessReader.java:131)
at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:121)
at 
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
at java.io.RandomAccessFile.close(RandomAccessFile.java:541)
at 
org.apache.cassandra.io.util.RandomAccessReader.close(RandomAccessReader.java:224)
at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.close(CompressedRandomAccessReader.java:130)
at 
org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:89)
at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:61)
... 9 more


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5088) Major compaction IOException in 1.1.8

2012-12-22 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538905#comment-13538905
 ] 

Karl Mueller commented on CASSANDRA-5088:
-

What would be helpful for debugging? The same environment worked fine on 
1.1.6 - I didn't try 1.1.7.

I'm running a somewhat old JDK - I could try upgrading it.

 Major compaction IOException in 1.1.8
 -

 Key: CASSANDRA-5088
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5088
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.8
Reporter: Karl Mueller

 Upgraded from 1.1.6 to 1.1.8.
 Now I'm trying to do a major compaction, and seeing this:
 ERROR [CompactionExecutor:129] 2012-12-22 10:33:44,217 
 AbstractCassandraDaemon.java (line 135) Exception in thread 
 Thread[CompactionExecutor:129,1,RMI Runtime]
 java.io.IOError: java.io.IOException: Bad file descriptor
 at 
 org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:65)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:195)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:298)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: java.io.IOException: Bad file descriptor
 at sun.nio.ch.FileDispatcher.preClose0(Native Method)
 at sun.nio.ch.FileDispatcher.preClose(FileDispatcher.java:59)
 at 
 sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:96)
 at 
 java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
 at java.io.FileInputStream.close(FileInputStream.java:258)
 at 
 org.apache.cassandra.io.compress.CompressedRandomAccessReader.close(CompressedRandomAccessReader.java:131)
 at 
 sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:121)
 at 
 java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
 at java.io.RandomAccessFile.close(RandomAccessFile.java:541)
 at 
 org.apache.cassandra.io.util.RandomAccessReader.close(RandomAccessReader.java:224)
 at 
 org.apache.cassandra.io.compress.CompressedRandomAccessReader.close(CompressedRandomAccessReader.java:130)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:89)
 at 
 org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:61)
 ... 9 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4802) Regular startup log has confusing "Bootstrap/Replace/Move completed!" without bootstrap, replace, or move

2012-10-15 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476509#comment-13476509
 ] 

Karl Mueller commented on CASSANDRA-4802:
-

Bootstrap means something specific in Cassandra, in that it implies some data 
has streamed in.

I think "Startup completed" would be great.

If there IS a bootstrap/replace/move, then I think the message ought to specify 
which one happened and that it's ready now (if it's easy to do) :)
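
A minimal sketch of the message selection being suggested here (hypothetical 
flag names, not the actual patch that was committed):

public class StartupMessage {
    // didBootstrap/didReplace/didMove are assumed flags for illustration.
    static String completionMessage(boolean didBootstrap, boolean didReplace,
                                    boolean didMove) {
        String action = didBootstrap ? "Bootstrap"
                      : didReplace   ? "Replace"
                      : didMove      ? "Move"
                      : "Startup";
        return action + " completed! Now serving reads.";
    }

    public static void main(String[] args) {
        // A plain restart takes none of the three actions:
        System.out.println(completionMessage(false, false, false));
        // -> "Startup completed! Now serving reads."
    }
}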



[jira] [Created] (CASSANDRA-4802) Regular startup log has confusing "Bootstrap/Replace/Move completed!" without bootstrap, replace, or move

2012-10-12 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-4802:
---

 Summary: Regular startup log has confusing "Bootstrap/Replace/Move 
completed!" without bootstrap, replace, or move
 Key: CASSANDRA-4802
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4802
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.12
 Environment: RHEL6, JDK1.6
Reporter: Karl Mueller
Priority: Trivial


A regular startup completes successfully, but it has a confusing message at the 
end of the startup:

  INFO 15:19:29,137 Bootstrap/Replace/Move completed! Now serving reads.

This happens despite no bootstrap, replace, or move.

While purely cosmetic, this makes you wonder what the node just did - did it 
just bootstrap?!  It should simply read something like "Startup completed! Now 
serving reads" unless the node actually has done one of the actions in the 
message.



Complete log at the end:


INFO 15:13:30,522 Log replay complete, 6274 replayed mutations
 INFO 15:13:30,527 Cassandra version: 1.0.12
 INFO 15:13:30,527 Thrift API version: 19.20.0
 INFO 15:13:30,527 Loading persisted ring state
 INFO 15:13:30,541 Starting up server gossip
 INFO 15:13:30,542 Enqueuing flush of Memtable-LocationInfo@1828864224(29/36 
serialized/live bytes, 1 ops)
 INFO 15:13:30,543 Writing Memtable-LocationInfo@1828864224(29/36 
serialized/live bytes, 1 ops)
 INFO 15:13:30,550 Completed flushing 
/data2/data-cassandra/system/LocationInfo-hd-274-Data.db (80 bytes)
 INFO 15:13:30,563 Starting Messaging Service on port 7000
 INFO 15:13:30,571 Using saved token 31901471898837980949691369446728269823
 INFO 15:13:30,572 Enqueuing flush of Memtable-LocationInfo@294410307(53/66 
serialized/live bytes, 2 ops)
 INFO 15:13:30,573 Writing Memtable-LocationInfo@294410307(53/66 
serialized/live bytes, 2 ops)
 INFO 15:13:30,579 Completed flushing 
/data2/data-cassandra/system/LocationInfo-hd-275-Data.db (163 bytes)
 INFO 15:13:30,581 Node kaos-cass02.xxx/1.2.3.4 state jump to normal
 INFO 15:13:30,598 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 15:13:30,600 Will not load MX4J, mx4j-tools.jar is not in the classpath




[jira] [Commented] (CASSANDRA-4446) nodetool drain sometimes doesn't mark commitlog fully flushed

2012-10-08 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13471960#comment-13471960
 ] 

Karl Mueller commented on CASSANDRA-4446:
-

Also seeing this in an upgrade from 1.0.xx to 1.1.15:

 INFO 16:29:17,486 completed pre-loading (3 keys) key cache.
 INFO 16:29:17,495 Replaying /data2/commit-cassandra/CommitLog-1349727956484.log
 INFO 16:29:17,503 Replaying /data2/commit-cassandra/CommitLog-1349727956484.log
 INFO 16:29:18,495 GC for ParNew: 3506 ms for 4 collections, 1963062320 used; 
max is 17095983104
 INFO 16:29:18,498 Finished reading 
/data2/commit-cassandra/CommitLog-1349727956484.log
 INFO 16:29:18,499 Log replay complete, 0 replayed mutations


This is a standard upgrade process, which includes a drain.
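
As an aside, the drain in that procedure is the StorageService drain() JMX 
operation (nodetool drain is a thin wrapper around it).  A minimal sketch of 
invoking it directly, assuming JMX on the default port 7199 and no 
authentication:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DrainNode {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName(
                    "org.apache.cassandra.db:type=StorageService");
            // drain() flushes memtables and stops accepting writes; the node
            // logs DRAINED when it finishes, after which it is safe to stop.
            mbs.invoke(ss, "drain", new Object[0], new String[0]);
        } finally {
            jmxc.close();
        }
    }
}

The bug described below is that even after the DRAINED message, some commitlog 
segments are occasionally not marked flushed, so they replay on restart.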

 nodetool drain sometimes doesn't mark commitlog fully flushed
 -

 Key: CASSANDRA-4446
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4446
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.10
 Environment: ubuntu 10.04 64bit
 Linux HOSTNAME 2.6.32-345-ec2 #48-Ubuntu SMP Wed May 2 19:29:55 UTC 2012 
 x86_64 GNU/Linux
 sun JVM
 cassandra 1.0.10 installed from apache deb
Reporter: Robert Coli
 Attachments: 
 cassandra.1.0.10.replaying.log.after.exception.during.drain.txt


 I recently wiped a customer's QA cluster. I drained each node and verified 
 that they were drained. When I restarted the nodes, I saw the commitlog 
 replay create a memtable and then flush it. I have attached a sanitized log 
 snippet from a representative node at the time. 
 It appears to show the following:
 1) Drain begins
 2) Drain triggers flush
 3) Flush triggers compaction
 4) StorageService logs DRAINED message
 5) compaction thread throws an exception
 6) on restart, same CF creates a memtable
 7) and then flushes it [1]
 The columnfamily involved in the replay in 7) is the CF for which the 
 compaction thread threw the exception in 5). This seems to suggest a timing issue 
 whereby the exception in 5) prevents the flush in 3) from marking all the 
 segments flushed, causing them to replay after restart.
 In case it might be relevant, I did an online change of compaction strategy 
 from Leveled to SizeTiered during the uptime period preceding this drain.
 [1] Isn't commitlog replay not supposed to automatically trigger a flush in 
 modern cassandra?



[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-26 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401681#comment-13401681
 ] 

Karl Mueller commented on CASSANDRA-4347:
-

Brandon,

Since you can reproduce, do you still want the logs?  I think I still have them 
if needed.





[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-26 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401685#comment-13401685
 ] 

Karl Mueller commented on CASSANDRA-4347:
-

My log's more special! ;)  Just kidding.

My opinion on the urgency of the bug would depend on how long 1.0.x will be 
around.  It's the sort of annoying yet in-your-face bug that doesn't really 
cause problems beyond creating a lot of misleading log entries.  Yet I could 
see people running into it, and then having to find the workaround.

Perhaps in the interim some type of log message could simply be added about 
maybe trying assassinate?  It should be easy to see "Oh, there's two IPs for 
this one token.  Is one old?"
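
For anyone landing here for the workaround in the issue title: in this era 
assassinate was not a nodetool command but the Gossiper MBean's 
unsafeAssassinateEndpoint operation, invoked over JMX.  A minimal sketch, 
assuming the default JMX port, no authentication, and the stale IP from this 
issue's logs:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class AssassinateOldIp {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName gossiper = new ObjectName(
                    "org.apache.cassandra.net:type=Gossiper");
            // Forcibly drops all gossip state for the dead endpoint.
            mbs.invoke(gossiper, "unsafeAssassinateEndpoint",
                    new Object[] { "10.12.9.157" },
                    new String[] { String.class.getName() });
        } finally {
            jmxc.close();
        }
    }
}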





[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-26 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401700#comment-13401700
 ] 

Karl Mueller commented on CASSANDRA-4347:
-

Actually, this morning I started to see the same messages again, approximately 
3 days later.

Related to https://issues.apache.org/jira/browse/CASSANDRA-2961 somehow?  Some 
people on IRC thought so, maybe.

Assassinate is NOT removing them successfully anymore.





[jira] [Updated] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-25 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-4347:


Attachment: LocationInfo-hd-279-Data.db

LocationInfo file attached from after the node is re-IP'd and rejoins the 
cluster.  This is in the problem state.

I also have system snapshots from before the move and after the assassinate, as 
well as from a node that isn't moving (same snapshots), if you want them.





[jira] [Updated] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-25 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-4347:


Attachment: kaos-cass03-gossipinfo-postmove.txt
kaos-cass00-gossipinfo-postmove.txt

This is the gossipinfo from two points of view.  Both postmove.txt files are 
from after the node has changed IPs.

kaos-cass00 is the node which moved IPs.  The old IP is 10.12.8.97.  The new IP 
is 10.93.12.10.

kaos-cass03 is a node which did not move.  Its IP, if needed, is 10.12.8.87.

I also have the gossipinfo from after the assassinate if needed.





[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-18 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396215#comment-13396215
 ] 

Karl Mueller commented on CASSANDRA-4347:
-

You mean, before I did the assassinate?  All of the nodes at this point are 
post-assassinate.  I'm attaching the gossipinfo from the 3-node cluster in its 
current state, which is showing some old IPs.

(I thought assassinate went cross-cluster?)

I'm moving another cluster this week, and I'll try to grab a gossipinfo and the 
system tables during the transition from that set.  I expect it will have the 
same issues.





[jira] [Updated] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-18 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-4347:


Attachment: dev-cass-post-assassinate-gossipinfo.txt

This file contains gossipinfo from the 3-node cluster we already moved, after 
assassinate has run on each node for its own old IP.

The new IPs are all 10.93.15.xx and the old IPs are all 10.12.x.x.

The old IPs are as follows:

dev-cass00 - 10.12.9.160
dev-cass01 - 10.12.9.157
dev-cass02 - 10.12.9.33

I believe dev-cass00 has restarted since the assassinate, but the others haven't.

New IPs are:

dev-cass00 - 10.93.15.10
dev-cass01 - 10.93.15.11
dev-cass02 - 10.93.15.12





[jira] [Commented] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-18 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396251#comment-13396251
 ] 

Karl Mueller commented on CASSANDRA-4347:
-

OK, I'll grab one this week when we do the move.

I assume you want the LocationInfo CF, or do you want the entire system 
keyspace?





[jira] [Created] (CASSANDRA-4347) IP change of node requires assassinate to really remove old IP

2012-06-15 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-4347:
---

 Summary: IP change of node requires assassinate to really remove 
old IP
 Key: CASSANDRA-4347
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4347
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.10
 Environment: RHEL6, 64bit
Reporter: Karl Mueller
Priority: Minor


In changing the IP addresses of nodes one-by-one, the node successfully moves 
itself and its token.  Everything works properly.

However, the node which had its IP changed (but NOT other nodes in the ring) 
continues to have some type of state associated with the old IP and produces 
log messages like this:


 INFO [GossipStage:1] 2012-06-15 15:25:01,490 Gossiper.java (line 838) Node 
/10.12.9.157 is now part of the cluster
 INFO [GossipStage:1] 2012-06-15 15:25:01,490 Gossiper.java (line 804) 
InetAddress /10.12.9.157 is now UP
 INFO [GossipStage:1] 2012-06-15 15:25:01,491 StorageService.java (line 1017) 
Nodes /10.12.9.157 and dev-cass01.sv.walmartlabs.com/10.93.15.11 have the same 
token 113427455640312821154458202477256070484.  Ignoring /10.12.9.157
 INFO [GossipTasks:1] 2012-06-15 15:25:11,373 Gossiper.java (line 818) 
InetAddress /10.12.9.157 is now dead.
 INFO [GossipTasks:1] 2012-06-15 15:25:32,380 Gossiper.java (line 632) 
FatClient /10.12.9.157 has been silent for 3ms, removing from gossip
 INFO [GossipStage:1] 2012-06-15 15:26:32,490 Gossiper.java (line 838) Node 
/10.12.9.157 is now part of the cluster
 INFO [GossipStage:1] 2012-06-15 15:26:32,491 Gossiper.java (line 804) 
InetAddress /10.12.9.157 is now UP
 INFO [GossipStage:1] 2012-06-15 15:26:32,491 StorageService.java (line 1017) 
Nodes /10.12.9.157 and dev-cass01.sv.walmartlabs.com/10.93.15.11 have the same 
token 113427455640312821154458202477256070484.  Ignoring /10.12.9.157
 INFO [GossipTasks:1] 2012-06-15 15:26:42,402 Gossiper.java (line 818) 
InetAddress /10.12.9.157 is now dead.
 INFO [GossipTasks:1] 2012-06-15 15:27:03,410 Gossiper.java (line 632) 
FatClient /10.12.9.157 has been silent for 3ms, removing from gossip
 INFO [GossipStage:1] 2012-06-15 15:28:04,533 Gossiper.java (line 838) Node 
/10.12.9.157 is now part of the cluster


Other nodes do NOT have the old IP showing up in logs.  It's only the node that 
moved.

The old IP doesn't show up in ring anywhere or in any other fashion.  The 
cluster seems to be fully operational, so I think it's just a cleanup issue.





[jira] [Commented] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files

2012-04-27 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263492#comment-13263492
 ] 

Karl Mueller commented on CASSANDRA-4182:
-

Yes, it maxes one CPU.  That's what one CPU can do.





[jira] [Commented] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files

2012-04-27 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263907#comment-13263907
 ] 

Karl Mueller commented on CASSANDRA-4182:
-

Yes, I figure it's pretty much a worst-case scenario.  I didn't expect it to be 
any faster than single-threaded - possibly a bit slower, or taking more CPU.

However, it's a LOT slower (~80% slower).

I'd be happy if it were the same speed as single-threaded for the worst case, 
even at the cost of more CPU.






[jira] [Updated] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files

2012-04-26 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-4182:


Priority: Major  (was: Minor)

Changing the priority on this to Major since it's actually a significant 
problem.  The workaround is to not use multithreaded compaction, but this will 
impact a mixed deployment of classic compactions and the new leveled compaction.

If I'm wrong about it, feel free to change it back of course. :)
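
For reference, both knobs involved here live in cassandra.yaml; a sketch of the 
relevant excerpt for the setup described in this issue (1.0.x era):

# Workaround for this issue: leave multithreaded compaction disabled.
multithreaded_compaction: false
# The reporter ran with compaction throttling disabled (0 = unthrottled).
compaction_throughput_mb_per_sec: 0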





[jira] [Created] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files

2012-04-24 Thread Karl Mueller (JIRA)
Karl Mueller created CASSANDRA-4182:
---

 Summary: multithreaded compaction very slow with large single data 
file and a few tiny data files
 Key: CASSANDRA-4182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4182
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0.9
 Environment: Redhat
Sun JDK 1.6.0_20-b02
Reporter: Karl Mueller
Priority: Minor


Turning on multithreaded compaction makes compaction time take nearly twice as 
long in our environment, which includes a very large SStable and a few smaller 
ones, relative to either 0.8.x with MT turned off or 1.0.x with MT turned off.  

compaction_throughput_mb_per_sec is set to 0.  

We currently compact about 500 GB of data nightly due to overwrites.  (Leveled 
compaction will probably be enabled on the busy CFs once 1.0.x is rolled out 
completely.)  The time it takes to do the compaction is:

451m13.284s (multithreaded)
273m58.740s (multithreaded disabled)

Our nodes run on SSDs and therefore have a high read and write rate available 
to them. The primary CF they're compacting right now, with most of the data, is 
localized to a very large file (~300+GB) and a few tiny files (1-10GB) since 
the CF has become far less active.  

I would expect the multithreaded compaction to be no worse than the single 
threaded compaction, or perhaps a higher cost in CPU for the same performance, 
but it's half the speed with the same CPU usage, or more CPU. 

I have two graphs available from testing 2 or 3 compactions which demonstrate 
some interesting characteristics.  1.0.9 was installed on the 21st with MT 
turned on.  Prior stuff is 0.8.7 with MT turned off, but 1.0.9 with MT turned 
off seems to perform as well as 0.8.7.

http://www.xney.com/temp/cass-irq.png  (interrupts)

http://www.xney.com/temp/cass-iostat.png (io bandwidth of disks)

This demonstrates a large increase in rescheduling interrupts and only half the 
bandwidth used on the disks.  I suspect this is because some kind of threads 
are thrashing or something like that.





[jira] Commented: (CASSANDRA-2353) JMX call StorageService.Operations.getNaturalEndpoints returns an NPE

2011-03-18 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008666#comment-13008666
 ] 

Karl Mueller commented on CASSANDRA-2353:
-

I don't have 0.7.4 in my environment, but when I upgrade to .4 or .5 I'll post 
it.  I think driftx on IRC had a newer version.



[jira] Created: (CASSANDRA-2353) JMX call StorageService.Operations.getNaturalEndpoints returns an NPE

2011-03-17 Thread Karl Mueller (JIRA)
JMX call StorageService.Operations.getNaturalEndpoints returns an NPE
-

 Key: CASSANDRA-2353
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2353
 Project: Cassandra
  Issue Type: Bug
  Components: API
Affects Versions: 0.7.0
Reporter: Karl Mueller
Priority: Minor


The JMX operation StorageService.Operations.getNaturalEndpoints in cassandra.db 
always returns an NPE.



[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-18 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983513#action_12983513
 ] 

Karl Mueller commented on CASSANDRA-1932:
-

Yes, I agree it was user error

-karl

 NegativeArraySizeException at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 -

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
 Fix For: 0.7.1


 ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.NegativeArraySizeException
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)
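
The exception itself is the classic symptom of reading a length field from a 
file whose format is not what the reader expects: a negative int ends up used 
as an array size.  A self-contained illustration of the failure mode (not 
Cassandra's actual deserializer):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class NegativeSizeDemo {
    public static void main(String[] args) throws IOException {
        // Four bytes that decode as the int -4, standing in for a length
        // field read from an incompatible or corrupt sstable component.
        byte[] corrupt = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFC };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(corrupt));
        int words = in.readInt();       // reads -4
        long[] bits = new long[words];  // throws NegativeArraySizeException
    }
}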




[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-04 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977180#action_12977180
 ] 

Karl Mueller commented on CASSANDRA-1932:
-

Not sure.  I have all of the -f-*.db files from running it saved, but had to 
remove them from the active Cassandra cluster.  (I ran the 0.7.1 branch by 
mistake, as I hadn't realized there was a 0.7.0 branch.)

What would be helpful?





[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-04 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977553#action_12977553
 ] 

Karl Mueller commented on CASSANDRA-1932:
-

They were all -f- versions.  The -e- versions worked fine.

 NegativeArraySizeException at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 -

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
 Fix For: 0.7.1


 ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.NegativeArraySizeException
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1931) Internal error processing insert java.lang.AssertionError at org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:219)

2011-01-03 Thread Karl Mueller (JIRA)
Internal error processing insert java.lang.AssertionError  at 
org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:219)
---

 Key: CASSANDRA-1931
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1931
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
 Environment: Linux Fedora 12 x86_64
Reporter: Karl Mueller


ERROR [pool-1-thread-137] 2011-01-03 18:22:21,751 Cassandra.java (line 2960) 
Internal error processing insert
java.lang.AssertionError
at 
org.apache.cassandra.service.StorageProxy.sendMessages(StorageProxy.java:219)
at 
org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:174)
at 
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:412)
at 
org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:349)
at 
org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2952)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-03 Thread Karl Mueller (JIRA)
NegativeArraySizeException at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
-

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller


ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.NegativeArraySizeException
at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
at 
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
at org.apache.cassandra.db.Table.getRow(Table.java:384)
at 
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
at 
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1932) NegativeArraySizeException at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)

2011-01-03 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977136#action_12977136
 ] 

Karl Mueller commented on CASSANDRA-1932:
-

I do have a very large number of rows (150-200 million on many nodes).
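
For illustration, a minimal hypothetical Java sketch (not Cassandra's actual 
BloomFilterSerializer; all names are made up) of how a bloom filter sized for 
hundreds of millions of keys can overflow a signed 32-bit length on write and 
read back as a negative array size, producing exactly this exception:

{code:java}
import java.io.*;

// Hypothetical sketch, not Cassandra code: a bit count that overflowed a
// signed 32-bit int during serialization reads back as a negative array
// length, and allocating it throws NegativeArraySizeException.
public class BloomFilterOverflowSketch {
    static long[] deserializeBitset(DataInput in) throws IOException {
        int words = in.readInt();   // negative if the writer overflowed
        return new long[words];     // NegativeArraySizeException for words < 0
    }

    public static void main(String[] args) throws IOException {
        // ~150M keys at 15 bits each ~= 2.25e9 bits, past Integer.MAX_VALUE
        long bits = 150_000_000L * 15;
        int truncated = (int) bits; // wraps to -2,044,967,296
        System.out.println("bits=" + bits + " truncated=" + truncated);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(truncated);
        deserializeBitset(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
    }
}
{code}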

 NegativeArraySizeException at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 -

 Key: CASSANDRA-1932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1932
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Karl Mueller
Assignee: Ryan King
 Fix For: 0.7.1


 ERROR [ReadStage:30017] 2011-01-03 19:28:45,406 
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.NegativeArraySizeException
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:28)
 at 
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:9)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:104)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:71)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1219)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1081)
 at org.apache.cassandra.db.Table.getRow(Table.java:384)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1883) NPE in get_slice quorum read

2010-12-18 Thread Karl Mueller (JIRA)
NPE in get_slice quorum read


 Key: CASSANDRA-1883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1883
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.0 rc 2
 Environment: Linux Fedora 12 x86_64
Reporter: Karl Mueller
Priority: Minor


Getting this NPE as of the 2010-12-17 0.7 trunk.  Some data may be corrupt 
somewhere on a node.  It could be a null key somewhere.

ERROR [pool-1-thread-28] 2010-12-18 12:53:20,411 Cassandra.java (line 2707) 
Internal error processing get_slice
java.lang.NullPointerException
at 
org.apache.cassandra.service.DigestMismatchException.init(DigestMismatchException.java:30)
at 
org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:92)
at 
org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:43)
at 
org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:91)
at 
org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:362)
at 
org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
at 
org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:128)
at 
org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:225)
at 
org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:301)
at 
org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:263)
at 
org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:2699)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
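
For illustration, a minimal hypothetical Java sketch (not the actual 
DigestMismatchException class; names are made up) of how a null digest from 
one replica could blow up inside the exception's constructor, matching the 
<init> frame in the trace above:

{code:java}
// Hypothetical sketch, not Cassandra code: building the mismatch message
// from a null digest dereferences it inside the constructor, so the NPE
// surfaces at DigestMismatchException.<init> rather than in the resolver.
public class DigestMismatchSketch {
    static class DigestMismatchException extends Exception {
        DigestMismatchException(String key, byte[] expected, byte[] actual) {
            super("Mismatch for key " + key
                    + " (" + toHex(expected) + " vs " + toHex(actual) + ")");
        }
    }

    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest)               // NPE here if digest == null
            sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // A replica that never computed a digest leaves one side null:
        throw new DigestMismatchException("rowkey001", new byte[] {0x0a}, null);
    }
}
{code}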

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1883) NPE in get_slice quorum read

2010-12-18 Thread Karl Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972872#action_12972872
 ] 

Karl Mueller commented on CASSANDRA-1883:
-

Additional information: one of the SSD RAID0 arrays went bad recently.  This 
may have produced corrupt data on one Cassandra node.

 NPE in get_slice quorum read
 

 Key: CASSANDRA-1883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1883
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.0
 Environment: Linux Fedora 12 x86_64
Reporter: Karl Mueller
Priority: Minor
 Fix For: 0.7.0


 Getting this NPE as of the 2010-12-17 0.7 trunk.  Some data may be corrupt 
 somewhere on a node.  It could be a null key somewhere.
 ERROR [pool-1-thread-28] 2010-12-18 12:53:20,411 Cassandra.java (line 2707) 
 Internal error processing get_slice
 java.lang.NullPointerException
 at 
 org.apache.cassandra.service.DigestMismatchException.init(DigestMismatchException.java:30)
 at 
 org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:92)
 at 
 org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:43)
 at 
 org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:91)
 at 
 org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:362)
 at 
 org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
 at 
 org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:128)
 at 
 org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:225)
 at 
 org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:301)
 at 
 org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:263)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:2699)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (CASSANDRA-1884) sstable2json / sstablekeys should verify key order in -Data and -Index files

2010-12-18 Thread Karl Mueller (JIRA)
sstable2json / sstablekeys should verify key order in -Data and -Index files


 Key: CASSANDRA-1884
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1884
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.7.0
Reporter: Karl Mueller
Priority: Minor


Some Cassandra users use sstable2json and sstablekeys to check -Data and 
-Index files for integrity.  For example, if compaction fails, you can find 
out which files are causing the failure because they're corrupt.  One type of 
corruption that can occur in both -Data and -Index files is keys getting out 
of order.  (This shouldn't happen, but it can.)  Cassandra catches this error 
during compaction, but neither tool caught it.

This small patch simply raises an IOException during export if it finds 
out-of-order keys (sketched below).

Further work might make this optional via a command-line switch: it may be 
useful to export the data to JSON even though it's out of order, so it can be 
manually played back or re-ordered by another script.
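
To sketch the check (hypothetical names, not the actual sstable2json code): 
while streaming keys out of a file, fail fast with an IOException the moment 
a key does not sort strictly after its predecessor.

{code:java}
import java.io.IOException;
import java.util.List;

// Hypothetical sketch, not the actual patch: verify on-disk key order while
// exporting.  Real sstables order by decorated key (partitioner-dependent);
// a plain unsigned byte comparison stands in for that here.
public class KeyOrderCheckSketch {
    static void exportKeys(List<byte[]> keys) throws IOException {
        byte[] previous = null;
        for (byte[] key : keys) {
            if (previous != null && compare(previous, key) >= 0) {
                throw new IOException("Key out of order: " + hex(key)
                        + " follows " + hex(previous));
            }
            // ... emit this key's row as JSON here ...
            previous = key;
        }
    }

    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // third key sorts before the second, so export fails fast
        exportKeys(List.of(new byte[] {0x01}, new byte[] {0x03},
                           new byte[] {0x02}));
    }
}
{code}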

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (CASSANDRA-1883) NPE in get_slice quorum read

2010-12-18 Thread Karl Mueller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Mueller updated CASSANDRA-1883:


Attachment: digestmismatch-debug.txt

This is debug output from a node where this NPE was happening around the same 
time.  If you need more from the log, I have the rest of it available.

 NPE in get_slice quorum read
 

 Key: CASSANDRA-1883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1883
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.0
 Environment: Linux Fedora 12 x86_64
Reporter: Karl Mueller
Priority: Minor
 Fix For: 0.7.0

 Attachments: digestmismatch-debug.txt


 Getting this NPE as of the 2010-12-17 0.7 trunk.  Some data may be corrupt 
 somewhere on a node.  It could be a null key somewhere.
 ERROR [pool-1-thread-28] 2010-12-18 12:53:20,411 Cassandra.java (line 2707) 
 Internal error processing get_slice
 java.lang.NullPointerException
 at 
 org.apache.cassandra.service.DigestMismatchException.init(DigestMismatchException.java:30)
 at 
 org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:92)
 at 
 org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:43)
 at 
 org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:91)
 at 
 org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:362)
 at 
 org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
 at 
 org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:128)
 at 
 org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:225)
 at 
 org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:301)
 at 
 org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:263)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:2699)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.