Which SSTable caused CorruptSSTableException?
Running Cassandra 1.2.9 in AWS with a 12 host cluster, I am getting lots of CorruptSSTableException in system.log on one of my hosts. Is it possible to find out which SSTable(s) is/are corrupt? I'm currently running "nodetool scrub" on the relevant host, but that doesn't seem like an efficient way to fix the problem (if it fixes it at all).

Here's an example error:

ERROR [ReplicateOnWriteStage:1070] 2014-02-25 17:43:03,518 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:1070,5,main]
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1597)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:65)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1126)
        at org.apache.cassandra.db.Table.getRow(Table.java:347)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
        at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:90)
        at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:772)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1593)
        ... 3 more
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:394)
        at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:380)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:116)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:60)
        ... 15 more
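[Editor's note: the stack trace above doesn't name the offending file, but other errors for the same corruption (e.g. compaction failures or FileNotFoundException) often do include the SSTable path. Below is a hedged sketch, not an official tool, that scans system.log and collects any SSTable paths mentioned near such errors. The path pattern and 20-line context window are assumptions; adjust for your layout.]

```python
import re

# Sketch: collect SSTable paths that appear near corruption-related errors
# in a Cassandra 1.2-era system.log. Not every stack trace includes the
# path (the one above doesn't), so this only narrows the search.
SSTABLE_PATH = re.compile(r"(/[\w./-]+-(?:Data|Index|Filter|Statistics)\.db)")
ERROR_MARKERS = ("CorruptSSTableException", "EOFException", "FileNotFoundException")

def suspect_sstables(log_lines, context=20):
    """Return sorted SSTable paths mentioned within `context` lines of an error."""
    suspects = set()
    error_idx = [i for i, line in enumerate(log_lines)
                 if any(m in line for m in ERROR_MARKERS)]
    for i in error_idx:
        for line in log_lines[max(0, i - context):i + context]:
            suspects.update(SSTABLE_PATH.findall(line))
    return sorted(suspects)
```

Usage would be something like `suspect_sstables(open("/var/log/cassandra/system.log").readlines())`, then scrub (or, offline, sstablescrub) only the tables those files belong to.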
Re: Conference or training recommendations
DataStax recently started offering free virtual training. You may want to try that first: http://www.datastax.com/what-we-offer/products-services/training/virtual-training

There are also many Cassandra meetups around the world: http://cassandra.meetup.com/

DataStax also offers classroom training, but it is not cheap: http://www.datastax.com/what-we-offer/products-services/training

As for conferences, this year's Cassandra Summits were in San Francisco in June and London in October. I have not seen an announcement of next year's summit(s).

-Ike

On Sun, Dec 15, 2013 at 12:07 PM, Robert Wille wrote:
> I'd like to attend a conference or some form of training to become more
> proficient and knowledgeable about Cassandra. Any suggestions?
CQL workaround for modifying a primary key
What is the best practice for modifying the primary key definition of a table in Cassandra 1.2.9?

Say I have this table:

CREATE TABLE temperature (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, event_time)
);

I want to add a new column named version and include that column in the primary key. CQL will let me add the column, but you can't change the primary key for an existing table. So I drop the table and recreate it:

DROP TABLE temperature;

CREATE TABLE temperature (
  weatherstation_id text,
  version int,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, version, event_time)
);

But then I start getting errors like this:

java.io.FileNotFoundException: /var/lib/cassandra/data/test/temperature/test-temperature-ic-8316-Data.db (No such file or directory)

So I guess the drop table doesn't actually delete the data, and I end up with a problem like this: https://issues.apache.org/jira/browse/CASSANDRA-4857

What's a good workaround for this, assuming I don't want to change the name of my table? Should I just truncate the table, then drop it and recreate it?

Thanks.

-Ike Walker
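[Editor's note: the truncate-then-drop sequence floated at the end of the question would look like the sketch below. This is untested against 1.2.9 and is not an official recommendation; TRUNCATE removes the live SSTables first, so the subsequent DROP has no data files left to lose track of. Note TRUNCATE requires all nodes to be up.]

```cql
TRUNCATE temperature;
DROP TABLE temperature;

CREATE TABLE temperature (
  weatherstation_id text,
  version int,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, version, event_time)
);
```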
Re: Output of "nodetool ring" with virtual nodes
Hi Paulo,

Yes, that is expected. Now that you are using virtual nodes you should use "nodetool status" to see an output similar to what you saw with "nodetool ring" before you enabled virtual nodes.

-Ike Walker

On Oct 15, 2013, at 11:45 AM, Paulo Motta wrote:
> Hello,
>
> I recently did the "Enabling virtual nodes on an existing production cluster" procedure
> (http://www.datastax.com/documentation/cassandra/1.2/webhelp/cassandra/configuration/configVnodesProduction_t.html),
> and noticed that the output of the command "nodetool ring" changes
> significantly when virtual nodes are enabled in a new data center.
>
> Before, it showed only 1 token per node, now it shows 256 tokens per node
> (output below). So, that means 256*N entries, which makes the command
> unreadable, while before it was pretty useful to check the cluster status in
> a human-readable format. Moreover, the command is taking much longer to
> execute.
>
> Is this expected behavior, or did I make any mistake during the procedure?
>
> Cassandra version: 1.2.10
>
> Before it was like this:
>
> Datacenter: VNodesDisabled
> ==========
> Replicas: 3
>
> Address         Rack  Status  State   Load       Owns    Token
>                                                          28356863910078205239614050619314017619
> AAA.BBB.CCC.1   x     Up      Normal  236.49 GB  20.83%  113427455640312821154458002477256070480
> AAA.BBB.CCC.2   x     Up      Normal  347.6 GB   29.17%  77981375752715064543690004203113548455
> AAA.BBB.CCC.3   x     Up      Normal  332.46 GB  37.50%  106338614526609105785626408013334622686
> AAA.BBB.CCC.4   x     Up      Normal  198.94 GB  20.83%  141784319550391026443072753090570088104
> AAA.BBB.CCC.5   x     Up      Normal  330.68 GB  33.33%  92159807707754167187997289512070557265
> AAA.BBB.CCC.6   x     Up      Normal  268.64 GB  25.00%  155962751505430129087380028400227096915
> AAA.BBB.CCC.7   x     Up      Normal  262.43 GB  25.00%  163051967482949680409533666060055601314
> AAA.BBB.CCC.8   x     Up      Normal  200.18 GB  16.67%  1
> AAA.BBB.CCC.9   x     Up      Normal  189.13 GB  16.67%  120516671617832372476611040132084574885
> AAA.BBB.CCC.10  x     Up      Normal  220.7 GB   25.00%  42535295865117307932921025928971026429
> AAA.BBB.CCC.11  x     Up      Normal  259.36 GB  25.00%  35446079887597756610768088274142522024
> AAA.BBB.CCC.12  x     Up      Normal  270.32 GB  25.00%  28356863910078205088614550619314017619
>
> Now it is like this:
>
> Datacenter: VNodesEnabled
> ==========
> Replicas: 3
>
> Address         Rack  Status  State   Load       Owns   Token
>                                                         168998414504718061309167200639854699955
> XXX.YYY.ZZZ.1   y     Up      Normal  122.84 KB  0.00%  4176479009577065052560790400565254
> XXX.YYY.ZZZ.1   y     Up      Normal  122.84 KB  0.00%  291517050854558940844583227825291566
> XXX.YYY.ZZZ.1   y     Up      Normal  122.84 KB  0.00%  389126351568277133928956802249918052
> XXX.YYY.ZZZ.1   y     Up      Normal  122.84 KB  0.00%  504218791605899949008255495493335240
> XXX.YYY.ZZZ.2   y     Up      Normal  122.84 KB  0.00%  4176479009577065052560790400565254
> XXX.YYY.ZZZ.2   y     Up      Normal  122.84 KB  0.00%  291517050854558940844583227825291566
> XXX.YYY.ZZZ.2   y     Up      Normal  122.84 KB  0.00%  389126351568277133928956802249918052
> XXX.YYY.ZZZ.2   y     Up      Normal  122.84 KB  0.00%  504218791605899949008255495493335240
> XXX.YYY.ZZZ.3   y     Up      Normal  122.84 KB  0.00%  4176479009577065052560790400565254
> XXX.YYY.ZZZ.3   y     Up      Normal  122.84 KB  0.00%  291517050854558940844583227825291566
> XXX.YYY.ZZZ.3   y     Up      Normal  122.84 KB  0.00%
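[Editor's note: since vnodes make "nodetool ring" print 256 lines per host, a small script can collapse it back to one line per host. The sketch below is a hypothetical helper, not part of Cassandra; it assumes the whitespace-separated column layout shown above (address, rack, status, state, load value, load unit, owns%, token) and may need adjusting for other versions.]

```python
# Sketch: collapse vnode-era "nodetool ring" output to one summary row per
# host: (address, status, state, load, total owns %, token count).
from collections import defaultdict

def summarize_ring(ring_text):
    tokens = defaultdict(int)     # address -> number of tokens
    owns = defaultdict(float)     # address -> summed ownership %
    info = {}                     # address -> (status, state, load)
    for line in ring_text.splitlines():
        parts = line.split()
        # A data row has 8 fields and a percentage in the Owns column.
        if len(parts) == 8 and parts[6].endswith("%"):
            addr = parts[0]
            tokens[addr] += 1
            owns[addr] += float(parts[6].rstrip("%"))
            info[addr] = (parts[2], parts[3], f"{parts[4]} {parts[5]}")
    return [(a, *info[a], round(owns[a], 2), tokens[a]) for a in sorted(tokens)]
```

One would pipe `nodetool ring` output into this, though as the reply notes, `nodetool status` already gives a per-host summary.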
Re: Long running nodetool move operation
The restart worked. Thanks, Rob!

After the restart I ran 'nodetool move' again, used 'nodetool netstats | grep -v "0%"' to verify that data was actively streaming, and the move completed successfully.

-Ike

On Sep 10, 2013, at 11:04 AM, Ike Walker wrote:
> Below is the output of "nodetool netstats".
>
> I've never run that before, but from what I can read it shows no incoming
> streams, and a bunch of outgoing streams to two other nodes, all at 0%.
>
> I'll try the restart.
>
> Thanks.
>
> nodetool netstats
> Mode: MOVING
> Streaming to: /10.xxx.xx.xx
> ...
> Streaming to: /10.xxx.xx.xxx
> ...
> Not receiving any streams.
> Pool Name    Active   Pending   Completed
> Commands     n/a      0         243401039
> Responses    n/a      0         295522535
>
> On Sep 9, 2013, at 10:54 PM, Robert Coli wrote:
>
>> On Mon, Sep 9, 2013 at 7:08 PM, Ike Walker wrote:
>> I've been using nodetool move to rebalance my cluster. Most of the moves
>> take under an hour, or a few hours at most. The current move has taken 4+
>> days so I'm afraid it will never complete. What's the best way to cancel it
>> and try again?
>>
>> What does "nodetool netstats" say? If it shows no streams in progress, the
>> move is probably hung...
>>
>> Restart the affected node. If that doesn't work, restart other nodes which
>> might have been receiving a stream. I think in the case of "move" it should
>> work to just restart the affected node. Restart the move, you will re-stream
>> anything you already streamed once.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-3486
>>
>> If this ticket were completed, it would presumably include the ability to
>> stop other hung streaming operations, like "move".
>>
>> =Rob
Re: Long running nodetool move operation
Below is the output of "nodetool netstats".

I've never run that before, but from what I can read it shows no incoming streams, and a bunch of outgoing streams to two other nodes, all at 0%.

I'll try the restart.

Thanks.

nodetool netstats
Mode: MOVING
Streaming to: /10.xxx.xx.xx
...
Streaming to: /10.xxx.xx.xxx
...
Not receiving any streams.
Pool Name    Active   Pending   Completed
Commands     n/a      0         243401039
Responses    n/a      0         295522535

On Sep 9, 2013, at 10:54 PM, Robert Coli wrote:
> On Mon, Sep 9, 2013 at 7:08 PM, Ike Walker wrote:
> I've been using nodetool move to rebalance my cluster. Most of the moves take
> under an hour, or a few hours at most. The current move has taken 4+ days so
> I'm afraid it will never complete. What's the best way to cancel it and try
> again?
>
> What does "nodetool netstats" say? If it shows no streams in progress, the
> move is probably hung...
>
> Restart the affected node. If that doesn't work, restart other nodes which
> might have been receiving a stream. I think in the case of "move" it should
> work to just restart the affected node. Restart the move, you will re-stream
> anything you already streamed once.
>
> https://issues.apache.org/jira/browse/CASSANDRA-3486
>
> If this ticket were completed, it would presumably include the ability to
> stop other hung streaming operations, like "move".
>
> =Rob
Long running nodetool move operation
I've been using nodetool move to rebalance my cluster. Most of the moves take under an hour, or a few hours at most. The current move has taken 4+ days so I'm afraid it will never complete. What's the best way to cancel it and try again?

I'm running a cluster of 12 nodes at AWS. Each node runs Cassandra 1.2.5 on an m1.xlarge EC2 instance, and they are spread across 3 availability zones within a single region.

I've seen some of these errors in the log. I'm not sure if it's related or not:

ERROR [CompactionExecutor:4092] 2013-09-10 01:31:49,783 CassandraDaemon.java (line 175) Exception in thread Thread[CompactionExecutor:4092,1,main]
java.lang.IndexOutOfBoundsException: index (1) must be less than size (1)
        at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
        at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
        at com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45)
        at org.apache.cassandra.db.marshal.CompositeType.getComparator(CompositeType.java:94)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:76)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
        at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:128)
        at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:114)
        at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:109)
        at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:219)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumnsFromSSTable(ColumnFamilySerializer.java:149)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:114)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:98)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:160)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:76)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:57)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Here's the status of the cluster as reported by nodetool ring, showing the one node in "Moving" state:

Datacenter: us-east
==========
Address         Rack  Status  State   Load       Owns    Token
                                                         127605887595351923798765477786913079290
10.xxx.xxx.xxx  1c    Up      Normal  224.53 GB  25.00%  0
10.xxx.xxx.xxx  1d    Up      Moving  297.46 GB  2.44%   4150051970709140963435425752946440221
10.xxx.xxx.xxx  1d    Up      Normal  107.75 GB  5.89%   14178431955039102644307275309657008810
10.xxx.xxx.xxx  1e    Up      Normal  82.75 GB   8.33%   28356863910078205288614550619314017620
10.xxx.xxx.xxx  1e    Up      Normal  173.83 GB  2.99%   33451586107772559423309548485325625873
10.xxx.xxx.xxx  1c    Up      Normal  64.4 GB
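[Editor's note: for context on what the move targets look like, a balanced ring under the RandomPartitioner used here spaces tokens evenly over 0..2**127. The sketch below computes those targets; it assumes RandomPartitioner and uses one common rounding convention (token generators differ slightly here). Its output for node 1 matches the 14178431955039102644307275309657008810 token visible in the ring above.]

```python
# Sketch: evenly spaced RandomPartitioner tokens for an N-node single-DC ring.
# 2**127 is the RandomPartitioner token range; node i gets i * (2**127 // N).
def balanced_tokens(n_nodes):
    step = (2 ** 127) // n_nodes
    return [i * step for i in range(n_nodes)]

for i, tok in enumerate(balanced_tokens(12)):
    print(f"node {i}: nodetool move {tok}")
```

After each move completes, `nodetool cleanup` on the nodes that lost ranges reclaims the old data.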
How many seed nodes should I use?
What is the best practice for how many seed nodes to have in a Cassandra cluster? I remember reading a recommendation of 2 seeds per datacenter in Datastax documentation for 0.7, but I'm interested to know what other people are doing these days, especially in AWS.

I'm running a cluster of 12 nodes at AWS. Each node runs Cassandra 1.2.5 on an m1.xlarge EC2 instance, and they are spread across 3 availability zones within a single region. To keep things simple I currently have all 12 nodes listed as seeds. That seems like overkill to me, but I don't know the pros and cons of too many or too few seeds.

Any advice is appreciated. Thanks!

-Ike Walker
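[Editor's note: trimming the seed list means editing the seed_provider section of cassandra.yaml on every node. The fragment below is a sketch with hypothetical placeholder addresses, following the common practice of a small, fixed set of seeds, e.g. one per availability zone in this 3-AZ layout; it is not a recommendation from the thread itself.]

```yaml
# cassandra.yaml (sketch; the 10.0.x.10 addresses are hypothetical placeholders)
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # one seed per availability zone, rather than all 12 nodes
          - seeds: "10.0.1.10,10.0.2.10,10.0.3.10"
```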