Nodes Flapping in the Ring
We have a new 6-node cluster running 0.6.13 (due to some client-side issues we need to stay on 0.6.x for the time being) that we are injecting data into, and we have run into issues with nodes going down and then coming back up quickly in the ring. All nodes are affected and we have ruled out the network layer. It happens on all nodes and seems related to GC or memtable flushes. We had things stable, but after a series of data migrations we saw some swapping, so we tuned the max heap down; this helped with the swapping, but the flapping still persists. The systems have 6 cores and 24 GB of RAM, and the max heap is at 12G. We are using the Parallel GC collector for throughput. Our run file for starting cassandra looks like this:

exec 2>&1
ulimit -n 262144
cd /opt/cassandra-0.6.13
exec chpst -u cassandra java \
  -ea \
  -Xms4G \
  -Xmx12G \
  -XX:TargetSurvivorRatio=90 \
  -XX:+PrintGCDetails \
  -XX:+AggressiveOpts \
  -XX:+UseParallelGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:SurvivorRatio=128 \
  -XX:MaxTenuringThreshold=0 \
  -Djava.rmi.server.hostname=10.20.3.155 \
  -Dcom.sun.management.jmxremote.port=8080 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcassandra-foreground=yes \
  -Dstorage-config=/etc/cassandra \
  -cp '/etc/cassandra:/opt/cassandra-0.6.13/lib/*' \
  org.apache.cassandra.thrift.CassandraDaemon <&-

Our storage-conf settings for the memory/disk-related options are: mmap 4 64 32 64 16 64 256 0.3 60 12 32 periodic 1 864000 true

Any thoughts on this would be really appreciated.

--
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
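A quick way to check whether the flapping lines up with GC pauses rather than anything at the network layer is to grep the logs on a flapping node. A rough sketch, assuming svlogd-style logging to /var/log/cassandra/current (a placeholder path) and the stock 0.6 GCInspector/gossip messages; the exact strings may differ on your build:

LOG=/var/log/cassandra/current          # placeholder; point at your runit/svlogd log
# Stop-the-world pauses long enough for GCInspector to report them:
grep 'GCInspector' "$LOG" | tail -20
# Gossip marking peers down and back up:
grep -E 'is now dead|is now UP' "$LOG" | tail -20
# If the dead/UP pairs cluster just after multi-second GC lines, the flapping is
# GC-driven: with -Xmx12G and the parallel collector, a full collection can easily
# exceed the gossip failure-detection window.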
Re: Upgrading to 1.0
If we upgrade and want to use compression, how is the old data handled? Does it read and then write all sstables out to new compressed files one at a time, or does it do something else? I'm considering the storage required on top of what is needed for the existing data.

Regards,
jake

On Wed, Nov 2, 2011 at 2:57 PM, Jonathan Ellis wrote:
> 1.0 can read 0.6 data files but is not network-compatible, so you need
> to do an all-at-once upgrade. Additionally, the Thrift api changed
> starting with 0.7; see NEWS.txt for details.
>
> On Wed, Nov 2, 2011 at 6:46 AM, Jake Maizel wrote:
> > Hello,
> > We run a medium-sized cluster of 12 nodes on 0.6.13 and would like to move
> > to 1.0. What are the best practices for this? Can we do a rolling upgrade or
> > does the entire cluster need to be upgraded at once?
> > Regards,
> > Jake
> >
> > --
> > Jake Maizel
> > Head of Network Operations
> > Soundcloud
> >
> > Mail & GTalk: j...@soundcloud.com
> > Skype: jakecloud
> >
> > Rosenthaler strasse 13, 101 19, Berlin, DE
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

--
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
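For reference, a hedged sketch of how this is often handled on the 1.0 line: enable compression per column family, then rewrite the existing sstables so the uncompressed files get replaced incrementally. The cassandra-cli syntax and nodetool commands should be double-checked against the exact 1.0.x release; the SoundCloud/Activities names are taken from the logs elsewhere in this archive and are just an example.

# Enable compression on one column family (repeat per CF):
cassandra-cli -h 10.20.3.155 <<'EOF'
use SoundCloud;
update column family Activities with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};
EOF
# Rewrite the existing sstables so they are written back out compressed; scrub works
# through the sstables one at a time, so the transient extra space is per-sstable
# rather than a second full copy of the data. Add -p <jmx-port> if needed.
nodetool -h 10.20.3.155 scrub SoundCloud Activities
# Later 1.0.x releases may also ship "nodetool upgradesstables", which does the
# rewrite without scrub's validation pass -- check "nodetool help" on your build.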
Large Increase in SSTable count after Upgrade to 0.6.13
he last 5000ms
2011-11-04_12:21:55.41788 '' INFO [DroppedMessagesLogger] 12:21:55,417 GCInspector.java:143 Pool Name Active Pending
2011-11-04_12:21:55.41789 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 STREAM-STAGE 0 0
2011-11-04_12:21:55.41851 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 FILEUTILS-DELETE-POOL 0 0
2011-11-04_12:21:55.41877 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 RESPONSE-STAGE 0 0
2011-11-04_12:21:55.42379 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 ROW-READ-STAGE 8 211
2011-11-04_12:21:55.42403 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 LB-OPERATIONS 0 0
2011-11-04_12:21:55.42427 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 MISCELLANEOUS-POOL 0 0
2011-11-04_12:21:55.42448 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 GMFD 0 0
2011-11-04_12:21:55.42473 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 CONSISTENCY-MANAGER 0 0
2011-11-04_12:21:55.42495 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 LB-TARGET 0 0
2011-11-04_12:21:55.42515 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 ROW-MUTATION-STAGE 1 1
2011-11-04_12:21:55.42537 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 MESSAGE-STREAMING-POOL 0 0
2011-11-04_12:21:55.42561 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 LOAD-BALANCER-STAGE 0 0
2011-11-04_12:21:55.42580 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 FLUSH-SORTER-POOL 0 0
2011-11-04_12:21:55.42602 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 MEMTABLE-POST-FLUSHER 1 3
2011-11-04_12:21:55.42626 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 AE-SERVICE-STAGE 0 0
2011-11-04_12:21:55.42649 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 FLUSH-WRITER-POOL 1 1
2011-11-04_12:21:55.42670 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:157 HINTED-HANDOFF-POOL 1 8
2011-11-04_12:21:55.42695 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:161 CompactionManager n/a 3423
2011-11-04_12:21:55.42717 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:165 ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
2011-11-04_12:21:55.42832 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 system.LocationInfo 0,0 0/0 1/2
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 system.HintsColumnFamily 0,0 0/0 1/6
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 SoundCloud.OwnActivities 2545,47090 0/0 41956/20
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.ExclusiveTracks 570,11872 0/0 2645/20
2011-11-04_12:21:55.42834 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.Activities 126085,2171439 0/0 20/20
2011-11-04_12:21:55.42872 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.IncomingTracks 95470,1604563 0/0 20/20

We have tried to run manual compactions, but these don't seem to ever happen, likely due to the high pending count. I am wondering what the best way is to figure out what is blocking on these nodes, in order to get compaction back in the game. I have considered isolating one node via the network to see if it can catch up once there is no load on it. I'm not sure of the negative side effects of that.
Any suggestions on resolving this? Regards, Jake -- Jake Maizel Head of Network Operations Soundcloud Mail & GTalk: j...@soundcloud.com Skype: jakecloud Rosenthaler strasse 13, 101 19, Berlin, DE
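A rough way to see whether compaction is making any headway or just falling further behind is to sample SSTable counts and pool backlogs over time. A sketch, assuming the JMX port 8080 used elsewhere in this archive and placeholder host names:

for h in node1 node2 node3 node4 node5 node6; do    # placeholder host names
  echo "== $h =="
  nodetool -h "$h" -p 8080 cfstats | grep -E 'Column Family:|SSTable count'
  nodetool -h "$h" -p 8080 tpstats
done
# If SSTable counts keep rising while the CompactionManager backlog (3423 pending in
# the dump above) never shrinks, compaction is being outrun by the flush rate rather
# than blocked on one stuck task -- throttling writes or isolating a node, as suggested
# above, gives it a chance to catch up.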
Upgrading to 1.0
Hello,

We run a medium-sized cluster of 12 nodes on 0.6.13 and would like to move to 1.0. What are the best practices for this? Can we do a rolling upgrade or does the entire cluster need to be upgraded at once?

Regards,
Jake

--
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
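For what it's worth, a hedged per-node sketch of the all-at-once sequence (per the reply in this thread, 1.0 is not network-compatible with 0.6, so the whole cluster goes down together). The host, service name, and config-conversion step are assumptions; NEWS.txt for the target release is the authoritative checklist.

HOST=10.20.3.155                       # placeholder; repeat for every node
nodetool -h "$HOST" -p 8080 snapshot   # keep an on-disk rollback point
nodetool -h "$HOST" -p 8080 drain      # flush memtables; the node stops accepting writes
sv stop cassandra                      # assuming a runit service like the run file above
# install the 1.0 release and convert storage-conf.xml to the new cassandra.yaml, then:
sv start cassandra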
Streaming stuck on one node during Repair
Hello,

I have one node of a cluster that is stuck in a streaming-out state, sending to the node that is being repaired. If I look at the AE thread in jconsole I see this trace:

Name: AE-SERVICE-STAGE:1
State: WAITING on java.util.concurrent.FutureTask$Sync@7e3e0044
Total blocked: 0  Total waited: 23

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
java.util.concurrent.FutureTask.get(FutureTask.java:83)
org.apache.cassandra.service.AntiEntropyService$Differencer.performStreamingRepair(AntiEntropyService.java:515)
org.apache.cassandra.service.AntiEntropyService$Differencer.run(AntiEntropyService.java:475)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

The STREAM-STAGE thread shows this trace:

Name: STREAM-STAGE:1
State: WAITING on org.apache.cassandra.utils.SimpleCondition@1158f928
Total blocked: 9  Total waited: 16

Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:38)
org.apache.cassandra.streaming.StreamOutManager.waitForStreamCompletion(StreamOutManager.java:164)
org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:138)
org.apache.cassandra.service.AntiEntropyService$Differencer$1.runMayThrow(AntiEntropyService.java:511)
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

Is there a way to unstick these threads? Or am I stuck restarting the node and then rerunning the entire repair? All the other nodes seem to have completed properly, and one is still running. I am thinking of waiting until the current one finishes, then restarting the stuck node, and once it is back up running repair again on the node that needs it. Thoughts?

(0.6.6 on a 7-node cluster)

--
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
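Before restarting, a couple of low-risk checks on the stuck sender; hedged, since I'm not certain 0.6.6's nodetool ships a streams command, and the host and log path are placeholders:

HOST=10.20.3.155                        # placeholder for the stuck node
LOG=/var/log/cassandra/current          # placeholder log path
nodetool -h "$HOST" -p 8080 streams     # if available, lists the files still pending for each destination
grep -i stream "$LOG" | tail -20        # otherwise, check whether any streaming progress is being logged
# If the pending file list never shrinks and the receiving node shows nothing arriving,
# the session is most likely wedged and restarting the stuck node (then re-running the
# repair) is the usual way out on 0.6.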
Repairing lost data
Hello,

In a cluster running 0.6.6, one node lost part of a data file due to an operator error. An older file was moved into place to bring cassandra up again. Now we get lots of these in the log:

2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258 CassandraDaemon.java:87 Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]
2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
2011-08-27_10:30:55.26220 at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-08-27_10:30:55.26220 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-08-27_10:30:55.26221 at java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-08-27_10:30:55.26221 at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-08-27_10:30:55.26222 at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-08-27_10:30:55.26222 at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-08-27_10:30:55.26223 at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-08-27_10:30:55.26223 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-08-27_10:30:55.26224 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-08-27_10:30:55.26224 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-08-27_10:30:55.26224 at org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-08-27_10:30:55.26225 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-08-27_10:30:55.26225 at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-08-27_10:30:55.26226 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-08-27_10:30:55.26226 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-08-27_10:30:55.26227 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-08-27_10:30:55.26227 at java.lang.Thread.run(Thread.java:662)

Is it possible to use nodetool repair to fix this with the current data set? I issued a repair command and the other nodes seem to be doing the correct things, but I'm concerned by this: "Uncaught exception in thread Thread[ROW-READ-STAGE:4327,5,main]". Will the affected node ever be able to do anything?

Also, only the Data file was affected; the Index and Filter files are still the originals. Should I keep these or do anything else with them?

My alternative is to delete all the data and run repair again, which I have done in the past and it works, but it takes a while with a large data set.

I am open to ideas and any suggestions are welcome.

--
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
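A narrower alternative to deleting all the data, sketched under the assumption that the damaged sstable generation can be identified from the read errors; the service name, paths, and the CF1-42 file names are purely illustrative:

DATA=/var/lib/cassandra/data/Keyspace1          # placeholder DataFileDirectory/keyspace
sv stop cassandra                               # or however the node is supervised
mkdir -p /var/tmp/damaged-sstables
# Move the whole generation together: the Index and Filter files describe the Data file
# they were built with, so keeping them alongside a swapped-in older Data file can
# produce reads at bogus offsets like the ones above.
mv "$DATA"/CF1-42-Data.db "$DATA"/CF1-42-Index.db "$DATA"/CF1-42-Filter.db /var/tmp/damaged-sstables/
sv start cassandra
nodetool -h 10.20.3.155 -p 8080 repair          # re-replicate the missing rows from the other replicas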
Upgrade to a different version?
We are running 0.6.6 and are considering upgrading to either 0.6.8 or one of the 0.7.x releases. What is the recommended version and procedure? What are the issues we face? Are there any specific storage gotchas we need to be aware of? Are there any docs around this process for review? Thanks, jake -- Jake Maizel Soundcloud Mail & GTalk: j...@soundcloud.com Skype: jakecloud Rosenthaler strasse 13, 101 19, Berlin, DE
Help with Error on reading sstable
I'm getting this error after a space problem caused issues during a repair operation on one of six nodes in our cluster:

2011-02-22_11:54:50.26788 'ERROR [ROW-READ-STAGE:305] 11:54:50,267 CassandraDaemon.java:87 Uncaught exception in thread Thread[ROW-READ-STAGE:305,5,main]
2011-02-22_11:54:50.26789 'java.lang.ArrayIndexOutOfBoundsException
2011-02-22_11:54:50.26789 at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-02-22_11:54:50.26790 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-02-22_11:54:50.26790 at java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-02-22_11:54:50.26790 at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-02-22_11:54:50.26791 at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-02-22_11:54:50.26791 at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-02-22_11:54:50.26792 at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-02-22_11:54:50.26792 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-02-22_11:54:50.26793 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-02-22_11:54:50.26793 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-02-22_11:54:50.26794 at org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-02-22_11:54:50.26794 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-02-22_11:54:50.26794 at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-02-22_11:54:50.26795 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-02-22_11:54:50.26795 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-02-22_11:54:50.26796 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-02-22_11:54:50.26796 at java.lang.Thread.run(Thread.java:619)
2011-02-22_11:54:54.71933 'ERROR [ROW-READ-STAGE:302] 11:54:54,718 DebuggableThreadPoolExecutor.java:102 Error in ThreadPoolExecutor
2011-02-22_11:54:54.71935 'java.lang.ArrayIndexOutOfBoundsException
2011-02-22_11:54:54.71935 at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-02-22_11:54:54.71936 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-02-22_11:54:54.71936 at java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-02-22_11:54:54.71937 at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-02-22_11:54:54.71937 at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-02-22_11:54:54.71937 at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-02-22_11:54:54.71938 at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-02-22_11:54:54.71938 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-02-22_11:54:54.71939 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-02-22_11:54:54.71939 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-02-22_11:54:54.71941 at org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-02-22_11:54:54.71942 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-02-22_11:54:54.71942 at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-02-22_11:54:54.71942 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-02-22_11:54:54.71943 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-02-22_11:54:54.71943 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-02-22_11:54:54.71944 at java.lang.Thread.run(Thread.java:619)

I am thinking that there was a failure writing out an SSTable because of the space problem and now it's corrupt. Also, the repair caused a huge amount of disk to be used, and the disk therefore ran out of space. Currently, is there a way to clear space in this situation? Would running a cleanup help?

Running ver 0.6.6.

Thanks,

--
Jake Maizel
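Some places to look for reclaimable space in this situation; this sketch only uses the two suggestions that also come up in the cleanup thread below (leftover tmp files and old snapshots), with placeholder paths and host:

DATA=/var/lib/cassandra/data            # placeholder DataFileDirectory
HOST=10.20.3.155                        # placeholder node address
df -h "$DATA"
find "$DATA" -name '*tmp*' -ls          # partial files left by the failed compaction/repair; verify before deleting
du -sh "$DATA"/*/snapshots 2>/dev/null  # snapshots are hard links that pin otherwise-deleted sstables on disk
nodetool -h "$HOST" -p 8080 clearsnapshot
# Once there is breathing room again, a major compaction can merge the fragments back down:
nodetool -h "$HOST" -p 8080 compact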
Re: Ran out of space during cleanup.. HELP
I was in a similar situation and luckily had snapshots to clear to regain space, but you are correct. I would be careful about using more than 50% of the disk, as the anti-compaction during cleanup could fail. I don't have any experience with adding a data directory on the fly.

On Wed, Dec 8, 2010 at 4:51 PM, Mark wrote:
> Did both but didn't seem to help. I have another drive on that machine with
> some free space. If I add another directory to the DataFileDirectory config
> and restart, will it start using that directory?
>
> Anything else I can do?
>
> This actually leads me to an important question. Should I always make sure
> that Cassandra doesn't get past 50% of the drive's free space, otherwise an
> anticompaction like this can just destroy the machine?
>
> On 12/8/10 1:12 AM, Jake Maizel wrote:
>>
>> Also, look for any snapshots that can be cleared with nodetool
>> clearsnapshot, or just run the command to remove any that exist.
>>
>> On Wed, Dec 8, 2010 at 9:04 AM, Oleg Anastasyev wrote:
>>>
>>> Mark gmail.com> writes:
>>>
>>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>>>> at
>>>>
>>>> On 12/7/10 8:44 PM, Mark wrote:
>>>>>
>>>>> 3 Node cluster and I just ran a nodetool cleanup on node #3. 1 and 2
>>>>> are now at 100% disk space. What should I do?
>>>>
>>>
>>> Are there files with -tmp in their names?
>>> Try to remove them to free up disk space.

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
Re: How to Tell if Decommission has Completed
Indeed, it is. Also, the node being decommissioned drops out of the ring when it is completed. Trial and error. Thanks for following up.

On Wed, Dec 8, 2010 at 4:39 PM, Nick Bailey wrote:
> I believe the decommission call is blocking in both .6 and .7, so once it
> returns it should have completed.
>
> On Wed, Dec 8, 2010 at 3:10 AM, Jake Maizel wrote:
>>
>> Hello,
>>
>> Is there a definitive way to tell if a Decommission operation has
>> completed, such as a log message similar to what happens with a Drain
>> command?
>>
>> Thanks.
>>
>> --
>> Jake Maizel
>> Network Operations
>> Soundcloud
>>
>> Mail & GTalk: j...@soundcloud.com
>> Skype: jakecloud
>>
>> Rosenthaler strasse 13, 101 19, Berlin, DE

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
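A simple way to confirm this from another node: the leaving node should vanish from the ring view once streaming finishes. Addresses are placeholders:

LIVE=10.20.3.155        # any node that stays in the ring
LEAVING=10.20.3.156     # the node being decommissioned
nodetool -h "$LIVE" -p 8080 ring | grep -q "$LEAVING" && echo "still in the ring" || echo "gone from the ring"
# or just keep an eye on it:
watch -n 30 "nodetool -h $LIVE -p 8080 ring"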
Re: Ran out of space during cleanup.. HELP
Also, look for any snapshots that can be cleared with nodetool clearsnapshot, or just run the command to remove any that exist.

On Wed, Dec 8, 2010 at 9:04 AM, Oleg Anastasyev wrote:
>
> Mark gmail.com> writes:
>
>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>> at
>>
>> On 12/7/10 8:44 PM, Mark wrote:
>> > 3 Node cluster and I just ran a nodetool cleanup on node #3. 1 and 2
>> > are now at 100% disk space. What should I do?
>>
>
> Are there files with -tmp in their names?
> Try to remove them to free up disk space.

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
How to Tell if Decommission has Completed
Hello, Is there a definitive way to tell if a Decommission operation has completed, such as a log message similar to what happens with a Drain command? Thanks. -- Jake Maizel Network Operations Soundcloud Mail & GTalk: j...@soundcloud.com Skype: jakecloud Rosenthaler strasse 13, 101 19, Berlin, DE
Re: Best Practice for Data Center Migration
Thanks for the follow-up. I have a few follow-on questions:

In the case of using decommission, any idea what happens when we get to the last node in the old data center? Do you think it will decommission properly? I agree that this sounds like the easiest method. We have to see if we can support the storage requirement as we go down the cluster and decommission.

In the case of changing the RF and dropping the entire old cluster, here's what I was thinking: We change the RF to 4, which I take to mean that there will be two copies of the data in each data center. So, if we just turn off all the nodes in the old data center, we still have two copies of all data in the new data center, and we can then rebuild and clean things up with nodetool to get to a normal state. We would then turn the RF back down to 3 and rebuild in order to get back to our original config.

The reason I thought this would work is that since RackAware alternates replica placement, and we have inserted the new data center's nodes evenly in between the old key ranges, a pair of nodes in the new DC would each get a replica of the data. That would give us some redundancy until we can rebuild. I am probably making a bad assumption about RackAwareStrategy that blocks this; if so, it'd be nice if you could explain it to me.

If you have another idea that might be worth discussing, I'd appreciate it.

Thanks,
Jake

On Thu, Dec 2, 2010 at 6:11 PM, Jonathan Ellis wrote:
> On Thu, Dec 2, 2010 at 4:08 AM, Jake Maizel wrote:
>> Hello,
>>
>> We have a ring of 12 nodes with 6 in one data center and 6 in another.
>> We want to shutdown all 6 nodes in data center 1 in order to close
>> it down. We are using a replication factor of 3 and are using
>> RackAwareStrategy with version 0.6.6.
>>
>> We have been thinking that using decommission on each of the nodes in
>> the old data center one at a time would do the trick. Does this sound
>> reasonable?
>
> That is the simplest approach. The major downside is that
> RackAwareStrategy guarantees you will have at least one copy of _each_
> row in both DCs, so when you are down to 1 node in dc1 it will have a
> copy of all the data. If you have a small enough data volume to make
> this feasible then that is the option I would go with.
>
>> We have also been considering increasing the replication factor to 4
>> and then just shutting down all the old nodes. Would that work as far
>> as data availability would go?
>
> Not sure what you are thinking of there, but probably not. :)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
Best Practice for Data Center Migration
Hello,

We have a ring of 12 nodes with 6 in one data center and 6 in another. We want to shut down all 6 nodes in data center 1 in order to close it down. We are using a replication factor of 3 and are using RackAwareStrategy with version 0.6.6.

Are there any best practices for doing this type of operation?

We have been thinking that using decommission on each of the nodes in the old data center, one at a time, would do the trick. Does this sound reasonable?

We have also been considering increasing the replication factor to 4 and then just shutting down all the old nodes. Would that work as far as data availability goes?

Any other suggestions?

Thanks.

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
Disk Full Error on Cleanup
Hello,

I keep running into the following error while running a nodetool cleanup:

ERROR [COMPACTION-POOL:1] 2010-11-26 12:36:38,383 CassandraDaemon.java (line 87) Uncaught exception in thread Thread[COMPACTION-POOL:1,5,main]
java.lang.UnsupportedOperationException: disk full
at org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:344)
at org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:410)
at org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:48)
at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:129)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

There is 80GB used out of a 150GB partition dedicated to the cassandra data directory. So this seems to be related to another issue around a disk that doesn't have enough space for the anti-compaction. This keeps nodetool cleanup from completing, since the error occurs at the beginning of the run. I also tried running compact on this node and then cleanup, but the same error results.

Any ideas or pointers?

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
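A rough headroom check before retrying, based on the rule of thumb from the cleanup thread above: anticompaction rewrites the node's data, so free space needs to be in the same ballpark as the live data. The data path is a placeholder.

DATA=/var/lib/cassandra/data            # placeholder; point at your DataFileDirectory
used_kb=$(du -sk "$DATA" | awk '{print $1}')
free_kb=$(df -Pk "$DATA" | awk 'NR==2 {print $4}')
echo "live data: $((used_kb / 1024 / 1024))G  free: $((free_kb / 1024 / 1024))G"
if [ "$free_kb" -lt "$used_kb" ]; then
  echo "probably not enough headroom for anticompaction: clear snapshots/tmp files or add another DataFileDirectory first"
fi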
Questions about RackAwareStrategy and Multiple Data Centers
Hello,

(I tried my best to read all I could before posting, but I really couldn't find info to answer my questions. So, here's my post.)

I have some questions. Background: We have a 6-node Cassandra cluster running in one data center with the following config:

Cassandra 0.6.6
Replicas: 3
Placement: RackUnaware originally
Using Standard data storage and mmap index storage
RAM: 16GB
Per-node load: roughly 100GB +/- 20

We then added a second 6-node cluster in a second data center, with the goal of migrating data to the new DC and then shutting down the nodes in the original DC. We switched all nodes to RackAwareStrategy and restarted. We set up seeds on one of the new nodes pointing to three of the old nodes (nodes 2, 4, 6). We did not add any new nodes as seeds to the old ones. All went according to plan, with the new nodes injected into the ring halfway between each of the original nodes' ranges. This just worked magically, as advertised. :)

We ran nodetool repair on the new nodes, one at a time, waiting until activity finished (indicated by 0 compaction and 0 AE stages). We then moved on to running repair on the original nodes. This is where my questions came up. We see that after starting repair on one node, we get lots of GC (however, we are not swapping and disk I/O seems fine). We also see increases in the pending queue for AE stages (seems normal, on the order of 40-80 pending stages). What doesn't seem normal is that we see a large increase in the AE pending queue on all other nodes not running repair (I would expect this on neighbors, but not all nodes), and it seems to take forever for these queues to drain (forever = over 24 hrs).

Here are some questions I have (I can provide any additional info required):

1. If the node we run repair on finishes, indicated by compaction and AE being 0, but the next node we want to repair still has non-zero queues for compaction and AE, can we still start the repair?

2. What is the effect of running repair on more than one node at a time under 0.6.6? I realize it's not recommended, but I accidentally did this and am curious about the effect.

3. Is heavy GC activity normal during a repair, outside the documented "GC storm" cases?

By the way, really great work on cassandra from an operations POV. I've enjoyed working with it.

Regards and thanks for any help.

Jake

--
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
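On question 1, one way to watch the AE backlog drain across the whole ring before kicking off the next repair; a sketch assuming nodetool can reach JMX on port 8080 (as configured in the run file earlier in this archive) and using placeholder host names:

for h in dc1-node1 dc1-node2 dc1-node3 dc2-node1 dc2-node2 dc2-node3; do   # placeholders
  echo "== $h =="
  nodetool -h "$h" -p 8080 tpstats | grep -E 'AE-SERVICE-STAGE|STREAM-STAGE'
done
# Waiting until AE-SERVICE-STAGE pending is 0 on every node (not just the one that was
# repaired) before starting the next repair keeps validation compactions from piling up
# across the ring.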