Nodes Flapping in the Ring

2011-11-10 Thread Jake Maizel
We have a new 6-node cluster running 0.6.13 (due to some client-side issues
we need to be on 0.6.x for the time being) that we are injecting data into,
and we ran into some issues with nodes going down and then up quickly in the
ring.  All nodes are affected and we have ruled out the network layer.

It happens on all nodes and seems related to GC or memtable flushes.  We had
things stable, but after a series of data migrations we saw some swapping, so
we tuned the max heap down.  This helped with the swapping, but the flapping
still persists.

The systems have 6 cores and 24 GB of RAM, and max heap is at 12G.  We are
using the parallel GC collector for throughput.
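
For anyone following along, correlating the pauses with the down/up
transitions should be roughly a matter of the following (the log path is
wherever svlogd writes the service log on a given box, and the exact gossip
message wording may differ a bit in 0.6):

  LOG=/var/log/cassandra/current   # adjust to the actual svlogd directory

  # multi-second pauses reported by Cassandra itself
  grep GCInspector "$LOG" | tail -n 20

  # gossip marking peers dead and alive again around the same time
  grep -E 'is now dead|is now UP' "$LOG" | tail -n 20

If the dead/UP pairs line up with multi-second GCInspector entries, the
flapping is presumably just failure-detector timeouts caused by the pauses.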

Our run file for starting cassandra looks like this:

exec 2>&1

ulimit -n 262144

cd /opt/cassandra-0.6.13

exec chpst -u cassandra java \
  -ea \
  -Xms4G \
  -Xmx12G \
  -XX:TargetSurvivorRatio=90 \
  -XX:+PrintGCDetails \
  -XX:+AggressiveOpts \
  -XX:+UseParallelGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:SurvivorRatio=128 \
  -XX:MaxTenuringThreshold=0 \
  -Djava.rmi.server.hostname=10.20.3.155 \
  -Dcom.sun.management.jmxremote.port=8080 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcassandra-foreground=yes \
  -Dstorage-config=/etc/cassandra \
  -cp '/etc/cassandra:/opt/cassandra-0.6.13/lib/*' \
  org.apache.cassandra.thrift.CassandraDaemon <&-
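
One thing I noticed while pasting this: -XX:+CMSParallelRemarkEnabled is a
CMS flag, so as far as I know it is a no-op together with -XX:+UseParallelGC.
To see what the collector is actually doing while a node flaps, sampling with
jstat should work, roughly like this (the pgrep is just a convenience and
assumes a single Cassandra JVM per box):

  # sample heap occupancy and GC time for the daemon once a second
  CASS_PID=$(pgrep -f CassandraDaemon | head -n 1)
  jstat -gcutil "$CASS_PID" 1000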

Our storage-conf settings for the memory/disk tuning are below (the XML
element names were stripped somewhere between the file and this mail, so only
the values remain):

  mmap
  4
  64

  32
  64

  16

  64
  256
  0.3
  60

  12
  32

  periodic
  1

  864000

  true


Any thoughts on this would be much appreciated.

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Re: Upgrading to 1.0

2011-11-05 Thread Jake Maizel
If we upgrade and want to use compression, how is the old data handled?
Does it read and then write all sstables out to new compressed files one at
a time, or does it do something else?  I'm considering the storage required
on top of what is needed for the existing data.
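
To make the question concrete, the flow I am imagining is roughly the
following; the cli attribute names and the assumption that scrub will rewrite
existing sstables compressed are guesses on my part, so please correct me:

  # once the whole cluster is on 1.0, per column family:
  # 1) enable compression via cassandra-cli, e.g.
  #      update column family Activities with
  #        compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};
  # 2) then rewrite the existing sstables, one node at a time
  nodetool -h localhost -p 8080 scrub SoundCloud Activities   # 8080 = our JMX port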

Regards,

jake

On Wed, Nov 2, 2011 at 2:57 PM, Jonathan Ellis  wrote:

> 1.0 can read 0.6 data files but is not network-compatible, so you need
> to do an all-at-once upgrade.  Additionally, the Thrift API changed
> starting with 0.7; see NEWS.txt for details.
>
> On Wed, Nov 2, 2011 at 6:46 AM, Jake Maizel  wrote:
> > Hello,
> > We run a medium sized cluster of 12 nodes on 0.6.13 and would like to
> move
> > to 1.0.  What's the best practices for this?  Can we do a rolling
> upgrade or
> > does the entire cluster need to be upgraded at once?
> > Regards,
> > Jake
> >
> > --
> > Jake Maizel
> > Head of Network Operations
> > Soundcloud
> >
> > Mail & GTalk: j...@soundcloud.com
> > Skype: jakecloud
> >
> > Rosenthaler strasse 13, 101 19, Berlin, DE
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Large Increase in SSTable count after Upgrade to 0.6.13

2011-11-04 Thread Jake Maizel
[start of message truncated] ...the last 5000ms
2011-11-04_12:21:55.41788 '' INFO [DroppedMessagesLogger] 12:21:55,417
GCInspector.java:143 Pool Name                    Active   Pending
2011-11-04_12:21:55.41789 '' INFO [DroppedMessagesLogger] 12:21:55,418
GCInspector.java:157 STREAM-STAGE  0 0
2011-11-04_12:21:55.41851 '' INFO [DroppedMessagesLogger] 12:21:55,418
GCInspector.java:157 FILEUTILS-DELETE-POOL 0 0
2011-11-04_12:21:55.41877 '' INFO [DroppedMessagesLogger] 12:21:55,418
GCInspector.java:157 RESPONSE-STAGE0 0
2011-11-04_12:21:55.42379 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 ROW-READ-STAGE8   211
2011-11-04_12:21:55.42403 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 LB-OPERATIONS 0 0
2011-11-04_12:21:55.42427 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 MISCELLANEOUS-POOL0 0
2011-11-04_12:21:55.42448 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 GMFD  0 0
2011-11-04_12:21:55.42473 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 CONSISTENCY-MANAGER   0 0
2011-11-04_12:21:55.42495 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 LB-TARGET 0 0
2011-11-04_12:21:55.42515 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 ROW-MUTATION-STAGE1 1
2011-11-04_12:21:55.42537 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 MESSAGE-STREAMING-POOL0 0
2011-11-04_12:21:55.42561 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 LOAD-BALANCER-STAGE   0 0
2011-11-04_12:21:55.42580 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 FLUSH-SORTER-POOL 0 0
2011-11-04_12:21:55.42602 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 MEMTABLE-POST-FLUSHER 1 3
2011-11-04_12:21:55.42626 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 AE-SERVICE-STAGE  0 0
2011-11-04_12:21:55.42649 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 FLUSH-WRITER-POOL 1 1
2011-11-04_12:21:55.42670 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:157 HINTED-HANDOFF-POOL   1 8
2011-11-04_12:21:55.42695 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:161 CompactionManager   n/a  3423
2011-11-04_12:21:55.42717 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:165 ColumnFamily   Memtable ops,data   Row cache size/cap   Key cache size/cap
2011-11-04_12:21:55.42832 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:168 system.LocationInfo   0,0
0/0 1/2
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:168 system.HintsColumnFamily  0,0
0/0 1/6
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:168 SoundCloud.OwnActivities   2545,47090
0/0 41956/20
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,423
GCInspector.java:168 SoundCloud.ExclusiveTracks   570,11872
0/0 2645/20
2011-11-04_12:21:55.42834 '' INFO [DroppedMessagesLogger] 12:21:55,423
GCInspector.java:168 SoundCloud.Activities  126085,2171439
0/0   20/20
2011-11-04_12:21:55.42872 '' INFO [DroppedMessagesLogger] 12:21:55,423
GCInspector.java:168 SoundCloud.IncomingTracks   95470,1604563
0/0   20/20

We have tried to run manual compactions, but these don't seem to happen,
likely due to the high pending count.

I am wondering what the best way is to figure out what is blocking on these
nodes, in order to get compaction back into the game.
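
So far the obvious things to watch seem to be the thread pools and the
per-CF sstable counts, roughly like this (8080 is our JMX port; nodetool flag
spellings as I remember them for 0.6):

  # pending/active stages on the affected node
  nodetool -h localhost -p 8080 tpstats

  # sstable count per column family, which is the number that keeps climbing
  nodetool -h localhost -p 8080 cfstats | grep -E 'Column Family|SSTable count'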

I have considered isolating one node via the network to see if it can catch
up once there is no load on it.  I'm not sure of the negative side effects of
that, though.

Any suggestions on resolving this?

Regards,

Jake

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Upgrading to 1.0

2011-11-02 Thread Jake Maizel
Hello,

We run a medium sized cluster of 12 nodes on 0.6.13 and would like to move
to 1.0.  What are the best practices for this?  Can we do a rolling upgrade
or does the entire cluster need to be upgraded at once?

Regards,

Jake

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Streaming stuck on one node during Repair

2011-09-02 Thread Jake Maizel
Hello,

I have one node of a cluster that is stuck in a streaming out state
sending to the node that is being repaired.

If I look at the AE thread in jconsole I see this trace:

Name: AE-SERVICE-STAGE:1
State: WAITING on java.util.concurrent.FutureTask$Sync@7e3e0044
Total blocked: 0  Total waited: 23

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
java.util.concurrent.FutureTask.get(FutureTask.java:83)
org.apache.cassandra.service.AntiEntropyService$Differencer.performStreamingRepair(AntiEntropyService.java:515)
org.apache.cassandra.service.AntiEntropyService$Differencer.run(AntiEntropyService.java:475)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

The Stream stage shows this trace:

Name: STREAM-STAGE:1
State: WAITING on org.apache.cassandra.utils.SimpleCondition@1158f928
Total blocked: 9  Total waited: 16

Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:38)
org.apache.cassandra.streaming.StreamOutManager.waitForStreamCompletion(StreamOutManager.java:164)
org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:138)
org.apache.cassandra.service.AntiEntropyService$Differencer$1.runMayThrow(AntiEntropyService.java:511)
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

Is there a way to unstick these threads?  Or am I stuck restarting the
node and then rerunning the entire repair?  All the other nodes seem to
have completed properly and one is still running.  I am thinking of
waiting until the current one finishes, then restarting the stuck node,
and once it's up running repair again on the node that needs it.
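
Before restarting anything I would at least like to confirm whether the
transfer is really dead, roughly along these lines; I'm not certain the
streams command exists in 0.6.6, so treat that part as a guess (8080 being
the default 0.6 JMX port):

  # what the stuck node thinks it is still streaming out
  nodetool -h <stuck-node> -p 8080 streams

  # stage backlogs on both ends
  nodetool -h <stuck-node> -p 8080 tpstats
  nodetool -h <node-being-repaired> -p 8080 tpstats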

Thoughts?

(0.6.6 on a 7-node cluster)



-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Repairing lost data

2011-08-27 Thread Jake Maizel
Hello,

In a cluster running 0.6.6, one node lost part of a data file due to an
operator error.  An older file was moved into place to bring Cassandra
up again.

Now we get lots of these in the log:

 2011-08-27_10:30:55.26219 'ERROR [ROW-READ-STAGE:4327] 10:30:55,258
CassandraDaemon.java:87 Uncaught exception in thread
Thread[ROW-READ-STAGE:4327,5,main]
2011-08-27_10:30:55.26219 'java.lang.ArrayIndexOutOfBoundsException
2011-08-27_10:30:55.26220   at
org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-08-27_10:30:55.26220   at
java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-08-27_10:30:55.26221   at
java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-08-27_10:30:55.26221   at
java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-08-27_10:30:55.26222   at
org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-08-27_10:30:55.26222   at
org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-08-27_10:30:55.26223   at
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-08-27_10:30:55.26223   at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-08-27_10:30:55.26224   at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-08-27_10:30:55.26224   at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-08-27_10:30:55.26224   at
org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-08-27_10:30:55.26225   at
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-08-27_10:30:55.26225   at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-08-27_10:30:55.26226   at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-08-27_10:30:55.26226   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-08-27_10:30:55.26227   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-08-27_10:30:55.26227   at java.lang.Thread.run(Thread.java:662)

Is it possible to use nodetool repair to fix this with the current data set?

I issued a repair command and the other nodes seem to be doing the
correct things, but I am concerned by this: "Uncaught exception in thread
Thread[ROW-READ-STAGE:4327,5,main]"

Will the affected node ever be able to do anything?

Also, only the Data file was affected; the Index and Filter files are
still the originals.  Should I keep these or do anything else with
them?

My alternative is to delete all the data and run repair again, which I
have done in the past and it works, but it takes a while with a large
data set.
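
The middle ground I keep coming back to, in case it helps frame the question,
is dropping just the damaged sstable (all of its components) and letting
repair rebuild that data.  A rough sketch of what I mean, with made-up paths
and sstable generation, not something I have verified:

  # on the affected node, with cassandra stopped (we run it under runit)
  sv stop cassandra
  mkdir -p /var/lib/cassandra/quarantine
  # move the damaged generation aside: Data, Index and Filter together
  mv /var/lib/cassandra/data/SoundCloud/Activities-1234-* /var/lib/cassandra/quarantine/
  sv start cassandra
  # then rebuild the missing rows from the replicas
  nodetool -h localhost -p 8080 repair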

I am open to ideas and any suggestions are welcome.

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Upgrade to a different version?

2011-03-16 Thread Jake Maizel
We are running 0.6.6 and are considering upgrading to either 0.6.8 or
one of the 0.7.x releases.  What is the recommended version and
procedure?  What are the issues we face?  Are there any specific
storage gotchas we need to be aware of?  Are there any docs around
this process for review?

Thanks,

jake

-- 
Jake Maizel
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Help with Error on reading sstable

2011-02-22 Thread Jake Maizel
I'm getting this error after a space problem caused issues during a
repair operation on one of six nodes in our cluster:

2011-02-22_11:54:50.26788 'ERROR [ROW-READ-STAGE:305] 11:54:50,267
CassandraDaemon.java:87 Uncaught exception in thread
Thread[ROW-READ-STAGE:305,5,main]
2011-02-22_11:54:50.26789 'java.lang.ArrayIndexOutOfBoundsException
2011-02-22_11:54:50.26789   at
org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-02-22_11:54:50.26790   at
java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-02-22_11:54:50.26790   at
java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-02-22_11:54:50.26790   at
java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-02-22_11:54:50.26791   at
org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-02-22_11:54:50.26791   at
org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-02-22_11:54:50.26792   at
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-02-22_11:54:50.26792   at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-02-22_11:54:50.26793   at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-02-22_11:54:50.26793   at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-02-22_11:54:50.26794   at
org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-02-22_11:54:50.26794   at
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-02-22_11:54:50.26794   at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-02-22_11:54:50.26795   at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-02-22_11:54:50.26795   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-02-22_11:54:50.26796   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-02-22_11:54:50.26796   at java.lang.Thread.run(Thread.java:619)
2011-02-22_11:54:54.71933 'ERROR [ROW-READ-STAGE:302] 11:54:54,718
DebuggableThreadPoolExecutor.java:102 Error in ThreadPoolExecutor
2011-02-22_11:54:54.71935 'java.lang.ArrayIndexOutOfBoundsException
2011-02-22_11:54:54.71935   at
org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:326)
2011-02-22_11:54:54.71936   at
java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
2011-02-22_11:54:54.71936   at
java.io.DataInputStream.readUTF(DataInputStream.java:592)
2011-02-22_11:54:54.71937   at
java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
2011-02-22_11:54:54.71937   at
org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:125)
2011-02-22_11:54:54.71937   at
org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
2011-02-22_11:54:54.71938   at
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
2011-02-22_11:54:54.71938   at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:990)
2011-02-22_11:54:54.71939   at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:901)
2011-02-22_11:54:54.71939   at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:870)
2011-02-22_11:54:54.71941   at
org.apache.cassandra.db.Table.getRow(Table.java:382)
2011-02-22_11:54:54.71942   at
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
2011-02-22_11:54:54.71942   at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:70)
2011-02-22_11:54:54.71942   at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:49)
2011-02-22_11:54:54.71943   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2011-02-22_11:54:54.71943   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2011-02-22_11:54:54.71944   at java.lang.Thread.run(Thread.java:619)

I am thinking that there was a failure writing out an SSTable because
of the space shortage and now it is corrupt.  Also, the repair caused a
huge amount of disk to be used and the node therefore ran out.  Is there
a way to clear space in this situation?  Would running a cleanup help?
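
In case it matters, the reclaimable things I can think of are leftover -tmp
files from the failed run and old snapshots; I'm assuming both are safe to
remove, but please correct me if not (paths are the default layout, ours may
differ):

  # temporary sstables left behind by the failed compaction/repair
  find /var/lib/cassandra/data -name '*tmp*' -ls

  # snapshots still holding disk, and the command to drop them
  du -sh /var/lib/cassandra/data/*/snapshots 2>/dev/null
  nodetool -h localhost -p 8080 clearsnapshot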

Running ver 0.6.6.

Thanks,

-- 
Jake Maizel


Re: Ran out of space during cleanup.. HELP

2010-12-08 Thread Jake Maizel
I was in a similar situation and luckily had snapshots to clear to gain
space, but you are correct.  I would be careful about using more than 50%
of the disk, as the anti-compaction during cleanup could fail.

I don't have any experience with adding a data directory on the fly.

On Wed, Dec 8, 2010 at 4:51 PM, Mark  wrote:
> Did both but didn't seem to help. I have another drive on that machine with
> some free space. If I add another directory to the DataFileDirectory config
> and restart, will it start using that directory?
>
> Anything else I can do?
>
> This actually leads me to an important question. Should I always make sure
> that Cassandra doesn't get past 50% of the drives free space, otherwise an
> anticompaction like this can just destroy the machine?
>
> On 12/8/10 1:12 AM, Jake Maizel wrote:
>>
>> Also, look for any snapshots that can be cleared with nodetool
>> clearsnapshot or just run the command to remove any that exist.
>>
>> On Wed, Dec 8, 2010 at 9:04 AM, Oleg Anastasyev
>>  wrote:
>>>
>>> Mark  gmail.com>  writes:
>>>
>>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>>>>      at
>>>> On 12/7/10 8:44 PM, Mark wrote:
>>>>>
>>>>> 3 Node cluster and I just ran a nodetool cleanup on node #3. 1 and 2
>>>>> are now at 100% disk space. What should I do?
>>>>
>>>
>>> Are there files with -tmp in their names?
>>> Try to remove them to free up disk space.
>>>
>>>
>>>
>>
>>
>



-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Re: How to Tell if Decommission has Completed

2010-12-08 Thread Jake Maizel
Indeed, it is.  Also, the node being decommissioned drops out of the
ring when it is completed.  Trial and error.  Thanks for following up.
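
For the archives, confirming it is just a matter of watching the ring until
the node disappears, something like this (8080 being the default 0.6 JMX
port):

  # run from any live node; the decommissioned node drops off the list when done
  watch -n 30 'nodetool -h localhost -p 8080 ring'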

On Wed, Dec 8, 2010 at 4:39 PM, Nick Bailey  wrote:
> I believe the decommission call is blocking in both .6 and .7, so once it
> returns it should have completed.
>
> On Wed, Dec 8, 2010 at 3:10 AM, Jake Maizel  wrote:
>>
>> Hello,
>>
>> Is there a definitive way to tell if a Decommission operation has
>> completed, such as a log message similar to what happens with a Drain
>> command?
>>
>> Thanks.
>>
>> --
>> Jake Maizel
>> Network Operations
>> Soundcloud
>>
>> Mail & GTalk: j...@soundcloud.com
>> Skype: jakecloud
>>
>> Rosenthaler strasse 13, 101 19, Berlin, DE
>
>



-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Re: Ran out of space during cleanup.. HELP

2010-12-08 Thread Jake Maizel
Also, look for any snapshots that can be cleared with nodetool
clearsnapshot or just run the command to remove any that exist.
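
Concretely, something along these lines (paths assume the default data layout
and 0.6's JMX port):

  # see how much the snapshots are holding, then drop them
  du -sh /var/lib/cassandra/data/*/snapshots 2>/dev/null
  nodetool -h localhost -p 8080 clearsnapshot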

On Wed, Dec 8, 2010 at 9:04 AM, Oleg Anastasyev  wrote:
>
> Mark  gmail.com> writes:
>
>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>>      at
>> >
>> On 12/7/10 8:44 PM, Mark wrote:
>> > 3 Node cluster and I just ran a nodetool cleanup on node #3. 1 and 2
>> > are now at 100% disk space. What should I do?
>>
>>
>
>
> Are there files with -tmp in their names?
> Try to remove them to free up disk space.
>
>
>



-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


How to Tell if Decommission has Completed

2010-12-08 Thread Jake Maizel
Hello,

Is there a definitive way to tell if a Decommission operation has
completed, such as a log message similar to what happens with a Drain
command?

Thanks.

-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Re: Best Practice for Data Center Migration

2010-12-03 Thread Jake Maizel
Thanks for the followup.

I have a few follow on questions:

In the case of using decommission, any idea of what happens when we
get to the last node in the old data center?  Do you think it will
decommission properly?

I agree that this sounds like the easiest method.  We have to see if
we can support the storage requirement as we go down the cluster and
decommission.

In the case of changing the RF and dropping the entire old cluster
here's what I was thinking:

We change the RF to 4, which I take to mean that there will be two
copies of the data in each data center.  So, if we just turn off all the nodes
in the old data center then we still have two copies of all data in
the new data center and then we can rebuild and cleanup things with
nodetool to get to a normal state.  We would then turn down the RF to
3 and rebuild in order to get back to our original config.  The reason
I thought this would work is that since RackAware alternates replica
placement and we have inserted the new data center nodes in between
the old key ranges evenly, a pair of nodes in the new DC would each
get a replica of the data. That would give us some redundancy until we
can rebuild.

I am probably making a bad assumption about the RackAwareStrategy that
blocks this.  If so, it'd be nice if you could explain it to me.

If you have another idea that might be worth discussing I'd appreciate it.

Thanks,

Jake

On Thu, Dec 2, 2010 at 6:11 PM, Jonathan Ellis  wrote:
> On Thu, Dec 2, 2010 at 4:08 AM, Jake Maizel  wrote:
>> Hello,
>>
>> We have a ring of 12 nodes with 6 in one data center and 6 in another.
>>  We want to shutdown all 6 nodes in data center 1 in order to close
>> it down.  We are using a replication factor of 3 and are using
>> RackAwareStrategy with version 0.6.6.
>>
>> We have been thinking that using decommission on each of the nodes in
>> the old data center one at a time would do the trick.  Does this sound
>> reasonable?
>
> That is the simplest approach.  The major downside is that
> RackAwareStrategy guarantees you will have at least one copy of _each_
> row in both DCs, so when you are down to 1 node in dc1 it will have a
> copy of all the data.  If you have a small enough data volume to make
> this feasible then that is the option I would go with.
>
>> We have also been considering increasing the replication factor to 4
>> and then just shutting down all the old nodes.  Would that work as far
>> as data availability would go?
>
> Not sure what you are thinking of there, but probably not. :)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>



-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Best Practice for Data Center Migration

2010-12-02 Thread Jake Maizel
Hello,

We have a ring of 12 nodes with 6 in one data center and 6 in another.
We want to shut down all 6 nodes in data center 1 in order to close it
down.  We are using a replication factor of 3 with RackAwareStrategy on
version 0.6.6.

Are there any best practices for doing this type of operation?

We have been thinking that using decommission on each of the nodes in
the old data center one at a time would do the trick.  Does this sound
reasonable?

We have also been considering increasing the replication factor to 4
and then just shutting down all the old nodes.  Would that work as far
as data availability would go?

Any other suggestions?

Thanks.

-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Disk Full Error on Cleanup

2010-11-26 Thread Jake Maizel
Hello,

I keep running into the following error while running a nodetool cleanup:

ERROR [COMPACTION-POOL:1] 2010-11-26 12:36:38,383 CassandraDaemon.java
(line 87) Uncaught exception in thread
Thread[COMPACTION-POOL:1,5,main]
java.lang.UnsupportedOperationException: disk full
at 
org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:344)
at 
org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:410)
at 
org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:48)
at 
org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:129)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

There is 80GB used out of a 150GB partition that is dedicated to the
cassandra data directory.  So, this seems to be related to another issue
around a disk that doesn't have enough space for the anti-compaction.
This keeps nodetool cleanup from completing, since the error occurs at
the beginning of the run.

I also tried running compact on this node and then cleanup, but the
same error results.
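
For reference, checking where the space is actually going before retrying
should be roughly this; my understanding is that the anti-compaction needs
free headroom on the order of the data being rewritten (paths assume the
default data layout):

  # how much headroom the data partition really has
  df -h /var/lib/cassandra

  # biggest consumers inside the data directory, in MB
  du -sm /var/lib/cassandra/data/* | sort -n
  find /var/lib/cassandra/data -name '*tmp*' -ls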

Any ideas or pointers?

-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Questions about RackAwareStrategy and Multiple Data Centers

2010-11-19 Thread Jake Maizel
Hello,

(I tried my best to read all I could before posting but I really
couldn't find info to answer my questions.  So, here's my post.)

I have some questions.

Background:

We have a 6-node Cassandra cluster running in one data center with the
following config:

Cassandra 0.6.6
Replicas: 3
Placement: RackUnaware originally
Using Standard data storage and mmap index storage
RAM 16GB
Per node load: roughly 100GB +- 20

We then added a second 6-node cluster in a second data center with the
goal of migrating data to this new DC and then shutting down the
original nodes in the old DC.  We switched all nodes to RackAwareStrategy
and restarted.  We set up seeds on one of the new nodes pointing to
three of the old nodes (nodes 2, 4, 6).  We did not add any new nodes
as seeds to the old ones.

All went according to plan, with the new nodes injected into the ring
halfway between each of the original nodes' tokens.  This just worked
magically, as advertised.  :)

We ran nodetool repair on the new nodes, one at a time, waiting until
activity finished (Indicated by 0 compaction and 0 AE stages).

We then moved to running repair on the original nodes.  This is where
my questions came up.

We see that after starting repair on one node, we get lots of GC
(however, we are not swapping and disk I/O seems fine).  We also see
increases in the pending queue for the AE stages (this seems normal, on
the order of 40-80 pending stages).  What doesn't seem normal is that we
see a large increase in the AE pending queue on all other nodes not
running repair (I would expect this on neighbors, but not all nodes),
and it seems to take forever for these queues to drain (forever = over
24 hrs).
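
For what it's worth, watching the queues drain is just a matter of polling
tpstats across the ring (the hostnames below are placeholders; 8080 is the
JMX port we use):

  for host in node1 node2 node3 node4 node5 node6; do
    echo "== $host"
    nodetool -h "$host" -p 8080 tpstats | grep -E 'AE-SERVICE-STAGE|STREAM-STAGE|COMPACTION'
  done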

Here are some questions I have (I can provide any additional info required):

1. If a node we run repair on finishes, indicated by compaction and AE
being 0, but the next node we want to repair still has non-zero queues
for C and AE, can we still start up the repair?
2. What is the effect of running repair on more than one node at a
time under 0.6.6?  I realize it's not recommended, but I accidentally
did this and am curious about the effect.
3. Is large GC activity normal during a repair outside the documented
"GC Storm" cases?

By the way, really great work on cassandra from an operations POV.
I've enjoyed working with it.

Regards and thanks for any help.

Jake

-- 
Jake Maizel
Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE