On Tue, Aug 25, 2009 at 7:07 PM, Scott Chacon<scha...@gmail.com> wrote:
> We're playing with Cassandra and would like to get a test cluster
> setup for evaluation.  I've been playing with it on my laptop and EC2,
> which are the resources easily available to me, but not that close to
> what I would be using in a production environment.

Yeah, EC2 is I/O hell.

Jason at slicehost thinks their VMs and cloud servers should post
better numbers, fwiw, but it's still going to be best to run on
non-virtualized hardware.

> What would be an ideal machine setup for a Cassandra node?  At least
> two separate physical disks, one for the commit log and another for
> the data, no RAID, 8-16G memory? (I think that's what Evan recommended
> in his blog post)

Right.  Currently you don't get a huge win on writes from going beyond
a commitlog disk + 1 disk per heavily-written columnfamily, but if you
can get to 3 or 4 data disks without much extra cost (e.g. staying in
1U), read seek capacity should scale roughly linearly with the number
of disks, on average.  So more is better.

If you do have multiple data disks, I would expect JBOD to work better
than raid0-ing things.  (Mostly a wash on writes, but better read
performance.)  But I don't know if anyone has actually tested this.

Digg is running on 16GB machines with a 10 GB heap size, which you can
set in cassandra.in.sh.

Cassandra defaults are tuned so that you can test things out on a 1GB
heap without OOM-ing.  Look in the performance section; the main ones
are

MemtableSizeInMB
MemtableObjectCountInMillions

If you are also using a 10GB heap then you can just multiply these by
10 as a first step.

If you are using 8 cores you probably want to double ConcurrentReads too.
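
As a rough sketch of the arithmetic above (not an official tuning
tool): the helper names and the starting values below are hypothetical
placeholders, so substitute whatever your own config file currently
contains.  It scales the memtable settings by the ratio of your heap to
the 1GB baseline and doubles ConcurrentReads on machines with 8 or more
cores.

```python
def scale_for_heap(settings, heap_gb, baseline_heap_gb=1.0):
    """Multiply each memtable setting by heap_gb / baseline_heap_gb.

    The 1 GB baseline reflects the statement that the defaults are
    tuned for a 1GB heap; treat the result as a first step only.
    """
    factor = heap_gb / baseline_heap_gb
    return {name: value * factor for name, value in settings.items()}


def scale_concurrent_reads(current, cores, core_threshold=8):
    """Double ConcurrentReads when the machine has at least 8 cores."""
    return current * 2 if cores >= core_threshold else current


# Placeholder values, NOT the real shipped defaults: read yours from
# the performance section of your own configuration.
current_settings = {
    "MemtableSizeInMB": 32,
    "MemtableObjectCountInMillions": 0.02,
}

tuned = scale_for_heap(current_settings, heap_gb=10)
tuned_reads = scale_concurrent_reads(current=8, cores=8)

for name, value in tuned.items():
    print(f"{name}: {value}")
print(f"ConcurrentReads: {tuned_reads}")
```

Treat the output as a starting point to test against your own
workload, not as final values.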

-Jonathan
