On Tue, Aug 25, 2009 at 7:07 PM, Scott Chacon <scha...@gmail.com> wrote:
> We're playing with Cassandra and would like to get a test cluster
> setup for evaluation. I've been playing with it on my laptop and EC2,
> which are the resources easily available to me, but not that close to
> what I would be using in a production environment.
Yeah, EC2 is I/O hell. Jason at Slicehost thinks their VMs and cloud
servers should post better numbers, fwiw, but it's still going to be
best to run on non-virtualized hardware.

> What would be an ideal machine setup for a Cassandra node? At least
> two separate physical disks, one for the commit log and another for
> the data, no RAID, 8-16G memory? (I think that's what Evan recommended
> in his blog post)

Right. Right now you don't get a huge win on writes from going beyond
commitlog disk + 1 disk per heavily-written ColumnFamily, but if you
can get to 3 or 4 data disks without much extra cost (e.g. staying in
1U), it will help read seeks linearly, on average. So more is better.

If you do have multiple data disks, I would expect JBOD to work better
than raid0-ing them (mostly a wash on writes, but better read
performance), but I don't know if anyone has actually tested this.

Digg is running on 16GB machines with a 10GB heap size, which you can
set in cassandra.in.sh. Cassandra's defaults are tuned so that you can
test things out on a 1GB heap without OOMing. Look in the performance
section; the main ones are

  MemtableSizeInMB
  MemtableObjectCountInMillions

If you are also using a 10GB heap, then you can just multiply these by
10 as a first step. If you are using 8 cores, you probably want to
double ConcurrentReads too.

-Jonathan
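As a concrete sketch of the tuning above: this is roughly what the
scaled-up settings might look like in the performance section of
storage-conf.xml. The baseline values shown in the comments are
assumptions about the 1GB-heap defaults, and option names/defaults
vary by release, so verify against your own config file before copying.

```xml
<!-- storage-conf.xml, performance section (sketch, not verbatim).
     Assuming 1GB-heap defaults of roughly 64 MB / 0.1 million objects,
     scaled ~10x for a 10GB heap as suggested above. -->
<MemtableSizeInMB>640</MemtableSizeInMB>
<MemtableObjectCountInMillions>1</MemtableObjectCountInMillions>

<!-- On an 8-core box, double the read concurrency (assumed default: 8). -->
<ConcurrentReads>16</ConcurrentReads>
```

The heap itself is set via the JVM flags in cassandra.in.sh
(e.g. -Xms10G -Xmx10G); pinning -Xms equal to -Xmx avoids heap-resize
pauses on a dedicated box.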