0.5 does seem to be significantly faster: latency is lower and throughput is
significantly higher. I'm updating my charts with the new values now.

One thing that is puzzling is the scan performance. In the scan experiment, each
request scans between 1 and 100 records. My 6-node Cassandra cluster only gets
up to about 230 operations/sec, compared to >1400 ops/sec for other systems,
and the latency is quite a bit higher. A chart with these results is here:

http://www.brianfrankcooper.net/pubs/scans.png

Is this the expected performance? I'm using the OrderPreservingPartitioner with 
InitialToken values that should evenly partition the data (and the amount of 
data in /var/cassandra/data is about the same on all servers). I'm using 
get_range_slice() from Java (code snippet below). 

At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
(and the machine with the busiest disk is not the one with highest CPU usage.) 
Network utilization (eth0 %util both in and out) varies from 15%-40% on 
different boxes. So clearly there is some imbalance (and the workload itself is 
skewed via a Zipfian distribution) but I'm surprised that the latencies are so 
high even in this case.
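
To illustrate how a Zipfian workload concentrates scan start keys on a few hot records, here is a minimal sketch. It is not the benchmark's actual generator: the class and method names, the key count of 1000, the exponent of 0.99, and the uniform choice of scan length are all assumptions made for illustration.

```java
import java.util.Random;

public class ZipfianSkewSketch {
    // Cumulative probabilities for ranks 1..n, where P(rank i) is proportional to 1 / i^s.
    static double[] zipfCdf(int n, double s) {
        double[] cdf = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 / Math.pow(i + 1, s);
            cdf[i] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum; // normalize so the last entry is 1.0
        }
        return cdf;
    }

    // Map a uniform draw u in [0,1) to a 0-based rank: the first index whose CDF exceeds u.
    static int rankFor(double[] cdf, double u) {
        int lo = 0;
        int hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > u) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    // Fraction of requests whose start key is among the k most popular ranks.
    static double shareOfTop(double[] cdf, int k) {
        return cdf[k - 1];
    }

    public static void main(String[] args) {
        int keys = 1000;        // hypothetical key count
        double exponent = 0.99; // assumed skew, not taken from the benchmark
        double[] cdf = zipfCdf(keys, exponent);

        Random rng = new Random(42);
        int startRank = rankFor(cdf, rng.nextDouble());
        int scanLength = 1 + rng.nextInt(100); // assumed uniform over 1..100 records

        System.out.println("start rank: " + startRank + ", scan length: " + scanLength);
        System.out.println("share of requests starting in the top 10 keys: "
                + shareOfTop(cdf, 10));
    }
}
```

With an order-preserving partitioner, neighbouring keys live on the same node, so if the popular ranks map to adjacent keys, a large share of scans would start on the same box. Whether that applies here depends on how the benchmark maps ranks to keys, which this sketch does not model.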

Code snippet - fields is a Set<String> listing the columns I want; recordcount 
is the number of records to return.

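// Select either every column (up to a large limit) or only the named fields.
SlicePredicate predicate;
if (fields == null)
{
        // Empty start and finish bytes mean an unbounded column range;
        // the last argument caps the number of columns returned per row.
        predicate = new SlicePredicate(null,
                new SliceRange(new byte[0], new byte[0], false, 1000000));
}
else
{
        Vector<byte[]> fieldlist = new Vector<byte[]>();
        for (String s : fields)
        {
                fieldlist.add(s.getBytes("UTF-8"));
        }
        predicate = new SlicePredicate(fieldlist, null);
}
// "data" is the column family; null means no super column.
ColumnParent parent = new ColumnParent("data", null);

// Scan from startkey with no end key (""), returning at most recordcount rows.
List<KeySlice> results = client.get_range_slice(table, parent, predicate,
        startkey, "", recordcount, ConsistencyLevel.ONE);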
                        
Thanks!

Brian

________________________________________
From: Brian Frank Cooper
Sent: Saturday, January 30, 2010 7:56 AM
To: cassandra-user@incubator.apache.org
Subject: RE: Cassandra versus HBase performance study

Good idea, we'll benchmark 0.5 next.

brian

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Friday, January 29, 2010 1:13 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

Thanks for posting your results; it is an interesting read and we are
pleased to beat HBase in most workloads. :)

Since you originally benchmarked 0.4.2, you might be interested in the
speed gains in 0.5.  A couple graphs here:
http://spyced.blogspot.com/2010/01/cassandra-05.html

0.6 (beta in a few weeks?) is looking even better. :)

-Jonathan
