Re: Cassandra versus HBase performance study

2010-02-04 Thread Ian Holsman
Hi Brian.
Were there any performance changes on the other tests with v0.5?
The graphs on the other pages look remarkably identical.

On Feb 4, 2010, at 11:45 AM, Brian Frank Cooper wrote:

 0.5 does seem to be significantly faster - the latency is better and it 
 provides significantly more throughput. I'm updating my charts with new 
 values now.
 
 One thing that is puzzling is the scan performance. The scan experiment is to 
 scan between 1-100 records on each request. My 6 node Cassandra cluster is 
 only getting up to about 230 operations/sec, compared to 1400 ops/sec for 
 other systems. The latency is quite a bit higher. A chart with these results 
 is here:
 
 http://www.brianfrankcooper.net/pubs/scans.png
 
 Is this the expected performance? I'm using the OrderPreservingPartitioner 
 with InitialToken values that should evenly partition the data (and the 
 amount of data in /var/cassandra/data is about the same on all servers). I'm 
 using get_range_slice() from Java (code snippet below). 
 
 At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
 varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
 (and the machine with the busiest disk is not the one with highest CPU 
 usage.) Network utilization (eth0 %util both in and out) varies from 15%-40% 
 on different boxes. So clearly there is some imbalance (and the workload 
 itself is skewed via a Zipfian distribution) but I'm surprised that the 
 latencies are so high even in this case.
 
 Code snippet - fields is a Set<String> listing the columns I want; 
 recordcount is the number of records to return.
 
 SlicePredicate predicate;
 if (fields==null)
 {
   predicate = new SlicePredicate(null,new SliceRange(new byte[0], new 
 byte[0],false,100));
 }
 else
 {
   Vector<byte[]> fieldlist=new Vector<byte[]>();
   for (String s : fields)
   {
     fieldlist.add(s.getBytes("UTF-8"));
   }
   predicate = new SlicePredicate(fieldlist,null);
 }
 ColumnParent parent = new ColumnParent("data", null);
   
 List<KeySlice> results = 
 client.get_range_slice(table,parent,predicate,startkey,"",recordcount,ConsistencyLevel.ONE);
   
 Thanks!
 
 Brian
 
 
 From: Brian Frank Cooper
 Sent: Saturday, January 30, 2010 7:56 AM
 To: cassandra-user@incubator.apache.org
 Subject: RE: Cassandra versus HBase performance study
 
 Good idea, we'll benchmark 0.5 next.
 
 brian
 
 -----Original Message-----
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Friday, January 29, 2010 1:13 PM
 To: cassandra-user@incubator.apache.org
 Subject: Re: Cassandra versus HBase performance study
 
 Thanks for posting your results; it is an interesting read and we are
 pleased to beat HBase in most workloads. :)
 
 Since you originally benchmarked 0.4.2, you might be interested in the
 speed gains in 0.5.  A couple graphs here:
 http://spyced.blogspot.com/2010/01/cassandra-05.html
 
 0.6 (beta in a few weeks?) is looking even better. :)
 
 -Jonathan

--
Ian Holsman
i...@holsman.net
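
The scan setup Brian describes above relies on an order-preserving partitioner, where each node owns a contiguous key range and a scan walks keys in sorted order. The following is a minimal sketch of that idea, not Cassandra's actual code: the class OrderedRing, its methods, and the sample tokens and keys are all hypothetical, and keys are compared as plain strings. It assumes a node with token T owns the keys in (previous token, T], wrapping around at the end of the ring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of order-preserving partitioning; not Cassandra's classes.
final class OrderedRing {
    // token -> node name, kept sorted so range lookups are cheap
    private final TreeMap<String, String> tokenToNode = new TreeMap<>();

    void addNode(String token, String node) {
        tokenToNode.put(token, node);
    }

    // Owner of a key: the node with the smallest token >= key,
    // wrapping around to the lowest token if no such token exists.
    String ownerOf(String key) {
        Map.Entry<String, String> entry = tokenToNode.ceilingEntry(key);
        if (entry == null) {
            entry = tokenToNode.firstEntry();
        }
        return entry.getValue();
    }

    // Nodes visited, in order, by a scan that reads up to recordCount
    // stored keys starting at startKey (inclusive). The scan does not wrap.
    List<String> nodesForScan(NavigableSet<String> storedKeys, String startKey, int recordCount) {
        List<String> nodes = new ArrayList<>();
        int read = 0;
        for (String key : storedKeys.tailSet(startKey, true)) {
            if (read == recordCount) {
                break;
            }
            read++;
            String owner = ownerOf(key);
            if (nodes.isEmpty() || !nodes.get(nodes.size() - 1).equals(owner)) {
                nodes.add(owner);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        OrderedRing ring = new OrderedRing();
        ring.addNode("g", "node1");
        ring.addNode("n", "node2");
        ring.addNode("z", "node3");

        NavigableSet<String> keys = new TreeSet<>(
                Arrays.asList("apple", "fig", "grape", "kiwi", "lemon", "pear"));

        // A short scan that starts near a range boundary touches two nodes.
        System.out.println(ring.nodesForScan(keys, "fig", 3));
        // A scan that stays inside one range touches a single node.
        System.out.println(ring.nodesForScan(keys, "kiwi", 2));
    }
}
```

Under these assumptions a popular start key always lands on the same node, which is one way a skewed (for example Zipfian) workload could load some boxes far more than others even when the data itself is evenly spread.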





Adding new nodes

2010-02-04 Thread Bill Hastings
Hi All

Could someone explain to me how the following is done: when new nodes are
added, how do we read existing data, given that the topology changes? How does
Cassandra ensure that reads and writes are successful?

Cheers
Bill


Re: Adding new nodes

2010-02-04 Thread Jonathan Ellis
Data is moved to the new correct nodes.

On Thu, Feb 4, 2010 at 10:52 PM, Bill Hastings bllhasti...@gmail.com wrote:
 Hi All

 Could someone explain to me how the following is done - when new nodes are
 added how do we read existing data since the topology changes? How does
 Cassandra ensure that reads and writes are successful?

 Cheers
 Bill



Re: Adding new nodes

2010-02-04 Thread Bill Hastings
Sorry, I guess I was not clear enough. While the existing data is being moved,
do read requests go to the new nodes? If so, what if that data has not
yet migrated? There is no problem for writes. But how is the routing for
reads handled in this situation?

Cheers
Avinash

On Thu, Feb 4, 2010 at 9:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Data is moved to the new correct nodes.

 On Thu, Feb 4, 2010 at 10:52 PM, Bill Hastings bllhasti...@gmail.com
 wrote:
  Hi All
 
  Could someone explain to me how the following is done - when new nodes
 are
  added how do we read existing data since the topology changes? How does
  Cassandra ensure that reads and writes are successful?
 
  Cheers
  Bill
 



Re: Adding new nodes

2010-02-04 Thread Bill Hastings
Hi All

I just had a conversation with one of the FB guys (Avinash) and ended
up signing off as him :). He wasn't quite sure how this works in the
OSS branch. Hence the question to the broader audience. The question is more
about the change in topology and reads going to a machine before data
migration has completed.

Cheers
Bill




Re: Adding new nodes

2010-02-04 Thread Avinash Lakshman
Hi All

First off, Bill, I don't think I brainwashed you to the extent that you start
signing off as me :). Don't do that on my check books. That's an interesting
question and, as I said, I am not too sure how this is handled in the
current OSS version. Jonathan is your best bet for this response. Writes
will be handled fine, as we discussed.

Cheers
Avinash




Re: Adding new nodes

2010-02-04 Thread Vijay
If you are using the right APIs, read requests will not be sent to the
bootstrapping nodes, whereas writes will be.

Regards,
/VJ




On Thu, Feb 4, 2010 at 10:04 PM, Bill Hastings bllhasti...@gmail.com wrote:

 there will be reads that will fail in the interim. Am I way off here?
 Apologies if I am wrong and I will continue looking. I am very curious to
 know how this works.
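
The routing rule Vijay describes (reads skip bootstrapping nodes, writes still reach them) can be sketched roughly as below. This is only an illustration under that one assumption: BootstrapRouting, its State enum, and the node names are hypothetical and are not taken from the Cassandra code base.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of read/write routing while a node bootstraps.
final class BootstrapRouting {
    enum State { NORMAL, BOOTSTRAPPING }

    // Reads are only sent to replicas that have finished bootstrapping,
    // so a read never lands on a node that is still receiving its data.
    static List<String> readTargets(Map<String, State> replicas) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, State> entry : replicas.entrySet()) {
            if (entry.getValue() == State.NORMAL) {
                targets.add(entry.getKey());
            }
        }
        return targets;
    }

    // Writes are sent to every replica, including bootstrapping ones,
    // so the new node does not miss updates made during the migration.
    static List<String> writeTargets(Map<String, State> replicas) {
        return new ArrayList<>(replicas.keySet());
    }

    public static void main(String[] args) {
        Map<String, State> replicas = new LinkedHashMap<>();
        replicas.put("nodeA", State.NORMAL);
        replicas.put("nodeB", State.NORMAL);
        replicas.put("nodeC", State.BOOTSTRAPPING);

        System.out.println("reads:  " + readTargets(replicas));
        System.out.println("writes: " + writeTargets(replicas));
    }
}
```

In this sketch the old owners keep serving reads until the new node leaves the BOOTSTRAPPING state, which is why reads would not need to fail in the interim.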