Re: Scaling a cassandra cluster with auto_bootstrap set to false

2013-06-13 Thread Markus Klems
On Thu, Jun 13, 2013 at 11:20 PM, Edward Capriolo  wrote:
> CL.ONE requests for rows which do not exist are very fast.
>
> http://adrianotto.com/2010/08/dev-null-unlimited-scale/
>

Yep, /dev/null is a mighty force ;-)

I took a look at the YCSB source code and spotted the line that caused
our confusion: it is in
https://github.com/brianfrankcooper/YCSB/blob/master/core/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java
in the "public boolean doTransaction(DB db, Object threadstate)"
method, at line 497. No matter what the result of a YCSB transaction
operation is, the method always returns "true". I am not sure whether
this is desirable behavior for a benchmarking tool; it makes mistakes
like ours hard to spot.
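
For illustration, the pattern looks roughly like this (a simplified, self-contained
sketch, not the actual YCSB code; the class, interface, and method names are made up):
the per-operation status code is computed but discarded, and the method unconditionally
reports success, whereas propagating the status would surface failed operations.

// Simplified sketch of the pattern; names are made up, not YCSB's.
public class DoTransactionSketch
{
  interface Database { int read(String table, String key); } // stand-in for a DB binding, 0 = OK

  public boolean doTransaction(Database db)
  {
    int status = db.read("usertable", "somekey");
    return true;            // the status is ignored and success is always reported
    // return status == 0;  // this would propagate per-operation failures instead
  }
}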

The problem can also be observed by running this piece of code (a small
standalone test against the YCSB Cassandra binding):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Properties;

import com.yahoo.ycsb.ByteIterator;
import com.yahoo.ycsb.StringByteIterator;
import com.yahoo.ycsb.db.CassandraClient10;

// Standalone test against the YCSB Cassandra binding; run with the YCSB core
// and Cassandra binding jars on the classpath, passing the Cassandra host as args[0].
public class CassandraClientTest
{
  public static void main(String[] args)
  {
    CassandraClient10 cli = new CassandraClient10();

    Properties props = new Properties();
    props.setProperty("hosts", args[0]);
    cli.setProperties(props);

    try
    {
      cli.init();
    } catch (Exception e)
    {
      e.printStackTrace();
      System.exit(0);
    }

    // Insert one row with three columns (0 = OK in YCSB's return-code convention).
    HashMap<String, ByteIterator> vals = new HashMap<String, ByteIterator>();
    vals.put("age", new StringByteIterator("57"));
    vals.put("middlename", new StringByteIterator("bradley"));
    vals.put("favoritecolor", new StringByteIterator("blue"));
    int res = cli.insert("usertable", "BrianFrankCooper", vals);
    System.out.println("Result of insert: " + res);

    // Read the row back. Passing null for the field set reads all columns;
    // the explicit field set below is not passed to the read call here.
    HashMap<String, ByteIterator> result = new HashMap<String, ByteIterator>();
    HashSet<String> fields = new HashSet<String>();
    fields.add("middlename");
    fields.add("age");
    fields.add("favoritecolor");
    res = cli.read("usertable", "BrianFrankCooper", null, result);
    System.out.println("Result of read: " + res);
    for (String s : result.keySet())
    {
      System.out.println("[" + s + "]=[" + result.get(s) + "]");
    }

    // Delete the row, then read it again: the read still reports success (0).
    res = cli.delete("usertable", "BrianFrankCooper");
    System.out.println("Result of delete: " + res);

    res = cli.read("usertable", "BrianFrankCooper", null, result);
    System.out.println("Result of read: " + res);
    for (String s : result.keySet())
    {
      System.out.println("[" + s + "]=[" + result.get(s) + "]");
    }
  }
}

which results in:

Result of insert: 0
Result of read: 0
[middlename]=[bradley]
[favoritecolor]=[blue]
[age]=[57]
Result of delete: 0
Result of read: 0
[middlename]=[]
[favoritecolor]=[]
[age]=[]

The second read, issued after the delete, should not report success
("0") for a row that no longer exists.
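
As a client-side workaround, one could treat an OK status combined with an empty
result map as "row not found". This is only a rough sketch (a hypothetical helper,
not part of YCSB), and it assumes the binding leaves a freshly created result map
empty when the row has no live columns:

// Hypothetical helper, not part of YCSB: read into a fresh map and treat
// "status OK but no columns" as a missing row.
static boolean rowExists(CassandraClient10 cli, String table, String key)
{
  HashMap<String, ByteIterator> result = new HashMap<String, ByteIterator>();
  int status = cli.read(table, key, null, result); // null = read all fields
  return status == 0 && !result.isEmpty();
}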

@Robert & Edward, thanks for your help,

-Markus


Re: Scaling a cassandra cluster with auto_bootstrap set to false

2013-06-13 Thread Markus Klems
Robert,

thank you for your explanation. I think you are right: YCSB probably
does not correctly interpret the "missing record" response. We will
look into it and report our results here over the next few days.

Thanks,

Markus

On Thu, Jun 13, 2013 at 9:47 PM, Robert Coli  wrote:
> On Thu, Jun 13, 2013 at 10:47 AM, Markus Klems  wrote:
>> One scaling strategy seems interesting but we don't
>> fully understand what is going on, yet. The strategy works like this:
>> add new nodes to a Cassandra cluster with "auto_bootstrap = false" to
>> avoid streaming to the new nodes.
>
> If you set auto_bootstrap to false, new nodes take over responsibility
> for a range of the ring but do not receive the data for the range from
> the old nodes. If you read the new node at CL.ONE, you will get the
> answer that data you wrote to the old node does not exist, because the
> new node did not receive it as part of bootstrap. This is probably not
> what you expect.
>
>> We were a bit surprised that this
>> strategy improved performance considerably and that it worked much
>> better than other strategies that we tried before, both in terms of
>> scaling speed and performance impact during scaling.
>
> CL.ONE requests for rows which do not exist are very fast.
>
>> Would it be necessary (in a production environment) to stream the old 
>> SSTables from the other
>> four nodes at some point in time?
>
> Bootstrapping is necessary for consistency and durability, yes. If you were to:
>
> 1) start new node without bootstrapping it
> 2) run "cleanup" compaction on the old node
>
> You would permanently delete the copy of the data that is no longer
> "supposed" to live on the old node. With a RF of 1, that data would be
> permanently gone. With a RF of >1 you have other copies, but if you
> never bootstrap while adding new nodes you are relatively likely to
> not be able to access those copies over time.
>
> =Rob


Scaling a cassandra cluster with auto_bootstrap set to false

2013-06-13 Thread Markus Klems
Hi Cassandra community,

we are currently experimenting with different Cassandra scaling
strategies. We observed that Cassandra performance decreases
drastically when we insert more data into the cluster (say, going from
60GB to 600GB in a 3-node cluster). So we want to find out how to deal
with this problem. One scaling strategy seems interesting, but we don't
yet fully understand what is going on. The strategy works like this:
add new nodes to a Cassandra cluster with "auto_bootstrap = false" to
avoid streaming to the new nodes. We were a bit surprised that this
strategy improved performance considerably and that it worked much
better than other strategies that we tried before, both in terms of
scaling speed and performance impact during scaling.
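
For reference, the flag in question is a per-node setting that has to be in place
before the node's first start. A minimal sketch in cassandra.yaml syntax (the option
may need to be added by hand if it is not present in the default file):

# cassandra.yaml on each *new* node, set before its first start
auto_bootstrap: false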

Let me share our little experiment with you:

In an initial setup S1 we have 4 nodes where each node is similar to
the Amazon EC2 large instance type, i.e., 4 cores, 15GB memory, 700GB
free disk space, Cassandra replication factor 2. Each node is loaded
with 10 million 1KB rows into a single column family, i.e., ~20 GB
data/node, using the Yahoo Cloud Serving Benchmark (YCSB) tool. All
Cassandra settings are default. In the setup S1 we achieved an average
throughput of ~800 ops/s. The workload is a 95/5 read/update mix with
a Zipfian workload distribution (= YCSB workload B).
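
For completeness, the workload corresponds roughly to the following CoreWorkload
properties (a sketch of YCSB's workload B; the record and operation counts are
illustrative, not necessarily the exact values we used):

workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=10000000
operationcount=10000000
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian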

Setup S2: We then added two empty nodes to our 4-node cluster with
auto_bootstrap set to false. The throughput that we observed
thereafter tripled from 800 ops/s to 2,400 ops/s. We looked at various
outputs from nodetool commands to understand this effect. On the new
nodes, "$ nodetool info" tells us that the key cache is empty, while
"$ nodetool cfstats" clearly shows write and read requests coming in.
The memtable column count and data size are several times larger than
on the other four nodes.

We are wondering: what exactly gets stored on the two new nodes in
setup S2 and where (cache, memtable, disk?). Would it be necessary (in
a production environment) to stream the old SSTables from the other
four nodes at some point in time? Or can we simply be happy with the
performance improvement and leave it like this? Are we missing
something here? Can you advise us on specific monitoring data to look
at in order to better understand the observed effect?

Thanks,

Markus Klems


Re: Astyanax

2013-01-08 Thread Markus Klems
The wiki? https://github.com/Netflix/astyanax/wiki


On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima wrote:

> Hi,
> Someone has or could indicate some good tutorial or book to learn Astyanax?
>
> Thanks
>
> --
> Everton Lima Aleixo
> Master's student in Computer Science at UFG
> Programmer at LUPA
>
>


ConsistencyLevel greater ONE + node failure = non-responsive Cassandra 0.6.5 cluster

2011-03-21 Thread Markus Klems
Hi guys,

we are currently benchmarking various configurations of an EC2-based
Cassandra cluster. This is our current setup:

1) 8 nodes where each node is an m1.xlarge EC2 instance
2) Cassandra version 0.6.5
3) Replication Factor = 3
4) This delivers ~7K to 10K ops/sec with 50% GET and 50% INSERT,
depending on the consistency level.

We have been benchmarking the cluster with YCSB, while altering the
consistency levels ONE, QUORUM, and ALL, ceteris paribus. This works
fine if all nodes are alive. Then, we wanted to benchmark the cluster
performance behavior when one node goes down. So, we killed one node
and tested the cluster with consistency level ONE, which delivered a
reasonable throughput of several thousand ops/sec. Then we wanted to
test QUORUM and ALL. However, when one node is down, the cluster
throughput sharply drops to a handful of operations and then stops
responding to the YCSB client if the consistency level of the
operations in the benchmark is set to QUORUM or ALL. For ALL, this
behavior would (kind of) make sense for read requests, but we are
puzzled that even QUORUM won't work: with a replication factor of 3,
QUORUM needs only 2 of the 3 replicas, which should still be available
when a single node is down. It does not work for 100% write operations
at consistency level ALL either.

Any ideas why the cluster stops responding for QUORUM and ALL?

Thanks,

Markus


Re: Benchmarking Cassandra with YCSB

2011-02-19 Thread Markus Klems
Sure will do. We are currently running a couple of benchmarks on
differently configured EC2 landscapes. We will share our results in
the coming weeks.

On Sat, Feb 19, 2011 at 6:53 PM, Lior Golan  wrote:
> Can you share what numbers you are now getting?
>
> -Original Message-
> From: markuskl...@gmail.com [mailto:markuskl...@gmail.com] On Behalf Of 
> Markus Klems
> Sent: Saturday, February 19, 2011 10:53 AM
> To: user@cassandra.apache.org
> Subject: Re: Benchmarking Cassandra with YCSB
>
> Hi,
>
> we sorted out the performance problems and tuned the cluster. In
> particular, we identified the following weak spot in our setup:
> ConcurrentReads and ConcurrentWrites was set to the default values
> which were much too low for our setup. Now, we get some serious
> numbers.
>
> Thanks,
>
> Markus
>
> On Tue, Feb 15, 2011 at 9:09 PM, Aaron Morton  wrote:
>> Initial thoughts are that you are overloading the cluster; are there any log
>> lines about dropping messages?
>>
>> What is the schema, what settings do you have in Cassandra yaml  and what 
>> are CF stats telling you? E.g. Are you switching Memtables too quickly? What 
>> are the write latency numbers?
>>
>> Also 0.7 is much faster.
>>
>> Aaron
>>
>> On 16/02/2011, at 8:59 AM, Thibaut Britz  
>> wrote:
>>
>>> Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
>>> What's your CPU usage during these tests?
>>>
>>>
>>> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
>>>> Hi there,
>>>>
>>>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>>>> High-Mem Quadruple Extra Large EC2 nodes
>>>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>>>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>>>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>>>> We also set the user count to a very large number via ulimit -u
>>>> 99.
>>>>
>>>> Our goal is to achieve max throughput by increasing YCSB's threadcount
>>>> parameter (i.e. the number of parallel benchmarking client threads).
>>>> However, this does only improve Cassandra throughput for low numbers
>>>> of threads. If we move to higher threadcounts, throughput does not
>>>> increase and even  decreases. Do you have any idea why this is
>>>> happening and possibly suggestions how to scale throughput to much
>>>> higher numbers? Why is throughput hitting a wall, anyways? And where
>>>> does the latency/throughput tradeoff come from?
>>>>
>>>> Here is our YCSB configuration:
>>>> recordcount=30
>>>> operationcount=100
>>>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>>>> readallfields=true
>>>> readproportion=0.5
>>>> updateproportion=0.5
>>>> scanproportion=0
>>>> insertproportion=0
>>>> threadcount= 500
>>>> target = 1
>>>> hosts=EC2-1,EC2-2,EC2-3
>>>> requestdistribution=uniform
>>>>
>>>> These are typical results for threadcount=1:
>>>> Loading workload...
>>>> Starting test.
>>>>  0 sec: 0 operations;
>>>>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>>>>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>>>>
>>>> These are typical results for threadcount=10:
>>>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>>>>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>>>>
>>>> These are typical results for threadcount=100:
>>>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>>>>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>>>>
>>>> These are typical results for threadcount=500:
>>>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>>>>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
>>>> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>>>>
>>>> We never measured more than ~6000 ops/sec. Are there ways to tune
>>>> Cassandra that we are not aware of? We made some modification to the
>>>> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
>>>> switch to 0.7x or 0.8x. However, if this might solve the scaling
>>>> issues, we might consider to port our modifications to a newer
>>>> Cassandra version...
>>>>
>>>> Thanks,
>>>>
>>>> Markus Klems
>>>>
>>>> Karlsruhe Institute of Technology, Germany
>>>>
>>
>
>


Re: Benchmarking Cassandra with YCSB

2011-02-19 Thread Markus Klems
Hi,

we sorted out the performance problems and tuned the cluster. In
particular, we identified the following weak spot in our setup:
ConcurrentReads and ConcurrentWrites were set to their default values,
which were much too low for our setup. Now we get some serious
numbers.
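
For anyone tuning the same knobs: in 0.6.x these settings live in storage-conf.xml
(in 0.7+ they became concurrent_reads / concurrent_writes in cassandra.yaml). The
values below are illustrative, not the ones we ended up with; the usual rule of thumb
is roughly 16 * the number of data disks for reads and 8 * the number of cores for writes:

<!-- storage-conf.xml, Cassandra 0.6.x; illustrative values only -->
<ConcurrentReads>16</ConcurrentReads>
<ConcurrentWrites>64</ConcurrentWrites>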

Thanks,

Markus

On Tue, Feb 15, 2011 at 9:09 PM, Aaron Morton  wrote:
> Initial thoughts are that you are overloading the cluster; are there any log
> lines about dropping messages?
>
> What is the schema, what settings do you have in Cassandra yaml  and what are 
> CF stats telling you? E.g. Are you switching Memtables too quickly? What are 
> the write latency numbers?
>
> Also 0.7 is much faster.
>
> Aaron
>
> On 16/02/2011, at 8:59 AM, Thibaut Britz  
> wrote:
>
>> Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
>> What's your CPU usage during these tests?
>>
>>
>> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
>>> Hi there,
>>>
>>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>>> High-Mem Quadruple Extra Large EC2 nodes
>>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>>> We also set the user count to a very large number via ulimit -u
>>> 99.
>>>
>>> Our goal is to achieve max throughput by increasing YCSB's threadcount
>>> parameter (i.e. the number of parallel benchmarking client threads).
>>> However, this does only improve Cassandra throughput for low numbers
>>> of threads. If we move to higher threadcounts, throughput does not
>>> increase and even  decreases. Do you have any idea why this is
>>> happening and possibly suggestions how to scale throughput to much
>>> higher numbers? Why is throughput hitting a wall, anyways? And where
>>> does the latency/throughput tradeoff come from?
>>>
>>> Here is our YCSB configuration:
>>> recordcount=30
>>> operationcount=100
>>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>>> readallfields=true
>>> readproportion=0.5
>>> updateproportion=0.5
>>> scanproportion=0
>>> insertproportion=0
>>> threadcount= 500
>>> target = 1
>>> hosts=EC2-1,EC2-2,EC2-3
>>> requestdistribution=uniform
>>>
>>> These are typical results for threadcount=1:
>>> Loading workload...
>>> Starting test.
>>>  0 sec: 0 operations;
>>>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
>>> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>>>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
>>> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>>>
>>> These are typical results for threadcount=10:
>>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
>>> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>>>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
>>> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>>>
>>> These are typical results for threadcount=100:
>>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
>>> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>>>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
>>> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>>>
>>> These are typical results for threadcount=500:
>>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
>>> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>>>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
>>> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>>>
>>> We never measured more than ~6000 ops/sec. Are there ways to tune
>>> Cassandra that we are not aware of? We made some modification to the
>>> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
>>> switch to 0.7x or 0.8x. However, if this might solve the scaling
>>> issues, we might consider to port our modifications to a newer
>>> Cassandra version...
>>>
>>> Thanks,
>>>
>>> Markus Klems
>>>
>>> Karlsruhe Institute of Technology, Germany
>>>
>


Re: Understand eventually consistent

2011-02-18 Thread Markus Klems
Related question: Is it a good idea to specify ConsistencyLevels on a
per-operation basis? For example, Read ONE / Write ALL delivers
consistent read results, just like Read ALL / Write ONE. However, if
you combine Read ONE with Write QUORUM, you cannot give such guarantees
anymore. Should there be (or is there) a programming abstraction on top
of ConsistencyLevel that takes care of these combinations and makes
them explicit to the application developer?
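
A minimal sketch of what such an abstraction could check (a hypothetical helper, not
an existing Cassandra or client-library API): a read/write consistency-level pair
guarantees that a read overlaps the most recent acknowledged write exactly when
R + W > RF.

// Hypothetical helper, not an existing API: checks whether a (read CL, write CL)
// pair forces the read and write replica sets to overlap.
public class ConsistencyCheck
{
  enum CL { ONE, QUORUM, ALL }

  static int replicasRequired(CL cl, int rf)
  {
    switch (cl) {
      case ONE:    return 1;
      case QUORUM: return rf / 2 + 1;
      default:     return rf; // ALL
    }
  }

  // Strong consistency for the pair iff R + W > RF.
  static boolean stronglyConsistent(CL readCl, CL writeCl, int rf)
  {
    return replicasRequired(readCl, rf) + replicasRequired(writeCl, rf) > rf;
  }
}

With RF=3 this reports ONE/ALL, ALL/ONE and QUORUM/QUORUM as consistent, and
Read ONE / Write QUORUM as not, matching the examples above.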

On Fri, Feb 18, 2011 at 2:04 PM, Anthony John  wrote:
> At Quorum - if 2 of 3 nodes are down, a read should not be returned, right ?
> But yes - if single node READs are opted for, it will go through.
> The original question was - "Why is Cassandra called eventually consistent
> data store?"
> Because at write time, there is not a guarantee that all replicas are
> consistent. But they eventually will be!
> At Quorum write and Read - you will not get inconsistent results and your
> read will force consistency, if such a state has not yet been arrived at for
> the particular piece of data.
> But you have the option of or writing and reading at a lower standard, which
> could result in inconsistencies.
> HTH,
> -JA
> On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood  wrote:
>>
>> But, the reason that it isn't safe to say that we are a strongly
>> consistent store is that if 2 of your 3 replicas were to die and come back
>> with no data, QUORUM might return the wrong result.
>> A requirement of a strongly consistent store is that replicas cannot begin
>> answering queries until they are consistent: this is not a requirement in
>> Cassandra, althought arguably should be an option at some point in the
>> distant future.
>>
>> On Thu, Feb 17, 2011 at 5:26 PM, Aaron Morton 
>> wrote:
>>>
>>> For background...
>>> http://wiki.apache.org/cassandra/ArchitectureOverview
>>> (There is a section on consistency in there)
>>> For  deep background...
>>> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
>>>
>>> http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
>>> In short, yes (for all your questions) if you read and write at Quorum
>>> you have consistency behavior for your operations. Even though some nodes
>>> may have an inconsistent view of the data, e.g. one node is partitioned
>>> by a broken network or is overloaded and does not respond.
>>>
>>> Aaron
>>> On 18 Feb, 2011, at 02:11 PM, mcasandra  wrote:
>>>
>>>
>>> Why is Cassandra called eventually consistent data store? Wouldn't it be
>>> consistent if QUORAM is used?
>>>
>>> Another question is when I specify replication factor of 3 and write with
>>> factor of 2 and read with factor of 2 then what happens?
>>>
>>> 1. When write occurs cassandra will return to the client only when the
>>> writes go to commit log on 2 nodes successfully?
>>>
>>> 2. When read happens cassandra will return only when it is able to read
>>> from
>>> 2 nodes and determine that it has consistent copy?
>>> --
>>> View this message in context:
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6038330.html
>>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
>>> Nabble.com.
>>
>
>


Re: Benchmarking Cassandra with YCSB

2011-02-15 Thread Markus Klems
Good point. When we looked at the EC2 nodes, we measured around 120% CPU utilization.
We interpreted this as a misleading representation of CPU utilization on a multi-core
machine: our EC2 nodes have 8 virtual cores each, so a fully busy multi-threaded process
would show up to 800%, and ~120% suggests that little more than one core is actually busy.

Maybe Cassandra 0.6.5 is not so good at exploiting multi-core systems?

On 15.02.2011, at 20:59, Thibaut Britz  wrote:

> Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
> What's your CPU usage during these tests?
> 
> 
> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
>> Hi there,
>> 
>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>> High-Mem Quadruple Extra Large EC2 nodes
>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>> We also set the user count to a very large number via ulimit -u
>> 99.
>> 
>> Our goal is to achieve max throughput by increasing YCSB's threadcount
>> parameter (i.e. the number of parallel benchmarking client threads).
>> However, this does only improve Cassandra throughput for low numbers
>> of threads. If we move to higher threadcounts, throughput does not
>> increase and even  decreases. Do you have any idea why this is
>> happening and possibly suggestions how to scale throughput to much
>> higher numbers? Why is throughput hitting a wall, anyways? And where
>> does the latency/throughput tradeoff come from?
>> 
>> Here is our YCSB configuration:
>> recordcount=30
>> operationcount=100
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>> readallfields=true
>> readproportion=0.5
>> updateproportion=0.5
>> scanproportion=0
>> insertproportion=0
>> threadcount= 500
>> target = 1
>> hosts=EC2-1,EC2-2,EC2-3
>> requestdistribution=uniform
>> 
>> These are typical results for threadcount=1:
>> Loading workload...
>> Starting test.
>>  0 sec: 0 operations;
>>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
>> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
>> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>> 
>> These are typical results for threadcount=10:
>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
>> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
>> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>> 
>> These are typical results for threadcount=100:
>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
>> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
>> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>> 
>> These are typical results for threadcount=500:
>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
>> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
>> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>> 
>> We never measured more than ~6000 ops/sec. Are there ways to tune
>> Cassandra that we are not aware of? We made some modification to the
>> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
>> switch to 0.7x or 0.8x. However, if this might solve the scaling
>> issues, we might consider to port our modifications to a newer
>> Cassandra version...
>> 
>> Thanks,
>> 
>> Markus Klems
>> 
>> Karlsruhe Institute of Technology, Germany
>> 


Benchmarking Cassandra with YCSB

2011-02-15 Thread Markus Klems
Hi there,

we are currently benchmarking a Cassandra 0.6.5 cluster with 3
High-Mem Quadruple Extra Large EC2 nodes
(http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
(replication factor is 3, random partitioner). We assigned 32 GB RAM
to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
We also set the user count to a very large number via ulimit -u
99.

Our goal is to achieve maximum throughput by increasing YCSB's threadcount
parameter (i.e., the number of parallel benchmarking client threads).
However, this only improves Cassandra throughput for low numbers
of threads. If we move to higher threadcounts, throughput does not
increase and even decreases. Do you have any idea why this is
happening, and possibly suggestions for how to scale throughput to much
higher numbers? Why is throughput hitting a wall, anyway? And where
does the latency/throughput tradeoff come from?
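
A rough back-of-the-envelope note on the last question: YCSB is a closed-loop client
(each thread issues its next operation only after the previous one returns), so
Little's Law caps throughput at about threads / average latency, and once the cluster
is saturated, extra threads mostly show up as extra latency. A small sketch using the
threadcount=100 numbers below (treat the exact figures as illustrative):

// Little's Law for a closed-loop benchmark client with no think time:
//   throughput ~= concurrent threads / average operation latency
public class LittlesLawSketch
{
  public static void main(String[] args)
  {
    double threads = 100;
    // ~20 ms updates and ~45 ms reads at a 50/50 mix (from the threadcount=100 run below)
    double avgLatencySeconds = 0.5 * 0.020 + 0.5 * 0.045;
    double expectedOpsPerSec = threads / avgLatencySeconds;
    // Prints roughly 3100 ops/sec, close to the measured ~2900 ops/sec.
    System.out.printf("expected throughput: ~%.0f ops/sec%n", expectedOpsPerSec);
  }
}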

Here is our YCSB configuration:
recordcount=30
operationcount=100
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
threadcount= 500
target = 1
hosts=EC2-1,EC2-2,EC2-3
requestdistribution=uniform

These are typical results for threadcount=1:
Loading workload...
Starting test.
 0 sec: 0 operations;
 10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
 20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]

These are typical results for threadcount=10:
10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
 20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]

These are typical results for threadcount=100:
10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
 20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]

These are typical results for threadcount=500:
10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
 20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]

We never measured more than ~6000 ops/sec. Are there ways to tune
Cassandra that we are not aware of? We made some modifications to the
Cassandra 0.6.5 core for experimental reasons, so it is not easy to
switch to 0.7.x or 0.8.x. However, if that might solve the scaling
issues, we might consider porting our modifications to a newer
Cassandra version...

Thanks,

Markus Klems

Karlsruhe Institute of Technology, Germany