Re: Strange Read Performance 1xN column slice or N column slice

2010-06-16 Thread Arya Goudarzi
Hey all,

As Jonathan pointed out in CASSANDRA-1199, this issue seems to be related to
https://issues.apache.org/jira/browse/THRIFT-788. If you experience slowness
with multiget_slice, take a look at that bug.

-Arya

- Original Message -
From: Arya Goudarzi agouda...@gaiaonline.com
To: user@cassandra.apache.org, jbellis jbel...@gmail.com
Sent: Wednesday, June 9, 2010 4:51:18 PM
Subject: Re: Strange Read Performance 1xN column slice or N column slice

Hi Jonathan,

This issue persists. I have prepared a code sample that you can use to
reproduce what I am describing. Please see attached. It uses the Thrift PHP
libraries directly. I am running a Cassandra 0.7 build from May 28th. I
have tried this on a single host with replication factor 1 and on a 3-node
cluster with replication factor 3. The results remain similar:

100 Sequential Writes took: 0.60781407356262 seconds;
100 Sequential Reads took: 0.23204588890076 seconds;
100 Batch Read took: 0.76512885093689 seconds;

Please advise.

Thank You,
-Arya


Re: Strange Read Performance 1xN column slice or N column slice

2010-06-07 Thread Arya Goudarzi
But I am not comparing reading 1 column vs 100 columns. I am comparing reading
100 columns in a loop (100 consecutive calls) vs reading all 100 in one batch
call. The loop is faster than the batch call. Are you saying this is not
surprising?


Re: Strange Read Performance 1xN column slice or N column slice

2010-06-07 Thread Jonathan Ellis
That would be surprising (and it is not what you said in the first
message).  I suspect something is wrong with your test methodology.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Strange Read Performance 1xN column slice or N column slice

2010-06-05 Thread Jonathan Ellis
Reading 1 column is faster than reading lots of columns. This
shouldn't be surprising.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Strange Read Performance 1xN column slice or N column slice

2010-06-04 Thread Arya Goudarzi

Hi Fellows, 

I have the following design for a system which holds basically key-value pairs 
(aka Columns) for each user (SuperColumn Key) in different namespaces 
(SuperColumnFamily row key). 

Like this: 

Namespace-user-column_name = column_value;

keyspaces:
    - name: NKVP
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 3
      column_families:
        - name: Namespaces
          column_type: Super
          compare_with: BytesType
          compare_subcolumns_with: BytesType
          rows_cached: 2
          keys_cached: 100

Cluster using random partitioner. 

I use multiget_slice() to fetch one or many columns inside the child
supercolumn at the same time. This is the odd performance result I get:

100 sequential reads completed in: 0.383 (this uses multiget_slice() with
1 key and 1 column name inside the predicate-column_names)
100 batch loaded completed in: 0.786 (this uses multiget_slice() with 1 key
and multiple column names inside the predicate-column_names)

Read and write consistency levels are both ONE.

Questions: 

Why is doing 100 sequential reads faster than doing 100 in a batch?
Is this a good design for my problem? 
Does my issue relate to https://issues.apache.org/jira/browse/CASSANDRA-598? 

Now on a single node with replication factor 1 I get this: 

100 sequential reads completed in : 0.438 
100 batch loaded completed in : 0.800 

Please advise as to why this is happening.

These nodes are VMs with 1 CPU and 1 GB each.

Best Regards, 
=Arya
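

For anyone who wants to repeat the loop-versus-batch comparison discussed in
this thread, a minimal timing sketch could look like the one below. It is not
the PHP sample attached earlier in the thread, and it does not talk to
Cassandra or Thrift at all. The names time_call, compare, fetch_one and
fetch_batch are hypothetical, introduced only for illustration: fetch_one and
fetch_batch stand in for whatever client calls are being measured (for
example, wrappers around multiget_slice). The sketch adds a warm-up run and
reports the median of several runs, on the assumption that a single cold
measurement may be dominated by connection setup or caching.

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Return the median wall-clock seconds of fn() over several runs."""
    for _ in range(warmup):
        fn()  # unmeasured runs, so caches and connections are warm
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(fetch_one, fetch_batch, column_names, repeats=5):
    """Time N single-column calls against one N-column call.

    fetch_one(name) and fetch_batch(names) are hypothetical stand-ins
    for the client calls under test; they are not part of any real API.
    """
    def sequential():
        for name in column_names:
            fetch_one(name)

    def batch():
        fetch_batch(column_names)

    return {
        "sequential": time_call(sequential, repeats),
        "batch": time_call(batch, repeats),
    }


if __name__ == "__main__":
    # In-memory stub so the sketch runs without a cluster.
    store = {f"col{i}": f"value{i}" for i in range(100)}
    names = list(store)
    result = compare(
        lambda name: store[name],
        lambda batch_names: [store[n] for n in batch_names],
        names,
    )
    print(result)
```

To measure a real client, replace the two lambdas with functions that issue
the single-column and multi-column requests, and keep the column list and
consistency level identical between the two cases.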