Re: Strange Read Performance 1xN column slice or N column slice
Hey y'all,

As Jonathan pointed out in CASSANDRA-1199, this issue seems to be related to https://issues.apache.org/jira/browse/THRIFT-788. If you experience slowness with multiget_slice, take a look at that bug.

-Arya

----- Original Message -----
From: Arya Goudarzi agouda...@gaiaonline.com
To: user@cassandra.apache.org, jbellis jbel...@gmail.com
Sent: Wednesday, June 9, 2010 4:51:18 PM
Subject: Re: Strange Read Performance 1xN column slice or N column slice

Hi Jonathan,

This issue persists. I have prepared a code sample that you can use to reproduce what I am describing. Please see attached. It uses the Thrift PHP libraries directly. I am running a Cassandra 0.7 build from May 28th. I have tried this on a single host with replication factor 1 and on a 3-node cluster with replication factor 3. The results remain similar:

100 Sequential Writes took: 0.60781407356262 seconds;
100 Sequential Reads took: 0.23204588890076 seconds;
100 Batch Read took: 0.76512885093689 seconds;

Please advise.

Thank you,
-Arya
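For a sense of scale, the timings reported in the June 9 message can be turned into a per-call latency and a batch-to-loop ratio. This is only arithmetic on the numbers quoted in this thread; the variable names are illustrative and not taken from the attached PHP sample.

```python
# Timings in seconds, copied from the June 9 message in this thread.
sequential_reads_s = 0.23204588890076  # 100 separate single-column calls
batch_read_s = 0.76512885093689        # one call requesting all 100 columns
calls = 100

# Average cost of one call in the sequential loop, in milliseconds.
per_call_ms = sequential_reads_s / calls * 1000

# How many times slower the single batch call was than the whole loop.
ratio = batch_read_s / sequential_reads_s

print(f"per sequential call: {per_call_ms:.2f} ms")
print(f"batch / sequential:  {ratio:.2f}x")
```

In other words, the single batch call took several times longer than all 100 loop iterations combined, which is the opposite of what fewer round trips would normally suggest.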
Re: Strange Read Performance 1xN column slice or N column slice
But I am not comparing reading 1 column vs. 100 columns. I am comparing reading 100 columns in a loop (100 consecutive calls) vs. reading all 100 in one batch call. The loop is faster than the batch call. Are you saying this is not surprising?
Re: Strange Read Performance 1xN column slice or N column slice
That would be surprising (and it is not what you said in the first message). I suspect something is wrong with your test methodology.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Strange Read Performance 1xN column slice or N column slice
Reading 1 column is faster than reading lots of columns. This shouldn't be surprising.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Strange Read Performance 1xN column slice or N column slice
Hi Fellows,

I have the following design for a system that holds key-value pairs (aka Columns) for each user (SuperColumn key) in different namespaces (SuperColumnFamily row key). Like this:

Namespace->user->column_name = column_value;

keyspaces:
    - name: NKVP
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 3
      column_families:
        - name: Namespaces
          column_type: Super
          compare_with: BytesType
          compare_subcolumns_with: BytesType
          rows_cached: 2
          keys_cached: 100

The cluster uses the random partitioner.

I use multiget_slice() to fetch one or many columns inside the child supercolumn at the same time. This is the odd performance result I get:

100 sequential reads completed in: 0.383
  (this uses multiget_slice() with 1 key and 1 column name inside predicate->column_names)
100 batch loaded completed in: 0.786
  (this uses multiget_slice() with 1 key and multiple column names inside predicate->column_names)

Read/write consistency levels are ONE.

Questions:

Why is doing 100 sequential reads faster than doing 100 in batch?
Is this a good design for my problem?
Does my issue relate to https://issues.apache.org/jira/browse/CASSANDRA-598?

Now on a single node with replication factor 1 I get this:

100 sequential reads completed in: 0.438
100 batch loaded completed in: 0.800

Please advise as to why this is happening. These nodes are VMs with 1 CPU and 1 GB each.

Best Regards,
=Arya
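The comparison described in this message (N single-column calls in a loop vs. one call for N columns) could be sketched roughly as below. This is a Python illustration, not the PHP code used in the thread. `FakeClient` is a hypothetical in-memory stand-in, and its `multiget_slice` signature is simplified for illustration; it is not the real Thrift signature. Taking the median of several runs is an added assumption meant to reduce noise in the comparison.

```python
import statistics
import time


class FakeClient:
    """Hypothetical in-memory stand-in for a Thrift client (not the real API)."""

    def __init__(self, data):
        # data: namespace (row key) -> user (supercolumn) -> column name -> value
        self.data = data

    def multiget_slice(self, keys, user, column_names):
        # Simplified signature for illustration only; the real Thrift call
        # takes a column parent, a slice predicate and a consistency level.
        return {
            key: {name: self.data[key][user][name] for name in column_names}
            for key in keys
        }


def read_sequential(client, key, user, column_names):
    """One call per column: N round trips."""
    result = {}
    for name in column_names:
        result.update(client.multiget_slice([key], user, [name])[key])
    return result


def read_batch(client, key, user, column_names):
    """One call for all columns: a single round trip."""
    return client.multiget_slice([key], user, column_names)[key]


def median_seconds(fn, *args, repeats=5):
    """Run fn several times and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


columns = [f"col{i}" for i in range(100)]
client = FakeClient({"ns1": {"user1": {c: f"value-{c}" for c in columns}}})

# Both strategies must return the same data before the timings are comparable.
assert read_sequential(client, "ns1", "user1", columns) == read_batch(
    client, "ns1", "user1", columns
)

print("sequential:", median_seconds(read_sequential, client, "ns1", "user1", columns))
print("batch:     ", median_seconds(read_batch, client, "ns1", "user1", columns))
```

Against a real cluster, the stub would be replaced by an actual client; the numbers printed by this stub say nothing about Cassandra itself and only show the shape of the measurement.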