Re: Large number of row keys in query kills cluster

2014-06-12 Thread Laing, Michael
Just an FYI, my benchmarking of the new python driver, which uses the
asynchronous CQL native transport, indicates that one can largely overcome
client-to-node latency effects by employing a suitable level of
concurrency and non-blocking techniques.

Of course response size and other factors come into play, but having a
hundred or so queries simultaneously in the pipeline from each worker
subprocess is a big help.
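
A minimal sketch of the pattern, in case it's useful. Keyspace, table and
query are placeholders, and draining the window in full batches is a
simplification - a real pipeline would refill one future at a time:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])           # placeholder contact point
session = cluster.connect('mykeyspace')    # placeholder keyspace
select = session.prepare('SELECT * FROM mytable WHERE key = ?')

def fetch_all(keys, window=100):
    # Keep ~window requests in flight; the driver multiplexes them over
    # its non-blocking native-protocol connections.
    rows, futures = [], []
    for key in keys:
        futures.append(session.execute_async(select, (key,)))
        if len(futures) >= window:
            for f in futures:              # drain the window
                rows.extend(f.result())
            futures = []
    for f in futures:                      # drain the stragglers
        rows.extend(f.result())
    return rows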


On Thu, Jun 12, 2014 at 10:46 AM, Jeremy Jongsma 
wrote:

> Good to know, thanks Peter. I am worried about client-to-node latency if I
> have to do 20,000 individual queries, but that makes it clearer that at
> least batching in smaller sizes is a good idea.
>
>
> On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford 
> wrote:
>
>> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma 
>> wrote:
>>
>>> The big problem seems to have been requesting a large number of row keys
>>> combined with a large number of named columns in a query. 20K rows with 20K
>>> columns destroyed my cluster. Splitting it into slices of 100 sequential
>>> queries fixed the performance issue.
>>>
>>> When updating 20K rows at a time, I saw a different issue -
>>> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
>>> that issue.
>>>
>>> Is there any documentation on this? Obviously these limits will vary by
>>> cluster capacity, but for new users it would be great to know that you can
>>> run into problems with large queries, and how they present themselves when
>>> you hit them. The errors I saw are pretty opaque, and took me a couple days
>>> to track down.
>>>
>>>
>> The first thing that comes to mind is the Multiget section on the
>> Datastax anti-patterns page:
>> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets
>>
>>
>>
>> -psanford
>>
>>
>>
>


Re: Large number of row keys in query kills cluster

2014-06-12 Thread Jeremy Jongsma
Good to know, thanks Peter. I am worried about client-to-node latency if I
have to do 20,000 individual queries, but that makes it clearer that at
least batching in smaller sizes is a good idea.


On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford 
wrote:

> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma 
> wrote:
>
>> The big problem seems to have been requesting a large number of row keys
>> combined with a large number of named columns in a query. 20K rows with 20K
>> columns destroyed my cluster. Splitting it into slices of 100 sequential
>> queries fixed the performance issue.
>>
>> When updating 20K rows at a time, I saw a different issue -
>> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
>> that issue.
>>
>> Is there any documentation on this? Obviously these limits will vary by
>> cluster capacity, but for new users it would be great to know that you can
>> run into problems with large queries, and how they present themselves when
>> you hit them. The errors I saw are pretty opaque, and took me a couple days
>> to track down.
>>
>>
> The first thing that comes to mind is the Multiget section on the Datastax
> anti-patterns page:
> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets
>
>
>
> -psanford
>
>
>


Re: Large number of row keys in query kills cluster

2014-06-12 Thread Peter Sanford
On Wed, Jun 11, 2014 at 9:17 PM, Jack Krupansky 
wrote:

>   Hmmm... that multiple-gets section is not present in the 2.0 doc:
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
>
> Was that intentional – is that anti-pattern no longer relevant to C* 2.0?
>

It is still there, but now it is written in terms of CQL. The section
heading is now: "SELECT ... IN or index lookups".
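
(In CQL terms the same multiget anti-pattern looks like SELECT * FROM t
WHERE key IN (...) with a very long key list - the coordinator still has
to fan those reads out across the whole cluster.)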

I specifically linked to the 1.2 docs because the question
mentioned Astyanax.


Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jack Krupansky
Hmmm... that multiple-gets section is not present in the 2.0 doc:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html

Was that intentional – is that anti-pattern no longer relevant to C* 2.0?

Matt’s slideshare refers to “unbounded batches” as an anti-pattern:
http://www.slideshare.net/mattdennis

-- Jack Krupansky

From: Peter Sanford 
Sent: Wednesday, June 11, 2014 7:34 PM
To: user@cassandra.apache.org 
Subject: Re: Large number of row keys in query kills cluster

On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma  wrote:

  The big problem seems to have been requesting a large number of row keys 
combined with a large number of named columns in a query. 20K rows with 20K 
columns destroyed my cluster. Splitting it into slices of 100 sequential 
queries fixed the performance issue. 

  When updating 20K rows at a time, I saw a different issue - 
BrokenPipeException from all nodes. Splitting into slices of 1000 fixed that 
issue.


  Is there any documentation on this? Obviously these limits will vary by 
cluster capacity, but for new users it would be great to know that you can run 
into problems with large queries, and how they present themselves when you hit 
them. The errors I saw are pretty opaque, and took me a couple days to track 
down.


The first thing that comes to mind is the Multiget section on the Datastax 
anti-patterns page: 
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets



-psanford



Re: Large number of row keys in query kills cluster

2014-06-11 Thread Peter Sanford
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma 
wrote:

> The big problem seems to have been requesting a large number of row keys
> combined with a large number of named columns in a query. 20K rows with 20K
> columns destroyed my cluster. Splitting it into slices of 100 sequential
> queries fixed the performance issue.
>
> When updating 20K rows at a time, I saw a different issue -
> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
> that issue.
>
> Is there any documentation on this? Obviously these limits will vary by
> cluster capacity, but for new users it would be great to know that you can
> run into problems with large queries, and how they present themselves when
> you hit them. The errors I saw are pretty opaque, and took me a couple days
> to track down.
>
>
The first thing that comes to mind is the Multiget section on the Datastax
anti-patterns page:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets



-psanford


Re: Large number of row keys in query kills cluster

2014-06-11 Thread Robert Coli
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma 
wrote:

> Is there any documentation on this? Obviously these limits will vary by
> cluster capacity, but for new users it would be great to know that you can
> run into problems with large queries, and how they present themselves when
> you hit them. The errors I saw are pretty opaque, and took me a couple days
> to track down.
>

All operations in Cassandra are subject to timeouts denominated in seconds,
defaulting to 10 seconds or less. This strongly suggests that operations
which, for example, operate on 20,000 * 20,000 objects (400 million) have a
meaningful risk of failure, as they are difficult to accomplish within 10
seconds or less. Lunch is still not free.
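
Back-of-envelope, assuming a modest 50 bytes per cell: 400 million cells
is ~20GB funneled through your coordinators. That is not happening in 10
seconds.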

In fairness, CQL adds another non-helpful layer of opacity here; but what
you get for it is accessibility and ease of first use.


> In any case this seems like a bug to me - it shouldn't be possible to
> completely lock up a cluster with a valid query that isn't doing a table
> scan, should it?
>

There are lots of valid SQL queries which will "lock up" your server, for
some values of "lock up".

=Rob


Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
The big problem seems to have been requesting a large number of row keys
combined with a large number of named columns in a query. 20K rows with 20K
columns destroyed my cluster. Splitting it into slices of 100 sequential
queries fixed the performance issue.

When updating 20K rows at a time, I saw a different issue -
BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
that issue.

Is there any documentation on this? Obviously these limits will vary by
cluster capacity, but for new users it would be great to know that you can
run into problems with large queries, and how they present themselves when
you hit them. The errors I saw are pretty opaque, and took me a couple days
to track down.

In any case this seems like a bug to me - it shouldn't be possible to
completely lock up a cluster with a valid query that isn't doing a table
scan, should it?
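
For anyone hitting the same wall, the slicing itself is trivial - a sketch
in python rather than Astyanax, where fetch_slice and update_slice stand
in for whatever your client issues per batch:

def slices(seq, n):
    # Yield successive n-item slices of seq.
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

for chunk in slices(row_keys, 100):        # reads: 100 keys per query
    fetch_slice(chunk)

for chunk in slices(updates, 1000):        # writes: 1000 rows per batch
    update_slice(chunk)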


On Wed, Jun 11, 2014 at 9:33 AM, Jeremy Jongsma  wrote:

> I'm using Astyanax with a query like this:
>
> clusterContext
>   .getClient()
>   .getKeyspace("instruments")
>   .prepareQuery(INSTRUMENTS_CF)
>   .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
>   .getKeySlice(new String[] {
> "ROW1",
> "ROW2",
> // 20,000 keys here...
> "ROW2"
>   })
>   .execute();
>
> At the time this query executes the first time (resulting in unresponsive
> cluster), there are zero rows in the column family. Schema is below, pretty
> basic:
>
> CREATE KEYSPACE instruments WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'aws-us-east-1': '2'
> };
>
> CREATE TABLE instruments (
>   key bigint PRIMARY KEY,
>   definition blob,
>   id bigint,
>   name text,
>   symbol text,
>   updated bigint
> ) WITH COMPACT STORAGE AND
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'SnappyCompressor'};
>
>
>
>
> On Tue, Jun 10, 2014 at 6:35 PM, Laing, Michael wrote:
>
>> Perhaps if you described both the schema and the query in more detail, we
>> could help... e.g. did the query have an IN clause with 20,000 keys? Or is
>> the key compound? More detail will help.
>>
>>
>> On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma 
>> wrote:
>>
>>> I didn't explain clearly - I'm not requesting 20,000 unknown keys
>>> (resulting in a full scan), I'm requesting 20,000 specific rows by key.
>>> On Jun 10, 2014 6:02 PM, "DuyHai Doan"  wrote:
>>>
>>>> Hello Jeremy
>>>>
>>>> Basically what you are doing is to ask Cassandra to do a distributed
>>>> full scan on all the partitions across the cluster, it's normal that the
>>>> nodes are somehow stressed.
>>>>
>>>> How did you make the query? Are you using Thrift or CQL3 API?
>>>>
>>>> Please note that there is another way to get all partition keys :
>>>> SELECT DISTINCT <partition key> FROM..., more details here :
>>>> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
>>>> I ran an application today that attempted to fetch 20,000+ unique row
>>>> keys in one query against a set of completely empty column families. On a
>>>> 4-node cluster (EC2 m1.large instances) with the recommended memory
>>>> settings (2 GB heap), every single node immediately ran out of memory and
>>>> became unresponsive, to the point where I had to kill -9 the cassandra
>>>> processes.
>>>>
>>>> Now clearly this query is not the best idea in the world, but the
>>>> effects of it are a bit disturbing. What could be going on here? Are there
>>>> any other query pitfalls I should be aware of that have the potential to
>>>> explode the entire cluster?
>>>>
>>>> -j
>>>>
>>>
>>
>


Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
I'm using Astyanax with a query like this:

clusterContext
  .getClient()
  .getKeyspace("instruments")
  .prepareQuery(INSTRUMENTS_CF)
  .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
  .getKeySlice(new String[] {
"ROW1",
"ROW2",
// 20,000 keys here...
"ROW2"
  })
  .execute();

At the time this query executes the first time (resulting in unresponsive
cluster), there are zero rows in the column family. Schema is below, pretty
basic:

CREATE KEYSPACE instruments WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'aws-us-east-1': '2'
};

CREATE TABLE instruments (
  key bigint PRIMARY KEY,
  definition blob,
  id bigint,
  name text,
  symbol text,
  updated bigint
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};




On Tue, Jun 10, 2014 at 6:35 PM, Laing, Michael 
wrote:

> Perhaps if you described both the schema and the query in more detail, we
> could help... e.g. did the query have an IN clause with 20,000 keys? Or is
> the key compound? More detail will help.
>
>
> On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma 
> wrote:
>
>> I didn't explain clearly - I'm not requesting 20,000 unknown keys
>> (resulting in a full scan), I'm requesting 20,000 specific rows by key.
>> On Jun 10, 2014 6:02 PM, "DuyHai Doan"  wrote:
>>
>>> Hello Jeremy
>>>
>>> Basically what you are doing is to ask Cassandra to do a distributed
>>> full scan on all the partitions across the cluster, it's normal that the
>>> nodes are somehow stressed.
>>>
>>> How did you make the query? Are you using Thrift or CQL3 API?
>>>
>>> Please note that there is another way to get all partition keys : SELECT
>>> DISTINCT <partition key> FROM..., more details here :
>>> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
>>> I ran an application today that attempted to fetch 20,000+ unique row
>>> keys in one query against a set of completely empty column families. On a
>>> 4-node cluster (EC2 m1.large instances) with the recommended memory
>>> settings (2 GB heap), every single node immediately ran out of memory and
>>> became unresponsive, to the point where I had to kill -9 the cassandra
>>> processes.
>>>
>>> Now clearly this query is not the best idea in the world, but the
>>> effects of it are a bit disturbing. What could be going on here? Are there
>>> any other query pitfalls I should be aware of that have the potential to
>>> explode the entire cluster?
>>>
>>> -j
>>>
>>
>


Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we
could help... e.g. did the query have an IN clause with 20,000 keys? Or is
the key compound? More detail will help.


On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma  wrote:

> I didn't explain clearly - I'm not requesting 20,000 unknown keys
> (resulting in a full scan), I'm requesting 20,000 specific rows by key.
> On Jun 10, 2014 6:02 PM, "DuyHai Doan"  wrote:
>
>> Hello Jeremy
>>
>> Basically what you are doing is to ask Cassandra to do a distributed full
>> scan on all the partitions across the cluster, it's normal that the nodes
>> are somehow stressed.
>>
>> How did you make the query? Are you using Thrift or CQL3 API?
>>
>> Please note that there is another way to get all partition keys : SELECT
>> DISTINCT <partition key> FROM..., more details here :
>> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
>> I ran an application today that attempted to fetch 20,000+ unique row
>> keys in one query against a set of completely empty column families. On a
>> 4-node cluster (EC2 m1.large instances) with the recommended memory
>> settings (2 GB heap), every single node immediately ran out of memory and
>> became unresponsive, to the point where I had to kill -9 the cassandra
>> processes.
>>
>> Now clearly this query is not the best idea in the world, but the effects
>> of it are a bit disturbing. What could be going on here? Are there any
>> other query pitfalls I should be aware of that have the potential to
>> explode the entire cluster?
>>
>> -j
>>
>


Re: Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting
in a full scan), I'm requesting 20,000 specific rows by key.
On Jun 10, 2014 6:02 PM, "DuyHai Doan"  wrote:

> Hello Jeremy
>
> Basically what you are doing is to ask Cassandra to do a distributed full
> scan on all the partitions across the cluster, it's normal that the nodes
> are somehow stressed.
>
> How did you make the query? Are you using Thrift or CQL3 API?
>
> Please note that there is another way to get all partition keys : SELECT
> DISTINCT <partition key> FROM..., more details here :
> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
> I ran an application today that attempted to fetch 20,000+ unique row keys
> in one query against a set of completely empty column families. On a 4-node
> cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
> heap), every single node immediately ran out of memory and became
> unresponsive, to the point where I had to kill -9 the cassandra processes.
>
> Now clearly this query is not the best idea in the world, but the effects
> of it are a bit disturbing. What could be going on here? Are there any
> other query pitfalls I should be aware of that have the potential to
> explode the entire cluster?
>
> -j
>


Re: Large number of row keys in query kills cluster

2014-06-10 Thread DuyHai Doan
Hello Jeremy

Basically what you are doing is to ask Cassandra to do a distributed full
scan on all the partitions across the cluster, it's normal that the nodes
are somehow stressed.

How did you make the query? Are you using Thrift or CQL3 API?

Please note that there is another way to get all partition keys : SELECT
DISTINCT <partition key> FROM..., more details here :
www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
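
For example, given the instruments schema posted elsewhere in this thread
(the blog post above covers which 2.0.x releases support DISTINCT):

SELECT DISTINCT key FROM instruments;
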
I ran an application today that attempted to fetch 20,000+ unique row keys
in one query against a set of completely empty column families. On a 4-node
cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
heap), every single node immediately ran out of memory and became
unresponsive, to the point where I had to kill -9 the cassandra processes.

Now clearly this query is not the best idea in the world, but the effects
of it are a bit disturbing. What could be going on here? Are there any
other query pitfalls I should be aware of that have the potential to
explode the entire cluster?

-j