Re: Large number of row keys in query kills cluster
Just an FYI, my benchmarking of the new python driver, which uses the asynchronous CQL native transport, indicates that one can largely overcome client-to-node latency effects if you employ a suitable level of concurrency and non-blocking techniques. Of course response size and other factors come into play, but having a hundred or so queries simultaneously in the pipeline from each worker subprocess is a big help.

On Thu, Jun 12, 2014 at 10:46 AM, Jeremy Jongsma wrote:
> Good to know, thanks Peter. I am worried about client-to-node latency if I
> have to do 20,000 individual queries, but that makes it clearer that at
> least batching in smaller sizes is a good idea.
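As a rough illustration of the pipelining described in this message, the sketch below keeps about a hundred requests in flight at once using only the Python standard library. `fetch_row` is a hypothetical stand-in for a single-key read (it just sleeps to mimic a network round trip), and the key names and concurrency value are assumptions chosen for illustration. Threads are swapped in for the driver's non-blocking I/O purely so the example runs standalone; with the real Python driver one would presumably go through its asynchronous API (for example `Session.execute_async`) instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_row(key):
    """Hypothetical stand-in for one single-key read; sleeps to mimic latency."""
    time.sleep(0.01)
    return key, {"name": "instrument-%s" % key}


def fetch_all(keys, concurrency=100):
    """Fetch every key with at most `concurrency` requests in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves input order; the pool size caps how many
        # requests run at the same time, the rest wait in the queue.
        return dict(pool.map(fetch_row, keys))


keys = ["ROW%d" % i for i in range(1, 1001)]
start = time.time()
rows = fetch_all(keys)
elapsed = time.time() - start
print("fetched %d rows in %.2fs" % (len(rows), elapsed))
```

Run one after another, the same 1,000 simulated reads would take roughly ten seconds; the point of the sketch is only that bounded concurrency hides most of the per-request latency, not that these numbers transfer to a real cluster.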
Re: Large number of row keys in query kills cluster
Good to know, thanks Peter. I am worried about client-to-node latency if I have to do 20,000 individual queries, but that makes it clearer that at least batching in smaller sizes is a good idea.

On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford wrote:
> The first thing that comes to mind is the Multiget section on the Datastax
> anti-patterns page:
> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets
>
> -psanford
Re: Large number of row keys in query kills cluster
On Wed, Jun 11, 2014 at 9:17 PM, Jack Krupansky wrote:
> Hmmm... that multiple-gets section is not present in the 2.0 doc:
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
>
> Was that intentional – is that anti-pattern no longer relevant to C* 2.0?

It is still there, but now it is written in terms of CQL. The section heading is now: "SELECT ... IN or index lookups".

I specifically linked to the 1.2 docs because the question mentioned Astyanax.
Re: Large number of row keys in query kills cluster
Hmmm... that multiple-gets section is not present in the 2.0 doc:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html

Was that intentional – is that anti-pattern no longer relevant to C* 2.0?

Matt’s slideshare refers to “unbounded batches” as an anti-pattern:
http://www.slideshare.net/mattdennis

-- Jack Krupansky

From: Peter Sanford
Sent: Wednesday, June 11, 2014 7:34 PM
To: user@cassandra.apache.org
Subject: Re: Large number of row keys in query kills cluster

The first thing that comes to mind is the Multiget section on the Datastax anti-patterns page:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets

-psanford
Re: Large number of row keys in query kills cluster
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma wrote:
> The big problem seems to have been requesting a large number of row keys
> combined with a large number of named columns in a query. 20K rows with 20K
> columns destroyed my cluster. Splitting it into slices of 100 sequential
> queries fixed the performance issue.
>
> When updating 20K rows at a time, I saw a different issue -
> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
> that issue.
>
> Is there any documentation on this? Obviously these limits will vary by
> cluster capacity, but for new users it would be great to know that you can
> run into problems with large queries, and how they present themselves when
> you hit them. The errors I saw are pretty opaque, and took me a couple days
> to track down.

The first thing that comes to mind is the Multiget section on the Datastax anti-patterns page:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets

-psanford
Re: Large number of row keys in query kills cluster
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma wrote:
> Is there any documentation on this? Obviously these limits will vary by
> cluster capacity, but for new users it would be great to know that you can
> run into problems with large queries, and how they present themselves when
> you hit them. The errors I saw are pretty opaque, and took me a couple days
> to track down.

All operations in Cassandra are subject to timeouts denominated in seconds, defaulting to 10 seconds or less. This strongly suggests that operations which, for example, operate on 20,000 * 20,000 objects (400Mn) have a meaningful risk of failure, as they are difficult to accomplish within 10 seconds or less. Lunch is still not free.

In fairness, CQL adds another non-helpful layer of opacity here; but what you get for it is accessibility and ease of first use.

> In any case this seems like a bug to me - it shouldn't be possible to
> completely lock up a cluster with a valid query that isn't doing a table
> scan, should it?

There's lots of valid SQL queries which will "lock up" your server, for some values of "lock up"?

=Rob
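The timeout argument in this reply can be put into rough numbers. The calculation below is only a back-of-the-envelope sketch: it takes the 10-second figure quoted above as the whole time budget (actual timeouts depend on configuration) and uses the slice size of 100 rows reported earlier in the thread for comparison.

```python
rows = 20000
columns = 20000
timeout_seconds = 10  # upper bound quoted in the reply; real settings vary

# Total cells touched by the single large request.
cells = rows * columns
# Sustained rate needed to finish inside the timeout.
required_rate = cells // timeout_seconds

# The same request cut into slices of 100 rows each.
slice_rows = 100
cells_per_slice = slice_rows * columns

print("cells requested:        %d" % cells)            # 400000000
print("cells per second needed: %d" % required_rate)   # 40000000
print("cells per 100-row slice: %d" % cells_per_slice) # 2000000
```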
Re: Large number of row keys in query kills cluster
The big problem seems to have been requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100 sequential queries fixed the performance issue.

When updating 20K rows at a time, I saw a different issue - BrokenPipeException from all nodes. Splitting into slices of 1000 fixed that issue.

Is there any documentation on this? Obviously these limits will vary by cluster capacity, but for new users it would be great to know that you can run into problems with large queries, and how they present themselves when you hit them. The errors I saw are pretty opaque, and took me a couple days to track down.

In any case this seems like a bug to me - it shouldn't be possible to completely lock up a cluster with a valid query that isn't doing a table scan, should it?

On Wed, Jun 11, 2014 at 9:33 AM, Jeremy Jongsma wrote:
> I'm using Astyanax with a query like this:
>
>     clusterContext
>         .getClient()
>         .getKeyspace("instruments")
>         .prepareQuery(INSTRUMENTS_CF)
>         .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
>         .getKeySlice(new String[] {
>             "ROW1",
>             "ROW2",
>             // 20,000 keys here...
>             "ROW2"
>         })
>         .execute();
>
> At the time this query executes the first time (resulting in unresponsive
> cluster), there are zero rows in the column family. Schema is below, pretty
> basic:
>
>     CREATE KEYSPACE instruments WITH replication = {
>       'class': 'NetworkTopologyStrategy',
>       'aws-us-east-1': '2'
>     };
>
>     CREATE TABLE instruments (
>       key bigint PRIMARY KEY,
>       definition blob,
>       id bigint,
>       name text,
>       symbol text,
>       updated bigint
>     ) WITH COMPACT STORAGE AND
>       bloom_filter_fp_chance=0.01 AND
>       caching='KEYS_ONLY' AND
>       comment='' AND
>       dclocal_read_repair_chance=0.00 AND
>       gc_grace_seconds=864000 AND
>       read_repair_chance=0.10 AND
>       replicate_on_write='true' AND
>       populate_io_cache_on_flush='false' AND
>       compaction={'class': 'SizeTieredCompactionStrategy'} AND
>       compression={'sstable_compression': 'SnappyCompressor'};
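The slicing workaround described in this message might look roughly like the following. This is a sketch under assumptions: `chunked`, `fetch_slice` and `fetch_in_slices` are hypothetical helpers, not Astyanax or driver APIs. `fetch_slice` stands in for one bounded multi-key read (for example a `getKeySlice` call over just that slice), and "slices of 100" is read here as 100 keys per query, issued one after another. The slice size is the value reported to work above, not a general recommendation.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_slice(keys):
    """Hypothetical stand-in for one bounded multi-key read."""
    return {key: None for key in keys}  # pretend every row came back empty


def fetch_in_slices(keys, size=100):
    """Issue one bounded query per slice, sequentially, and merge the results."""
    results = {}
    queries = 0
    for part in chunked(keys, size):
        results.update(fetch_slice(part))
        queries += 1
    return results, queries


keys = ["ROW%d" % i for i in range(1, 20001)]
rows, queries = fetch_in_slices(keys)
print("%d keys fetched in %d queries" % (len(rows), queries))
```

The right slice size presumably depends on row width and cluster capacity, which is why the thread reports different values for reads (100) and updates (1000).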
Re: Large number of row keys in query kills cluster
I'm using Astyanax with a query like this:

    clusterContext
        .getClient()
        .getKeyspace("instruments")
        .prepareQuery(INSTRUMENTS_CF)
        .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
        .getKeySlice(new String[] {
            "ROW1",
            "ROW2",
            // 20,000 keys here...
            "ROW2"
        })
        .execute();

At the time this query executes the first time (resulting in unresponsive cluster), there are zero rows in the column family. Schema is below, pretty basic:

    CREATE KEYSPACE instruments WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'aws-us-east-1': '2'
    };

    CREATE TABLE instruments (
      key bigint PRIMARY KEY,
      definition blob,
      id bigint,
      name text,
      symbol text,
      updated bigint
    ) WITH COMPACT STORAGE AND
      bloom_filter_fp_chance=0.01 AND
      caching='KEYS_ONLY' AND
      comment='' AND
      dclocal_read_repair_chance=0.00 AND
      gc_grace_seconds=864000 AND
      read_repair_chance=0.10 AND
      replicate_on_write='true' AND
      populate_io_cache_on_flush='false' AND
      compaction={'class': 'SizeTieredCompactionStrategy'} AND
      compression={'sstable_compression': 'SnappyCompressor'};

On Tue, Jun 10, 2014 at 6:35 PM, Laing, Michael wrote:
> Perhaps if you described both the schema and the query in more detail, we
> could help... e.g. did the query have an IN clause with 20,000 keys? Or is
> the key compound? More detail will help.
Re: Large number of row keys in query kills cluster
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 20,000 keys? Or is the key compound? More detail will help.

On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma wrote:
> I didn't explain clearly - I'm not requesting 20,000 unknown keys
> (resulting in a full scan), I'm requesting 20,000 specific rows by key.
Re: Large number of row keys in query kills cluster
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting in a full scan), I'm requesting 20,000 specific rows by key.

On Jun 10, 2014 6:02 PM, "DuyHai Doan" wrote:
> Hello Jeremy
>
> Basically what you are doing is to ask Cassandra to do a distributed full
> scan on all the partitions across the cluster, it's normal that the nodes
> are somehow stressed.
>
> How did you make the query? Are you using Thrift or CQL3 API?
>
> Please note that there is another way to get all partition keys : SELECT
> DISTINCT FROM..., more details here :
> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
Re: Large number of row keys in query kills cluster
Hello Jeremy

Basically, what you are doing is asking Cassandra to do a distributed full scan on all the partitions across the cluster; it's normal that the nodes are somewhat stressed.

How did you make the query? Are you using the Thrift or the CQL3 API?

Please note that there is another way to get all partition keys: SELECT DISTINCT FROM..., more details here:
www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3

> I ran an application today that attempted to fetch 20,000+ unique row keys
> in one query against a set of completely empty column families. On a 4-node
> cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
> heap), every single node immediately ran out of memory and became
> unresponsive, to the point where I had to kill -9 the cassandra processes.
>
> Now clearly this query is not the best idea in the world, but the effects
> of it are a bit disturbing. What could be going on here? Are there any
> other query pitfalls I should be aware of that have the potential to
> explode the entire cluster?
>
> -j