Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Hello Jay

Your query is:

select * from keyspaceuser.company_testusers where lastname = 'lau' LIMIT 1

Why do you think that the slowness is due to vnodes and not your query asking for 10 000 results?

On Fri, Sep 19, 2014 at 3:33 AM, Jay Patel pateljay3...@gmail.com wrote:

> Hi there,
>
> We are seeing extreme slow down (500ms to 1s) in query on secondary index with vnode. I'm seeing multiple secondary index scans on a given node in trace output when vnode is enabled. Without vnode, everything is good.
>
> Cluster size: 6 nodes
> Replication factor: 3
> Consistency level: local_quorum. Same behavior happens with consistency level of ONE.
>
> Snippet from the trace output. Pls see the attached output1.txt for the full log.
>
> Are we hitting any bug? Do not understand why coordinator sends requests multiple times to the same node (e.g. 192.168.51.22 in below output) for different token ranges.
>
> Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
> Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
> Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
> Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
> Executing indexed scan for (max(-8929774302283364912), max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 |
> Executing indexed scan for (max(-8854653908608918942), max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 |
> Executing indexed scan for (max(-8762620856967633953), max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 |
> Executing indexed scan for (max(-8668275030769104047), max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 |
> Executing indexed scan for (max(-8659066486210615614), max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 |
> Executing indexed scan for (max(-8419137646248370231), max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25 |
> Executing indexed scan for (max(-8416786876632807845), max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22 |
> Executing indexed scan for (max(-8315889933848495185), max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25 |
> Executing indexed scan for (max(-8270922890152952193), max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22 |
> Executing indexed scan for (max(-8260813759533312175), max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25 |
> Executing indexed scan for (max(-8234845345932129353), max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22 |
>
> Thanks,
> Jay
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Keep in mind secondary indexes in cassandra are not there to improve performance, or even really be used in a serious user facing manner. Build and maintain your own view of the data, it'll be much faster.

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
RE: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Agreed. We only use secondary indexes for column families that are relatively small (~5k rows). For anything larger, we store the data into a wide row (but this depends on your data model).
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Jon's advice is definitely still true, but in 2.1 there is https://issues.apache.org/jira/browse/CASSANDRA-1337, which parallelizes the fetching of ranges.

--
Tyler Hobbs
DataStax
http://datastax.com/
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Thanks folks for all your inputs!

Yes, I totally agree that we need to have a custom column family for indexing. However, we're trying to upgrade our existing cluster from non-vnode to vnode, and queries using secondary indexes, which used to be good with non-vnode, now break badly.

Btw, there is no data in the table. Table is empty. Query is fired on the empty table.

From the tracing output, I don't understand why it's doing multiple scans on one node. With non-vnode, there is only one scan per node and the same query works fine.

If you look at the output1.txt attached earlier, the coordinator is firing an index scan on a given node (for example, 192.168.51.22 in the below snippet from output1.txt) multiple times for different token ranges. Why can't it fire only one time? With non-vnode, it's only one time and the query comes back very fast.

Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | *192.168.51.22* |
Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | *192.168.51.22* |
Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
Executing indexed scan for (max(-8929774302283364912), max(-8854653908608918942)] | 23:11:31,001 | *192.168.51.22* |
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel pateljay3...@gmail.com wrote:

> Btw, there is no data in the table. Table is empty. Query is fired on the empty table.

This is actually the worst case for secondary index lookups.

> From the tracing output, I don't understand why it's doing multiple scans on one node. With non-vnode, there is only one scan per node and the same query works fine. If you look at the output1.txt attached earlier, the coordinator is firing an index scan on a given node (for example, 192.168.51.22 in the below snippet from output1.txt) multiple times for different token ranges. Why can't it fire only one time? With non-vnode, it's only one time and the query comes back very fast.

It will merge requests to neighboring ranges when the same node is a replica for both of them. Without vnodes, this usually results in all ranges for a node being merged. With vnodes, merging still happens, but not all ranges can be merged.

--
Tyler Hobbs
DataStax
http://datastax.com/
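To make the merging rule above concrete, here is a minimal sketch in Python. It is a simplified model, not Cassandra's actual code: the function name `merge_adjacent`, the node names A/B/C and both layouts are made up for illustration, and the only rule it applies is the one stated above (neighboring ranges are merged while they still share a replica).

```python
from typing import List, Set, Tuple

# (range label, replicas that hold the range), listed in token order.
RangeInfo = Tuple[str, Set[str]]


def merge_adjacent(ranges: List[RangeInfo]) -> List[Tuple[List[str], Set[str]]]:
    """Greedily merge neighbouring ranges for as long as they share a replica."""
    merged: List[Tuple[List[str], Set[str]]] = []
    for label, replicas in ranges:
        if merged and merged[-1][1] & replicas:
            labels, common = merged[-1]
            # Keep only the replicas able to serve every range in the group.
            merged[-1] = (labels + [label], common & replicas)
        else:
            merged.append(([label], set(replicas)))
    return merged


# Hypothetical layouts with RF=1, so each range has exactly one replica.
single_token = [("r1", {"A"}), ("r2", {"B"}), ("r3", {"C"})]
vnodes = [("v1", {"A"}), ("v2", {"B"}), ("v3", {"A"}), ("v4", {"B"}), ("v5", {"B"})]

single_requests = merge_adjacent(single_token)
vnode_requests = merge_adjacent(vnodes)

print(len(single_requests))  # number of requests without vnodes
print(len(vnode_requests))   # number of requests with interleaved vnodes
```

In the vnode layout only v4 and v5 can be merged, because they are neighbors on the same node. Node A owns v1 and v3, but they are separated by a range on B, so A still receives two separate requests, which is the pattern in the trace.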
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
> It will merge requests to neighboring ranges when the same node is a replica for both of them. Without vnodes, this usually results in all ranges for a node being merged. With vnodes, merging still happens, but not all ranges can be merged.

But does it imply that with vnodes, there is actually extra work to do for scanning indices? If yes, is this extra load rather I/O bound or CPU bound?
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
On Fri, Sep 19, 2014 at 4:19 PM, DuyHai Doan doanduy...@gmail.com wrote:

> But does it imply that with vnodes, there is actually extra work to do for scanning indices?

Yes.

> If yes, is this extra load rather I/O bound or CPU bound?

It doesn't necessarily change what the query is bound by, except perhaps in the case where you have almost no matching results. There are more messages to dispatch and handle.

--
Tyler Hobbs
DataStax
http://datastax.com/
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
On Fri, Sep 19, 2014 at 2:19 PM, DuyHai Doan doanduy...@gmail.com wrote:

> But does it imply that with vnodes, there is actually extra work to do for scanning indices?

Vnodes are just nodes, so they have all the problems-associated-with-many-nodes one would get with 256x as many nodes.

=Rob
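A rough back-of-the-envelope estimate of what "256x as many nodes" means for this cluster, sketched in Python. The cluster size and the two timestamps come from the original post; 256 tokens per node is an assumption taken from the "256x" figure, and the estimate deliberately ignores range merging and any parallelism, so it is only an upper bound for a fully sequential scan of an empty table.

```python
from datetime import datetime, timedelta

NODES = 6              # cluster size from the original post
TOKENS_PER_NODE = 256  # assumption, taken from the "256x" figure above

# First and last timestamps of the 15 scans in the posted trace snippet.
first = datetime.strptime("23:11:30,992", "%H:%M:%S,%f")
last = datetime.strptime("23:11:31,008", "%H:%M:%S,%f")
scans_in_snippet = 15

# Whole milliseconds between the first and the last scan in the snippet.
elapsed_ms = (last - first) // timedelta(milliseconds=1)
# 15 scans means 14 gaps between consecutive scans.
ms_per_scan = elapsed_ms / (scans_in_snippet - 1)

total_ranges = NODES * TOKENS_PER_NODE
worst_case_ms = total_ranges * ms_per_scan

print(total_ranges, elapsed_ms, round(ms_per_scan, 2), round(worst_case_ms))
```

Even at roughly a millisecond per scan, visiting every range one after another adds up to well over a second, which is the same order of magnitude as the 500ms to 1s reported at the start of the thread.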
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Thanks Tyler for the details. I'm still trying to understand what you described. To simplify my question, here is what I don't understand:

When the coordinator fires an indexed scan request to node 192.168.51.22, why doesn't it ask that node to check all of its (at least primary) ranges for the queried data at once? Also, internally that node should be able to do just one scan through all of the ranges held by it, shouldn't it? (e.g. [min(-9223372036854775808), max(-9193352069377957523)] and (max(-9136021049555745100), max(-8959555493872108621)], etc.)

Seems like it needs to query data in token order. So, [min(-9223372036854775808), max(-9193352069377957523)] is on 192.168.51.22. But the next range, (max(-9193352069377957523), max(-9136021049555745100)], is on 192.168.51.25, so it fires the query there. Then the next range, (max(-9136021049555745100), max(-8959555493872108621)], is again on 192.168.51.22. Btw, I'm not too sure about the meaning of min/max or max/max in the trace output.

I found the comment below in https://issues.apache.org/jira/browse/CASSANDRA-4858:

> The problem is that we have to scan the nodes in token order so we dont break the existing API's, if we do so then we are sending a lot more requests and waiting for the response than the number of nodes.

I don't understand the restriction though ("don't break the existing API's"). With non-vnode, it queries a particular node only one time.

Btw, in the worst case, I understand that a secondary index query sometimes has to scan all the nodes in the cluster (empty table or high cardinality index?), but I don't understand why vnode makes it scan the *same node* multiple times. I see this behavior with RF 1 as well.

Snippet from output1.txt attached earlier:

Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |

It would be great if you or someone could describe this further. Thanks!!
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Thanks Robert for your input, but that sounds a little crazy to me. The physical node is still the same, so why can't it just do one indexed scan for all the contiguous or non-contiguous token ranges (vnodes) held by that physical node? I suspect it needs to respect token order for some reason, hence the multiple scans.

It would be great if you or someone could help me clarify the doubts below (in the context of the trace output):

When the coordinator fires an indexed scan request to node 192.168.51.22, why doesn't it ask that node to check all of its (at least primary) ranges for the queried data at once? Also, internally that node should be able to do just one scan through all of the ranges held by it, shouldn't it? (e.g. [min(-9223372036854775808), max(-9193352069377957523)] and (max(-9136021049555745100), max(-8959555493872108621)], etc.)

Seems like it needs to query data in token order. So, [min(-9223372036854775808), max(-9193352069377957523)] is on 192.168.51.22. But the next range, (max(-9193352069377957523), max(-9136021049555745100)], is on 192.168.51.25, so it fires the query there. Then the next range, (max(-9136021049555745100), max(-8959555493872108621)], is again on 192.168.51.22. Btw, I'm not too sure about the meaning of min/max or max/max in the trace output.

I found the comment below in https://issues.apache.org/jira/browse/CASSANDRA-4858:

> The problem is that we have to scan the nodes in token order so we dont break the existing API's, if we do so then we are sending a lot more requests and waiting for the response than the number of nodes.

I don't understand the restriction though ("don't break the existing API's"). With non-vnode, it queries a particular node only one time.

Btw, in the worst case, I understand that a secondary index query sometimes has to scan all the nodes in the cluster (empty table or high cardinality index?), but I don't understand why vnode makes it scan the *same node* multiple times. I see this behavior with RF 1 as well.

Snippet from output1.txt attached earlier:

Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel pateljay3...@gmail.com wrote:

> When the coordinator fires an indexed scan request to node 192.168.51.22, why doesn't it ask that node to check all of its (at least primary) ranges for the queried data at once? Also, internally that node should be able to do just one scan through all of the ranges held by it, shouldn't it? (e.g. [min(-9223372036854775808), max(-9193352069377957523)] and (max(-9136021049555745100), max(-8959555493872108621)], etc.)
>
> Seems like it needs to query data in token order. So, [min(-9223372036854775808), max(-9193352069377957523)] is on 192.168.51.22. But the next range, (max(-9193352069377957523), max(-9136021049555745100)], is on 192.168.51.25, so it fires the query there. Then the next range, (max(-9136021049555745100), max(-8959555493872108621)], is again on 192.168.51.22. Btw, I'm not too sure about the meaning of min/max or max/max in the trace output.

The coordinator certainly could batch multiple range requests that are going to the same replica. It's an optimization that would primarily help the empty table/high cardinality case, but you're welcome to open a ticket. 3.0 is the earliest this would make it in.

> I found the comment below in https://issues.apache.org/jira/browse/CASSANDRA-4858:
>
> The problem is that we have to scan the nodes in token order so we dont break the existing API's, if we do so then we are sending a lot more requests and waiting for the response than the number of nodes.
>
> I don't understand the restriction though ("don't break the existing API's").

I think he's just saying that we have to make sure we return results in token order (and if there's a limit on the query, return the first N results when listed in token order).

> With non-vnode, it queries a particular node only one time. Btw, in the worst case, I understand that a secondary index query sometimes has to scan all the nodes in the cluster (empty table or high cardinality index?), but I don't understand why vnode makes it scan the *same node* multiple times. I see this behavior with RF 1 as well.
>
> Snippet from output1.txt attached earlier:
>
> Executing indexed scan for [min(-9223372036854775808), max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
> Executing indexed scan for (max(-9193352069377957523), max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
> Executing indexed scan for (max(-9136021049555745100), max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
> Executing indexed scan for (max(-8959555493872108621), max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |

I'm not sure how your question here is different from the one above.

--
Tyler Hobbs
DataStax
http://datastax.com/
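The token-order requirement and the empty-table worst case can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the coordinator's real logic: `scan_in_token_order` and the sample data are hypothetical, each range is queried with one request, and the scan stops as soon as the limit is satisfied.

```python
from typing import List, Tuple


def scan_in_token_order(
    ranges: List[Tuple[str, List[str]]], limit: int
) -> Tuple[List[str], int]:
    """Walk ranges in token order; return (first `limit` rows, requests sent).

    `ranges` holds (replica, matching rows in that range) in token order.
    """
    results: List[str] = []
    requests = 0
    for _replica, rows in ranges:
        requests += 1
        results.extend(rows)
        if len(results) >= limit:
            break  # the first N rows in token order are already known
    return results[:limit], requests


# Hypothetical data: four ranges alternating between two nodes.
empty_table = [("22", []), ("25", []), ("22", []), ("25", [])]
with_match = [("22", ["lau-1"]), ("25", ["lau-2"]), ("22", []), ("25", [])]

print(scan_in_token_order(empty_table, limit=1))
print(scan_in_token_order(with_match, limit=1))
```

With a match in the first range and a limit of 1, a single request is enough. With an empty table nothing ever satisfies the limit, so every range has to be visited, which is why the empty table is the worst case.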
Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Thanks Tyler for the clarification. I've opened a ticket, CASSANDRA-7982: https://issues.apache.org/jira/browse/CASSANDRA-7982. For now, I've assigned it to myself and put you as a reviewer. Please change the assignment as you prefer.

Assume that we now batch the requests and send only one request to the replica: what's the extra overhead incurred by vnode to process the secondary index request on the replica? In other words, does the replica still have to fire individual queries internally for all the token ranges, e.g. (max(-9193352069377957523), max(-9136021049555745100)], etc., or can it be optimized to be done in one shot? If multiple queries are needed, how much overhead does that add (in terms of latency because of multiple disk lookups, etc.)?

Would you mind pointing me to the C* code location (class/method) to explore more?

Also, can you help me understand what min() and max() mean in the trace output?

[min(-9223372036854775808), max(-9193352069377957523)]
vs.
(max(-8959555493872108621), max(-8929774302283364912)]

Jay
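For readers following the batching idea discussed above, here is a minimal Python sketch of grouping range requests by replica. It only illustrates the proposal; `batch_by_replica` is a hypothetical helper, not code from Cassandra or from the ticket, and it assumes one replica per range as in the RF 1 case mentioned earlier. The ranges and addresses are the first four rows of the posted trace.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (token range, replica) pairs in token order, copied from the trace snippet.
TRACE: List[Tuple[str, str]] = [
    ("[min(-9223372036854775808), max(-9193352069377957523)]", "192.168.51.22"),
    ("(max(-9193352069377957523), max(-9136021049555745100)]", "192.168.51.25"),
    ("(max(-9136021049555745100), max(-8959555493872108621)]", "192.168.51.22"),
    ("(max(-8959555493872108621), max(-8929774302283364912)]", "192.168.51.25"),
]


def batch_by_replica(
    ranges: List[Tuple[str, str]],
) -> Dict[str, List[Tuple[int, str]]]:
    """Group ranges per replica, remembering each range's token-order position."""
    batches: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for position, (token_range, replica) in enumerate(ranges):
        batches[replica].append((position, token_range))
    return dict(batches)


batches = batch_by_replica(TRACE)
for replica, items in batches.items():
    print(replica, [position for position, _ in items])
```

Four range requests collapse into two, one per node. The positions are kept because the coordinator would still have to stitch the per-range results back together in token order before applying any limit.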