[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes and range scans of small tables
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-1337: --- Description: currently, we read the indexed rows (and rows for a range scan) from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). was: currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). parallelize fetching rows for low-cardinality indexes and range scans of small tables - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337-v4.patch, 1337-v5.patch, 1337.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows (and rows for a range scan) from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes and range scans of small tables
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-1337: --- Summary: parallelize fetching rows for low-cardinality indexes and range scans of small tables (was: parallelize fetching rows for low-cardinality indexes) parallelize fetching rows for low-cardinality indexes and range scans of small tables - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337-v4.patch, 1337-v5.patch, 1337.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-1337: --- Attachment: (was: 0001-Concurrent-range-and-2ary-index-subqueries.patch) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-1337: --- Attachment: 1337-v5.patch 1337-v5.patch includes your cleanup (with the fix I mentioned) and adds a 10% margin to the estimate along with switching to Math.ceil(). Added dest is here: https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-1337 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-1337: --- Attachment: (was: 1337-v5.patch) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-1337: --- Attachment: 1337-v5.patch 1337-v5.patch includes your cleanup (along with the minor fix I mentioned), adds a 10% margin, and uses Math.ceil(). Added dtest is here: https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-1337 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, 1337-v5.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-1337: --- Attachment: 0001-Concurrent-range-and-2ary-index-subqueries.patch Patch 0001 is a somewhat re-worked version of David's patch against trunk. Breaking down estimates by the filter type was neglecting all non-compact cql3 queries (and some compact ones). A simpler way to estimate the number of results is to break down the queries into four groups: * Secondary index query: use the mean columns from the index CF (one column per data row or cql3 data row) * Non-cql3 range query: use the estimated keys * cql3 range query, compact (compact storage or single-component primary key): use the estimated keys * cql3 range query, non-compact: use (estimated_keys * mean_columns) / (number_of_defined_columns) The last case is the least accurate. When collections are involved, it will overestimate the number of cql3 rows that will be returned, meaning additional ranges may need to be queried, but I think this is an acceptable optimization degradation. Another change from David's patch is that if an insufficient number of results are fetched by the first round, the concurrency factor will be recalculated based on the results we've seen so far instead of simply being set to 1. I wasn't sure where to add tests for this; would dtests be the best place? parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 0001-Concurrent-range-and-2ary-index-subqueries.patch, 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Reviewer: Jonathan Ellis (was: Sylvain Lebresne) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Tyler Hobbs Priority: Minor Fix For: 2.1 Attachments: 0001-Concurrent-range-and-2ary-index-subqueries.patch, 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Fix Version/s: (was: 2.0) 2.1 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Priority: Minor Fix For: 2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Assignee: (was: David Alves) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Priority: Minor Fix For: 2.0 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Fix Version/s: (was: 1.2.1) 2.0 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 2.0 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Fix Version/s: (was: 1.2.1) 1.2.0 rc1 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2.0 rc1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Fix Version/s: (was: 1.2.0 rc1) 1.2.1 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Reviewer: slebresne (was: vijay2...@yahoo.com) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Alves updated CASSANDRA-1337: --- Attachment: 1337-v4.patch Corrected 2ndary index cfactor calc and (slightly) optimized gathering rows. parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2.1 Attachments: 1137-bugfix.patch, 1337.patch, 1337-v4.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Alves updated CASSANDRA-1337: --- Attachment: 1337.patch Clean rehash that addressed Sylvain's (very helpful comments) including implementing for the CQL3 case. It estimated concurrency factor the following ways: - Primary Indexes + Thrift - divides cfs by RF - 2ndary indexes + Thrift - uses the mean col count of the most selective index to estimate the number of keys - CQL3 + IdentityFilter - uses the estimated keys + mean col count to estimate cols per node - CQL3 + Names filter - assumes cols with names are present and uses estimated keys to calculate cols per node - CQL3 - Other filters - as sylvain mentioned because we have no idea on the selectivity of the col filter we cannot estimate how many cols will be returned per node so we revert to concurrecy factor = 1. Reimplemented parallel the parallel execution part to make it a lot cleaner IMO (previous implementation was adapting sequential execution which made it difficult to read) cql_test.py dtest is failing in the same place as trunk ,need to look into it to make sure Sylvain's dtest passes parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2.1 Attachments: 1137-bugfix.patch, 1337.patch, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Priority: Minor (was: Major) Fix Version/s: (was: 1.2.0) 1.2.1 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2.1 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 1137-bugfix.patch, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Alves updated CASSANDRA-1337: --- Attachment: 1137-bugfix.patch patch that addresses the bugs raised by sylvain. (StoragProxyTest and cql_test.py both pass) namely: - local path counts as one less handler - enough check moven out of the remote branch - estimatedKeysPerRange take into account replication factor - columns.maxIsColumns sets concurrency to 1 still working on the dtest that proves (or disproves that this works) I'd like to move the rest of the issues raised by sylvain to another ticket. parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Fix For: 1.2 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, 1137-bugfix.patch, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-1337: Priority: Major (was: Minor) (changing the priority to major because basically I think that currently it does more harm than good, and we should either fix it before release or revert it) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Fix For: 1.2 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Alves updated CASSANDRA-1337: --- Attachment: CASSANDRA-1337.patch direct rework of Jake Luciani's original patch applied to trunk. untested. just to get the discussion going parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Reviewer: vijay2...@yahoo.com parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: David Alves Priority: Minor Fix For: 1.2 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt, CASSANDRA-1337.patch Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-1337: Fix Version/s: (was: 0.7.2) 0.7.3 parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: T Jake Luciani Priority: Minor Fix For: 0.7.3 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-1337: -- Attachment: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: T Jake Luciani Priority: Minor Fix For: 0.7.1 Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-1337: -- Remaining Estimate: 24h Original Estimate: 24h parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: T Jake Luciani Priority: Minor Fix For: 0.7.1 Original Estimate: 24h Remaining Estimate: 24h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-1337: -- Remaining Estimate: 8h (was: 24h) Original Estimate: 8h (was: 24h) parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: T Jake Luciani Priority: Minor Fix For: 0.7.1 Original Estimate: 8h Remaining Estimate: 8h currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1337: -- Description: currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). was: currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another query (but, if our estimate is incorrect, we do need to loop and do a 2nd query). parallelize fetching rows for low-cardinality indexes - Key: CASSANDRA-1337 URL: https://issues.apache.org/jira/browse/CASSANDRA-1337 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Priority: Minor Fix For: 0.7.1 currently, we read the indexed rows from the first node (in partitioner order); if that does not have enough matching rows, we read the rows from the next, and so forth. we should use the statistics fom CASSANDRA-1155 to query multiple nodes in parallel, such that we have a high chance of getting enough rows w/o having to do another round of queries (but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough data or we have fetched from each node). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.