[jira] Issue Comment Edited: (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices

Stu Hood (JIRA) Thu, 14 Oct 2010 22:15:02 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921230#action_12921230 ]


Stu Hood edited comment on CASSANDRA-1600 at 10/15/10 1:14 AM:
---------------------------------------------------------------

> applies row_filter during a row scan
I'm willing to say that this should be "considered harmful". Without a toggle 
to disable the RPC_TIMEOUT, we'd be setting people up for unbounded scans that 
succeed in testing and then cause cascading failures in production by going 
into retry loops, each of which scans (70 MB * 10s) of data before timing out.

Indexed scans are safe, in that the worst case is that you don't match anything 
in your index, and you have to get empty results from every node in your 
cluster. Empty results are cheap.
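
To make the cost difference concrete, here is a minimal sketch over a toy in-memory model. The function names and the dict-based "index" are hypothetical illustrations, not Cassandra APIs. It only counts how many entries each approach has to examine when nothing matches.

```python
# Hypothetical in-memory model; none of these names are Cassandra APIs.

def filtered_scan(rows, predicate):
    """Apply a row filter during a full scan: every row is examined."""
    examined = 0
    matches = []
    for key, columns in rows.items():
        examined += 1
        if predicate(columns):
            matches.append(key)
    return matches, examined


def indexed_lookup(index, value):
    """Look up a value in an index: only matching entries are examined."""
    matches = list(index.get(value, []))
    return matches, len(matches)


# 1000 rows, none of which match the value being queried.
rows = {"k%04d" % i: {"state": "UT"} for i in range(1000)}
index = {"UT": sorted(rows)}

scan_result = filtered_scan(rows, lambda cols: cols["state"] == "CA")
index_result = indexed_lookup(index, "CA")
```

In this model the filtered scan examines all 1000 rows to return nothing, while the index lookup examines zero entries, which is the "empty results are cheap" case.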

EDIT: And yes, I realize that our current scheme for boolean operations between 
clauses ends up reverting to a scan, but that is fixable via a merge join of 
the indexes (or 1472), which would preserve the safety I mention.
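
As a rough illustration of the merge join idea, the sketch below intersects the key lists produced by two index lookups. It assumes each index returns its matching keys in the same ascending order with no duplicates; the function name and the sample keys are hypothetical, and this is not the code from any attached patch.

```python
def merge_join(left, right):
    """Intersect two ascending, duplicate-free key sequences.

    Each input is walked once, so the work is bounded by the number
    of index entries rather than by the number of rows in the range.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # left key cannot appear in right; skip it
        else:
            j += 1  # right key cannot appear in left; skip it
    return out


# Keys matching clause A and clause B (made-up sample data).
keys_a = ["a", "c", "d", "f"]
keys_b = ["b", "c", "f", "g"]
both = merge_join(keys_a, keys_b)
```

If either index returns no entries, the loop exits immediately, so the empty-result case stays cheap.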

      was (Author: stuhood):
    > applies row_filter during a row scan
I'm willing to say that this should be "considered harmful". Without a toggle 
to disable the RPC_TIMEOUT, we'd be setting people up for their unbounded scans 
to succeed in testing, and then cause cascading failures in production by going 
into retry loops that scan (70 MB * 10s) of data before timing out.

Indexed scans are safe, in that the worst case is that you don't match anything 
in your index, and you have to get empty results from every node in your 
cluster. Empty results are cheap.
  
> Merge get_indexed_slices with get_range_slices
> ----------------------------------------------
>
>                 Key: CASSANDRA-1600
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>             Fix For: 0.7.0
>
>         Attachments: 
> 0001-Add-optional-IndexClause-to-KeyRange-and-serialize-w.patch, 
> 0002-Drop-the-IndexClause.count-parameter.patch, 
> 0003-Execute-RangeSliceCommands-using-scan-when-an-IndexC.patch, 
> 0004-Remove-get_indexed_slices-method.patch, 
> 0005-Update-system-tests-to-use-get_range_slices.patch, 
> 0006-Remove-start_key-from-IndexClause-for-the-start_key-.patch, 
> 0007-Respect-end_key-for-filtered-queries.patch
>
>
> From a comment on 1157:
> {quote}
> IndexClause only has a start key for get_indexed_slices, but it would seem 
> that the reasoning behind using 'KeyRange' for get_range_slices applies there 
> as well, since if you know the range you care about in the primary index, you 
> don't want to continue scanning until you exhaust 'count' (or the cluster).
> Since it would appear that get_indexed_slices would benefit from a KeyRange, 
> why not smash get_(range|indexed)_slices together, and make IndexClause an 
> optional field on KeyRange?
> {quote}
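
The proposed shape can be sketched with plain Python dataclasses. These classes are hypothetical stand-ins that mirror the idea in the quote and in the patch titles (an optional IndexClause on KeyRange, with no start_key or count of its own); they are not the actual Thrift definitions, and the field names and default count are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexExpression:
    column_name: bytes
    op: str          # e.g. "EQ"; the operator spelling is an assumption
    value: bytes


@dataclass
class IndexClause:
    # No start_key or count here: the enclosing KeyRange carries both.
    expressions: List[IndexExpression]


@dataclass
class KeyRange:
    start_key: bytes
    end_key: bytes
    count: int = 100                              # assumed default
    index_clause: Optional[IndexClause] = None    # absent => plain range scan


def is_indexed(key_range: KeyRange) -> bool:
    """True when the range query should be served from an index."""
    return key_range.index_clause is not None


plain = KeyRange(b"", b"")
filtered = KeyRange(
    b"a", b"m",
    index_clause=IndexClause([IndexExpression(b"state", "EQ", b"UT")]),
)
```

With this shape a single call can express both cases: a bare KeyRange behaves like get_range_slices, and a KeyRange carrying an IndexClause behaves like get_indexed_slices bounded by start_key and end_key.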

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
