[ https://issues.apache.org/jira/browse/CASSANDRA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850492#action_12850492 ]
Stu Hood commented on CASSANDRA-749:
------------------------------------

> This isn't possible the "pretend index is a supercolumn row" approach.
I'm not sure that I understand why... can you give an example? The key in the pseudo CF would be the original indexed value, and each top level column in the index row would be a row from the base (from one node), so filtering within the base row could be applied locally on each node.

> multiget(rowpredicate, columnpredicate)*
The rowpredicate containing an "index scan" parameter is very interesting, and does clarify slow operations. But I can easily imagine a situation where someone wanted to use both a "named keys" and an "index scan" rowpredicate at once, which would still be very efficient, but which would require a list<rowpredicate>. I agree that placing the "index scan" predicate in the first position in the method call is essential, which is why I suggested the pseudo-CF api:

----
An interesting parallel is to compare the proposed api to Python's array slicing syntax, which is extremely elegant. I imagine that our ideal API is one that allows either named keys or a key range at every level of nesting. The following paragraphs only refer to key/name slicing, and don't go into 'value' queries.

As long as you concretely define a key or range of keys to search for at each level (such as [key1:key5][name1:name2][subname5]), your operation can run in bounded time. But, to provide for more flexibility, the get_range_slices method in the current API allows something like:

    [ ? ][name5]

The question mark represents an unbounded level, which may mean a full table scan without finding 'subname5' (very dangerous, not scalable). This is one of the places where we need secondary indexes: we want columns containing _any_ value for subname5 bunched together into an index.
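
As a rough illustration of the slicing analogy above, here is a minimal Python sketch over a nested dict. The names ANY, is_bounded and select, and the sample data, are hypothetical and are not part of the Cassandra API. The sketch only shows how a path with a key list or key range at every level differs from one containing an unbounded "?" level. Note that this toy filters in memory, whereas a real store would seek within sorted keys.

```python
ANY = None  # hypothetical marker for the unbounded "?" level


def is_bounded(path):
    """True when every level names keys or a key range (no '?' level)."""
    return all(level is not ANY for level in path)


def select(node, path):
    """Slice a nested dict with one predicate per level.

    Each level is a list of names, an inclusive (start, end) range,
    or ANY. An ANY level has to visit every key at that depth.
    """
    if not path:
        return node
    level, rest = path[0], path[1:]
    if level is ANY:
        keys = sorted(node)  # full scan of this level
    elif isinstance(level, tuple):
        start, end = level
        keys = [k for k in sorted(node) if start <= k <= end]
    else:
        keys = [k for k in level if k in node]
    result = {}
    for k in keys:
        child = select(node[k], rest)
        if rest and child == {}:
            continue  # nothing matched below this key
        result[k] = child
    return result


# Toy data: row key -> column name -> subcolumn name -> value.
rows = {
    "key1": {"name1": {"subname5": "a"}, "name9": {"subname5": "b"}},
    "key3": {"name2": {"subname1": "c"}},
    "key7": {"name1": {"subname5": "d"}},
}

# [key1:key5][name1:name2][subname5]: a predicate at every level.
bounded = [("key1", "key5"), ("name1", "name2"), ["subname5"]]
bounded_result = select(rows, bounded)

# [ ? ][name1]: the first level is unbounded, so every row is visited.
unbounded = [ANY, ["name1"]]
unbounded_result = select(rows, unbounded)
```

In this sketch the bounded path only touches rows inside the requested key range, while the path starting with ANY walks every row to find the ones that happen to contain the named column.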

Comparing to the Python array API highlights the fact that prefix searches are always safe, and that by always having a parent predicate, you achieve bounded time operations. This is why placing the "index scan" predicate in the first position is so clear.

----
This brings us back to the pseudo-CF api: why have 3 types of rowpredicates, and 2+ types of columnpredicates when, by asking users to define views that shuffle their data into a form that allows for prefix queries, we can do something like:

    multiget(list<predicate> predicates)

... with a predicate (key range or key list) required for every level, and only the last level allowing an unbounded predicate.

With this API, the "named keys" + "index scan" query I pointed out above would look like (with an indexed 'age' column):

    multiget( [ predicate(key is 27), predicate(name in [ben, george]), predicate(subname is any) ] )

> Secondary indices for column families
> -------------------------------------
>
>                 Key: CASSANDRA-749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 0001-simple-secondary-indices.patch, views-discussion-2.txt, views-discussion.txt
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.