Andrés de la Peña created CASSANDRA-8717:
--------------------------------------------

             Summary: Top-k queries with custom secondary indexes
                 Key: CASSANDRA-8717
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Andrés de la Peña
            Priority: Minor
             Fix For: 2.1.3
         Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch

As presented at [Cassandra Summit Europe 
2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be 
modified to support general top-k queries with minimal changes to the Cassandra 
codebase. This way, custom 2i implementations could provide relevance search, 
sorting by columns, etc.

Top-k queries retrieve the k best results for a certain query. That implies 
querying the k best rows in each token range and then sorting them in order to 
obtain the k globally best rows.
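
The merge step described above can be sketched as follows. This is only an illustration under stated assumptions: {{TopKMerge}}, {{ScoredRow}} and {{globalTopK}} are hypothetical names that do not exist in Cassandra, and the sketch assumes that each row carries a numeric score where higher is better.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopKMerge
{
    // Hypothetical stand-in for a row plus the score the index assigned to it.
    public static final class ScoredRow
    {
        public final String key;
        public final double score;

        public ScoredRow(String key, double score)
        {
            this.key = key;
            this.score = score;
        }
    }

    // Merges the partial top-k lists of all token ranges and keeps the k best rows overall.
    public static List<ScoredRow> globalTopK(List<List<ScoredRow>> perRange, int k)
    {
        List<ScoredRow> all = new ArrayList<>();
        for (List<ScoredRow> partial : perRange)
            all.addAll(partial);

        // Assumption of this sketch: a higher score means a better row.
        all.sort(Comparator.comparingDouble((ScoredRow r) -> r.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
{code}

Each range contributes at most k candidates, so the coordinator sorts at most k rows per range before trimming to k.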

To do that, we propose two additional methods in the class 
SecondaryIndexSearcher:

{code:java}
public boolean requiresFullScan(List<IndexExpression> clause)
{
    return false;
}

public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
{
    return rows;
}
{code}

The first one indicates whether a query performed on the index requires querying 
all the nodes in the ring. This is necessary for top-k queries because we do not 
know in advance which nodes hold the best results. The second method specifies 
how to sort the partial results from all nodes according to the query.
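
As an illustration of how a custom 2i implementation might use these two hooks, here is a minimal self-contained sketch. It does not use the real Cassandra classes: {{Expression}}, {{Row}} and {{Searcher}} are simplified stand-ins for IndexExpression, Row and SecondaryIndexSearcher, and both {{RelevanceSearcher}} and the "relevance" column name are assumptions made only for this example.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopKSearcherSketch
{
    // Simplified stand-in for IndexExpression (hypothetical).
    public static final class Expression
    {
        public final String column;
        public final String value;

        public Expression(String column, String value)
        {
            this.column = column;
            this.value = value;
        }
    }

    // Simplified stand-in for Row, carrying a score computed by the index (hypothetical).
    public static final class Row
    {
        public final String key;
        public final double score;

        public Row(String key, double score)
        {
            this.key = key;
            this.score = score;
        }
    }

    // Mirrors the proposed defaults: no full scan, rows returned untouched.
    public static class Searcher
    {
        public boolean requiresFullScan(List<Expression> clause)
        {
            return false;
        }

        public List<Row> sort(List<Expression> clause, List<Row> rows)
        {
            return rows;
        }
    }

    // Treats any expression on the assumed "relevance" column as a top-k query.
    public static class RelevanceSearcher extends Searcher
    {
        private static boolean isTopK(List<Expression> clause)
        {
            for (Expression expression : clause)
                if ("relevance".equals(expression.column))
                    return true;
            return false;
        }

        @Override
        public boolean requiresFullScan(List<Expression> clause)
        {
            return isTopK(clause);
        }

        @Override
        public List<Row> sort(List<Expression> clause, List<Row> rows)
        {
            if (!isTopK(clause))
                return rows;
            // Sort a copy so the caller's list is left unmodified; best score first.
            List<Row> sorted = new ArrayList<>(rows);
            sorted.sort(Comparator.comparingDouble((Row r) -> r.score).reversed());
            return sorted;
        }
    }
}
{code}

Queries that do not touch the assumed column fall back to the default behaviour, so existing index queries would be unaffected in this sketch.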

Then we add two analogous methods to the class AbstractRangeCommand, which 
obtains the searcher for its row filter in its constructor:

{code:java}
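// In the constructor: look up the index searcher for this command's row filter.
this.searcher = Keyspace.open(keyspace)
                        .getColumnFamilyStore(columnFamily)
                        .indexManager.searcher(rowFilter);

// True only if an index searcher exists and asks for every node to be queried.
public boolean requiresFullScan()
{
    return searcher != null && searcher.requiresFullScan(rowFilter);
}

// Lets the searcher order the collected rows before they are trimmed.
public List<Row> combine(List<Row> rows)
{
    return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
}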
{code}

Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as 
shown in the attached patch.
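
The attached patch is the reference for the actual change. The following is only a rough, self-contained sketch of the idea, assuming that an ordinary range scan may stop as soon as it has collected enough rows while a top-k scan must visit every range before combining. {{RangeScanSketch}}, {{Command}}, {{limit}} and {{scan}} are hypothetical names used for illustration and are not part of Cassandra.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class RangeScanSketch
{
    // Hypothetical stand-in for the two hooks proposed on AbstractRangeCommand.
    public interface Command<R>
    {
        boolean requiresFullScan();

        List<R> combine(List<R> rows);

        int limit();
    }

    // Visits the token ranges in order and returns the combined rows.
    public static <R> List<R> scan(Command<R> command, List<List<R>> rowsPerRange)
    {
        List<R> collected = new ArrayList<>();
        for (List<R> rangeRows : rowsPerRange)
        {
            collected.addAll(rangeRows);
            // An ordinary scan can stop early; a full scan keeps going to the last range.
            if (!command.requiresFullScan() && collected.size() >= command.limit())
                break;
        }
        return command.combine(collected);
    }
}
{code}

In this sketch the early exit is the only part of the loop that depends on requiresFullScan(); sorting and trimming are left entirely to combine().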

We think that the proposed approach provides very useful functionality with 
minimal impact on the current codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
