[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes

Todd Nine (JIRA) Mon, 29 Aug 2011 19:15:03 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093344#comment-13093344
 ]


Todd Nine edited comment on CASSANDRA-2915 at 8/30/11 2:13 AM:
---------------------------------------------------------------

I think forcing users to install classes for common use cases would cause 
issues with adoption.  What about creating new CQL commands to handle this?  
When creating an index in a db, you would define the fields and the manner in 
which they are indexed.  Could we do something like the following?


create index on [colname] in [colfamily] using [index type 1] as 
[indexFieldName], [index type 2] as [indexFieldName], [index type n] as 
[indexFieldName]?

drop index [indexFieldName] in [colfamily] on [colname]



This way clients such as JPA can update and create indexes, without the need to 
install custom classes on Cassandra itself.  They also have the ability to 
directly reference the field name when using CQL queries.

Assuming that the index class types exist in the Lucene classpath, you get a 
one-to-many mapping from a column to its indexing strategies.  This would allow 
more advanced clients such as the JPA plugin to automatically add indexes to the 
document based on indexes defined on persistent fields, without generating any 
code the user has to install in the Cassandra runtime.  If users want to 
install custom analyzers, they still have the option to do so, and would gain 
access to them via CQL.
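
To make the one-to-many mapping concrete, here is a minimal Python sketch of how the proposed statement could be broken down into a column plus a list of (index type, field name) pairs. The statement is only a proposal in this comment and is not implemented in Cassandra. The function name parse_create_index, the index type names "keyword" and "standard", and the field names are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical grammar for the statement proposed in this comment (not real CQL):
#   create index on <col> in <cf> using <type> as <field>[, <type> as <field> ...]
CREATE_INDEX = re.compile(
    r"^\s*create\s+index\s+on\s+(?P<col>\w+)\s+in\s+(?P<cf>\w+)"
    r"\s+using\s+(?P<specs>.+?)\s*;?\s*$",
    re.IGNORECASE,
)
SPEC = re.compile(r"^(?P<type>[\w.]+)\s+as\s+(?P<field>\w+)$", re.IGNORECASE)


def parse_create_index(statement):
    """Return (column_family, column, [(index_type, field_name), ...])."""
    match = CREATE_INDEX.match(statement)
    if match is None:
        raise ValueError("not a create index statement: %r" % statement)
    specs = []
    # Each comma-separated entry maps the same column to one more index field.
    for raw in match.group("specs").split(","):
        spec = SPEC.match(raw.strip())
        if spec is None:
            raise ValueError("bad index spec: %r" % raw.strip())
        specs.append((spec.group("type"), spec.group("field")))
    return match.group("cf"), match.group("col"), specs


# Illustrative usage with made-up names:
cf, col, specs = parse_create_index(
    "create index on email in users using keyword as email_exact, "
    "standard as email_text"
)
```

Here one column (email) ends up with two index fields, each of which a client could then reference by name in a query.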

      was (Author: tnine):
    I think forcing users to install classes for common use cases would cause 
issues with adoption.  What about creating new CQL commands to handle this?  
When creating an index in a db, you would define the fields and the manner in 
which they are indexed.  Could we do something like the following?


create index [colname] in [colfamily] using [index type 1] as [indexFieldName], 
[index type 2] as [indexFieldName], [index type n] as [indexFieldName]?

drop index [indexFieldName] in [colfamily] on [colname]



This way clients such as JPA can update and create indexes, without the need to 
install custom classes on Cassandra itself.  They also have the ability to 
directly reference the field name when using CQL queries.

Assuming that the index class types exist in the Lucene classpath, you get the 
1 to many mappings for column to indexing strategy.  This would allow more 
advanced clients such as the JPA plugin to automatically add indexes to the 
document based on indexes defined on persistent fields, without generating any 
code the user has to install in the Cassandra runtime.  If users want to 
install custom analyzers, they still have the option to do so, and would gain 
access to it via CQL.
  
> Lucene based Secondary Indexes
> ------------------------------
>
>                 Key: CASSANDRA-2915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: T Jake Luciani
>            Assignee: Jason Rutherglen
>              Labels: secondary_index
>
> Secondary indexes (of type KEYS) suffer from a number of limitations in their 
> current form:
>    - Multiple IndexClauses only work when there is a subset of rows under the 
> highest clause
>    - One new column family is created per index; this means 10 new CFs for 10 
> secondary indexes
> This ticket will use the Lucene library to implement secondary indexes as one 
> index per CF, and utilize the Lucene query engine to handle multiple index 
> clauses. Also, by using Lucene we get a highly optimized file format.
> There are a few parallels we can draw between Cassandra and Lucene.
> Lucene indexes segments in memory then flushes them to disk, so we can sync 
> our memtable flushes to Lucene flushes. Lucene also has optimize(), which 
> correlates to our compaction process, so these can be synced as well.
> We will also need to correlate column validators to Lucene tokenizers, so the 
> data can be stored properly. The big win is that once this is done we can 
> perform complex queries within a column, like wildcard searches.
> The downside of this approach is we will need to read before write, since 
> documents in Lucene are written as complete documents. For random workloads 
> with lots of indexed columns this means we need to read the document from 
> the index, update it and write it back.
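
The read-before-write cost described above can be illustrated with a small Python sketch. It does not use Lucene or Cassandra APIs. WholeDocumentIndex is a hypothetical in-memory stand-in for an index that can only replace complete documents, and update_column is an assumed helper name.

```python
class WholeDocumentIndex:
    """Toy stand-in for an index that only stores complete documents."""

    def __init__(self):
        self._docs = {}  # row key -> dict of indexed column values
        self.reads = 0
        self.writes = 0

    def get(self, row_key):
        """Return a copy of the stored document (empty if none exists)."""
        self.reads += 1
        return dict(self._docs.get(row_key, {}))

    def put(self, row_key, document):
        """Replace the whole document; there is no per-column update."""
        self.writes += 1
        self._docs[row_key] = dict(document)


def update_column(index, row_key, column, value):
    """Change one indexed column, which costs one read and one write."""
    document = index.get(row_key)   # read the existing document
    document[column] = value        # apply the single-column change
    index.put(row_key, document)    # write the complete document back


# Illustrative usage: two single-column updates cost two reads and two writes.
index = WholeDocumentIndex()
update_column(index, "row1", "name", "todd")
update_column(index, "row1", "city", "sf")
```

Every single-column mutation pays for a read of the full document, which is why random workloads touching many indexed columns are the expensive case.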

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
