[ https://issues.apache.org/jira/browse/CASSANDRA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634964#comment-17634964 ]
Caleb Rackliffe commented on CASSANDRA-16052:
---------------------------------------------

Had a quick chat w/ [~mikea] earlier today, and we've come up w/ a phased plan for getting the SAI components reviewed, merged into the feature branch, and delivered. Each of the following phases will likely correspond to its own Jira attached to this epic:

*Phase 1 - Index API and Memtable Indexing*

The simplest pieces of SAI we can break off first, review, and test are the Memtable-adjacent index and SAI's integration w/ the C* 2i API. When this phase is complete, we should be able to create indexes on text and numeric data and query those indexes while the base table data still resides in memory.

*Phase 2 - SSTable Indexing Tools and On-Disk Format for Text Indexing*

With the 2i integration and Memtable indexes working, we can introduce the on-disk components that make SSTable indexing possible in general (index building, result collation between Memtable indexes and SSTable indexes, and SSTable-level data shared by multiple indexed columns), along with the first user of those components: the disk-based trie that supports text indexing. When this phase is complete, we should have an end-to-end solution for basic text indexing.

*Phase 3 - On-Disk Format for Numeric Indexing*

With the general tools that support SSTable indexing completed in phase 2, we can add the on-disk format for numeric indexing. Once this phase is complete, we'll have end-to-end support for numeric equality and range queries.

*Phase 4 - Harry*

Whether or not we've already developed a model for testing generic indexing/filtering, at the conclusion of phase 3 we'll want to figure out the best way for Harry to exercise SAI. ([~ifesdjeen] and I have had some preliminary discussion around this.) This phase is ordered after the first three in a gatekeeping sense, but given that SAI is just an indexing _implementation_, work on a Harry model could happen before or concurrently with them.
*Phase 5 - LIKE Support and Statement Restriction Cleanup*

At the conclusion of phase 4, we should have a solid working version of SAI that supports basic numeric and text indexing. However, we may still want to build support for text prefix queries via the {{LIKE}} operator to get to rough feature parity w/ SASI. (Whether we need suffix/contains/full-text regex support is more debatable.) There are also some superficial bits of cleanup we may need to do in CQL space around cases where certain boolean queries in SAI (as in SASI) require {{ALLOW FILTERING}} even when the query only has restrictions on indexed columns.

> CEP-7 Storage Attached Indexes
> ------------------------------
>
> Key: CASSANDRA-16052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16052
> Project: Cassandra
> Issue Type: Epic
> Components: Feature/2i Index
> Reporter: Zhao Yang
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.x
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> [CEP|https://docs.google.com/document/d/1V830eAMmQAspjJdjviVZIaSolVGvZ1hVsqOLWyV0DS4/edit#heading=h.67ap6rr1mxr]
> - A new index implementation, called Storage Attached Index (SAI), based on the advancement made by SASI.
> * disk usage by sharing of common data between multiple column indexes on the same table and better compression of on-disk structures.
> * numeric range query performance with modified KDTree and collection type support.
> * compaction performance and stability for larger data sets.