[ 
https://issues.apache.org/jira/browse/CASSANDRA-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742989#comment-17742989
 ] 

Caleb Rackliffe commented on CASSANDRA-18656:
---------------------------------------------

After more testing and code diving, I think the proposal I've outlined isn't 
going to be viable for legacy 2i. Because a legacy 2i build iterates over the 
partition keys in the newly streamed SSTables, reads data from the backing 
table, and only then sends updates to the index, the new SSTables *must* be 
added to the backing table's live set first. Even if we were to create some 
kind of intermediate SSTable state before {{LIVE}} for indexing to read from, 
the legacy 2i is implemented as a hidden table. Updating it is just writing to 
its Memtable, so it would be able to accept queries incrementally while the 
backing SSTables were still not in the view of the base {{ColumnFamilyStore}}.

SAI is a different story. There, we already have per-SSTable index building 
support, and in some cases the SSTable-attached indexes are even streamed in an 
already-built state. I'm going to shift gears to focus on fixing SAI 
specifically here, although hopefully in a way that could easily be applied to 
any index implementation supporting incremental/per-SSTable builds.
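
As a rough illustration of the ordering this implies, here is a minimal toy 
sketch in Java. It assumes an index that can be built one SSTable at a time. 
Every name in it ({{StreamingCommitSketch}}, {{PerSSTableIndexBuilder}}, 
{{LiveView}}, {{finishStream()}}) is hypothetical and is not a Cassandra class 
or method; SSTables are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of "index first, then commit". Not Cassandra code.
final class StreamingCommitSketch {

    /** Stand-in for an index that supports per-SSTable builds. */
    interface PerSSTableIndexBuilder {
        void build(String sstable) throws Exception;
    }

    /** Stand-in for the table's live set plus the set of indexed SSTables. */
    static final class LiveView {
        final List<String> live = new ArrayList<>();
        final List<String> indexed = new ArrayList<>();
    }

    /**
     * Builds the index for every streamed SSTable before publishing any of
     * them. Returns true if the stream was committed, false if it was aborted.
     */
    static boolean finishStream(List<String> streamed,
                                PerSSTableIndexBuilder builder,
                                LiveView view) {
        List<String> built = new ArrayList<>();
        try {
            for (String sstable : streamed) {
                builder.build(sstable); // may fail, e.g. on node shutdown
                built.add(sstable);
            }
        } catch (Exception e) {
            // Abort: nothing was published, so no live SSTable is unindexed.
            return false;
        }
        // Only now do the SSTables become visible to reads.
        view.indexed.addAll(built);
        view.live.addAll(streamed);
        return true;
    }
}
```

In this sketch a failed build leaves the live set untouched, so there is never 
a window in which reads can see an SSTable that has no index. A real fix would 
also have to deal with durability and cleanup of the partially built index 
components, which the sketch ignores.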

> Ensure SSTable streaming transactions do not commit before building attached 
> secondary indexes
> ----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18656
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18656
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Streaming, Feature/2i Index, Feature/SAI, 
> Local/Startup and Shutdown
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.x
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Back in 2015, we identified in CASSANDRA-10130 a case where failures in 2i 
> builds after SSTable streaming could leave indexes in a partially built 
> state, even after a restart, requiring manual operator intervention. There, 
> and in CASSANDRA-13725, we made an attempt to remedy this situation, ensuring 
> that indexes would at least be rebuilt on restart after this kind of failure. 
> However, there are some difficulties the solution there does not address.
>
> Let's look at a simple example...
>
> Suppose an SSTable has been streamed to a node, and that node arrives in 
> {{CassandraStreamReceiver#finished()}}. We'll call {{finishTransaction()}} to 
> make the presence of the new SSTables durable, and then we'll call 
> {{ColumnFamilyStore#addSSTables()}}, which adds them to the {{Tracker}}, 
> making them available for reads. We then notify listeners about the new 
> SSTable, among them the {{SecondaryIndexManager}}, which will do a blocking 
> index build for the new SSTable. Conceptually, at this point, we already have 
> a problem (if a transient one), as there are live SSTables that have not been 
> indexed.
>
> What if the 2i build fails, though? Let's assume it fails because of a 
> disorderly (or orderly!) node shutdown. Some index implementations (SASI, 
> SAI) might be able to rebuild incrementally, but the legacy 2i has no way of 
> doing this right now. A full index rebuild on a large table could take a very 
> long time (days, weeks, etc.) and is ultimately not a viable way to proceed. 
> Let's say we were able to build incrementally though, and we had an SAI index 
> that did exactly this on node restart. We would still have a gap in 
> availability, because on startup, {{ColumnFamilyStore}} (see constructor) 
> does not block on its calls to {{SecondaryIndexManager#addIndex()}}, which, 
> via {{createIndex()}}, kick off the index building process.
>
> Of course, SAI implements a notion of "queryability" that would quickly take 
> the node out of rotation for queries across the cluster. Once its 
> initialization task runs on restart, the indexes in question would 
> immediately be marked non-queryable. SAI builds incrementally, and might be 
> able to block startup to do so in this case. Legacy 2i cannot reasonably do 
> this though.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
