Caleb Rackliffe created CASSANDRA-18656:
-------------------------------------------

             Summary: Ensure SSTable streaming transactions do not commit 
before building attached secondary indexes
                 Key: CASSANDRA-18656
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18656
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Streaming, Feature/2i Index, Feature/SAI, 
Local/Startup and Shutdown
            Reporter: Caleb Rackliffe
            Assignee: Caleb Rackliffe


Back in 2015, we identified in CASSANDRA-10130 a case where failures in 2i 
builds after SSTable streaming could leave indexes in a partially built state, 
even after a restart, requiring manual operator intervention. There, and in 
CASSANDRA-13725, we made an attempt to remedy this situation, ensuring that 
indexes would at least be rebuilt on restart after this kind of failure. 
However, there are some difficulties the solution there does not address.

Let's look at a simple example...

Suppose an SSTable has been streamed to a node, and that node arrives in 
{{CassandraStreamReceiver#finished()}}. We'll call {{finishTransaction()}} to 
make the presence of the new SSTable durable, and then we'll call 
{{ColumnFamilyStore#addSSTables()}}, which adds it to the {{Tracker}}, 
making it available for reads. We then notify listeners about the new SSTable, 
among them the {{SecondaryIndexManager}}, which will do a blocking index build 
for the new SSTable. Conceptually, at this point, we already have a problem (if 
a transient one), as there are live SSTables that have not been indexed.
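
To make the ordering concrete, here is a minimal toy model of the two possible 
orders of events. It is only a sketch: {{StreamFinishOrdering}}, its fields, and 
its methods are hypothetical names made up for illustration, not Cassandra 
classes, and the real code paths involve far more state than two sets.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: these are NOT Cassandra classes or method signatures.
class StreamFinishOrdering {
    final Set<String> live = new LinkedHashSet<>();    // SSTables visible to reads
    final Set<String> indexed = new LinkedHashSet<>(); // SSTables covered by the index

    // SSTables that reads can see but the index does not cover.
    Set<String> liveButUnindexed() {
        Set<String> gap = new LinkedHashSet<>(live);
        gap.removeAll(indexed);
        return gap;
    }

    // Order described above: commit and publish first, build the index afterwards.
    void commitThenBuild(String sstable, boolean buildFails) {
        live.add(sstable);                 // transaction committed, SSTable is live
        if (!buildFails) {
            indexed.add(sstable);          // blocking index build succeeded
        }
    }

    // Order suggested by the summary: build the index first, commit only on success.
    void buildThenCommit(String sstable, boolean buildFails) {
        if (buildFails) {
            return;                        // nothing was committed, nothing is live
        }
        indexed.add(sstable);
        live.add(sstable);
    }

    public static void main(String[] args) {
        StreamFinishOrdering current = new StreamFinishOrdering();
        current.commitThenBuild("streamed-1", true);
        System.out.println("commit-then-build gap: " + current.liveButUnindexed());

        StreamFinishOrdering proposed = new StreamFinishOrdering();
        proposed.buildThenCommit("streamed-1", true);
        System.out.println("build-then-commit gap: " + proposed.liveButUnindexed());
    }
}
```

With a failed build, the first ordering leaves {{streamed-1}} live but 
unindexed, while the second leaves nothing live at all, so in this model there 
is never an SSTable that reads can see and the index cannot.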

What if the 2i build fails, though? Let's assume it fails because of a 
disorderly (or orderly!) node shutdown. Some index implementations (SASI, SAI) 
might be able to rebuild incrementally, but the legacy 2i has no way of doing 
this right now. A full index rebuild on a large table could take a very long 
time (days, weeks, etc.) and is ultimately not a viable way to proceed. Let's 
say we were able to build incrementally though, and we had an SAI index that 
did exactly this on node restart. We would still have a gap in availability, 
because on startup, {{ColumnFamilyStore}} (see constructor) does not block on 
its calls to {{SecondaryIndexManager#addIndex()}}, which, via {{createIndex()}}, 
start the index building process.
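
The startup gap can be illustrated with a small sketch of a build that is 
scheduled but not awaited. Everything here is an assumption made for 
illustration: {{NonBlockingIndexStartup}}, {{addIndexAsync()}}, and the latch 
that holds the build back are hypothetical and do not mirror the actual 
{{SecondaryIndexManager}} implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

// Toy model only: NOT the real ColumnFamilyStore or SecondaryIndexManager.
class NonBlockingIndexStartup {
    // Lets the demo decide when the simulated build is allowed to finish.
    final CountDownLatch allowBuild = new CountDownLatch(1);
    volatile boolean indexBuilt = false;

    // Hypothetical analogue of a non-blocking addIndex(): schedule the build and return.
    CompletableFuture<Void> addIndexAsync() {
        return CompletableFuture.runAsync(() -> {
            try {
                allowBuild.await();        // stands in for a long-running build
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;                    // build abandoned, index stays unbuilt
            }
            indexBuilt = true;
        });
    }

    public static void main(String[] args) {
        NonBlockingIndexStartup node = new NonBlockingIndexStartup();
        CompletableFuture<Void> build = node.addIndexAsync();

        // "Startup" continues without waiting, so the index is not ready yet.
        System.out.println("startup finished, index built: " + node.indexBuilt);

        node.allowBuild.countDown();       // let the build complete
        build.join();                      // blocking here is what startup does not do
        System.out.println("after join, index built: " + node.indexBuilt);
    }
}
```

The first line prints {{false}} and the second prints {{true}}. The window 
between them is the availability gap: the node is up, but its index is not.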

Of course, SAI implements a notion of "queryability" that would quickly take 
the node out of rotation for queries across the cluster. Once its 
initialization task runs on restart, the indexes in question would immediately 
be marked non-queryable.



