[ https://issues.apache.org/jira/browse/CASSANDRA-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742593#comment-17742593 ]
Caleb Rackliffe commented on CASSANDRA-18656:
---------------------------------------------

Pushed more progress on this today [here|https://github.com/apache/cassandra/pull/2477]. My goal is to have something reviewable by tomorrow/Friday...

> Ensure SSTable streaming transactions do not commit before building attached secondary indexes
> ----------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18656
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18656
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Streaming, Feature/2i Index, Feature/SAI, Local/Startup and Shutdown
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Back in 2015, we identified in CASSANDRA-10130 a case where failures in 2i builds after SSTable streaming could leave indexes in a partially built state, even after a restart, requiring manual operator intervention. There, and in CASSANDRA-13725, we made an attempt to remedy this situation, ensuring that indexes would at least be rebuilt on restart after this kind of failure. However, there are some difficulties the solution there does not address. Let's look at a simple example...
>
> Suppose an SSTable has been streamed to a node, and that node arrives in {{CassandraStreamReceiver#finished()}}. We'll call {{finishTransaction()}} to make the presence of the new SSTables durable, and then we'll call {{ColumnFamilyStore#addSSTables()}}, which adds the SSTable to the {{Tracker}}, making it available for reads. We then notify listeners about the new SSTable, among them the {{SecondaryIndexManager}}, which will do a blocking index build for the new SSTable. Conceptually, at this point, we already have a problem (if a transient one), as there are live SSTables that have not been indexed.
>
> What if the 2i build fails, though?
> Let's assume it fails because of a disorderly (or orderly!) node shutdown. Some index implementations (SASI, SAI) might be able to rebuild incrementally, but the legacy 2i has no way of doing this right now. A full index rebuild on a large table could take a very long time (days, weeks, etc.) and is ultimately not a viable way to proceed.
>
> Let's say we were able to build incrementally though, and we had an SAI index that did exactly this on node restart. We would still have a gap in availability, because on startup, {{ColumnFamilyStore}} (see constructor) does not block on its calls to {{SecondaryIndexManager#addIndex()}}, which, via {{createIndex()}}, actuate the index building process.
>
> Of course, SAI implements a notion of "queryability" that would quickly take the node out of rotation for queries across the cluster. Once its initialization task runs on restart, the indexes in question would immediately be marked non-queryable. SAI builds incrementally, and might be able to block startup to do so in this case. Legacy 2i cannot reasonably do this though.
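
The ordering problem described in the issue can be illustrated with a toy model. This is only a sketch: {{Node}}, {{commitThenIndex()}} and {{indexThenCommit()}} are hypothetical names invented for illustration, not Cassandra classes or methods, and the in-memory lists merely stand in for the durable transaction state, the {{Tracker}}, and the index. The second ordering reflects the goal stated in the issue title (do not commit before building attached indexes); how the actual patch achieves that is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are Cassandra classes. */
public class StreamCommitOrderSketch {

    /** Hypothetical stand-in for a node's durable, live, and indexed SSTable state. */
    static final class Node {
        final List<String> durable = new ArrayList<>();  // survives a restart
        final List<String> live = new ArrayList<>();     // visible to reads
        final List<String> indexed = new ArrayList<>();  // covered by the index
        boolean failIndexBuild;                          // simulates a shutdown mid-build

        void buildIndex(String sstable) {
            if (failIndexBuild)
                throw new IllegalStateException("index build interrupted for " + sstable);
            indexed.add(sstable);
        }

        /** Live SSTables that the index does not cover. */
        List<String> unindexedLive() {
            List<String> gap = new ArrayList<>(live);
            gap.removeAll(indexed);
            return gap;
        }
    }

    /** Ordering described in the issue: commit, publish, then build. */
    static void commitThenIndex(Node node, String sstable) {
        node.durable.add(sstable);   // stands in for finishTransaction()
        node.live.add(sstable);      // stands in for adding to the Tracker
        node.buildIndex(sstable);    // stands in for the blocking listener build
    }

    /** Ordering implied by the issue title: build first, then commit and publish. */
    static void indexThenCommit(Node node, String sstable) {
        node.buildIndex(sstable);    // a failure here leaves nothing committed
        node.durable.add(sstable);
        node.live.add(sstable);
    }

    public static void main(String[] args) {
        Node a = new Node();
        a.failIndexBuild = true;
        try { commitThenIndex(a, "sstable-1"); } catch (IllegalStateException e) { /* simulated crash */ }
        System.out.println("commit-then-index gap: " + a.unindexedLive());

        Node b = new Node();
        b.failIndexBuild = true;
        try { indexThenCommit(b, "sstable-1"); } catch (IllegalStateException e) { /* simulated crash */ }
        System.out.println("index-then-commit gap: " + b.unindexedLive());
    }
}
```

In this model, a failed build under the first ordering leaves a durable, readable SSTable with no index coverage, while under the second ordering the failure leaves the node's committed state unchanged, so the stream could presumably be retried.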