[ https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775361#comment-16775361 ]
Blake Eggleston edited comment on CASSANDRA-15027 at 2/22/19 5:34 PM:
----------------------------------------------------------------------

Nice. Your follow-on changes look good to me. I have 2 nits, but those can just be fixed on commit.
* We should log the session id in compaction manager when an anti-compaction is cancelled (and probably when there's an error as well)
* Some error handling should be added to the commit fixing the race between proposeFuture and hasFailure so nodetool doesn't hang if there's an error in the callback

edit: proposed fixes [here|https://github.com/bdeggleston/cassandra/commit/02d7d9e09983db0d4661486b17adc375e17be24f]

> Handle IR prepare phase failures less race prone by waiting for all results
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15027
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Local/Compaction
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Major
>             Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a
> {{PrepareConsistentRequest}} message to all participants, which may also
> include the coordinator itself. Participants will run anti-compactions upon
> receiving such a message and report the result of the operation back to the
> coordinator.
> Once we receive a failure response from any of the participants, we fail fast
> in {{CoordinatorSession.handlePrepareResponse()}}, which in turn completes
> the {{prepareFuture}} that {{RepairRunnable}} is blocking on. The repair
> command then terminates with an error status, as expected.
> The issue is that when the node is both coordinator and participant, we may
> end up with a local session and submitted anti-compactions, which will be
> executed without any coordination with the coordinator session (on the same
> node). As a result, running repair commands right after one another may cause
> overlapping execution of anti-compactions, which causes the following
> (misleading) message to show up in the logs and the repair to fail again:
> "Prepare phase for incremental repair session %s has failed because it
> encountered intersecting sstables belonging to another incremental repair
> session (%s). This is caused by starting an incremental repair session before
> a previous one has completed. Check nodetool repair_admin for hung sessions
> and fix them."
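
The "wait for all results" idea from the issue title can be sketched roughly as follows. This is a minimal illustration only, not the code from the patch: {{PrepareTracker}}, its methods, and the use of {{CompletableFuture}} are hypothetical stand-ins, and only the names {{prepareFuture}} and {{hasFailure}} are borrowed from the discussion above. The sketch records a failure instead of failing fast, and completes the future only once every participant has reported.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical stand-in for coordinator-side prepare tracking.
 * This is NOT Cassandra's CoordinatorSession; all names are illustrative.
 */
final class PrepareTracker {
    // Participants that have not reported a prepare result yet.
    private final Set<String> pending;
    // Completed with true only if every participant prepared successfully.
    private final CompletableFuture<Boolean> prepareFuture = new CompletableFuture<>();
    private boolean hasFailure = false;

    PrepareTracker(Set<String> participants) {
        this.pending = new HashSet<>(participants);
        if (pending.isEmpty()) {
            // Nothing to wait for, so do not leave callers blocked forever.
            prepareFuture.complete(true);
        }
    }

    /** Records one participant's result; completes the future after the last one. */
    void handleResponse(String participant, boolean success) {
        boolean done;
        boolean failed;
        synchronized (this) {
            if (!pending.remove(participant)) {
                return; // unknown participant or duplicate response: ignore
            }
            if (!success) {
                hasFailure = true; // remember the failure, but keep waiting
            }
            done = pending.isEmpty();
            failed = hasFailure;
        }
        // Complete outside the lock so dependent callbacks never run while it is held.
        if (done) {
            prepareFuture.complete(!failed);
        }
    }

    CompletableFuture<Boolean> future() {
        return prepareFuture;
    }
}
```

With two participants, a failure from the first leaves the future pending; it completes with false only after the second has also responded, so no anti-compaction is still running uncoordinated when the caller sees the error.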