[jira] [Comment Edited] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

Blake Eggleston (JIRA) Fri, 22 Feb 2019 09:45:04 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775361#comment-16775361
 ]


Blake Eggleston edited comment on CASSANDRA-15027 at 2/22/19 5:34 PM:
----------------------------------------------------------------------

Nice. Your follow-on changes look good to me. I have 2 nits, but those can just 
be fixed on commit.

* We should log the session id in compaction manager when an anti-compaction is 
cancelled (and probably when there's an error as well)
* Some error handling should be added to the commit fixing the race between 
proposeFuture and hasFailure so nodetool doesn't hang if there's an error in 
the callback
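
To illustrate the second nit, here is a minimal sketch, not the actual patch. The class `SafeCallback` and its `chain` method are hypothetical names, and plain `CompletableFuture` is an assumption standing in for whatever future type the repair code uses. The point is that a result future completed manually from a callback has to be completed on every path, including when the callback itself throws. Otherwise anything blocking on it (such as nodetool) waits forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
public class SafeCallback {
    // Runs `callback` when `source` completes and reports the outcome on a
    // separate, manually completed future.
    public static <T, R> CompletableFuture<R> chain(CompletableFuture<T> source,
                                                    Function<T, R> callback) {
        CompletableFuture<R> result = new CompletableFuture<>();
        source.whenComplete((value, error) -> {
            if (error != null) {
                // Upstream failure: pass it on so waiters are released.
                result.completeExceptionally(error);
                return;
            }
            try {
                result.complete(callback.apply(value));
            } catch (Throwable t) {
                // Without this catch, an exception thrown by the callback
                // would leave `result` incomplete and block waiters forever.
                result.completeExceptionally(t);
            }
        });
        return result;
    }
}
```

With this shape, a caller blocking on the returned future sees the callback's exception instead of hanging.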

edit: proposed fixes 
[here|https://github.com/bdeggleston/cassandra/commit/02d7d9e09983db0d4661486b17adc375e17be24f]


was (Author: bdeggleston):
Nice. Your follow-on changes look good to me. I have 2 nits, but those can just 
be fixed on commit.
 
* We should log the session id in compaction manager when an anti-compaction is 
cancelled (and probably when there's an error as well)
* Some error handling should be added to the commit fixing the race between 
proposeFuture and hasFailure so nodetool doesn't hang if there's an error in 
the callback

> Handle IR prepare phase failures less race prone by waiting for all results
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15027
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Local/Compaction
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Major
>             Fix For: 4.x
>
>
> Handling incremental repairs as a coordinator begins by sending a 
> {{PrepareConsistentRequest}} message to all participants, which may also 
> include the coordinator itself. Participants will run anti-compactions upon 
> receiving such a message and report the result of the operation back to the 
> coordinator.
> Once we receive a failure response from any of the participants, we fail fast 
> in {{CoordinatorSession.handlePrepareResponse()}}, which in turn completes 
> the {{prepareFuture}} that {{RepairRunnable}} is blocking on. The repair 
> command then terminates with an error status, as expected.
> The issue is that when the node is both coordinator and participant, we may 
> end up with a local session and submitted anti-compactions that are executed 
> without any coordination with the coordinator session (on the same node). As 
> a result, running repair commands one right after another may cause 
> overlapping execution of anti-compactions, which causes the following 
> (misleading) message to show up in the logs and makes the repair fail 
> again:
>  "Prepare phase for incremental repair session %s has failed because it 
> encountered intersecting sstables belonging to another incremental repair 
> session (%s). This is caused by starting an incremental repair session before a 
> previous one has completed. Check nodetool repair_admin for hung sessions and 
> fix them."
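
The approach named in the issue title, waiting for all prepare results instead of failing fast, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Cassandra implementation: the class `PrepareTracker` is hypothetical, participants are represented as plain strings, and `CompletableFuture` stands in for the future type used by `CoordinatorSession`. Only the names `handlePrepareResponse`, `prepareFuture` and `hasFailure` are borrowed from the discussion above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical tracker, for illustration only.
public class PrepareTracker {
    private final Set<String> pending;          // participants that have not answered yet
    private boolean hasFailure = false;         // set once any participant reports a failure
    private final CompletableFuture<Boolean> prepareFuture = new CompletableFuture<>();

    public PrepareTracker(Set<String> participants) {
        this.pending = new HashSet<>(participants);
        if (pending.isEmpty()) {
            prepareFuture.complete(true);       // nothing to wait for
        }
    }

    // Records one participant's result. The future is completed only after
    // the last outstanding participant has answered, so a failure from one
    // node does not release the caller while other nodes (possibly including
    // the local one) may still be running anti-compactions.
    public synchronized void handlePrepareResponse(String participant, boolean success) {
        if (!pending.remove(participant)) {
            return;                             // unknown participant or duplicate response
        }
        if (!success) {
            hasFailure = true;
        }
        if (pending.isEmpty()) {
            prepareFuture.complete(!hasFailure);
        }
    }

    public CompletableFuture<Boolean> prepareFuture() {
        return prepareFuture;
    }
}
```

The trade-off in this sketch is that a failed prepare phase is reported later than with fail-fast, in exchange for knowing that no participant is still working on the session when the repair command returns.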



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
