[ https://issues.apache.org/jira/browse/CASSANDRA-13797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398846#comment-16398846 ]
Blake Eggleston commented on CASSANDRA-13797:
---------------------------------------------

Agreed this should be a new ticket. Maybe the right fix is to backport CASSANDRA-13521 with concurrent validations set to something reasonable? I think the incorrect assumption in this ticket that's causing problems was that the validation executor was bounded, which obviously isn't the case.

> RepairJob blocks on syncTasks
> -----------------------------
>
>                 Key: CASSANDRA-13797
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13797
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 3.0.15, 3.11.1, 4.0
>
>
> The thread running {{RepairJob}} blocks while it waits for the validations it
> starts to complete ([see
> here|https://github.com/bdeggleston/cassandra/blob/9fdec0a82851f5c35cd21d02e8c4da8fc685edb2/src/java/org/apache/cassandra/repair/RepairJob.java#L185]).
> However, the downstream callbacks (i.e. the post-repair cleanup stuff) aren't
> waiting for {{RepairJob#run}} to return; they're waiting for a result to be
> set on the {{RepairJob}} future, which happens after the sync tasks have
> completed. This post-repair cleanup stuff also immediately shuts down the
> executor {{RepairJob#run}} is running in. So in noop repair sessions, where
> there's nothing to stream, I'm seeing the callbacks sometimes fire before
> {{RepairJob#run}} wakes up, causing an {{InterruptedException}} to be thrown.
> I'm pretty sure this can just be removed, but I'd like a second opinion. This
> appears to just be a holdover from before repair coordination became async. I
> thought it might be doing some throttling by blocking, but each repair
> session gets its own executor, and validation is throttled by the fixed
> size executors doing the actual work of validation, so I don't think we need
> to keep this around.
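
The race in the quoted description can be sketched in isolation. The following is a minimal illustration only, not Cassandra code: it uses plain java.util.concurrent types, and every name in it (RepairRaceSketch, sessionExecutor, jobResult, stillBlocked) is hypothetical. A latch that is never released stands in for "{{RepairJob#run}} has not woken up yet", which makes the outcome deterministic here, whereas in the ticket it only happens sometimes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RepairRaceSketch {

    /** Returns true if the blocked "run" thread was interrupted by the cleanup callback. */
    static boolean blockedRunGetsInterrupted() throws Exception {
        // Hypothetical stand-in for the per-session executor that runs the job.
        ExecutorService sessionExecutor = Executors.newSingleThreadExecutor();
        // Hypothetical stand-in for the future whose result the callbacks wait on.
        CompletableFuture<Void> jobResult = new CompletableFuture<>();
        CountDownLatch runStarted = new CountDownLatch(1);
        // Never counted down: models a run() that is still blocked.
        CountDownLatch stillBlocked = new CountDownLatch(1);
        CompletableFuture<Boolean> interrupted = new CompletableFuture<>();

        // Cleanup callback: keyed on the result being set, not on run() returning,
        // and it shuts down the executor that run() is still using.
        jobResult.thenRun(sessionExecutor::shutdownNow);

        // The "run" task: blocks inside the session executor.
        sessionExecutor.execute(() -> {
            runStarted.countDown();
            try {
                stillBlocked.await();
                interrupted.complete(false);
            } catch (InterruptedException e) {
                // shutdownNow() interrupts worker threads that are still running.
                interrupted.complete(true);
            }
        });

        runStarted.await();
        // Setting the result fires the callback on this thread while run() is blocked.
        jobResult.complete(null);
        return interrupted.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("interrupted = " + blockedRunGetsInterrupted());
    }
}
```

If the task did not block at all, as the description proposes, there would be no waiting thread left for shutdownNow() to interrupt.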