[ https://issues.apache.org/jira/browse/CASSANDRA-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640295#comment-16640295 ]
Blake Eggleston commented on CASSANDRA-14804:
---------------------------------------------

[~chovatia.jayd...@gmail.com] I'm not sure how we'd get to the state in t2. We wait for an hour on a semaphore we instantiate in {{prepareForRepair}}, and {{removeParentRepairSession}} is synchronized on the object monitor. One shouldn't block the other. I think the jstack in the description is missing the thread where the {{ActiveRepairService}} monitor is being held.
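To spell that out, the arrangement I'm describing looks roughly like this: the hour-long wait is on a latch/semaphore created inside {{prepareForRepair}} and awaited without holding the service monitor, while {{removeParentRepairSession}} synchronizes on the service. This is a simplified, hypothetical sketch of that reading, not the actual {{ActiveRepairService}} code:

{code:java}
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the arrangement described above; names
// are simplified and do not mirror the real ActiveRepairService.
class PrepareSketch
{
    private final ConcurrentHashMap<UUID, Object> parentSessions = new ConcurrentHashMap<>();

    void prepareForRepair(UUID parentSession) throws InterruptedException
    {
        parentSessions.put(parentSession, new Object());

        // The hour-long wait is on a latch created inside this call;
        // no monitor is held while parked, so handlers stay unblocked.
        CountDownLatch prepareLatch = new CountDownLatch(1);
        prepareLatch.await(1, TimeUnit.HOURS);
    }

    // Synchronizes on the service monitor, which the waiting thread
    // above never takes, so this should not block for the hour.
    synchronized void removeParentRepairSession(UUID parentSession)
    {
        parentSessions.remove(parentSession);
    }
}
{code}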
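The deadlock the description reports, on the other hand, requires the await to happen while the object monitor is held. A minimal, self-contained sketch of that shape, again with simplified hypothetical names and the 1-hour timeout shortened so the demo terminates:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RepairDeadlockSketch
{
    // Hypothetical stand-in; names simplified, not the real code.
    static class Service
    {
        // Never counted down, standing in for the prepare responses
        // that node-2 never sends back.
        private final CountDownLatch prepareLatch = new CountDownLatch(1);

        // The await happens while this object's monitor is held.
        synchronized void prepareForRepair() throws InterruptedException
        {
            // 1 hour in the report; shortened so the demo exits.
            prepareLatch.await(5, TimeUnit.SECONDS);
        }

        // Needs the same monitor, so it blocks for the full timeout.
        synchronized void removeParentRepairSession()
        {
            System.out.println("cleanup finally ran");
        }
    }

    public static void main(String[] args) throws Exception
    {
        Service service = new Service();

        Thread repairRunner = new Thread(() -> {
            try
            {
                service.prepareForRepair();
            }
            catch (InterruptedException ignored)
            {
            }
        }, "Thread-888");
        Thread antiEntropy = new Thread(service::removeParentRepairSession,
                                        "AntiEntropyStage:1");

        repairRunner.start();
        Thread.sleep(100); // let Thread-888 enter the monitor first
        antiEntropy.start();

        // A jstack taken here shows AntiEntropyStage:1 BLOCKED on the
        // Service monitor, matching the stacks quoted below.
        repairRunner.join();
        antiEntropy.join();
    }
}
{code}

Whether the real {{prepareForRepair}} actually holds the {{ActiveRepairService}} monitor across the await is the crux; the sketch just makes the reported interleaving concrete.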
> Running repair on multiple nodes in parallel could halt entire repair
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14804
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14804
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 3.0.18
>
>
> Possible deadlock if we run repair on multiple nodes at the same time. We have come across a situation in production where repairing multiple nodes at the same time makes repair hang forever. Here are the details:
>
> Time t1:
> {{node-1}} has issued a repair command to {{node-2}}, but for some reason {{node-2}} never received the request, so {{node-1}} waits for 1 hour at [prepareForRepair|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333] *while holding the lock*.
>
> Time t2:
> {{node-2}} sent a prepare-repair request to {{node-1}}; an exception occurred on {{node-1}}, which now tries to clean up the parent session [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L172], but it cannot acquire the lock because the 1-hour wait above has not yet elapsed.
>
> Snippet of jstack on {{node-1}}:
> {quote}
> "Thread-888" #262588 daemon prio=5 os_prio=0 waiting on condition
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332)
>         - locked <> (a org.apache.cassandra.service.ActiveRepairService)
>         at org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:214)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>         at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
>
> "AntiEntropyStage:1" #1789 daemon prio=5 os_prio=0 waiting for monitor entry []
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421)
>         - waiting to lock <> (a org.apache.cassandra.service.ActiveRepairService)
>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>         at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$9/864248990.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
> {quote}
>
> Time t3:
> {{node-2}} (and possibly other nodes, {{node-3}}…) sent a [prepare request|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/ActiveRepairService.java#L333] to {{node-1}}, but {{node-1}}'s AntiEntropyStage thread is stuck waiting for the lock in {{ActiveRepairService.removeParentRepairSession}}, so {{node-2}}, {{node-3}} (and possibly other nodes) also enter the 1-hour wait *while holding their locks*. This rolling effect continues and stalls repair across the entire ring.
>
> If we stop triggering repairs entirely, the system recovers slowly, but there are two major problems with this:
> 1. Externally there is no way to tell whether to trigger a new repair or wait for the system to recover.
> 2. The system does recover eventually, but it takes roughly {{n}} hours, where {{n}} = the number of repair requests fired; the only ways out are a rolling restart of the entire ring or waiting {{n}} hours before triggering a new repair request.
> Please let me know if the above analysis makes sense.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org