[ https://issues.apache.org/jira/browse/CASSANDRA-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621555#comment-16621555 ]
Jeff Jirsa edited comment on CASSANDRA-14674 at 9/20/18 6:07 AM:
-----------------------------------------------------------------

{quote}Is this causing any problems for you? An occasional hung thread isn't ideal, but I don't think this affects correctness.{quote}

I'm not sure if this is the scenario I recall, but there exists at least one scenario like this (in 2.1) where you end up unable to run any repairs until you bounce the host, because the {{executor}} in {{ActiveRepairService}} is started with [4|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L94] threads, and if you get 4 stuck threads, you're broken forever.

was (Author: jjirsa):
{quote}Is this causing any problems for you? An occasional hung thread isn't ideal, but I don't think this affects correctness.{quote}

I'm not sure if this is the scenario I recall, but there exists at least one scenario like this (in 2.1) where you end up unable to run any repairs until you bounce the host, because {{ActiveRepairService}} is started with [4|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L94] threads, and if you get 4 stuck threads, you're broken forever.
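The failure mode in the comment, where a fixed-size pool of repair threads each block on a reply that never arrives, can be reproduced in miniature with plain JDK classes. This is a hedged sketch, not Cassandra code: `StuckRepairPool` and `newTaskRuns` are hypothetical names, the pool size of 4 only mirrors the comment, and `CompletableFuture.join()` stands in for the unbounded `Futures.getUnchecked` wait. Both keep waiting when interrupted (the quoted stack trace shows `Futures.getUnchecked` going through `Uninterruptibles.getUninterruptibly`), so even `shutdownNow()` cannot free the workers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StuckRepairPool {
    // Illustrative pool size, mirroring the 4 threads mentioned in the comment.
    static final int POOL_SIZE = 4;

    /**
     * Returns true if a newly submitted task still runs after POOL_SIZE
     * tasks have blocked on a response that never arrives.
     */
    static boolean newTaskRuns(long waitMillis) throws Exception {
        // Daemon threads, so the permanently stuck workers do not keep the JVM alive.
        ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        // Stand-in for a validation response that the receiver never sends.
        CompletableFuture<Void> neverCompleted = new CompletableFuture<>();
        for (int i = 0; i < POOL_SIZE; i++) {
            // join() has no timeout and ignores interrupts: this worker is gone for good.
            executor.submit(() -> { neverCompleted.join(); });
        }
        // A later "repair" is queued behind the stuck workers and never starts.
        Future<String> next = executor.submit(() -> "ran");
        try {
            next.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            // Interrupts the workers, but join() keeps waiting anyway.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("new task ran: " + newTaskRuns(500));
    }
}
```

Running `main` should report that the new task did not run: once every worker is parked, the queued task can only start if one of them returns, which is the "broken until you bounce the host" state described above.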
> Repair Validation message request could get stuck forever at sender side
> ------------------------------------------------------------------------
>
> Key: CASSANDRA-14674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14674
> Project: Cassandra
> Issue Type: Bug
> Components: Repair
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Major
>
> Validation request messages that are part of repair are currently sent with
> [sendOneWay|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/ValidationTask.java#L56],
> and the sender then waits at
> [Futures.getUnchecked|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/RepairJob.java#L160].
> If the sender doesn't hear back from the receiver for whatever reason, the thread is
> blocked forever. I've reproduced the following stack trace on the sender side by
> deliberately ignoring
> [VALIDATION_REQUEST|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L114]
> on the receiver side.
> {quote}"Repair#1:1" #301 daemon prio=5 os_prio=0 tid=0x00007f5a62060800 nid=0x13198 waiting on condition [0x00007f5a5cc6c000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         parking to wait for <0x00000005c6ba9630> (a com.google.common.util.concurrent.AbstractFuture$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
>         at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>         at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
>         at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
>         at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>         at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1858015030.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:745)
> {quote}
>
> AFAIK we should be using {{sendRR}} for this instead of {{sendOneWay}}.
> Please let me know if my understanding is correct or not.
> I am working on a fix to make it {{sendRR}}.
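The issue proposes replacing the fire-and-forget {{sendOneWay}} with {{sendRR}}, a request/response exchange with a registered callback, so that the sender has a defined outcome even when no reply comes back. The sketch below shows that idea with plain JDK types only. It is not Cassandra's MessagingService API: `Transport`, `sendWithTimeout` and `awaitValidation` are hypothetical names, the reply is modeled as a `CompletableFuture`, the timeout value is arbitrary, and `orTimeout` requires Java 9 or later. How Cassandra itself reports an expired callback is not shown here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class ValidationRequestSketch {
    /** Hypothetical transport: delivers a request and may invoke the reply callback. */
    interface Transport {
        void send(String request, Consumer<String> onReply);
    }

    /**
     * Request/response style send: the returned future completes with the reply,
     * or exceptionally with a TimeoutException if no reply arrives in time.
     */
    static CompletableFuture<String> sendWithTimeout(Transport transport, String request,
                                                     long timeout, TimeUnit unit) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        transport.send(request, reply::complete);
        return reply.orTimeout(timeout, unit); // Java 9+
    }

    /** Waits for the validation reply; returns null on timeout instead of parking forever. */
    static String awaitValidation(Transport transport, long timeoutMillis) throws InterruptedException {
        try {
            return sendWithTimeout(transport, "VALIDATION_REQUEST",
                                   timeoutMillis, TimeUnit.MILLISECONDS).get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return null; // the caller can fail the repair job and free the thread
            }
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        Transport replies = (req, cb) -> cb.accept("tree for " + req);
        Transport silent = (req, cb) -> { }; // receiver ignores the request, as in the repro
        System.out.println(awaitValidation(replies, 200)); // tree for VALIDATION_REQUEST
        System.out.println(awaitValidation(silent, 200));  // null, after roughly 200 ms
    }
}
```

The design point is that the wait now has a bounded outcome: the same thread that would otherwise sit in `Uninterruptibles.getUninterruptibly` gets a timeout it can turn into a failed repair session, which keeps the executor's threads available for later repairs.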
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)