[ 
https://issues.apache.org/jira/browse/CASSANDRA-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974462#comment-13974462
 ] 

Jackson Chung commented on CASSANDRA-6415:
------------------------------------------

I ran into the stuck repair issue on 1.2.10.

After upgrading to 1.2.16, I can see that repair is no longer "stuck", in the 
sense that multiple repair sessions/stages start and finish.

But in the end (after waiting a long time), there is no more activity in the 
log or in compactionstats/netstats, yet tpstats still shows Active and Pending 
counts for these stages:

AntiEntropyStage                  1         2           5073         0                 0
AntiEntropySessions               1         1             44         0                 0


> Snapshot repair blocks for ever if something happens to the "I made my 
> snapshot" response
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6415
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Yuki Morishita
>              Labels: repair
>             Fix For: 1.2.13, 2.0.4
>
>         Attachments: 6415-1.2.txt
>
>
> The "snapshotLatch.await();" call can wait forever and block all repair 
> operations indefinitely if another node fails to respond.
> {noformat}
>             public void makeSnapshots(Collection<InetAddress> endpoints)
>             {
>                 try
>                 {
>                     snapshotLatch = new CountDownLatch(endpoints.size());
>                     IAsyncCallback callback = new IAsyncCallback()
>                     {
>                         public boolean isLatencyForSnitch()
>                         {
>                             return false;
>                         }
>                         public void response(MessageIn msg)
>                         {
>                             RepairJob.this.snapshotLatch.countDown();
>                         }
>                     };
>                     for (InetAddress endpoint : endpoints)
>                         MessagingService.instance().sendRR(
>                             new SnapshotCommand(tablename, cfname, sessionName, false).createMessage(),
>                             endpoint, callback);
>                     snapshotLatch.await();
>                     snapshotLatch = null;
>                 }
>                 catch (InterruptedException e)
>                 {
>                     throw new RuntimeException(e);
>                 }
>             }
> {noformat}
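
For illustration only, here is a minimal standalone sketch of why an untimed 
await() on the latch hangs when one response is lost, and how a bounded wait 
behaves instead. It is not taken from the attached patch and does not use any 
Cassandra classes; the class name SnapshotLatchSketch, the method 
awaitSnapshots and its parameters are hypothetical, and the lost response is 
simulated by simply counting down fewer times than expected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SnapshotLatchSketch
{
    /**
     * Hypothetical helper: waits for `expected` snapshot acknowledgements,
     * of which only `received` actually arrive, and gives up after
     * `timeoutMillis`. Returns true only if every acknowledgement arrived.
     */
    public static boolean awaitSnapshots(int expected, int received, long timeoutMillis)
    {
        CountDownLatch latch = new CountDownLatch(expected);

        // Simulate the callbacks that did fire (one countDown per response).
        for (int i = 0; i < received; i++)
            latch.countDown();

        try
        {
            // Unlike latch.await(), this returns false once the timeout
            // elapses instead of blocking the calling thread indefinitely.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e)
        {
            // Restore the interrupt flag and report the wait as unsuccessful.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(awaitSnapshots(3, 3, 100)); // all endpoints responded: true
        System.out.println(awaitSnapshots(3, 2, 100)); // one response lost: false
    }
}
```

With a boolean result the caller can fail the repair job when it gets false, 
rather than leaving the stage Active forever.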



--
This message was sent by Atlassian JIRA
(v6.2#6252)
