[ https://issues.apache.org/jira/browse/CASSANDRA-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974462#comment-13974462 ]
Jackson Chung commented on CASSANDRA-6415:
------------------------------------------

I ran into the stuck repair issue on 1.2.10. After upgrading to 1.2.16, repair is no longer "stuck" in the sense that I can see multiple repair sessions/stages start and finish. But in the end (after waiting a long time) there is no more activity in the log or in compactionstats/netstats, yet tpstats still shows Active and Pending counts for these stages:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
AntiEntropyStage                  1         2           5073         0                 0
AntiEntropySessions               1         1             44         0                 0

> Snapshot repair blocks for ever if something happens to the "I made my snapshot" response
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6415
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Yuki Morishita
>              Labels: repair
>             Fix For: 1.2.13, 2.0.4
>
>         Attachments: 6415-1.2.txt
>
>
> The "snapshotLatch.await();" can wait forever and block all repair
> operations indefinitely if something happens and another node doesn't
> respond.
> {noformat}
>     public void makeSnapshots(Collection<InetAddress> endpoints)
>     {
>         try
>         {
>             snapshotLatch = new CountDownLatch(endpoints.size());
>             IAsyncCallback callback = new IAsyncCallback()
>             {
>                 public boolean isLatencyForSnitch()
>                 {
>                     return false;
>                 }
>                 public void response(MessageIn msg)
>                 {
>                     RepairJob.this.snapshotLatch.countDown();
>                 }
>             };
>             for (InetAddress endpoint : endpoints)
>                 MessagingService.instance().sendRR(new SnapshotCommand(tablename, cfname, sessionName, false).createMessage(), endpoint, callback);
>             snapshotLatch.await();
>             snapshotLatch = null;
>         }
>         catch (InterruptedException e)
>         {
>             throw new RuntimeException(e);
>         }
>     }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
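
For illustration only: the unbounded wait described in the issue comes from calling CountDownLatch.await() with no timeout, so a single lost response leaves the latch above zero forever. A minimal sketch of a bounded wait, using the standard await(long, TimeUnit) overload, might look like the following. This is not the attached 6415-1.2.txt patch and is not taken from the Cassandra code base; the class name BoundedLatchWait, the method awaitResponses, and the 200 ms timeout are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone example; not part of Cassandra.
class BoundedLatchWait
{
    /**
     * Waits until the latch reaches zero or the timeout elapses.
     * Returns true only if every expected response arrived in time.
     */
    public static boolean awaitResponses(CountDownLatch latch, long timeout, TimeUnit unit)
    {
        try
        {
            // Timed overload: returns false on timeout instead of blocking forever.
            return latch.await(timeout, unit);
        }
        catch (InterruptedException e)
        {
            // Restore the interrupt flag and report the wait as incomplete.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args)
    {
        // Three responses are expected, but only two ever arrive.
        CountDownLatch latch = new CountDownLatch(3);
        latch.countDown();
        latch.countDown();

        boolean complete = awaitResponses(latch, 200, TimeUnit.MILLISECONDS);
        if (!complete)
            System.out.println("responses still missing: " + latch.getCount());
    }
}
```

With a bounded wait the caller gets a false return value when a response is lost and can fail the repair job or retry, rather than leaving a thread parked in the stage indefinitely. What the right timeout and failure handling would be for snapshot repair is a design decision this sketch does not address.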