Hinted handoffs aren't delivered if/when HintedHandOffManager ends up in an invalid state.
--------------------------------------------------------------------------------------
Key: CASSANDRA-3546
URL: https://issues.apache.org/jira/browse/CASSANDRA-3546
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.3
Reporter: Fredrik L Stigbäck

Running Cassandra 1.0.3. I've done some testing with 2 nodes (node A, node B) and replication factor 2. I take node A down, write some data to node B, and then bring node A back up. Sometimes hints aren't delivered when node A comes up.

I've done some debugging in org.apache.cassandra.db.HintedHandOffManager, and sometimes node B ends up in a strange state in the method org.apache.cassandra.db.HintedHandOffManager.deliverHints(final InetAddress to): org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries already contains node A, so queuedDeliveries.add(to) returns false, deliverHints returns early, and no hints will ever be delivered to node A.

The only explanation I can see is in org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(InetAddress endpoint): when the hintStore.isEmpty() check returns true, the method returns before entering the try block, so the finally clause never runs and the endpoint (node A) is never removed from queuedDeliveries. After that, no hints will be delivered to node A again until node B is restarted.

Under what conditions will hintStore.isEmpty() return true? Shouldn't the hintStore.isEmpty() check be inside the try {} finally {} clause, so that the endpoint is always removed from queuedDeliveries in the finally block?

{code}
public void deliverHints(final InetAddress to)
{
    logger_.debug("deliverHints to {}", to);
    if (!queuedDeliveries.add(to))
        return;
    .......
}
{code}

{code}
private void deliverHintsToEndpoint(InetAddress endpoint) throws IOException, DigestMismatchException, InvalidRequestException, TimeoutException, InterruptedException
{
    ColumnFamilyStore hintStore = Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
    if (hintStore.isEmpty())
        return; // nothing to do, don't confuse users by logging a no-op handoff
    try
    {
        ......
    }
    finally
    {
        queuedDeliveries.remove(endpoint);
    }
}
{code}
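To make the failure mode concrete, here is a minimal, hypothetical Java model of the guard pattern, not Cassandra's code. The class `HintDeliverySketch`, the methods `deliverAsReported` and `deliverFixed`, and the `BooleanSupplier` standing in for `hintStore.isEmpty()` are all illustrative assumptions. Endpoints are plain strings and delivery runs synchronously for simplicity. The sketch contrasts the reported shape (early return before `try`) with the suggested shape (check inside `try`).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of the pattern in the report; not Cassandra code.
// The hint store is reduced to a BooleanSupplier answering "is it empty?".
class HintDeliverySketch {
    // Endpoints with a delivery in flight; plays the role of queuedDeliveries.
    final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    private final BooleanSupplier hintStoreIsEmpty;
    private final boolean fixed;
    int deliveredRounds = 0; // counts how many times "delivery" actually ran

    HintDeliverySketch(BooleanSupplier hintStoreIsEmpty, boolean fixed) {
        this.hintStoreIsEmpty = hintStoreIsEmpty;
        this.fixed = fixed;
    }

    // Like deliverHints(): returns false if a delivery for `to` is already queued.
    boolean deliverHints(String to) {
        if (!queuedDeliveries.add(to)) {
            return false;
        }
        if (fixed) {
            deliverFixed(to);
        } else {
            deliverAsReported(to);
        }
        return true;
    }

    // Shape from the report: the early return sits before the try block,
    // so the finally clause never runs and `endpoint` stays queued forever.
    private void deliverAsReported(String endpoint) {
        if (hintStoreIsEmpty.getAsBoolean()) {
            return;
        }
        try {
            deliveredRounds++; // stand-in for the actual hint delivery
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }

    // Shape suggested in the report: the emptiness check moves inside try,
    // so every path through the method removes `endpoint` in finally.
    private void deliverFixed(String endpoint) {
        try {
            if (hintStoreIsEmpty.getAsBoolean()) {
                return;
            }
            deliveredRounds++; // stand-in for the actual hint delivery
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }

    public static void main(String[] args) {
        boolean[] empty = {true};
        HintDeliverySketch reported = new HintDeliverySketch(() -> empty[0], false);
        reported.deliverHints("nodeA"); // store empty: early return, nodeA stays queued
        empty[0] = false;               // hints are written while nodeA is down
        System.out.println(reported.deliverHints("nodeA")); // prints false: delivery skipped
    }
}
```

With the reported shape, the second `deliverHints("nodeA")` call is rejected even though hints now exist. With the emptiness check inside `try`, `finally` releases the endpoint on every path, so the second call proceeds and delivers.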