[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906001#comment-15906001 ]

Arijit commented on CASSANDRA-13308:

My workaround for now is to delete hint files for a node before starting Cassandra and running "nodetool decommission" on it (since it is taking quite long). Does that sound legitimate?

> Hint files not being deleted on nodetool decommission
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13308
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13308
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: Using Cassandra version 3.0.9
>            Reporter: Arijit
>         Attachments: 28207.stack, logs, logs_decommissioned_node
>
>
> How to reproduce the issue I'm seeing:
> Shut down Cassandra on one node of the cluster and wait until we accumulate a
> ton of hints. Start Cassandra on the node and immediately run "nodetool
> decommission" on it.
> The node streams its replicas and marks itself as DECOMMISSIONED, but other
> nodes do not seem to see this message. "nodetool status" shows the
> decommissioned node in state "UL" on all other nodes (it is also present in
> system.peers), and Cassandra logs show that gossip tasks on nodes are not
> proceeding (number of pending tasks keeps increasing). Jstack suggests that a
> gossip task is blocked on hints dispatch (I can provide traces if this is not
> obvious). Because the cluster is large and there are a lot of hints, this is
> taking a while.
> On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint
> files for the decommissioned node. Documentation seems to suggest that these
> hints should be deleted during "nodetool decommission", but it does not seem
> to be the case here. This is the bug being reported.
> To recover from this scenario, if I manually delete hint files on the nodes,
> the hints dispatcher threads throw a bunch of exceptions and the
> decommissioned node is now in state "DL" (perhaps it missed some gossip
> messages?). The node is still in my "system.peers" table.
> Restarting Cassandra on all nodes after this step does not fix the issue (the
> node remains in the peers table). In fact, after this point the
> decommissioned node is in state "DN".

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
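The inspection of "/var/lib/cassandra/hints" mentioned in the description can be scripted. The following is only a rough sketch, not part of Cassandra: the class {{HintFileLister}} and its methods are hypothetical, and it assumes that hint files are named "<hostId>-<timestamp>-<version>.hints", which should be checked against the files actually present on disk. It only lists files and reports their sizes; it deliberately does not delete anything, since the description reports that deleting hint files under a running node made the dispatcher threads throw exceptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical helper, not part of Cassandra.
class HintFileLister {

    // ASSUMPTION: hint files are named "<hostId>-<timestamp>-<version>.hints".
    static boolean belongsTo(String fileName, UUID hostId) {
        return fileName.startsWith(hostId.toString() + "-") && fileName.endsWith(".hints");
    }

    // Returns the hint files in hintsDir whose name starts with the given host ID.
    static List<Path> hintFilesFor(Path hintsDir, UUID hostId) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(hintsDir, "*.hints")) {
            for (Path file : files) {
                if (belongsTo(file.getFileName().toString(), hostId)) {
                    matches.add(file);
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: HintFileLister <hints-directory> <host-id>");
            return;
        }
        Path hintsDir = Paths.get(args[0]);
        UUID hostId = UUID.fromString(args[1]);

        long totalBytes = 0;
        for (Path file : hintFilesFor(hintsDir, hostId)) {
            long size = Files.size(file);
            totalBytes += size;
            System.out.println(file + "  " + size + " bytes");
        }
        System.out.println("total: " + totalBytes + " bytes");
    }
}
```

It would be run on each remaining node with the hints directory and the host ID of the decommissioned node as arguments (the host ID is the one shown by "nodetool status").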
[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903097#comment-15903097 ]

Aleksey Yeschenko commented on CASSANDRA-13308:

We don't need to. I guess reusing {{completeDispatchBlockingly}} there was chosen as an option to simplify dealing with leftovers, to avoid the race between hints still replaying and dropping the files for the departing node.

What we minimally need to do is to cancel blockingly - rather than wait for completion - and then remove the leftovers (excise).
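The "cancel blockingly, then excise" idea from the comment above can be illustrated with plain java.util.concurrent. This is a minimal sketch under stated assumptions, not the Cassandra patch: {{CancelThenExciseDemo}}, {{cancelAndExcise}} and the {{removeLeftovers}} callback are hypothetical names, and the replay loop is a stand-in for real hint dispatch. One detail it tries to show is that {{Future.cancel(true)}} returns immediately, so a separate latch is used here to wait until the dispatch thread has really stopped before the leftovers are removed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not Cassandra code.
class CancelThenExciseDemo {

    // Cancels a running "dispatch", waits until it has actually stopped,
    // then runs the cleanup. Returns true if the dispatch ended by cancellation.
    static boolean cancelAndExcise(Runnable removeLeftovers) {
        ExecutorService dispatchExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch stopped = new CountDownLatch(1);
        try {
            Future<?> dispatch = dispatchExecutor.submit(() -> {
                try {
                    started.countDown();
                    // Stand-in for a long hint replay that honours interruption.
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(50);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    stopped.countDown();
                }
            });

            started.await();          // make sure the dispatch is really running
            dispatch.cancel(true);    // interrupt it instead of waiting for completion
            stopped.await();          // the "blocking" part: wait until it has stopped
            removeLeftovers.run();    // only now drop the files for the departing node
            return dispatch.isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while cancelling dispatch", e);
        } finally {
            dispatchExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean cancelled = cancelAndExcise(() -> System.out.println("removing leftover hint files"));
        System.out.println("dispatch cancelled: " + cancelled);
    }
}
```

Waiting for the dispatch thread to stop before removing files is what avoids the race between replay and file deletion that the comment mentions; the wait is bounded by how quickly the replay reacts to interruption rather than by how many hints remain.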
[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902609#comment-15902609 ]

Jeff Jirsa commented on CASSANDRA-13308:

{code}
Thread 28548: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
 - java.util.concurrent.FutureTask.awaitDone(boolean, long) @bci=165, line=429 (Compiled frame)
 - java.util.concurrent.FutureTask.get() @bci=13, line=191 (Compiled frame)
 - org.apache.cassandra.hints.HintsDispatchExecutor.completeDispatchBlockingly(org.apache.cassandra.hints.HintsStore) @bci=22, line=112 (Interpreted frame)
 - org.apache.cassandra.hints.HintsService.excise(java.util.UUID) @bci=75, line=323 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.excise(java.util.Collection, java.net.InetAddress) @bci=35, line=2229 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.excise(java.util.Collection, java.net.InetAddress, long) @bci=9, line=2242 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.handleStateLeft(java.net.InetAddress, java.lang.String[]) @bci=58, line=2146 (Interpreted frame)
 - java.util.concurrent.ConcurrentHashMap.get(java.lang.Object) @bci=1, line=936 (Compiled frame)
 - org.apache.cassandra.gms.Gossiper.getEndpointStateForEndpoint(java.net.InetAddress) @bci=5, line=817 (Compiled frame)
 - org.apache.cassandra.service.StorageService.onChange(java.net.InetAddress, org.apache.cassandra.gms.ApplicationState, org.apache.cassandra.gms.VersionedValue) @bci=418, line=1685 (Compiled frame)
 - org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(java.net.InetAddress, org.apache.cassandra.gms.ApplicationState, org.apache.cassandra.gms.VersionedValue) @bci=38, line=1200 (Compiled frame)
 - org.apache.cassandra.gms.Gossiper.applyNewStates(java.net.InetAddress, org.apache.cassandra.gms.EndpointState, org.apache.cassandra.gms.EndpointState) @bci=164, line=1183 (Compiled frame)
 - org.apache.cassandra.gms.Gossiper.applyStateLocally(java.util.Map) @bci=366, line=1146 (Compiled frame)
 - org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(org.apache.cassandra.net.MessageIn, int) @bci=143, line=58 (Compiled frame)
 - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=82, line=67 (Compiled frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
{code}

{{excise}} [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L287-L327] attempts to complete the running dispatch if it exists (for example, if the host was just down, but came up, and hint delivery is in progress), even though that endpoint is going away (was just decom'd).

[~iamaleksey] - I'm not very familiar with this code - are we really gaining much from this? Do we need to block trying to deliver hints we know aren't going to be deliverable, risking getting into this situation where we're blocking waiting for {{isHostAlive()}} to finally fail (which won't happen if Gossip is blocked and thus FD won't kick in), when the very next thing we do is {{exciseStore()}}?
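The situation in the stack trace above, a gossip task parked in {{FutureTask.get()}} while waiting for hint dispatch, can be reproduced in miniature. This is a toy model with hypothetical names ({{BlockedGossipStageDemo}}, {{pendingAfterBlocking}}), not Cassandra code, and it assumes the gossip stage behaves like a single-threaded executor. It shows why the number of pending tasks keeps increasing: once the only worker blocks on the dispatch future, every later task just queues up behind it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model, not Cassandra code.
class BlockedGossipStageDemo {

    // Blocks the single "gossip" worker on a long-running "dispatch" future,
    // submits more gossip tasks, and returns how many are left pending.
    static int pendingAfterBlocking(int tasksSubmitted) {
        ExecutorService hintsDispatch = Executors.newSingleThreadExecutor();
        // ASSUMPTION: the gossip stage is modelled as one thread with an unbounded queue.
        ThreadPoolExecutor gossipStage = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch dispatchMayFinish = new CountDownLatch(1);
        CountDownLatch gossipBlocked = new CountDownLatch(1);
        try {
            // Stand-in for a hint dispatch that takes a very long time.
            Future<?> dispatch = hintsDispatch.submit(() -> {
                dispatchMayFinish.await();
                return null;
            });

            // Stand-in for the state-change handler that waits for the dispatch.
            gossipStage.execute(() -> {
                gossipBlocked.countDown();
                try {
                    dispatch.get();
                } catch (InterruptedException | ExecutionException e) {
                    Thread.currentThread().interrupt();
                }
            });

            gossipBlocked.await();
            for (int i = 0; i < tasksSubmitted; i++) {
                gossipStage.execute(() -> { });   // later gossip messages
            }
            return gossipStage.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while setting up the demo", e);
        } finally {
            dispatchMayFinish.countDown();        // let everything drain and exit
            hintsDispatch.shutdown();
            gossipStage.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pending gossip tasks: " + pendingAfterBlocking(5));
    }
}
```

In this model none of the queued tasks can run until the dispatch finishes, which matches the concern raised above: anything that depends on gossip making progress is stuck behind the same blocked thread.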
[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902506#comment-15902506 ]

Arijit commented on CASSANDRA-13308:

The stack and "logs" were for a non-leaving node. The "logs_decommissioned_node" file was for the leaving node. If you look at the timestamps, you will see that at 06:04:33 the leaving node says DECOMMISSIONED, but the "logs" file shows hinted handoff occurring at 07:01:43. The host id in the hints file corresponds to that of the leaving node.

And you are correct! The cluster had a history of stopping Cassandra on nodes for a while before starting and running "nodetool decommission" on them. I believe this was done a few times before, and it caused the same condition described above at least twice. The nodes might have been down for several hours before the decommission.
[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902439#comment-15902439 ]

Jeff Jirsa commented on CASSANDRA-13308:

Definitely not CASSANDRA-12281. I'm not sure how you're getting 3G of hints on 2G of data.

The stack and both logs you uploaded were for the leaving node, yes? Had you decommissioned another node in the recent-ish past (before you decommissioned this node)?
[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902369#comment-15902369 ]

Arijit commented on CASSANDRA-13308:

Thanks for looking into this!

The cluster is 10 nodes in size, with about 2 GB of data on each node right now. Surprisingly, though, when this happened yesterday, I saw that nodes on average had 500 MB of hints for the decommissioned node, with one node storing 3 GB of hints. I don't think there were any range movements happening.

I would guess that this is not CASSANDRA-12281, since I don't see the stack trace for that bug in my jstack output. I've attached the jstack output (the relevant threads, from what I could figure out, are 28548 and 5832) and a snippet of the log messages during this time.

I didn't think to look at `nodetool netstats`, but it looked like hinted handoff was happening, albeit slowly (a 100 MB file was getting replayed every 30 minutes according to the logs, even though the node was decommissioned). The streaming for decommission must have completed, given that the logs on the node said it was DECOMMISSIONED?
[jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902244#comment-15902244 ]

Jeff Jirsa commented on CASSANDRA-13308:

Just tried a trivial repro using 3.0.11 and ccm, didn't reproduce (not really surprising).

How big (approximately) is the cluster? Do you have any other range movements happening at the same time? Can you post the jstack with the blocked gossip thread? How much data did you have on the nodes? Did the streams actually finish (do you see the streams complete in {{nodetool netstats}})?

We've seen some other recent bugs where gossip gets blocked (CASSANDRA-12281, for example), so I'm curious if you can still reproduce on 3.0.11.