[jira] [Comment Edited] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947501#comment-14947501 ]

Joel Knighton edited comment on CASSANDRA-10205 at 10/7/15 10:05 PM:

I'm +1 on the C* patch and the dtest patch. {{markDead}} makes sense for nodes that have LEFT because we consider LEFT a dead state elsewhere in Gossip.

A note for anyone following: in the dtest, we've switched from hard-stopping the node to stopping it gracefully. This is fine because it exercises the same code path: LEFT is a dead state, so shutting the node down gracefully leaves it LEFT in gossip, and we still go through {{markDead}}.

Sorry for how long this was stuck in limbo.

> decommissioned_wiped_node_can_join_test fails on Jenkins
>
> Key: CASSANDRA-10205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10205
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.0.0 rc2
>
> Attachments: decommissioned_wiped_node_can_join_test.tar.gz
>
> This test passes locally but reliably fails on Jenkins.
> It seems after we restart node4, it is unable to Gossip with other nodes:
> {code}
> INFO [HANDSHAKE-/127.0.0.2] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.2
> INFO [HANDSHAKE-/127.0.0.1] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.1
> INFO [HANDSHAKE-/127.0.0.3] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.3
> ERROR [main] 2015-08-27 06:51:13,785 CassandraDaemon.java:635 - Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1342) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:518) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:763) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) ~[main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [main/:na]
> WARN [StorageServiceShutdownHook] 2015-08-27 06:51:13,799 Gossiper.java:1453 - No local state or state is in silent shutdown, not announcing shutdown
> {code}
> It seems both the addresses and port number of the seeds are correct so I don't think the problem is the Amazon private addresses but I might be wrong.
> It's also worth noting that the first time the node starts up without problems. The problem only occurs during a restart.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
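To make the review comment above concrete, here is a small toy model in Python of what "going through {{markDead}}" means for a peer that sees a decommissioned (LEFT) node stop. It is only a sketch and not Cassandra code: the class {{ToyGossiper}}, its methods, and the {{mark_dead_for_left}} switch are hypothetical names invented for illustration, and the "unpatched" branch is an assumption based on the symptoms described in the comments on this issue (no {{is now DOWN}} log line, sockets left open).

```python
class ToyGossiper:
    """Toy stand-in for one peer's view of the cluster; not Cassandra code."""

    def __init__(self, mark_dead_for_left):
        # Hypothetical switch: True models the behaviour after the C* patch.
        self.mark_dead_for_left = mark_dead_for_left
        self.status = {}           # endpoint -> gossip status string
        self.open_sockets = set()  # endpoints we still hold a connection to
        self.log = []              # lines this peer would write to its log

    def add_peer(self, endpoint):
        self.status[endpoint] = "NORMAL"
        self.open_sockets.add(endpoint)

    def decommission(self, endpoint):
        self.status[endpoint] = "LEFT"

    def mark_dead(self, endpoint):
        # Close the connection and record the notification other tools wait for.
        self.open_sockets.discard(endpoint)
        self.log.append(f"{endpoint} is now DOWN")

    def peer_stopped(self, endpoint):
        # Assumed unpatched behaviour: a LEFT node that stops is ignored.
        if self.status[endpoint] == "LEFT" and not self.mark_dead_for_left:
            return
        self.mark_dead(endpoint)


def stop_decommissioned_node(patched):
    """Decommission node4, stop it, and return the observing peer's state."""
    gossiper = ToyGossiper(mark_dead_for_left=patched)
    gossiper.add_peer("127.0.0.4")
    gossiper.decommission("127.0.0.4")
    gossiper.peer_stopped("127.0.0.4")
    return gossiper


for patched in (False, True):
    peer = stop_decommissioned_node(patched)
    print(patched, peer.log, sorted(peer.open_sockets))
```

In this toy, the patched variant logs the DOWN notification and drops the connection, while the unpatched variant leaves a stale socket behind, which mirrors the restart failure reported in the issue description.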
[jira] [Comment Edited] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736048#comment-14736048 ]

Stefania edited comment on CASSANDRA-10205 at 9/9/15 2:56 AM:

Third CI run also successful. We need a reviewer for the C* patch.

Here is a recap: the fix for the dtest is to add {{wait_other_notice}} when stopping the node after decommissioning (otherwise the test would be flaky). We also need the C* patch to mark the node as dead when stopping a decommissioned node, or else:
* {{wait_other_notice}} will hang because the {{is now DOWN}} notification is missing from the logs, and
* the sockets between processes are not closed, so when the node is restarted it doesn't receive GOSSIP replies.

Once the review is OK we need to back-port the C* patch to 2.0+ since the test fails on all branches.

> decommissioned_wiped_node_can_join_test fails on Jenkins
>
> Key: CASSANDRA-10205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10205
> Project: Cassandra
> Issue Type: Test
> Reporter: Stefania
> Assignee: Stefania
> Attachments: decommissioned_wiped_node_can_join_test.tar.gz
>
> This test passes locally but reliably fails on Jenkins.
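The recap's point about {{wait_other_notice}} hanging can be illustrated with a minimal Python sketch of a "wait until the peer's log shows the node is down" helper. This is not the dtest or ccm implementation: {{wait_for_log}} is a hypothetical helper, the two sample log lines are made up for illustration rather than captured from a real run, and a timeout is added so the example terminates instead of hanging.

```python
import re
import time


def wait_for_log(read_lines, pattern, timeout=1.0, poll=0.05):
    """Poll read_lines() until some line matches pattern or the timeout expires.

    Returns True if a matching line was seen, False on timeout.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        if any(regex.search(line) for line in read_lines()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Illustrative peer logs (assumed wording, not taken from the attached logs).
log_with_notification = ["INFO [GossipStage:1] InetAddress /127.0.0.4 is now DOWN"]
log_without_notification = ["INFO [GossipStage:1] Node /127.0.0.4 state jump to LEFT"]

pattern = r"127\.0\.0\.4.* is now DOWN"

# With the notification present, the wait returns immediately.
print(wait_for_log(lambda: log_with_notification, pattern))
# Without it, the wait can only end by timing out; with no timeout it would hang.
print(wait_for_log(lambda: log_without_notification, pattern, timeout=0.2))
```

A waiter like this depends entirely on the peer emitting the DOWN line, which is why the dtest change and the C* patch have to go in together.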