[jira] [Comment Edited] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins

2015-10-07 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947501#comment-14947501
 ] 

Joel Knighton edited comment on CASSANDRA-10205 at 10/7/15 10:05 PM:
-

I'm +1 on C* patch and dtest patch. {{markDead}} makes sense for nodes that 
have LEFT because we consider LEFT a dead state elsewhere in Gossip.

A note for anyone following: in the dtest, we've switched from hard-stopping the 
node to stopping it gracefully. This is fine because it exercises the same 
code path: LEFT is a dead state, so shutting the node down gracefully leaves 
it LEFT in gossip, and we still go through {{markDead}}.
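
For anyone who wants the reasoning above spelled out, here is a minimal toy model of it. It is only a sketch: {{DEAD_STATES}}, {{status_after_stop}} and {{goes_through_mark_dead}} are hypothetical names made up for illustration, and the rule that a node in a dead state does not announce its shutdown is an assumption taken from this comment, not a copy of the Gossiper code.

```python
# Toy model only: the names and rules here are illustrative assumptions,
# not Cassandra's Gossiper code.
DEAD_STATES = {"LEFT"}  # assumption: the comment above only names LEFT


def status_after_stop(status, gently):
    """Gossip status that peers hold for a node after it is stopped.

    Assumption: only a node in a live state announces a graceful shutdown;
    a node in a dead state keeps whatever status it already had.
    """
    if gently and status not in DEAD_STATES:
        return "shutdown"
    return status


def goes_through_mark_dead(status):
    """True if peers would treat the stopped node as dead (the markDead path)."""
    return status in DEAD_STATES


# Compare a hard stop (gently=False) with a graceful stop (gently=True)
# for a decommissioned node whose status is already LEFT.
for gently in (False, True):
    status = status_after_stop("LEFT", gently)
    print(gently, status, goes_through_mark_dead(status))
```

In this model both kinds of stop leave the decommissioned node in LEFT and on the {{markDead}} path, which is why the dtest change should not alter what is being exercised.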

Sorry for how long this was stuck in limbo.


was (Author: jkni):
I'm +1 on C* patch and dtest patch. `markDead` makes sense for nodes that have 
LEFT because we consider LEFT a dead state elsewhere in Gossip.

A note for anyone following: in the dtest, we've switched from hardstopping the 
node to stopping the node gracefully. This is fine since it exercises the same 
code path, since LEFT is a dead state so shutting it down gracefully will leave 
it LEFT in gossip, and we'll go through `markDead`.

Sorry for how long this was stuck in limbo.

> decommissioned_wiped_node_can_join_test fails on Jenkins
> 
>
> Key: CASSANDRA-10205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10205
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.0.0 rc2
>
> Attachments: decommissioned_wiped_node_can_join_test.tar.gz
>
>
> This test passes locally but reliably fails on Jenkins. It seems after we 
> restart node4, it is unable to Gossip with other nodes:
> {code}
> INFO  [HANDSHAKE-/127.0.0.2] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.2
> INFO  [HANDSHAKE-/127.0.0.1] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.1
> INFO  [HANDSHAKE-/127.0.0.3] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.3
> ERROR [main] 2015-08-27 06:51:13,785 CassandraDaemon.java:635 - Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1342) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:518) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:763) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) ~[main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [main/:na]
> WARN  [StorageServiceShutdownHook] 2015-08-27 06:51:13,799 Gossiper.java:1453 - No local state or state is in silent shutdown, not announcing shutdown
> {code}
> Both the addresses and the port number of the seeds appear to be correct, so I 
> don't think the problem is the Amazon private addresses, but I might be wrong. 
> It's also worth noting that the node starts up without problems the first 
> time. The problem only occurs during a restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins

2015-09-08 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736048#comment-14736048
 ] 

Stefania edited comment on CASSANDRA-10205 at 9/9/15 2:56 AM:
--

Third CI run also successful. We need a reviewer for the C* patch. 

Here is a recap: the fix for the dtest is to add {{wait_other_notice}} when 
stopping the node after decommissioning (otherwise the test would be flaky). We 
also need the C* patch to mark the node as dead when stopping a decommissioned 
node, or else:
* {{wait_other_notice}} will hang because the {{is now DOWN}} notification is 
missing from the logs, and 
* the sockets between processes are not closed, so when the node is restarted it 
doesn't receive GOSSIP replies.
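
To illustrate the first point, here is a minimal sketch of the kind of log polling that a wait like {{wait_other_notice}} depends on. It is not the ccm implementation: {{wait_for_log_line}} and its {{read_lines}} callback are hypothetical names, and the timeout is added only so that the sketch fails instead of hanging when the expected line never appears.

```python
import re
import time


def wait_for_log_line(read_lines, pattern, timeout=5.0, poll_interval=0.1):
    """Poll a log until a line matches `pattern`, or give up after `timeout` seconds.

    `read_lines` is a hypothetical callback returning the log lines seen so far.
    Returns the first matching line; raises TimeoutError if none shows up.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        for line in read_lines():
            if regex.search(line):
                return line
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no log line matching %r within %.1fs" % (pattern, timeout)
            )
        time.sleep(poll_interval)
```

If the other nodes never log {{is now DOWN}} for the stopped node, a wait like this has nothing to match; without a deadline it would poll forever, which is the hang described above.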

Once the review is OK we need to back-port the C* patch to 2.0+ since the test 
fails on all branches.


was (Author: stefania):
Third CI run also successful. We need a reviewer for the C* patch. 

Here is a recap: the fix for the dtest is to add {{wait_other_notice}} when 
stopping the node after decommissioning (else the test would be flaky). We 
also need the C* patch to mark the node as dead when stopping a decommissioned 
node or else:
* {{wait_other_notice}] will hang because the {{is now DOWN}} notification is 
missing from the logs and 
* the sockets between processes are not closed so when the node is restarted it 
doesn't receive GOSSIP replies.

Once the review is OK we need to back-port the C* patch to 2.0+ since the test 
fails on all branches.

> decommissioned_wiped_node_can_join_test fails on Jenkins
> 
>
> Key: CASSANDRA-10205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10205
> Project: Cassandra
> Issue Type: Test
> Reporter: Stefania
> Assignee: Stefania
> Attachments: decommissioned_wiped_node_can_join_test.tar.gz
>
>


