[ 
https://issues.apache.org/jira/browse/CASSANDRA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346387#comment-14346387
 ] 

Stefania edited comment on CASSANDRA-7816 at 3/4/15 4:40 AM:
-------------------------------------------------------------

Submitting a patch for 2.0, cassandra_7816.txt.

The duplicate DOWN notification is caused by 
{{Gossiper.handleMajorStateChange}} passing the remote endpoint state to 
{{StorageService.onRestart}}, which then incorrectly concludes that the node 
was not previously marked down. I changed it to pass the local state instead, 
if it is not null; if it is null, we do not call {{onRestart}} at all. Please 
confirm that this does not introduce problems (I checked all {{onRestart}} 
implementations and they look OK to me).
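The DOWN case can be illustrated with a small, hypothetical Java model. The {{RestartSketch}} class, its {{State}} stand-in and its method names are invented for illustration and are not Cassandra's classes. The sketch assumes, as the explanation above implies, that {{onRestart}} pushes DOWN only when the state it is handed still shows the node as alive, and that a freshly received remote state shows the node as alive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the restart path; names only loosely
// mirror Gossiper/StorageService and are NOT Cassandra's actual API.
public class RestartSketch {
    // Stand-in for EndpointState: only the liveness flag matters here.
    static final class State {
        final boolean alive;
        State(boolean alive) { this.alive = alive; }
    }

    // Models the onRestart check: a DOWN event is pushed only if the
    // state we are handed still shows the node as alive.
    static void onRestart(String endpoint, State state, List<String> events) {
        if (state.alive)
            events.add("DOWN " + endpoint);
    }

    // Buggy variant: hands over the remote state (assumed alive), so DOWN is
    // pushed again even though the local view already marked the node down.
    static void majorChangeBuggy(String endpoint, State local, State remote, List<String> events) {
        onRestart(endpoint, remote, events);
    }

    // Fixed variant: hands over the local state, and skips onRestart if there is none.
    static void majorChangeFixed(String endpoint, State local, State remote, List<String> events) {
        if (local != null)
            onRestart(endpoint, local, events);
    }

    public static void main(String[] args) {
        State local = new State(false);  // node already marked DOWN locally
        State remote = new State(true);  // fresh state gossiped by the restarted node

        List<String> buggy = new ArrayList<>();
        majorChangeBuggy("10.0.0.2", local, remote, buggy);
        System.out.println("buggy: " + buggy); // prints "buggy: [DOWN 10.0.0.2]"

        List<String> fixed = new ArrayList<>();
        majorChangeFixed("10.0.0.2", local, remote, fixed);
        System.out.println("fixed: " + fixed); // prints "fixed: []"
    }
}
```

In this model the second DOWN disappears simply because the liveness check now reads the view that already recorded the first DOWN.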

The multiple UP notifications are caused by the call to {{markAlive()}} in 
{{Gossiper.applyStateLocally()}} when multiple gossip messages are received. 
Because {{markAlive()}} only marks the node as alive after receiving a reply to 
an echo message (CASSANDRA-3533), there is a delay during which the node is 
still not marked as alive. If gossip messages are received during this period, 
{{applyStateLocally()}} incorrectly calls {{markAlive()}} multiple times. I 
fixed it by adding a flag to {{EndpointState}} and checking this flag in 
{{markAlive()}}: if an echo is outstanding, we do not send another one until 
we have received an answer. When there is a major state change, 
{{markAlive()}} is called on the remote state, for which this flag is not set, 
so {{markAlive()}} sends another echo message even if we did not receive a 
reply to a previous echo request.
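The echo-flag fix can be sketched with another hypothetical Java model. {{EchoSketch}}, {{echoPending}}, {{echoesInFlight}} and {{deliverReplies}} are illustrative names, not the patch's actual fields or methods. The model assumes, following the explanation above, that every echo sent eventually gets a reply and that each reply marks the node alive and pushes an UP event, so each extra echo turns into an extra UP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of markAlive() and the echo round trip;
// names are illustrative and NOT Cassandra's actual API.
public class EchoSketch {
    static final class State {
        boolean alive = false;
        boolean echoPending = false; // the new flag: an echo is awaiting its reply
    }

    private final boolean useFlag;
    private int echoesInFlight = 0;
    final List<String> upEvents = new ArrayList<>();

    EchoSketch(boolean useFlag) { this.useFlag = useFlag; }

    // Invoked for each gossip message about a node that is not yet alive.
    void markAlive(State state) {
        if (useFlag && state.echoPending)
            return;                 // a reply is still outstanding: send no new echo
        state.echoPending = true;
        echoesInFlight++;           // stands in for sending an echo message
    }

    // Delivers every outstanding echo reply; in this model each reply marks
    // the node alive and pushes an UP event.
    void deliverReplies(String endpoint, State state) {
        for (; echoesInFlight > 0; echoesInFlight--) {
            state.echoPending = false;
            state.alive = true;
            upEvents.add("UP " + endpoint);
        }
    }

    // Three gossip messages arrive before any echo reply comes back.
    static List<String> simulate(boolean useFlag) {
        EchoSketch gossiper = new EchoSketch(useFlag);
        State state = new State();
        for (int i = 0; i < 3; i++)
            if (!state.alive)
                gossiper.markAlive(state);
        gossiper.deliverReplies("10.0.0.2", state);
        return gossiper.upEvents;
    }

    public static void main(String[] args) {
        System.out.println("without flag: " + simulate(false).size() + " UP events"); // 3
        System.out.println("with flag:    " + simulate(true).size() + " UP events");  // 1
    }
}
```

The caveat in the last sentence of the comment follows from the same model: after a major state change, {{markAlive()}} runs against a fresh remote {{State}} whose {{echoPending}} is false, so another echo goes out even if the earlier one is still unanswered.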



> Duplicate DOWN/UP Events Pushed with Native Protocol
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7816
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7816
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Michael Penick
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 2.0.13, 2.1.4
>
>         Attachments: cassandra_7816.txt, tcpdump_repeating_status_change.txt, trunk-7816.txt
>
>
> Added "MOVED_NODE" as a possible type of topology change and also specified 
> that it is possible to receive the same event multiple times.
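Because the protocol description now allows the same event to arrive more than once, a client may want to tolerate repeats. The following is a minimal, hypothetical Java sketch of a client-side filter that ignores a status event when it merely repeats the last status seen for that host; {{StatusEventFilter}} and {{accept}} are invented names, not part of any real driver API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side filter for repeated UP/DOWN events; not a real driver API.
public class StatusEventFilter {
    // Last status ("UP" or "DOWN") seen for each host.
    private final Map<String, String> lastStatus = new HashMap<>();

    // Returns true if the event changes the known status and should be handled,
    // false if it repeats the previous status for that host.
    public synchronized boolean accept(String host, String status) {
        String previous = lastStatus.put(host, status);
        return !status.equals(previous);
    }

    public static void main(String[] args) {
        StatusEventFilter filter = new StatusEventFilter();
        String[][] events = {
            {"10.0.0.2", "DOWN"}, {"10.0.0.2", "DOWN"},
            {"10.0.0.2", "UP"}, {"10.0.0.2", "UP"}, {"10.0.0.2", "UP"}
        };
        for (String[] e : events)
            System.out.println(e[0] + " " + e[1] + " -> "
                + (filter.accept(e[0], e[1]) ? "handled" : "ignored"));
    }
}
```

Keeping only the last status per host is enough to drop the duplicates described in this issue; it does not try to detect a genuine DOWN/UP flap that the client missed in between.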



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
