[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829949#comment-16829949
 ] 

Shawn Heisey commented on ZOOKEEPER-2348:
-----------------------------------------

I will admit that trying to reconcile what our user has described with the 
description here is making my head hurt.  But it sounds to me like their 
situation and the one described here are at least similar, if not identical.  
Solr is running the ZK 3.4.13 client.  The version info from the user is "For 
context this a cluster running Solr 7.7.1 and ZooKeeper 3.4.13 (being monitored 
by Exhibitor 1.7.1)."  So I think they're running 3.4.13 on the server side as 
well.

Here's the detailed scenario we got:

  *   We have three ZooKeeper nodes: A, B, and C. A is the leader of the 
ensemble.
  *   ZooKeeper A becomes partitioned from ZooKeeper B and C and the Solr tier.
  *   Some Solr nodes log “zkclient has disconnected” warnings and ZooKeeper A 
expires some Solr client sessions due to timeouts. The partition between 
ZooKeeper A and the Solr tier ends, and Solr nodes that were connected to 
ZooKeeper A attempt to renew their sessions but are told their sessions have 
expired. [1]
     *   Note that I’m simplifying: some nodes that were connected to ZooKeeper 
A were able to move their sessions to ZooKeeper B/C before their session 
expired. [2]
  *   ZooKeeper A realizes it is not synced with ZooKeeper B and C, closes its 
connections with Solr nodes, and apparently remains partitioned from B/C.
  *   ZooKeeper B and C eventually elect ZooKeeper B as the leader and start 
accepting write requests as they form a quorum.
  *   Solr nodes previously connected to ZooKeeper that had their sessions 
expire now connect to ZooKeeper B and C, successfully publish their state 
as DOWN, and then attempt to write to /live_nodes to signal that they’re 
reconnected to ZooKeeper.
  *   The writes of the ephemeral znodes to /live_nodes fail with NodeExists 
exceptions [3]. The failed writes are logged on ZooKeeper B. [4] (See the 
sketch after this list for how a client might detect that stale-entry case.)
     *   It looks like a failure mode of “leader becomes partitioned and 
ephemeral znode deletions are not processed by followers” is documented on 
ZOOKEEPER-2348<https://jira.apache.org/jira/browse/ZOOKEEPER-2348>.
  *   ZooKeeper A eventually rejoins the ensemble, and the /live_nodes entries 
for sessions that expired after the initial partition are removed when the 
session expirations are reprocessed on the new leader (ZooKeeper B). [5]
  *   The Solr nodes whose attempts at writing to /live_nodes failed never try 
again and remain in the GONE state for 6+ hours.
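
As I understand it, the situation the Solr nodes hit is that the /live_nodes 
entry still exists on B/C but is owned by the client's old, expired session. 
A client could in principle detect that case before giving up. This is only a 
hypothetical illustration (the isStaleLiveNode name and the direct ZooKeeper 
calls are mine, not Solr's actual code):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Hypothetical helper: returns true if a node exists at 'path' but is an
    // ephemeral owned by some other session (e.g. the client's previous,
    // now-expired session) rather than by the current session.
    static boolean isStaleLiveNode(ZooKeeper zk, String path) throws Exception {
        Stat stat = zk.exists(path, false);
        if (stat == null) {
            return false;                       // nothing there, nothing stale
        }
        long owner = stat.getEphemeralOwner();  // 0 for persistent znodes
        return owner != 0 && owner != zk.getSessionId();
    }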

I think there's probably some work we can do in Solr to improve how we manage 
the ephemeral node creation so it's more robust.
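
For example, one more defensive approach (just a sketch with hypothetical 
names, not how Solr does it today) would be to treat NodeExists as "possibly 
stale" rather than fatal: if the existing node is owned by another session, 
delete it and retry the create instead of giving up:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Hypothetical sketch: register an ephemeral /live_nodes-style entry,
    // recovering from a stale node left behind by an expired session.
    static void registerLiveNode(ZooKeeper zk, String path, byte[] data)
            throws KeeperException, InterruptedException {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return;                                  // registered successfully
            } catch (KeeperException.NodeExistsException e) {
                Stat stat = zk.exists(path, false);
                if (stat == null) {
                    continue;                            // node vanished; retry the create
                }
                if (stat.getEphemeralOwner() == zk.getSessionId()) {
                    return;                              // we already own it
                }
                try {
                    zk.delete(path, stat.getVersion());  // remove the stale entry
                } catch (KeeperException.NoNodeException ignored) {
                    // someone else removed it first; loop and retry the create
                }
            }
        }
        throw new IllegalStateException("could not register " + path + " after retries");
    }

That wouldn't fix the server-side inconsistency described in this issue, but 
it might keep a node from sitting in the stuck GONE state for hours.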

> Data between leader and followers are not synchronized.
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-2348
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2348
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.1
>            Reporter: Echo Chen
>            Priority: Major
>
> When a client session expired, the leader tried to remove it from the session map 
> and remove its EPHEMERAL znode, for example /test_znode. This operation succeeded 
> on the leader, but at that very moment a network fault happened, the change was not 
> synced to the followers, and a new leader election was launched. After the leader 
> election finished, the new leader was not the old leader. We found that the znode 
> /test_znode still existed on the followers but not on the leader.
>  *Scenario:* 
> 1) Create a znode, e.g. 
> {{/rmstore/ZKRMStateRoot/RMAppRoot/application_1449644945944_0001/appattempt_1449644945944_0001_000001}}
> 2) Delete the znode. 
> 3) Network fault between the follower and leader machines.
> 4) Leader election happens again and a follower becomes the leader.
> Now the data is not synced with the new leader. After this, the client is not able 
> to create the same znode.



