[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798219#comment-13798219
 ] 

Akshay Chander commented on ZOOKEEPER-1674:
-------------------------------------------

I am working with Thawan on this feature. I'd appreciate comments and 
suggestions for the analysis done so far.

Retaining the database across leader election should improve the recovery time 
after leader election. In order to support such a feature, the following 
changes will be required to ensure that the existing behavior is maintained. 
 
1) Anything that has reached the PrepRequestProcessor should make it to the 
SyncRequestProcessor. Similarly, anything that has reached the CommitProcessor 
should eventually reach the FinalRequestProcessor. To maintain this invariant:
 
a) Currently, we drop the database and reload it from disk (snapshot + txnlog). 
We can effectively mimic this behavior in one of two ways.
    i) We retain outstandingProposals and toBeApplied (in the case of the 
leader) or pendingTxns (in the case of followers) across the leader election. 
We then apply the txns in these data structures to the data tree before 
calling getInitLastLoggedZxid in lookForLeader(). This ensures that the 
lastSeenZxid sent by the participant during leader election remains the same 
as before this feature.
    ii) Alternatively, we could apply these txns to the data tree during the 
shutdown phase. This way, we don't need to do the extra work of persisting 
these data structures across leader elections.
 
b) During shutdown, we should ensure that all appends to the txnlog have 
actually been flushed to the disk.
 
c) By retaining the zkDataBase, we will also be retaining 
sessionsWithTimeouts, which is a listing of global sessions. We need to ensure 
that this listing is still clean after the leader election.
    Leader: If there is an upgrade request for a session (from local to 
global), we add it to the global session tracker. Since this is going to 
persist across leader election, we need to ensure that the txn corresponding 
to this createSession is present in at least the txnlog. Therefore we need to 
ensure that requests that are in the PrepRequestProcessor make their way to 
the SyncRequestProcessor even if there is a shutdown at any point in between.
 
d) Ensure that anything in the FinalRequestProcessor gets applied to the Data 
Tree.
 
2) Don't take a dirty snapshot. We don't want txns that haven't been accepted 
by a majority of the quorum to be part of any snapshot. Currently, we take 
snapshots on shutdown and in loadData, which we will stop doing.
 
3) In followers, there is a bug in the local session code. When there is an 
upgrade request, we currently remove the session from the local session 
tracker and add it to globalSessionWithTimeouts in the local request processor 
itself (checkUpgradeSession). We probably should not add it to the global 
sessions at that point, and should instead let that happen in the final 
request processor.
 
4) Another small bug: In learnerSessionTracker::touchSession, currently if a 
session is not in the localSessionTracker and is not a global session, then we 
return false. This should no longer be the case, because we may have removed 
the session from the local session tracker for an upgrade request. So we just 
add it to the touchTable and return true.
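
As a rough illustration of (3) and (4), here is a minimal sketch of the 
proposed touchSession behavior. It is not the actual ZooKeeper code: the class 
TouchSessionSketch and the helpers startUpgrade, commitUpgrade, isGlobal and 
hasPendingTouch are hypothetical names, and the maps only stand in for the 
local tracker, the global sessions and the touchTable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the learner-side session tracker; the class and
// its helper methods are illustrative, not ZooKeeper's actual code.
class TouchSessionSketch {
    private final Map<Long, Integer> localSessions = new HashMap<>();
    private final Map<Long, Integer> globalSessionsWithTimeouts = new HashMap<>();
    // Touches that the learner will later forward to the leader.
    private final Map<Long, Integer> touchTable = new HashMap<>();

    synchronized void addLocalSession(long sessionId, int timeout) {
        localSessions.put(sessionId, timeout);
    }

    // Upgrade request seen: the session leaves the local tracker, but it is
    // not added to the global sessions yet (see item 3).
    synchronized void startUpgrade(long sessionId) {
        localSessions.remove(sessionId);
    }

    // Assumed to be called once the createSession txn has reached the final
    // request processor.
    synchronized void commitUpgrade(long sessionId, int timeout) {
        globalSessionsWithTimeouts.put(sessionId, timeout);
    }

    synchronized boolean touchSession(long sessionId, int timeout) {
        if (localSessions.containsKey(sessionId)) {
            // Simplification: a real tracker would also refresh the expiry.
            localSessions.put(sessionId, timeout);
            return true;
        }
        // Either a global session or one that is in the middle of an upgrade:
        // record the touch instead of returning false.
        touchTable.put(sessionId, timeout);
        return true;
    }

    synchronized boolean isGlobal(long sessionId) {
        return globalSessionsWithTimeouts.containsKey(sessionId);
    }

    synchronized boolean hasPendingTouch(long sessionId) {
        return touchTable.containsKey(sessionId);
    }
}
```

In this sketch a session that has left the local tracker but has not yet been 
committed as global is still touched successfully, which is the case (4) is 
meant to cover.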
 
This analysis was done on our internal branch, which is based off 3.4. 
Therefore, we haven't investigated how this feature would be affected by the 
Dynamic Reconfiguration feature.
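
To make option 1(a)(ii) more concrete, here is a minimal sketch of applying 
the pending txns during shutdown. It is a simplified model, not ZooKeeper 
code: RetainedDbSketch, Txn, enqueueLogged and drainOnShutdown are hypothetical 
names, the data tree is reduced to a map, and the sketch assumes every queued 
txn has already been appended to the txnlog.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these names are ZooKeeper's API.
class RetainedDbSketch {
    // A logged txn that sets a path to a value.
    static final class Txn {
        final long zxid;
        final String path;
        final String data;

        Txn(long zxid, String path, String data) {
            this.zxid = zxid;
            this.path = path;
            this.data = data;
        }
    }

    private final Map<String, String> dataTree = new HashMap<>();
    private final Deque<Txn> pendingTxns = new ArrayDeque<>();
    private long lastProcessedZxid = 0L;

    // Assumption: the caller has already appended this txn to the txnlog.
    void enqueueLogged(Txn txn) {
        pendingTxns.addLast(txn);
    }

    // On shutdown, apply everything still pending in FIFO (zxid) order, so
    // that the retained tree matches what a reload from snapshot + txnlog
    // would have produced.
    void drainOnShutdown() {
        Txn txn;
        while ((txn = pendingTxns.pollFirst()) != null) {
            dataTree.put(txn.path, txn.data);
            lastProcessedZxid = txn.zxid;
        }
    }

    long lastProcessedZxid() {
        return lastProcessedZxid;
    }

    String read(String path) {
        return dataTree.get(path);
    }
}
```

After drainOnShutdown() the in-memory zxid equals the zxid of the last queued 
txn, so nothing needs to be carried over in the pending queue across the 
election.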

> There is no need to clear & load the database across leader election
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1674
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1674
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Jacky007
>
> It is interesting to notice this piece of code in QuorumPeer.java
>  /* ZKDatabase is a top level member of quorumpeer 
>      * which will be used in all the zookeeperservers
>      * instantiated later. Also, it is created once on 
>      * bootup and only thrown away in case of a truncate
>      * message from the leader
>      */
>     private ZKDatabase zkDb;
> It was introduced by ZOOKEEPER-596. Now, we just drop the database on every 
> leader election.
> We can keep it safely with ZOOKEEPER-1549.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
