Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/180#discussion_r102229805
  
    --- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
    @@ -303,15 +305,38 @@ public void loadData() throws IOException, 
InterruptedException {
     
         public void takeSnapshot(){
    --- End diff --
    
    We now have a lot of potential paths to take a snapshot.  JMX, 
SyncRequestProcessor, Admin, and as part of the sync stage in ZABP.  In the 
past it was just the SyncRequestProcessor and ZABP that would trigger 
snapshots.  We didn't have much concern about collisions, because 
SyncRequestProcessor would only run as edits were being sent, and the edits 
would only be sent after the ZABP sync phase had completed.  Now we have Admin 
and JMX possibly doing a snapshot in the background.  
    
    Now if a snapshot request comes in during the ZABP sync phase after we have 
clear out the in memory DB and not yet applied the new snapshot and then we 
crash before we can write out the new snapshot we could end up with data 
corruption.  This should be super super rare so I don't really know if we care 
all that much about it, but I think it is something that we can fix.
    
    I am not an expert on the code, so I am not sure of the cleanest way to fix 
this, but it feels like having maybeTakeSnapshot be a part of the 
SyncRequestProcessor instead of ZooKeeperServer.  I don't know how simple it is 
to get to the SyncRequestProcessor from JMX/Admin but if you can then it means 
that we can reset the edit count so we are not taking a lot of snapshots all at 
once, one right after another.  It also means that we can truly avoid having 
more then one snapshot be taken at any time. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to