-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71354>

    The comment could mention that Fate has not been started yet.
    
    Could add a sanity check to ensure Fate was not started.
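    A minimal sketch of what such a sanity check might look like. It assumes
    a hypothetical FateStartGuard that the Fate startup code flips just before
    Fate accepts work; neither the class nor its methods exist in Accumulo.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical guard (not part of Accumulo): lets upgrade code assert that
 * Fate has not been started yet, so no FATE ops can be in flight.
 */
class FateStartGuard {
  // Flipped exactly once, by the (hypothetical) code that starts Fate.
  private final AtomicBoolean fateStarted = new AtomicBoolean(false);

  /** Called immediately before Fate begins accepting work. */
  void markFateStarted() {
    if (!fateStarted.compareAndSet(false, true)) {
      throw new IllegalStateException("Fate was started twice");
    }
  }

  /** Sanity check for upgrade steps: fail fast if Fate is already running. */
  void checkFateNotStarted(String step) {
    if (fateStarted.get()) {
      throw new IllegalStateException(step + " must run before Fate is started");
    }
  }
}
```

    Failing fast with an exception turns a silent ordering bug into a clear
    startup error instead of a half-finished upgrade.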



server/src/main/java/org/apache/accumulo/server/master/Master.java
<https://reviews.apache.org/r/19804/#comment71353>

    I think this check can cause problems. Master.run() starts StatusThread,
    and StatusThread.run() will indirectly call upgradeMetadata(). After
    Master.run() starts StatusThread, it seems like it will start Fate and the
    client service. So it's possible that a 1.5 client could submit a Fate op
    before upgradeMetadata() is called.
    
    Also, this check is probably not needed: upgradeZookeeper() should be
    called before upgradeMetadata(). Could add a sanity check for this.
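    A rough sketch of that ordering check, assuming a hypothetical
    UpgradeSequence wrapper whose method names mirror the ones above; the
    bodies are placeholders, not the real Master code.

```java
/**
 * Hypothetical sketch (not Accumulo code): upgradeMetadata() refuses to run
 * unless upgradeZookeeper() has already completed.
 */
class UpgradeSequence {
  // volatile so a status thread sees the flag set by the master's main thread
  private volatile boolean zookeeperUpgraded = false;

  void upgradeZookeeper() {
    // ... the real ZooKeeper upgrade work would go here ...
    zookeeperUpgraded = true;
  }

  void upgradeMetadata() {
    if (!zookeeperUpgraded) {
      throw new IllegalStateException(
          "upgradeMetadata() called before upgradeZookeeper()");
    }
    // ... the real metadata upgrade work would go here ...
  }
}
```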
    



server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java
<https://reviews.apache.org/r/19804/#comment71357>

    Seems like there is a possibility of deadlock here.
    
     1. Master gets past upgradeZookeeper()
     2. Client submits a FATE op
     3. Tablet server aborts instead of copying its walogs up to HDFS
     4. Master cannot run upgradeMetadata() because log recovery is needed,
        so it is stuck.
    
    This assumes that what I said in the previous comment, about Fate starting
    after upgradeZookeeper(), is right. Need to confirm this.
    
    Some possible options:
    
     * prevent Fate from starting until the upgrade is complete
     * only abort if there are FATE txs and upgradeZookeeper() has not run;
       would need to look for something that upgradeZookeeper() changes
     * don't delete walogs after the copy if the upgrade is not complete;
       however, they would then need to be deleted later
    
    I'll think about this some more later.
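    A possible shape for the first option, as a sketch only: it assumes a
    hypothetical UpgradeGate that the upgrade code opens once both
    upgradeZookeeper() and upgradeMetadata() succeed, and that the code
    starting Fate and the client service waits on before proceeding.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical gate (not Accumulo code): Fate startup blocks until the
 * upgrade has finished, so no client can submit a FATE op mid-upgrade.
 */
class UpgradeGate {
  private final CountDownLatch upgradeComplete = new CountDownLatch(1);

  /** Called once both upgrade steps have completed successfully. */
  void markUpgradeComplete() {
    upgradeComplete.countDown();
  }

  /**
   * Called before starting Fate. Returns true if the upgrade finished within
   * the timeout, false if it timed out or the waiting thread was interrupted.
   */
  boolean awaitUpgrade(long timeout, TimeUnit unit) {
    try {
      return upgradeComplete.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status for the caller
      return false;
    }
  }
}
```

    The timed wait lets the startup thread log while it waits instead of
    hanging silently; the trade-off is that anything depending on Fate stays
    unavailable until the upgrade completes.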


- kturner


On March 28, 2014, 9:22 p.m., Sean Busbey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 9:22 p.m.)
> 
> 
> Review request for accumulo and kturner.
> 
> 
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Adds "make sure Fate has no outstanding items" to the upgrade instructions. 
> Makes sure the master and tabletservers don't take upgrade steps if they see 
> fate ops waiting.
> 
> 
> Diffs
> -----
> 
>   README 115a9b7 
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4 
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864 
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d 
> 
> Diff: https://reviews.apache.org/r/19804/diff/
> 
> 
> Testing
> -------
> 
> Took a 1.4.5-SNAP cluster
> 
> * triggered compactions
> * shutdown cluster
> * verified waiting transactions
> * verified waiting local WALs
> * verified /accumulo/version showed 4
> * started upgrade to 1.5.2-SNAP
> * verified errors saying no upgrade was done and pointing back to the docs
>   in: monitor, master logs, tabletserver logs
> * verified waiting transactions
> * verified waiting local WALs
> * verified /accumulo/version showed 4
> * cleared Fate operations
> * started upgrade to 1.5.2-SNAP
> * verified no errors were shown for the upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> 
> Running a verify job on the existing data now; it should take ~6 hours. 
> 
> 
> Thanks,
> 
> Sean Busbey
> 
>
