> On March 29, 2014, 12:26 a.m., kturner wrote:
> > server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java, line 3328
> > <https://reviews.apache.org/r/19804/diff/1/?file=539927#file539927line3328>
> >
> > Seems like there is a possibility of deadlock here.
> >
> > 1. Master gets past upgradeZookeeper()
> > 2. Client submits FATE op
> > 3. Tablet server aborts copying walogs up
> > 4. Master can not upgradeMetadata because log recovery is needed, stuck.
> >
> > This is assuming that what I said in prev comment about Fate starting
> > after upgrade zookeeper is right. Need to confirm this.
> >
> > Some possible options:
> >
> > * prevent fate from starting until upgrade is complete
> > * only abort if there are FATE txs and upgradeZookeeper() has not run.
> >   Would need to look for something that upgradeZookeeper() changes.
> > * Don't delete walogs after copy if upgrade is not complete. However
> >   would need to delete later then.
> >
> > I'll think about this some more later.
New patch ensures Fate does not start until after upgrade is complete.

- Sean

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19804/#review38972
-----------------------------------------------------------

On April 2, 2014, 6:06 a.m., Sean Busbey wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19804/
> -----------------------------------------------------------
>
> (Updated April 2, 2014, 6:06 a.m.)
>
>
> Review request for accumulo and kturner.
>
>
> Bugs: ACCUMULO-2519
>     https://issues.apache.org/jira/browse/ACCUMULO-2519
>
>
> Repository: accumulo
>
>
> Description
> -------
>
> Adds "make sure Fate has no outstanding items" to the upgrade instructions.
> Makes sure the master and tabletservers don't take upgrade steps if they see
> fate ops waiting.
>
>
> Diffs
> -----
>
>   README 115a9b7
>   server/src/main/java/org/apache/accumulo/server/Accumulo.java 99ec7e4
>   server/src/main/java/org/apache/accumulo/server/master/Master.java 8c4c864
>   server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java d76946d
>   server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java 7328a55
>
> Diff: https://reviews.apache.org/r/19804/diff/
>
>
> Testing
> -------
>
> Took a 1.4.5-SNAP cluster
>
> * loaded test data in a variety of table configs
> * alternate table creation and deletion
> * load additional table to cause !METADATA churn
> * shutdown cluster uncleanly
> * verified waiting Fate transactions (table deletion at success status)
> * verified waiting local WALs
> * verified waiting local WALs include !METADATA table (via LogReader)
> * verified /accumulo/version showed 4
> * Start upgrade to 1.5.2-SNAP
> * verified errors showing no upgrade and to go back to docs in: monitor,
>   master logs, tabletserver logs
> * verified same waiting Fate transactions
> * verified same waiting local WALs
> * verified /accumulo/version showed 4
> * Cleared Fate operations
> * Start upgrade to 1.5.2-SNAP
> * wait a terrifying long amount of time, check on progress via local logs
> * verify no errors shown for upgrade
> * verified WALs copied to HDFS
> * verified /accumulo/version showed 5
> * verified monitor showed normal start up
> * wait for all tablets to be hosted
> * verify test data
>
>
> Thanks,
>
> Sean Busbey
>
>
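
For readers following the thread, the ordering described above (refuse to upgrade while Fate ops are waiting, and do not start Fate until the upgrade is complete) can be sketched roughly as below. This is only an illustration, not the code in the patch: `UpgradeGate`, `FateStore`, and every method name here are hypothetical and are not Accumulo APIs. The only details taken from the thread are the version numbers 4 and 5 seen in /accumulo/version.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names exist in Accumulo.
final class UpgradeGate {

    /** Hypothetical view of the outstanding Fate transaction ids. */
    interface FateStore {
        List<Long> outstandingTransactions();
    }

    // Data versions as reported by /accumulo/version in the testing notes.
    static final int OLD_VERSION = 4;
    static final int NEW_VERSION = 5;

    private int dataVersion;
    private boolean fateStarted = false;

    UpgradeGate(int dataVersion) {
        this.dataVersion = dataVersion;
    }

    int dataVersion() {
        return dataVersion;
    }

    boolean upgradeComplete() {
        return dataVersion == NEW_VERSION;
    }

    boolean fateStarted() {
        return fateStarted;
    }

    /** Takes no upgrade step at all while Fate transactions are waiting. */
    void upgrade(FateStore store) {
        if (upgradeComplete()) {
            return; // nothing to do
        }
        List<Long> waiting = store.outstandingTransactions();
        if (!waiting.isEmpty()) {
            throw new IllegalStateException("Aborting upgrade: " + waiting.size()
                + " outstanding Fate transaction(s); see the upgrade instructions");
        }
        // The real upgrade steps (ZooKeeper, WAL copy, metadata) would run here.
        dataVersion = NEW_VERSION;
    }

    /** Fate may only start once the upgrade has finished. */
    void startFate() {
        if (!upgradeComplete()) {
            throw new IllegalStateException("Fate must not start before the upgrade is complete");
        }
        fateStarted = true;
    }

    public static void main(String[] args) {
        UpgradeGate gate = new UpgradeGate(OLD_VERSION);
        try {
            gate.upgrade(() -> Arrays.asList(1L)); // one waiting transaction
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        gate.upgrade(Collections::emptyList); // Fate cleared, upgrade proceeds
        gate.startFate();
        System.out.println("version=" + gate.dataVersion() + " fateStarted=" + gate.fateStarted());
    }
}
```

Because Fate cannot accept new operations until `upgradeComplete()` is true, a client cannot submit a Fate op between the ZooKeeper upgrade and the metadata upgrade, which is the window behind the deadlock scenario raised in the review comment.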
