[ https://issues.apache.org/jira/browse/ACCUMULO-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner updated ACCUMULO-1243:
-----------------------------------

    Fix Version/s: 1.4.4

> Multiple assignment may occur if tablet server dies during split
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-1243
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1243
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Keith Turner
>            Priority: Critical
>             Fix For: 1.5.0, 1.4.4
>
>
> Make the following change to the tablet server code. The tablet server has
> to die at this exact point for the bug to occur.
> {noformat}
> Index: src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
> ===================================================================
> --- src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (revision 1464780)
> +++ src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (working copy)
> @@ -3592,6 +3592,9 @@
>        MetadataTable.splitTablet(high, extent.getPrevEndRow(), splitRatio, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
>        MetadataTable.addNewTablet(low, lowDirectory, tabletServer.getTabletSession(), lowDatafileSizes, bulkLoadedFiles, SecurityConstants.getSystemCredentials(), time, lastFlushID, lastCompactID, tabletServer.getLock());
> +
> +      Runtime.getRuntime().halt(2);
> +
>        MetadataTable.finishSplit(high, highDatafileSizes, highDatafilesToRemove, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
>
>        log.log(TLevel.TABLET_HIST, extent + " split " + low + " " + high);
> {noformat}
> Then create a table and add a split.
> {noformat}
> root@test15> createtable foo
> root@test15 foo> addsplits -t foo m
> {noformat}
> If there are multiple tablet servers, then it's possible that multiple
> assignment may occur. Below is an example of this occurring after tablets
> were loaded.
> {noformat}
> root@test15 !METADATA> scan -b 1 -c loc
> 1;m loc:13d5a86463f4f98 []    127.0.0.1:9998
> 1;m loc:13d5a86463f4f9f []    127.0.0.1:10000
> 1< loc:13d5a86463f4f98 []    127.0.0.1:9998
> {noformat}
> The problem is that the assignment code in the tserver detects an incomplete
> split and loads both children. However, the master may also assign one of
> the children.
> I think the assignment code should be modified to fix up the metadata table
> and only load one tablet. If the new tablet was not created, it should roll
> back the changes and load the pre-split tablet. If the new tablet was
> created, then assume the master will assign it and only load the high tablet.
> I think these changes would also greatly simplify the code.
> I do not think the proposed changes would cause issues with merge, since the
> chop flag is deleted in the case where this occurs.
> Need to ensure that the solution is itself fault tolerant.
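
The recovery rule proposed in the description can be sketched roughly as follows. This is only an illustration of the decision, not the actual fix: the class SplitRecoverySketch, the Action enum, and both methods are hypothetical names and are not part of the Accumulo code base. The sketch assumes the tablet server can tell whether the metadata entry for the new (low) tablet was written before the crash.

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: these names are hypothetical and are not Accumulo APIs.
final class SplitRecoverySketch {

    /** Recovery choices for a tablet server that finds a half-finished split. */
    enum Action {
        ROLL_BACK_AND_LOAD_PARENT,   // low tablet was never written: undo the split
        FINISH_SPLIT_AND_LOAD_HIGH   // low tablet exists: finish the split, leave low to the master
    }

    /** Picks the recovery action from whether the low tablet's metadata entry exists. */
    static Action decide(boolean lowTabletCreated) {
        return lowTabletCreated
            ? Action.FINISH_SPLIT_AND_LOAD_HIGH
            : Action.ROLL_BACK_AND_LOAD_PARENT;
    }

    /** Returns the single tablet this server loads; it never loads both children. */
    static List<String> tabletsToLoad(Action action, String parent, String high) {
        String chosen = (action == Action.ROLL_BACK_AND_LOAD_PARENT) ? parent : high;
        return Collections.singletonList(chosen);
    }

    public static void main(String[] args) {
        for (boolean lowCreated : new boolean[] {false, true}) {
            Action action = decide(lowCreated);
            System.out.println("lowCreated=" + lowCreated + " -> " + action
                + ", load " + tabletsToLoad(action, "pre-split tablet", "high tablet"));
        }
    }
}
```

In both branches the tablet server ends up loading exactly one tablet, so the master remains the only party that can assign the low child. The sketch does not address the fault-tolerance concern raised above: the metadata fix-up itself would still need to be safe if the server dies partway through it.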