[ 
https://issues.apache.org/jira/browse/ACCUMULO-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Turner updated ACCUMULO-1243:
-----------------------------------

    Assignee: Keith Turner
    
> Multiple assignment may occur if tablet server dies during split
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-1243
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1243
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>            Priority: Critical
>             Fix For: 1.5.0, 1.4.4
>
>
> Make the following change to the tablet server code.  The tablet server has 
> to die at this exact point for the bug to occur.
> {noformat}
> Index: src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
> ===================================================================
> --- src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (revision 1464780)
> +++ src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (working copy)
> @@ -3592,6 +3592,9 @@
>        MetadataTable.splitTablet(high, extent.getPrevEndRow(), splitRatio, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
>        MetadataTable.addNewTablet(low, lowDirectory, tabletServer.getTabletSession(), lowDatafileSizes, bulkLoadedFiles,
>            SecurityConstants.getSystemCredentials(), time, lastFlushID, lastCompactID, tabletServer.getLock());
> +      
> +      Runtime.getRuntime().halt(2);
> +
>        MetadataTable.finishSplit(high, highDatafileSizes, highDatafilesToRemove, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
>        
>        log.log(TLevel.TABLET_HIST, extent + " split " + low + " " + high);
> {noformat}
> Then create a table and add a split.  
> {noformat}
> root@test15> createtable foo
> root@test15 foo> addsplits -t foo m
> {noformat}
> If there are multiple tablet servers, then it's possible that multiple 
> assignment occurs.  Below is an example of this occurring after tablets 
> were loaded.
> {noformat}
> root@test15 !METADATA> scan -b 1 -c loc
> 1;m loc:13d5a86463f4f98 []    127.0.0.1:9998
> 1;m loc:13d5a86463f4f9f []    127.0.0.1:10000
> 1< loc:13d5a86463f4f98 []    127.0.0.1:9998
> {noformat}
> The problem is that the assignment code in the tserver detects an incomplete 
> split and loads both children.  However, the master may also assign one of 
> the children.
> I think the assignment code should be modified to fix up the metadata table 
> and only load one tablet. If the new tablet was not created, it should roll 
> back the changes and load the pre-split tablet.  If the new tablet was 
> created, then assume the master will assign it and only load the high tablet. 
> I think these changes would also greatly simplify the code.
> I do not think the proposed changes would cause issues with merge, since the 
> chop flag is deleted in the case where this occurs.
> Need to ensure that the solution is itself fault tolerant.
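
The recovery rule proposed in the description can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
Accumulo implementation: the class, enum, and method names are hypothetical,
and a plain set of row strings stands in for a real metadata table lookup.

```java
import java.util.Set;

// Hypothetical sketch of the proposed decision; none of these names exist in Accumulo.
class SplitRecoverySketch {

  /** What a tablet server would do after detecting an incomplete split. */
  enum Action {
    ROLL_BACK_AND_LOAD_PARENT, // new (low) tablet was never created
    FINISH_SPLIT_AND_LOAD_HIGH // new (low) tablet exists; leave it to the master
  }

  /**
   * Decide how to recover. metadataRows is an assumed stand-in for the tablet
   * rows currently present in the metadata table; lowRow is the row of the
   * new (low) child tablet that addNewTablet would have written.
   */
  static Action decide(Set<String> metadataRows, String lowRow) {
    if (!metadataRows.contains(lowRow)) {
      // The server died before the low tablet was added: undo the split.
      return Action.ROLL_BACK_AND_LOAD_PARENT;
    }
    // The low tablet exists: complete the split and load only the high tablet.
    return Action.FINISH_SPLIT_AND_LOAD_HIGH;
  }

  public static void main(String[] args) {
    // Rows modeled on the scan output above: "1;m" is the low child, "1<" the high one.
    System.out.println(decide(Set.of("1<"), "1;m"));
    System.out.println(decide(Set.of("1;m", "1<"), "1;m"));
  }
}
```

In both branches the tablet server loads exactly one tablet, which is what
would prevent it from racing with the master over the low child. The sketch
does not address making the recovery steps themselves fault tolerant.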

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
