[
https://issues.apache.org/jira/browse/ACCUMULO-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190061#comment-13190061
]
John Vines commented on ACCUMULO-315:
-------------------------------------
Brain dump of discussions throughout the day:
Starting from the hypothesis that a split occurring during a merge could cause
the split data to be lost, we looked at ensuring that merges could not include
split data. Through code review, we're comfortable saying the problem is not
in our logic of checking offline == total to determine that all tablets are
offline. We originally thought the total might not be updated accurately,
leaving a tablet in the range online when it should have been offline. That is
not possible, since the totals are updated within the code. However, the code
could be simplified by checking for the existence of ANY (boolean) online
tablets in the range. A similar simplification could be made for needsChop.
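The simplification might look something like the sketch below. This is purely
illustrative; TabletState and the helper method are hypothetical stand-ins,
not the real Accumulo master API.

```java
import java.util.List;

// Hypothetical sketch of the simplification above: instead of comparing an
// offline count against a running total, just ask whether ANY tablet in the
// merge range is still online. TabletState and allOffline are illustrative
// names, not actual Accumulo classes or methods.
public class OfflineCheck {
    enum TabletState { ONLINE, OFFLINE }

    // The merge can proceed only when no tablet in the range is online.
    static boolean allOffline(List<TabletState> tabletsInRange) {
        return tabletsInRange.stream().noneMatch(s -> s == TabletState.ONLINE);
    }
}
```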
Unfortunately, we realized that this is not sufficient because of the key
extents we deal with. We believe it is possible to observe a partial key
extent, such as the state after the very first write of a split (the changing
of prevRow in the parent of the split). This means that if a scan lands
between the first and second writes of a split, it will miss a tablet which
is about to come online. Keith had an idea which I'm not 100% thrilled with,
but which I believe is the only solution: give merge a dedicated scanner that
goes over the METADATA table for the merge range and ensures that each
tablet's end row matches the next key extent's prev end row. This would catch
the previously mentioned partial write of the split, and would give the merge
enough information to rescan and ensure the data is valid.
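That dedicated-scanner check amounts to verifying that the tablets in the
range form an unbroken chain. A minimal sketch of the invariant, assuming a
simplified Tablet record in place of Accumulo's real KeyExtent class (the
names below are illustrative, not the actual API):

```java
import java.util.List;

// Sketch of the proposed consistency check: walk the metadata rows covering
// the merge range in sorted order and verify that each tablet's prev end row
// equals the previous tablet's end row. A mismatch means we caught a partial
// split write (or a hole) and should rescan before proceeding.
public class MergeRangeCheck {
    // Stand-in for Accumulo's KeyExtent; rows are hex strings in the logs.
    record Tablet(String prevEndRow, String endRow) {}

    // Returns true only if the tablets form a proper sorted linked list.
    static boolean isConsistent(List<Tablet> tabletsInRange) {
        for (int i = 1; i < tabletsInRange.size(); i++) {
            String prevEnd = tabletsInRange.get(i - 1).endRow();
            String nextPrev = tabletsInRange.get(i).prevEndRow();
            if (prevEnd == null || !prevEnd.equals(nextPrev)) {
                return false; // chain broken: partial split or hole detected
            }
        }
        return true;
    }
}
```

Applied to the metadata dump in this issue, the jump from end row
27b693c626c2d4ef to prev end row 41ea825af0f262d9 is exactly the kind of
break this check would flag.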
The state changes for the merge FATE operation seem valid; I can only really
attribute the problem to our scanning and the interruptibility of splits. The
only other way I see to prevent merge/split conflicts is to have a mechanism
in place to halt splits during the chop. We sort of have this by taking
tablets offline, but we still have a race condition: the three writes of a
split and the offlining of a tablet can occur while the scanner is positioned
after the newly created tablet and the tablet going offline.
> Hole in metadata table occurred during random walk test
> -------------------------------------------------------
>
> Key: ACCUMULO-315
> URL: https://issues.apache.org/jira/browse/ACCUMULO-315
> Project: Accumulo
> Issue Type: Bug
> Components: master, tserver
> Environment: Running 1.4.0 SNAPSHOT on 10 node cluster.
> Reporter: Keith Turner
> Assignee: Keith Turner
> Priority: Critical
> Fix For: 1.4.0
>
>
> While running the random walk test a hole in the metadata table occurred. A
> client tried to delete the table with the hole and the FATE op got stuck.
> We continually saw the following in the master logs.
> {noformat}
> 14 00:02:11,273 [tableOps.CleanUp] DEBUG: Still waiting for table to be
> deleted: 4ct locationState:
> 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef@(null,xxx.xxx.xxx.xxx:9997[134d7425fc503e1],null)
> {noformat}
> The metadata table contained the following. Tablet 4ct;4d2d3be2823b0bf4 had
> a location.
> {noformat}
> 4ct;262249211a62cd6f ~tab:~pr [] \x011819e56edae21302
> 4ct;27b693c626c2d4ef ~tab:~pr [] \x01262249211a62cd6f
> 4ct;43422047c78fa52b ~tab:~pr [] \x0141ea825af0f262d9
> 4ct;4d2d3be2823b0bf4 ~tab:~pr [] \x0127b693c626c2d4ef
> 4ct;4f89df61392bb311 ~tab:~pr [] \x014d2d3be2823b0bf4
> {noformat}
> Found the following events on a tablet server.
> {noformat}
> #the tablet server events below are caused by the delete range operation
> 13 21:36:04,287 [tabletserver.Tablet] TABLET_HIST:
> 4ct;4d2d3be2823b0bf4;262249211a62cd6f split
> 4ct;27b693c626c2d4ef;262249211a62cd6f 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef
> 13 21:36:04,369 [tabletserver.Tablet] TABLET_HIST:
> 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef split
> 4ct;41ea825af0f262d9;27b693c626c2d4ef 4ct;4d2d3be2823b0bf4;41ea825af0f262d9
> 13 21:36:04,370 [tabletserver.Tablet] TABLET_HIST:
> 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 opened
> 13 21:36:06,141 [tabletserver.Tablet] TABLET_HIST:
> 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 closed
> 13 21:36:06,142 [tabletserver.Tablet] DEBUG: Files for low split
> 4ct;43422047c78fa52b;41ea825af0f262d9 [/t-0001cdi/F0001bmw.rf,
> /t-0001cdi/F0001bn1.rf]
> 13 21:36:06,142 [tabletserver.Tablet] DEBUG: Files for high split
> 4ct;4d2d3be2823b0bf4;43422047c78fa52b [/t-0001cdi/A0001cef.rf,
> /t-0001cdi/F0001bmw.rf, /t-0001cdi/F0001bn1.rf]
> #split from other random walker
> 13 21:36:06,351 [tabletserver.Tablet] TABLET_HIST:
> 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 split
> 4ct;43422047c78fa52b;41ea825af0f262d9 4ct;4d2d3be2823b0bf4;43422047c78fa52b
> {noformat}
> The following events occurred on the master and overlap in time with the
> split on the tablet server.
> {noformat}
> 13 21:36:06,312 [master.EventCoordinator] INFO : Merge state of
> 4ct;41ea825af0f262d9;27b693c626c2d4ef set to MERGING
> 13 21:36:06,312 [master.Master] DEBUG: Deleting tablets for
> 4ct;41ea825af0f262d9;27b693c626c2d4ef
> 13 21:36:06,316 [master.Master] DEBUG: Found following tablet
> 4ct;4d2d3be2823b0bf4;43422047c78fa52b
> 13 21:36:06,317 [master.Master] DEBUG: Making file deletion entries for
> 4ct;41ea825af0f262d9;27b693c626c2d4ef
> 13 21:36:06,325 [master.Master] DEBUG: Removing metadata table entries in
> range [4ct;27b693c626c2d4ef%00; : [] 9223372036854775807
> false,4ct;41ea825af0f262d9%00; : [] 9223372036854775807 false)
> 13 21:36:06,331 [master.Master] DEBUG: Updating prevRow of
> 4ct;4d2d3be2823b0bf4;43422047c78fa52b to 27b693c626c2d4ef
> {noformat}
> After many hours of debugging Eric and I figured out what was going on. Two
> random walkers were running the concurrent test. One client initiated a
> delete range on table id 4ct for the range 27b693c626c2d4ef to
> 41ea825af0f262d9. While this delete range operation was occurring another
> client added the split point 43422047c78fa52b. The master read the metadata
> table while the split was occurring and got inconsistent/incomplete
> information about what tablets related to the delete range operation were
> online. It assumed the required tablets were offline when they were not.
> The log messages above show that the split and updating of the prevRow by the
> master overlap in time.
> We think the best solution is to ensure that scans of the metadata table for
> merge and delete range operations are consistent with respect to end row and
> prev end row matching. We can not consider tablets individually; we must
> ensure that the portion of the metadata table under consideration forms a
> proper sorted linked list.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira