Todd Lipcon created KUDU-1968:
---------------------------------

             Summary: Aborted tablet copies delete live blocks
                 Key: KUDU-1968
                 URL: https://issues.apache.org/jira/browse/KUDU-1968
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.3.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker


Commit 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious 
regression in the case of a failed tablet copy. As of that patch, the following 
sequence happens:

- we fetch the remote tablet's metadata and set our local metadata to match it 
(including the remote block IDs)
- as we download blocks, we replace the remote block IDs with local block IDs
- if we fail in the middle, we call DeleteTablet
-- since the metadata still contains some remote block IDs at that point, the 
DeleteTablet call deletes local blocks based on remote block IDs. These block 
IDs are likely to belong to other live tablets locally!

This can cause serious data loss, and it tends to cascade around a cluster, 
since later attempts to copy a tablet with missing blocks will get aborted as 
well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
