[ 
https://issues.apache.org/jira/browse/KUDU-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1968.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0
                   1.3.1

Resolved by reverting the above-mentioned patch. Will fast-track a 1.3.1 
release.

> Aborted tablet copies delete live blocks
> ----------------------------------------
>
>                 Key: KUDU-1968
>                 URL: https://issues.apache.org/jira/browse/KUDU-1968
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 1.3.1, 1.4.0
>
>
> 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious 
> regression in the case of a failed tablet copy. As of that patch, the 
> following sequence happens:
> - we fetch the remote tablet's metadata, and set our local metadata to match 
> it (including the remote block IDs)
> - as we download blocks, we replace remote block IDs with local block IDs
> - if we fail in the middle, we call DeleteTablet
> -- since the metadata still contains some remote block IDs, the 
> DeleteTablet call deletes local blocks based on those remote block IDs. 
> These block IDs are likely to belong to other live tablets locally!
> This can cause serious data loss and tends to cascade around a cluster, 
> since later attempts to copy a tablet with missing blocks will be aborted 
> as well.
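
The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kudu's actual code: `BlockStore`, `copy_tablet`, the `fail_after` parameter, and the integer block IDs are all hypothetical names chosen for illustration. It assumes that local and remote servers allocate block IDs from overlapping ranges, which is what makes the collision possible.

```python
# Toy model of the failure sequence. All names here are hypothetical and
# are not part of Kudu's API.

class BlockStore:
    """Stand-in for a local block manager: maps block ID -> owning tablet."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 1

    def create(self, owner):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = owner
        return block_id

    def delete(self, block_id):
        # Deletes whatever block has this ID, regardless of who owns it.
        return self.blocks.pop(block_id, None)


def copy_tablet(store, remote_block_ids, fail_after):
    """Simulate a tablet copy that aborts after `fail_after` blocks."""
    # Step 1: local metadata starts as a copy of the remote metadata.
    metadata = list(remote_block_ids)
    # Step 2: each downloaded block replaces a remote ID with a local ID.
    for i in range(fail_after):
        metadata[i] = store.create(owner="copied-tablet")
    # Step 3: the copy fails; cleanup deletes every ID still in the metadata,
    # including the remote IDs that were never replaced.
    return [store.delete(block_id) for block_id in metadata]


store = BlockStore()
# Three blocks belonging to an unrelated live tablet get local IDs 1, 2, 3.
live = [store.create(owner="live-tablet") for _ in range(3)]
# Assumption: the remote server uses the same IDs for its own blocks.
victims = copy_tablet(store, remote_block_ids=[1, 2, 3], fail_after=1)
print(victims)       # owners of the blocks that cleanup deleted
print(store.blocks)  # what survives locally
```

In this model, the cleanup removes the one block that was actually downloaded plus two blocks owned by the unrelated live tablet, because their local IDs happened to match remote IDs left in the metadata.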



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
