[ https://issues.apache.org/jira/browse/KUDU-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1968.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0
                   1.3.1

Resolved by reverting the above-mentioned patch. Will fast-track a 1.3.1 release.

> Aborted tablet copies delete live blocks
> ----------------------------------------
>
>                 Key: KUDU-1968
>                 URL: https://issues.apache.org/jira/browse/KUDU-1968
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 1.3.1, 1.4.0
>
>
> 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious
> regression in the case of a failed tablet copy. As of that patch, the
> following sequence happens:
> - we fetch the remote tablet's metadata, and set our local metadata to match
> it (including the remote block IDs)
> - as we download blocks, we replace remote block IDs with local block IDs
> - if we fail in the middle, we call DeleteTablet
> -- this means that, since we still have some remote block IDs in the
> metadata, the DeleteTablet call deletes local blocks based on remote block
> IDs. These block IDs are likely to belong to other live tablets locally!
> This can cause pretty serious data loss, and has the tendency to cascade
> around a cluster, since later attempts to copy a tablet with missing blocks
> will get aborted as well.
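
The failure sequence described in the issue can be sketched as a toy simulation. This is not Kudu code: `BlockStore`, `copy_tablet`, `delete_tablet`, the integer block IDs, and the `fail_after` parameter are all hypothetical names invented for illustration, and the sketch assumes the remote block IDs happen to collide with IDs already in use locally.

```python
# Toy model of the KUDU-1968 failure sequence. NOT Kudu code: every name
# here (BlockStore, copy_tablet, delete_tablet) is hypothetical.


class BlockStore:
    """Local block store: maps a local block ID to the tablet that owns it."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 1

    def create(self, owner):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = owner
        return block_id


def delete_tablet(store, metadata_block_ids):
    """Delete every block ID listed in the metadata, treating each as local."""
    for block_id in metadata_block_ids:
        store.blocks.pop(block_id, None)


def copy_tablet(store, remote_block_ids, fail_after):
    """Copy a tablet, aborting once `fail_after` blocks have been downloaded."""
    # Step 1: local metadata starts out holding the REMOTE block IDs.
    metadata = list(remote_block_ids)
    for i in range(len(metadata)):
        if i == fail_after:
            # Step 3: abort mid-copy. Entries i..end are still remote IDs,
            # but delete_tablet interprets them as local IDs.
            delete_tablet(store, metadata)
            return False
        # Step 2: "download" block i and swap in the newly allocated local ID.
        metadata[i] = store.create(owner="copied-tablet")
    return True


store = BlockStore()
# A live local tablet owns block IDs 1, 2 and 3.
live = [store.create(owner="live-tablet") for _ in range(3)]
# Assumption for illustration: the remote tablet's block IDs collide with them.
ok = copy_tablet(store, remote_block_ids=[1, 2, 3], fail_after=1)
surviving = sorted(b for b, o in store.blocks.items() if o == "live-tablet")
print(ok, surviving)
```

In this sketch the copy aborts after one block, so the metadata passed to `delete_tablet` holds one local ID and two leftover remote IDs. The script prints `False [1]`: blocks 2 and 3 of the unrelated live tablet are gone, and only block 1 survives because its metadata slot had already been rewritten to a local ID.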