[ https://issues.apache.org/jira/browse/KUDU-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-1853:
-----------------------------
    Code Review: http://gerrit.cloudera.org:8080/5799

> Error during tablet copy may orphan a bunch of stuff
> ----------------------------------------------------
>
>                 Key: KUDU-1853
>                 URL: https://issues.apache.org/jira/browse/KUDU-1853
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet, tserver
>    Affects Versions: 1.2.0
>            Reporter: Adar Dembo
>            Assignee: Mike Percy
>            Priority: Critical
>
> Currently, a failure during tablet copy may leave behind a number of 
> different things:
> # Downloaded superblock (if the failure falls after TabletCopyClient::Start())
> # Downloaded data blocks (if the failure falls during 
> TabletCopyClient::FetchAll())
> # Downloaded WAL segments (if the failure falls during 
> TabletCopyClient::FetchAll())
> # Downloaded cmeta file (if the failure falls during 
> TabletCopyClient::Finish())
> The next time the tserver starts, it'll see that this tablet's state is still 
> TABLET_DATA_COPYING and will tombstone it. That takes care of #1, #3, and #4 
> (well, it leaves the cmeta file behind as the tombstone, but that's 
> intentional).
> Unfortunately, all data blocks are orphaned, because the on-disk superblock 
> has no record of the new blocks, and so they aren't deleted.
> We're already tracking a general-purpose GC mechanism for data blocks in 
> KUDU-829, but I think this separate JIRA describing the problem with 
> tablet copy is useful, if only as a reference for users.
> Separately, it may be worth addressing these issues for failures that don't 
> result in tserver crashes, such as intermittent network outages between 
> tservers. A long-lived tserver may not restart (and so won't perform this 
> cleanup) for some time, and it'd be nice to reclaim the disk space used by 
> these orphaned objects in the interim. Implementing this kind of "GC" for 
> data blocks is also a lot easier than a general-purpose GC.
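The orphaning described above, and the simpler failure-path "GC" it suggests, can be modeled in a few lines. This is a toy sketch, not Kudu code and not necessarily what the linked code review does: `FakeBlockManager`, `fetch_all`, `tombstone`, `delete_on_failure`, and `failed_copy` are all hypothetical names. It assumes that tombstoning deletes only the blocks referenced by the on-disk superblock (as the report states) and that the copy client records each block ID as it downloads it, so the IDs can be deleted if the copy fails without crashing the tserver.

```python
from contextlib import contextmanager


class FakeBlockManager:
    """Hypothetical stand-in for a block manager: the set of block IDs on disk."""

    def __init__(self):
        self.on_disk = set()

    def create(self, block_id):
        self.on_disk.add(block_id)

    def delete(self, block_id):
        self.on_disk.discard(block_id)


def tombstone(bm, superblock_blocks):
    """Tombstoning deletes only the blocks the on-disk superblock references."""
    for block_id in superblock_blocks:
        bm.delete(block_id)


def fetch_all(bm, num_blocks, fail_after, downloaded):
    """Simulated download loop that fails partway, e.g. on a network outage."""
    for i in range(num_blocks):
        if i == fail_after:
            raise ConnectionError("lost connection to source tserver")
        block_id = 100 + i
        bm.create(block_id)
        downloaded.append(block_id)


@contextmanager
def delete_on_failure(bm):
    """Yields a list for recording downloaded block IDs; deletes them if the body raises."""
    downloaded = []
    try:
        yield downloaded
    except BaseException:
        for block_id in downloaded:
            bm.delete(block_id)
        raise


def failed_copy(cleanup):
    """Copies 5 blocks, failing after 3, then tombstones; returns blocks left on disk."""
    bm = FakeBlockManager()
    superblock_blocks = []  # the new blocks never made it into the on-disk superblock
    try:
        if cleanup:
            with delete_on_failure(bm) as downloaded:
                fetch_all(bm, 5, 3, downloaded)
        else:
            fetch_all(bm, 5, 3, [])
    except ConnectionError:
        pass  # the tablet is left in TABLET_DATA_COPYING
    tombstone(bm, superblock_blocks)
    return sorted(bm.on_disk)


print(failed_copy(cleanup=False))  # [100, 101, 102]
print(failed_copy(cleanup=True))   # []
```

Without cleanup, the three downloaded blocks survive the tombstone because nothing on disk points at them. With the scope-guard cleanup, the failure path removes exactly the blocks this copy created, which needs no knowledge of the rest of the block store and is why it is much easier than a general-purpose GC.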



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
