[ 
https://issues.apache.org/jira/browse/KUDU-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274568#comment-15274568
 ] 

Mike Percy commented on KUDU-1419:
----------------------------------

Adding some thoughts on how to work around this issue:

Docker typically ships with aufs or overlayfs to provide copy-on-write
storage semantics without requiring a host dependency on btrfs. These file
systems can return EXDEV when rename(2) is called on a directory, for example
when the directory and its contents live on a lower, read-only layer.

To avoid this, we could remove the need for an atomic directory rename by
adding some machinery to the recovery logic. Here's a potential new
approach:

# On tablet recovery, create a new directory inside the tablet WAL dir
  called .recovery-dir, and a recovery status file called
  .recovery-status
# Write the value PREPARING to .recovery-status
# Move each WAL file to .recovery-dir
# Write the value REPLAYING to .recovery-status
# Replay the logs. This reads the WAL files from .recovery-dir and
  writes new WAL files into the main WAL directory.
# Write the value CLEANING_UP to .recovery-status
# Delete all of the old WAL files from .recovery-dir
# Delete the .recovery-dir directory
# Delete the .recovery-status file
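A minimal Python sketch of the steps above, not Kudu code. It assumes WAL segment files are the entries whose names start with `wal-`, and that a caller-supplied `replay(old_paths, wal_dir)` function (hypothetical) does the actual log replay. It also shows one way the status file could make recovery resumable after a crash, which the steps leave implicit. The status is updated by writing a temp file and renaming that single regular file over .recovery-status; this assumes file renames are safe on aufs/overlayfs, since the aufs documentation quoted below only describes EXDEV for directory renames. Discarding WAL files found in the main directory when resuming in REPLAYING is also an assumption, not part of the proposal.

```python
import os

# Hypothetical names, for illustration only.
RECOVERY_DIR = ".recovery-dir"
STATUS_FILE = ".recovery-status"
WAL_PREFIX = "wal-"  # assumption: how WAL segment files are recognized


def _fsync_dir(path):
    # Persist directory entries (creates, renames, unlinks) on POSIX systems.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def _write_status(wal_dir, value):
    # Durably replace the status file using a rename of one regular file.
    tmp = os.path.join(wal_dir, STATUS_FILE + ".tmp")
    with open(tmp, "w") as f:
        f.write(value)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, os.path.join(wal_dir, STATUS_FILE))
    _fsync_dir(wal_dir)


def _read_status(wal_dir):
    try:
        with open(os.path.join(wal_dir, STATUS_FILE)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None  # no recovery in progress


def _wal_files(directory):
    return sorted(n for n in os.listdir(directory) if n.startswith(WAL_PREFIX))


def recover(wal_dir, replay):
    """Run or resume recovery; each phase is safe to repeat after a crash."""
    rec_dir = os.path.join(wal_dir, RECOVERY_DIR)
    status = _read_status(wal_dir)
    if status not in (None, "PREPARING", "REPLAYING", "CLEANING_UP"):
        raise ValueError("unexpected recovery status: %r" % status)

    if status in (None, "PREPARING"):
        os.makedirs(rec_dir, exist_ok=True)
        _write_status(wal_dir, "PREPARING")
        # Move WAL files one at a time instead of renaming the directory.
        for name in _wal_files(wal_dir):
            os.rename(os.path.join(wal_dir, name), os.path.join(rec_dir, name))
        _fsync_dir(rec_dir)
        _fsync_dir(wal_dir)
        _write_status(wal_dir, "REPLAYING")
        status = "REPLAYING"

    if status == "REPLAYING":
        # Assumption: WAL files in the main dir are partial output of an
        # interrupted replay, so drop them and replay from the old segments.
        for name in _wal_files(wal_dir):
            os.unlink(os.path.join(wal_dir, name))
        replay([os.path.join(rec_dir, n) for n in _wal_files(rec_dir)], wal_dir)
        _write_status(wal_dir, "CLEANING_UP")
        status = "CLEANING_UP"

    if status == "CLEANING_UP":
        if os.path.isdir(rec_dir):
            for name in os.listdir(rec_dir):
                os.unlink(os.path.join(rec_dir, name))
            os.rmdir(rec_dir)
        os.unlink(os.path.join(wal_dir, STATUS_FILE))
        _fsync_dir(wal_dir)
```

The design choice here is that every phase is idempotent, so whatever value is found in .recovery-status on restart, re-running recover() from that phase converges to the same end state.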


> Kudu may fail to start in docker when using Ubuntu/AUFS
> -------------------------------------------------------
>
>                 Key: KUDU-1419
>                 URL: https://issues.apache.org/jira/browse/KUDU-1419
>             Project: Kudu
>          Issue Type: Bug
>          Components: util
>            Reporter: Casey Ching
>            Priority: Critical
>
> By default Ubuntu's docker setup uses AUFS for its storage layer. That leads 
> to problems during startup because rename() may not work in AUFS.
> {quote}
> To rename(2) directory may return EXDEV even if both of src and tgt are on 
> the same aufs. When the rename-src dir exists on multiple branches and the 
> lower dir has child(ren), aufs has to copyup all his children. It can be 
> recursive copyup. Current aufs does not support such huge copyup operation at 
> one time in kernel space, instead produces a warning and returns EXDEV. 
> Generally, mv(1) detects this error and tries mkdir(2) and rename(2) or 
> copy/unlink recursively. So the result is harmless. If your application which 
> issues rename(2) for a directory does not support EXDEV, it will not work on 
> aufs. Also this specification is applied to the case when the src directory
> exists on the lower readonly branch and it has child(ren).
> {quote}
> http://aufs.sourceforge.net/aufs.html
> Starting the master may try to rename() a log directory:
> {code}
>     RETURN_NOT_OK_PREPEND(fs_manager->env()->RenameFile(log_dir, recovery_path),
>                           Substitute("Could not move log directory $0 to recovery dir $1",
>                                      log_dir, recovery_path));
> {code}
> https://github.com/cloudera/kudu/blob/master/src/kudu/tablet/tablet_bootstrap.cc#L597
> {code}
>   virtual Status RenameFile(const std::string& src, const std::string& target) OVERRIDE {
>     TRACE_EVENT2("io", "PosixEnv::RenameFile", "src", src, "dst", target);
>     ThreadRestrictions::AssertIOAllowed();
>     Status result;
>     if (rename(src.c_str(), target.c_str()) != 0) {
>       result = IOError(src, errno);
>     }
>     return result;
>   }
> {code}
> https://github.com/cloudera/kudu/blob/master/src/kudu/util/env_posix.cc#L891
> I think Kudu should fall back to copy/remove. As an example, here is what
> Python's shutil.move does:
> {code}
>     try:
>         os.rename(src, real_dst)
>     except OSError:
>         if os.path.isdir(src):
>             if _destinsrc(src, dst):
>                 raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
>             copytree(src, real_dst, symlinks=True)
>             rmtree(src)
>         else:
>             copy2(src, real_dst)
>             os.unlink(src)
> {code}
> https://hg.python.org/cpython/file/2.7/Lib/shutil.py#l295
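For comparison, a Python 3 sketch of the same fallback that copies and deletes only when rename() fails with EXDEV, rather than on any OSError, so that errors such as ENOENT or EACCES still surface to the caller. The name `move_dir` is hypothetical. The fallback is not atomic: a crash between copytree() and rmtree() can leave both directories behind, which is why the status-file approach in the comment above avoids renaming directories at all.

```python
import errno
import os
import shutil


def move_dir(src, dst):
    """Rename src to dst, falling back to copy-then-delete only on EXDEV."""
    try:
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise  # e.g. ENOENT or EACCES: copying would not help
        # Not atomic: a crash here can leave both src and a partial dst.
        shutil.copytree(src, dst, symlinks=True)
        shutil.rmtree(src)
```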



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
