> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> >

Thanks, Xuefu, for the review. I have answered all the questions below; please let me know if 
they make sense.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 152
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line152>
> >
> >     Instead of doing this, should we explicitly check the existence of the 
> > destination?

Yeah, if the code were like the following, explicitly checking the existence of the 
target first, it might be more readable. Instead, I combined the exists and 
listStatus checks into a single listStatus call, which saves one trip to HDFS in the 
case where targetPath exists.

if (dstFs.exists(targetPath)) {
  FileStatus[] destFiles = dstFs.listStatus(targetPath);
  if (dstFs.isDirectory(targetPath)) {
    for (FileStatus destFile : destFiles) {
      if (!dstFs.delete(destFile.getPath(), true)) {
        throw new IOException("Unable to clean the destination directory: " + targetPath);
      }
    }
  }
} else {
  if (!FileUtils.mkdir(dstFs, targetPath, false, conf)) {
    throw new HiveException("Failed to create local target directory " + targetPath);
  }
}
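
For comparison, the combined version is roughly shaped like this (just a sketch, not a 
verbatim copy of the patch, reusing the same dstFs, targetPath and conf as above; the 
single listStatus call doubles as the existence check):

try {
  // One listStatus call both verifies that targetPath exists and returns its children.
  FileStatus[] destFiles = dstFs.listStatus(targetPath);
  if (dstFs.isDirectory(targetPath)) {
    for (FileStatus destFile : destFiles) {
      if (!dstFs.delete(destFile.getPath(), true)) {
        throw new IOException("Unable to clean the destination directory: " + targetPath);
      }
    }
  }
} catch (FileNotFoundException fnfe) {
  // listStatus threw, so targetPath does not exist yet; create it here.
  if (!FileUtils.mkdir(dstFs, targetPath, false, conf)) {
    throw new HiveException("Failed to create local target directory " + targetPath);
  }
}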


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 143
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line143>
> >
> >     Should we check if it's a directory before calling listStatus() on it?

By first calling FileStatus[] destFiles = dstFs.listStatus(targetPath), we can 
tell whether a file with path targetPath exists. If it does, we then check whether 
it is a directory; if it does not, listStatus throws FileNotFoundException and in 
its catch block we create a directory at targetPath. (Please see the explanation in 
the next comment for why it is implemented this way.)
But if we call dstFs.isDirectory(targetPath) first, it returns false for a 
non-existent path instead of throwing FileNotFoundException, so we never reach the 
FileNotFoundException block that creates the target directory.
Yeah, I initially did it that way and only found out about that behavior during testing.
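
To illustrate the difference (a minimal sketch; the path here is made up just for the example):

Path missing = new Path("/tmp/no-such-dir");       // hypothetical non-existent path
boolean isDir = dstFs.isDirectory(missing);        // returns false, no exception
FileStatus[] children = dstFs.listStatus(missing); // throws FileNotFoundException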


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 137
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line137>
> >
> >     two nested try-catch block seem making the code hard to read.

Yeah, the outer try-catch
      } catch (IOException ioe) {
        throw new HiveException("Unable to move source " + sourcePath + " to destination "
            + targetPath, ioe);
      }
is mainly there to wrap the IOException with some context for logging purposes. Should we remove it?


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > On a high level, deleting file one by one is slower. Could we instead 
> > remember the original permission and set it to the new directory that we 
> > are going to replace?

Yeah, I initially implemented it in the way you suggested, which is more 
performant. But it could not get the right permissions for my local directory, 
probably due to a defect in HDFS RawLocalFileSystem: it always returns the 
default permission (0666) for all local files/directories. I have put a comment 
in the code to explain why I delete the files one by one.
          // RawLocalFileSystem seems not able to get the right permissions for a local file, it
          // always returns hdfs default permission (00666). So we can not overwrite a directory
          // by deleting and recreating the directory and restoring its permissions. We should
          // delete all its files and subdirectories instead.
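
For reference, the remember-and-restore alternative I tried first looked roughly like this 
(a sketch only, not code from the patch); the getPermission() call is the part that came 
back as the default 00666 on RawLocalFileSystem:

FsPermission originalPerm = dstFs.getFileStatus(targetPath).getPermission();
// On RawLocalFileSystem this appears to be the default 00666 rather than the
// directory's real mode, so restoring it below does not preserve the permissions.
if (!dstFs.delete(targetPath, true)) {
  throw new IOException("Unable to delete the destination directory: " + targetPath);
}
if (!FileUtils.mkdir(dstFs, targetPath, false, conf)) {
  throw new HiveException("Failed to create local target directory " + targetPath);
}
dstFs.setPermission(targetPath, originalPerm);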


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review117816
-----------------------------------------------------------


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
>     https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten 
> directory if it exists, recreates a new one, and then copies the files from the src 
> directory to the new local directory. This process sometimes changes the 
> permissions of the to-be-overwritten local directory, causing some 
> applications to no longer be able to access its content.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> -------
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>
