> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> >

Thanks Xuefu for the review. I have answered all the questions; please let me know whether they make sense.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 152
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line152>
> >
> >     Instead of doing this, should we explicitly check the existence of the
> >     destination?

Yeah, if the code first explicitly checked the existence of the target, as in the following, it might be more readable. Instead, I combined the exists() and listStatus() calls into one listStatus() call, which saves one round trip to HDFS in the case where targetPath exists.

    if (dstFs.exists(targetPath)) {
      FileStatus[] destFiles = dstFs.listStatus(targetPath);
      if (dstFs.isDirectory(targetPath)) {
        for (FileStatus destFile : destFiles) {
          if (!dstFs.delete(destFile.getPath(), true)) {
            throw new IOException("Unable to clean the destination directory: " + targetPath);
          }
        }
      }
    } else {
      if (!FileUtils.mkdir(dstFs, targetPath, false, conf)) {
        throw new HiveException("Failed to create local target directory " + targetPath);
      }
    }


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 143
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line143>
> >
> >     Should we check if it's a directory before calling listStatus() on it?

By first calling FileStatus[] destFiles = dstFs.listStatus(targetPath), we can tell whether a file with the path targetPath exists. If it does, we then check whether it is a directory; if it does not, listStatus() throws a FileNotFoundException, and in its catch block we create a directory at targetPath. (Please see the explanation in the next comment for why it is implemented this way.) But if we called dstFs.isDirectory(targetPath) first, it would return false for a non-existing directory instead of throwing FileNotFoundException, so we would never reach the FileNotFoundException block that creates the target directory.
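For illustration only, here is a minimal, self-contained sketch of that probe-by-listing pattern using plain java.nio.file rather than the Hadoop API (this is not the MoveTask code; Files.newDirectoryStream plays the role of FileSystem.listStatus, and NoSuchFileException the role of FileNotFoundException):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ListFirstDemo {

    // Probe and list in one call: newDirectoryStream() both detects a missing
    // target (NoSuchFileException) and enumerates its children when it exists,
    // mirroring how a single listStatus() saves an exists() round trip to HDFS.
    static String prepareTarget(Path target) throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(target)) {
            for (Path child : children) {
                deleteRecursively(child); // clean the children; keep the directory itself
            }
            return "cleaned";
        } catch (NoSuchFileException e) {
            Files.createDirectories(target); // target absent: create it
            return "created";
        }
    }

    static void deleteRecursively(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(p)) {
                for (Path c : ds) {
                    deleteRecursively(c);
                }
            }
        }
        Files.delete(p);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("movetask-demo");
        Path target = tmp.resolve("out");
        System.out.println(prepareTarget(target)); // target does not exist yet
        Files.writeString(target.resolve("old.txt"), "stale");
        System.out.println(prepareTarget(target)); // target exists with content
        System.out.println(Files.exists(target) && Files.list(target).count() == 0);
    }
}
```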
Yeah, I initially did it that way and found out that trick during testing.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 137
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line137>
> >
> >     two nested try-catch blocks seem to make the code hard to read.

Yeah, the outer try-catch

    } catch (IOException ioe) {
      throw new HiveException("Unable to move source " + sourcePath + " to destination " + targetPath, ioe);
    }

is mainly for logging purposes. Should we remove it?


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > On a high level, deleting files one by one is slower. Could we instead
> > remember the original permission and set it on the new directory that we
> > are going to replace?

Yeah, I initially implemented it the way you suggested, which is more performant. But it could not get the right permission for my local directory, probably due to a defect in the HDFS RawLocalFileSystem: it always returns the HDFS default permission (0666) for all local files/directories. I have put a comment in the code to explain why I delete the files one by one.

    // RawLocalFileSystem seems not able to get the right permissions for a local file, it
    // always returns hdfs default permission (00666). So we can not overwrite a directory
    // by deleting and recreating the directory and restoring its permissions. We should
    // delete all its files and subdirectories instead.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review117816
-----------------------------------------------------------


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
>     https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In Hive, "insert overwrite local directory" first deletes the directory to be
> overwritten if it exists, recreates it, and then copies the files from the
> source directory into the new local directory. This process sometimes changes
> the permissions of the overwritten local directory, leaving some applications
> unable to access its content.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> -------
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>
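As an aside, the permission-preservation idea behind the patch can be reproduced with plain JDK code on a POSIX filesystem (a hypothetical standalone demo, not Hive code): deleting only the directory's children keeps the directory inode, and therefore its permissions, intact, whereas delete-and-recreate would reset them to the process default.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class KeepPermsDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("perm-demo");
        Path dir = tmp.resolve("target");
        Files.createDirectory(dir);

        // Give the target directory a deliberately non-default mode (0700).
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwx------");
        Files.setPosixFilePermissions(dir, perms);
        Files.writeString(dir.resolve("data.txt"), "old content");

        // "Overwrite" by cleaning only the children: the directory itself is
        // never deleted, so its permissions survive the overwrite.
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path child : ds) {
                Files.delete(child);
            }
        }
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(dir)));
    }
}
```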