ashutoshc commented on a change in pull request #552: Hive 21279
URL: https://github.com/apache/hive/pull/552#discussion_r260597741
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
##########
@@ -1476,42 +1500,31 @@ private static String replaceTaskIdFromFilename(String filename, String oldTaskI
}
public static void mvFileToFinalPath(Path specPath, Configuration hconf,
- boolean success, Logger log, DynamicPartitionCtx dpCtx, FileSinkDesc conf,
- Reporter reporter) throws IOException,
+ boolean success, Logger log, DynamicPartitionCtx dpCtx, FileSinkDesc conf,
+ Reporter reporter) throws IOException,
HiveException {
- //
- // Runaway task attempts (which are unable to be killed by MR/YARN) can cause HIVE-17113,
- // where they can write duplicate output files to tmpPath after de-duplicating the files,
- // but before tmpPath is moved to specPath.
- // Fixing this issue will be done differently for blobstore (e.g. S3)
- // vs non-blobstore (local filesystem, HDFS) filesystems due to differences in
- // implementation - a directory move in a blobstore effectively results in file-by-file
- // moves for every file in a directory, while in HDFS/localFS a directory move is just a
- // single filesystem operation.
- // - For non-blobstore FS, do the following:
- // 1) Rename tmpPath to a new directory name to prevent additional files
- // from being added by runaway processes.
- // 2) Remove duplicates from the temp directory
- // 3) Rename/move the temp directory to specPath
- //
- // - For blobstore FS, do the following:
- // 1) Remove duplicates from tmpPath
- // 2) Use moveSpecifiedFiles() to perform a file-by-file move of the de-duped files
- // to specPath. On blobstore FS, assuming n files in the directory, this results
- // in n file moves, compared to 2*n file moves with the previous solution
- // (each directory move would result in a file-by-file move of the files in the directory)
- //
+ // There are following two paths this could could take based on the value of shouldAvoidRename
Review comment:
The rest of the earlier comment still applies to the true case, so we can retain that part.
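For reference, the three non-blobstore steps listed in the removed comment (rename tmpPath, remove duplicates, move to specPath) can be sketched roughly as below. This is only an illustration on a local filesystem using java.nio.file, not the Hadoop FileSystem API and not the code in Utilities.java. The class name, the ".moved" suffix, the "<taskId>_<attemptId>" file naming, and the keep-the-largest-file rule are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the rename -> de-duplicate -> move sequence.
public class RenameThenDedupSketch {

  // Assumed naming rule: "<taskId>_<attemptId>", e.g. "000000_1".
  static String taskIdOf(Path file) {
    String name = file.getFileName().toString();
    int idx = name.indexOf('_');
    return idx < 0 ? name : name.substring(0, idx);
  }

  // Assumes specPath does not exist yet and is on the same filesystem as tmpPath.
  static void commit(Path tmpPath, Path specPath) throws IOException {
    // 1) Rename tmpPath so runaway writers that still hold the old name
    //    can no longer add files to the directory being committed.
    Path frozen = tmpPath.resolveSibling(tmpPath.getFileName() + ".moved");
    Files.move(tmpPath, frozen);

    // Snapshot the listing first so deletions do not race with iteration.
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(frozen)) {
      for (Path f : stream) {
        files.add(f);
      }
    }

    // 2) Remove duplicates: keep one file per task id (here, the largest one).
    Map<String, Path> winners = new HashMap<>();
    for (Path f : files) {
      String taskId = taskIdOf(f);
      Path current = winners.get(taskId);
      if (current == null) {
        winners.put(taskId, f);
      } else if (Files.size(f) > Files.size(current)) {
        Files.delete(current);
        winners.put(taskId, f);
      } else {
        Files.delete(f);
      }
    }

    // 3) Move the de-duplicated directory to its final location in one rename.
    Files.move(frozen, specPath);
  }
}
```

On a blobstore the two directory renames above would each turn into per-file copies, which is the cost the removed comment was describing.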