[ 
https://issues.apache.org/jira/browse/HIVE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-17963:
---------------------------------


> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>
>                 Key: HIVE-17963
>                 URL: https://issues.apache.org/jira/browse/HIVE-17963
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>
> HIVE-17113/HIVE-17813 fix the duplicate file issue by moving files one at a 
> time. On non-blobstore filesystems this requires many more 
> filesystem/namenode operations than the previous 
> Utilities.mvFileToFinalPath() behavior (dedup files in the src dir, then 
> rename the src dir to the final dir).
> For non-blobstore filesystems, a better solution would be the one described 
> [here|https://issues.apache.org/jira/browse/HIVE-17113?focusedCommentId=16100564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16100564]:
> 1) Move the temp directory to a new directory name, to prevent additional 
> files from being added by any runaway processes.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory.
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final 
> location.
> This costs only one additional file operation on non-blobstore filesystems 
> compared to the original Utilities.mvFileToFinalPath() behavior.
> The proposal is to remove the config setting 
> hive.exec.move.files.from.source.dir and always use behavior that avoids 
> the duplicate file issue described in HIVE-17113. For non-blobstore 
> filesystems we will follow steps 1-3 above. For blobstore filesystems we 
> will keep the HIVE-17113/HIVE-17813 solution, which does a file-by-file 
> copy. This should cost the same number of file operations as a directory 
> rename on a blobstore, since such a rename effectively moves the files one 
> by one.
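
Steps 1-3 of the description can be sketched roughly as follows. This is a minimal illustration against the local filesystem using java.nio, not Hive's actual code: the class RenameThenDedupSketch, its methods, the ".moved" suffix, the "<taskId>_<attempt>" file naming, and the "keep the largest file per task id" rule are all assumptions made for the example. Only the order of operations (rename, dedup, rename) comes from the description.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative local-filesystem stand-in for the proposed three-step commit. */
final class RenameThenDedupSketch {

    // ASSUMPTION: output files are named "<taskId>_<attempt>", e.g. "000000_1".
    static String taskId(Path file) {
        String name = file.getFileName().toString();
        int idx = name.indexOf('_');
        return idx < 0 ? name : name.substring(0, idx);
    }

    // Hypothetical stand-in for step 2: keep one file per task id.
    // ASSUMPTION: the largest file wins; ties keep the first file seen.
    static void removeDuplicates(Path dir) throws IOException {
        // Snapshot the listing first so deletions do not disturb iteration.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path f : stream) {
                files.add(f);
            }
        }
        Map<String, Path> kept = new HashMap<>();
        for (Path f : files) {
            Path prev = kept.get(taskId(f));
            if (prev == null) {
                kept.put(taskId(f), f);
            } else if (Files.size(f) > Files.size(prev)) {
                Files.delete(prev);
                kept.put(taskId(f), f);
            } else {
                Files.delete(f);
            }
        }
    }

    // finalDir must not exist yet; both paths must be on the same filesystem.
    static void commit(Path tmpDir, Path finalDir) throws IOException {
        // Step 1: rename the temp directory so late writers miss it.
        Path sealed = tmpDir.resolveSibling(tmpDir.getFileName() + ".moved");
        Files.move(tmpDir, sealed, StandardCopyOption.ATOMIC_MOVE);

        // Step 2: dedup inside the renamed directory.
        removeDuplicates(sealed);

        // Step 3: a single directory rename to the final location.
        Files.move(sealed, finalDir, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

In this sketch the number of directory renames is two regardless of how many files the job produced, which is the point of the proposal for filesystems where a directory rename is a single metadata operation.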



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
