[ https://issues.apache.org/jira/browse/HIVE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Dere updated HIVE-17963:
------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

Committed to master

> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>
>                 Key: HIVE-17963
>                 URL: https://issues.apache.org/jira/browse/HIVE-17963
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17963.1.patch, HIVE-17963.2.patch
>
>
> HIVE-17113/HIVE-17813 fix the duplicate file issue by performing file moves
> on a file-by-file basis. For non-blobstore filesystems this results in many
> more filesystem/namenode operations compared to the previous
> Utilities.mvFileToFinalPath() behavior (dedup files in src dir, rename src
> dir to final dir).
> For non-blobstore filesystems, a better solution would be the one described
> [here|https://issues.apache.org/jira/browse/HIVE-17113?focusedCommentId=16100564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16100564]:
> 1) Move the temp directory to a new directory name, to prevent additional
> files from being added by any runaway processes.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final
> location.
> This results in only one additional file operation in non-blobstore FSes
> compared to the original Utilities.mvFileToFinalPath() behavior.
> The proposal is to do away with the config setting
> hive.exec.move.files.from.source.dir and always have behavior that should
> take care of the duplicate file issue described in HIVE-17113. For
> non-blobstore filesystems we will do steps 1-3 described above. For blobstore
> filesystems we will do the solution done in HIVE-17113/HIVE-17813 which does
> the file-by-file copy - this should have the same number of file operations
> as doing a rename directory on blobstore, which effectively results in file
> moves on a file-by-file basis.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
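
For illustration only, steps 1-3 for non-blobstore filesystems might be sketched as below. This is not the committed patch: it uses java.nio.file on a local directory instead of the Hadoop FileSystem API, and the class name, the ".moved" suffix, the "_tmp." prefix check, the "taskId_attemptId" file naming and the "largest attempt wins" rule are all assumptions made for the example. The method removeTempOrDuplicateFiles() here is a simplified stand-in that only borrows the name used in the description.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MoveToFinalPathSketch {

    // Hypothetical helper: steps 1-3 on a filesystem where a directory rename is cheap.
    static void commit(Path tmpDir, Path finalDir) throws IOException {
        // 1) Rename the temp directory so runaway tasks can no longer add files to it.
        Path renamed = tmpDir.resolveSibling(tmpDir.getFileName() + ".moved");
        Files.move(tmpDir, renamed);
        // 2) Remove temp files and duplicate task attempts in the renamed directory.
        removeTempOrDuplicateFiles(renamed);
        // 3) One more directory rename publishes the result at the final location.
        Files.move(renamed, finalDir);
    }

    // Simplified stand-in: assumes files are named <taskId>_<attemptId>, treats a
    // "_tmp." prefix as a temp file, and keeps the largest attempt per task.
    static void removeTempOrDuplicateFiles(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.collect(Collectors.toList());
        }
        Map<String, Path> bestByTask = new HashMap<>();
        for (Path f : files) {
            String name = f.getFileName().toString();
            if (name.startsWith("_tmp.")) {
                Files.delete(f);
                continue;
            }
            int sep = name.indexOf('_');
            String taskId = sep < 0 ? name : name.substring(0, sep);
            Path prev = bestByTask.get(taskId);
            if (prev == null) {
                bestByTask.put(taskId, f);
            } else if (Files.size(f) > Files.size(prev)) {
                Files.delete(prev);          // the newly seen attempt is larger, keep it
                bestByTask.put(taskId, f);
            } else {
                Files.delete(f);             // the earlier attempt is at least as large
            }
        }
    }

    // Builds a small temp directory with a duplicate attempt and a temp file,
    // commits it, and returns the sorted file names found in the final directory.
    static List<String> demo() {
        try {
            Path base = Files.createTempDirectory("mv-sketch");
            Path tmp = Files.createDirectory(base.resolve("_tmp.-ext-10000"));
            write(tmp.resolve("000000_0"), "part");
            write(tmp.resolve("000000_1"), "complete output");
            write(tmp.resolve("000001_0"), "x");
            write(tmp.resolve("_tmp.000002_0"), "");
            Path fin = base.resolve("-ext-10000");
            commit(tmp, fin);
            List<String> names = new ArrayList<>();
            try (Stream<Path> s = Files.list(fin)) {
                s.forEach(p -> names.add(p.getFileName().toString()));
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void write(Path p, String content) throws IOException {
        Files.write(p, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the cost beyond the dedup pass is two directory renames, which matches the "only one additional file operation" noted in the description when compared with a single rename of the source directory. On a blobstore a directory rename is itself carried out file by file, which is why the description keeps the file-by-file move there.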