[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

GitBox Sat, 23 Feb 2019 22:51:04 -0800

sankarh commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259606013


 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##########
 @@ -271,12 +299,13 @@ public String getName() {
     LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
     if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
       ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-      if (replicationSpec.isReplace() && 
conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+      if (replicationSpec.isReplace() && 
(conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
         rcwork.setDeleteDestIfExist(true);
 
 Review comment:
   As per discussion, we need to skip duplicate check for replace case, as we 
may end up with skipping copy of files with same name but different content. 
Also, we may need to move those matching files to new base path which is 
complicated. So, we just skip duplicate check for replace flow.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

Reply via email to