sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259234204
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##########
 @@ -133,8 +161,11 @@ protected int execute(DriverContext driverContext) {
             return 6;
           }
           long writeId = Long.parseLong(writeIdString);
-          toPath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(work.getDeleteDestIfExist(), writeId, writeId,
-                  driverContext.getCtx().getHiveTxnManager().getStmtIdAndIncrement()));
+          // Set stmt id 0 for bootstrap load as the directory needs to be searched during incremental load to avoid any
+          // duplicate copy from the source. Check HIVE-21197 for more detail.
+          int stmtId = (writeId == ReplUtils.REPL_BOOTSTRAP_MIGRATION_BASE_WRITE_ID) ? 0 :
 
 Review comment:
   This check is unnecessary because this code path is the migration flow. Always set the statement ID to 0.
   Also, it would be better to add a final constant in ReplUtils for this value instead of hardcoding it.
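
   A minimal sketch of what this suggestion could look like, not the actual patch. The constant name REPL_BOOTSTRAP_MIGRATION_BASE_STMT_ID and the class ReplStmtIdSketch are hypothetical, and baseOrDeltaSubdir here is a simplified local stand-in for AcidUtils.baseOrDeltaSubdir whose directory-name format is assumed for illustration only.

```java
// Hypothetical sketch: every name here except those quoted from the diff is an assumption.
final class ReplStmtIdSketch {
    // Hypothetical constant of the kind the review asks for in ReplUtils,
    // replacing the hardcoded 0 and the writeId comparison.
    static final int REPL_BOOTSTRAP_MIGRATION_BASE_STMT_ID = 0;

    // Simplified stand-in for AcidUtils.baseOrDeltaSubdir; the name format is assumed.
    static String baseOrDeltaSubdir(boolean baseDirRequired, long min, long max, int stmtId) {
        return baseDirRequired
            ? String.format("base_%07d", max)
            : String.format("delta_%07d_%07d_%04d", min, max, stmtId);
    }

    // In the migration flow the statement ID is always the constant, with no
    // conditional, so an incremental load can look for a predictable directory name.
    static String migrationSubdir(boolean deleteDestIfExist, long writeId) {
        return baseOrDeltaSubdir(deleteDestIfExist, writeId, writeId,
            REPL_BOOTSTRAP_MIGRATION_BASE_STMT_ID);
    }

    public static void main(String[] args) {
        System.out.println(migrationSubdir(false, 1L)); // delta-style directory name
        System.out.println(migrationSubdir(true, 1L));  // base-style directory name
    }
}
```

   With a fixed statement ID, the directory name depends only on the write ID, which is what lets a later incremental load find the bootstrap copy and skip a duplicate.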

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
