sankarh commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259588369
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
##########
@@ -271,12 +299,13 @@ public String getName() {
LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
- if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+ if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
rcwork.setDeleteDestIfExist(true);
Review comment:
Consider the insert overwrite (IOW) case (writeId=2) where a few of the listed
files already exist in base_1. Because this is an IOW, deleteDestIfExist causes
base_2 to be created instead of delta_2_2, so only part of the data would be
copied to base_2. Once writeId=2 commits, all readers would see just that
partial data from base_2. This case needs to be handled.
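A minimal Java sketch of the scenario in the comment, assuming (as the comment implies) that the copy skips listed files already present in the table and that an IOW write lands in base_<writeId> rather than delta_<writeId>_<writeId>. Because readers of an ACID table use the newest committed base and ignore older bases, a base_2 that holds only the non-duplicate files hides the rows from the skipped ones. The class, method, and file names (IowPartialCopySketch, targetDir, copySkippingExisting, copyForWrite, 000000_0 and so on) are hypothetical and are not Hive's API; copyForWrite shows one possible way to handle the case, not the fix adopted in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the IOW partial-copy hazard; not Hive code.
final class IowPartialCopySketch {

    // Hypothetical: IOW (isReplace) writes go to base_<writeId>,
    // all other writes go to delta_<writeId>_<writeId>.
    static String targetDir(long writeId, boolean isReplace) {
        return isReplace
                ? "base_" + writeId
                : "delta_" + writeId + "_" + writeId;
    }

    // Assumed copy behavior: skip listed files that already exist in the table
    // (e.g. in base_1). Safe for a delta, but loses data for a new base.
    static List<String> copySkippingExisting(List<String> listed, Set<String> alreadyInTable) {
        List<String> copied = new ArrayList<>();
        for (String file : listed) {
            if (!alreadyInTable.contains(file)) {
                copied.add(file);
            }
        }
        return copied;
    }

    // One possible handling (hypothetical): a new base replaces everything older,
    // so for IOW copy every listed file; otherwise keep skipping duplicates.
    static List<String> copyForWrite(List<String> listed, Set<String> alreadyInTable, boolean isReplace) {
        return isReplace ? new ArrayList<>(listed) : copySkippingExisting(listed, alreadyInTable);
    }

    public static void main(String[] args) {
        List<String> listed = List.of("000000_0", "000001_0", "000002_0");
        Set<String> base1 = Set.of("000000_0", "000001_0");
        String dir = targetDir(2, true);
        // Only 000002_0 lands in base_2, so readers of writeId=2 lose the other two files.
        System.out.println(dir + " (skip existing): " + copySkippingExisting(listed, base1));
        // Full copy keeps base_2 complete.
        System.out.println(dir + " (full copy): " + copyForWrite(listed, base1, true));
    }
}
```

For a non-IOW write the sketch keeps the skip-existing behavior, since base_1 stays visible alongside delta_2_2 and skipping files already there avoids the duplicate data this PR targets.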
----------------------------------------------------------------
This is an automated message from the Apache Git Service.