[jira] [Created] (HIVE-21314) Hive Replication not retaining the owner in the replicated table

2019-02-24 Thread mahesh kumar behera (JIRA)
mahesh kumar behera created HIVE-21314:
--

 Summary: Hive Replication not retaining the owner in the replicated table
 Key: HIVE-21314
 URL: https://issues.apache.org/jira/browse/HIVE-21314
 Project: Hive
  Issue Type: Bug
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera


Hive replication does not retain the table owner in the replicated table. The 
owner of the target table is set to the user executing the load command. The 
owner information should instead be read from the dump metadata and used when 
creating the table on the target cluster.
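
The behaviour requested above could be sketched roughly as follows. This is a minimal illustration, not Hive code: the class `OwnerRetentionSketch`, the method `resolveOwner`, and the fallback to the load user when the dump carries no owner are all assumptions made for the example.

```java
final class OwnerRetentionSketch {
    /**
     * Picks the owner for the table created on the target cluster.
     * ownerFromDump stands for the owner recorded in the dump metadata
     * (hypothetical input); loadUser is the user executing the load command.
     */
    static String resolveOwner(String ownerFromDump, String loadUser) {
        // Assumption: fall back to the load user only when the dump has no owner.
        if (ownerFromDump == null || ownerFromDump.isEmpty()) {
            return loadUser;
        }
        return ownerFromDump;
    }

    public static void main(String[] args) {
        System.out.println(resolveOwner("alice", "hive"));
        System.out.println(resolveOwner(null, "hive"));
    }
}
```

With this rule the load user becomes the owner only when the dump metadata has nothing to offer, which is the opposite of the behaviour reported in the issue.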



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622991
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
 ##
 @@ -112,6 +118,12 @@ public void run() {
 continue;
   }
 
+  if (replIsCompactionDisabledForTable(t)) {
 
 Review comment:
   not done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622969
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -661,6 +663,10 @@ public int execute(DriverContext driverContext) {
   if (work.getAlterMaterializedViewDesc() != null) {
 return alterMaterializedView(db, work.getAlterMaterializedViewDesc());
   }
+
+  if (work.getReplSetFirstIncLoadFlagDesc() != null) {
 
 Review comment:
   It's done in a separate task to make it simpler.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622407
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -164,6 +165,12 @@ public IncrementalLoadTasksBuilder(String dbName, String tableName, String loadP
   lastEventid);
 }
   }
+
+  ReplSetFirstIncLoadFlagDesc desc = new ReplSetFirstIncLoadFlagDesc(dbName, tableName);
 
 Review comment:
   A new DDL task is created to make it simpler, with table-level and 
warehouse-level support.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622413
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -158,6 +159,15 @@ private boolean isDbEmpty(String dbName) throws HiveException {
 // Add the checkpoint key to the Database binding it to current dump directory.
 // So, if retry using same dump, we shall skip Database object update.
 parameters.put(ReplUtils.REPL_CHECKPOINT_KEY, dumpDirectory);
+
+if (needSetIncFlag) {
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622357
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/LoadDatabase.java
 ##
 @@ -135,7 +135,8 @@ private boolean isDbEmpty(String dbName) throws HiveException {
   }
 
   private Task alterDbTask(Database dbObj) {
-return alterDbTask(dbObj.getName(), updateDbProps(dbObj, context.dumpDirectory), context.hiveConf);
+return alterDbTask(dbObj.getName(), updateDbProps(dbObj, context.dumpDirectory, false),
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622274
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622236
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSpec.java
 ##
 @@ -426,4 +427,14 @@ public static void copyLastReplId(Map 
srcParameter, Map

[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622204
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 ##
 @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() {
 Assert.assertEquals("The hash codes must be equal", compactionInfo.hashCode(), compactionInfo1.hashCode());
   }
 
+  @Test
+  public void testDisableCompactionDuringReplLoad() throws Exception {
+String tblName = "discomp";
+String database = "discomp_db";
+executeStatementOnDriver("drop database if exists " + database + " cascade", driver);
+executeStatementOnDriver("create database " + database, driver);
+executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a INT, b STRING) " +
+" PARTITIONED BY(ds string)" +
+" CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires table to be bucketed
+" STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver);
+executeStatementOnDriver("insert into " + database + "." + tblName + " partition (ds) values (1, 'fred', " +
+"'today'), (2, 'wilma', 'yesterday')", driver);
+
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", driver);
+List compacts = getCompactionList();
+Assert.assertEquals(0, compacts.size());
+
+executeStatementOnDriver("alter database " + database +
+" set dbproperties ('hive.repl.first.inc.pending' = 'true')", driver);
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
 
 Review comment:
   table level is taken care of now




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622210
 
 

 ##
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 ##
 @@ -1536,6 +1536,62 @@ public void testCompactionInfoHashCode() {
 Assert.assertEquals("The hash codes must be equal", compactionInfo.hashCode(), compactionInfo1.hashCode());
   }
 
+  @Test
+  public void testDisableCompactionDuringReplLoad() throws Exception {
+String tblName = "discomp";
+String database = "discomp_db";
+executeStatementOnDriver("drop database if exists " + database + " cascade", driver);
+executeStatementOnDriver("create database " + database, driver);
+executeStatementOnDriver("CREATE TABLE " + database + "." + tblName + "(a INT, b STRING) " +
+" PARTITIONED BY(ds string)" +
+" CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires table to be bucketed
+" STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver);
+executeStatementOnDriver("insert into " + database + "." + tblName + " partition (ds) values (1, 'fred', " +
+"'today'), (2, 'wilma', 'yesterday')", driver);
+
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'true')", driver);
+List compacts = getCompactionList();
+Assert.assertEquals(0, compacts.size());
+
+executeStatementOnDriver("alter database " + database +
+" set dbproperties ('hive.repl.first.inc.pending' = 'true')", driver);
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
+" SET TBLPROPERTIES ( 'hive.repl.first.inc.pending' = 'false')", driver);
+compacts = getCompactionList();
+Assert.assertEquals(0, compacts.size());
+
+executeStatementOnDriver("alter database " + database +
+" set dbproperties ('hive.repl.first.inc.pending' = 'false')", driver);
+executeStatementOnDriver("ALTER TABLE " + database + "." + tblName +
 
 Review comment:
   table level is taken care of now
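
The two assertions visible in the quoted test (no compactions while the table flag is `'true'`, and none while the database flag is `'true'` even with the table flag `'false'`) suggest that compaction is held back when either level is pending. A minimal sketch of such a gate follows. `ReplCompactionGateSketch` and its methods are hypothetical names, and treating "both flags false" as "compaction allowed" is an assumption, since the quoted hunk is cut off before that case.

```java
import java.util.Collections;
import java.util.Map;

final class ReplCompactionGateSketch {
    // Property name as used in the quoted test.
    static final String FIRST_INC_PENDING = "hive.repl.first.inc.pending";

    /** True when the given parameter map marks the first incremental load as pending. */
    static boolean isPending(Map<String, String> params) {
        return params != null && "true".equalsIgnoreCase(params.get(FIRST_INC_PENDING));
    }

    /** Compaction is skipped if either the database or the table is still pending. */
    static boolean isCompactionDisabled(Map<String, String> dbParams, Map<String, String> tblParams) {
        return isPending(dbParams) || isPending(tblParams);
    }

    public static void main(String[] args) {
        Map<String, String> pending = Collections.singletonMap(FIRST_INC_PENDING, "true");
        Map<String, String> done = Collections.singletonMap(FIRST_INC_PENDING, "false");
        System.out.println(isCompactionDisabled(done, pending));
        System.out.println(isCompactionDisabled(pending, done));
        System.out.println(isCompactionDisabled(done, done));
    }
}
```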




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622190
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -271,12 +299,13 @@ public String getName() {
 LOG.debug("ReplCopyTask:getLoadCopyTask: {}=>{}", srcPath, dstPath);
 if ((replicationSpec != null) && replicationSpec.isInReplicationScope()){
   ReplCopyWork rcwork = new ReplCopyWork(srcPath, dstPath, false);
-  if (replicationSpec.isReplace() && conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION)) {
+  if (replicationSpec.isReplace() && (conf.getBoolVar(REPL_ENABLE_MOVE_OPTIMIZATION) || copyToMigratedTxnTable)) {
 rcwork.setDeleteDestIfExist(true);
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r25960
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
 ##
 @@ -71,6 +74,16 @@ public void init(AtomicBoolean stop, AtomicBoolean looped) throws Exception {
 }
   }
 
+  @Override boolean replIsCompactionDisabledForDatabase(String dbName) throws TException {
+try {
+  Database database = rs.getDatabase(getDefaultCatalog(conf), dbName);
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259622183
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -370,6 +370,9 @@ private int executeIncrementalLoad(DriverContext driverContext) {
 
   // If incremental events are already applied, then check and perform if need to bootstrap any tables.
   if (!builder.hasMoreWork() && !work.getPathsToCopyIterator().hasNext()) {
+// No need to set incremental load pending flag for external tables as the files will be copied to the same path
 
 Review comment:
   TODO not required, as table-level load is taken care of now.




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259621882
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, String replState,
+  private Task dbUpdateReplStateTask(String dbName, String replState, String incLoadPendFlag,
 
 Review comment:
   the code is removed




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259621844
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java
 ##
 @@ -289,12 +296,21 @@ private boolean shouldReplayEvent(FileStatus dir, DumpType dumpType, String dbNa
 return updateReplIdTask;
   }
 
-  private Task dbUpdateReplStateTask(String dbName, String replState,
+  private Task dbUpdateReplStateTask(String dbName, String replState, String incLoadPendFlag,
  Task preCursor) {
 HashMap mapProp = new HashMap<>();
-mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
 
-AlterDatabaseDesc alterDbDesc = new AlterDatabaseDesc(dbName, mapProp, new ReplicationSpec(replState, replState));
+// if the update is for incLoadPendFlag, then send replicationSpec as null to avoid replacement check.
+ReplicationSpec replicationSpec = null;
+if (incLoadPendFlag == null) {
+  mapProp.put(ReplicationSpec.KEY.CURR_STATE_ID.toString(), replState);
+  replicationSpec = new ReplicationSpec(replState, replState);
+} else {
+  assert replState == null;
+  mapProp.put(ReplUtils.REPL_FIRST_INC_PENDING_FLAG, incLoadPendFlag);
 
 Review comment:
   done. Dump will fail if the inc pending flag is set to true
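
The branch in the quoted hunk sets exactly one of two properties on the database. A standalone sketch of that selection follows, with plain strings in place of the Hive types. `ReplStatePropsSketch` and `buildProps` are hypothetical names, the two key strings are assumed values for `ReplicationSpec.KEY.CURR_STATE_ID` and `ReplUtils.REPL_FIRST_INC_PENDING_FLAG`, and the `assert` from the hunk is swapped for an explicit exception so the check also runs when assertions are disabled.

```java
import java.util.HashMap;
import java.util.Map;

final class ReplStatePropsSketch {
    // Assumed key strings; the diff reads them from Hive constants.
    static final String CURR_STATE_ID = "repl.last.id";
    static final String FIRST_INC_PENDING = "hive.repl.first.inc.pending";

    /** Builds the property map for either a repl-state update or a pending-flag update. */
    static Map<String, String> buildProps(String replState, String incLoadPendFlag) {
        Map<String, String> props = new HashMap<>();
        if (incLoadPendFlag == null) {
            // Regular case: record the last replicated state id.
            props.put(CURR_STATE_ID, replState);
        } else {
            // Flag update: the hunk expects no repl state to be passed here.
            if (replState != null) {
                throw new IllegalArgumentException("replState must be null when updating the pending flag");
            }
            props.put(FIRST_INC_PENDING, incLoadPendFlag);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps("42", null));
        System.out.println(buildProps(null, "true"));
    }
}
```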




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618838
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
 
 Review comment:
   done




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259618842
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##
 @@ -187,4 +192,12 @@ public static PathFilter getEventsDirectoryFilter(final FileSystem fs) {
   }
 };
   }
+
+  public static boolean isFirstIncDone(Map parameter) {
+if (parameter == null) {
+  return true;
+}
+String compFlag = parameter.get(ReplUtils.REPL_FIRST_INC_PENDING_FLAG);
 
 Review comment:
   done
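
The quoted hunk stops after reading the flag, so the rest of `isFirstIncDone` is not visible here. A minimal sketch of how such a check might be completed follows. It assumes that the flag key is `hive.repl.first.inc.pending` (the name used in the TestCompactor hunk) and that only the value `true` means "still pending". `FirstIncFlagSketch` is a hypothetical class, not the Hive `ReplUtils`.

```java
import java.util.HashMap;
import java.util.Map;

final class FirstIncFlagSketch {
    // Assumed value of ReplUtils.REPL_FIRST_INC_PENDING_FLAG.
    static final String REPL_FIRST_INC_PENDING_FLAG = "hive.repl.first.inc.pending";

    /** Returns true unless the parameters explicitly mark the first incremental load as pending. */
    static boolean isFirstIncDone(Map<String, String> parameter) {
        if (parameter == null) {
            return true; // same early return as in the quoted hunk
        }
        String compFlag = parameter.get(REPL_FIRST_INC_PENDING_FLAG);
        // Assumption: a missing or non-"true" value counts as done.
        return !"true".equalsIgnoreCase(compFlag);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(isFirstIncDone(params));
        params.put(REPL_FIRST_INC_PENDING_FLAG, "true");
        System.out.println(isFirstIncDone(params));
    }
}
```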




[GitHub] maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive Replication can add duplicate data during migration to a target with hive.strict.managed.tables enabled

2019-02-24 Thread GitBox
maheshk114 commented on a change in pull request #541: HIVE-21197 : Hive 
Replication can add duplicate data during migration to a target with 
hive.strict.managed.tables enabled
URL: https://github.com/apache/hive/pull/541#discussion_r259617900
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ReplCopyTask.java
 ##
 @@ -61,6 +62,21 @@ public ReplCopyTask(){
 super();
   }
 
+  // If file is already present in base directory, then remove it from the list.
+  // Check  HIVE-21197 for more detail
+  private void updateSrcFileListForDupCopy(FileSystem dstFs, Path toPath, List srcFiles,
+   long writeId, int stmtId) throws IOException {
+ListIterator iter = srcFiles.listIterator();
+Path basePath = new Path(toPath, AcidUtils.baseOrDeltaSubdir(true, writeId, writeId, stmtId));
+while (iter.hasNext()) {
+  Path filePath = new Path(basePath, iter.next().getSourcePath().getName());
+  if (dstFs.exists(filePath)) {
 
 Review comment:
   The I/O exception retry case is handled specifically in only two places. 
There are many other I/O failure scenarios that are not handled, so I think 
it's not required here.
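
The duplicate-copy filter in the quoted hunk (drop a source file from the list when a file of the same name already exists in the base directory) can be illustrated without Hadoop. The sketch below uses `java.nio.file` instead of the Hadoop `FileSystem` and `Path` classes. `DupCopyFilterSketch`, `dropAlreadyCopied`, and `demo` are hypothetical names, and like the hunk it performs no retry on I/O errors.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class DupCopyFilterSketch {
    /** Removes every source file whose name already exists under baseDir. */
    static void dropAlreadyCopied(Path baseDir, List<Path> srcFiles) {
        Iterator<Path> iter = srcFiles.iterator();
        while (iter.hasNext()) {
            Path target = baseDir.resolve(iter.next().getFileName());
            if (Files.exists(target)) {
                iter.remove(); // already present at the destination, skip the copy
            }
        }
    }

    /** Builds a temp base directory holding a.orc and filters [a.orc, b.orc] against it. */
    static List<String> demo() {
        try {
            Path baseDir = Files.createTempDirectory("base_sketch");
            Files.createFile(baseDir.resolve("a.orc"));
            List<Path> src = new ArrayList<>(
                Arrays.asList(Paths.get("src", "a.orc"), Paths.get("src", "b.orc")));
            dropAlreadyCopied(baseDir, src);
            List<String> names = new ArrayList<>();
            for (Path p : src) {
                names.add(p.getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Matching by file name only mirrors the hunk, which builds the destination path from the source file's name. Whether a name match is sufficient to call a file a duplicate is a question for the PR itself.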

