[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=726140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-726140
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 14/Feb/22 06:20
Start Date: 14/Feb/22 06:20
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged pull request #2980:
URL: https://github.com/apache/hive/pull/2980


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 726140)
Time Spent: 2h  (was: 1h 50m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=726111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-726111
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 14/Feb/22 04:11
Start Date: 14/Feb/22 04:11
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r805492574



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##
@@ -136,8 +142,9 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
   FileSystem fs = failoverReadyMarker.getFileSystem(hiveConf);
   shouldFailover = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_FAILOVER_START)
   && fs.exists(failoverReadyMarker);
-  isFailover =
-  checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, OptimisedBootstrapUtils.EVENT_ACK_FILE);
+  isFirstFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, EVENT_ACK_FILE);

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 726111)
Time Spent: 1h 50m  (was: 1h 40m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=725901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725901
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 14/Feb/22 02:26
Start Date: 14/Feb/22 02:26
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r805413504



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##
@@ -136,8 +142,9 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
   FileSystem fs = failoverReadyMarker.getFileSystem(hiveConf);
   shouldFailover = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_FAILOVER_START)
   && fs.exists(failoverReadyMarker);
-  isFailover =
-  checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, OptimisedBootstrapUtils.EVENT_ACK_FILE);
+  isFirstFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, EVENT_ACK_FILE);

Review comment:
   nit: Could create and re-use the Path object, e.g.
   Path dumpDirParent = new Path(dumpDirectory).getParent();
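   A minimal sketch of the suggested reuse, built only from identifiers already
   visible in the diffs quoted in this thread (the reviewer's suggestion, not
   necessarily the merged change):

   Path dumpDirParent = new Path(dumpDirectory).getParent();
   isFirstFailover = checkFileExists(dumpDirParent, hiveConf, EVENT_ACK_FILE);
   isSecondFailover = checkFileExists(dumpDirParent, hiveConf, BOOTSTRAP_TABLES_LIST);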




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725901)
Time Spent: 1h 40m  (was: 1.5h)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=725841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725841
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 13/Feb/22 20:52
Start Date: 13/Feb/22 20:52
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r805413504



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##
@@ -136,8 +142,9 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
   FileSystem fs = failoverReadyMarker.getFileSystem(hiveConf);
   shouldFailover = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_FAILOVER_START)
   && fs.exists(failoverReadyMarker);
-  isFailover =
-  checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, OptimisedBootstrapUtils.EVENT_ACK_FILE);
+  isFirstFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, EVENT_ACK_FILE);

Review comment:
   nit: Could create and re-use the Path object, e.g.
   Path dumpDirParent = new Path(dumpDirectory).getParent();




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 725841)
Time Spent: 1.5h  (was: 1h 20m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=722930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722930
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 15:42
Start Date: 08/Feb/22 15:42
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r801773721



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##
@@ -136,8 +142,8 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
   FileSystem fs = failoverReadyMarker.getFileSystem(hiveConf);
   shouldFailover = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_FAILOVER_START)
   && fs.exists(failoverReadyMarker);
-  isFailover =
-  checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, OptimisedBootstrapUtils.EVENT_ACK_FILE);
+  isFirstFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, EVENT_ACK_FILE);
+  isSecondFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, BOOTSTRAP_TABLES_LIST);

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 722930)
Time Spent: 1h 20m  (was: 1h 10m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=722902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-722902
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 08/Feb/22 15:19
Start Date: 08/Feb/22 15:19
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r801733505



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##
@@ -136,8 +142,8 @@ public ReplLoadWork(HiveConf hiveConf, String dumpDirectory,
   FileSystem fs = failoverReadyMarker.getFileSystem(hiveConf);
   shouldFailover = hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_REPL_FAILOVER_START)
   && fs.exists(failoverReadyMarker);
-  isFailover =
-  checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, OptimisedBootstrapUtils.EVENT_ACK_FILE);
+  isFirstFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, EVENT_ACK_FILE);
+  isSecondFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, BOOTSTRAP_TABLES_LIST);

Review comment:
   only if ! isFirstFailover?
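   That is, roughly the following guard (a sketch of the reviewer's idea, not the
   merged code):

   isFirstFailover = checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, EVENT_ACK_FILE);
   isSecondFailover = !isFirstFailover
       && checkFileExists(new Path(dumpDirectory).getParent(), hiveConf, BOOTSTRAP_TABLES_LIST);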




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 722902)
Time Spent: 1h 10m  (was: 1h)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=720678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720678
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 04/Feb/22 06:50
Start Date: 04/Feb/22 06:50
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r799204862



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/OptimisedBootstrapUtils.java
##
@@ -70,6 +71,8 @@
   /** event ack file which contains the event id till which the cluster was last loaded. */
   public static final String EVENT_ACK_FILE = "event_ack";
 
+  public static final String BOOTSTRAP_TABLES_LIST = "bootstrap_table_list";

Review comment:
   Renamed.
   
   Answered the rollback & failback question above; at a high level it is part of
the reset solution, and we allow that at all levels and stages using it. Details
are available in the doc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 720678)
Time Spent: 1h  (was: 50m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=720677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720677
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 04/Feb/22 06:49
Start Date: 04/Feb/22 06:49
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r799204359



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOptimisedBootstrap.java
##
@@ -734,4 +743,167 @@ public NotificationEventResponse apply(@Nullable 
NotificationEventResponse event
   InjectableBehaviourObjectStore.resetGetNextNotificationBehaviour();  // 
reset the behaviour
 }
   }
+
+
+  @Test
+  public void testReverseBootstrap() throws Throwable {
+List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
primary.repldDir + "'");
+
+// Do a bootstrap cycle.
+primary.dump(primaryDbName, withClause);
+replica.load(replicatedDbName, primaryDbName, withClause);
+
+// Create 4 managed tables and do a dump & load.
+WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("insert into table t1 values (2),(3),(4)")
+.run("create table t2 (place string) partitioned by (country string)")
+.run("insert into table t2 partition(country='india') values 
('chennai')")
+.run("insert into table t2 partition(country='us') values ('new 
york')")
+.run("create table t3 (id int)")
+.run("insert into table t3 values (10)")
+.run("insert into table t3 values (20),(31),(42)")
+.run("create table t4 (place string) partitioned by (country string)")
+.run("insert into table t4 partition(country='india') values 
('bangalore')")
+.run("insert into table t4 partition(country='us') values ('austin')")
+.dump(primaryDbName, withClause);
+
+// Do the load and check all the external & managed tables are present.
+replica.load(replicatedDbName, primaryDbName, withClause)
+.run("repl status " + replicatedDbName)
+.verifyResult(tuple.lastReplicationId)
+.run("use " + replicatedDbName)
+.run("show tables like 't1'")
+.verifyResult("t1")
+.run("show tables like 't2'")
+.verifyResult("t2")
+.run("show tables like 't3'")
+.verifyResult("t3")
+.run("show tables like 't4'")
+.verifyResult("t4")
+.verifyReplTargetProperty(replicatedDbName);
+
+
+// Do some modifications on original source cluster. The diff 
becomes(tnew_managed, t1, t2, t3)
+primary.run("use " + primaryDbName)
+.run("create table tnew_managed (id int)")
+.run("insert into table t1 values (25)")
+.run("insert into table tnew_managed values (110)")
+.run("insert into table t2 partition(country='france') values 
('lyon')")
+.run("drop table t3");
+
+// Do some modifications on the target cluster. (t1, t2, t3: bootstrap & 
t4, t5: incremental)
+replica.run("use " + replicatedDbName)
+.run("insert into table t1 values (101)")
+.run("insert into table t1 values (210),(321)")
+.run("insert into table t2 partition(country='india') values 
('delhi')")
+.run("insert into table t3 values (11)")
+.run("insert into table t4 partition(country='india') values 
('lucknow')")
+.run("create table t5 (place string) partitioned by (country string)")
+.run("insert into table t5 partition(country='china') values 
('beejing')");
+
+// Prepare for reverse replication.
+DistributedFileSystem replicaFs = replica.miniDFSCluster.getFileSystem();
+Path newReplDir = new Path(replica.repldDir + "1");
+replicaFs.mkdirs(newReplDir);
+withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
newReplDir + "'");
+
+// Do a reverse dump
+tuple = replica.dump(replicatedDbName, withClause);
+
+// Check the event ack file got created.
+assertTrue(new Path(tuple.dumpLocation, EVENT_ACK_FILE).toString() + " 
doesn't exist",
+replicaFs.exists(new Path(tuple.dumpLocation, EVENT_ACK_FILE)));
+
+Path dumpPath = new Path(tuple.dumpLocation);
+
+// Do a load, this should create a table_diff_complete directory
+primary.load(primaryDbName, replicatedDbName, withClause);
+
+// Check the table diff directory exist.
+assertTrue(new Path(tuple.dumpLocation, 
TABLE_DIFF_COMPLETE_DIRECTORY).toString() + " doesn't exist",
+replicaFs.exists(new 

[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=720676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720676
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 04/Feb/22 06:47
Start Date: 04/Feb/22 06:47
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r799203703



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOptimisedBootstrap.java
##
@@ -734,4 +743,167 @@ public NotificationEventResponse apply(@Nullable 
NotificationEventResponse event
   InjectableBehaviourObjectStore.resetGetNextNotificationBehaviour();  // 
reset the behaviour
 }
   }
+
+
+  @Test
+  public void testReverseBootstrap() throws Throwable {
+List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
primary.repldDir + "'");
+
+// Do a bootstrap cycle.
+primary.dump(primaryDbName, withClause);
+replica.load(replicatedDbName, primaryDbName, withClause);
+
+// Create 4 managed tables and do a dump & load.
+WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("insert into table t1 values (2),(3),(4)")
+.run("create table t2 (place string) partitioned by (country string)")
+.run("insert into table t2 partition(country='india') values 
('chennai')")
+.run("insert into table t2 partition(country='us') values ('new 
york')")
+.run("create table t3 (id int)")
+.run("insert into table t3 values (10)")
+.run("insert into table t3 values (20),(31),(42)")
+.run("create table t4 (place string) partitioned by (country string)")
+.run("insert into table t4 partition(country='india') values 
('bangalore')")
+.run("insert into table t4 partition(country='us') values ('austin')")
+.dump(primaryDbName, withClause);
+
+// Do the load and check all the external & managed tables are present.
+replica.load(replicatedDbName, primaryDbName, withClause)
+.run("repl status " + replicatedDbName)
+.verifyResult(tuple.lastReplicationId)
+.run("use " + replicatedDbName)
+.run("show tables like 't1'")
+.verifyResult("t1")
+.run("show tables like 't2'")
+.verifyResult("t2")
+.run("show tables like 't3'")
+.verifyResult("t3")
+.run("show tables like 't4'")
+.verifyResult("t4")
+.verifyReplTargetProperty(replicatedDbName);
+
+
+// Do some modifications on original source cluster. The diff 
becomes(tnew_managed, t1, t2, t3)
+primary.run("use " + primaryDbName)
+.run("create table tnew_managed (id int)")
+.run("insert into table t1 values (25)")
+.run("insert into table tnew_managed values (110)")
+.run("insert into table t2 partition(country='france') values 
('lyon')")
+.run("drop table t3");
+
+// Do some modifications on the target cluster. (t1, t2, t3: bootstrap & 
t4, t5: incremental)
+replica.run("use " + replicatedDbName)
+.run("insert into table t1 values (101)")
+.run("insert into table t1 values (210),(321)")
+.run("insert into table t2 partition(country='india') values 
('delhi')")
+.run("insert into table t3 values (11)")
+.run("insert into table t4 partition(country='india') values 
('lucknow')")
+.run("create table t5 (place string) partitioned by (country string)")
+.run("insert into table t5 partition(country='china') values 
('beejing')");
+
+// Prepare for reverse replication.
+DistributedFileSystem replicaFs = replica.miniDFSCluster.getFileSystem();
+Path newReplDir = new Path(replica.repldDir + "1");
+replicaFs.mkdirs(newReplDir);

Review comment:
   Nope, it doesn't. It just needs to make sure the directory is there after the
operation; if it already exists, it simply returns.
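
   For reference, a tiny self-contained sketch of that FileSystem#mkdirs
   behaviour (hypothetical path; assumes hadoop-common on the classpath):

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class MkdirsIdempotenceSketch {
     public static void main(String[] args) throws Exception {
       Path dir = new Path("/tmp/repl_dir_1");            // hypothetical directory
       FileSystem fs = dir.getFileSystem(new Configuration());
       System.out.println(fs.mkdirs(dir));                // creates it if missing, returns true
       System.out.println(fs.mkdirs(dir));                // already exists: no error, still returns true
       System.out.println(fs.exists(dir));                // true either way
     }
   }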

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOptimisedBootstrap.java
##
@@ -734,4 +743,167 @@ public NotificationEventResponse apply(@Nullable 
NotificationEventResponse event
   InjectableBehaviourObjectStore.resetGetNextNotificationBehaviour();  // 
reset the behaviour
 }
   }
+
+
+  @Test
+  public void testReverseBootstrap() throws Throwable {
+List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
primary.repldDir + "'");
+
+// Do a bootstrap cycle.
+primary.dump(primaryDbName, withClause);
+

[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=720675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720675
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 04/Feb/22 06:46
Start Date: 04/Feb/22 06:46
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r799203501



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOptimisedBootstrap.java
##
@@ -734,4 +743,167 @@ public NotificationEventResponse apply(@Nullable 
NotificationEventResponse event
   InjectableBehaviourObjectStore.resetGetNextNotificationBehaviour();  // 
reset the behaviour
 }
   }
+
+
+  @Test
+  public void testReverseBootstrap() throws Throwable {
+List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
primary.repldDir + "'");
+
+// Do a bootstrap cycle.
+primary.dump(primaryDbName, withClause);
+replica.load(replicatedDbName, primaryDbName, withClause);
+
+// Create 4 managed tables and do a dump & load.
+WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("insert into table t1 values (2),(3),(4)")
+.run("create table t2 (place string) partitioned by (country string)")
+.run("insert into table t2 partition(country='india') values 
('chennai')")
+.run("insert into table t2 partition(country='us') values ('new 
york')")
+.run("create table t3 (id int)")
+.run("insert into table t3 values (10)")
+.run("insert into table t3 values (20),(31),(42)")
+.run("create table t4 (place string) partitioned by (country string)")
+.run("insert into table t4 partition(country='india') values 
('bangalore')")
+.run("insert into table t4 partition(country='us') values ('austin')")
+.dump(primaryDbName, withClause);
+
+// Do the load and check all the external & managed tables are present.
+replica.load(replicatedDbName, primaryDbName, withClause)
+.run("repl status " + replicatedDbName)
+.verifyResult(tuple.lastReplicationId)
+.run("use " + replicatedDbName)
+.run("show tables like 't1'")
+.verifyResult("t1")
+.run("show tables like 't2'")
+.verifyResult("t2")
+.run("show tables like 't3'")
+.verifyResult("t3")
+.run("show tables like 't4'")
+.verifyResult("t4")
+.verifyReplTargetProperty(replicatedDbName);
+
+
+// Do some modifications on original source cluster. The diff 
becomes(tnew_managed, t1, t2, t3)
+primary.run("use " + primaryDbName)
+.run("create table tnew_managed (id int)")
+.run("insert into table t1 values (25)")
+.run("insert into table tnew_managed values (110)")
+.run("insert into table t2 partition(country='france') values 
('lyon')")
+.run("drop table t3");
+
+// Do some modifications on the target cluster. (t1, t2, t3: bootstrap & 
t4, t5: incremental)
+replica.run("use " + replicatedDbName)
+.run("insert into table t1 values (101)")
+.run("insert into table t1 values (210),(321)")
+.run("insert into table t2 partition(country='india') values 
('delhi')")
+.run("insert into table t3 values (11)")
+.run("insert into table t4 partition(country='india') values 
('lucknow')")
+.run("create table t5 (place string) partitioned by (country string)")
+.run("insert into table t5 partition(country='china') values 
('beejing')");
+
+// Prepare for reverse replication.
+DistributedFileSystem replicaFs = replica.miniDFSCluster.getFileSystem();
+Path newReplDir = new Path(replica.repldDir + "1");
+replicaFs.mkdirs(newReplDir);
+withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
newReplDir + "'");

Review comment:
   Nope, it isn't. That is covered in our doc as well, in the initial assumptions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 720675)
Time Spent: 0.5h  (was: 20m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
>

[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=720674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720674
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 04/Feb/22 06:46
Start Date: 04/Feb/22 06:46
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r799203265



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOptimisedBootstrap.java
##
@@ -734,4 +743,167 @@ public NotificationEventResponse apply(@Nullable 
NotificationEventResponse event
   InjectableBehaviourObjectStore.resetGetNextNotificationBehaviour();  // 
reset the behaviour
 }
   }
+
+
+  @Test
+  public void testReverseBootstrap() throws Throwable {
+List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
primary.repldDir + "'");
+
+// Do a bootstrap cycle.
+primary.dump(primaryDbName, withClause);
+replica.load(replicatedDbName, primaryDbName, withClause);
+
+// Create 4 managed tables and do a dump & load.
+WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("insert into table t1 values (2),(3),(4)")
+.run("create table t2 (place string) partitioned by (country string)")
+.run("insert into table t2 partition(country='india') values 
('chennai')")
+.run("insert into table t2 partition(country='us') values ('new 
york')")
+.run("create table t3 (id int)")
+.run("insert into table t3 values (10)")
+.run("insert into table t3 values (20),(31),(42)")
+.run("create table t4 (place string) partitioned by (country string)")
+.run("insert into table t4 partition(country='india') values 
('bangalore')")
+.run("insert into table t4 partition(country='us') values ('austin')")
+.dump(primaryDbName, withClause);
+
+// Do the load and check all the external & managed tables are present.
+replica.load(replicatedDbName, primaryDbName, withClause)
+.run("repl status " + replicatedDbName)
+.verifyResult(tuple.lastReplicationId)
+.run("use " + replicatedDbName)
+.run("show tables like 't1'")
+.verifyResult("t1")
+.run("show tables like 't2'")
+.verifyResult("t2")
+.run("show tables like 't3'")
+.verifyResult("t3")
+.run("show tables like 't4'")
+.verifyResult("t4")
+.verifyReplTargetProperty(replicatedDbName);
+
+
+// Do some modifications on original source cluster. The diff 
becomes(tnew_managed, t1, t2, t3)
+primary.run("use " + primaryDbName)
+.run("create table tnew_managed (id int)")
+.run("insert into table t1 values (25)")
+.run("insert into table tnew_managed values (110)")
+.run("insert into table t2 partition(country='france') values 
('lyon')")
+.run("drop table t3");
+
+// Do some modifications on the target cluster. (t1, t2, t3: bootstrap & 
t4, t5: incremental)

Review comment:
   Nope, if the target stays read-only, how would it be switched over as the prod
cluster after a DR situation? And if there are no modifications on the target
cluster, we need not do B->A at all, since B didn't get modified. :-)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 720674)
Time Spent: 20m  (was: 10m)

> Bootstrap tables in table_diff during Incremental Load
> --
>
> Key: HIVE-25895
> URL: https://issues.apache.org/jira/browse/HIVE-25895
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Consume the table_diff_ack file and do a bootstrap dump & load for those 
> tables



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25895) Bootstrap tables in table_diff during Incremental Load

2022-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25895?focusedWorklogId=719898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-719898
 ]

ASF GitHub Bot logged work on HIVE-25895:
-

Author: ASF GitHub Bot
Created on: 03/Feb/22 05:40
Start Date: 03/Feb/22 05:40
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2980:
URL: https://github.com/apache/hive/pull/2980#discussion_r798225837



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/OptimisedBootstrapUtils.java
##
@@ -70,6 +71,8 @@
   /** event ack file which contains the event id till which the cluster was last loaded. */
   public static final String EVENT_ACK_FILE = "event_ack";
 
+  public static final String BOOTSTRAP_TABLES_LIST = "bootstrap_table_list";

Review comment:
   What would happen in the rollback case, i.e. we initiated the failover but
aborted the process in between? Theoretically, up to what point will we allow
that to happen?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationOptimisedBootstrap.java
##
@@ -734,4 +743,167 @@ public NotificationEventResponse apply(@Nullable 
NotificationEventResponse event
   InjectableBehaviourObjectStore.resetGetNextNotificationBehaviour();  // 
reset the behaviour
 }
   }
+
+
+  @Test
+  public void testReverseBootstrap() throws Throwable {
+List<String> withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
primary.repldDir + "'");
+
+// Do a bootstrap cycle.
+primary.dump(primaryDbName, withClause);
+replica.load(replicatedDbName, primaryDbName, withClause);
+
+// Create 4 managed tables and do a dump & load.
+WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+.run("create table t1 (id int)")
+.run("insert into table t1 values (1)")
+.run("insert into table t1 values (2),(3),(4)")
+.run("create table t2 (place string) partitioned by (country string)")
+.run("insert into table t2 partition(country='india') values 
('chennai')")
+.run("insert into table t2 partition(country='us') values ('new 
york')")
+.run("create table t3 (id int)")
+.run("insert into table t3 values (10)")
+.run("insert into table t3 values (20),(31),(42)")
+.run("create table t4 (place string) partitioned by (country string)")
+.run("insert into table t4 partition(country='india') values 
('bangalore')")
+.run("insert into table t4 partition(country='us') values ('austin')")
+.dump(primaryDbName, withClause);
+
+// Do the load and check all the external & managed tables are present.
+replica.load(replicatedDbName, primaryDbName, withClause)
+.run("repl status " + replicatedDbName)
+.verifyResult(tuple.lastReplicationId)
+.run("use " + replicatedDbName)
+.run("show tables like 't1'")
+.verifyResult("t1")
+.run("show tables like 't2'")
+.verifyResult("t2")
+.run("show tables like 't3'")
+.verifyResult("t3")
+.run("show tables like 't4'")
+.verifyResult("t4")
+.verifyReplTargetProperty(replicatedDbName);
+
+
+// Do some modifications on original source cluster. The diff 
becomes(tnew_managed, t1, t2, t3)
+primary.run("use " + primaryDbName)
+.run("create table tnew_managed (id int)")
+.run("insert into table t1 values (25)")
+.run("insert into table tnew_managed values (110)")
+.run("insert into table t2 partition(country='france') values 
('lyon')")
+.run("drop table t3");
+
+// Do some modifications on the target cluster. (t1, t2, t3: bootstrap & 
t4, t5: incremental)
+replica.run("use " + replicatedDbName)
+.run("insert into table t1 values (101)")
+.run("insert into table t1 values (210),(321)")
+.run("insert into table t2 partition(country='india') values 
('delhi')")
+.run("insert into table t3 values (11)")
+.run("insert into table t4 partition(country='india') values 
('lucknow')")
+.run("create table t5 (place string) partitioned by (country string)")
+.run("insert into table t5 partition(country='china') values 
('beejing')");
+
+// Prepare for reverse replication.
+DistributedFileSystem replicaFs = replica.miniDFSCluster.getFileSystem();
+Path newReplDir = new Path(replica.repldDir + "1");
+replicaFs.mkdirs(newReplDir);
+withClause = ReplicationTestUtils.includeExternalTableClause(true);
+withClause.add("'" + HiveConf.ConfVars.REPLDIR.varname + "'='" + 
newReplDir + "'");
+
+// Do a reverse dump
+tuple = replica.dump(replicatedDbName, withClause);
+
+// Check the event ack file got created.
+