[ https://issues.apache.org/jira/browse/HIVE-23040?focusedWorklogId=425396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-425396 ]

ASF GitHub Bot logged work on HIVE-23040:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Apr/20 17:40
            Start Date: 20/Apr/20 17:40
    Worklog Time Spent: 10m 
      Work Description: aasha commented on a change in pull request #977:
URL: https://github.com/apache/hive/pull/977#discussion_r411567286



##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##########
@@ -663,6 +669,322 @@ public void testMultiDBTxn() throws Throwable {
     }
   }
 
+  @Test
+  public void testIncrementalDumpCheckpointing() throws Throwable {
+    WarehouseInstance.Tuple bootstrapDump = primary.run("use " + primaryDbName)
+            .run("CREATE TABLE t1(a string) STORED AS TEXTFILE")
+            .run("CREATE TABLE t2(a string) STORED AS TEXTFILE")
+            .dump(primaryDbName);
+
+    replica.load(replicatedDbName, primaryDbName)
+            .run("select * from " + replicatedDbName + ".t1")
+            .verifyResults(new String[] {})
+            .run("select * from " + replicatedDbName + ".t2")
+            .verifyResults(new String[] {});
+
+
+    //Case 1: The previous dump finished all the events but failed
+    //just before writing the _finished_dump file at the hiveDumpRoot.
+    ReplDumpWork.testDeletePreviousDumpMetaPath(true);
+
+    WarehouseInstance.Tuple incrementalDump1 = primary.run("use " + primaryDbName)
+            .run("insert into t1 values (1)")
+            .run("insert into t2 values (2)")
+            .dump(primaryDbName);
+
+    Path hiveDumpDir = new Path(incrementalDump1.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    Path ackFile = new Path(hiveDumpDir, ReplAck.DUMP_ACKNOWLEDGEMENT.toString());
+    Path ackLastEventID = new Path(hiveDumpDir, ReplAck.EVENTS_DUMP.toString());
+    FileSystem fs = FileSystem.get(hiveDumpDir.toUri(), primary.hiveConf);
+    assertTrue(fs.exists(ackFile));
+    assertTrue(fs.exists(ackLastEventID));
+
+    fs.delete(ackFile, false);
+
+    Map<String, Long> eventModTimeMap = new HashMap<>();
+    long firstIncEventID = Long.parseLong(bootstrapDump.lastReplicationId) + 1;
+    long lastIncEventID = Long.parseLong(incrementalDump1.lastReplicationId);
+    assertTrue(lastIncEventID > (firstIncEventID + 1));
+
+    for (long eventId = firstIncEventID; eventId <= lastIncEventID; eventId++) {
+      Path eventRoot = new Path(hiveDumpDir, String.valueOf(eventId));
+      if (fs.exists(eventRoot)) {
+        eventModTimeMap.put(String.valueOf(eventId), fs.getFileStatus(eventRoot).getModificationTime());

Review comment:
       Please cross check this. Also add a test where, for instance, 5 events are dumped: delete the directories of the last 2 events and keep 3, then verify that only those last 2 events are rewritten on the next dump.
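
       A sketch of that scenario (not part of the patch), continuing from the
state above where the _finished_dump ack has already been removed and
eventModTimeMap records each event directory's modification time. It assumes
at least 5 incremental events, the usual JUnit static imports, and that
checkpointing resumes an un-acked dump into the same location; resumedDump is
an illustrative name:

    // Sketch: needs at least 5 incremental events so that 3 survive the deletion.
    assertTrue(lastIncEventID >= firstIncEventID + 4);

    // Simulate a partial dump loss: drop the directories of the last 2 events.
    for (long eventId = lastIncEventID - 1; eventId <= lastIncEventID; eventId++) {
      fs.delete(new Path(hiveDumpDir, String.valueOf(eventId)), true);
    }

    // Re-dump; with checkpointing (the behavior under review) the dump should
    // resume into the same, still un-acked dump location.
    WarehouseInstance.Tuple resumedDump = primary.dump(primaryDbName);
    assertEquals(incrementalDump1.dumpLocation, resumedDump.dumpLocation);

    // Kept events must keep their modification times; deleted events must be rewritten.
    for (Map.Entry<String, Long> entry : eventModTimeMap.entrySet()) {
      long eventId = Long.parseLong(entry.getKey());
      long modTime = fs.getFileStatus(new Path(hiveDumpDir, entry.getKey())).getModificationTime();
      if (eventId <= lastIncEventID - 2) {
        assertEquals(entry.getValue().longValue(), modTime);
      } else {
        assertTrue(modTime > entry.getValue());
      }
    }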




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 425396)
    Time Spent: 1h  (was: 50m)

> Checkpointing for repl dump incremental phase
> ---------------------------------------------
>
>                 Key: HIVE-23040
>                 URL: https://issues.apache.org/jira/browse/HIVE-23040
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Aasha Medhi
>            Assignee: PRAVIN KUMAR SINHA
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23040.01.patch, HIVE-23040.02.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
