[ https://issues.apache.org/jira/browse/HIVE-23040?focusedWorklogId=425396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-425396 ]
ASF GitHub Bot logged work on HIVE-23040:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Apr/20 17:40
            Start Date: 20/Apr/20 17:40
    Worklog Time Spent: 10m
      Work Description: aasha commented on a change in pull request #977:
URL: https://github.com/apache/hive/pull/977#discussion_r411567286


##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##########

@@ -663,6 +669,322 @@ public void testMultiDBTxn() throws Throwable {
     }
   }
 
+  @Test
+  public void testIncrementalDumpCheckpointing() throws Throwable {
+    WarehouseInstance.Tuple bootstrapDump = primary.run("use " + primaryDbName)
+        .run("CREATE TABLE t1(a string) STORED AS TEXTFILE")
+        .run("CREATE TABLE t2(a string) STORED AS TEXTFILE")
+        .dump(primaryDbName);
+
+    replica.load(replicatedDbName, primaryDbName)
+        .run("select * from " + replicatedDbName + ".t1")
+        .verifyResults(new String[] {})
+        .run("select * from " + replicatedDbName + ".t2")
+        .verifyResults(new String[] {});
+
+    //Case 1: When the last dump finished all the events and
+    //only the _finished_dump file at the hiveDumpRoot was about to be written when it failed.
+    ReplDumpWork.testDeletePreviousDumpMetaPath(true);
+
+    WarehouseInstance.Tuple incrementalDump1 = primary.run("use " + primaryDbName)
+        .run("insert into t1 values (1)")
+        .run("insert into t2 values (2)")
+        .dump(primaryDbName);
+
+    Path hiveDumpDir = new Path(incrementalDump1.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    Path ackFile = new Path(hiveDumpDir, ReplAck.DUMP_ACKNOWLEDGEMENT.toString());
+    Path ackLastEventID = new Path(hiveDumpDir, ReplAck.EVENTS_DUMP.toString());
+    FileSystem fs = FileSystem.get(hiveDumpDir.toUri(), primary.hiveConf);
+    assertTrue(fs.exists(ackFile));
+    assertTrue(fs.exists(ackLastEventID));
+
+    fs.delete(ackFile, false);
+
+    Map<String, Long> eventModTimeMap = new HashMap<>();
+    long firstIncEventID = Long.parseLong(bootstrapDump.lastReplicationId) + 1;
+    long lastIncEventID = Long.parseLong(incrementalDump1.lastReplicationId);
+    assertTrue(lastIncEventID > (firstIncEventID + 1));
+
+    for (long eventId = firstIncEventID; eventId <= lastIncEventID; eventId++) {
+      Path eventRoot = new Path(hiveDumpDir, String.valueOf(eventId));
+      if (fs.exists(eventRoot)) {
+        eventModTimeMap.put(String.valueOf(eventId), fs.getFileStatus(eventRoot).getModificationTime());

Review comment:
    Please cross-check this. Also add a test where, for instance, 5 events are dumped: delete 2 of the event directories and keep 3; only the deleted 2 events should be rewritten on the resumed dump.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
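As an aside, the resume behaviour the reviewer is probing can be sketched with plain `java.nio` (a minimal, hypothetical illustration; `EventDumpCheckpoint` and `dumpEvents` are made-up names, not Hive's `ReplDumpTask` logic): a resumed incremental dump skips every per-event directory that already exists under the dump root and writes only the missing ones, which is why only deleted events should get new modification times.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EventDumpCheckpoint {

    // Dumps events [firstEventId, lastEventId] under dumpRoot, skipping any
    // event directory that already exists (the checkpoint). Returns the ids
    // that were actually (re)written on this attempt.
    static List<Long> dumpEvents(Path dumpRoot, long firstEventId, long lastEventId)
            throws IOException {
        List<Long> written = new ArrayList<>();
        for (long id = firstEventId; id <= lastEventId; id++) {
            Path eventRoot = dumpRoot.resolve(String.valueOf(id));
            if (Files.exists(eventRoot)) {
                continue; // checkpoint hit: already dumped, do not rewrite
            }
            Files.createDirectories(eventRoot);
            written.add(id);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dumpRoot = Files.createTempDirectory("repl_dump");
        // A failed first attempt managed to dump events 10..12 of 10..14.
        dumpEvents(dumpRoot, 10, 12);
        // The resumed dump covers the full range but rewrites only 13 and 14.
        List<Long> rewritten = dumpEvents(dumpRoot, 10, 14);
        System.out.println("rewritten=" + rewritten); // rewritten=[13, 14]
    }
}
```

The test in the patch checks the same invariant from the outside: it records each event directory's modification time after the first dump and asserts that directories kept across the failure are untouched by the re-run.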
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 425396)
    Time Spent: 1h  (was: 50m)

> Checkpointing for repl dump incremental phase
> ---------------------------------------------
>
>                 Key: HIVE-23040
>                 URL: https://issues.apache.org/jira/browse/HIVE-23040
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Aasha Medhi
>            Assignee: PRAVIN KUMAR SINHA
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23040.01.patch, HIVE-23040.02.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>


--
This message was sent by Atlassian Jira
(v8.3.4#803005)