Teddy Choi created HIVE-26599:
---------------------------------

Summary: Fix NPE encountered in second dump cycle of optimised bootstrap
Key: HIVE-26599
URL: https://issues.apache.org/jira/browse/HIVE-26599
Project: Hive
Issue Type: Bug
Reporter: Teddy Choi
Assignee: Teddy Choi

After failover from the Primary to the DR cluster is completed and DR takes over, we create a reverse replication policy. The first dump and load cycle of optimised bootstrap completes successfully, but the second dump cycle fails with a NullPointerException. This halts reverse replication and is a major blocker for testing the complete replication cycle.

{code:java}
Scheduled Query Executor(schedule:repl_reverse, execution_id:14)]: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192)
	at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpTable(ReplDumpTask.java:1458)
	at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:961)
	at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:290)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232)
{code}

Root cause analysis showed that in the second dump cycle on the DR cluster, when the StageStart method is invoked, the metric corresponding to Tables is not registered. It should be registered, because optimised bootstrap performs a selective bootstrap of tables along with the incremental dump. The missing registration causes the NPE later, when the code tries to update the progress of this metric after the bootstrap of a table is completed.

The fix is to register the Tables metric before updating the progress.
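
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Hive's actual code: `StageMetrics`, `registerIfAbsent`, and the metric name strings are hypothetical stand-ins for the metric collector and its metrics. It only shows why updating the progress of a metric that was never registered at stage start throws a NullPointerException, and how registering it first avoids that.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-stage metric registry; not Hive's class. */
class StageMetrics {
    private final Map<String, Long> progress = new HashMap<>();

    /** Registers the metrics that are known when the stage starts. */
    void stageStart(String... metricNames) {
        for (String name : metricNames) {
            progress.put(name, 0L);
        }
    }

    /** Registers a metric only if it is missing (the assumed shape of the fix). */
    void registerIfAbsent(String metricName) {
        progress.putIfAbsent(metricName, 0L);
    }

    /** Adds delta to a metric; fails if the metric was never registered. */
    void reportProgress(String metricName, long delta) {
        // Unboxing the null returned for an unknown key throws NullPointerException.
        long previous = progress.get(metricName);
        progress.put(metricName, previous + delta);
    }

    long current(String metricName) {
        return progress.get(metricName);
    }
}

class MetricRegistrationSketch {
    public static void main(String[] args) {
        // Stage started with only the incremental metric registered.
        StageMetrics unregistered = new StageMetrics();
        unregistered.stageStart("EVENTS");
        try {
            unregistered.reportProgress("TABLES", 1);
        } catch (NullPointerException e) {
            System.out.println("TABLES not registered: NullPointerException");
        }

        // Same stage, but the table metric is registered before the update.
        StageMetrics registered = new StageMetrics();
        registered.stageStart("EVENTS");
        registered.registerIfAbsent("TABLES");
        registered.reportProgress("TABLES", 1);
        System.out.println("TABLES progress = " + registered.current("TABLES"));
    }
}
```

In this sketch the registration is idempotent (`putIfAbsent`), so calling it before every progress update would not reset a metric that is already being tracked. Whether the real fix registers the metric at stage start or immediately before the update is not stated in this issue.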