Teddy Choi created HIVE-26599:
---------------------------------
Summary: Fix NPE encountered in second dump cycle of optimised
bootstrap
Key: HIVE-26599
URL: https://issues.apache.org/jira/browse/HIVE-26599
Project: Hive
Issue Type: Bug
Reporter: Teddy Choi
Assignee: Teddy Choi
After failover from the Primary to the DR cluster completes and DR takes over,
we create a reverse replication policy. The first dump and load cycle of
optimised bootstrap completes successfully, but the second dump cycle fails
with a NullPointerException. This halts reverse replication and is a major
blocker for testing the complete replication cycle.
{code:java}
Scheduled Query Executor(schedule:repl_reverse, execution_id:14)]: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. java.lang.NullPointerException
at org.apache.hadoop.hive.ql.parse.repl.metric.ReplicationMetricCollector.reportStageProgress(ReplicationMetricCollector.java:192)
at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpTable(ReplDumpTask.java:1458)
at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:961)
at org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:290)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232)
{code}
Root cause analysis shows that, in the second dump cycle on the DR cluster, the
Tables metric is not registered when the StageStart method is invoked. It
should be registered, because optimised bootstrap performs a selective
bootstrap of tables along with the incremental dump. The NPE is thrown later,
when the code tries to update the progress of this metric after the bootstrap
of a table has completed.
The fix is to register the Tables metric before updating its progress.
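
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Hive's ReplicationMetricCollector code: the class TablesMetricSketch, the nested Stage class, its methods, and the metric name "TABLES" are all hypothetical stand-ins, assumed only for illustration. The sketch shows how updating the progress of a metric that was never registered can raise a NullPointerException, and how registering it first avoids that.

```java
import java.util.HashMap;
import java.util.Map;

public class TablesMetricSketch {

    // Hypothetical stand-in for a dump stage holding named progress counters.
    static final class Stage {
        private final Map<String, Long> metrics = new HashMap<>();

        // Registers a metric with an initial count of zero.
        void registerMetric(String name) {
            metrics.putIfAbsent(name, 0L);
        }

        // Adds delta to a metric. If the metric was never registered,
        // metrics.get(name) returns null and unboxing it throws an NPE.
        void reportProgress(String name, long delta) {
            long current = metrics.get(name);
            metrics.put(name, current + delta);
        }

        long value(String name) {
            return metrics.getOrDefault(name, 0L);
        }
    }

    // Reproduces the bug: progress is reported without prior registration.
    static boolean failsWithoutRegistration() {
        Stage stage = new Stage();
        try {
            stage.reportProgress("TABLES", 1);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Applies the fix: the metric is registered before progress is reported.
    static long progressAfterRegistration() {
        Stage stage = new Stage();
        stage.registerMetric("TABLES");
        stage.reportProgress("TABLES", 1);
        return stage.value("TABLES");
    }

    public static void main(String[] args) {
        System.out.println("NPE without registration: " + failsWithoutRegistration());
        System.out.println("Progress after registration: " + progressAfterRegistration());
    }
}
```

Whether the actual patch registers the metric at stage start or immediately before the progress update is not stated here; the sketch only shows that registration must precede the update.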