Harsh J created HIVE-3653:
-----------------------------
Summary: Failure in a counter poller run should not be considered as a job failure
Key: HIVE-3653
URL: https://issues.apache.org/jira/browse/HIVE-3653
Project: Hive
Issue Type: Bug
Components: Clients
Affects Versions: 0.7.1
Reporter: Harsh J
A client hit a simple transient failure while polling the JobTracker (JT) for job status. Hive does this polling every HIVECOUNTERSPULLINTERVAL for each currently running job.
{code}
java.io.IOException: Call to HOST/IP:PORT failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
at org.apache.hadoop.ipc.Client.call(Client.java:1110)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:686)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:310)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:317)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:490)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{code}
This led Hive to conclude that the running job itself had failed, so it failed the query, even though the job went on to run to completion in the background.
We should not let transient IOExceptions in counter polling cause query
termination, and should instead just retry.
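A minimal sketch of the proposed retry behaviour, assuming the status poll can be wrapped in a callable that throws IOException. The names `Poll`, `pollWithRetry`, `maxAttempts` and `backoffMs` are hypothetical and for illustration only; they are not part of Hive's ExecDriver or Hadoop's JobClient API.

{code:java}
import java.io.IOException;

public class CounterPollRetry {

    /** Hypothetical stand-in for a single job-status/counter poll against the JT. */
    @FunctionalInterface
    interface Poll<T> {
        T call() throws IOException;
    }

    /**
     * Runs poll.call(), retrying on IOException up to maxAttempts times in total.
     * Sleeps backoffMs * attempt between attempts (linear backoff) and rethrows
     * the last IOException only once every attempt has failed.
     */
    static <T> T pollWithRetry(Poll<T> poll, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return poll.call();
            } catch (IOException e) {
                // Transient RPC failure (e.g. "Connection reset by peer"):
                // remember it and try again instead of failing the query.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);
                }
            }
        }
        // Every attempt failed: surface the error to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated poll: fails twice with a connection reset, then succeeds.
        String status = pollWithRetry(() -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new IOException("Connection reset by peer");
            }
            return "RUNNING";
        }, 3, 10);
        System.out.println(status + " after " + calls[0] + " attempts"); // prints "RUNNING after 3 attempts"
    }
}
{code}

Bounding the number of attempts keeps a genuinely unreachable JobTracker from hanging the client forever. A successful retry only restores visibility into the job; whether the job actually failed should still be decided from the status the poll returns, not from the RPC error.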