Harsh J created HIVE-3653:
-----------------------------

             Summary: Failure in a counter poller run should not be considered as a job failure
                 Key: HIVE-3653
                 URL: https://issues.apache.org/jira/browse/HIVE-3653
             Project: Hive
          Issue Type: Bug
          Components: Clients
    Affects Versions: 0.7.1
            Reporter: Harsh J

A client had a simple transient failure while polling the JT for job status (which it does once every HIVECOUNTERSPULLINTERVAL for each currently running job).

{code}
java.io.IOException: Call to HOST/IP:PORT failed on local exception: java.io.IOException: Connection reset by peer
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
        at org.apache.hadoop.ipc.Client.call(Client.java:1110)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:686)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:310)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:317)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:490)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{code}

This led to Hive thinking the running job itself had failed, and it failed the query run, although
the running job progressed to completion in the background. We should not let transient IOExceptions in counter polling cause query termination, and should instead just retry.
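
To illustrate the retry behaviour proposed above, here is a minimal sketch of a bounded retry around a status call that can throw a transient IOException. This is not Hive's code or the eventual fix: the names PollRetry, IoCall and callWithRetry, as well as the attempt count and sleep parameters, are hypothetical and exist only for this example.

```java
import java.io.IOException;

/** Hypothetical helper (not part of Hive or Hadoop): retries a call that may fail transiently. */
public final class PollRetry {

    /** A status call that may throw IOException, e.g. a wrapper around a job-status lookup. */
    public interface IoCall<T> {
        T call() throws IOException;
    }

    private PollRetry() {}

    /**
     * Invokes the call up to maxAttempts times. An IOException on any attempt but the
     * last is treated as transient: we sleep and try again. Only when every attempt
     * has failed is the last IOException rethrown to the caller.
     */
    public static <T> T callWithRetry(IoCall<T> call, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // remember the failure; it only escapes if all attempts fail
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }
}
```

Under these assumptions, the poller could wrap only the status lookup (the JobClient.getJob call seen in the trace) in such a helper, so that a single "Connection reset by peer" does not mark the query as failed while the job is still running. The sketch uses a small interface rather than java.util.concurrent.Callable so that only IOException is retried; other exceptions still propagate immediately.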