Harsh J created HIVE-3653:
-----------------------------

             Summary: Failure in a counter poller run should not be considered as a job failure
                 Key: HIVE-3653
                 URL: https://issues.apache.org/jira/browse/HIVE-3653
             Project: Hive
          Issue Type: Bug
          Components: Clients
    Affects Versions: 0.7.1
            Reporter: Harsh J


A client hit a simple, transient failure while polling the JobTracker (JT) for job status, which Hive does once every HIVECOUNTERSPULLINTERVAL for each currently running job:

{code}
java.io.IOException: Call to HOST/IP:PORT failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
at org.apache.hadoop.ipc.Client.call(Client.java:1110)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:686)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:310)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:317)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:490)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{code}

This led Hive to conclude that the running job itself had failed, so it failed the query, even though the job continued to completion in the background.
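To make the failure mode concrete, here is a simplified, hypothetical model of the polling loop. It is not Hive's actual ExecDriver.progress code: StatusSource, getJobState, and the "RUNNING"/"SUCCEEDED"/"FAILED" strings are illustrative stand-ins (the real loop works against Hadoop's RunningJob, whose status methods also declare IOException). The point it shows is that a single catch block treats "could not ask the JT" exactly like "the job failed".

{code:java}
import java.io.IOException;

// Hypothetical model of the reported behaviour; NOT Hive's ExecDriver code.
public class NaivePoller {

    /** Illustrative stand-in for asking the JT for a job's state. */
    public interface StatusSource {
        String getJobState() throws IOException;
    }

    /** Returns true if the query is considered successful. */
    public static boolean runQuery(StatusSource jt, long intervalMillis) throws InterruptedException {
        try {
            while (true) {
                String state = jt.getJobState();
                if ("SUCCEEDED".equals(state)) {
                    return true;
                }
                if ("FAILED".equals(state)) {
                    return false;
                }
                Thread.sleep(intervalMillis); // wait one polling interval
            }
        } catch (IOException e) {
            // The problem reported here: a failed *poll* is conflated with a failed *job*.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] polls = {0};
        // The job would report success on the third poll, but the second poll hits a transient error.
        boolean ok = runQuery(() -> {
            polls[0]++;
            if (polls[0] == 2) {
                throw new IOException("Connection reset by peer");
            }
            return polls[0] >= 3 ? "SUCCEEDED" : "RUNNING";
        }, 10);
        System.out.println("query succeeded? " + ok); // prints "query succeeded? false"
    }
}
{code}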

A transient IOException during counter polling should not terminate the query; Hive should instead retry the poll.
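One possible shape for the proposed fix is a bounded retry around the status call only. This is a minimal sketch under stated assumptions, not a patch against Hive: StatusCall and pollWithRetry are hypothetical names, and the attempt count and delay are arbitrary illustration values (they are not existing Hive configuration properties).

{code:java}
import java.io.IOException;

// Hypothetical sketch of retrying a JT status poll; names are illustrative, not Hive/Hadoop APIs.
public class RetryingPoller {

    /** Illustrative stand-in for a single JT status RPC that may throw IOException. */
    @FunctionalInterface
    public interface StatusCall<T> {
        T call() throws IOException;
    }

    /**
     * Runs the call, retrying on IOException up to maxAttempts times in total and
     * sleeping retryDelayMillis between attempts. The last IOException is rethrown
     * only if every attempt fails, so a JT that is really down still surfaces an error.
     */
    public static <T> T pollWithRetry(StatusCall<T> call, int maxAttempts, long retryDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // assume transient: the job itself may still be running fine
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated JT call: fails twice with "Connection reset by peer", then answers.
        String state = pollWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Connection reset by peer");
            }
            return "RUNNING";
        }, 5, 10);
        System.out.println(state + " after " + calls[0] + " attempts"); // prints "RUNNING after 3 attempts"
    }
}
{code}

Only the RPC is retried; if the JT actually reports the job as failed, that answer is returned normally and the query should still fail. Hadoop also ships org.apache.hadoop.io.retry.RetryPolicies, which may be worth reusing instead of a hand-rolled loop.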

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira