[jira] [Assigned] (HIVE-17941) Don't Re-Create RunningJob Client During Status Checks

Janaki Lahorani (JIRA) Thu, 27 Sep 2018 09:33:12 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-17941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Janaki Lahorani reassigned HIVE-17941:
--------------------------------------

    Assignee: Peter Vary  (was: Janaki Lahorani)

> Don't Re-Create RunningJob Client During Status Checks
> ------------------------------------------------------
>
>                 Key: HIVE-17941
>                 URL: https://issues.apache.org/jira/browse/HIVE-17941
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0, 2.3.1
>            Reporter: BELUGA BEHR
>            Assignee: Peter Vary
>            Priority: Major
>
> {code:java|title=org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper}
> while (!rj.isComplete()) {
>   ...
>         RunningJob newRj = jc.getJob(rj.getID());
>         if (newRj == null) {
>           // under exceptional load, hadoop may not be able to look up status
>           // of finished jobs (because it has purged them from memory). From
>           // hive's perspective - it's equivalent to the job having failed.
>           // So raise a meaningful exception
>           throw new IOException("Could not find status of job:" + rj.getID());
>         } else {
>           th.setRunningJob(newRj);
>           rj = newRj;
>         }
>       }
>   ...
> }
> {code}
> https://github.com/apache/hive/blob/a9f25c0e7ad3f81a9f00f601947a161516e33f1b/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L295-L306
> Every time we loop here for a status update, we are rebuilding the RunningJob 
> object to test if the Job information is still loaded in YARN.  Rebuilding 
> this RunningJob object is not trivial because it requires that we re-load and 
> parse the Job Configuration XML file every time.
> {code:java|title=Outdated Stacktrace But Same Idea Holds}
> at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:120)
>         at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1924)
>         at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1877)
>         at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1785)
>         at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
>         at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1951)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:398)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:388)
>         at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:174)
>         at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:655)
>         at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:668)
>         at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:282)
>         at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:532)
> {code}
> Maybe we can use {{isRetired()}} instead for this particular check.  We 
> also probably need to be better about checking the return values of the 
> {{RunningJob}} methods, since they can fail at any time once YARN purges 
> the job information.  It seems this code was an attempt to detect a purged 
> job before exercising the {{RunningJob}} object, even though the object 
> can go bad at any point.
> https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapred/RunningJob.html
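
The suggestion in the description can be sketched roughly as follows. This is only an illustration of the idea, not the change that was committed for this ticket. JobHandle is a hypothetical stand-in for the few RunningJob methods involved (the real isComplete() and isRetired() also throw IOException), and pollUntilComplete and pauseMillis are made-up names. Whether isRetired() reliably detects a purged job is exactly the open question raised above, so treat that as an assumption.

```java
import java.io.IOException;

public class StatusPollSketch {

    /** Hypothetical stand-in for the subset of RunningJob used by the loop. */
    public interface JobHandle {
        String getId();
        boolean isComplete() throws IOException;
        boolean isRetired() throws IOException;
    }

    /**
     * Polls the same handle until the job completes, instead of fetching a
     * fresh handle (and re-parsing the job configuration) on every pass.
     * Returns the number of passes made while the job was still running.
     */
    public static int pollUntilComplete(JobHandle rj, long pauseMillis)
            throws IOException, InterruptedException {
        int polls = 0;
        while (!rj.isComplete()) {
            polls++;
            // Assumption: a retired job is one whose status has been purged,
            // which the original code treats as equivalent to a failure.
            if (rj.isRetired()) {
                throw new IOException("Could not find status of job:" + rj.getId());
            }
            Thread.sleep(pauseMillis);
        }
        return polls;
    }
}
```

The existing exception message is reused so that callers see the same failure as before; the only difference is that no new client object is built per iteration.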



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
