[ 
https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637009#action_12637009
 ] 

Devaraj Das commented on HADOOP-4296:
-------------------------------------

Dhruba, the dfs scan happens when the job is submitted (copying of jar files, 
etc.). Look at the public JobInProgress constructor and this gets called from 
the submitJob RPC. But I haven't thought about how this can be handled better...
Vinod's suggestion looks fine to me especially if you don't care about the 
final status of the completed job. In addition, if you really care about 
getting the status of completed jobs, you have the JobStatus store that I was 
talking about earlier, or, increase the number of jobs kept in memory and, 
maybe, tweak the retirejob interval. The former can be used to get the status 
of jobs that were completed a while back while the latter would be for clients 
that are polling very actively. Your approach would essentially be doing the 
latter, right?

> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: 4296_jt_delayretire.patch
>
>
> At very busy times - we get a wave of job client failures all at the same 
> time. the failures come when the job is about to complete. when we look at 
> the job history files - the jobs are actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99% 
> java.lang.NullPointerException
>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
>       at 
> com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to