[ https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637009#action_12637009 ]
Devaraj Das commented on HADOOP-4296:
-------------------------------------
Dhruba, the DFS scan happens when the job is submitted (copying of jar files,
etc.). Look at the public JobInProgress constructor, which gets called from
the submitJob RPC. I haven't yet thought about how this could be handled better.
Vinod's suggestion looks fine to me, especially if you don't care about the
final status of the completed job. If you do care about getting the status of
completed jobs, you can either use the JobStatus store that I was talking
about earlier, or increase the number of jobs kept in memory and maybe tweak
the retirejob interval. The former can be used to get the status of jobs that
completed a while back, while the latter would be for clients that are polling
very actively. Your approach would essentially be doing the latter, right?
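
To make the "jobs kept in memory" trade-off concrete, here is a rough sketch.
It is a toy model, not JobTracker code: RetiredJobSketch, CompletedJobCache and
pollStatus are hypothetical names invented for illustration, and the cap of 2
is arbitrary. It only shows that a completed job pushed out of a bounded
in-memory table looks like a null to a polling client, and that raising the
cap (or guarding against null on the client side) avoids the failure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (not Hadoop code) of a tracker that keeps only N completed jobs in memory. */
public class RetiredJobSketch {

    /** Hypothetical stand-in for the in-memory table of completed jobs. */
    static class CompletedJobCache {
        private final LinkedHashMap<String, String> statusById;

        CompletedJobCache(final int maxCompletedJobs) {
            // Insertion-ordered map that evicts the oldest entry once the cap is exceeded.
            this.statusById = new LinkedHashMap<String, String>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > maxCompletedJobs;
                }
            };
        }

        /** Records a finished job; may evict (retire) the oldest one. */
        void complete(String jobId) {
            statusById.put(jobId, "SUCCEEDED");
        }

        /** Returns null once the job has been retired from memory. */
        String getStatus(String jobId) {
            return statusById.get(jobId);
        }
    }

    /** Client-side guard: report a missing status instead of dereferencing null. */
    static String pollStatus(CompletedJobCache cache, String jobId) {
        String status = cache.getStatus(jobId);
        return status == null ? "UNKNOWN (retired from memory)" : status;
    }

    public static void main(String[] args) {
        CompletedJobCache cache = new CompletedJobCache(2);
        cache.complete("job_1");
        cache.complete("job_2");
        cache.complete("job_3"); // cap is 2, so job_1 is retired here
        System.out.println("job_1: " + pollStatus(cache, "job_1"));
        System.out.println("job_3: " + pollStatus(cache, "job_3"));
    }
}
```

In this model a larger cap only widens the window for actively polling
clients; a status lookup for a job that finished long ago would still need
some persistent store.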
> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
> Key: HADOOP-4296
> URL: https://issues.apache.org/jira/browse/HADOOP-4296
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.1
> Reporter: Joydeep Sen Sarma
> Assignee: dhruba borthakur
> Priority: Critical
> Attachments: 4296_jt_delayretire.patch
>
>
> At very busy times we get a wave of job client failures all at the same
> time. The failures come when the job is about to complete. When we look at
> the job history files, the jobs are actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient: map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient: map 100% reduce 99%
> java.lang.NullPointerException
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
> at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:155)