[ https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638240#action_12638240 ]

Joydeep Sen Sarma commented on HADOOP-4296:
-------------------------------------------

here's a related/new observation (maybe this deserves a separate jira):

- job is long complete and gone from tracker (hours). i checked the history 
file to confirm.
- jobclient is hung polling for completion

we are probably running with the 5 minute wait-before-purge on jobclient patch 
for this. i did a jstack on the jobclient:

"main" prio=10 tid=0x08059c00 nid=0x7374 waiting on condition [0xf7fdb000..0xf7fdc298]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:992)
        at com.facebook.hive.common.columnSetLoader.run(columnSetLoader.java:545)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

i am quite confused about how this can happen. anyway - i will leave it running in 
this state so you can take a look.


> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: 4296_jt_delayretire.patch
>
>
> At very busy times - we get a wave of job client failures all at the same 
> time. the failures come when the job is about to complete. when we look at 
> the job history files - the jobs are actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99% 
> java.lang.NullPointerException
>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
>       at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
