zhengchenyu created MAPREDUCE-6950:
--------------------------------------
Summary: Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
Key: MAPREDUCE-6950
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6950
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mr-am
Affects Versions: 2.7.1
Reporter: zhengchenyu
Fix For: 2.7.5
Some jobs report an error like this:
{code}
hadoop.mapreduce.Job.monitorAndPrintJob(Job.java 1367) [main] : map 100% reduce 100%
[2017-08-31T20:27:12.591+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[2017-08-31T20:27:12.821+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[2017-08-31T20:27:13.039+08:00] [INFO] hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java 277) [main] : Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[2017-08-31T20:27:13.256+08:00] [ERROR] hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java 1034) [main] : Error Launching job : java.io.IOException: Unknown Job job_xxx_xxx
{code}
I found the AM container log, shown below. It shows that the error happened in
the HDFS write pipeline, possibly because of a DataNode error. I also found
other causes that close the JobHistoryEventHandler. In either case the MR AM
can't write the job information for the JobHistory server, so the client
can't tell whether the application has finished.
{code}
2017-08-31 20:27:10,813 INFO [Thread-1968] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event MAP_ATTEMPT_STARTED
2017-08-31 20:27:10,814 ERROR [Thread-1968] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.TaskAttemptStartedEvent@2055ea0a
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2292)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1317)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2017-08-31 20:27:10,814 INFO [Thread-1968] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state STOPPED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:580)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:374)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
{code}
This problem is serious, especially for Hive: the job has to be rerun needlessly.
So I think we need to retry the operation of writing the history event.
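
To make the retry idea concrete, here is a rough sketch only, not a patch
against Hadoop. It wraps a single write in a bounded retry loop with a doubling
delay. The class name HistoryWriteRetry, the EventWrite interface, and the
maxAttempts / initialDelayMs parameters are all hypothetical names made up for
illustration; none of them exist in the MR AM code.

```java
import java.io.IOException;

/** Hypothetical helper: retries a history-event write a bounded number of times. */
final class HistoryWriteRetry {

    /** Hypothetical stand-in for one history-event write that may fail with an IOException. */
    @FunctionalInterface
    interface EventWrite {
        void write() throws IOException;
    }

    private HistoryWriteRetry() {
    }

    /**
     * Runs the write up to maxAttempts times. After each failed attempt except
     * the last, it sleeps and then doubles the delay. Returns the number of
     * attempts used, or rethrows the last IOException if every attempt failed.
     */
    static int writeWithRetry(EventWrite op, int maxAttempts, long initialDelayMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                op.write();
                return attempt; // the write succeeded on this attempt
            } catch (IOException e) {
                last = e; // e.g. an EOFException from a broken write pipeline
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2; // back off before the next attempt
                }
            }
        }
        throw last;
    }
}
```

In the AM, the write that would be wrapped this way is presumably the one made
from JobHistoryEventHandler.handleEvent, which is where the stack trace above
fails. Whether a plain retry is enough, or the output stream also has to be
reopened first, is something I have not verified.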