availability of a job info in HS should be atomic
-------------------------------------------------
Key: MAPREDUCE-4109
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, jobhistoryserver, mrv2
Affects Versions: 2.0.0
Reporter: Alejandro Abdelnur
Priority: Blocker
Fix For: 2.0.0
It seems that the HS starts serving info about a job before all of that job's
info is available.
In the trace below, a RunningJob throws an NPE when trying to access the
counters.
This happens intermittently, so I assume it is related either to the AM not
flushing all job info to HDFS before notifying the HS, or to the HS not loading
all the job info from HDFS before it starts serving it.
In case it helps to diagnose the issue, this is happening in a secure cluster.
This causes Oozie to mark jobs as failed.
{code}
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getCounters(HistoryClientService.java:214)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:149)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:206)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
at LocalTrace:
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:163)
at $Proxy31.getCounters(Unknown Source)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getCounters(MRClientProtocolPBClientImpl.java:162)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:325)
at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:472)
at org.apache.hadoop.mapreduce.Job$8.run(Job.java:714)
at org.apache.hadoop.mapreduce.Job$8.run(Job.java:711)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:711)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
at org.apache.oozie.action.hadoop.LauncherMapper.hasIdSwap(LauncherMapper.java:296)
at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:886)
at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:162)
at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:51)
at org.apache.oozie.command.XCommand.call(XCommand.java:260)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
{code}
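To illustrate what "atomic availability" could mean here, the following is a minimal sketch of the write-then-rename pattern. It runs against the local filesystem through java.nio rather than HDFS. It is not the actual AM or HS code and not necessarily how this issue was or should be fixed. The class name AtomicPublishSketch, the methods publish and readIfAvailable, the ".tmp" suffix and the file name "job_0001.jhist" are all hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: local-filesystem analogue of publishing job info atomically.
public class AtomicPublishSketch {

    // Writer side (stands in for the AM): write the complete content under a
    // temporary name, then rename it into place in a single atomic step.
    public static Path publish(Path dir, String name, byte[] data) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");   // assumed staging suffix
        Path done = dir.resolve(name);
        Files.write(tmp, data);
        return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side (stands in for the HS): only the final name is ever read, so
    // the reader sees either nothing (null) or the complete content.
    public static byte[] readIfAvailable(Path dir, String name) throws IOException {
        try {
            return Files.readAllBytes(dir.resolve(name));
        } catch (NoSuchFileException e) {
            return null; // not published yet; the caller should retry later
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("history-sketch");
        String name = "job_0001.jhist"; // hypothetical file name

        System.out.println("before publish: " + (readIfAvailable(dir, name) != null));
        publish(dir, name, "counters=...".getBytes(StandardCharsets.UTF_8));
        System.out.println("after publish: " + (readIfAvailable(dir, name) != null));
    }
}
```

The point of the pattern is that a reader has a single, explicit "not available yet" state to handle, instead of a partially loaded job whose counters are null. Whether the HS should answer "not found" or block until the job is fully loaded is a separate design choice that this sketch does not settle.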