[ 
https://issues.apache.org/jira/browse/PIG-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424807#comment-15424807
 ] 

Xiang Li commented on PIG-4967:
-------------------------------

Hi Daniel, thanks for the explanation!

Regarding
bq. I am not sure what's the nature of status=null
here is what I have found so far:
In Hadoop's Job class, the JobStatus field status is updated by the method 
updateStatus(), which starts at line 318:
{code}
synchronized void updateStatus() throws IOException {
    try {
      this.status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
        @Override
        public JobStatus run() throws IOException, InterruptedException {
          return cluster.getClient().getJobStatus(status.getJobID());
        }
      });
    }
    catch (InterruptedException ie) {
      throw new IOException(ie);
    }
    if (this.status == null) {
      throw new IOException("Job status not available ");
    }
    this.statustime = System.currentTimeMillis();
  }
{code}

I think this is not safe, because this.status is assigned whatever ugi.doAs() 
returns. If that call returns null (maybe due to a network problem), 
this.status is set to null directly, and another thread calling getJobName() 
at that moment sees status=null.
The code that follows does check whether this.status is null and throws an 
IOException, but by then the field has already been overwritten. Oddly, I did 
not see this IOException in the Hadoop log.
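
To illustrate the concern, here is a minimal sketch of one possible safer pattern. It is not Hadoop code and not the patch for this issue: StatusHolder, fetcher and getStatus() are hypothetical names, a String stands in for JobStatus, and a Supplier stands in for the ugi.doAs() call. The idea is to fetch into a local variable and publish it to the shared field only after the null check, so a reader on another thread never observes null.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for the status handling in Hadoop's Job class.
class StatusHolder {
    // Stand-in for the JobStatus field; volatile so unsynchronized readers see updates.
    private volatile String status;
    // Stand-in for the remote call made through ugi.doAs(); may return null.
    private final Supplier<String> fetcher;

    StatusHolder(String initial, Supplier<String> fetcher) {
        this.status = initial;
        this.fetcher = fetcher;
    }

    public synchronized void updateStatus() throws IOException {
        // Fetch into a local first instead of assigning the field directly.
        String fresh = fetcher.get();
        if (fresh == null) {
            // The shared field still holds the last known status here.
            throw new IOException("Job status not available");
        }
        this.status = fresh;
    }

    public String getStatus() {
        return status;
    }

    public static void main(String[] args) {
        // Simulate a fetch that fails by returning null.
        StatusHolder holder = new StatusHolder("RUNNING", () -> null);
        try {
            holder.updateStatus();
        } catch (IOException e) {
            System.out.println("update failed: " + e.getMessage());
        }
        // The previous status is still readable after the failed update.
        System.out.println("status is still: " + holder.getStatus());
    }
}
```

With this ordering a failed fetch still raises the IOException, but callers such as getJobName() would keep seeing the last known status instead of null.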

We also found the following message in the app-master log:
bq.INFO org.apache.hadoop.service.AbstractService: Service 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=xxxx 
remote=xxxx
but so far I could not tell if status=null has something to do with that 
timeout.

Anyway, I will upload the patch soon. Thanks!

> NPE in PigJobControl.run() when job status is null
> --------------------------------------------------
>
>                 Key: PIG-4967
>                 URL: https://issues.apache.org/jira/browse/PIG-4967
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Critical
>
> {code}
> [JobControl] ERROR org.apache.pig.backend.hadoop23.PigJobControl  - Error while trying to run jobs.
> java.lang.NullPointerException
>       at org.apache.hadoop.mapreduce.Job.getJobName(Job.java:426)
>       at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.toString(ControlledJob.java:93)
>       at java.lang.String.valueOf(String.java:2982)
>       at java.lang.StringBuilder.append(StringBuilder.java:131)
>       at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:182)
>       at java.lang.Thread.run(Thread.java:745)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
