Pretty sure this was just tied up with our job history server issues. We've fixed those and Crunch seems to be happily crunching again now :-)

On 3 November 2015 at 12:01, David Whiting <[email protected]> wrote:

> Different problem if I try that :-(
>
> 15/11/03 10:54:06 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> 1 job failure(s) occurred:
> 15/11/03 10:54:16 ERROR exec.MRExecutor: Pipeline failed due to exception
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.Job.getJobName(Job.java:442)
>     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.getJobName(CrunchControlledJob.java:131)
>     at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:140)
>     at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:58)
>     at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:90)
>     at java.lang.Thread.run(Thread.java:745)
>
> On 30 October 2015 at 16:49, Josh Wills <[email protected]> wrote:
>
>> David! Welcome back!
>>
>> I haven't hit that one before; if you tweak handleMultiPaths to look like
>> the below, does it fix the issue?
>>
>> J
>>
>> private synchronized void handleMultiPaths(MRJob job) throws IOException {
>>   try {
>>     if (job.getJobState() == MRJob.State.SUCCESS) {
>>       if (!multiPaths.isEmpty()) {
>>         for (Map.Entry<Integer, PathTarget> entry : multiPaths.entrySet()) {
>>           entry.getValue().handleOutputs(job.getJob().getConfiguration(), workingPath, entry.getKey());
>>         }
>>       }
>>     }
>>   } catch(Exception ie) {
>>     throw new IOException(ie);
>>   }
>> }
>>
>> On Fri, Oct 30, 2015 at 8:21 AM, David Whiting <[email protected]> wrote:
>>
>> > Hi everybody! I'm back and pushing Crunch in a new organisation
>> >
>> > I'm having some strange non-deterministic problems with the end of my
>> > Crunch job executions in a new environment - I've got some possible ideas
>> > as to why it's happening, but no good ideas for workarounds so I was hoping
>> > somebody might be able to help me out. Basically, this is what it looks
>> > like:
>> >
>> > 15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Running job "crunching.CountEventsByType: SeqFile([{REDACTED}... ID=1 (1/1)"
>> > 15/10/30 15:01:55 INFO jobcontrol.CrunchControlledJob: Job status available at: {REDACTED}/proxy/application_1443106319465_13029/
>> > 15/10/30 15:05:02 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
>> > 15/10/30 15:05:03 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
>> > 15/10/30 15:05:04 INFO ipc.Client: Retrying connect to server: {REDACTED}. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
>> > 15/10/30 15:05:04 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
>> > 15/10/30 15:05:04 ERROR exec.MRExecutor: Pipeline failed due to exception
>> > java.io.IOException: java.lang.NullPointerException
>> >     at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:99)
>> >     at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.run(CrunchJobHooks.java:86)
>> >     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkRunningState(CrunchControlledJob.java:288)
>> >     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkState(CrunchControlledJob.java:299)
>> >     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.checkRunningJobs(CrunchJobControl.java:201)
>> >     at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:321)
>> >     at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:131)
>> >     at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:58)
>> >     at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:90)
>> >     at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.lang.NullPointerException
>> >     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:325)
>> >     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> >     at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
>> >     at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:632)
>> >     at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:91)
>> >     ... 9 more
>> >
>> > The corresponding line in the Hadoop source is this:
>> >
>> > return cluster.getClient().getJobStatus(status.getJobID());
>> >
>> > The only NPE-generating part of this is that getClient() could return null,
>> > but I'm not exactly sure what could cause that. We have some intermittent
>> > problems with our job history server (returning "not found" for whatever
>> > job it looks up) which could well be correlated to this, but I would expect
>> > that to fail at the getJobStatus part rather than the getClient part. This
>> > would, however, agree with the fact the job reports itself as SUCCEEDED
>> > before it fails during the handleMultiPaths section (as perhaps the request
>> > to check status there will get routed to the job history server).
>> >
>> > This happens with any Crunch jobs I try to run on this cluster, but there
>> > are plenty of "plain old MapReduce" running on this cluster with no issues,
>> > so I'm struggling to find reasons why Crunch would fail where the others
>> > are succeeding.
>> >
>> > Thanks,
>> > David
