One of my colleagues has noticed this problem for a while, and now it's
biting me.  Jobs seem to be failing before ever really starting.  It seems
to be limited (so far) to running in pseudo-distributed mode, since that's
where he saw the problem and where I'm now seeing it; it hasn't come up on
our cluster (yet).

So here's what happens:

$ java -classpath $MY_CLASSPATH MyLauncherClass -conf my-config.xml -D extra.properties=extravalues
...
launcher output
...
11/08/26 10:35:54 INFO input.FileInputFormat: Total input paths to process : 2
11/08/26 10:35:54 INFO mapred.JobClient: Running job: job_201108261034_0001
11/08/26 10:35:55 INFO mapred.JobClient:  map 0% reduce 0%

and it just sits there.  If I look at the jobtracker's web view, the number
of submissions increments, but nothing shows up as a running, completed,
failed, or retired job.  If I probe from the command line, I find:

$ hadoop job -list
1 jobs currently running
JobId   State   StartTime       UserName        Priority        SchedulingInfo
job_201108261034_0001   4       1314369354247   hdfs    NORMAL  NA
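
If my reading of the old mapred API is right, that numeric state can also be
checked in code against the JobStatus constants.  A minimal sketch (the class
name is mine; it assumes the cluster config is on the classpath and that the
0.20-era mapred.JobClient API is in play, which the log output suggests):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class ProbeJobState {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();            // picks up the *-site.xml files on the classpath
        JobClient client = new JobClient(conf);  // connects to the jobtracker
        RunningJob job = client.getJob(JobID.forName("job_201108261034_0001"));
        if (job == null) {
            System.out.println("jobtracker no longer knows about the job");
        } else {
            int state = job.getJobState();
            // JobStatus constants: RUNNING=1, SUCCEEDED=2, FAILED=3, PREP=4, KILLED=5
            System.out.println("state = " + state
                    + (state == JobStatus.PREP ? " (PREP, i.e. never started)" : ""));
        }
    }
}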

If I try to kill this job, nothing happens; it stays in the list with state 4
(which, if I'm reading the JobStatus constants right, is PREP rather than
failed).  I've tried telling the mapper JVM to suspend so I can find it with
netstat and attach a debugger from IDEA, but the job never seems to get to the
point of even spinning up a JVM to run the mapper.
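
For completeness, the driver is nothing exotic; it follows the usual
Configured/Tool pattern, and the suspend flags go to the task JVMs through
mapred.child.java.opts (assuming that's still the right knob in this version).
This is a stripped-down sketch of the shape, not the exact code; the identity
Mapper/Reducer base classes are only there to keep the sketch self-contained:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyLauncherClass extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already holds whatever -conf and -D put there,
        // courtesy of ToolRunner/GenericOptionsParser.
        Configuration conf = getConf();

        // Where the suspend flags get injected for the task JVMs
        // (assumes mapred.child.java.opts is the right key here; port is arbitrary):
        conf.set("mapred.child.java.opts",
                "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005");

        Job job = new Job(conf, "my job");
        job.setJarByClass(MyLauncherClass.class);
        // The real job sets its own mapper/reducer classes.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyLauncherClass(), args));
    }
}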

Any ideas what might be going wrong?  Thanks.
