I have a Pig job that is supposed to save some results to HDFS. Most of the
time it runs fine, but it failed once and no output was generated. There were
no dead nodes, and the namenode, tasktrackers, and jobtracker were all up and
running. The Pig job log is below. It does show some exceptions, but they are
logged at INFO/WARN level rather than as errors, and I can find no obvious
clues in the namenode, tasktracker, or jobtracker logs from the time of the
failure.
So, how do I tell what went wrong when something like this happens? Any
pointers or thoughts? Thanks,
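For context, this is roughly the kind of check I used to confirm the cluster
looked healthy (the output path below is just a placeholder, not my real one):

```shell
# List live/dead datanodes and overall capacity -- no dead nodes showed up
hadoop dfsadmin -report

# Scan an output directory for missing, corrupt, or under-replicated blocks
hadoop fsck /user/michael/output -files -blocks -locations
```

Both came back clean, which is why I'm puzzled by the failure below.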
2010-06-22 01:45:06,029 [main] INFO org.apache.pig.Main - Logging error
messages to: /tmp/logs/pig_1277189106028.log
2010-06-22 01:45:06,516 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: hdfs://hadoopmaster:50001
2010-06-22 01:45:07,211 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
map-reduce job tracker at: hadoopmaster:50002
2010-06-22 01:45:07,910 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 3
2010-06-22 01:45:07,911 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 2 map-only splittees.
2010-06-22 01:45:07,911 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 2 out of total 2 splittees.
2010-06-22 01:45:07,911 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2010-06-22 01:45:09,163 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up multi store job
2010-06-22 01:45:09,171 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting identity combiner class.
2010-06-22 01:45:09,270 [Thread-10] WARN org.apache.hadoop.mapred.JobClient -
Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
2010-06-22 01:45:09,767 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-06-22 01:46:13,385 [Thread-15] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.133.13:50010
2010-06-22 01:46:13,399 [Thread-15] INFO org.apache.hadoop.hdfs.DFSClient -
Abandoning block blk_-3238247195956896658_451260
2010-06-22 01:46:13,402 [Thread-15] INFO org.apache.hadoop.hdfs.DFSClient -
Waiting to find target node: 192.168.133.12:50010
2010-06-22 01:47:19,410 [Thread-15] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.133.12:50010
2010-06-22 01:47:19,410 [Thread-15] INFO org.apache.hadoop.hdfs.DFSClient -
Abandoning block blk_-8967610649184651127_451260
2010-06-22 01:47:19,412 [Thread-15] INFO org.apache.hadoop.hdfs.DFSClient -
Waiting to find target node: 192.168.133.11:50010
2010-06-22 01:48:26,267 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.133.3:50010
2010-06-22 01:48:26,267 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Abandoning block blk_9202045569864855438_451264
2010-06-22 01:48:26,283 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Waiting to find target node: 192.168.133.12:50010
2010-06-22 01:49:32,317 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.133.3:50010
2010-06-22 01:49:32,317 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Abandoning block blk_62775741158062430_451266
2010-06-22 01:49:32,320 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Waiting to find target node: 192.168.133.5:50010
2010-06-22 01:50:38,326 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.133.11:50010
2010-06-22 01:50:38,326 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Abandoning block blk_4600730976353859294_451267
2010-06-22 01:50:38,328 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Waiting to find target node: 192.168.133.10:50010
2010-06-22 01:51:44,338 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.133.11:50010
2010-06-22 01:51:44,385 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Abandoning block blk_274719573196033811_451267
2010-06-22 01:51:44,387 [Thread-21] INFO org.apache.hadoop.hdfs.DFSClient -
Waiting to find target node: 192.168.133.12:50010
2010-06-22 01:51:50,389 [Thread-21] WARN org.apache.hadoop.hdfs.DFSClient -
DataStreamer Exception: java.io.IOException: Unable to create new block.
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-06-22 01:51:50,390 [Thread-21] WARN org.apache.hadoop.hdfs.DFSClient -
Error Recovery for block blk_274719573196033811_451267 bad datanode[1] nodes ==
null
2010-06-22 01:51:50,390 [Thread-21] WARN org.apache.hadoop.hdfs.DFSClient -
Could not get block locations. Source file
"/hadoop/hadoop-hadoop/mapred/system/job_201006211221_0037/job.xml" -
Aborting...
ERROR: 'Bad connect ack with firstBadLink 192.168.133.11:50010'
2010-06-22 01:51:50,620 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-06-22 01:51:50,621 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
--Michael