I have a pig job which supposes to save some result in the hdfs. Most time it 
runs just fine. But it failed once and no result was generated. There was no 
dead node. And namenode, tasktracker and jobtracker were all up and running. 
The following is the pig job log. It did show some exception but that is not 
logged as an error. And no obvious clues can be found in logs of nn, 
tasktracker and jobtracker when the failure happened.

So, how to tell what goes wrong when things like this happens? Any pointer and 
thoughts? Thanks,
 
2010-06-22 01:45:06,029 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /tmp/logs/pig_1277189106028.log
2010-06-22 01:45:06,516 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: hdfs://hadoopmaster:50001
2010-06-22 01:45:07,211 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
map-reduce job tracker at: hadoopmaster:50002
2010-06-22 01:45:07,910 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - MR plan size before optimization: 3
2010-06-22 01:45:07,911 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - Merged 2 map-only splittees.
2010-06-22 01:45:07,911 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - Merged 2 out of total 2 splittees.
2010-06-22 01:45:07,911 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - MR plan size after optimization: 1
2010-06-22 01:45:09,163 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- Setting up multi store job
2010-06-22 01:45:09,171 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- Setting identity combiner class.
2010-06-22 01:45:09,270 [Thread-10] WARN  org.apache.hadoop.mapred.JobClient - 
Use GenericOptionsParser for parsing the arguments. Applications should 
implement Tool for the same.
2010-06-22 01:45:09,767 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2010-06-22 01:46:13,385 [Thread-15] INFO  org.apache.hadoop.hdfs.DFSClient - 
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with 
firstBadLink 192.168.133.13:50010
2010-06-22 01:46:13,399 [Thread-15] INFO  org.apache.hadoop.hdfs.DFSClient - 
Abandoning block blk_-3238247195956896658_451260
2010-06-22 01:46:13,402 [Thread-15] INFO  org.apache.hadoop.hdfs.DFSClient - 
Waiting to find target node: 192.168.133.12:50010
2010-06-22 01:47:19,410 [Thread-15] INFO  org.apache.hadoop.hdfs.DFSClient - 
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with 
firstBadLink 192.168.133.12:50010
2010-06-22 01:47:19,410 [Thread-15] INFO  org.apache.hadoop.hdfs.DFSClient - 
Abandoning block blk_-8967610649184651127_451260
2010-06-22 01:47:19,412 [Thread-15] INFO  org.apache.hadoop.hdfs.DFSClient - 
Waiting to find target node: 192.168.133.11:50010
2010-06-22 01:48:26,267 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with 
firstBadLink 192.168.133.3:50010
2010-06-22 01:48:26,267 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Abandoning block blk_9202045569864855438_451264
2010-06-22 01:48:26,283 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Waiting to find target node: 192.168.133.12:50010
2010-06-22 01:49:32,317 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with 
firstBadLink 192.168.133.3:50010
2010-06-22 01:49:32,317 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Abandoning block blk_62775741158062430_451266
2010-06-22 01:49:32,320 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Waiting to find target node: 192.168.133.5:50010
2010-06-22 01:50:38,326 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with 
firstBadLink 192.168.133.11:50010
2010-06-22 01:50:38,326 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Abandoning block blk_4600730976353859294_451267
2010-06-22 01:50:38,328 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Waiting to find target node: 192.168.133.10:50010
2010-06-22 01:51:44,338 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Exception in createBlockOutputStream java.io.IOException: Bad connect ack with 
firstBadLink 192.168.133.11:50010
2010-06-22 01:51:44,385 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Abandoning block blk_274719573196033811_451267
2010-06-22 01:51:44,387 [Thread-21] INFO  org.apache.hadoop.hdfs.DFSClient - 
Waiting to find target node: 192.168.133.12:50010
2010-06-22 01:51:50,389 [Thread-21] WARN  org.apache.hadoop.hdfs.DFSClient - 
DataStreamer Exception: java.io.IOException: Unable to create new block.
    at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
    at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
    at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)

2010-06-22 01:51:50,390 [Thread-21] WARN  org.apache.hadoop.hdfs.DFSClient - 
Error Recovery for block blk_274719573196033811_451267 bad datanode[1] nodes == 
null
2010-06-22 01:51:50,390 [Thread-21] WARN  org.apache.hadoop.hdfs.DFSClient - 
Could not get block locations. Source file 
"/hadoop/hadoop-hadoop/mapred/system/job_201006211221_0037/job.xml" - 
Aborting...
ERROR:  'Bad connect ack with firstBadLink 192.168.133.11:50010'
2010-06-22 01:51:50,620 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2010-06-22 01:51:50,621 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Success!

--Michael



      

Reply via email to