[Hadoop Wiki] Update of "TroubleShooting" by SteveLoughran

Apache Wiki Tue, 15 Sep 2009 04:44:19 -0700

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/TroubleShooting

The comment on the change is:
link in the new tcp error pages, add something from the -user list

------------------------------------------------------------------------------
      at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
      at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)}}}
  
- This is sometimes encountered if there is a corruption of the {{edits}} file 
+ This is sometimes encountered if there is a corruption of the {{{edits}}} file
  in the transaction log. Try using a hex editor or equivalent to open
  up 'edits' and get rid of the last record. Often the last record is
  incomplete, which is what stops the NameNode from starting. Once you update
  your edits, start the NameNode and run {{{hadoop fsck /}}} to see if you
- have any corrupt files and fix/get rid of them. 
+ have any corrupt files and fix/get rid of them.
  
  Take a backup of {{{dfs.name.dir}}} before changing anything in it.
  
  == Client cannot talk to filesystem ==
  
+ === TCP Level Error Messages ===
+ 
+  * NoRouteToHost
+  * ConnectionRefused
+ 
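A rough way to tell these two TCP-level failures apart from a client machine is to attempt a plain socket connection to the service. The sketch below is illustrative only: `probe_tcp` is a hypothetical helper name, and the host and port you pass in are assumptions about your own cluster, not values taken from this page.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer at all within the timeout (often a firewall dropping packets).
        return "timeout"
    except socket.gaierror:
        # The host name could not be resolved.
        return "dns-failure"
    except ConnectionRefusedError:
        # The machine answered, but nothing is listening on that port.
        return "connection-refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # The network could not deliver packets to the machine.
            return "no-route-to-host"
        return "error: %s" % exc
```

For example, `probe_tcp("namenode.example.org", 8020)` would test a hypothetical namenode address; substitute the host and port from your own configuration.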
  === Error message: Could not get block locations. Aborting... ===
  
- There are couple of causes for this. 
+ There are a number of possible causes for this.
   * The namenode may be overloaded. Check the logs for messages that say "discarding calls..."
   * There may not be enough (any) datanodes for the data to be written. Again, check the logs.
-  * The datanodes on which the blocks were stored might be down. 
+  * The datanodes on which the blocks were stored might be down.
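Checking the logs for the "discarding calls" message can be done with any text search tool. As a minimal sketch in Python, assuming you already know where your namenode log file lives (the helper name `find_log_lines` is hypothetical, and the match is a plain case-insensitive substring test):

```python
def find_log_lines(lines, needle="discarding calls"):
    """Return (line_number, text) pairs for lines that contain needle.

    lines can be any iterable of strings, such as an open log file.
    The comparison ignores case.
    """
    needle = needle.lower()
    matches = []
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            matches.append((number, line.rstrip("\n")))
    return matches
```

A typical use would be `with open(path_to_log) as f: print(find_log_lines(f))`, where `path_to_log` is whatever log file your installation writes.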
+ 
+ === Error message: Could not obtain block ===
+ 
+ Your logs contain something like
+ {{{INFO hdfs.DFSClient: Could not obtain block blk_-4157273618194597760_1160
+  from any node:  java.io.IOException: No live nodes contain current block}}}
+ 
+ There are no live datanodes containing a copy of the block of the file you are looking for. Bring up any nodes that are down, or skip that block.
  
  == Reduce hangs ==
  
  This can be a DNS issue. Two problems which have been encountered in practice are:
-  * Machines with multiple NICs. In this case, set dfs.datanode.dns.interface (in hdfs-site.xml) and mapred.datanode.dns.interface (in mapred-site.xml) to the name of the network interface used by Hadoop (something like eth0 under Linux),
+  * Machines with multiple NICs. In this case, set {{{dfs.datanode.dns.interface}}} (in {{{hdfs-site.xml}}}) and {{{mapred.datanode.dns.interface}}} (in {{{mapred-site.xml}}}) to the name of the network interface used by Hadoop (something like eth0 under Linux),
-  * Badly formatted hosts files (/etc/hosts under Linux) can wreak havoc. Any DNS problem will hobble Hadoop, so ensure that names can be resolved correctly.
+  * Badly formatted or incorrect hosts files ({{{/etc/hosts}}} under Linux) can wreak havoc. Any DNS problem will hobble Hadoop, so ensure that names can be resolved correctly.
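One way to check that names resolve correctly is to look up each cluster host name and then look the resulting address up in reverse, running the check on every node. This is only a sketch: `check_name` is a hypothetical helper, and it goes through the operating system resolver, so it reflects both the hosts file and DNS as that machine sees them.

```python
import socket


def check_name(hostname):
    """Resolve hostname to an IPv4 address, then resolve that address back to a name."""
    result = {"hostname": hostname, "address": None, "reverse": None, "error": None}
    try:
        address = socket.gethostbyname(hostname)
        result["address"] = address
        # gethostbyaddr returns (primary_name, aliases, addresses).
        result["reverse"] = socket.gethostbyaddr(address)[0]
    except (socket.gaierror, socket.herror) as exc:
        # Either the forward or the reverse lookup failed.
        result["error"] = str(exc)
    return result
```

If the reverse name differs from the name the other nodes use, or if different nodes get different addresses for the same name, that mismatch is worth investigating first.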
