So it seems, but running dig from the command line returns the correct results. The same settings have been running on production servers (not Hadoop) for months without problems.
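Since dig queries the name server directly while the JVM goes through the system resolver, a correct dig answer does not by itself show what the Hadoop tasks see. A rough way to compare is to resolve the same names from Java with InetAddress.getAllByName, the call that fails in the trace below. This is only a minimal sketch: the class name ResolveCheck and its helper method are made up for illustration, and the host names to test are passed as arguments.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of Hadoop or HBase): resolves host names
// through the JVM resolver, the same path ZooKeeper's client uses.
public class ResolveCheck {

    /**
     * Resolves the host through InetAddress.getAllByName.
     * Returns the elapsed time in milliseconds, or -1 if the JVM
     * reports UnknownHostException for that name.
     */
    static long resolveMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return -1L;
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Repeat each lookup a few times: an intermittent resolver problem
        // may only show up on some attempts. Note that the JVM caches
        // lookups, so repeated successes are usually served from its cache.
        for (String host : args) {
            for (int attempt = 1; attempt <= 3; attempt++) {
                long millis = resolveMillis(host);
                String outcome = (millis < 0) ? "UNKNOWN HOST" : (millis + " ms");
                System.out.println(host + " attempt " + attempt + ": " + outcome);
            }
        }
    }
}
```

Running it on a task node, for example as `java ResolveCheck s06.xxx.local`, under the same user and environment as the task JVMs would show whether the failure is reproducible outside a job.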

Clarification: I changed the server names in the logs; the domain isn't really xxx.local.


On 01/04/2012 05:19 PM, Harsh J wrote:
Looks like your caching DNS servers aren't really functioning as you'd
expect them to?

org.apache.hadoop.hbase.ZooKeeperConnectionException:
java.net.UnknownHostException: s06.xxx.local
(That .local also worries me, you probably have a misconfiguration in
resolution somewhere.)

On Wed, Jan 4, 2012 at 8:38 PM, Oren<or...@infolinks.com>  wrote:
Hi.
I have a small Hadoop grid connected by a 1 Gb network.
When the servers are configured to use the local DNS server, jobs run
without a problem and copy speed during reduce is tens of MB.
Once I change the servers to use a caching-only named server on each
node, I start to get failed tasks with timeout errors.
Also, copy speed drops to under 1 MB.

There is NO degradation in the network: copying files between servers still
runs at tens of MB.
Name resolution works and is about equally fast (give or take) with both
configurations.

Any idea what happens during the map/reduce process that causes this
behavior?
This is an example of the exceptions I get during map:
Too many fetch-failures

and during reduce:
java.lang.RuntimeException: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.net.UnknownHostException: s06.xxx.local
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:38)
    at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:129)
    at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:89)
    at com.infolinks.hadoop.commons.hbase.HBaseOperations.getTable(HBaseOperations.java:118)
    at com.infolinks.hadoop.framework.HBaseReducer.setup(HBaseReducer.java:71)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.net.UnknownHostException: s06.xxx.local
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
    ... 8 more
Caused by: java.net.UnknownHostException: s06.xxx.local
    at java.net.InetAddress.getAllByName0(InetAddress.java:1158)
    at java.net.InetAddress.getAllByName(InetAddress.java:1084)
    at java.net.InetAddress.getAllByName(InetAddress.java:1020)
    at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:386)
    at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:97)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:119)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:998)
    ... 13 more

Thank you,
Oren.



