Hey Hadoop'sters -

 I wanted to break out the DFS runtime to stand alone, and also to avoid
requiring ssh/rsync for operations, although those are fine choices.

 Here are the results of my cannibalizing the existing scripts, etc. (not
perfect):

   http://67.113.25.210/james/projects/hadoop/hadoop.sh

   note: if there is interest and this script/thread is wiki-worthy, I can
help w/ the details

 To start w/, the simple nits/issues I encountered earlier are resolved:

   bin/hadoop.sh fails under dash, which is Ubuntu 6.10's /bin/sh, on this line:
     if [ "$HADOOP_NICENESS" == "" ]; then
   (a POSIX-safe fix is sketched just after this list)
   it would be nice to add a ".txt" suffix to the log and out files so that
   they can be viewed w/in the browser
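
 fwiw, the dash failure is just the "==" bashism; dash's built-in [ accepts
only a single "=". A POSIX-safe form that works under both shells (the
default of 0 here is my assumption of the intent):

   # portable empty-string test; runs under dash and bash alike
   if [ -z "$HADOOP_NICENESS" ]; then
     HADOOP_NICENESS=0
   fi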

 On the downside, with this script I seem to have lost my log files, which
sucks. I'm pretty sure it's a classpath issue, but it has eluded me thus far.
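
 If anyone wants to poke at it: my (unverified) guess is that the conf dir
holding log4j.properties fell out of the classpath when I cannibalized the
scripts, since that is where the file appenders get configured. A quick
check along these lines should confirm or rule it out (variable names are
from my script; adjust to your layout):

   # is the conf dir on the classpath, and does it hold log4j.properties?
   echo "$CLASSPATH" | tr ':' '\n' | grep "$HADOOP_CONF_DIR"
   ls -l "$HADOOP_CONF_DIR/log4j.properties"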

 Now, the more critical issue. I'm wondering why, if I understand things
correctly, the DataNode *requires* DNS resolution rather than optionally
just using the specified IP. As a simple case, I created two loopback
configurations, one specifying the NameNode and the other the single
DataNode. The two never connected, since Hadoop appears to prefer to get
the host name and resolve that, which in my case yields a non-loopback IP,
hence the failed-to-connect problem.
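
 You can see the mismatch from the shell: compare what the box thinks its
own name resolves to against the loopback address in the config (output
shown is illustrative; getent availability varies by platform):

   hostname                    # e.g. mybox
   getent hosts "$(hostname)"  # e.g. 192.168.1.5  mybox  -- not 127.0.0.1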

 I can understand why DNS resolution is a good default, but why is it
mandatory, unless I am missing something? I can see some deployments may
wish to forgo DNS and simply map to predetermined IPs. In the most trivial
case, I may want to run dev nodes on loopback, emulating the scenario just
mentioned.
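
 The only workaround I've found short of a code change is to pin the local
name onto loopback in /etc/hosts, so the lookup lands on 127.0.0.1 (this
assumes the resolver consults the hosts file first; "mybox" is a stand-in
for your host name):

   # /etc/hosts -- force the local name onto loopback for dev
   127.0.0.1   localhost mybox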

 much appreciated,

- james
