mapred.job.tracker is the address and port of the JobTracker - the main server
that controls map-reduce jobs.
Every task tracker needs to know the address in order to connect.
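In hadoop-site.xml it takes a host:port pair; a minimal sketch, where the host name is a placeholder and 9001 is just a commonly used choice of port:
<property>
<name>mapred.job.tracker</name>
<value>jobtracker.example.com:9001</value>
</property>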
Did you follow the docs, e.g. this one:
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
Can you start a one-node cluster?
> Are there standard tests of hadoop performance?
There is the sort benchmark. We also run the DFSIO benchmark for read and write
throughput.
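A sketch of how these are typically invoked, assuming the test and examples jars shipped with your Hadoop release; jar names vary by version, and the /bench paths are placeholders:
# DFSIO write and read throughput
bin/hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 100
bin/hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 100
# sort benchmark: generate random data, then sort it
bin/hadoop jar hadoop-*-examples.jar randomwriter /bench/rand
bin/hadoop jar hadoop-*-examples.jar sort /bench/rand /bench/rand-sorted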
--Konstantin
Joel Welling wrote:
So far no success, Konstantin - the hadoop job seems to start up, but
fails immediately, leaving no logs. What is the appropriate setting for
mapred.job.tracker? The generic value references hdfs, but it also has
a port number - I'm not sure what that means.
My cluster is small, but if I get this working I'd be very happy to run
some benchmarks. Are there standard tests of hadoop performance?
-Joel
On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
I think the solution should be easier than Arun and Steve advise.
Lustre is already mounted as a local directory on each cluster machine, right?
Say, it is mounted on /mnt/lustre.
Then you configure hadoop-site.xml and set
<property>
<name>fs.default.name</name>
<value>file:///mnt/lustre</value>
</property>
Then you start map-reduce only, without HDFS, using start-mapred.sh.
By this you basically redirect all FileSystem requests to Lustre, and you
don't need data-nodes or the name-node.
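A minimal sketch of that sequence, assuming the /mnt/lustre mount above; the fs -ls is just a sanity check that the default FileSystem now resolves to the Lustre-backed local directory:
# start only the JobTracker and TaskTrackers (no start-dfs.sh in this setup)
bin/start-mapred.sh
# sanity check: list the Lustre mount through the Hadoop FileSystem API
bin/hadoop fs -ls /mnt/lustre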
Please let me know if that works.
Also, it would be very interesting to have your experience shared on this list.
Problems, performance - everything is quite interesting.
Cheers,
--Konstantin
Joel Welling wrote:
2. Could you set up symlinks from the local filesystem, so that every
node points at a local dir
/tmp/hadoop
with each node's link resolving to a different subdir in the big filesystem?
Yes, I could do that! Do I need to do it for the log directories as
well, or can they be shared?
-Joel
On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
Joel Welling wrote:
Thanks, Steve and Arun. I'll definitely try to write something based on
the KFS interface. I think that for our applications putting the mapper
on the right rack is not going to be that useful. A lot of our
calculations are going to be disordered stuff based on 3D spatial
relationships like nearest-neighbor finding, so things will be in a
random access pattern most of the time.
Is there a way to set up the configuration for HDFS so that different
datanodes keep their data in different directories? That would be a big
help in the short term.
Yes, but you'd have to push out a different config to each datanode.
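For example, the per-node difference could be just the datanode storage directory; a sketch of one node's hadoop-site.xml entry, with a hypothetical path:
<property>
<name>dfs.data.dir</name>
<value>/mnt/lustre/hadoop/node17/dfs/data</value>
</property>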
1. I have some stuff that could help there, but it's not ready for
production use yet [1].
2. Could you set up symlinks from the local filesystem, so that every
node points at a local dir
/tmp/hadoop
with each node's link resolving to a different subdir in the big
filesystem? (See the sketch at the end of this message.)
[1]
http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
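A minimal sketch of the symlink approach in (2), assuming the /mnt/lustre mount discussed elsewhere in the thread; the layout under it is hypothetical:
# run once on each node: give the node its own subdir in the big filesystem,
# then point the shared local path at it
mkdir -p /mnt/lustre/hadoop/$(hostname)
ln -s /mnt/lustre/hadoop/$(hostname) /tmp/hadoop
A single config pushed to all nodes can then keep referring to /tmp/hadoop while each node's data lands in its own subdirectory.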