mapred.job.tracker is the address and port of the JobTracker - the main server
that controls map-reduce jobs.
Every task tracker needs to know the address in order to connect.
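In hadoop-site.xml it takes a host:port pair; a minimal sketch, where the host name is a placeholder and 9001 is just a commonly used choice of port:
<property>
<name>mapred.job.tracker</name>
<value>jobtracker.example.com:9001</value>
</property>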
Did you follow the docs, e.g. this one:
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
Can you start a one-node cluster?
> Are there standard tests of hadoop performance?
There is the sort benchmark. We also run the DFSIO benchmark for read and write
throughput.
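A sketch of how these are typically invoked, assuming the test and examples jars shipped with your Hadoop release; jar names vary by version, and the /bench paths are placeholders:
# DFSIO write and read throughput
bin/hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 100
bin/hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 100
# sort benchmark: generate random data, then sort it
bin/hadoop jar hadoop-*-examples.jar randomwriter /bench/rand
bin/hadoop jar hadoop-*-examples.jar sort /bench/rand /bench/rand-sorted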
--Konstantin
Joel Welling wrote:
So far no success, Konstantin - the hadoop job seems to start up, but
fails immediately, leaving no logs. What is the appropriate setting for
mapred.job.tracker? The generic value references hdfs, but it also has
a port number - I'm not sure what that means.
My cluster is small, but if I get this working I'd be very happy to run
some benchmarks. Are there standard tests of hadoop performance?
-Joel
On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
I think the solution should be easier than Arun and Steve advise.
Lustre is already mounted as a local directory on each cluster machine, right?
Say, it is mounted on /mnt/lustre.
Then you configure hadoop-site.xml and set
<property>
<name>fs.default.name</name>
<value>file:///mnt/lustre</value>
</property>
Then you start map-reduce only, without HDFS, using start-mapred.sh.
By this you basically redirect all FileSystem requests to Lustre, and you
don't need data-nodes or the name-node.
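A minimal sketch of that sequence, assuming the /mnt/lustre mount above; the fs -ls is just a sanity check that the default FileSystem now resolves to the Lustre-backed local directory:
# start only the JobTracker and TaskTrackers (no start-dfs.sh in this setup)
bin/start-mapred.sh
# sanity check: list the Lustre mount through the Hadoop FileSystem API
bin/hadoop fs -ls /mnt/lustre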
Please let me know if that works.
Also, it would be very interesting to have your experience shared on this list.
Problems, performance - everything is quite interesting.
Cheers,
--Konstantin
Joel Welling wrote:
2. Could you set up symlinks from the local filesystem, so that every
node points at a local dir
/tmp/hadoop
with each node's link resolving to a different subdir in the big filesystem?
Yes, I could do that! Do I need to do it for the log directories as
well, or can they be shared?
-Joel
On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
Joel Welling wrote:
Thanks, Steve and Arun. I'll definitely try to write something based on
the KFS interface. I think that for our applications putting the mapper
on the right rack is not going to be that useful. A lot of our
calculations are going to be disordered stuff based on 3D spatial
relationships like nearest-neighbor finding, so things will be in a
random access pattern most of the time.
Is there a way to set up the configuration for HDFS so that different
datanodes keep their data in different directories? That would be a big
help in the short term.
Yes, but you'd have to push out a different config to each datanode.
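For example, the per-node difference could be just the datanode storage directory; a sketch of one node's hadoop-site.xml entry, with a hypothetical path:
<property>
<name>dfs.data.dir</name>
<value>/mnt/lustre/hadoop/node17/dfs/data</value>
</property>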
1. I have some stuff that could help there, but it's not ready for
production use yet [1].
2. Could you set up symlinks from the local filesystem, so that every
node points at a local dir
/tmp/hadoop
with each node's link resolving to a different subdir in the big
filesystem? (See the sketch at the end of this message.)
[1]
http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
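A minimal sketch of the symlink approach in (2), assuming the /mnt/lustre mount discussed elsewhere in the thread; the layout under it is hypothetical:
# run once on each node: give the node its own subdir in the big filesystem,
# then point the shared local path at it
mkdir -p /mnt/lustre/hadoop/$(hostname)
ln -s /mnt/lustre/hadoop/$(hostname) /tmp/hadoop
A single config pushed to all nodes can then keep referring to /tmp/hadoop while each node's data lands in its own subdirectory.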