So far no success, Konstantin: the Hadoop job seems to start up, but fails immediately, leaving no logs behind. What is the appropriate setting for mapred.job.tracker? The generic value references HDFS, but it also includes a port number, and I'm not sure what that means.
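For concreteness, here's my best guess at what the relevant pair of settings should look like in hadoop-site.xml (the hostname and port below are placeholders, not my real values):

  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/lustre</value>
  </property>
  <property>
    <!-- host:port of the JobTracker's RPC endpoint, not a filesystem URI -->
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>

Is that roughly the right shape, or am I misreading what the port is for?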
My cluster is small, but if I get this working I'd be very happy to run
some benchmarks. Are there standard tests of Hadoop performance? (I've
sketched one possible starting point at the bottom of this message.)

-Joel
[EMAIL PROTECTED]

On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
> I think the solution should be easier than Arun and Steve advise.
> Lustre is already mounted as a local directory on each cluster machine,
> right? Say, it is mounted on /mnt/lustre.
> Then you configure hadoop-site.xml and set
>   <property>
>     <name>fs.default.name</name>
>     <value>file:///mnt/lustre</value>
>   </property>
> And then you start map-reduce only, without HDFS, using start-mapred.sh.
>
> By this you basically redirect all FileSystem requests to Lustre, and
> you don't need data-nodes or the name-node.
>
> Please let me know if that works.
>
> Also it would be very interesting to have your experience shared on
> this list. Problems, performance - everything is quite interesting.
>
> Cheers,
> --Konstantin
>
> Joel Welling wrote:
> >> 2. Could you set up symlinks from the local filesystem, so point
> >> every node at a local dir
> >>   /tmp/hadoop
> >> with each node pointing to a different subdir in the big filesystem?
> >
> > Yes, I could do that! Do I need to do it for the log directories as
> > well, or can they be shared?
> >
> > -Joel
> >
> > On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
> >> Joel Welling wrote:
> >>> Thanks, Steve and Arun. I'll definitely try to write something
> >>> based on the KFS interface. I think that for our applications
> >>> putting the mapper on the right rack is not going to be that
> >>> useful. A lot of our calculations are going to be disordered stuff
> >>> based on 3D spatial relationships like nearest-neighbor finding, so
> >>> things will be in a random access pattern most of the time.
> >>>
> >>> Is there a way to set up the configuration for HDFS so that
> >>> different datanodes keep their data in different directories? That
> >>> would be a big help in the short term.
> >> Yes, but you'd have to push out a different config to each datanode.
> >>
> >> 1. I have some stuff that could help there, but it's not ready for
> >> production use yet [1].
> >>
> >> 2. Could you set up symlinks from the local filesystem, so point
> >> every node at a local dir
> >>   /tmp/hadoop
> >> with each node pointing to a different subdir in the big filesystem?
> >>
> >> [1]
> >> http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
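To make Steve's symlink suggestion (his #2 above) concrete, here is roughly what I plan to try on each node. This is only a sketch: the directory names are made up, and I'm assuming the Lustre mount is /mnt/lustre and that the local dir in question is /tmp/hadoop:

  # give each node its own private subdirectory of the shared mount
  mkdir -p /mnt/lustre/hadoop-local/$(hostname)
  # point the node's standard local dir into it
  # (assumes /tmp/hadoop does not already exist on the node)
  ln -s /mnt/lustre/hadoop-local/$(hostname) /tmp/hadoop

That way every node sees the same local path but writes into its own subdirectory of the big filesystem.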
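P.S. On the benchmark question: as far as I can tell, the usual smoke test is the randomwriter/sort pair that ships in the examples jar, something like the following (output paths are arbitrary):

  bin/hadoop jar hadoop-*-examples.jar randomwriter /bench/unsorted
  bin/hadoop jar hadoop-*-examples.jar sort /bench/unsorted /bench/sorted

If there is something more standard that people use for comparisons, I'd be glad to run that instead.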