Joel Welling wrote:
Hi folks;
I'm new to Hadoop, and I'm trying to set it up on a cluster for which
almost all the disk is mounted via the Lustre filesystem. That
filesystem is visible to all the nodes, so I don't actually need HDFS to
implement a shared filesystem. (I know the philosophical reasons why
people say local disks are better for Hadoop, but that's not the
situation I've got). My system is failing, and I think it's because the
different nodes are tripping over each other when they try to run HDFS
out of the same directory tree.
Is there a way to turn off HDFS and just let Lustre do the distributed
filesystem? I've seen discussion threads about Hadoop with NFS which
said something like 'just specify a local filesystem and everything will
be fine', but I don't know how to do that. I'm using Hadoop 0.17.2.
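For context, the "just specify a local filesystem" advice in those threads generally amounts to pointing `fs.default.name` at a `file://` URI in `hadoop-site.xml`, so Hadoop treats the mounted (here, Lustre) filesystem as its storage and no HDFS daemons are started. A minimal sketch — the `/mnt/lustre/...` paths are hypothetical placeholders, not from the thread:

```xml
<!-- hadoop-site.xml: bypass HDFS, use the mounted filesystem directly -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- file:/// = the local FileSystem implementation; with Lustre mounted
         on every node, the same paths are visible cluster-wide -->
    <value>file:///</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <!-- must be a path all nodes can see; hypothetical Lustre mount point -->
    <value>/mnt/lustre/hadoop/mapred/system</value>
  </property>
</configuration>
```

With this, only the JobTracker and TaskTrackers need to run; the NameNode and DataNodes are not used at all.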
I don't know enough about Lustre to be very useful, but:
* You shouldn't have nodes trying to use the same directories. At the
very least, point each datanode at a different part of the filesystem.
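If HDFS daemons are kept at all, the collisions described above usually come from every node sharing one `dfs.data.dir` (or `hadoop.tmp.dir`) on the shared mount. One way to separate them is to give each node its own copy of `hadoop-site.xml` with a unique path — a sketch with hypothetical paths; Hadoop 0.17 has no built-in hostname variable in config values, so the per-node part must be written out (or generated) per machine:

```xml
<!-- per-node hadoop-site.xml: each node gets its own storage directory -->
<property>
  <name>dfs.data.dir</name>
  <!-- unique per node (here "node01"); two datanodes sharing this path
       will corrupt each other's storage layout -->
  <value>/mnt/lustre/hadoop/data/node01</value>
</property>
```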
* If Lustre has a specific API call to find out which server holds a
given piece of data, that could be used to place work near the data.
Someone (==you) would have to write a new filesystem back-end for
Hadoop to expose this.
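The core of such a back-end would be a locality lookup: given a byte range of a file, return the hosts that store it, so the scheduler can place tasks there. A standalone sketch of that mapping, assuming simple round-robin striping over a fixed server list — a stand-in for whatever the real Lustre striping API reports; the class, host names, and stripe layout are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the locality lookup a Lustre-aware Hadoop filesystem
 * back-end would need: map a byte range of a striped file to the
 * hosts holding it. Assumes round-robin striping with a fixed stripe
 * size across a fixed host list (a placeholder for the real API).
 */
public class StripeLocator {
    private final String[] hosts;   // storage servers, in stripe order
    private final long stripeSize;  // bytes per stripe

    public StripeLocator(String[] hosts, long stripeSize) {
        this.hosts = hosts;
        this.stripeSize = stripeSize;
    }

    /** Hosts holding any part of [start, start+len), in stripe order. */
    public List<String> hostsFor(long start, long len) {
        List<String> result = new ArrayList<String>();
        long first = start / stripeSize;
        long last = (start + len - 1) / stripeSize;
        // Walk the stripes the range covers; stop once every host is listed.
        for (long s = first; s <= last && result.size() < hosts.length; s++) {
            String h = hosts[(int) (s % hosts.length)];
            if (!result.contains(h)) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StripeLocator loc = new StripeLocator(
                new String[] {"oss1", "oss2", "oss3"}, 64L * 1024 * 1024);
        // A 128 MB read from offset 0 spans the first two 64 MB stripes.
        System.out.println(loc.hostsFor(0, 128L * 1024 * 1024));
    }
}
```

A Hadoop scheduler consuming this would prefer running a map task on one of the returned hosts, which is exactly what HDFS's block-location reporting buys it today.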
-steve