Joel Welling wrote:
Thanks, Steve and Arun. I'll definitely try to write something based on
the KFS interface. I think that for our applications putting the mapper
on the right rack is not going to be that useful. A lot of our
calculations are unordered work driven by 3D spatial relationships,
such as nearest-neighbor finding, so the access pattern will be
essentially random most of the time.
Is there a way to set up the configuration for HDFS so that different
datanodes keep their data in different directories? That would be a big
help in the short term.
Yes, but you'd have to push out a different config to each datanode;
the data directory is a per-node setting, sketched below.
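A minimal sketch of that setting (assuming the hadoop-site.xml layout
of the 0.x releases, where the property is dfs.data.dir; the path is a
placeholder, and each node's file would name a different directory):

    <property>
      <name>dfs.data.dir</name>
      <!-- node-specific path; varies per datanode -->
      <value>/bigfs/node17/hdfs/data</value>
    </property>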
1. I have some stuff that could help there, but it's not ready for
production use yet [1].
2. Could you set up symlinks from the local filesystem? Point every
node at the same local dir,
/tmp/hadoop
with each node's link resolving to a different subdirectory of the big
shared filesystem (sketched below).
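A sketch of that symlink setup (assuming the shared filesystem is
mounted at /bigfs; the names are illustrative):

    # run once on each datanode: give the common local path a
    # node-specific target on the big shared filesystem
    mkdir -p /bigfs/$(hostname)/hadoop
    ln -s /bigfs/$(hostname)/hadoop /tmp/hadoop

Every node can then share one config that names /tmp/hadoop, while the
data actually lands in a different subdirectory per host.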
[1]
http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf