I was having some success with PVFS2. The jobtracker and tasktrackers were
set up to use the 'local' file system.
mapred.local.dir was on the local hard drive of each machine, e.g. /tmp/hadoop.
mapred.system.dir was on the PVFS2 mount, at the same path for all
tasktrackers and the jobtracker, e.g. /mnt/pvfs2/hadoop/system.
mapred.temp.dir was on the PVFS2 mount, at the same path for all
tasktrackers and the jobtracker, e.g. /mnt/pvfs2/hadoop/temp.
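For reference, that layout would look roughly like the following hadoop-site.xml fragment. The property names are the ones mentioned above; the paths are just the illustrative examples from this post, and your mount points will differ:

```xml
<!-- Sketch of a hadoop-site.xml fragment matching the setup described above.
     Paths are illustrative, not prescriptive. -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop</value>
    <!-- local disk on each node -->
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/pvfs2/hadoop/system</value>
    <!-- shared PVFS2 mount; same path on every tasktracker and the jobtracker -->
  </property>
  <property>
    <name>mapred.temp.dir</name>
    <value>/mnt/pvfs2/hadoop/temp</value>
    <!-- also on the shared PVFS2 mount -->
  </property>
</configuration>
```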
It worked out pretty well, apart from the performance of the PVFS2
cluster. When I decided to switch to the Hadoop DFS I noticed that
things were more stable (tasktrackers stopped timing out) and that my
reduce tasks completed more quickly.
There may have been some things I could have done to the storage cluster
to increase its performance, but I decided it was quicker to try out the
Hadoop DFS.
Jeff
Doug Cutting wrote:
Stefan Groschupf wrote:
in general, Hadoop's tasktrackers and jobtrackers require a running DFS.
Stefan: that should not be the case. One should be able to run things
entirely out of the "local" filesystem. Absolute pathnames may be
required for input and output directories, but that's a bug that we
can fix.
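Running entirely on the local filesystem, as Doug describes, corresponds roughly to the following config fragment (assuming the Hadoop/Nutch configuration conventions of that era, where fs.default.name could be set to "local"):

```xml
<!-- Sketch: point Hadoop at the local filesystem instead of a DFS.
     Assumes the old fs.default.name convention; modern Hadoop uses
     fs.defaultFS with a file:/// URI instead. -->
<property>
  <name>fs.default.name</name>
  <value>local</value>
</property>
```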
Doug
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general