I was having some success with PVFS2. The jobtracker and tasktrackers were set up to use the 'local' file system.

mapred.local.dir was on the hard drive of each machine, e.g. /tmp/hadoop.
mapred.system.dir was on the pvfs2 mount, with the same path for all tasktrackers and the jobtracker, e.g. /mnt/pvfs2/hadoop/system.
mapred.temp.dir was also on the pvfs2 mount, with the same path everywhere, e.g. /mnt/pvfs2/hadoop/temp.
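For the record, the layout above would look roughly like this in a hadoop-site.xml (a sketch only; the property names are the mapred.* keys mentioned above, and the paths are the examples from this mail, not a tested config):

```xml
<!-- Sketch of a hadoop-site.xml matching the layout described above.
     Paths are illustrative; adjust them to your own mounts. -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <!-- per-node scratch space on the local disk -->
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <!-- shared path, identical on the jobtracker and every tasktracker -->
    <value>/mnt/pvfs2/hadoop/system</value>
  </property>
  <property>
    <name>mapred.temp.dir</name>
    <!-- also shared, same path everywhere -->
    <value>/mnt/pvfs2/hadoop/temp</value>
  </property>
</configuration>
```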

It worked out pretty well except for the performance of the pvfs2 cluster. When I decided to switch to the hadoop dfs, I noticed that things were more stable (tasktrackers stopped timing out) and that my reduce tasks completed more quickly.

There may have been some things I could have done to the storage cluster to increase its performance, but I decided it was quicker to try out the hadoop dfs.

Jeff

Doug Cutting wrote:

Stefan Groschupf wrote:

in general hadoop's tasktrackers and jobtrackers require a running dfs.


Stefan: that should not be the case. One should be able to run things entirely out of the "local" filesystem. Absolute pathnames may be required for input and output directories, but that's a bug that we can fix.
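Running entirely out of the local filesystem as Doug describes would amount to something like the following (a hypothetical sketch: in Hadoop of this vintage, setting fs.default.name to 'local' selected the local filesystem instead of a DFS namenode — treat the exact key, value, and path as assumptions):

```xml
<!-- Hypothetical hadoop-site.xml for running MapReduce with no DFS daemons. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- 'local' selects the local filesystem rather than a DFS namenode -->
    <value>local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <!-- absolute path (per Doug's note) on a mount visible to all nodes -->
    <value>/mnt/shared/hadoop/system</value>
  </property>
</configuration>
```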

Doug




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
