Yeah, I tried some more experiments today and the error messages were more helpful. It does seem that some of the values were defaulting to ones very different from what I had configured.
I have been looking into Puppet but figured that with 4 slaves, it shouldn't be a problem to use NFS. Guess I was wrong!

Thanks all,
Andrew

On May 14, 2010, at 7:41 PM, Hemanth Yamijala wrote:

> Andrew,
>
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via
>> NFS. I don't see how this would cause a conflict - do you have any
>> additional information?
>
> FWIW, we had an experience where we were storing config files on NFS
> on a large cluster. Randomly (and we guess due to NFS problems),
> Hadoop would fail to pick up the config files from NFS and would fall
> back to its defaults. Because some of the directory paths in the
> defaults differed from our actual config values, this produced very
> odd errors. We were eventually able to solve the problem by moving
> the config files off NFS. Of course, the size of the cluster (several
> hundred slaves) was probably a factor. But nevertheless, you may want
> to try pulling everything off NFS.
>
> Thanks
> Hemanth
>
>> The referenced path in the error below (/srv/hadoop/dfs/1) is not being
>> shared via NFS...
>>
>> Thanks,
>> Andrew
>>
>> On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:
>>
>>> Deploying Hadoop on NFS is not recommended: because NFS gives the
>>> data nodes the same file system namespace, they will conflict with
>>> one another.
>>>
>>> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <and...@ucsfcti.org> wrote:
>>>>
>>>> Yes, in this deployment, I'm attempting to share the Hadoop files via NFS.
>>>> The log and pid directories are local.
>>>>
>>>> Thanks!
>>>>
>>>> --Andrew
>>>>
>>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>>
>>>>> Do these 4 nodes share NFS?
>>>>>
>>>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>>>> <andrew-lists-had...@ucsfcti.org> wrote:
>>>>>> I'm working on bringing up a second test cluster and am getting these
>>>>>> intermittent errors on the DataNodes:
>>>>>>
>>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>> java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>>>         at java.io.RandomAccessFile.open(Native Method)
>>>>>>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>>         at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>>>         at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>>>         at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>>>>
>>>>>> There are 4 slaves, and sometimes 1 or 2 hit the error, but the specific
>>>>>> nodes change. Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --Andrew
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
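The failure mode Hemanth describes (Hadoop silently falling back to built-in defaults when a config file can't be read over NFS) can be checked for directly on each node. Below is a minimal sketch of such a check: a small standalone helper, not part of Hadoop itself, that parses a Hadoop-style *-site.xml and reports whether a key is actually set. The config path shown in the comment is a hypothetical example; adjust it to your installation.

```python
import xml.etree.ElementTree as ET

def read_hadoop_conf(source):
    """Parse a Hadoop-style *-site.xml (a path or file-like object)
    into a dict mapping property name -> value."""
    props = {}
    for prop in ET.parse(source).getroot().findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props

# Hypothetical usage on each slave -- adjust the path to your layout:
#   conf = read_hadoop_conf("/usr/local/hadoop/conf/hdfs-site.xml")
#   print(conf.get("dfs.data.dir", "<missing: Hadoop would use its default!>"))
```

Running this over SSH on every slave and comparing the output would quickly reveal whether one node is seeing an empty or stale copy of the file from NFS.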