Yeah, I tried some more experiments today and the error messages were more 
helpful.  It does seem that some of the values were defaulting to ones very 
different from what I had configured.  
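
In case it's useful to anyone else, a quick check along these lines (the class
name is just for illustration, and dfs.name.dir / dfs.data.dir are the 0.20-era
property keys) prints the values the daemons actually resolve when run on each
slave with the daemons' classpath, which makes it obvious whether the NFS copy
of the config was picked up or the built-in defaults won out:

    import org.apache.hadoop.conf.Configuration;

    public class PrintEffectiveConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();   // loads core-default.xml + core-site.xml
            conf.addResource("hdfs-default.xml");       // bundled defaults from the Hadoop jar
            conf.addResource("hdfs-site.xml");          // the overrides that live on NFS
            // If the config files were not readable at startup, these come out as the
            // built-in defaults rooted under hadoop.tmp.dir instead of the configured paths.
            System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
            System.out.println("dfs.name.dir   = " + conf.get("dfs.name.dir"));
            System.out.println("dfs.data.dir   = " + conf.get("dfs.data.dir"));
        }
    }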

I have been looking into Puppet but figured with 4 slaves, it shouldn't be a 
problem to use NFS.  Guess I was wrong!

Thanks all,
Andrew

On May 14, 2010, at 7:41 PM, Hemanth Yamijala wrote:

> Andrew,
> 
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via 
>> NFS.  I don't see how this would cause a conflict - do you have any 
>> additional information?
> 
> FWIW, we had a similar experience when we kept the config files on NFS
> on a large cluster. Randomly (we suspect due to NFS problems), Hadoop
> would fail to pick up the config files from NFS and would fall back to
> its defaults. Because the default values for some directory paths were
> very different from our actual config values, this produced very odd
> errors. We eventually solved the problem by moving the config files off
> NFS. Of course, the size of the cluster (several hundred slaves) was
> probably a factor, but you may still want to try pulling everything off
> NFS.
> 
> Thanks
> Hemanth
> 
>> 
>> The referenced path in the error below (/srv/hadoop/dfs/1) is not being 
>> shared via NFS...
>> 
>> Thanks,
>> Andrew
>> 
>> On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:
>> 
>>> Deploying Hadoop on NFS is not recommended: the data nodes will conflict
>>> with one another, because NFS gives them all the same file system
>>> namespace.
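>>> 
>>> If the binaries do stay on NFS, the data directories should still point at
>>> local disk on every slave. Something along these lines in hdfs-site.xml
>>> (dfs.data.dir is the 0.20 property name; the path is just the one from the
>>> log below) keeps the DataNode storage off the shared mount:
>>> 
>>>     <property>
>>>       <name>dfs.data.dir</name>
>>>       <!-- a directory on each slave's local disk, not an NFS mount -->
>>>       <value>/srv/hadoop/dfs/1</value>
>>>     </property>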
>>> 
>>> 
>>> 
>>> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <and...@ucsfcti.org> wrote:
>>>> 
>>>> Yes, in this deployment I'm attempting to share the Hadoop files via NFS.
>>>> The log and pid directories are local.
>>>> 
>>>> Thanks!
>>>> 
>>>> --Andrew
>>>> 
>>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>> 
>>>>> These 4 nodes share NFS?
>>>>> 
>>>>> 
>>>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>>>> <andrew-lists-had...@ucsfcti.org> wrote:
>>>>>> I'm working on bringing up a second test cluster and am getting these 
>>>>>> intermittent errors on the DataNodes:
>>>>>> 
>>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>>>> 
>>>>>> 
>>>>>> There are 4 slaves, and sometimes 1 or 2 of them hit the error, but the
>>>>>> specific nodes change: sometimes it's slave1, sometimes it's slave4, etc.
>>>>>> 
>>>>>> Any thoughts?
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> --Andrew
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
>> 
