The nature of the computational problem and the architecture of the cluster
both have a significant bearing on the solution, I think.  Offhand, I would
guess it's a bandwidth issue, since the NFS server does serve requests
eventually.  If you have a really large cluster or a very write-intensive
computational problem, you can tie up the NFS server fairly quickly.
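
To put rough numbers on that guess, here is a back-of-the-envelope sketch in
Python.  The function name, the node count, the per-node write rate and the
link speed are all made-up placeholders, not measurements from your cluster,
and it ignores protocol overhead and disk speed on the server.

```python
def nfs_link_utilization(nodes, mb_per_s_per_node, link_mbit_per_s):
    """Fraction of the server's network link used if every node writes at once."""
    demand_mbit_per_s = nodes * mb_per_s_per_node * 8  # MB/s -> Mbit/s
    return demand_mbit_per_s / link_mbit_per_s


# Hypothetical figures: 16 nodes, 1 MB/s each, 100 Mbit/s link on the server.
print(nfs_link_utilization(16, 1.0, 100.0))
```

Anything near or above 1.0 means the clients are asking for more than the
link can carry, which would fit timeouts that clear up on their own.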

About how many nodes are you using?
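
If it helps to answer that from the logs, below is a rough Python sketch that
tallies the "not responding" messages per node and per hour, and pairs each
one with the next "OK" from the same node to estimate how long the stall
lasted.  The function name and the regular expression are my own assumptions
based on the lines you quoted, and it expects each log entry on a single,
unwrapped line.

```python
import re
from collections import Counter

# Assumed syslog layout, taken from the quoted lines:
#   Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, ...
LINE_RE = re.compile(
    r"(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(\S+)\s+kernel: nfs: "
    r"server (\S+) (not responding|OK)"
)


def summarize_nfs_timeouts(lines):
    """Return (timeouts per node, timeouts per hour, [(node, stall seconds)])."""
    per_node = Counter()
    per_hour = Counter()
    outages = []
    pending = {}  # node -> ((month, day), second of day) of first unanswered timeout
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an NFS client message
        month, day, hh, mm, ss, node, _server, state = m.groups()
        date = (month, int(day))
        second = int(hh) * 3600 + int(mm) * 60 + int(ss)
        if state == "not responding":
            per_node[node] += 1
            per_hour[f"{month} {int(day):2d} {hh}:00"] += 1
            pending.setdefault(node, (date, second))
        elif node in pending:
            start_date, start_second = pending.pop(node)
            if start_date == date:  # skip stalls that cross midnight
                outages.append((node, second - start_second))
    return per_node, per_hour, outages


# Usage (path is an example):
#   with open("/var/log/messages") as f:
#       per_node, per_hour, outages = summarize_nfs_timeouts(f)
```

The per-hour counts should show whether the stalls cluster around a
particular time of day, and the per-node counts show whether every client is
affected or only a few.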

>===== Original Message From Howell Silverman <[EMAIL PROTECTED]> =====
>Can anyone help lead us to a solution?
>
>
>> >Subject: NFS problem on [machine]
>
>> >> On the new cluster [x], we just found that for a few
>> >> times, the master node was not responding [to] NFS requests.
>> >>  Attached are lines grepped with 'nfs' from all
>> >>'messages'
>> >> file.
>> >>
>> >> This is a serious problem to us.  When the nfs server
>> >> stops responding, many running jobs are restarted from
>> >> scratch.  Is this a problem of the nfs configuration,
>> >> oscar or the hardware?  What can we do to make NFS
>> >>stable?
>> >>  Please advise.
>> >>
>> >> Thanks,
>
>[MORE INFORMATION]
>
>> We noticed this when our big jobs (need 3 days) were
>> restarted. From the log, it was happening more and more
>> frequently. Any suggestion on identifying the source of
>> the problem?
>>
>>
>> [EMAIL PROTECTED] log]# grep nfs messages | grep not | wc -l
>>      339
>> [EMAIL PROTECTED] log]# grep nfs messages.1 | grep not | wc -l
>>      381
>> [EMAIL PROTECTED] log]# grep nfs messages.2 | grep not | wc -l
>>        9
>> [EMAIL PROTECTED] log]# grep nfs messages.3 | grep not | wc -l
>>        0
>> [EMAIL PROTECTED] log]# grep nfs messages.4 | grep not | wc -l
>>        0
>> [EMAIL PROTECTED] log]# ls -l messages*
>> -rw-------    1 root     root       130926 Dec 29 15:45 messages
>> -rw-------    1 root     root       509784 Dec 28 04:02 messages.1
>> -rw-------    1 root     root       416508 Dec 21 04:02 messages.2
>> -rw-------    1 root     root       586158 Dec 14 04:02 messages.3
>> -rw-------    1 root     root       413372 Dec  7 04:02 messages.4
>
>[ ... more message ....]
>messages:Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:05:30 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:05:46 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:05:57 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:06:09 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:06:34 node7.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:06:58 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:07:48 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:08:20 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:09:01 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:09:29 node7.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:09:53 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:10:10 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:10:38 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:10:47 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:06 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:25 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:11:35 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:11:42 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:47 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:11:51 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:12:08 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:12:35 node2.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:13:14 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:14:27 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:15:36 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
>messages:Dec 28 04:16:30 node3.metis kernel: nfs: server nfs_oscar OK
>messages:Dec 28 04:17:27 node7.metis kernel: nfs: server nfs_oscar OK
>[snip]



_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
