Can anyone help lead us to a solution?

> >Subject: NFS problem on [machine]
> >
> >> On the new cluster [x], we just found that a few times
> >> the master node was not responding [to] NFS requests.
> >> Attached are lines grepped with 'nfs' from all
> >> 'messages' files.
> >>
> >> This is a serious problem for us. When the nfs server
> >> stops responding, many running jobs are restarted from
> >> scratch. Is this a problem of the nfs configuration,
> >> oscar or the hardware? What can we do to make NFS
> >> stable?
> >> Please advise.
> >>
> >> Thanks, [MORE INFORMATION]
> We noticed this when our big jobs (which need 3 days) were
> restarted. From the log, it has been happening more and more
> frequently. Any suggestion on identifying the source of
> the problem?
>
> [EMAIL PROTECTED] log]# grep nfs messages | grep not | wc -l
> 339
> [EMAIL PROTECTED] log]# grep nfs messages.1 | grep not | wc -l
> 381
> [EMAIL PROTECTED] log]# grep nfs messages.2 | grep not | wc -l
> 9
> [EMAIL PROTECTED] log]# grep nfs messages.3 | grep not | wc -l
> 0
> [EMAIL PROTECTED] log]# grep nfs messages.4 | grep not | wc -l
> 0
> [EMAIL PROTECTED] log]# ls -l messages*
> -rw------- 1 root root 130926 Dec 29 15:45 messages
> -rw------- 1 root root 509784 Dec 28 04:02 messages.1
> -rw------- 1 root root 416508 Dec 21 04:02 messages.2
> -rw------- 1 root root 586158 Dec 14 04:02 messages.3
> -rw------- 1 root root 413372 Dec 7 04:02 messages.4
> [ ... more message ....]
messages:Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:05:30 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:05:46 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:05:57 node2.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:06:09 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:06:34 node7.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:06:58 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:07:48 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:08:20 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:09:01 node2.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:09:29 node7.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:09:53 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:10:10 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:10:38 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:10:47 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:11:06 node2.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:11:25 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:11:35 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:11:42 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:11:47 node2.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:11:51 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:12:08 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:12:35 node2.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:13:14 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:14:27 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:15:36 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:16:30 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:17:27 node7.metis kernel: nfs: server nfs_oscar OK
> [snip]
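
One possible way to get more out of these lines than a raw `grep | wc -l` count is to pair each "not responding" message with the next "OK" from the same node, and to bucket the stalls by hour. The following is only a rough sketch, not something from the original posts: the function names (`parse_events`, `outages`, `per_hour`) are made up for illustration, the regular expression assumes the exact line layout shown above, and because syslog lines carry no year the `year` argument is an arbitrary placeholder.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# messages:Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
LINE_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<node>\S+) kernel: nfs: server (?P<server>\S+) "
    r"(?P<state>not responding|OK)"
)


def parse_events(lines, year=2000):
    """Yield (timestamp, node, state) for every NFS client message.

    syslog timestamps have no year, so `year` is a placeholder (assumption).
    """
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an NFS client message in the expected layout
        stamp = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        yield stamp, m["node"], m["state"]


def outages(events):
    """Pair each 'not responding' with the next 'OK' from the same node.

    Returns {node: [(start_time, duration_in_seconds), ...]}.
    """
    open_since = {}
    result = defaultdict(list)
    for stamp, node, state in events:
        if state == "not responding":
            # Keep the earliest unanswered message as the start of the stall.
            open_since.setdefault(node, stamp)
        elif node in open_since:
            start = open_since.pop(node)
            result[node].append((start, (stamp - start).total_seconds()))
    return dict(result)


def per_hour(events):
    """Count 'not responding' messages per 'Mon DD HH:00' bucket."""
    return Counter(
        stamp.strftime("%b %d %H:00")
        for stamp, _node, state in events
        if state == "not responding"
    )


# The first six lines of the excerpt above, used as sample input.
SAMPLE = """\
messages:Dec 28 04:05:06 node7.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:05:30 node3.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:05:46 node2.metis kernel: nfs: server nfs_oscar not responding, still trying
messages:Dec 28 04:05:57 node2.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:06:09 node3.metis kernel: nfs: server nfs_oscar OK
messages:Dec 28 04:06:34 node7.metis kernel: nfs: server nfs_oscar OK
"""

events = list(parse_events(SAMPLE.splitlines()))
for node, stalls in sorted(outages(events).items()):
    for start, seconds in stalls:
        print(f"{node}: stalled at {start:%b %d %H:%M:%S} for {seconds:.0f} s")
print(dict(per_hour(events)))
```

To run it against the real logs, one could feed it the output of `grep nfs messages*` instead of `SAMPLE`. If the per-hour buckets cluster around particular times of day, that may help narrow down what else is running on the master when the stalls occur.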
- [Oscar-users] NFS on Master not responding Howell Silverman
- RE: [Oscar-users] NFS on Master not responding Michael Edwards
- Re: [Oscar-users] NFS on Master not responding Howell Silverman
- RE: [Oscar-users] NFS on Master not responding Lombard, David N
