Re: hard lock under 3.4-STABLE

2000-02-12 Thread David Malone

On Fri, Feb 11, 2000 at 06:03:16PM -0800, Matthew Dillon wrote:

> I presume it's the client that is locking up?   If you remove the
> server binary and the client takes a page fault on the binary,
> and does not have the page in the cache, what is supposed to happen
> is that the program seg faults when the NFS read fails.
> It's quite possible that there is a bug in dealing with this situation
> and if you can get it repeatable we can probably fix it fairly easily.

I did some experiments with this sort of thing a few months ago.
I think you can kill 3.X NFS client machines by truncating a binary
on the NFS server. You can also make the machine extremely sluggish
by catching SIGBUS and SIGSEGV in an executable and then causing
one of these signals once the binary is modified. We see it quite
frequently with people using MPI.
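
For anyone who wants to see the second effect without an NFS server,
here is a minimal local sketch, not the actual test that was run. It
assumes that truncating a locally mapped temporary file is a fair
stand-in for the binary being modified on the server. The names
`count_refaults` and `on_fault` and the /tmp path are hypothetical.
The handler simply returns, so the faulting load is restarted and
faults again; a cap is added so the sketch terminates instead of
spinning.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf escape;
static volatile sig_atomic_t faults;
static volatile sig_atomic_t fault_limit;

/* Returning from this handler restarts the faulting instruction, which
 * faults again.  Without the cap below the process would spin forever. */
static void on_fault(int sig)
{
    (void)sig;
    if (++faults >= fault_limit)
        siglongjmp(escape, 1);
}

/* Hypothetical helper: number of signals delivered for a single load
 * from a truncated mapping, or -1 if the setup failed. */
int count_refaults(int limit)
{
    char path[] = "/tmp/refault-XXXXXX";
    long pagesz = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old_bus, old_segv;
    volatile char *page;
    int fd;

    assert(limit > 0 && pagesz > 0);
    faults = 0;
    fault_limit = limit;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, pagesz) != 0) {
        close(fd);
        return -1;
    }
    page = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, 0);
    if (page == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* Assumption: local truncation stands in for the server-side change. */
    if (ftruncate(fd, 0) == 0 && sigsetjmp(escape, 1) == 0)
        (void)page[0];  /* faults, handler returns, faults again ... */

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap((void *)page, (size_t)pagesz);
    close(fd);
    return (int)faults;
}
```

Calling `count_refaults(5)` from a small main() of your own should
report five deliveries for the one load. A real program whose handler
never bails out would keep taking the fault, which is consistent with
the slowdown described above, though over NFS each retry presumably
also costs a failed read.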

I'll see if I can reproduce any of these effects and let you know
how to do it.

David.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: hard lock under 3.4-STABLE

2000-02-11 Thread Matthew Dillon

:I am seeing a situation where a 3.4 system hard-locks while running 3.4
:(hard lock being that it does not respond to its serial console, nor is
:it pingable).  I believe (perhaps) that it may be NFS related, with a
:program running on an NFS client when the executable itself is deleted
:from the server (although I haven't seen that style of panic in quite
:some time, and it usually has a couple of lines earlier in the output
:to the effect that it lost its backing store).
:
:I realize that information is sparse in this, but that is because the
:information that I have is equally sparse... I have no kernel messages,
:I cannot drop into the kernel debugger, and no crashdump is ever created
:(I need to hit the reset button to recover.)
:
:I am trying to reproduce a test case, but it is difficult not knowing what
:has caused the problems in the first place.
:
:--
:David Cross         | email: [EMAIL PROTECTED]
:Acting Lab Director | NYSLP: FREEBSD

I presume it's the client that is locking up?   If you remove the
server binary and the client takes a page fault on the binary,
and does not have the page in the cache, what is supposed to happen
is that the program seg faults when the NFS read fails.
It's quite possible that there is a bug in dealing with this situation
and if you can get it repeatable we can probably fix it fairly easily.
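
A rough local illustration of that expected behaviour, offered as a
sketch and not taken from the NFS code: it assumes that a locally
mapped temporary file which is then truncated behaves like a page
whose backing read fails. `signal_after_truncate` and `record_fault`
are hypothetical names, as is the /tmp path. The process touches the
vanished page, gets a signal, and recovers with siglongjmp().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover;
static volatile sig_atomic_t caught;

/* Remember which signal arrived, then jump back out of the fault. */
static void record_fault(int sig)
{
    caught = sig;
    siglongjmp(recover, 1);
}

/* Hypothetical helper: returns the signal delivered when a page whose
 * backing store has gone away is touched, 0 if no signal was seen,
 * or -1 if the setup failed. */
int signal_after_truncate(void)
{
    char path[] = "/tmp/vanish-XXXXXX";
    long pagesz = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old_bus, old_segv;
    volatile char *page;
    int fd;

    assert(pagesz > 0);
    caught = 0;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, pagesz) != 0) {
        close(fd);
        return -1;
    }
    page = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, 0);
    if (page == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = record_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* Assumption: truncation stands in for the deleted server binary. */
    if (ftruncate(fd, 0) == 0 && sigsetjmp(recover, 1) == 0)
        (void)page[0];  /* page is not resident and cannot be read */

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap((void *)page, (size_t)pagesz);
    close(fd);
    return (int)caught;
}
```

On Linux and the BSDs this local case normally reports SIGBUS rather
than SIGSEGV. Either way the fault is delivered to the process, which
is the outcome the NFS client is supposed to produce instead of
wedging the machine.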

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





hard lock under 3.4-STABLE

2000-02-11 Thread David E. Cross

I am seeing a situation where a 3.4 system hard-locks while running 3.4
(hard lock being that it does not respond to its serial console, nor is
it pingable).  I believe (perhaps) that it may be NFS related, with a
program running on an NFS client when the executable itself is deleted
from the server (although I haven't seen that style of panic in quite
some time, and it usually has a couple of lines earlier in the output
to the effect that it lost its backing store).

I realize that information is sparse in this, but that is because the
information that I have is equally sparse... I have no kernel messages,
I cannot drop into the kernel debugger, and no crashdump is ever created
(I need to hit the reset button to recover.)

I am trying to reproduce a test case, but it is difficult not knowing what
has caused the problems in the first place.

--
David Cross                               | email: [EMAIL PROTECTED]
Acting Lab Director                       | NYSLP: FREEBSD
Systems Administrator/Research Programmer | Web: http://www.cs.rpi.edu/~crossd
Rensselaer Polytechnic Institute,         | Ph: 518.276.2860
Department of Computer Science            | Fax: 518.276.4033
I speak only for myself.                  | WinNT:Linux::Linux:FreeBSD

