On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo
<t...@hamartun.priv.no> wrote:
> Ryota Ozaki <ozak...@netbsd.org> writes:
>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
> I'll give it a go tonight, and report back.


> Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
> consequences for NFS?  There's a problem that's been around for at least
> a couple of months, but that I only discovered the other day -- I was
> running with kernels from late October then, and the problem I observed
> is still there after upgrading.

I'm not sure. I don't know much about how NFS works or how it interacts
with the network stack.

> Reading NFS file systems is no problem, which is why I didn't notice it
> before, but writing hangs.  Here's an example: I started compiling a C
> source file directly to an executable on an NFS mounted file system
> (server and client both amd64 running fresh -current).  The compile pass
> is fine, but when the ld end of the pipeline wants to write the
> executable, it hangs.  So I try to do a 'df' in another terminal, and it
> hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
> show me if it's written anything yet, and that hangs, too: after an
> attempt to write has hung the communication up, reads no longer work,
> either:
>  UID   PID  PPID   CPU PRI  NI     VSZ     RSS WCHAN   STAT TTY          TIME COMMAND
>    0 22179 22678     0 124   0   33344    5136 netio   D+   pts/17    0:00.01 ld [...]
>  501 21370 21006   516  85   0    8952    1144 nfsrcv  I+   pts/18    0:00.00 df
>  501 21710     1     0 127   0    8964    1116 tstile  D    pts/20-   0:00.00 /bin/ls [...]
> Once I have something with "tstile" in the "WCHAN" column, I know that
> I can't just reboot the machine: it's going to take a hard reset.

Can you get into DDB? If so, you can find out where the processes hang:
  db> ps                 # shows the LWP addresses of ld and ls
  db> bt/a <LWP address> # shows the stack trace of that LWP

I guess ps will also show some other LWPs stuck on tstile, for example
softnet/N. Stack traces of those LWPs should explain how the hang
happens, or at least give us hints for further investigation.
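
To pick out the interesting rows before going into DDB, something like
the following might help. It is only a sketch: blocked_processes() is a
hypothetical helper, not a NetBSD tool, and it assumes ps output laid
out like the listing quoted in this thread, with a COMMAND header
column and each command on the same line as its row. The default set
of wait channels is simply the three seen in that listing.

```python
# Hypothetical helper, not part of NetBSD. Assumes a `ps -l`-style
# listing whose header names PID, WCHAN and COMMAND columns and whose
# COMMAND column is the last one.
SUSPECT_WCHANS = {"tstile", "netio", "nfsrcv"}

# Sample adapted from the listing quoted in this thread.
SAMPLE = """\
 UID   PID  PPID   CPU PRI  NI     VSZ     RSS WCHAN   STAT TTY          TIME COMMAND
   0 22179 22678     0 124   0   33344    5136 netio   D+   pts/17    0:00.01 ld [...]
 501 21370 21006   516  85   0    8952    1144 nfsrcv  I+   pts/18    0:00.00 df
 501 21710     1     0 127   0    8964    1116 tstile  D    pts/20-   0:00.00 /bin/ls [...]
"""


def blocked_processes(ps_text, wchans=SUSPECT_WCHANS):
    """Return (pid, wchan, command) for rows whose WCHAN is in `wchans`."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    wchan_col = header.index("WCHAN")
    cmd_col = header.index("COMMAND")

    hits = []
    for row in lines[1:]:
        # Split at most cmd_col times so a command containing spaces
        # stays in one field.
        fields = row.strip().split(None, cmd_col)
        if len(fields) <= wchan_col:
            continue  # short or malformed row, skip it
        if fields[wchan_col] in wchans:
            command = fields[cmd_col] if len(fields) > cmd_col else ""
            hits.append((int(fields[pid_col]), fields[wchan_col], command))
    return hits


if __name__ == "__main__":
    for pid, wchan, command in blocked_processes(SAMPLE):
        print(pid, wchan, command)
```

This only shows which processes are waiting and on what; the kernel
stack traces still have to come from DDB as described above.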

> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again.  On
> the server, the output file got created, but is zero bytes.  The error
> logged on the client when it gets stuck is this console output:
> nfs send error 64 for barsoom:/usr/local
> ...and then the normal "nfs server not responding" messages in syslog
> after that, of course.

I tried an NFS client running -current against an NFS server running
netbsd-7, but writing didn't hang (I compiled a C program and ran
cp -r /etc/ /mnt/nfs). The hang may depend on the NIC. Which NIC do you
use?

Also, could you tell me the NFS options used on the client and the
server?

