On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> wrote:
> Ryota Ozaki <ozak...@netbsd.org> writes:
>
>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>
> I'll give it a go tonight, and report back.

Thanks.

>
> Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
> consequences for NFS? There's a problem that's been around for at least
> a couple of months, but that I only discovered the other day -- I was
> running with kernels from late October then, and the problem I observed
> is still there after upgrading.

I'm not sure. I don't know much about NFS: how it works, or how it
involves the network stack.

>
> Reading NFS file systems is no problem, which is why I didn't notice it
> before, but writing hangs. Here's an example: I started compiling a C
> source file directly to an executable on an NFS mounted file system
> (server and client both amd64 running fresh -current). The compile pass
> is fine, but when the ld end of the pipeline wants to write the
> executable, it hangs. So I try to do a 'df' in another terminal, and it
> hangs. Finally, I simply attempt to make 'ls -l [target executable]'
> show me if it's written anything yet, and that hangs, too: after an
> attempt to write has hung the communication up, reads no longer work,
> either:
>
> UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
>   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
> 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
> 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
>
> Once I have something with "tstile" in the "WCHAN" column, I know that
> I can't just reboot the machine: it's going to take a hard reset.

Can you get into DDB? If so, you can find out where the processes are
hanging:

  db> ps                  # you can get LWP addresses of ld and ls
  db> bt/a <LWP address>  # you can get their stack traces

I also guess that ps will show some other LWPs stuck on tstile, for
example softnet/N. Stack traces of those LWPs would explain how the hang
happens, or at least give us hints for investigating it.
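
Before dropping into DDB, it may help to list which processes are already blocked. The sketch below is only an illustration, not part of any NetBSD tool: it assumes long-format ps output shaped like the listing quoted above (a header row containing PID and WCHAN, and no empty columns), and the names SAMPLE_PS and blocked_on are hypothetical.

```python
# Minimal sketch: pick out processes sleeping on a given wait channel
# from ps output shaped like the listing quoted in this thread.
# Assumption: every column before COMMAND is a single non-empty token.

SAMPLE_PS = """\
UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
  0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]
"""


def blocked_on(ps_text, wchan="tstile"):
    """Return (pid, command) pairs whose WCHAN column equals `wchan`."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    wchan_col = header.index("WCHAN")

    hits = []
    for line in lines[1:]:
        # Split into as many fields as the header has; the last field
        # (COMMAND) keeps any embedded spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip rows that do not match the header layout
        if fields[wchan_col] == wchan:
            hits.append((int(fields[pid_col]), fields[-1].strip()))
    return hits


if __name__ == "__main__":
    for pid, command in blocked_on(SAMPLE_PS):
        print(pid, command)
```

The PIDs it reports are the ones to look up in the DDB ps listing to find the matching LWP addresses.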

>
> Oh, and it's the client that hangs; the server seems to be just fine,
> and a reboot of the client makes NFS reads behave normally again. On
> the server, the output file got created, but is zero bytes. The error
> logged on the client when it gets stuck is this console output:
>
> nfs send error 64 for barsoom:/usr/local
>
> ...and then the normal "nfs server not responding" messages in syslog
> after that, of course.

I tried an NFS client running -current against an NFS server running
netbsd-7, but writing didn't hang (I compiled a C program and ran
cp -r /etc/ /mnt/nfs).

The hang may depend on the NIC. Which NIC do you use? Also, please let
me know the NFS options of the client and the server.

  ozaki-r