On 29/01/18(Mon) 21:25, Artturi Alm wrote:
> On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote:
> > On 29/01/18(Mon) 20:38, Artturi Alm wrote:
> > > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > > > Hello Artturi,
> > > >
> > > > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > > > >Synopsis: stuck in netlock
> > > > > >Category: amd64
> > > > > >Environment:
> > > > > System : OpenBSD 6.2
> > > > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7
> > > > > 09:13:00 MST 2018
> > > > >
> > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > >
> > > > > Architecture: OpenBSD.amd64
> > > > > Machine : amd64
> > > > > >Description:
> > > > > processes getting stuck w/STATE=netlock, kill has no effect.
> > > > > >How-To-Repeat:
> > > > > using the desktop normally, until trying to restart chrome ends
> > > > > up failing.
> > > >
> > > > What do you mean with "using the desktop normally"? Which applications
> > > > are you using? Which browser plugins? Can you find out the minimum
> > > > setup to reproduce this deadlock?
> > > >
> > > > > I've had this happen to me at least twice in the last few
> > > > > weeks.
> > > >
> > > > Do you know how to reproduce it easily?
> > > >
> > >
> > > this time i had less than 10 tabs open, so i guess it can be narrowed
> > > down even further.
> > >
> > > > > At first time i noticed how trying to launch chrome did lock up
> > > > > all the other processes in netlock, and "pkill chrome" did allow
> > > > > the system to recover, i was unable to figure out what was wrong
> > > > > and rebooting did make everything work again, while ie.
> > > > > removing ~/.cache & ~/.config did not.
> > > >
> > > > So the deadlock is related to your chrome usage?
> > > >
> > >
> > > now it does feel like so. i'll upgrade tonight.
> > >
> > > > > long before running the "ps cl" below, i had already killed all
> > > > > the xterm-windows those processes were in. cwm(1) was unable to
> > > > > kill some of those, but xkill did not.
> > > >
> > > > Well killing processes waiting for the 'netlock' won't help. What has to
> > > > be found is which process is holding it. For that we need the full ps
> > > > output, including kernel and userland threads.
> > > >
> > > > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > > > $-prompt, and ^T did show xauth stuck in netlock..
> > > > > i guess it's obvious where it was heading; so i got pics of
> > > > > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > > > >
> > > > > i do have ddb.{panic,console,log}=1, but
> > > > > "# sysctl ddb.trigger=1" ==
> > > > > "sysctl: ddb.trigger: Operation not supported by device"
> > > >
> > > > Not having DDB access will limit the debugging experience. Are you sure
> > > > you tried to enter it on your console?
> > > >
> > >
> > > so this requires ttyC0, right?
> > > this time it was ifconfig in [netlock], that prevented using ttyC0.
> > > i got there from X by running "virsh shutdown <domain>" from the kvm host,
> > > i guess it emulates what pressing actual power button would(acpi?).
> > >
> > > > > ?? so i had no option but "virsh reset <domain>"...
> > > >
> > > > Did you try top(1)? What were the kernel processes doing?
> > >
> > > see below, if "top -bCHS -d 1 999" should do.
> > > anything else i could do? anyway, thanks in advance:)
> >
> > This is where the problem comes from:
> >
> > > 33315 443734 -6 0 141M 102M idle viowait 0:00 0.00%
> > > chrome:
> >
> > I don't understand how chrome can end up sleeping in vio_ioctl() and why
> > it is sleeping forever. But this thread is holding the NET_LOCK() and
> > prevents the rest of the kernel from making progress.
> >
> > Could you try a virtual interface different from vio(4) and see if you
> > can reproduce the problem?
>
> Will try with 'e1000', but then this does seem to me like it would have
> something to do with routing too(?), as the vio0 is only for reaching to
> the host.
> and separate physical interface, to which the default route belongs to.
Here's a diff to fix vio(4), could you give it a go?

Index: dev/pv/if_vio.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
retrieving revision 1.4
diff -u -p -r1.4 if_vio.c
--- dev/pv/if_vio.c	10 Aug 2017 18:03:51 -0000	1.4
+++ dev/pv/if_vio.c	23 Feb 2018 09:14:29 -0000
@@ -1276,7 +1276,8 @@ vio_wait_ctrl(struct vio_softc *sc)
 	int r = 0;
 
 	while (sc->sc_ctrl_inuse != FREE) {
-		r = tsleep(&sc->sc_ctrl_inuse, PRIBIO|PCATCH, "viowait", 0);
+		r = rwsleep(&sc->sc_ctrl_inuse, &netlock, PRIBIO|PCATCH,
+		    "viowait", 0);
 		if (r == EINTR)
 			return r;
 	}
@@ -1295,7 +1296,8 @@ vio_wait_ctrl_done(struct vio_softc *sc)
 			r = 1;
 			break;
 		}
-		r = tsleep(&sc->sc_ctrl_inuse, PRIBIO|PCATCH, "viodone", 0);
+		r = rwsleep(&sc->sc_ctrl_inuse, &netlock, PRIBIO|PCATCH,
+		    "viodone", 0);
 		if (r == EINTR)
 			break;
 	}