On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote:
> On 29/01/18(Mon) 20:38, Artturi Alm wrote:
> > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > > Hello Artturi,
> > >
> > > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > > >Synopsis: stuck in netlock
> > > > >Category: amd64
> > > > >Environment:
> > > > System : OpenBSD 6.2
> > > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7
> > > > 09:13:00 MST 2018
> > > >
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > >Description:
> > > > processes getting stuck w/STATE=netlock, kill has no effect.
> > > > >How-To-Repeat:
> > > > using the desktop normally, until trying to restart chrome ends
> > > > up failing.
> > > >
> > > What do you mean with "using the desktop normally"? Which applications
> > > are you using? Which browser plugins? Can you find out the minimum
> > > setup to reproduce this deadlock?
> > >
> > > > I've had this happen to me atleast twice in the last few of
> > > > weeks.
> > >
> > > Do you know how to reproduce it easily?
> > >
> >
> > this time i had less than 10tabs open, so i guess it can be narrowed
> > down even further.
> >
> > > > At first time i noticed how trying to launch chrome did lock up
> > > > all the other processes in netlock, and "pkill chrome" did allow
> > > > the system to recover, i was unable to figure out what was wrong
> > > > and rebooting did make everything work again, while ie.
> > > > removing ~/.cache & ~/.config did not.
> > >
> > > So the deadlock is related to your chrome usage?
> > >
> >
> > now it does feel like so. i'll upgrade tonight.
> >
> > > > long before running the "ps cl" below, i had already killed all
> > > > the xterm-windows those processes were in. cwm(1) was unable to
> > > > kill some of those, but xkill did not.
> > >
> > > Well killing process waiting for the 'netlock' won't help. What has to
> > > be find is which process is holding it. For that we need the full ps
> > > output, including kernel and userland threads.
> > >
> > >
> > > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > > $-prompt, and ^T did show xauth stuck in netlock..
> > > > i guess it's obvious where it was heading; so i got pics of
> > > > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > > >
> > > > i do have ddb.{panic,console,log}=1, but
> > > > "# sysctl ddb.trigger=1" ==
> > > > "sysctl: ddb.trigger: Operation not supported by device"
> > >
> > > Not having DDB access will limit the debugging experience. Are you sure
> > > you tried to enter it on your console?
> > >
> >
> > so this requires ttyC0, right?
> > this time it was ifconfig in [netlock], that prevented using ttyC0.
> > i got there from X by running "virsh shutdown <domain" from the kvm host,
> > i guess it emulates what pressing actual power button would(acpi?).
> >
> > > > ?? so i had no option but "virsh reset <domain>"...
> > >
> > > Did you try top(1)? What were the kernel processes doing?
> >
> > see below, if "top -bCHS -d 1 999" should do.
> > anything else i could do? anyway, thanks in advance:)
>
> This is where the problems comes from:
>
> > 33315 443734 -6 0 141M 102M idle viowait 0:00 0.00% chrome:
>
> I don't understand how chrome can end up sleeping in vio_ioctl() and why
> it is sleeping forever. But this thread is holding the NET_LOCK() and
> prevents the rest of the kernel from making progress.
>
> Could you try a virtual interface different from vio(4) and see if you
> can reproduce the problem?

Will try with 'e1000', but this does seem to me like it would have
something to do with routing too(?), as vio0 is only for reaching the
host, and there is a separate physical interface (em0), to which the
default route belongs.

Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            10.0.1.2           UGS       11       65     -     8 em0
224/4              127.0.0.1          URS        0       60 32768     8 lo0
10.0.1/24          10.0.1.1           UCn        3        0     -     4 em0
10.0.1/24          10.0.1.1           US         0        0     -     8 em0
10.0.1.1           68:05:ca:23:90:88  UHLl       0       20     -     1 em0
10.0.1.2           bc:5f:f4:e6:e2:63  UHLch      4       80     -     3 em0
10.0.1.4           c8:3a:35:d8:ec:0b  UHLc       0        5     -     3 em0
10.0.1.10          link#2             UHLch      2       10     -     3 em0
10.0.1.255         10.0.1.1           UHb        0        0     -     1 em0
10.0.10/24         10.0.1.10          UGS        0        0     -     8 em0
10.0.11/24         10.0.11.1          UCn        0        0     -     4 vio0
10.0.11.1          52:54:00:d8:72:b3  UHLl       0        1     -     1 vio0
10.0.11.255        10.0.11.1          UHb        0        0     -     1 vio0
10.0.100/24        10.0.1.10          UGS        0        0     -     8 em0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       2       33 32768     1 lo0

$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
        index 4 priority 0 llprio 3
        groups: lo
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff000000
vio0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 52:54:00:d8:72:b3
        index 1 priority 0 llprio 3
        media: Ethernet autoselect
        status: active
        inet 10.0.11.1 netmask 0xffffff00 broadcast 10.0.11.255
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 68:05:ca:23:90:88
        index 2 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 10.0.1.1 netmask 0xffffff00 broadcast 10.0.1.255
enc0: flags=0<>
        index 3 priority 0 llprio 3
        groups: enc
        status: active
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33136
        index 5 priority 0 llprio 3
        groups: pflog
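
For anyone triaging a similar hang: the thread holding the lock was spotted above as the one thread sleeping on something other than netlock. A minimal sketch of that filtering step follows, assuming the raw "top -bCHS" output (not the mail-quoted copy) and the column order of the line quoted in this thread. The helper names are hypothetical and not part of any OpenBSD tool.

```python
# Sketch only: assumes each thread line follows the column order of the
# line quoted in this thread:
#   PID TID PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
# parse_thread_line and non_lock_waiters are hypothetical helper names.


def parse_thread_line(line):
    """Return a dict for one thread line, or None for headers and noise."""
    fields = line.split(None, 10)
    if len(fields) < 11:
        return None
    # Header and summary lines do not start with a numeric PID and TID.
    if not (fields[0].isdigit() and fields[1].isdigit()):
        return None
    return {
        "pid": int(fields[0]),
        "tid": int(fields[1]),
        "state": fields[6],
        "wait": fields[7],
        "command": fields[10].strip(),
    }


def non_lock_waiters(lines, lock="netlock"):
    """List threads whose wait channel is neither `lock` nor empty ("-")."""
    found = []
    for line in lines:
        entry = parse_thread_line(line)
        if entry is None:
            continue
        if entry["wait"] in (lock, "-"):
            continue
        found.append(entry)
    return found
```

With the output saved to a file such as top.txt (a hypothetical name), non_lock_waiters(open("top.txt")) returns the remaining candidates. Most of them will usually be ordinary sleeps, so this only narrows the list; deciding which thread actually holds the lock still takes a human eye.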