Re: amd64: stuck in netlock

Artturi Alm Mon, 29 Jan 2018 11:25:52 -0800

On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote:
> On 29/01/18(Mon) 20:38, Artturi Alm wrote:
> > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > > Hello Artturi,
> > > 
> > > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > > >Synopsis:      stuck in netlock
> > > > >Category:      amd64
> > > > >Environment:
> > > >         System      : OpenBSD 6.2
> > > >         Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
> > > >                          dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > >         Architecture: OpenBSD.amd64
> > > >         Machine     : amd64
> > > > >Description:
> > > >         processes getting stuck w/STATE=netlock, kill has no effect.
> > > > >How-To-Repeat:
> > > >         using the desktop normally, until trying to restart chrome ends
> > > >         up failing.
> > > 
> > > What do you mean with "using the desktop normally"?  Which applications
> > > are you using?  Which browser plugins?  Can you find out the minimum
> > > setup to reproduce this deadlock?
> > > 
> > > >         I've had this happen to me at least twice in the last few weeks.
> > > 
> > > Do you know how to reproduce it easily?
> > > 
> > 
> > this time i had less than 10tabs open, so i guess it can be narrowed
> > down even further.
> > 
> > > >         The first time, i noticed that trying to launch chrome locked up
> > > >         all the other processes in netlock, and "pkill chrome" allowed
> > > >         the system to recover. i was unable to figure out what was wrong;
> > > >         rebooting made everything work again, while e.g. removing
> > > >         ~/.cache & ~/.config did not.
> > > 
> > > So the deadlock is related to your chrome usage?
> > > 
> > 
> > now it does feel like so. i'll upgrade tonight.
> > 
> > > >         long before running the "ps cl" below, i had already killed all
> > > >         the xterm-windows those processes were in. cwm(1) was unable to
> > > >         kill some of those, but xkill did not.
> > > 
> > > > Well, killing processes waiting for the 'netlock' won't help.  What has
> > > > to be found is which process is holding it.  For that we need the full
> > > > ps output, including kernel and userland threads.
> > > > 
> > > >         after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > >         $-prompt, and ^T did show xauth stuck in netlock..
> > > >         i guess it's obvious where it was heading; so i got pics of
> > > >         "# reboot -nq" failing because stuck in the fckng netlock -_-
> > > > 
> > > >         i do have ddb.{panic,console,log}=1, but
> > > >         "# sysctl ddb.trigger=1" ==
> > > >         "sysctl: ddb.trigger: Operation not supported by device"
> > > 
> > > Not having DDB access will limit the debugging experience.  Are you sure
> > > you tried to enter it on your console?
> > > 
> > 
> > so this requires ttyC0, right?
> > this time it was ifconfig in [netlock], that prevented using ttyC0.
> > i got there from X by running "virsh shutdown <domain>" from the kvm host,
> > i guess it emulates what pressing the actual power button would do (acpi?).
> > 
> > > >         ?? so i had no option but "virsh reset <domain>"...
> > > 
> > > Did you try top(1)?  What were the kernel processes doing?
> > 
> > see below, if "top -bCHS -d 1 999" should do.
> > anything else i could do? anyway, thanks in advance:)
> 
> This is where the problem comes from: 
> 
> > 33315   443734  -6    0  141M  102M idle      viowait   0:00  0.00% chrome: 
> 
> I don't understand how chrome can end up sleeping in vio_ioctl() and why
> it is sleeping forever.  But this thread is holding the NET_LOCK() and
> prevents the rest of the kernel from making progress.
> 
> Could you try a virtual interface different from vio(4) and see if you
> can reproduce the problem?
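
A rough sketch of how the "which thread is holding it" question could be
narrowed down from saved top(1) output: drop every thread whose wait channel
is netlock and look at what is left. The column layout is assumed from the
single top line quoted above. The function names, SAMPLE, and the two netlock
lines in SAMPLE are made up for illustration and are not from my real output.

```python
# Assumed column layout, taken from the one top(1) line quoted in this thread:
# PID TID PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND

def parse_top_line(line):
    """Parse one thread line; return None for headers or short lines."""
    fields = line.split(None, 10)
    if len(fields) < 11 or not fields[0].isdigit():
        return None
    return {
        "pid": int(fields[0]),
        "tid": int(fields[1]),
        "state": fields[6],
        "wait": fields[7],
        "command": fields[10].strip(),
    }


def holder_candidates(lines, lock_wchan="netlock"):
    """Return threads that are NOT sleeping on lock_wchan.

    A thread waiting for the lock cannot be the one holding it, so only
    the remaining threads are worth a closer look.
    """
    threads = (parse_top_line(line) for line in lines)
    return [t for t in threads if t is not None and t["wait"] != lock_wchan]


SAMPLE = [
    "  PID      TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND",
    # hypothetical waiter, not from the real output
    "11111   100001  10    0 1000K 2000K sleep     netlock   0:00  0.00% ifconfig",
    # hypothetical waiter, not from the real output
    "22222   100002  10    0 1000K 2000K sleep     netlock   0:00  0.00% xauth",
    # the line quoted in this thread
    "33315   443734  -6    0  141M  102M idle      viowait   0:00  0.00% chrome: ",
]

if __name__ == "__main__":
    for t in holder_candidates(SAMPLE):
        print(t["pid"], t["tid"], t["wait"], t["command"])
```

On a real dump the result would still include every thread that is idle for
unrelated reasons, so this only shortens the list; it does not prove which
thread holds the lock.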


Will try with 'e1000', but to me this does seem like it could have something
to do with routing too(?), as vio0 is only for reaching the host, and there
is a separate physical interface (em0), to which the default route belongs.


Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            10.0.1.2           UGS       11       65     -     8 em0
224/4              127.0.0.1          URS        0       60 32768     8 lo0
10.0.1/24          10.0.1.1           UCn        3        0     -     4 em0
10.0.1/24          10.0.1.1           US         0        0     -     8 em0
10.0.1.1           68:05:ca:23:90:88  UHLl       0       20     -     1 em0
10.0.1.2           bc:5f:f4:e6:e2:63  UHLch      4       80     -     3 em0
10.0.1.4           c8:3a:35:d8:ec:0b  UHLc       0        5     -     3 em0
10.0.1.10          link#2             UHLch      2       10     -     3 em0
10.0.1.255         10.0.1.1           UHb        0        0     -     1 em0
10.0.10/24         10.0.1.10          UGS        0        0     -     8 em0
10.0.11/24         10.0.11.1          UCn        0        0     -     4 vio0
10.0.11.1          52:54:00:d8:72:b3  UHLl       0        1     -     1 vio0
10.0.11.255        10.0.11.1          UHb        0        0     -     1 vio0
10.0.100/24        10.0.1.10          UGS        0        0     -     8 em0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       2       33 32768     1 lo0
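
To make the "vio0 is only for reaching the host" point concrete, here is a
small sketch that does a longest-prefix match over the Destination and Iface
columns of the table above. It is a simplification: it ignores flags and
route priority, which the kernel does take into account. The names expand(),
lookup() and ROUTES are mine, and 192.0.2.1 is just a documentation address
standing in for "anything on the internet".

```python
import ipaddress

# (Destination, Iface) pairs copied from the routing table above.
ROUTES = [
    ("default", "em0"), ("224/4", "lo0"),
    ("10.0.1/24", "em0"), ("10.0.1/24", "em0"),
    ("10.0.1.1", "em0"), ("10.0.1.2", "em0"),
    ("10.0.1.4", "em0"), ("10.0.1.10", "em0"),
    ("10.0.1.255", "em0"), ("10.0.10/24", "em0"),
    ("10.0.11/24", "vio0"), ("10.0.11.1", "vio0"),
    ("10.0.11.255", "vio0"), ("10.0.100/24", "em0"),
    ("127/8", "lo0"), ("127.0.0.1", "lo0"),
]


def expand(dest):
    """Turn the abbreviated destination notation into an ip_network."""
    if dest == "default":
        return ipaddress.ip_network("0.0.0.0/0")
    addr, _, plen = dest.partition("/")
    octets = addr.split(".")
    octets += ["0"] * (4 - len(octets))   # "10.0.1" -> "10.0.1.0"
    return ipaddress.ip_network(".".join(octets) + "/" + (plen or "32"))


def lookup(addr):
    """Return the interface of the most specific route covering addr."""
    ip = ipaddress.ip_address(addr)
    matches = [(expand(d), iface) for d, iface in ROUTES if ip in expand(d)]
    # The default route always matches, so matches is never empty.
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface


if __name__ == "__main__":
    for target in ("10.0.11.2", "10.0.100.5", "192.0.2.1"):
        print(target, "->", lookup(target))
```

With this table only addresses in 10.0.11/24 resolve to vio0; everything else
that is not loopback or multicast goes out via em0.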

$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
        index 4 priority 0 llprio 3
        groups: lo
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff000000
vio0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 52:54:00:d8:72:b3
        index 1 priority 0 llprio 3
        media: Ethernet autoselect
        status: active
        inet 10.0.11.1 netmask 0xffffff00 broadcast 10.0.11.255
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 68:05:ca:23:90:88
        index 2 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 10.0.1.1 netmask 0xffffff00 broadcast 10.0.1.255
enc0: flags=0<>
        index 3 priority 0 llprio 3
        groups: enc
        status: active
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33136
        index 5 priority 0 llprio 3
        groups: pflog
