pf table cannot contain multiple PFRKE_ROUTE to same IP on different interfaces
Dear OpenBSD developers,

In packet filter, it is not possible to define a "route-to" rule with
multiple destinations having the same IP address but on different
interfaces.

Minimal example to reproduce the problem:

# cat /etc/hostname.lo5
rdomain 5
inet 127.0.0.1 255.0.0.0
# cat /etc/hostname.lo6
rdomain 6
inet 127.0.0.1 255.0.0.0
# cat /etc/pf.conf
pass in on vio0 to 123.123.123.123 route-to { (lo5 127.0.10.1), (lo6 127.0.10.1) } round-robin
# pfctl -f /etc/pf.conf -v
table <__automatic_5854be65_0> const { 127.0.10.1@lo5 127.0.10.1@lo6 }
pass in on vio0 inet from any to 123.123.123.123 flags S/SA route-to <__automatic_5854be65_0> round-robin
# pfctl -T show -t __automatic_5854be65_0
127.0.10.1@lo5

In practice, I need this for routing traffic to several OpenVPN tunnels
in a round-robin fashion. Unfortunately, my VPN provider uses the same
gateway IP for all their servers.

pass in on vlan123 route-to { (tun0 tun0:peer), (tun1 tun1:peer) } round-robin

The second address is not added because of this:
- In /sys/net/pf_table.c:1653, in the pfr_ina_define function, the call
  to pfr_lookup_addr returns non-NULL
- In /sys/net/pf_table.c:815, in the pfr_lookup_addr function, rn_match
  returns non-NULL
- In /sys/net/radix.c:263-265, in the rn_match function, the for loop
  checks for differences in the IP prefix, does not find any and returns
  the existing node in the tree.

The problem is that only the IP address and mask are taken into
consideration when searching for a node in the radix tree; the interface
is ignored. Therefore it is not possible to store two nodes with the same
IP but different interfaces (127.0.10.1@lo5 and 127.0.10.1@lo6).

Unfortunately, I did not manage to understand in detail how the radix
tree works, especially the node ordering, so I was not able to patch it
to add the interface information.

Can someone who knows this code better try to fix this problem or point
me in the right direction?

Thanks a lot for all your work on OpenBSD and thank you in advance for
your help.

Kind regards,
NOP
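The effect described in the report can be illustrated with a toy model. This is only a sketch: a Python dict stands in for the radix tree, the /32 prefix length is assumed for the two host entries, and the function and variable names (build_table, by_prefix, by_prefix_and_if) are made up for illustration. It is not the pf_table.c or radix.c code.

```python
# Toy model only: a dict stands in for the radix tree used by pf tables.
# Each entry is (address, prefix length, interface); the /32 is an assumption.

def build_table(entries, key):
    """Insert entries, skipping any whose key is already present.

    The skip mirrors the reported behaviour where the lookup returns an
    existing node and the new address is therefore not added.
    """
    table = {}
    for entry in entries:
        k = key(entry)
        if k not in table:
            table[k] = entry
    return list(table.values())

entries = [("127.0.10.1", 32, "lo5"), ("127.0.10.1", 32, "lo6")]

# Key on address and prefix length only: the second entry collides.
by_prefix = build_table(entries, key=lambda e: (e[0], e[1]))

# Key on address, prefix length and interface: both entries survive.
by_prefix_and_if = build_table(entries, key=lambda e: e)

print(by_prefix)
print(by_prefix_and_if)
```

With the address-only key the table ends up holding just the lo5 entry, which matches the "pfctl -T show" output in the report. Whether widening the key is the right fix inside the real radix code is a question for someone who knows that code.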
Re: vmd terminates vms without an explicit "boot" line
On Mon, Jan 29, 2018 at 08:23:42PM -0800, Mike Larkin wrote:
> On Thu, Jan 25, 2018 at 09:57:46PM -0700, Aaron Bieber wrote:
> > Hola!
> >
> > This one is a bit funky. I just setup a new server with Hetzner. When I
> > try to boot vms on it, they only start when I have a "boot" entry
> > specified. Anything that uses the bios (doesn't have a boot entry) fails
> > fairly silently.
> >
> > I can take the same config (without a boot entry) on my x240, and it
> > boots (I can see the seabios startup).
> >
> > Hetzner box:
> > hw.model=Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
>
> This CPU is the problem. That is an a very old Nehalem CPU, which lacks

Technically it's a Bloomfield but the problem is the same :)

> the "unrestricted guest" virtualization feature required to run virtualized
> real mode code (eg, bios). This also means you're going to be stuck with
> OpenBSD guests only.
>
> You're going to have to use the -b option (or the "boot" entry like you
> noted) on this CPU. I have plans to fix that someday but other things keep
> jumping in front of this in line.
>
> Does Hetzner offer a newer CPU option? (This CPU is 9 years old).
> > -ml > > > cpu0: > > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR > > > > x240: > > hw.model=Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz > > cpu0: > > > > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT > > > > Config: > > switch "nat" { > > interface bridge1 > > } > > > > vm "test" { > > disable > > memory 1G > > disk "/vm/test.img" > > > > interface { switch "nat" } > > } > > > > The test.img file is a freshly created file with no OS installed. > > > > When starting the vm with debug on the Hetzner server: > > frank# vmd -dvvv > > startup > > /etc/vm.conf:5: switch "nat" registered > > vm_register: registering vm 1 > > /etc/vm.conf:13: vm "test" registered (disabled) > > vm_priv_brconfig: interface bridge1 description switch1-nat > > vmd_configure: not creating vm test (disabled) > > config_setconfig: setting config > > config_getconfig: retrieving config > > config_getconfig: retrieving config > > config_getconfig: retrieving config > > vm_opentty: vm test tty /dev/ttyp5 uid 0 gid 4 mode 620 > > vm_register: registering vm 1 > > vm_priv_ifconfig: interface tap0 description vm1-if0-test > > loadfile_bios: loaded BIOS image > > vm_priv_ifconfig: switch "nat" interface bridge1 add tap0 > > run_vm: initializing hardware for vm test > > test: started vm 1 successfully, tty /dev/ttyp5 > > virtio_init: vm "test" vio0 lladdr fe:e1:bb:d1:e4:39 > > run_vm: starting vcpu threads for vm test > > vcpu_reset: resetting vcpu 0 for vm 33 > > vmd: cannot reset 
VCPU 0 - exiting. > > vmm_sighdlr: handling signal 20 > > vmm_sighdlr: attempting to terminate vm 1 > > terminate_vm: terminating vmid 33 > > vmm_sighdlr: calling vm_remove > > vm_remove: removing vm id 1 from running config > > vm_remove: calling vm_stop > > vm_stop: stopping vm 1 > > vmd_dispatch_vmm: handling TERMINATE_EVENT for vm id 1 ret 5 > > vmd_dispatch_vmm: about to stop vm id 1 > > vm_stop: stopping vm 1 > > > > Output from vmctl when starting: > > frank# vmctl start test -c > > Connected to /dev/ttyp5 (speed 115200) > > > > [EOT] > > frank# > > > > Cheers, > > Aaron > > > > -- > > PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A 4AF0 1F81 112D 62A9 ADCE > > >
Re: vmd terminates vms without an explicit "boot" line
On Thu, Jan 25, 2018 at 09:57:46PM -0700, Aaron Bieber wrote:
> Hola!
>
> This one is a bit funky. I just setup a new server with Hetzner. When I
> try to boot vms on it, they only start when I have a "boot" entry
> specified. Anything that uses the bios (doesn't have a boot entry) fails
> fairly silently.
>
> I can take the same config (without a boot entry) on my x240, and it
> boots (I can see the seabios startup).
>
> Hetzner box:
> hw.model=Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz

This CPU is the problem. That is a very old Nehalem CPU, which lacks
the "unrestricted guest" virtualization feature required to run virtualized
real mode code (eg, bios). This also means you're going to be stuck with
OpenBSD guests only.

You're going to have to use the -b option (or the "boot" entry like you
noted) on this CPU. I have plans to fix that someday but other things keep
jumping in front of this in line.

Does Hetzner offer a newer CPU option? (This CPU is 9 years old).

-ml

> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
>
> x240:
> hw.model=Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
> cpu0:
>
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT
>
> Config:
> switch "nat" {
>         interface bridge1
> }
>
> vm "test" {
>         disable
>         memory 1G
>         disk "/vm/test.img"
>
>         interface { switch "nat" }
> }
>
> The test.img file is a freshly created file with no OS installed.
> > When starting the vm with debug on the Hetzner server: > frank# vmd -dvvv > startup > /etc/vm.conf:5: switch "nat" registered > vm_register: registering vm 1 > /etc/vm.conf:13: vm "test" registered (disabled) > vm_priv_brconfig: interface bridge1 description switch1-nat > vmd_configure: not creating vm test (disabled) > config_setconfig: setting config > config_getconfig: retrieving config > config_getconfig: retrieving config > config_getconfig: retrieving config > vm_opentty: vm test tty /dev/ttyp5 uid 0 gid 4 mode 620 > vm_register: registering vm 1 > vm_priv_ifconfig: interface tap0 description vm1-if0-test > loadfile_bios: loaded BIOS image > vm_priv_ifconfig: switch "nat" interface bridge1 add tap0 > run_vm: initializing hardware for vm test > test: started vm 1 successfully, tty /dev/ttyp5 > virtio_init: vm "test" vio0 lladdr fe:e1:bb:d1:e4:39 > run_vm: starting vcpu threads for vm test > vcpu_reset: resetting vcpu 0 for vm 33 > vmd: cannot reset VCPU 0 - exiting. > vmm_sighdlr: handling signal 20 > vmm_sighdlr: attempting to terminate vm 1 > terminate_vm: terminating vmid 33 > vmm_sighdlr: calling vm_remove > vm_remove: removing vm id 1 from running config > vm_remove: calling vm_stop > vm_stop: stopping vm 1 > vmd_dispatch_vmm: handling TERMINATE_EVENT for vm id 1 ret 5 > vmd_dispatch_vmm: about to stop vm id 1 > vm_stop: stopping vm 1 > > Output from vmctl when starting: > frank# vmctl start test -c > Connected to /dev/ttyp5 (speed 115200) > > [EOT] > frank# > > Cheers, > Aaron > > -- > PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A 4AF0 1F81 112D 62A9 ADCE >
Re: wrong if used after adding new route - affects syslog, dhcrelay and more
On Mon, Jan 29, 2018 at 07:33:47PM +0100, Remi Locherer wrote: > > Problem Description > Local originating traffic leaves the system on the wrong interface > after a more specific route was added. This is problematic for services > like dhcrelay and syslogd. > > I verified this on iced this on OpenBSD 6.1 but do not know how if was with > older versions. The behaviour is still the same with current. What I wanted to write: I verified this behaviour on OpenBSD 6.1, 6.2 and -current. I do not know if it was different with older releases. > > > Workaround: > Monitor the routing socket and restart affected services when the routing > table changes. > > > How to reproduce: > mistral ~# route -n show -inet > Routing tables > > Internet: > DestinationGatewayFlags Refs Use Mtu Prio Iface > default172.18.35.1UGS7 408 -12 iwm0 > 224/4 127.0.0.1 URS0 1898 32768 8 lo0 > 127/8 127.0.0.1 UGRS 00 32768 8 lo0 > 127.0.0.1 127.0.0.1 UHhl 14 32768 1 lo0 > 172.18.35/24 172.18.35.87 UCn1 634 - 8 iwm0 > 172.18.35.1cc:4e:24:82:88:42 UHLch 1 312 - 7 iwm0 > 172.18.35.87 5c:e0:c5:1f:ad:c4 UHLl 0 14 - 1 iwm0 > 172.18.35.255 172.18.35.87 UHb0 633 - 1 iwm0 > 172.30.1/24192.168.250.18 UGS00 - 8 > vether0 > 192.168.250/24 192.168.250.1 UCn10 - 4 > vether0 > 192.168.250.1 fe:e1:ba:d0:05:e2 UHLl 05 - 1 > vether0 > 192.168.250.18 fe:e1:bb:d1:a2:b9 UHLch 2 37 - 3 > vether0 > 192.168.250.255192.168.250.1 UHb1 105 - 1 > vether0 > mistral ~# > > $mistral 130 ~$ nc -u 172.30.1.1 > adsfsa > > --> do not press ^C, keep it open to generate traffic over the same > socket after changing routes > > --> as expected traffic leaves on vether0 > 16:24:00.676256 rule 0/(match) match out on vether0: 192.168.250.1.28812 > > 172.30.1.1.: udp 7 > > mistral ~# route del 172.30.1.0/24 > > --> now traffic leaves on iwm0. makes sense because of the default route. 
> 16:24:51.078621 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > > 172.30.1.1.: udp 3 > > mistral ~# route add 172.30.1.0/24 192.168.250.18 > > --> traffic still leaves iwm0. expectations: vether0 > 16:28:09.440038 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > > 172.30.1.1.: udp 4 > > > > dmesg > > OpenBSD 6.2-current (GENERIC.MP) #107: Tue Jan 16 21:58:15 CET 2018 > r...@mistral.relo.ch:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 8473632768 (8081MB) > avail mem = 8209895424 (7829MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe (84 entries) > bios0: vendor Dell Inc. version "A13" date 06/16/2017 > bios0: Dell Inc. XPS 13 9343 > acpi0 at bios0: rev 2 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP APIC FPDT FIDT MCFG HPET SSDT UEFI SSDT ASF! SSDT > SSDT SSDT SSDT PCCT SSDT SSDT SSDT SLIC MSDM DMAR CSRT BGRT > acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) > PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) > PXSX(S4) RP05(S4) [...] 
> acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.62 MHz > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT > cpu0: 256KB 64b/line 8-way L2 cache > acpitimer0: recalibrated TSC frequency 2194928041 Hz > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE > cpu1 at mainbus0: apid 2 (application processor) > cpu1: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.22 MHz > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT > cpu1: 256KB 64b/line 8-way L2 cache > cpu1: smt 0, core 1, package 0 > cpu2 at mainbus0: apid 1 (application processor) >
Re: amd64: stuck in netlock
On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote: > On 29/01/18(Mon) 20:38, Artturi Alm wrote: > > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote: > > > Hello Artturi, > > > > > > On 28/01/18(Sun) 09:08, Artturi Alm wrote: > > > > >Synopsis: stuck in netlock > > > > >Category: amd64 > > > > >Environment: > > > > System : OpenBSD 6.2 > > > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7 > > > > 09:13:00 MST 2018 > > > > > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > > > > > Architecture: OpenBSD.amd64 > > > > Machine : amd64 > > > > >Description: > > > > processes getting stuck w/STATE=netlock, kill has no effect. > > > > >How-To-Repeat: > > > > using the desktop normally, until trying to restart chrome ends > > > > up failing. > > > > > > What do you mean with "using the desktop normally"? Which applications > > > are you using? Which browser plugins? Can you find out the minimum > > > setup to reproduce this deadlock? > > > > > > > I've had this happen to me atleast twice in the last few of > > > > weeks. > > > > > > Do you know how to reproduce it easily? > > > > > > > this time i had less than 10tabs open, so i guess it can be narrowed > > down even further. > > > > > > At first time i noticed how trying to launch chrome did lock up > > > > all the other processes in netlock, and "pkill chrome" did allow > > > > the system to recover, i was unable to figure out what was wrong > > > > and rebooting did make everything work again, while ie. > > > > removing ~/.cache & ~/.config did not. > > > > > > So the deadlock is related to your chrome usage? > > > > > > > now it does feel like so. i'll upgrade tonight. > > > > > > long before running the "ps cl" below, i had already killed all > > > > the xterm-windows those processes were in. cwm(1) was unable to > > > > kill some of those, but xkill did not. > > > > > > Well killing process waiting for the 'netlock' won't help. 
What has to > > > be find is which process is holding it. For that we need the full ps > > > output, including kernel and userland threads. > > > > > > > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to > > > > $-prompt, and ^T did show xauth stuck in netlock.. > > > > i guess it's obvious where it was heading; so i got pics of > > > > "# reboot -nq" failing because stuck in the fckng netlock -_- > > > > > > > > i do have ddb.{panic,console,log}=1, but > > > > "# sysctl ddb.trigger=1" == > > > > "sysctl: ddb.trigger: Operation not supported by device" > > > > > > Not having DDB access will limit the debugging experience. Are you sure > > > you tried to enter it on your console? > > > > > > > so this requires ttyC0, right? > > this time it was ifconfig in [netlock], that prevented using ttyC0. > > i got there from X by running "virsh shutdown> i guess it emulates what pressing actual power button would(acpi?). > > > > > > ?? so i had no option but "virsh reset "... > > > > > > Did you try top(1)? What were the kernel processes doing? > > > > see below, if "top -bCHS -d 1 999" should do. > > anything else i could do? anyway, thanks in advance:) > > This is where the problems comes from: > > > 33315 443734 -60 141M 102M idle viowait 0:00 0.00% chrome: > > I don't understand how chrome can end up sleeping in vio_ioctl() and why > it is sleeping forever. But this thread is holding the NET_LOCK() and > prevents the rest of the kernel from making progress. > > Could you try a virtual interface different from vio(4) and see if you > can reproduce the problem? Will try with 'e1000', but then this does seem to me like it would have something to do with routing too(?), as the vio0 is only for reaching to the host. and separate physical interface, to which the default route belongs to. 
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            10.0.1.2           UGS       11       65     -     8 em0
224/4              127.0.0.1          URS        0       60 32768     8 lo0
10.0.1/24          10.0.1.1           UCn        3        0     -     4 em0
10.0.1/24          10.0.1.1           US         0        0     -     8 em0
10.0.1.1           68:05:ca:23:90:88  UHLl       0       20     -     1 em0
10.0.1.2           bc:5f:f4:e6:e2:63  UHLch      4       80     -     3 em0
10.0.1.4           c8:3a:35:d8:ec:0b  UHLc       0        5     -     3 em0
10.0.1.10          link#2             UHLch      2       10     -     3 em0
10.0.1.255         10.0.1.1           UHb        0        0     -     1 em0
10.0.10/24         10.0.1.10          UGS        0        0     -     8
Re: amd64: stuck in netlock
On 29/01/18(Mon) 20:38, Artturi Alm wrote: > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote: > > Hello Artturi, > > > > On 28/01/18(Sun) 09:08, Artturi Alm wrote: > > > >Synopsis:stuck in netlock > > > >Category:amd64 > > > >Environment: > > > System : OpenBSD 6.2 > > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7 > > > 09:13:00 MST 2018 > > > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > > > Architecture: OpenBSD.amd64 > > > Machine : amd64 > > > >Description: > > > processes getting stuck w/STATE=netlock, kill has no effect. > > > >How-To-Repeat: > > > using the desktop normally, until trying to restart chrome ends > > > up failing. > > > > What do you mean with "using the desktop normally"? Which applications > > are you using? Which browser plugins? Can you find out the minimum > > setup to reproduce this deadlock? > > > > > I've had this happen to me atleast twice in the last few of weeks. > > > > Do you know how to reproduce it easily? > > > > this time i had less than 10tabs open, so i guess it can be narrowed > down even further. > > > > At first time i noticed how trying to launch chrome did lock up > > > all the other processes in netlock, and "pkill chrome" did allow > > > the system to recover, i was unable to figure out what was wrong > > > and rebooting did make everything work again, while ie. > > > removing ~/.cache & ~/.config did not. > > > > So the deadlock is related to your chrome usage? > > > > now it does feel like so. i'll upgrade tonight. > > > > long before running the "ps cl" below, i had already killed all > > > the xterm-windows those processes were in. cwm(1) was unable to > > > kill some of those, but xkill did not. > > > > Well killing process waiting for the 'netlock' won't help. What has to > > be find is which process is holding it. For that we need the full ps > > output, including kernel and userland threads. 
> > >
> > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > $-prompt, and ^T did show xauth stuck in netlock..
> > > i guess it's obvious where it was heading; so i got pics of
> > > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > >
> > > i do have ddb.{panic,console,log}=1, but
> > > "# sysctl ddb.trigger=1" ==
> > > "sysctl: ddb.trigger: Operation not supported by device"
> >
> > Not having DDB access will limit the debugging experience. Are you sure
> > you tried to enter it on your console?
> >
>
> so this requires ttyC0, right?
> this time it was ifconfig in [netlock], that prevented using ttyC0.
> i got there from X by running "virsh shutdown
> i guess it emulates what pressing actual power button would(acpi?).
>
> > > ?? so i had no option but "virsh reset "...
> >
> > Did you try top(1)? What were the kernel processes doing?
>
> see below, if "top -bCHS -d 1 999" should do.
> anything else i could do? anyway, thanks in advance:)

This is where the problem comes from:

> 33315 443734 -60 141M 102M idle viowait 0:00 0.00% chrome:

I don't understand how chrome can end up sleeping in vio_ioctl() and why
it is sleeping forever. But this thread is holding the NET_LOCK() and
prevents the rest of the kernel from making progress.

Could you try a virtual interface different from vio(4) and see if you
can reproduce the problem?
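The failure mode described in this reply, one thread going to sleep while it holds a lock that everything else needs, can be sketched in userland. This is only an analogy under stated assumptions: a Python threading.Lock stands in for NET_LOCK(), the names net_lock, holder and try_network_op are hypothetical, and unlike the reported hang the sleeper here is woken up again so that the script terminates.

```python
import threading

net_lock = threading.Lock()           # stand-in for the kernel's NET_LOCK()
holder_has_lock = threading.Event()   # signals that the holder owns the lock
wakeup = threading.Event()            # the event the sleeping holder waits for

def holder():
    """Take the lock, then sleep while still holding it."""
    with net_lock:
        holder_has_lock.set()
        wakeup.wait()                 # in the report, this wakeup never arrives

def try_network_op(timeout):
    """Return True if the lock could be taken within the timeout."""
    got = net_lock.acquire(timeout=timeout)
    if got:
        net_lock.release()
    return got

t = threading.Thread(target=holder, daemon=True)
t.start()
holder_has_lock.wait()

# Any other user of the lock now stalls, like the processes shown in "netlock".
blocked = not try_network_op(0.2)

# Waking the sleeper releases the lock and lets everyone else proceed.
wakeup.set()
t.join()
recovered = try_network_op(0.2)

print(blocked, recovered)
```

Killing the waiters does not change anything in this model either; only the holder releasing the lock does, which is why the thread asks for the full process list to find the holder.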
Re: amd64: stuck in netlock
On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote: > Hello Artturi, > > On 28/01/18(Sun) 09:08, Artturi Alm wrote: > > >Synopsis: stuck in netlock > > >Category: amd64 > > >Environment: > > System : OpenBSD 6.2 > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7 > > 09:13:00 MST 2018 > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > Architecture: OpenBSD.amd64 > > Machine : amd64 > > >Description: > > processes getting stuck w/STATE=netlock, kill has no effect. > > >How-To-Repeat: > > using the desktop normally, until trying to restart chrome ends > > up failing. > > What do you mean with "using the desktop normally"? Which applications > are you using? Which browser plugins? Can you find out the minimum > setup to reproduce this deadlock? > > > I've had this happen to me atleast twice in the last few of weeks. > > Do you know how to reproduce it easily? > this time i had less than 10tabs open, so i guess it can be narrowed down even further. > > At first time i noticed how trying to launch chrome did lock up > > all the other processes in netlock, and "pkill chrome" did allow > > the system to recover, i was unable to figure out what was wrong > > and rebooting did make everything work again, while ie. > > removing ~/.cache & ~/.config did not. > > So the deadlock is related to your chrome usage? > now it does feel like so. i'll upgrade tonight. > > long before running the "ps cl" below, i had already killed all > > the xterm-windows those processes were in. cwm(1) was unable to > > kill some of those, but xkill did not. > > Well killing process waiting for the 'netlock' won't help. What has to > be find is which process is holding it. For that we need the full ps > output, including kernel and userland threads. > > > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to > > $-prompt, and ^T did show xauth stuck in netlock.. 
> > i guess it's obvious where it was heading; so i got pics of > > "# reboot -nq" failing because stuck in the fckng netlock -_- > > > > i do have ddb.{panic,console,log}=1, but > > "# sysctl ddb.trigger=1" == > > "sysctl: ddb.trigger: Operation not supported by device" > > Not having DDB access will limit the debugging experience. Are you sure > you tried to enter it on your console? > so this requires ttyC0, right? this time it was ifconfig in [netlock], that prevented using ttyC0. i got there from X by running "virsh shutdown> ?? so i had no option but "virsh reset "... > > Did you try top(1)? What were the kernel processes doing? see below, if "top -bCHS -d 1 999" should do. anything else i could do? anyway, thanks in advance:) -Artturi load averages: 0.00, 0.02, 0.06tfort.my.domain 20:04:13 145 threads: 1 running, 139 idle, 5 on processor up 1 day, 11:38 CPU0 states: 0.2% user, 0.0% nice, 0.4% system, 0.3% interrupt, 99.2% idle CPU1 states: 1.1% user, 0.1% nice, 2.3% system, 0.0% interrupt, 96.5% idle CPU2 states: 1.3% user, 0.1% nice, 2.5% system, 0.0% interrupt, 96.1% idle CPU3 states: 0.9% user, 0.2% nice, 2.9% system, 0.0% interrupt, 96.0% idle CPU4 states: 0.3% user, 0.1% nice, 0.8% system, 0.0% interrupt, 98.8% idle CPU5 states: 0.4% user, 0.1% nice, 1.2% system, 0.0% interrupt, 98.3% idle Memory: Real: 285M/1053M act/tot Free: 6876M Cache: 521M Swap: 0K/4336M PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND 14495 155467 20 35M 40M sleep/1 poll 39:05 1.61% /usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t 70058 507112 20 9652K 13M sleep/1 select0:02 0.05% xterm 13394 440936 -2200K 21M idle -35.3H 0.00% idle0 6862 125212 -2200K 21M onproc/5 -35.2H 0.00% idle5 43153 547872 -2200K 21M onproc/4 -35.0H 0.00% idle4 661 212291 -2200K 21M onproc/3 -34.7H 0.00% idle3 25137 319342 -2200K 21M onproc/1 -34.4H 0.00% idle1 65690 467656 -2200K 21M idle -34.4H 0.00% idle2 3067 485689 100 12M 23M idle netlock 3:12 0.00% weechat -r /connect freenode 87817 
410790 68 200K 21M run/2 - 2:29 0.00% zerothread 14495 421539 20 35M 40M sleep/4 poll 1:51 0.00% /usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t 13992 615559 10 -20 888K 2452K idle netlock 0:47 0.00% ntpd: ntp engine 30357 245010 1000K 21M idle netlock 0:42 0.00% softclock 61217 230818 1000K 21M idle netlock 0:30 0.00% softnet 51008 255493 1800K 21M sleep/1 syncer0:30 0.00% update 70625 286762 1000K
wrong if used after adding new route - affects syslog, dhcrelay and more
> Problem Description
Locally originated traffic leaves the system on the wrong interface
after a more specific route was added. This is problematic for services
like dhcrelay and syslogd.

I verified this on OpenBSD 6.1 but do not know how it was with
older versions. The behaviour is still the same with current.

> Workaround:
Monitor the routing socket and restart affected services when the routing
table changes.

> How to reproduce:
mistral ~# route -n show -inet
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            172.18.35.1        UGS        7      408     -    12 iwm0
224/4              127.0.0.1          URS        0     1898 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        4 32768     1 lo0
172.18.35/24       172.18.35.87       UCn        1      634     -     8 iwm0
172.18.35.1        cc:4e:24:82:88:42  UHLch      1      312     -     7 iwm0
172.18.35.87       5c:e0:c5:1f:ad:c4  UHLl       0       14     -     1 iwm0
172.18.35.255      172.18.35.87       UHb        0      633     -     1 iwm0
172.30.1/24        192.168.250.18     UGS        0        0     -     8 vether0
192.168.250/24     192.168.250.1      UCn        1        0     -     4 vether0
192.168.250.1      fe:e1:ba:d0:05:e2  UHLl       0        5     -     1 vether0
192.168.250.18     fe:e1:bb:d1:a2:b9  UHLch      2       37     -     3 vether0
192.168.250.255    192.168.250.1      UHb        1      105     -     1 vether0
mistral ~#

$mistral 130 ~$ nc -u 172.30.1.1
adsfsa

--> do not press ^C, keep it open to generate traffic over the same
    socket after changing routes

--> as expected traffic leaves on vether0
16:24:00.676256 rule 0/(match) match out on vether0: 192.168.250.1.28812 > 172.30.1.1.: udp 7

mistral ~# route del 172.30.1.0/24

--> now traffic leaves on iwm0. makes sense because of the default route.
16:24:51.078621 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > 172.30.1.1.: udp 3

mistral ~# route add 172.30.1.0/24 192.168.250.18

--> traffic still leaves iwm0.
expectations: vether0 16:28:09.440038 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > 172.30.1.1.: udp 4 > dmesg OpenBSD 6.2-current (GENERIC.MP) #107: Tue Jan 16 21:58:15 CET 2018 r...@mistral.relo.ch:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 8473632768 (8081MB) avail mem = 8209895424 (7829MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe (84 entries) bios0: vendor Dell Inc. version "A13" date 06/16/2017 bios0: Dell Inc. XPS 13 9343 acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG HPET SSDT UEFI SSDT ASF! SSDT SSDT SSDT SSDT PCCT SSDT SSDT SSDT SLIC MSDM DMAR CSRT BGRT acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.62 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT cpu0: 256KB 64b/line 8-way L2 cache acpitimer0: recalibrated TSC frequency 2194928041 Hz cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.22 MHz cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 1 (application processor) cpu2: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.22 MHz cpu2:
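The expectation stated in this report (traffic to 172.30.1.1 should leave on vether0 once the 172.30.1/24 route is back) is plain longest-prefix matching. The sketch below only models that expected lookup result using the network routes from the report; the lookup function and the route lists are illustrative, and it says nothing about why the kernel keeps using iwm0 for the already open socket.

```python
import ipaddress

def lookup(routes, dst):
    """Return the interface of the longest-prefix match for dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

# Network routes from the report, without the more specific 172.30.1/24 entry.
routes = [
    ("0.0.0.0/0", "iwm0"),
    ("172.18.35.0/24", "iwm0"),
    ("192.168.250.0/24", "vether0"),
]

# State after "route add 172.30.1.0/24 192.168.250.18".
with_specific = routes + [("172.30.1.0/24", "vether0")]

after_delete = lookup(routes, "172.30.1.1")         # falls back to the default route
after_readd = lookup(with_specific, "172.30.1.1")   # the /24 should win again

print(after_delete, after_readd)
```

A fresh lookup after the route is re-added picks vether0, which is the behaviour the reporter expected from the long-lived nc socket as well.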
Re: amd64: stuck in netlock
On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> Hello Artturi,
>
> On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > >Synopsis: stuck in netlock
> > >Category: amd64
> > >Environment:
> >     System      : OpenBSD 6.2
> >     Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7
> >                   09:13:00 MST 2018
> >
> >                   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> >     Architecture: OpenBSD.amd64
> >     Machine     : amd64
> > >Description:
> >     processes getting stuck w/STATE=netlock, kill has no effect.
> > >How-To-Repeat:
> >     using the desktop normally, until trying to restart chrome ends
> >     up failing.
>
> What do you mean with "using the desktop normally"? Which applications
> are you using? Which browser plugins? Can you find out the minimum
> setup to reproduce this deadlock?
>

I had mupdf, gvim, weechat and chromium running, all from packages, with
not much else even installed, and no browser plugins. if i had only one
machine to use, this would be it, so it is kind of hard to minimize the
setup, as i had 24+ hours of use (or at least uptime) before this got
triggered.

> > I've had this happen to me at least twice in the last few weeks.
>
> Do you know how to reproduce it easily?
>

No i don't, but i will try to stay alert for this to notice it before i
go killing stuff randomly in despair.

> > At first time i noticed how trying to launch chrome did lock up
> > all the other processes in netlock, and "pkill chrome" did allow
> > the system to recover, i was unable to figure out what was wrong
> > and rebooting did make everything work again, while ie.
> > removing ~/.cache & ~/.config did not.
>
> So the deadlock is related to your chrome usage?
>

Possibly. i have an issue with chrome where it will eventually stop
playing videos. as an example, let's say i've got 50+ tabs open in a
single "main" window, and then open a second one with
"chromium --incognito" and make it play some playlist from youtube; once
the playback ceases (at the beginning of a new vid), i have to restart
all chrome processes to have it continue. i was guessing it's not
related, as i've had these playback issues since before 6.2 iirc, and
even when playback is not working, it does keep downloading/buffering
the video and everything else does work w/o issues in the other chrome
window.

> > long before running the "ps cl" below, i had already killed all
> > the xterm-windows those processes were in. cwm(1) was unable to
> > kill some of those, but xkill did not.
>
> Well, killing processes waiting for the 'netlock' won't help. What has
> to be found is which process is holding it. For that we need the full
> ps output, including kernel and userland threads.

Ok, i'll get those if/when i run into this again.

> >
> > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > $-prompt, and ^T did show xauth stuck in netlock..
> > i guess it's obvious where it was heading; so i got pics of
> > "# reboot -nq" failing because stuck in the fckng netlock -_-
> >
> > i do have ddb.{panic,console,log}=1, but
> > "# sysctl ddb.trigger=1" ==
> > "sysctl: ddb.trigger: Operation not supported by device"
>
> Not having DDB access will limit the debugging experience. Are you sure
> you tried to enter it on your console?
>

Yes, i had already exited X, or do you mean the above would only work
from what i get into with ctrl+alt+f1, and not ie. ctrl+alt+f2? using
the first VT (or whatever those are) was impossible, as there was the
xauth locked up giving me no prompt..

> > ?? so i had no option but "virsh reset "...
>
> Did you try top(1)? What were the kernel processes doing?

Yes, but i didn't pay attention to anything but how weechat went waiting
on netlock if i launched chrome. the first time i ran into it, i think
launching chrome froze systat too.
Re: amd64: stuck in netlock
Hello Artturi,

On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> >Synopsis: stuck in netlock
> >Category: amd64
> >Environment:
>     System      : OpenBSD 6.2
>     Details     : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7
>                   09:13:00 MST 2018
>
>                   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>     Architecture: OpenBSD.amd64
>     Machine     : amd64
> >Description:
>     processes getting stuck w/STATE=netlock, kill has no effect.
> >How-To-Repeat:
>     using the desktop normally, until trying to restart chrome ends
>     up failing.

What do you mean with "using the desktop normally"? Which applications
are you using? Which browser plugins? Can you find out the minimum
setup to reproduce this deadlock?

> I've had this happen to me at least twice in the last few weeks.

Do you know how to reproduce it easily?

> At first time i noticed how trying to launch chrome did lock up
> all the other processes in netlock, and "pkill chrome" did allow
> the system to recover, i was unable to figure out what was wrong
> and rebooting did make everything work again, while ie.
> removing ~/.cache & ~/.config did not.

So the deadlock is related to your chrome usage?

> long before running the "ps cl" below, i had already killed all
> the xterm-windows those processes were in. cwm(1) was unable to
> kill some of those, but xkill did not.

Well, killing processes waiting for the 'netlock' won't help. What has
to be found is which process is holding it. For that we need the full
ps output, including kernel and userland threads.

>
> after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> $-prompt, and ^T did show xauth stuck in netlock..
> i guess it's obvious where it was heading; so i got pics of
> "# reboot -nq" failing because stuck in the fckng netlock -_-
>
> i do have ddb.{panic,console,log}=1, but
> "# sysctl ddb.trigger=1" ==
> "sysctl: ddb.trigger: Operation not supported by device"

Not having DDB access will limit the debugging experience. Are you sure
you tried to enter it on your console?

> ?? so i had no option but "virsh reset "...

Did you try top(1)? What were the kernel processes doing?
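
As a rough illustration of the point about waiters versus the holder: once a full ps listing has been captured, the threads sleeping on "netlock" can be separated from everything else, and the remaining ones are the candidates worth a closer look. The sketch below is only that, a sketch. It assumes `ps -l`-style output with a header row containing PID, WCHAN and COMMAND (COMMAND last), and the sample rows are hypothetical, not taken from the report.

```python
from collections import defaultdict


def group_by_wchan(ps_text):
    """Group rows of `ps -l`-style output by their WCHAN column.

    Assumes the first non-empty line is a header that names the columns
    and that COMMAND is the last column.
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    wchan_col = header.index("WCHAN")
    pid_col = header.index("PID")
    groups = defaultdict(list)
    for ln in lines[1:]:
        # COMMAND may contain spaces, so cap the number of splits.
        fields = ln.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip truncated or malformed rows
        groups[fields[wchan_col]].append((int(fields[pid_col]), fields[-1]))
    return dict(groups)


# Hypothetical sample rows for illustration only.
SAMPLE = """\
  UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT  TT       TIME COMMAND
 1000 40123     1   0   2   0  1200  3400 netlock D     p1    0:00.10 weechat
 1000 51234     1   0   2   0  9000 80000 netlock D     p2    0:01.20 chrome
 1000 61111     1   0   2   0   800  1500 select  S     p3    0:00.02 xterm
"""

groups = group_by_wchan(SAMPLE)
# Threads blocked waiting for the lock; killing these does not help.
waiters = groups.get("netlock", [])
# Everything else; the holder, if it shows up at all, is in here.
others = {w: p for w, p in groups.items() if w != "netlock"}
print("waiting on netlock:", waiters)
print("not waiting on netlock:", others)
```

This only narrows the search. Identifying the actual holder still needs the kernel and userland threads in the listing, or ddb access.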