pf table cannot contain multiple PFRKE_ROUTE to same IP on different interfaces

2018-01-29 Thread NOP
Dear OpenBSD developers,

In packet filter, it is not possible to define a "route-to" rule with
multiple destinations having the same IP address but on different
interfaces.

Minimal example to reproduce the problem:

# cat /etc/hostname.lo5
rdomain 5
inet 127.0.0.1 255.0.0.0
# cat /etc/hostname.lo6
rdomain 6
inet 127.0.0.1 255.0.0.0
# cat /etc/pf.conf
pass in on vio0 to 123.123.123.123 route-to { (lo5 127.0.10.1), (lo6
127.0.10.1) } round-robin
# pfctl -f /etc/pf.conf -v
table <__automatic_5854be65_0> const { 127.0.10.1@lo5 127.0.10.1@lo6 }
pass in on vio0 inet from any to 123.123.123.123 flags S/SA route-to
<__automatic_5854be65_0> round-robin
# pfctl -T show -t __automatic_5854be65_0
   127.0.10.1@lo5

In practice, I need this for routing traffic to several OpenVPN tunnels
in a round-robin fashion. Unfortunately, my VPN provider uses the same
gateway IP for all their servers.

pass in on vlan123 route-to { (tun0 tun0:peer), (tun1 tun1:peer) }
round-robin


The second address is not added because of this:
- In /sys/net/pf_table.c:1653, in the pfr_ina_define function, the call
to pfr_lookup_addr returns non NULL
- In /sys/net/pf_table.c:815, in the pfr_lookup_addr function, rn_match
returns non NULL
- In /sys/net/radix.c:263-265, in the rn_match function, the for loop
checks for differences in the IP prefix, does not find any and returns
the existing node in the tree.

The problem is that only the IP address and mask are taken into
consideration when searching a node in the radix tree, the interface is
ignored. Therefore it's not possible to store two nodes with the same IP
but different interfaces (127.0.10.1@lo5 and 127.0.10.1@lo6).
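The problem comes down to key equality. The lookup compares only the address and mask, so 127.0.10.1@lo5 and 127.0.10.1@lo6 collide. Below is a minimal C sketch of that effect. It assumes a simplified linear table in place of pf's radix tree. struct entry, struct table, table_insert() and the two comparison helpers are hypothetical names for illustration; they are not pf's pfr_kentry or pfr_lookup_addr(). The sketch does not show how the interface could be worked into the radix tree's node ordering.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry: this is NOT pf's struct pfr_kentry. */
struct entry {
	uint32_t addr;		/* IPv4 address, host byte order */
	uint8_t  prefixlen;	/* mask length, 32 for a single host */
	char     ifname[16];	/* interface of the route-to target, e.g. "lo5" */
};

#define MAXENT 8

/* A linear table standing in for the radix tree; only key equality matters here. */
struct table {
	struct entry ent[MAXENT];
	int n;
};

/* Key as described in the report: address and mask only, interface ignored. */
static bool
same_key_addr_only(const struct entry *a, const struct entry *b)
{
	return a->addr == b->addr && a->prefixlen == b->prefixlen;
}

/* Alternative key that also compares the interface name. */
static bool
same_key_with_if(const struct entry *a, const struct entry *b)
{
	return same_key_addr_only(a, b) && strcmp(a->ifname, b->ifname) == 0;
}

/*
 * Add e unless an entry with an equal key already exists. This loosely
 * mirrors the report, where pfr_lookup_addr() returns non-NULL and the
 * second address is skipped. Returns true if e was added.
 */
static bool
table_insert(struct table *t, const struct entry *e,
    bool (*same)(const struct entry *, const struct entry *))
{
	for (int i = 0; i < t->n; i++)
		if (same(&t->ent[i], e))
			return false;
	if (t->n == MAXENT)
		return false;
	t->ent[t->n++] = *e;
	return true;
}
```

With same_key_addr_only(), inserting 127.0.10.1@lo5 and then 127.0.10.1@lo6 keeps only the first entry, which matches the pfctl -T show output above. With same_key_with_if(), both entries are kept.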

Unfortunately, I did not manage to understand in detail how the radix
tree works, especially the node ordering, so I was not able to patch it
to add the interface information.

Can someone who knows this code better try to fix this problem or point
me in the right direction?

Thanks a lot for all your work on OpenBSD and thank you in advance for
your help.

Kind regards,

NOP



Re: vmd terminates vms without an explicit "boot" line

2018-01-29 Thread Mike Larkin
On Mon, Jan 29, 2018 at 08:23:42PM -0800, Mike Larkin wrote:
> On Thu, Jan 25, 2018 at 09:57:46PM -0700, Aaron Bieber wrote:
> > Hola!
> > 
> > This one is a bit funky. I just set up a new server with Hetzner. When I
> > try to boot vms on it, they only start when I have a "boot" entry
> > specified. Anything that uses the bios (doesn't have a boot entry) fails
> > fairly silently.
> > 
> > I can take the same config (without a boot entry) on my x240, and it
> > boots (I can see the seabios startup).
> > 
> > Hetzner box:
> >   hw.model=Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
> 
> This CPU is the problem. That is a very old Nehalem CPU, which lacks

Technically it's a Bloomfield but the problem is the same :)

> the "unrestricted guest" virtualization feature required to run virtualized
> real mode code (eg, bios). This also means you're going to be stuck with
> OpenBSD guests only.
> 
> You're going to have to use the -b option (or the "boot" entry like you
> noted) on this CPU. I have plans to fix that someday but other things keep
> jumping in front of this in line.
> 
> Does Hetzner offer a newer CPU option? (This CPU is 9 years old).
> 
> -ml
> 
> >   cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> > 
> > x240:
> >   hw.model=Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
> >   cpu0:
> >   
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT
> > 
> > Config:
> >   switch "nat" {
> >   interface bridge1
> >   }
> > 
> >   vm "test" {
> >   disable
> >   memory 1G
> >   disk "/vm/test.img"
> > 
> >   interface { switch "nat" }
> >   }
> > 
> > The test.img file is a freshly created file with no OS installed.
> > 
> > When starting the vm with debug on the Hetzner server:
> >   frank# vmd -dvvv
> >   startup
> >   /etc/vm.conf:5: switch "nat" registered
> >   vm_register: registering vm 1
> >   /etc/vm.conf:13: vm "test" registered (disabled)
> >   vm_priv_brconfig: interface bridge1 description switch1-nat
> >   vmd_configure: not creating vm test (disabled)
> >   config_setconfig: setting config
> >   config_getconfig: retrieving config
> >   config_getconfig: retrieving config
> >   config_getconfig: retrieving config
> >   vm_opentty: vm test tty /dev/ttyp5 uid 0 gid 4 mode 620
> >   vm_register: registering vm 1
> >   vm_priv_ifconfig: interface tap0 description vm1-if0-test
> >   loadfile_bios: loaded BIOS image
> >   vm_priv_ifconfig: switch "nat" interface bridge1 add tap0
> >   run_vm: initializing hardware for vm test
> >   test: started vm 1 successfully, tty /dev/ttyp5
> >   virtio_init: vm "test" vio0 lladdr fe:e1:bb:d1:e4:39
> >   run_vm: starting vcpu threads for vm test
> >   vcpu_reset: resetting vcpu 0 for vm 33
> >   vmd: cannot reset VCPU 0 - exiting.
> >   vmm_sighdlr: handling signal 20
> >   vmm_sighdlr: attempting to terminate vm 1
> >   terminate_vm: terminating vmid 33
> >   vmm_sighdlr: calling vm_remove
> >   vm_remove: removing vm id 1 from running config
> >   vm_remove: calling vm_stop
> >   vm_stop: stopping vm 1
> >   vmd_dispatch_vmm: handling TERMINATE_EVENT for vm id 1 ret 5
> >   vmd_dispatch_vmm: about to stop vm id 1
> >   vm_stop: stopping vm 1
> > 
> > Output from vmctl when starting:
> >   frank# vmctl start test -c
> >   Connected to /dev/ttyp5 (speed 115200)
> > 
> >   [EOT]
> >   frank#
> > 
> > Cheers,
> > Aaron
> > 
> > --
> > PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A  4AF0 1F81 112D 62A9 ADCE
> > 
> 



Re: vmd terminates vms without an explicit "boot" line

2018-01-29 Thread Mike Larkin
On Thu, Jan 25, 2018 at 09:57:46PM -0700, Aaron Bieber wrote:
> Hola!
> 
> This one is a bit funky. I just set up a new server with Hetzner. When I
> try to boot vms on it, they only start when I have a "boot" entry
> specified. Anything that uses the bios (doesn't have a boot entry) fails
> fairly silently.
> 
> I can take the same config (without a boot entry) on my x240, and it
> boots (I can see the seabios startup).
> 
> Hetzner box:
>   hw.model=Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz

This CPU is the problem. That is a very old Nehalem CPU, which lacks
the "unrestricted guest" virtualization feature required to run virtualized
real mode code (eg, bios). This also means you're going to be stuck with
OpenBSD guests only.
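For reference, "unrestricted guest" is one of the VMX secondary processor-based execution controls. Whether a CPU supports it is reported in the VMX capability MSRs. Per the Intel SDM, IA32_VMX_PROCBASED_CTLS is at 0x482 and IA32_VMX_PROCBASED_CTLS2 is at 0x48B, and bits 63:32 of each hold the allowed-1 settings. Below is a minimal decoding sketch. It assumes the two MSR values have already been obtained; rdmsr is a ring-0 instruction, so ordinary userland code cannot read them. allowed1() and has_unrestricted_guest() are hypothetical helpers, not vmm(4) code.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR indices per the Intel SDM; reading them requires ring 0. */
#define MSR_IA32_VMX_PROCBASED_CTLS	0x482
#define MSR_IA32_VMX_PROCBASED_CTLS2	0x48B

/* Primary processor-based control bit 31: "activate secondary controls". */
#define PROCBASED_ACTIVATE_SECONDARY	(1ULL << 31)
/* Secondary processor-based control bit 7: "unrestricted guest". */
#define PROCBASED2_UNRESTRICTED_GUEST	(1ULL << 7)

/* Bits 63:32 of a VMX capability MSR say which controls may be set to 1. */
static bool
allowed1(uint64_t cap_msr, uint64_t ctl_bit)
{
	return ((cap_msr >> 32) & ctl_bit) != 0;
}

/* Hypothetical helper: decode already-read MSR values. */
static bool
has_unrestricted_guest(uint64_t procbased, uint64_t procbased2)
{
	/* Without secondary controls, the unrestricted guest bit cannot be used. */
	if (!allowed1(procbased, PROCBASED_ACTIVATE_SECONDARY))
		return false;
	return allowed1(procbased2, PROCBASED2_UNRESTRICTED_GUEST);
}
```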

You're going to have to use the -b option (or the "boot" entry like you
noted) on this CPU. I have plans to fix that someday but other things keep
jumping in front of this in line.

Does Hetzner offer a newer CPU option? (This CPU is 9 years old).

-ml

>   cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,SENSOR
> 
> x240:
>   hw.model=Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
>   cpu0:
>   
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT
> 
> Config:
>   switch "nat" {
>   interface bridge1
>   }
> 
>   vm "test" {
>   disable
>   memory 1G
>   disk "/vm/test.img"
> 
>   interface { switch "nat" }
>   }
> 
> The test.img file is a freshly created file with no OS installed.
> 
> When starting the vm with debug on the Hetzner server:
>   frank# vmd -dvvv
>   startup
>   /etc/vm.conf:5: switch "nat" registered
>   vm_register: registering vm 1
>   /etc/vm.conf:13: vm "test" registered (disabled)
>   vm_priv_brconfig: interface bridge1 description switch1-nat
>   vmd_configure: not creating vm test (disabled)
>   config_setconfig: setting config
>   config_getconfig: retrieving config
>   config_getconfig: retrieving config
>   config_getconfig: retrieving config
>   vm_opentty: vm test tty /dev/ttyp5 uid 0 gid 4 mode 620
>   vm_register: registering vm 1
>   vm_priv_ifconfig: interface tap0 description vm1-if0-test
>   loadfile_bios: loaded BIOS image
>   vm_priv_ifconfig: switch "nat" interface bridge1 add tap0
>   run_vm: initializing hardware for vm test
>   test: started vm 1 successfully, tty /dev/ttyp5
>   virtio_init: vm "test" vio0 lladdr fe:e1:bb:d1:e4:39
>   run_vm: starting vcpu threads for vm test
>   vcpu_reset: resetting vcpu 0 for vm 33
>   vmd: cannot reset VCPU 0 - exiting.
>   vmm_sighdlr: handling signal 20
>   vmm_sighdlr: attempting to terminate vm 1
>   terminate_vm: terminating vmid 33
>   vmm_sighdlr: calling vm_remove
>   vm_remove: removing vm id 1 from running config
>   vm_remove: calling vm_stop
>   vm_stop: stopping vm 1
>   vmd_dispatch_vmm: handling TERMINATE_EVENT for vm id 1 ret 5
>   vmd_dispatch_vmm: about to stop vm id 1
>   vm_stop: stopping vm 1
> 
> Output from vmctl when starting:
>   frank# vmctl start test -c
>   Connected to /dev/ttyp5 (speed 115200)
> 
>   [EOT]
>   frank#
> 
> Cheers,
> Aaron
> 
> --
> PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A  4AF0 1F81 112D 62A9 ADCE
> 



Re: wrong if used after adding new route - affects syslog, dhcrelay and more

2018-01-29 Thread Remi Locherer
On Mon, Jan 29, 2018 at 07:33:47PM +0100, Remi Locherer wrote:
> > Problem Description
> Local originating traffic leaves the system on the wrong interface
> after a more specific route was added. This is problematic for services
> like dhcrelay and syslogd.
> 
> I verified this on iced this on OpenBSD 6.1 but do not know how if was with
> older versions. The behaviour is still the same with current.

What I wanted to write:
I verified this behaviour on OpenBSD 6.1, 6.2 and -current. I do not know
if it was different with older releases.

> 
> > Workaround:
> Monitor the routing socket and restart affected services when the routing
> table changes.
> 
> > How to reproduce:
> mistral ~# route -n show -inet
> Routing tables
> 
> Internet:
> Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
> default            172.18.35.1        UGS        7      408     -    12 iwm0
> 224/4              127.0.0.1          URS        0     1898 32768     8 lo0
> 127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
> 127.0.0.1          127.0.0.1          UHhl       1        4 32768     1 lo0
> 172.18.35/24       172.18.35.87       UCn        1      634     -     8 iwm0
> 172.18.35.1        cc:4e:24:82:88:42  UHLch      1      312     -     7 iwm0
> 172.18.35.87       5c:e0:c5:1f:ad:c4  UHLl       0       14     -     1 iwm0
> 172.18.35.255      172.18.35.87       UHb        0      633     -     1 iwm0
> 172.30.1/24        192.168.250.18     UGS        0        0     -     8 vether0
> 192.168.250/24     192.168.250.1      UCn        1        0     -     4 vether0
> 192.168.250.1      fe:e1:ba:d0:05:e2  UHLl       0        5     -     1 vether0
> 192.168.250.18     fe:e1:bb:d1:a2:b9  UHLch      2       37     -     3 vether0
> 192.168.250.255    192.168.250.1      UHb        1      105     -     1 vether0
> mistral ~#
> 
> $mistral 130 ~$ nc -u 172.30.1.1  
> adsfsa
> 
> --> do not press ^C, keep it open to generate traffic over the same
> socket after changing routes
> 
> --> as expected traffic leaves on vether0
> 16:24:00.676256 rule 0/(match) match out on vether0: 192.168.250.1.28812 > 
> 172.30.1.1.: udp 7
> 
> mistral ~# route del 172.30.1.0/24
> 
> --> now traffic leaves on iwm0. makes sense because of the default route.
> 16:24:51.078621 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > 
> 172.30.1.1.: udp 3
> 
> mistral ~# route add 172.30.1.0/24 192.168.250.18
> 
> --> traffic still leaves iwm0. expectations: vether0
> 16:28:09.440038 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > 
> 172.30.1.1.: udp 4
> 
> 
> > dmesg
> 
> OpenBSD 6.2-current (GENERIC.MP) #107: Tue Jan 16 21:58:15 CET 2018
> r...@mistral.relo.ch:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8473632768 (8081MB)
> avail mem = 8209895424 (7829MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe (84 entries)
> bios0: vendor Dell Inc. version "A13" date 06/16/2017
> bios0: Dell Inc. XPS 13 9343
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG HPET SSDT UEFI SSDT ASF! SSDT 
> SSDT SSDT SSDT PCCT SSDT SSDT SSDT SLIC MSDM DMAR CSRT BGRT
> acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) 
> PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) 
> PXSX(S4) RP05(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.62 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
> cpu0: 256KB 64b/line 8-way L2 cache
> acpitimer0: recalibrated TSC frequency 2194928041 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.22 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 1 (application processor)
> 

Re: amd64: stuck in netlock

2018-01-29 Thread Artturi Alm
On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote:
> On 29/01/18(Mon) 20:38, Artturi Alm wrote:
> > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > > Hello Artturi,
> > > 
> > > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > > >Synopsis:  stuck in netlock
> > > > >Category:  amd64
> > > > >Environment:
> > > > System  : OpenBSD 6.2
> > > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 
> > > > 09:13:00 MST 2018
> > > >  
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > >Description:
> > > > processes getting stuck w/STATE=netlock, kill has no effect.
> > > > >How-To-Repeat:
> > > > using the desktop normally, until trying to restart chrome ends
> > > > up failing.
> > > 
> > > What do you mean with "using the desktop normally"?  Which applications
> > > are you using?  Which browser plugins?  Can you find out the minimum
> > > setup to reproduce this deadlock?
> > > 
> > > > I've had this happen to me atleast twice in the last few of 
> > > > weeks.
> > > 
> > > Do you know how to reproduce it easily?
> > > 
> > 
> > this time i had less than 10tabs open, so i guess it can be narrowed
> > down even further.
> > 
> > > > At first time i noticed how trying to launch chrome did lock up
> > > > all the other processes in netlock, and "pkill chrome" did allow
> > > > the system to recover, i was unable to figure out what was wrong
> > > > and rebooting did make everything work again, while ie.
> > > > removing ~/.cache & ~/.config did not.
> > > 
> > > So the deadlock is related to your chrome usage?
> > > 
> > 
> > now it does feel like so. i'll upgrade tonight.
> > 
> > > > long before running the "ps cl" below, i had already killed all
> > > > the xterm-windows those processes were in. cwm(1) was unable to
> > > > kill some of those, but xkill did not.
> > > 
> > > Well, killing processes waiting for the 'netlock' won't help.  What has to
> > > be found is which process is holding it.  For that we need the full ps
> > > output, including kernel and userland threads.
> > > > 
> > > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > > $-prompt, and ^T did show xauth stuck in netlock..
> > > > i guess it's obvious where it was heading; so i got pics of
> > > > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > > > 
> > > > i do have ddb.{panic,console,log}=1, but
> > > > "# sysctl ddb.trigger=1" ==
> > > > "sysctl: ddb.trigger: Operation not supported by device"
> > > 
> > > Not having DDB access will limit the debugging experience.  Are you sure
> > > you tried to enter it on your console?
> > > 
> > 
> > so this requires ttyC0, right?
> > this time it was ifconfig in [netlock], that prevented using ttyC0.
> > i got there from X by running "virsh shutdown "
> > i guess it emulates what pressing actual power button would(acpi?).
> > 
> > > > ?? so i had no option but "virsh reset "...
> > > 
> > > Did you try top(1)?  What were the kernel processes doing?
> > 
> > see below, if "top -bCHS -d 1 999" should do.
> > anything else i could do? anyway, thanks in advance:)
> 
> This is where the problem comes from:
> 
> > 33315   443734  -60  141M  102M idle  viowait   0:00  0.00% chrome: 
> 
> I don't understand how chrome can end up sleeping in vio_ioctl() and why
> it is sleeping forever.  But this thread is holding the NET_LOCK() and
> prevents the rest of the kernel from making progress.
> 
> Could you try a virtual interface different from vio(4) and see if you
> can reproduce the problem?

Will try with 'e1000', but this does seem to me like it could have
something to do with routing too(?), as vio0 is only used for reaching
the host, and there is a separate physical interface to which the
default route belongs.


Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            10.0.1.2           UGS       11       65     -     8 em0
224/4              127.0.0.1          URS        0       60 32768     8 lo0
10.0.1/24          10.0.1.1           UCn        3        0     -     4 em0
10.0.1/24          10.0.1.1           US         0        0     -     8 em0
10.0.1.1           68:05:ca:23:90:88  UHLl       0       20     -     1 em0
10.0.1.2           bc:5f:f4:e6:e2:63  UHLch      4       80     -     3 em0
10.0.1.4           c8:3a:35:d8:ec:0b  UHLc       0        5     -     3 em0
10.0.1.10          link#2             UHLch      2       10     -     3 em0
10.0.1.255         10.0.1.1           UHb        0        0     -     1 em0
10.0.10/24         10.0.1.10          UGS        0        0     -     8

Re: amd64: stuck in netlock

2018-01-29 Thread Martin Pieuchot
On 29/01/18(Mon) 20:38, Artturi Alm wrote:
> On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > Hello Artturi,
> > 
> > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > >Synopsis:stuck in netlock
> > > >Category:amd64
> > > >Environment:
> > >   System  : OpenBSD 6.2
> > >   Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 
> > > 09:13:00 MST 2018
> > >
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > > >Description:
> > >   processes getting stuck w/STATE=netlock, kill has no effect.
> > > >How-To-Repeat:
> > >   using the desktop normally, until trying to restart chrome ends
> > >   up failing.
> > 
> > What do you mean with "using the desktop normally"?  Which applications
> > are you using?  Which browser plugins?  Can you find out the minimum
> > setup to reproduce this deadlock?
> > 
> > >   I've had this happen to me atleast twice in the last few of weeks.
> > 
> > Do you know how to reproduce it easily?
> > 
> 
> this time i had less than 10tabs open, so i guess it can be narrowed
> down even further.
> 
> > >   At first time i noticed how trying to launch chrome did lock up
> > >   all the other processes in netlock, and "pkill chrome" did allow
> > >   the system to recover, i was unable to figure out what was wrong
> > >   and rebooting did make everything work again, while ie.
> > >   removing ~/.cache & ~/.config did not.
> > 
> > So the deadlock is related to your chrome usage?
> > 
> 
> now it does feel like so. i'll upgrade tonight.
> 
> > >   long before running the "ps cl" below, i had already killed all
> > >   the xterm-windows those processes were in. cwm(1) was unable to
> > >   kill some of those, but xkill did not.
> > 
> > Well, killing processes waiting for the 'netlock' won't help.  What has to
> > be found is which process is holding it.  For that we need the full ps
> > output, including kernel and userland threads.
> > > 
> > >   after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > >   $-prompt, and ^T did show xauth stuck in netlock..
> > >   i guess it's obvious where it was heading; so i got pics of
> > >   "# reboot -nq" failing because stuck in the fckng netlock -_-
> > > 
> > >   i do have ddb.{panic,console,log}=1, but
> > >   "# sysctl ddb.trigger=1" ==
> > >   "sysctl: ddb.trigger: Operation not supported by device"
> > 
> > Not having DDB access will limit the debugging experience.  Are you sure
> > you tried to enter it on your console?
> > 
> 
> so this requires ttyC0, right?
> this time it was ifconfig in [netlock], that prevented using ttyC0.
> i got there from X by running "virsh shutdown "
> i guess it emulates what pressing actual power button would(acpi?).
> 
> > >   ?? so i had no option but "virsh reset "...
> > 
> > Did you try top(1)?  What were the kernel processes doing?
> 
> see below, if "top -bCHS -d 1 999" should do.
> anything else i could do? anyway, thanks in advance:)

This is where the problem comes from:

> 33315   443734  -60  141M  102M idle  viowait   0:00  0.00% chrome: 

I don't understand how chrome can end up sleeping in vio_ioctl() and why
it is sleeping forever.  But this thread is holding the NET_LOCK() and
prevents the rest of the kernel from making progress.

Could you try a virtual interface different from vio(4) and see if you
can reproduce the problem?



Re: amd64: stuck in netlock

2018-01-29 Thread Artturi Alm
On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> Hello Artturi,
> 
> On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > >Synopsis:  stuck in netlock
> > >Category:  amd64
> > >Environment:
> > System  : OpenBSD 6.2
> > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 
> > 09:13:00 MST 2018
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > processes getting stuck w/STATE=netlock, kill has no effect.
> > >How-To-Repeat:
> > using the desktop normally, until trying to restart chrome ends
> > up failing.
> 
> What do you mean with "using the desktop normally"?  Which applications
> are you using?  Which browser plugins?  Can you find out the minimum
> setup to reproduce this deadlock?
> 
> > I've had this happen to me atleast twice in the last few of weeks.
> 
> Do you know how to reproduce it easily?
> 

this time i had fewer than 10 tabs open, so i guess it can be narrowed
down even further.

> > At first time i noticed how trying to launch chrome did lock up
> > all the other processes in netlock, and "pkill chrome" did allow
> > the system to recover, i was unable to figure out what was wrong
> > and rebooting did make everything work again, while ie.
> > removing ~/.cache & ~/.config did not.
> 
> So the deadlock is related to your chrome usage?
> 

now it does feel like so. i'll upgrade tonight.

> > long before running the "ps cl" below, i had already killed all
> > the xterm-windows those processes were in. cwm(1) was unable to
> > kill some of those, but xkill did not.
> 
> Well, killing processes waiting for the 'netlock' won't help.  What has to
> be found is which process is holding it.  For that we need the full ps
> output, including kernel and userland threads.
> > 
> > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > $-prompt, and ^T did show xauth stuck in netlock..
> > i guess it's obvious where it was heading; so i got pics of
> > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > 
> > i do have ddb.{panic,console,log}=1, but
> > "# sysctl ddb.trigger=1" ==
> > "sysctl: ddb.trigger: Operation not supported by device"
> 
> Not having DDB access will limit the debugging experience.  Are you sure
> you tried to enter it on your console?
> 

so this requires ttyC0, right?
this time it was ifconfig in [netlock], that prevented using ttyC0.
i got there from X by running "virsh shutdown "
i guess it emulates what pressing actual power button would(acpi?).

> > ?? so i had no option but "virsh reset "...
> 
> Did you try top(1)?  What were the kernel processes doing?

see below, if "top -bCHS -d 1 999" should do.
anything else i could do? anyway, thanks in advance:)
-Artturi

load averages:  0.00,  0.02,  0.06tfort.my.domain 20:04:13
145 threads: 1 running, 139 idle, 5 on processor  up 1 day, 11:38
CPU0 states:  0.2% user,  0.0% nice,  0.4% system,  0.3% interrupt, 99.2% idle
CPU1 states:  1.1% user,  0.1% nice,  2.3% system,  0.0% interrupt, 96.5% idle
CPU2 states:  1.3% user,  0.1% nice,  2.5% system,  0.0% interrupt, 96.1% idle
CPU3 states:  0.9% user,  0.2% nice,  2.9% system,  0.0% interrupt, 96.0% idle
CPU4 states:  0.3% user,  0.1% nice,  0.8% system,  0.0% interrupt, 98.8% idle
CPU5 states:  0.4% user,  0.1% nice,  1.2% system,  0.0% interrupt, 98.3% idle
Memory: Real: 285M/1053M act/tot Free: 6876M Cache: 521M Swap: 0K/4336M

  PID  TID PRI NICE  SIZE   RES STATE WAIT  TIMECPU COMMAND
14495   155467   20   35M   40M sleep/1   poll 39:05  1.61% 
/usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t
70058   507112   20 9652K   13M sleep/1   select0:02  0.05% xterm
13394   440936 -2200K   21M idle  -35.3H  0.00% idle0
 6862   125212 -2200K   21M onproc/5  -35.2H  0.00% idle5
43153   547872 -2200K   21M onproc/4  -35.0H  0.00% idle4
  661   212291 -2200K   21M onproc/3  -34.7H  0.00% idle3
25137   319342 -2200K   21M onproc/1  -34.4H  0.00% idle1
65690   467656 -2200K   21M idle  -34.4H  0.00% idle2
 3067   485689  100   12M   23M idle  netlock   3:12  0.00% weechat -r 
/connect freenode
87817   410790  68   200K   21M run/2 - 2:29  0.00% zerothread
14495   421539   20   35M   40M sleep/4   poll  1:51  0.00% 
/usr/X11R6/bin/X :0 -auth /home/aalm/.serverauth.KsYBQlXE5t
13992   615559  10  -20  888K 2452K idle  netlock   0:47  0.00% ntpd: ntp 
engine
30357   245010  1000K   21M idle  netlock   0:42  0.00% softclock
61217   230818  1000K   21M idle  netlock   0:30  0.00% softnet
51008   255493  1800K   21M sleep/1   syncer0:30  0.00% update
70625   286762  1000K   

wrong if used after adding new route - affects syslog, dhcrelay and more

2018-01-29 Thread Remi Locherer
> Problem Description
Local originating traffic leaves the system on the wrong interface
after a more specific route was added. This is problematic for services
like dhcrelay and syslogd.

I verified this on OpenBSD 6.1 but do not know how it was with
older versions. The behaviour is still the same with current.

> Workaround:
Monitor the routing socket and restart affected services when the routing
table changes.

> How to reproduce:
mistral ~# route -n show -inet
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            172.18.35.1        UGS        7      408     -    12 iwm0
224/4              127.0.0.1          URS        0     1898 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        4 32768     1 lo0
172.18.35/24       172.18.35.87       UCn        1      634     -     8 iwm0
172.18.35.1        cc:4e:24:82:88:42  UHLch      1      312     -     7 iwm0
172.18.35.87       5c:e0:c5:1f:ad:c4  UHLl       0       14     -     1 iwm0
172.18.35.255      172.18.35.87       UHb        0      633     -     1 iwm0
172.30.1/24        192.168.250.18     UGS        0        0     -     8 vether0
192.168.250/24     192.168.250.1      UCn        1        0     -     4 vether0
192.168.250.1      fe:e1:ba:d0:05:e2  UHLl       0        5     -     1 vether0
192.168.250.18     fe:e1:bb:d1:a2:b9  UHLch      2       37     -     3 vether0
192.168.250.255    192.168.250.1      UHb        1      105     -     1 vether0
mistral ~#

$mistral 130 ~$ nc -u 172.30.1.1  
adsfsa

--> do not press ^C, keep it open to generate traffic over the same
socket after changing routes

--> as expected traffic leaves on vether0
16:24:00.676256 rule 0/(match) match out on vether0: 192.168.250.1.28812 > 
172.30.1.1.: udp 7

mistral ~# route del 172.30.1.0/24

--> now traffic leaves on iwm0. makes sense because of the default route.
16:24:51.078621 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > 
172.30.1.1.: udp 3

mistral ~# route add 172.30.1.0/24 192.168.250.18

--> traffic still leaves iwm0. expectations: vether0
16:28:09.440038 rule 0/(match) match out on iwm0: 192.168.250.1.28812 > 
172.30.1.1.: udp 4
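The last step of the reproduction expects each packet to follow the current longest-prefix match. 172.30.1.1 falls inside 172.30.1/24, so once that route exists it should win over the default route. Below is a minimal C sketch of that lookup. It assumes a linear table instead of the kernel's radix tree. struct route and lookup() are hypothetical, and the sketch does not model why the already-open nc socket keeps sending out iwm0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal IPv4 route entry; not the kernel's struct rtentry. */
struct route {
	uint32_t    dst;	/* network address, host byte order */
	int         prefixlen;	/* 0 for the default route */
	const char *ifname;
};

/* Netmask for a prefix length; a 32-bit shift by 32 is undefined, so /0 is special-cased. */
static uint32_t
prefix_mask(int len)
{
	return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Longest-prefix match: the most specific route covering addr wins. */
static const char *
lookup(const struct route *rt, size_t n, uint32_t addr)
{
	const struct route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		uint32_t m = prefix_mask(rt[i].prefixlen);
		if ((addr & m) == (rt[i].dst & m) &&
		    (best == NULL || rt[i].prefixlen > best->prefixlen))
			best = &rt[i];
	}
	return best != NULL ? best->ifname : NULL;
}
```

Suppose the table holds the default route (iwm0) followed by 172.30.1/24 (vether0). With both entries (n = 2), the lookup for 172.30.1.1 returns vether0. With only the default route (n = 1), it returns iwm0. This matches the first two steps of the reproduction.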


> dmesg

OpenBSD 6.2-current (GENERIC.MP) #107: Tue Jan 16 21:58:15 CET 2018
r...@mistral.relo.ch:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8473632768 (8081MB)
avail mem = 8209895424 (7829MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xe (84 entries)
bios0: vendor Dell Inc. version "A13" date 06/16/2017
bios0: Dell Inc. XPS 13 9343
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG HPET SSDT UEFI SSDT ASF! SSDT SSDT 
SSDT SSDT PCCT SSDT SSDT SSDT SLIC MSDM DMAR CSRT BGRT
acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) 
PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) 
PXSX(S4) RP05(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.62 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
acpitimer0: recalibrated TSC frequency 2194928041 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.22 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, 2494.22 MHz
cpu2: 

Re: amd64: stuck in netlock

2018-01-29 Thread Artturi Alm
On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> Hello Artturi,
> 
> On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > >Synopsis:  stuck in netlock
> > >Category:  amd64
> > >Environment:
> > System  : OpenBSD 6.2
> > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 
> > 09:13:00 MST 2018
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > processes getting stuck w/STATE=netlock, kill has no effect.
> > >How-To-Repeat:
> > using the desktop normally, until trying to restart chrome ends
> > up failing.
> 
> What do you mean by "using the desktop normally"?  Which applications
> are you using?  Which browser plugins?  Can you find out the minimum
> setup to reproduce this deadlock?
> 

I had mupdf, gvim, weechat and chromium running out of packages, not much
else even installed, and no browser plugins.
If I had only one machine to use, this would be it, so it's kind of hard to
minimize the setup, as I had 24+ hours of use (or at least uptime) before
this got triggered.

> > I've had this happen to me at least twice in the last few weeks.
> 
> Do you know how to reproduce it easily?
> 

No, I don't, but I will try to stay alert so that I notice it before I go
killing stuff randomly in despair.

> > The first time, I noticed how trying to launch chrome locked up
> > all the other processes in netlock, and "pkill chrome" allowed
> > the system to recover. I was unable to figure out what was wrong;
> > rebooting made everything work again, while e.g. removing
> > ~/.cache & ~/.config did not.
> 
> So the deadlock is related to your chrome usage?
> 

Possibly. I have an issue with chrome where it will eventually stop playing
videos.
As an example, let's say I've got 50+ tabs open in a single "main" window,
and then open a second one with "chromium --incognito" and make it play
some playlist from YouTube. Once the playback ceases (at the beginning
of a new video), I have to restart all chrome processes to have it continue.

I was guessing it's not related, as I've had these playback issues since
before 6.2, IIRC, and even when playback is not working, it keeps
downloading/buffering the video, and everything else works without issues
in the other chrome window.

> > Long before running the "ps cl" below, I had already killed all
> > the xterm windows those processes were in. cwm(1) was unable to
> > kill some of those, but xkill did not.
> 
> Well, killing processes waiting for the 'netlock' won't help.  What has to
> be found is which process is holding it.  For that we need the full ps
> output, including kernel and userland threads.

OK, I'll get those if/when I run into this again.

> > 
> > After exiting X with ctrl+alt+backspace (IIRC?), I didn't get back
> > to the $-prompt, and ^T showed xauth stuck in netlock.
> > I guess it's obvious where it was heading, so I got pics of
> > "# reboot -nq" failing because it was stuck in the fckng netlock -_-
> > 
> > I do have ddb.{panic,console,log}=1, but
> > "# sysctl ddb.trigger=1" ==
> > "sysctl: ddb.trigger: Operation not supported by device"
> 
> Not having DDB access will limit the debugging experience.  Are you sure
> you tried to enter it on your console?
> 

Yes, I had already exited X. Or do you mean the above would only work from
what I get into with ctrl+alt+f1, and not e.g. ctrl+alt+f2?
Using the first VT (or whatever those are) was impossible, as xauth was
locked up there, giving me no prompt.

> > ?? So I had no option but "virsh reset "...
> 
> Did you try top(1)?  What were the kernel processes doing?

Yes, but I didn't pay attention to anything but how weechat
went to wait on netlock whenever I launched chrome.
The first time I ran into it, I think launching chrome froze systat too.



Re: amd64: stuck in netlock

2018-01-29 Thread Martin Pieuchot
Hello Artturi,

On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> >Synopsis:  stuck in netlock
> >Category:  amd64
> >Environment:
>   System  : OpenBSD 6.2
>   Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan  7 09:13:00 MST 2018
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   processes getting stuck w/STATE=netlock, kill has no effect.
> >How-To-Repeat:
>   using the desktop normally, until trying to restart chrome ends
>   up failing.

What do you mean by "using the desktop normally"?  Which applications
are you using?  Which browser plugins?  Can you find out the minimum
setup to reproduce this deadlock?

>   I've had this happen to me at least twice in the last few weeks.

Do you know how to reproduce it easily?

>   The first time, I noticed how trying to launch chrome locked up
>   all the other processes in netlock, and "pkill chrome" allowed
>   the system to recover. I was unable to figure out what was wrong;
>   rebooting made everything work again, while e.g. removing
>   ~/.cache & ~/.config did not.

So the deadlock is related to your chrome usage?

>   Long before running the "ps cl" below, I had already killed all
>   the xterm windows those processes were in. cwm(1) was unable to
>   kill some of those, but xkill did not.

Well, killing processes waiting for the 'netlock' won't help.  What has to
be found is which process is holding it.  For that we need the full ps
output, including kernel and userland threads.
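A minimal sketch of sifting such a listing, assuming OpenBSD's ps(1), where -A selects all processes, -k adds kernel threads, -H adds per-thread entries and -l gives the long format that includes the wait channel (check ps(1) on the running release). The shell function netlock_waiters is hypothetical: it locates the wait-channel column by its header (WAIT or WCHAN) and prints the PID and command of every entry sleeping on netlock. It only lists the waiters; the holder still has to be looked for among the remaining entries.

```sh
# Hypothetical helper: read ps(1) long-format output on stdin and print
# "PID COMMAND" for every entry whose wait channel is "netlock".
# Columns are found by their header names, so the exact layout does not
# matter, but it assumes no column before the wait channel is ever blank.
netlock_waiters() {
	awk '
	NR == 1 {
		# Header line: remember where PID, wait channel and COMMAND are.
		for (i = 1; i <= NF; i++) {
			if ($i == "PID") pid = i
			if ($i == "WAIT" || $i == "WCHAN") wch = i
			if ($i == "COMMAND") cmd = i
		}
		next
	}
	wch && $wch == "netlock" { print $pid, $cmd }
	'
}
```

On the affected machine one could capture the listing once with "ps -AHkl > /tmp/ps-full.txt" (flags assumed as above) and then run "netlock_waiters < /tmp/ps-full.txt"; the full file is still what should be posted, since the holder is not marked as such.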
> 
>   After exiting X with ctrl+alt+backspace (IIRC?), I didn't get back
>   to the $-prompt, and ^T showed xauth stuck in netlock.
>   I guess it's obvious where it was heading, so I got pics of
>   "# reboot -nq" failing because it was stuck in the fckng netlock -_-
> 
>   I do have ddb.{panic,console,log}=1, but
>   "# sysctl ddb.trigger=1" ==
>   "sysctl: ddb.trigger: Operation not supported by device"

Not having DDB access will limit the debugging experience.  Are you sure
you tried to enter it on your console?

>   ?? So I had no option but "virsh reset "...

Did you try top(1)?  What were the kernel processes doing?