acpimadt(4) problem

2019-07-24 Thread Nick Holland
This machine seems to have a problem with acpimadt(4).

It will not run bsd.rd for an install or upgrade.  It hangs,
seemingly indefinitely, at "root on ..." after otherwise seeming
to have booted successfully.

When running bsd.mp, the system is sluggish and, according to top,
busy doing nothing:

load averages:  1.00,  1.03,  0.80    gw.in.nickh.org 22:05:06
36 processes: 35 idle, 1 on processor    up  0:15
CPU0 states:  0.0% user,  0.0% nice, 75.4% sys,  5.0% spin, 13.2% intr,  6.4% idle
CPU1 states:  0.0% user,  0.0% nice,  0.8% sys,  0.2% spin,  0.0% intr, 99.0% idle
Memory: Real: 147M/794M act/tot Free: 1175M Cache: 385M Swap: 0K/1026M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
79498 nick       2    0 1364K 2772K sleep/0   select    0:03  0.00% sshd
    1 root      10    0  480K  468K idle      wait      0:01  0.00% init
...

HOWEVER...do a boot -c, disable acpimadt, and bsd.rd works great,
and bsd.mp is properly snappy and really idle when it isn't doing 
anything.

I identified acpimadt by first disabling acpi and verifying that the
system booted, then running dmesg|grep acpi to list all acpi devices,
and disabling them one at a time until I found the one that made the
difference between working and not working.

sendbug form can be fetched here:
http://holland-consulting.net/bugreport.txt
but is also below.

"Where did you get such a piece of junk?" is an acceptable response, as
it was originally an "appliance", but all those on-board NICs were hard
to resist. :)

Thanks!
Nick.

OpenBSD 6.5-current (GENERIC.MP) #137: Tue Jul 23 13:26:59 MDT 2019
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2130444288 (2031MB)
avail mem = 2055802880 (1960MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xfb4f0 (27 entries)
bios0: vendor American Megatrends Inc. version "080015" date 01/29/2010
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S0
acpi0: tables DSDT FACP APIC MCFG OEMB
acpi0: wakeup devices P0P2(S0) P0P3(S0) P0P1(S0) USB0(S0) USB2(S0) USB5(S0) EUSB(S0) USB3(S0) USB4(S0) USB6(S0) USBE(S0) GBE_(S0) P0P4(S0) P0P5(S0) P0P6(S0) P0P7(S0) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz, 2793.41 MHz, 06-17-0a
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG,LAHF,PERF,SENSOR,MELTDOWN
cpu0: 3MB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 265MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz, 2793.01 MHz, 06-17-0a
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE,NXE,LONG,LAHF,PERF,SENSOR,MELTDOWN
cpu1: 3MB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 9 (P0P1)
acpiprt2 at acpi0: bus 3 (P0P4)
acpiprt3 at acpi0: bus 4 (P0P5)
acpiprt4 at acpi0: bus 5 (P0P6)
acpiprt5 at acpi0: bus 6 (P0P7)
acpiprt6 at acpi0: bus 7 (P0P8)
acpiprt7 at acpi0: bus 8 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
cpu0: unknown Enhanced SpeedStep CPU, msr 0x06164a2506004a25
cpu0: using only highest and lowest power states
cpu0: Enhanced SpeedStep 2793 MHz: speeds: 19734, 1600 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 3200/3210 Host" rev 0x01
ppb0 at pci0 dev 6 function 0 "Intel 3210 PCIE" rev 0x01: msi
pci1 at ppb0 bus 2
em0 at pci1 dev 0 function 0 "Intel 82575EB" rev 0x02: msi, address 00:03:b2:71:03:40
em1 at pci1 dev 0 function 1 "Intel 82575EB" rev 0x02: msi, address 00:03:b2:71:03:41
ppb1 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02: msi
pci2 at ppb1 bus 3
em2 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:03:b2:71:03:42
ppb2 at pci0 dev 28 function 1 "Intel 82801I PCIE" rev 0x02: msi
pci3 at ppb2 bus 4
em3 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:03:b2:71:03:43
ppb3 at pci0 dev 28 function 2 "Intel 82801I PCIE" rev 0x02: msi
pci4 at ppb3 bus 5
em4 at pci4 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:03:b2:71:03:44
ppb4 at pci0 dev 28 function 3 "Intel 82801I PCIE" rev 

Re: uvm_fault pmap_enter+0x1d6: movq __ALIGN_SIZE+0x3000(%rcx,%rsi,8),%rsi

2019-07-24 Thread Mike Larkin
On Wed, Jul 24, 2019 at 10:48:25PM +0200, Alexander Bluhm wrote:
> On Wed, Jul 24, 2019 at 08:59:44PM +0200, Alexander Bluhm wrote:
> > The reaper on CPU 0 does a NULL dereference when removing the page.
> > On CPU 1 zerothread is waiting for kernel lock.  CPU 2 and 3 are
> > idle.
> >
> > uvm_fault(0xfd8240760cc8, 0x7f827ea48908, 0, 2) -> e
> > kernel: page fault trap, code=0
> > Stopped at  pmap_page_remove+0x210: xchgq   %rax,0(%rcx,%rdx,1)
> 
> Forgot to mention, that was C source line pmap.c:1878
> 
> opte = pmap_pte_set(&PTE_BASE[pl1_i(pve->pv_va)], 0);
> 
> > I will update kernel and look if panic is reproducible.
> 
> It is reproducible
> 
> ddb{3}> x/s version
> version:OpenBSD 6.5-current (GENERIC.MP) #139: Wed Jul 24 05:11:28 
> MDT 2
> 019\012
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> \012
> 
> ddb{3}> show panic
> kernel page fault
> uvm_fault(0xfd823efc7998, 0x7f8444c11f08, 0, 1) -> e
> pmap_enter(fd823e1ce3f8,889823e1000,5f3c2000,3,22) at pmap_enter+0x1d6
> end trace frame: 0x80002210ed30, count: 0
> 
> Now it happens in pmap.c:2624
> 
> opte = PTE_BASE[pl1_i(va)]; /* old PTE */
> 
> Something in PTE_BASE array is not mapped.
> 

I wrote a quick program to calculate what address this would be (thinking
maybe we had some overflow or something) but it does indeed match the
faulting address above (0x7f8444c11f08) for the VA 0x889823e1000.
This address (0x7f8444c11f08) is in the PTE range, so it looks like it
was never allocated or possibly double-freed. Double free matches the
previous email's comment as well.
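
The calculation described above can be sketched as follows.  This is
not the program Mike wrote, only a minimal reconstruction.  The
constants (4 KB pages, 8-byte PTEs, the PTE window starting at L4
slot 255) and every sketch_* name are assumptions of this sketch,
chosen to model the amd64 recursive page table mapping.

```c
#include <stdint.h>

/*
 * Assumed constants (not quoted from the thread): 4 KB pages,
 * 8-byte page table entries, PTE window at L4 slot 255.
 */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_L1_FRAME		0x0000fffffffff000ULL
#define SKETCH_PTE_BASE		(255ULL << 39)	/* 0x7f8000000000 */

/* Index of the level-1 PTE that maps va (the role pl1_i() plays). */
static uint64_t
sketch_pl1_i(uint64_t va)
{
	return (va & SKETCH_L1_FRAME) >> SKETCH_PAGE_SHIFT;
}

/* Virtual address of the array slot PTE_BASE[pl1_i(va)]. */
static uint64_t
sketch_pte_addr(uint64_t va)
{
	return SKETCH_PTE_BASE + sketch_pl1_i(va) * sizeof(uint64_t);
}
```

Under these assumptions, sketch_pte_addr(0x889823e1000) evaluates to
0x7f8444c11f08, which is the faulting address shown by "show panic".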

If this happens again, it might be interesting to see what pages around
that are mapped. For example, for this particular instance, to see if
0x7f8444c10000 is mapped, or 0x7f8444c12000. ddb>'s 'x' command can do that
(see if you get another fault or if you get some data). Maybe the data in
those pages around it might provide a hint (although that's a longshot).
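
To find the pages around a faulting address, rounding down to the
page boundary and stepping by one page is enough.  A small sketch,
assuming 4 KB pages; the sketch_* helpers are hypothetical and only
illustrate the arithmetic:

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	0x1000ULL	/* assumed 4 KB pages */

/* Start of the page that contains addr. */
static uint64_t
sketch_page_base(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Start of the page immediately below the one containing addr. */
static uint64_t
sketch_prev_page(uint64_t addr)
{
	return sketch_page_base(addr) - SKETCH_PAGE_SIZE;
}

/* Start of the page immediately above the one containing addr. */
static uint64_t
sketch_next_page(uint64_t addr)
{
	return sketch_page_base(addr) + SKETCH_PAGE_SIZE;
}
```

For the fault at 0x7f8444c11f08 this gives a page base of
0x7f8444c11000, with 0x7f8444c10000 below it and 0x7f8444c12000
above it.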

-ml

> ddb{3}> trace
> pmap_enter(fd823e1ce3f8,889823e1000,5f3c2000,3,22) at pmap_enter+0x1d6
> uvm_fault(fd823efc7998,889823e1000,0,2) at uvm_fault+0xa2a
> pageflttrap() at pageflttrap+0x145
> usertrap(80002210ee20) at usertrap+0x1e3
> recall_trap(6,dfdfdfdfdfdfdfdf,0,6,1000,8890b6fc7c0) at recall_trap+0x8
> end of kernel
> end trace frame: 0x888fdfc9330, count: -5
> 
> Note that on June 11th I reported a similar trace in pmap to bugs@
> when ld caused a crash.
> 
> ddb{3}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  76368  342680   5059  0  2 0x2malloc_duel
>  76368  101339   5059  0  7   0x402malloc_duel
>  76368  514296   5059  0  3   0x482  fsleepmalloc_duel
> *76368  384915   5059  0  7   0x402malloc_duel
>  76368  221830   5059  0  7   0x402malloc_duel
>  76368  361827   5059  0  7   0x402malloc_duel
>  76368  480274   5059  0  3   0x482  fsleepmalloc_duel
>  76368  468117   5059  0  3   0x482  fsleepmalloc_duel
>  76368  461971   5059  0  3   0x482  fsleepmalloc_duel
>  76368  266728   5059  0  2   0x402malloc_duel
>  76368   82327   5059  0  2   0x402malloc_duel
>   5059  194815   4702  0  30x10008a  pause make
>   4702  434789  57398  0  30x10008a  pause sh
>  57398  272052  80135  0  30x10008a  pause make
>  80135   83438  74843  0  30x10008a  pause sh
>  74843  269959  24644  0  30x10008a  pause make
>  71213   91038  31378  0  30x100082  piperdgzip
>  31378  297755  24644  0  30x100082  piperdpax
>  24644  139228  73204  0  30x82  piperdperl
>  73204  241400   3907  0  30x10008a  pause ksh
>   3907  427314  77842  0  30x92  selectsshd
>  49732  259852  1  0  30x100083  ttyin getty
>  58444  180559  1  0  30x100083  ttyin getty
>  30659  289121  1  0  30x100083  ttyin getty
>   9656  108850  1  0  30x100083  ttyin getty
>  24203   10241  1  0  30x100083  ttyin getty
>  65063  251469  1  0  30x100083  ttyin getty
>  16142  523320  1  0  30x100098  poll  cron
>  908053316  0  0  3 0x14280  nfsidlnfsio
>  11202  322177  0  0  3 0x14280  nfsidlnfsio
>  73491  331359  0  0  3 0x14280  nfsidlnfsio
>  37841  249018  0  0  3 0x14280  nfsidlnfsio
>   4136  428500  1 99  30x100090  poll  sndiod
>  12112  519438  1110  30x100090  poll  sndiod
>  49306   97767137 95  30x100092  kqreadsmtpd
>  70869  189393137

uvm_fault pmap_enter+0x1d6: movq __ALIGN_SIZE+0x3000(%rcx,%rsi,8),%rsi

2019-07-24 Thread Alexander Bluhm
On Wed, Jul 24, 2019 at 08:59:44PM +0200, Alexander Bluhm wrote:
> The reaper on CPU 0 does a NULL dereference when removing the page.
> On CPU 1 zerothread is waiting for kernel lock.  CPU 2 and 3 are
> idle.
>
> uvm_fault(0xfd8240760cc8, 0x7f827ea48908, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  pmap_page_remove+0x210: xchgq   %rax,0(%rcx,%rdx,1)

Forgot to mention, that was C source line pmap.c:1878

opte = pmap_pte_set(&PTE_BASE[pl1_i(pve->pv_va)], 0);

> I will update kernel and look if panic is reproducible.

It is reproducible

ddb{3}> x/s version
version:OpenBSD 6.5-current (GENERIC.MP) #139: Wed Jul 24 05:11:28 MDT 2
019\012dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
\012

ddb{3}> show panic
kernel page fault
uvm_fault(0xfd823efc7998, 0x7f8444c11f08, 0, 1) -> e
pmap_enter(fd823e1ce3f8,889823e1000,5f3c2000,3,22) at pmap_enter+0x1d6
end trace frame: 0x80002210ed30, count: 0

Now it happens in pmap.c:2624

opte = PTE_BASE[pl1_i(va)]; /* old PTE */

Something in PTE_BASE array is not mapped.

ddb{3}> trace
pmap_enter(fd823e1ce3f8,889823e1000,5f3c2000,3,22) at pmap_enter+0x1d6
uvm_fault(fd823efc7998,889823e1000,0,2) at uvm_fault+0xa2a
pageflttrap() at pageflttrap+0x145
usertrap(80002210ee20) at usertrap+0x1e3
recall_trap(6,dfdfdfdfdfdfdfdf,0,6,1000,8890b6fc7c0) at recall_trap+0x8
end of kernel
end trace frame: 0x888fdfc9330, count: -5

Note that on June 11th I reported a similar trace in pmap to bugs@
when ld caused a crash.

ddb{3}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 76368  342680   5059  0  2 0x2malloc_duel
 76368  101339   5059  0  7   0x402malloc_duel
 76368  514296   5059  0  3   0x482  fsleepmalloc_duel
*76368  384915   5059  0  7   0x402malloc_duel
 76368  221830   5059  0  7   0x402malloc_duel
 76368  361827   5059  0  7   0x402malloc_duel
 76368  480274   5059  0  3   0x482  fsleepmalloc_duel
 76368  468117   5059  0  3   0x482  fsleepmalloc_duel
 76368  461971   5059  0  3   0x482  fsleepmalloc_duel
 76368  266728   5059  0  2   0x402malloc_duel
 76368   82327   5059  0  2   0x402malloc_duel
  5059  194815   4702  0  30x10008a  pause make
  4702  434789  57398  0  30x10008a  pause sh
 57398  272052  80135  0  30x10008a  pause make
 80135   83438  74843  0  30x10008a  pause sh
 74843  269959  24644  0  30x10008a  pause make
 71213   91038  31378  0  30x100082  piperdgzip
 31378  297755  24644  0  30x100082  piperdpax
 24644  139228  73204  0  30x82  piperdperl
 73204  241400   3907  0  30x10008a  pause ksh
  3907  427314  77842  0  30x92  selectsshd
 49732  259852  1  0  30x100083  ttyin getty
 58444  180559  1  0  30x100083  ttyin getty
 30659  289121  1  0  30x100083  ttyin getty
  9656  108850  1  0  30x100083  ttyin getty
 24203   10241  1  0  30x100083  ttyin getty
 65063  251469  1  0  30x100083  ttyin getty
 16142  523320  1  0  30x100098  poll  cron
 908053316  0  0  3 0x14280  nfsidlnfsio
 11202  322177  0  0  3 0x14280  nfsidlnfsio
 73491  331359  0  0  3 0x14280  nfsidlnfsio
 37841  249018  0  0  3 0x14280  nfsidlnfsio
  4136  428500  1 99  30x100090  poll  sndiod
 12112  519438  1110  30x100090  poll  sndiod
 49306   97767137 95  30x100092  kqreadsmtpd
 70869  189393137103  30x100092  kqreadsmtpd
 79867  131344137 95  30x100092  kqreadsmtpd
 66859  375509137 95  30x100092  kqreadsmtpd
 22396   48018137 95  30x100092  kqreadsmtpd
 16604   93317137 95  30x100092  kqreadsmtpd
   137  452544  1  0  30x100080  kqreadsmtpd
 77842  219221  1  0  30x80  selectsshd
 88298  318549  0  0  3 0x14200  acct  acct
  7436  211089  1  0  30x100080  poll  ntpd
 15596  214430  72873 83  30x100092  poll  ntpd
 72873  423080  1 83  30x100092  poll  ntpd
   639  455748   5843 74  30x100092  bpf   pflogd
  5843  152563  1  0  30x80  netio pflogd
 49089   65344  96782 73  30x100090  kqreadsyslogd
 96782  134250  1  0  30x100082  netio syslogd
 15309   57931  1 77  30x100090  

Re: bsd.mp hangs at boot with lastest snap (bsd.sp does not)

2019-07-24 Thread Antoine Jacoutot
Hi.

Issue is still there with snapshot:
OpenBSD 6.5-current (GENERIC.MP) #115: Wed Jul 24 05:34:08 MDT 2019

Only bsd.mp is affected.
Not sure what else to provide.


On Sun, 2019-07-14 at 18:56 +0200, Antoine Jacoutot wrote:
> Hi.
> 
> I just updated an i386 VM to a current snapshot; running on SmartOS KVM.
> It hangs at boot right after printing "clock: unknown CMOS layout".
> 
> This only happens with bsd.mp, bsd.sp works fine.
> 
> booting hd0a:/bsd: 9599755+2257924+266260+0+1101824
> [739510+107+541008+569395]=0xe62ce4
> entry point at 0x201000
> 
> [ using 1850596 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>   The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2019 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 6.5-current (GENERIC.MP) #104: Fri Jul 12 10:23:17 MDT 2019
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> real mem  = 3757568000 (3583MB)
> avail mem = 3673387008 (3503MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: date 06/23/99, BIOS32 rev. 0 @ 0xff046, SMBIOS rev. 2.4 @
> 0xddf0 (15 entries)
> bios0: vendor Bochs version "Bochs" date 01/01/2007
> bios0: Joyent SmartDC HVM
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP SSDT APIC HPET
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: QEMU Virtual CPU version 0.14.1 ("GenuineIntel" 686-class) 2.67 GHz, 06-
> 02-03
> cpu0:
> FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,CFLUSH,MMX,FXS
> R,SSE,SSE2,SSE3,CX16,POPCNT,HV,NXE,LONG,LAHF,PERF,MELTDOWN
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 1020MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: QEMU Virtual CPU version 0.14.1 ("GenuineIntel" 686-class) 2.73 GHz, 06-
> 02-03
> cpu1:
> FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,CFLUSH,MMX,FXS
> R,SSE,SSE2,SSE3,CX16,POPCNT,HV,NXE,LONG,LAHF,PERF,MELTDOWN
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: QEMU Virtual CPU version 0.14.1 ("GenuineIntel" 686-class) 2.73 GHz, 06-
> 02-03
> cpu2:
> FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,CFLUSH,MMX,FXS
> R,SSE,SSE2,SSE3,CX16,POPCNT,HV,NXE,LONG,LAHF,PERF,MELTDOWN
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: QEMU Virtual CPU version 0.14.1 ("GenuineIntel" 686-class) 2.73 GHz, 06-
> 02-03
> cpu3:
> FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,CFLUSH,MMX,FXS
> R,SSE,SSE2,SSE3,CX16,POPCNT,HV,NXE,LONG,LAHF,PERF,MELTDOWN
> ioapic0 at mainbus0: apid 4 pa 0xfec0, version 11, 24 pins, remapped
> acpihpet0 at acpi0: 1 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicpu2 at acpi0: C1(@1 halt!)
> acpicpu3 at acpi0: C1(@1 halt!)
> "ACPI0006" at acpi0 not configured
> "PNP0A03" at acpi0 not configured
> acpicmos0 at acpi0
> "ACPI0007" at acpi0 not configured
> "ACPI0007" at acpi0 not configured
> "ACPI0007" at acpi0 not configured
> "ACPI0007" at acpi0 not configured
> bios0: ROM list: 0xc/0x9e00 0xca000/0xa00 0xcb000/0x2200
> pvbus0 at mainbus0: KVM
> pvclock0 at pvbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
> pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
> pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0
> wired to compatibility, channel 1 wired to compatibility
> pciide0: channel 0 disabled (no drives)
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom
> removable
> cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
> uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 4 int 11
> piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 4 int 10
> iic0 at piixpm0
> iic0: addr 0x18 00=06 01=c2 02=c2 03=c2 04=c2 05=c2 06=c2 07=c2 08=c2 09=4a
> 0a=40 3e=d0 48=d0 4a=d0 4e=d0 fc=d0 fe=d0 words 00=1800 01=1800 02=1800
> 03=1800 04=1800 05=1800 06=1800 07=1800
> iic0: addr 0x19 00=19 3e=d0 48=d0 4a=d0 4e=d0 fc=d0 fe=d0 words 00=1900
> 01=1900 02=1900 03=1900 04=1900 05=1900 06=1900 07=1900
> iic0: addr 0x1a 00=06 01=c2 02=c2 03=c2 04=c2 05=c2 06=c2 07=c2 08=c2 09=4a
> 0a=40 3e=d0 48=d0 4a=d0 4e=d0 fc=d0 fe=d0 words 00=1a00 01=1a00 02=1a00
> 03=1a00 04=1a00 05=1a00 06=1a00 07=1a00
> iic0: addr 0x1b 00=1b 3e=d0 48=d0 4a=d0 4e=d0 fc=d0 fe=d0 words 00=1b00
> 01=1b00 02=1b00 03=1b00 04=1b00 05=1b00 06=1b00 07=1b00
> iic0: addr 0x1c 00=1c 0f=06 3e=d0 48=d0 4a=d0 4e=d0 fc=d0 fe=d0 words 00=1c00
> 01=1c00 02=1c00 03=1c00 04=1c00 05=1c00 06=1c00 07=1c00
> iic0: addr 0x1d 00=1d 0f=06 3e=d0 48=d0 4a=d0 4e=d0 fc=d0 fe=d0 words 00=1d00
> 01=1d00 02=1d00 03=1d00 

uvm_fault pmap_page_remove+0x210: xchgq %rax,0(%rcx,%rdx,1)

2019-07-24 Thread Alexander Bluhm
On Wed, Jul 17, 2019 at 05:19:55PM +0200, Alexander Bluhm wrote:
> On Wed, Jul 17, 2019 at 01:26:29PM +0200, Alexander Bluhm wrote:
> > I got a strange panic on my daily amd64 regress machine
> > reordering libraries:panic: pool_do_get: scxspl free list modified: page
>
> I see more strange effects on my regress machines for a while now.
> The SSH connection that controls my tests fails with broken pipe.

Unfortunately SSH broken pipe is the common error if I lose contact
with the test machine.  Most of the time it was a local fuckup: a
pf on a bridge sending TCP RST.

> Fri Jul 12 09:59:59 MDT 2019
> malloc_duel(22164) in free(): chunk canary corrupted 0x4fd491216a0 0x4@0x4 
> (double free?)

This problem is real.  Running /usr/src/regress/lib/libpthread/malloc_duel
in a loop ends in a crash.

The reaper on CPU 0 does a NULL dereference when removing the page.
On CPU 1 zerothread is waiting for kernel lock.  CPU 2 and 3 are
idle.

uvm_fault(0xfd8240760cc8, 0x7f827ea48908, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at  pmap_page_remove+0x210: xchgq   %rax,0(%rcx,%rdx,1)

version:OpenBSD 6.5-current (GENERIC.MP) #129: Mon Jul 15 18:54:34 MDT 2
019\012dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

ddb{0}> trace
pmap_page_remove(fd8107d97b00) at pmap_page_remove+0x210
uvm_anfree(fd8214777360) at uvm_anfree+0x36
amap_wipeout(fd825d3dc130) at amap_wipeout+0xe5
uvm_unmap_detach(800021eba2d8,1) at uvm_unmap_detach+0xef
uvm_map_teardown(fd8279b37340) at uvm_map_teardown+0x1c1
uvmspace_free(fd8279b37340) at uvmspace_free+0x57
uvm_exit(80006d90) at uvm_exit+0x24
reaper(80008770) at reaper+0x13b
end trace frame: 0x0, count: -8

*52315  402033  0  0  7 0x14200reaper

ddb{0}> show register
rdi  0xa
rsi   0xfd8107d97b68
rbp   0x800021eba1f0
rbx0
rdx   0x7f80
rcx  0x20a8ea850
rax0
r80xfd810a6e5d80
r90x81d27ff0cpu_info_full_primary+0x1ff0
r10   0x45e35718a213a84d
r11   0xd63ebed56fdc728a
r12   0xfd8107d97b00
r13   0xfd823df88940
r14  0x27f7c2000
r15   0xfd8107d97b68
rip   0x817df8f0pmap_page_remove+0x210
cs   0x8
rflags   0x10246__ALIGN_SIZE+0xf246
rsp   0x800021eba190
ss  0x10
pmap_page_remove+0x210: xchgq   %rax,0(%rcx,%rdx,1)

ddb{1}> trace
x86_ipi_db(800021c80ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi(a,800021c80ff0,80009640,0,0,80009718) at Xresume_lapic_ipi+0x23
_kernel_lock() at _kernel_lock+0xae
timeout_del_barrier(80009718) at timeout_del_barrier+0xa2
msleep(81dacc6c,81dacbb8,7f,81a91b36,0) at msleep+0xf5
uvm_pagezero_thread(80009640) at uvm_pagezero_thread+0xa2
end trace frame: 0x0, count: -7

*63535   48641  0  0  7 0x14200zerothread

ddb{1}> show register
rdi   0x800021c80ff0
rsi0
rbp   0x800021ed2440
rbx   0x81d06168ipifunc+0x38
rdx0
rcx  0x7
rax   0xff7f
r8 0
r9 0
r100
r11   0x29c1412aeecf9c9a
r12  0x7
r130
r14   0x800021c80ff0
r150
rip   0x813572e2x86_ipi_db+0x12
cs   0x8
rflags 0x206
rsp   0x800021ed2430
ss  0x10
x86_ipi_db+0x12:leave

I will update the kernel and see whether the panic is reproducible.

bluhm



Re: gencat(1) problem

2019-07-24 Thread Ingo Schwarze
Hi Marcel,

Ingo Schwarze wrote on Wed, Jul 24, 2019 at 03:27:05PM +0200:
> Marcel Logen wrote on Tue, Jul 23, 2019 at 11:16:33PM +0200:
>> Ingo Schwarze wrote:

>>> 1. What exactly are you doing? (minimal test case, if possible)

> [...]
>> | t20$ gencat foo1.cat foo1.msg
>> | gencat: can't specify a message when no set exists on line 3
>> | 1 "foo
>> 
>> | t20$ gencat foo2.cat foo2.msg
> [...]

> Thank you for the report.  I am able to reproduce the described
> behaviour.  I shall investigate further and come back to you.

Oh, it seems to me the issue is actually quite easy to fix,
see the patch below.

While here, note that the automatic variable "setid" is a state
variable and must persist from one iteration of the main parsing
loop to the next, while "msgid" is not a state variable.  It is
used only inside the "else" clause and assigned to before each use.
So delete the misleading initializations of msgid.
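
The state handling described above can be sketched outside gencat as
follows.  This is a simplified model, not the gencat source: the
function name, the line-array interface and SKETCH_NL_SETD (assumed
to be 1) are all hypothetical and exist only for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NL_SETD	1	/* assumed value of the default set */

/*
 * Walk the lines of a message source and return the set that message
 * "want" ends up in, or -1 if that message never appears.
 */
static int
sketch_set_of_message(const char *const *lines, size_t nlines, int want)
{
	int setid = 0;		/* state: persists across iterations */
	int found = -1;
	size_t i;

	for (i = 0; i < nlines; i++) {
		const char *cptr = lines[i];
		int msgid;	/* not state: assigned before each use */

		if (*cptr == '$') {
			if (strncmp(cptr + 1, "set", 3) == 0 &&
			    isspace((unsigned char)cptr[4]))
				setid = atoi(cptr + 4);
			continue;	/* comments, $quote etc. ignored */
		}
		if (!isdigit((unsigned char)*cptr))
			continue;
		msgid = atoi(cptr);
		if (setid == 0)
			setid = SKETCH_NL_SETD;	/* no $set seen so far */
		if (msgid == want)
			found = setid;
	}
	return found;
}
```

With the five lines of foo1.msg, which contain no $set directive,
every message lands in set 1 in this model.  A preceding "$set 5"
line would place the following messages in set 5 instead.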

My impression is that no change is needed in the manual page.

OK?
  Ingo

P.S.
It appears this tool could use some auditing, KNF, and general
cleanup, but it is certainly not high priority...


Index: gencat.c
===================================================================
RCS file: /cvs/src/usr.bin/gencat/gencat.c,v
retrieving revision 1.19
diff -u -p -r1.19 gencat.c
--- gencat.c	28 Jun 2019 13:35:01 -0000	1.19
+++ gencat.c	24 Jul 2019 13:53:00 -0000
@@ -405,11 +405,12 @@ void
 MCParse(int fd)
 {
 	char   *cptr, *str;
-	int	setid, msgid = 0;
+	int	setid, msgid;
 	char	quote = 0;
 
 	/* XXX: init sethead? */
 
+	setid = 0;
 	while ((cptr = get_line(fd))) {
 		if (*cptr == '$') {
 			++cptr;
@@ -418,7 +419,6 @@ MCParse(int fd)
 				cptr = wskip(cptr);
 				setid = atoi(cptr);
 				MCAddSet(setid);
-				msgid = 0;
 			} else if (strncmp(cptr, "delset", 6) == 0) {
 				cptr += 6;
 				cptr = wskip(cptr);
@@ -462,6 +462,10 @@ MCParse(int fd)
 			} else {
 				warning(cptr, "neither blank line nor start of a message id");
 				continue;
+			}
+			if (setid == 0) {
+				setid = NL_SETD;
+				MCAddSet(setid);
 			}
 			/*
 			 * If we have a message ID, but no message,



Re: gencat(1) problem

2019-07-24 Thread Ingo Schwarze
Hi Marcel,

Marcel Logen wrote on Tue, Jul 23, 2019 at 11:16:33PM +0200:
> Ingo Schwarze wrote:

>> 1. What exactly are you doing? (minimal test case, if possible)

[...]
> | t20$ gencat foo1.cat foo1.msg
> | gencat: can't specify a message when no set exists on line 3
> | 1 "foo
> 
> | t20$ gencat foo2.cat foo2.msg
[...]

Thank you for the report.  I am able to reproduce the described
behaviour.  I shall investigate further and come back to you.

Yours,
  Ingo



Re: ifconfig bridge crashes host

2019-07-24 Thread Mischa



> On 23 Jul 2019, at 23:40, Hrvoje Popovski  wrote:
> 
> On 23.7.2019. 17:03, obs...@high5.nl wrote:
>>> Synopsis:   ifconfig bridge crashes host
>>> Category:   
>>> Environment:
>>  System  : OpenBSD 6.5
>>  Details : OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>>   
>> r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>>  Architecture: OpenBSD.amd64
>>  Machine : amd64
>>> Description:
>>  After running the command "ifconfig bridge" twice on the host, the host
>>  became unresponsive. I was able to capture the trace from the console.
>>> How-To-Repeat:
>>  The host was running for some time so I am uncertain if it's related to time,
>>  but I have seen this happening a couple of times now, and it seems running the
>>  "ifconfig bridge" command multiple times triggers this.
> 
> Hi,
> 
> can you update your box to the latest snapshot?
> There were some problems with the "ifconfig bridge" command a few months ago.

Will give that a go.

Thanx!

Mischa



gencat(1) problem

2019-07-24 Thread Marcel Logen
System: OpenBSD -current

| t20$ head -n 1 /etc/motd
| OpenBSD 6.5-current (GENERIC.MP) #132: Sat Jul 20 15:23:46 MDT 2019

| t20$ uname -a
| OpenBSD t20 6.5 GENERIC.MP#132 amd64

| t20$ arch
| OpenBSD.amd64

1. What exactly are you doing? (minimal test case, if possible)

| t20$ pwd
| /home/user20/ybtra-t20/gencat-test23

| t20$ ls -l 
| total 16
| -rw-r--r--  1 user20  user20  43 Jul 23 15:35 foo1.msg
| -rw-r--r--  1 user20  user20  50 Jul 23 15:37 foo2.msg

| t20$ head -n 1000 *msg
| ==> foo1.msg <==
| $ comment
| $quote "
| 1 "foo"
| 2 "bar"
| 3 "baz"
| 
| ==> foo2.msg <==
| $ comment
| $quote "
| $set 1
| 1 "foo"
| 2 "bar"
| 3 "baz"

| t20$ gencat foo1.cat foo1.msg
| gencat: can't specify a message when no set exists on line 3
| 1 "foo

| t20$ gencat foo2.cat foo2.msg

| t20$ ls -l
| total 24
| -rw-r--r--  1 user20  user20  43 Jul 23 15:35 foo1.msg
| -rw-r--r--  1 user20  user20  80 Jul 23 22:03 foo2.cat
| -rw-r--r--  1 user20  user20  50 Jul 23 15:37 foo2.msg

2. What do you expect to happen?

"gencat foo1.cat foo1.msg" should work, i.e. produce the file "foo1.cat".

3. What happens instead?

"gencat: can't specify a message when no set exists on line 3"

Error description:

According to POSIX [1] and to the man page of "gencat" [2],
it should also work without "$set":

| If no $set directive is specified in a message text source file,
| all messages shall be located in an implementation-defined
| default message set NL_SETD [...] [1]

| If no $set directive is specified in a given source file, all
| messages will be located in the default message set NL_SETD. [2]
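
On the consumer side, a program would read such messages from the
default set with catopen(3) and catgets(3).  The following is only a
sketch under assumptions: sketch_lookup() and its buffer interface
are hypothetical, and the catalog path is whatever the caller passes
in.  It copies the message out before closing the catalog and falls
back to the supplied default when the catalog cannot be opened.

```c
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy message msgid from the default set NL_SETD of the catalog at
 * catpath into buf.  Returns 0 if the catalog was opened, -1 if not;
 * in the latter case buf receives the fallback text.
 */
static int
sketch_lookup(const char *catpath, int msgid, const char *fallback,
    char *buf, size_t buflen)
{
	nl_catd catd;

	catd = catopen(catpath, 0);
	if (catd == (nl_catd)-1) {
		snprintf(buf, buflen, "%s", fallback);
		return -1;
	}
	/* catgets() returns the fallback if the message is missing. */
	snprintf(buf, buflen, "%s", catgets(catd, NL_SETD, msgid, fallback));
	catclose(catd);
	return 0;
}
```

A caller would pass the path of a catalog produced by gencat, for
example the foo2.cat file from the listing above, together with a
message number and a default string.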

[1] 

[2]