Re: Radeon HD 5450 panic

2018-03-20 Thread Jonathan Gray
On Tue, Mar 20, 2018 at 02:26:19AM -0600, Anthony J. Bentley wrote:
> Hi,
> 
> Since updating this machine yesterday I've noticed repeated hangs during
> light usage. This is easily reproducible by, e.g., enabling TrueType
> fonts in XTerm, then running "cat /etc/ssl/cert.pem". Happens with a
> single screen or multiple.
> 
> I think the previous snapshot this was running was about a month old,
> likely before the radeon update (which I neither noticed nor tested --
> for which I must apologize).

Which radeon update are you referring to?

2016/11/13 xf86-video-ati 7.7.1
2018/02/20 xf86-video-ati 7.10.0
2018/03/13 xf86-video-ati 18.0.0
2018/03/17 xf86-video-ati 18.0.1

Does this not occur with xf86-video-ati 7.7.1 ?

> 
> panic: kernel diagnostic assertion "!(!radeon_bo_is_reserved(bo) && 
> !force_drop)" failed: file "/usr/src/sys/dev/pci/drm/radeon/radeon_object.c", 
> line 547
> Stopped at db_enter+0x5: popq %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *382735  71010   1001        0x32          0    2K Xorg
> db_enter() at db_enter+0x5
> panic() at panic+0x129
> __assert(815786f4,800023b0d350,1,80df9708) at 
> __assert+0x24
> 
> radeon_bo_fault_reserve_notify(1) at radeon_bo_fault_reserve_notify+0x1cc
> ttm_bo_vm_fault(8013f408,800023b0d528,800023b0d560,1,2,80df9708)
>  at ttm_bo_vm_fault+0x5e
> radeon_ttm_fault(0,800023b0d528,0,2,,0) at 
> radeon_ttm_fault+0x63
> uvm_fault(800023b0d700,800023bc3ac8,ff0722e02600,800023b08000)
>  at uvm_fault+0x6eb
> trap() at trap+0x445
> --- trap (number 6) ---
> end of kernel
> end trace frame: 0x7f7c5500, count: 7
> 0x388a8d8509:
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{2}>
> 
> (Sorry for no further information: when this happened I didn't have
> machdep.forceukbd set, and of the 8 or so hangs I've triggered since none
> have gone to ddb.)
> 
> OpenBSD 6.3 (GENERIC.MP) #71: Sun Mar 18 12:28:59 MDT 2018
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 34283859968 (32695MB)
> avail mem = 33237762048 (31698MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec2b0 (89 entries)
> bios0: vendor American Megatrends Inc. version "2205" date 10/09/2014
> bios0: ASUS All Series
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT SSDT SSDT SSDT MCFG HPET SSDT SSDT
> acpi0: wakeup devices PEGP(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) 
> UAR1(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) 
> RP04(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz, 3292.85 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> acpitimer0: recalibrated TSC frequency 3292381586 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz, 3292.39 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz, 3292.39 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz, 3292.39 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8

Re: Undefined reference to functions that are a part of amd64 ABI

2018-03-30 Thread Jonathan Gray
On Fri, Mar 30, 2018 at 09:49:32AM +0530, Utkarsh Anand wrote:
> Hello! I was trying to compile neovim (https://github.com/neovim/neovim)
> in OpenBSD 6.2 and I get the following error when compiling LuaJIT, 
> which is a dependency for neovim:
> http://termbin.com/5zyd
> The functions _Unwind_* are a part of amd64 ABI:
> https://uclibc.org/docs/psABI-x86_64.pdf
> Do I need to link a library externally for that? It compiles directly 
> on other systems like NetBSD, Linux, etc.
> 

The luajit port links libc++abi for those symbols on clang archs.
With gcc they are included when libgcc is linked by default.

neovim is also ported as editors/neovim.
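
As a rough illustration of which symbols are involved (this is not taken from the luajit port), here is a minimal C sketch that calls _Unwind_Backtrace() from <unwind.h>. The names count_frame() and count_frames() are made up for the example. Assuming a gcc toolchain, it should link with no extra flags because libgcc is pulled in by default; on a clang arch, going by the reply above, the same source would need libc++abi linked in (e.g. -lc++abi).

```c
#include <stdio.h>
#include <unwind.h>

/*
 * Callback invoked by the unwinder once per stack frame.
 * arg points at the caller's frame counter.
 */
static _Unwind_Reason_Code
count_frame(struct _Unwind_Context *ctx, void *arg)
{
	int *nframes = arg;

	(void)ctx;		/* the context is not inspected here */
	(*nframes)++;
	return _URC_NO_REASON;	/* keep walking */
}

/*
 * Hypothetical helper: walk the current stack and return the number
 * of frames the unwinder reported.
 */
int
count_frames(void)
{
	int nframes = 0;

	_Unwind_Backtrace(count_frame, &nframes);
	return nframes;
}

/* Print the result; the exact count depends on compiler and flags. */
void
print_frames(void)
{
	printf("frames walked: %d\n", count_frames());
}
```

Calling print_frames() from main() is enough to force the _Unwind_* reference, so a link failure here would reproduce the same class of error as in the report.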



Re: OpenBSD 6.3 Kernel Panic?

2018-04-15 Thread Jonathan Gray
On Sun, Apr 15, 2018 at 10:04:13PM -0700, Mike Larkin wrote:
> On Sun, Apr 15, 2018 at 11:47:45PM -0500, Juan Morado wrote:
> > System:  OpenBSD 6.3
> > Details: OpenBSD 6.3 (GENERIC.MP) #107: Sat Mar 24 14:21:59 MDT 2018
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/
> > GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine:  amd64
> > 
> > Description:  Touching the track pad of this ASUS X541SA laptop during boot
> > causes a kernel panic when running 6.3. Touching/using the trackpad after X
> > starts causes the system to become unresponsive (hang).
> > 
> > How to repeat: Reboot the system, swipe a finger on the track pad while
> > booting. Or use the track pad after the machine has booted and X has
> > started.
> > 
> > Fix: no known workaround or fix
> > 
> > "show panic", trace, ps, and "show registers" output attached in
> > ddb_trace.txt
> > 
> > dmesg.txt contains the dmesg.
> 
> > addr 1: xHCI root hub, Intel
> >  addr 2: USB Flash Disk, General
> 
> 
> jsg, kettenis? Thoughts?
> 
> -ml
> 
> 
> 
> 
> 
> > Domain /dev/pci0:
> >  0:0:0: Intel Braswell Host
> >  0:2:0: Intel HD Graphics
> >  0:11:0: Intel Braswell Power
> >  0:19:0: Intel Braswell AHCI
> >  0:20:0: Intel Braswell xHCI
> >  0:26:0: Intel Braswell TXE
> >  0:27:0: Intel Braswell HD Audio
> >  0:28:0: Intel Braswell PCIE
> >  0:28:2: Intel Braswell PCIE
> >  0:28:3: Intel Braswell PCIE
> >  0:31:0: Intel Braswell PCU LPC
> >  0:31:3: Intel Braswell SMBus
> >  2:0:0: Realtek 8101E
> >  3:0:0: Realtek 8191SE
> 
> > System:  OpenBSD 6.3
> > Details: OpenBSD 6.3 (GENERIC.MP) #107: Sat Mar 24 14:21:59 MDT 2018
> > 
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine:  amd64
> > 
> > Description:  Touching the track pad of this ASUS X541SA laptop during boot 
> > causes a kernel panic. Touching/using the trackpad after X starts causes 
> > the system to become unresponsive (hang).
> > 
> > How to repeat: Reboot the system, swipe a finger on the track pad while 
> > booting. Or use the track pad after the machine has booted and X has 
> > started. 
> > 
> > Fix: no known workaround or fix
> > 
> > 
> > ddb{0}> show panic
> > kernel page fault
> > uvm_fault(0x81afca80, 0x806f8000, 0, 1) -> e
> > ihidev_intr(800d9300) at ihidev_intr+0x18a
> > end trace frame: 0x8000314a3ac8, count: 0
> > 
> > ddb{0}> trace
> > ihidev_intr(800d9300) at ihidev_intr+0x18a
> > intr_handler(0,80127280) at intr_handler+0x57
> > Xintr_ioapic_level17_untramp() at Xintr_ioapic_level17_untramp+0x12c
> > --- interrupt ---
> > acpicpu_idle() at acpicpu_idle+0x1d9
> > cpu_idle_cycle(0,0,81abbff0,817e4290,817e44b5,fff181c1f0)
> >  at cpu_idle_cycle+0x10
> > end trace frame: 0x0, count: -5

I'm curious if disestablishing the interrupt handler in attach in the
early returns before sc_ibuf has been allocated changes anything.

Index: ihidev.c
===================================================================
RCS file: /cvs/src/sys/dev/i2c/ihidev.c,v
retrieving revision 1.16
diff -u -p -r1.16 ihidev.c
--- ihidev.c	12 Jan 2018 08:11:47 -0000	1.16
+++ ihidev.c	16 Apr 2018 06:17:48 -0000
@@ -139,8 +139,14 @@ ihidev_attach(struct device *parent, str
 	    (char *)ia->ia_cookie);
 
 	sc->sc_nrepid = ihidev_maxrepid(sc->sc_report, sc->sc_reportlen);
-	if (sc->sc_nrepid < 0)
+	if (sc->sc_nrepid < 0) {
+		printf("%s: nrepid %d\n", sc->sc_dev.dv_xname, sc->sc_nrepid);
+		if (sc->sc_ih) {
+			intr_disestablish(sc->sc_ih);
+			sc->sc_ih = NULL;
+		}
 		return;
+	}
 
 	printf("%s: %d report id%s\n", sc->sc_dev.dv_xname, sc->sc_nrepid,
 	    sc->sc_nrepid > 1 ? "s" : "");
@@ -150,6 +156,10 @@ ihidev_attach(struct device *parent, str
 	    M_DEVBUF, M_NOWAIT | M_ZERO);
 	if (sc->sc_subdevs == NULL) {
 		printf("%s: failed allocating memory\n", sc->sc_dev.dv_xname);
+		if (sc->sc_ih) {
+			intr_disestablish(sc->sc_ih);
+			sc->sc_ih = NULL;
+		}
 		return;
 	}
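
To make the idea behind the diff concrete, here is a small userland toy model in C. It is not the ihidev code: struct toy_softc, toy_attach() and toy_interrupt() are invented for illustration. It also assumes, without confirmation from the report, that the fault comes from the handler running while the input buffer is still unallocated. The sketch only shows why leaving a handler established across an early return can expose a NULL buffer, and how disestablishing it on the error path closes that window.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a driver softc; not the real ihidev_softc. */
struct toy_softc {
	int		(*sc_ih)(struct toy_softc *);	/* handler or NULL */
	unsigned char	*sc_ibuf;	/* input buffer, allocated late */
	size_t		 sc_isize;
};

/*
 * Toy interrupt handler.  A real handler would simply dereference
 * sc_ibuf; here an unallocated buffer is reported as -1 instead of
 * faulting.
 */
static int
toy_intr(struct toy_softc *sc)
{
	if (sc->sc_ibuf == NULL)
		return -1;
	memset(sc->sc_ibuf, 0, sc->sc_isize);
	return 0;
}

/*
 * Toy attach: the handler is established first and the buffer is only
 * allocated at the end, so every early return sits in between.
 * nrepid < 0 models the early-return case from the diff.
 */
int
toy_attach(struct toy_softc *sc, int nrepid, int disestablish_on_error)
{
	sc->sc_ih = toy_intr;
	sc->sc_ibuf = NULL;
	sc->sc_isize = 0;

	if (nrepid < 0) {
		if (disestablish_on_error)
			sc->sc_ih = NULL;	/* what the diff adds */
		return -1;
	}

	sc->sc_isize = 64;
	sc->sc_ibuf = malloc(sc->sc_isize);
	if (sc->sc_ibuf == NULL) {
		sc->sc_ih = NULL;
		return -1;
	}
	return 0;
}

/*
 * Deliver one interrupt.  Returns 1 if a handler ran against an
 * unallocated buffer (the hazard), 0 otherwise.
 */
int
toy_interrupt(struct toy_softc *sc)
{
	if (sc->sc_ih == NULL)
		return 0;
	return sc->sc_ih(sc) == -1 ? 1 : 0;
}
```

With disestablish_on_error set to 0 a failed attach followed by toy_interrupt() reports the hazard; with it set to 1 the interrupt is simply ignored.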
 



Re: panic in radeondrm_attachhook

2018-04-18 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 11:40:06AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> my i386 regression test machine crashed with the Tue Apr 17 snapshot
> in radeondrm_attachhook().

So the machine doesn't have /etc/firmware/radeon/R100_cp.bin ?

I have an i386 laptop that loads r100 microcode which works.

> 
> uvm_fault(0xd0ccfb70, 0xd11e, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  pmap_page_remove_pae+0x18:  cmpl    $0,0x48(%edi)
> 
> pmap_page_remove_pae(d11e0800) at pmap_page_remove_pae+0x18
> ttm_tt_destroy(d5763980) at ttm_tt_destroy+0x61
> ttm_bo_cleanup_refs_or_queue(d572cb28) at ttm_bo_cleanup_refs_or_queue+0x2b3
> ttm_bo_unref(d0f61ed8) at ttm_bo_unref+0x69
> radeon_bo_unref(d572cb28) at radeon_bo_unref+0x27
> radeon_wb_fini(d572b000) at radeon_wb_fini+0x49
> r100_init(d572b000) at r100_init+0x409
> radeon_device_init(d572b000,d56ff400,d56ff43c,840001) at 
> radeon_device_init+0x797
> radeondrm_attachhook(d572b000) at radeondrm_attachhook+0x2b
> config_process_deferred_mountroot() at config_process_deferred_mountroot+0x2c
> main(0) at main+0x7bf
> 
> Full console output below.
> 
> bluhm
> 
> >> OpenBSD/i386 BOOT 3.31
> boot> 
> booting hd0a:/bsd: 8620607+2356228+188436+0+1101824 
> [695508+98+513600+533177]=0xd5e848
> entry point at 0x2000d4
> 
> [ using 1742920 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>   The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2018 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 6.3-current (GENERIC.MP) #559: Tue Apr 17 10:11:13 MDT 2018
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> real mem  = 2145783808 (2046MB)
> avail mem = 2092011520 (1995MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: date 07/12/06, BIOS32 rev. 0 @ 0xfd450, SMBIOS rev. 2.51 @ 
> 0x7feea000 (35 entries)
> bios0: vendor Phoenix Technologies LTD version "6.00" date 07/12/2006
> bios0: Supermicro PDSM4+
> acpi0 at bios0: rev 0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP MCFG APIC BOOT SSDT
> acpi0: wakeup devices PXHA(S5) PXHB(S5) DEV3(S5) EXP1(S5) EXP5(S5) EXP6(S5) 
> PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) USB1(S4) USB2(S4) USB3(S4) 
> USB4(S4) EUSB(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0 addr 0xf000, bus 0-14
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 
> GHz
> cpu0: 
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,LAHF,PERF,SENSOR
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 266MHz
> cpu0: mwait min=64, max=64, C-substates=0.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 
> GHz
> cpu1: 
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,LAHF,PERF,SENSOR
> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
> ioapic1 at mainbus0: apid 3 pa 0xfecc, version 20, 24 pins
> ioapic2 at mainbus0: apid 4 pa 0xfecc0400, version 20, 24 pins
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 2 (PXHA)
> acpiprt2 at acpi0: bus 3 (PXHB)
> acpiprt3 at acpi0: bus -1 (DEV3)
> acpiprt4 at acpi0: bus 9 (EXP1)
> acpiprt5 at acpi0: bus 13 (EXP5)
> acpiprt6 at acpi0: bus 14 (EXP6)
> acpiprt7 at acpi0: bus 15 (PCIB)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicmos0 at acpi0
> "PNP0A05" at acpi0 not configured
> acpibtn0 at acpi0: PWRB
> bios0: ROM list: 0xc/0xb000 0xcb000/0x1000 0xcc000/0x1000 0xcd000/0x2600
> ipmi at mainbus0 not configured
> cpu0: Enhanced SpeedStep disabled by BIOS
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> pchb0 at pci0 dev 0 function 0 "Intel E7230 Host" rev 0xc0
> ppb0 at pci0 dev 1 function 0 "Intel E7230 PCIE" rev 0xc0: apic 2 int 16
> pci1 at ppb0 bus 1
> ppb1 at pci1 dev 0 function 0 "Intel 6700PXH PCIE-PCIX" rev 0x09
> pci2 at ppb1 bus 2
> em0 at pci2 dev 2 function 0 "Intel 82545GM" rev 0x04: apic 3 int 4, address 
> 00:04:23:cd:41:fb
> "Intel IOxAPIC" rev 0x09 at pci1 dev 0 function 1 not configured
> ppb2 at pci1 dev 0 function 2 "Intel 6700PXH PCIE-PCIX" rev 0x09
> pci3 at ppb2 bus 3
> em1 at pci3 dev 1 function 0 "Intel 82545GM" rev 0x04: apic 4 int 0, address 
> 00:1b:21:0e:6e:8e
> ami0 at pci3 dev 3 function 0 "Symbios Logic MegaRAID" rev 0x01: apic 4 int 4
> ami0: LSI 520, 64b/lhc, FW 1L47, BIOS vG121, 128MB RAM
> ami0: 1 channels, 0 FC loops, 2 logical drives
> scsibus1 at ami0: 40 targets
> sd0 at scsibus1 targ 0 lun 0:  SCSI2 0/direct

Re: panic in radeondrm_attachhook

2018-04-18 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 08:30:16PM +1000, Jonathan Gray wrote:
> On Wed, Apr 18, 2018 at 11:40:06AM +0200, Alexander Bluhm wrote:
> > Hi,
> > 
> > my i386 regression test machine crashed with the Tue Apr 17 snapshot
> > in radeondrm_attachhook().
> 
> So the machine doesn't have /etc/firmware/radeon/R100_cp.bin ?
> 
> I have an i386 laptop that loads r100 microcode which works.

Even after removing the firmware I don't hit this.  Machine doesn't have
PAE though.

OpenBSD 6.3-current (GENERIC) #552: Tue Apr 17 22:07:30 MDT 2018
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
real mem  = 1341014016 (1278MB)
avail mem = 1301983232 (1241MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: date 06/02/06, BIOS32 rev. 0 @ 0xfd750, SMBIOS rev. 2.33 @ 
0xe0010 (61 entries)
bios0: vendor IBM version "1RETDPWW (3.21 )" date 06/02/2006
bios0: IBM 2378JZM
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT ECDT TCPA BOOT
acpi0: wakeup devices LID_(S3) SLPB(S3) UART(S3) PCI0(S3) PCI1(S4) USB0(S3) 
USB1(S3) AC9M(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (AGP_)
acpiprt2 at acpi0: bus 2 (PCI1)
acpipwrres0 at acpi0: PUBS, resource for USB0, USB1, USB7
acpitz0 at acpi0: critical temperature is 93 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpicmos0 at acpi0
"IBM0057" at acpi0 not configured
"IBM0071" at acpi0 not configured
acpibat0 at acpi0: BAT0 model "IBM-92P1011" serial  3457 type LION oem "SONY"
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0
acpidock0 at acpi0: DOCK not docked (0)
acpivideo0 at acpi0: VID_
bios0: ROM list: 0xc/0x1 0xd/0x1000 0xd1000/0x1000 0xdc000/0x4000! 
0xe/0x1
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Pentium(R) M processor 1.60GHz ("GenuineIntel" 686-class) 1.60 
GHz
cpu0: 
FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,PBE,EST,TM2,PERF
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: Enhanced SpeedStep 1595 MHz: speeds: 1600, 1400, 1200, 1000, 800, 600 MHz
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82855PM Host" rev 0x03
intelagp0 at pchb0
agp0 at intelagp0: aperture at 0xd000, size 0x1000
ppb0 at pci0 dev 1 function 0 "Intel 82855PM AGP" rev 0x03
pci1 at ppb0 bus 1
radeondrm0 at pci1 dev 0 function 0 "ATI Radeon Mobility M7" rev 0x00
drm0 at radeondrm0
radeondrm0: irq 11
...
root on wd0a (a84015999566fd16.a) swap on wd0b dump on wd0b
initializing kernel modesetting (RV200 0x1002:0x4C57 0x1014:0x0530).
radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
error: [drm:pid0:r100_cp_init] *ERROR* Failed to load firmware!
drm:pid0:r100_startup *ERROR* failed initializing CP (-2).
drm:pid0:r100_init *ERROR* Disabling GPU acceleration
ttm_pool_mm_shrink_fini: stub
radeondrm0: 1024x768, 8bpp
wsdisplay0 at radeondrm0 mux 1: console (std, vt100 emulation), using wskbd0
wsdisplay0: screen 1-5 added (std, vt100 emulation)


> 
> > 
> > uvm_fault(0xd0ccfb70, 0xd11e, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  pmap_page_remove_pae+0x18:  cmpl    $0,0x48(%edi)
> > 
> > pmap_page_remove_pae(d11e0800) at pmap_page_remove_pae+0x18
> > ttm_tt_destroy(d5763980) at ttm_tt_destroy+0x61
> > ttm_bo_cleanup_refs_or_queue(d572cb28) at ttm_bo_cleanup_refs_or_queue+0x2b3
> > ttm_bo_unref(d0f61ed8) at ttm_bo_unref+0x69
> > radeon_bo_unref(d572cb28) at radeon_bo_unref+0x27
> > radeon_wb_fini(d572b000) at radeon_wb_fini+0x49
> > r100_init(d572b000) at r100_init+0x409
> > radeon_device_init(d572b000,d56ff400,d56ff43c,840001) at 
> > radeon_device_init+0x797
> > radeondrm_attachhook(d572b000) at radeondrm_attachhook+0x2b
> > config_process_deferred_mountroot() at 
> > config_process_deferred_mountroot+0x2c
> > main(0) at main+0x7bf
> > 
> > Full console output below.
> > 
> > bluhm
> > 
> > >> OpenBSD/i386 BOOT 3.31
> > boot> 
> > booting hd0a:/bsd: 8620607+2356228+188436+0+1101824 
> > [695508+98+513600+533177]=0xd5e848
> > entry point at 0x2000d4
> > 
> > [ using 1742920 bytes of bsd ELF symbol table ]
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> >   The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2018 OpenBSD. All rights reserved.  
> > https://www.OpenBSD.org
> > 
> > OpenBSD 6.3-current (GENERIC.MP) #559: Tue Apr 17 10:11:13 MDT 2018
> > dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > real mem  = 2145783

Re: panic in radeondrm_attachhook

2018-04-18 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 02:15:50PM +0200, Alexander Bluhm wrote:
> On Wed, Apr 18, 2018 at 08:30:16PM +1000, Jonathan Gray wrote:
> > On Wed, Apr 18, 2018 at 11:40:06AM +0200, Alexander Bluhm wrote:
> > > my i386 regression test machine crashed with the Tue Apr 17 snapshot
> > > in radeondrm_attachhook().
> > 
> > So the machine doesn't have /etc/firmware/radeon/R100_cp.bin ?
> 
> No firmware installed.  And I don't use vga, console is serial.
> 
> bluhm
> 

sys/dev/pci/drm/drm_linux.h rev 1.85 should help

Though there is another problem I noticed.  When the firmware is not present
radeondrm is detached.  With the diff in snapshots a bus_space_unmap() call
is made with the wrong bus space tag in that path.

diff --git sys/dev/pci/drm/radeon/radeon_device.c sys/dev/pci/drm/radeon/radeon_device.c
index 9085bf845c4..ec631f66cd9 100644
--- sys/dev/pci/drm/radeon/radeon_device.c
+++ sys/dev/pci/drm/radeon/radeon_device.c
@@ -1582,7 +1582,7 @@ void radeon_device_fini(struct radeon_device *rdev)
 	rdev->rmmio = NULL;
 #else
 	if (rdev->rio_mem_size > 0)
-		bus_space_unmap(rdev->memt, rdev->rio_mem, rdev->rio_mem_size);
+		bus_space_unmap(rdev->iot, rdev->rio_mem, rdev->rio_mem_size);
 	rdev->rio_mem_size = 0;
 
 	if (rdev->rmmio_size > 0)
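
For anyone unfamiliar with why the tag matters, the following is a toy userland model in C, not the kernel's bus_space or extent code. The names toy_map(), toy_unmap(), TOY_IOT and TOY_MEMT are invented, and it assumes each tag keeps its own independent bookkeeping of mapped regions. Under that assumption, a region mapped through one tag is simply unknown to the other, so an unmap through the wrong tag cannot find it. The toy returns -1 in that case; a kernel may well panic instead.

```c
#include <stddef.h>

/* Two hypothetical address spaces, each with separate bookkeeping. */
enum toy_tag { TOY_IOT, TOY_MEMT, TOY_NTAGS };

#define TOY_MAXREGIONS	8

struct toy_region {
	unsigned long	base;
	unsigned long	size;
	int		used;
};

/* One table of mapped regions per tag. */
static struct toy_region toy_extents[TOY_NTAGS][TOY_MAXREGIONS];

/* Record a mapping under tag t.  Returns 0 on success, -1 if full. */
int
toy_map(enum toy_tag t, unsigned long base, unsigned long size)
{
	size_t i;

	for (i = 0; i < TOY_MAXREGIONS; i++) {
		if (!toy_extents[t][i].used) {
			toy_extents[t][i].base = base;
			toy_extents[t][i].size = size;
			toy_extents[t][i].used = 1;
			return 0;
		}
	}
	return -1;
}

/*
 * Remove a mapping from tag t.  Returns 0 on success, -1 if that tag
 * has no such region, which is what happens when the tag passed here
 * differs from the one used for toy_map().
 */
int
toy_unmap(enum toy_tag t, unsigned long base, unsigned long size)
{
	size_t i;

	for (i = 0; i < TOY_MAXREGIONS; i++) {
		struct toy_region *r = &toy_extents[t][i];

		if (r->used && r->base == base && r->size == size) {
			r->used = 0;
			return 0;
		}
	}
	return -1;
}
```

Mapping a region with TOY_IOT and then unmapping it with TOY_MEMT fails, while unmapping with TOY_IOT succeeds, which mirrors the one-word change in the diff.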



Re: panic in radeondrm_attachhook

2018-04-18 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 03:09:08PM +0200, Alexander Bluhm wrote:
> On Wed, Apr 18, 2018 at 10:29:30PM +1000, Jonathan Gray wrote:
> > sys/dev/pci/drm/drm_linux.h rev 1.85 should help
> 
> I have compiled and successfully booted a kernel with that commit.
> 
> bluhm
> 

That's with the big radeon update diff as well?  Those functions are not
called for radeon otherwise.



Re: panic in radeondrm_attachhook

2018-04-18 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 03:29:10PM -0700, Carlos Cardenas wrote:
> Howdy.
> 
> Please excuse this poor bug report, I'm not able to obtain more info as
> the system locks hard (below is transcribed by hand).
> 
> On a clean install, with the latest snap (as of time of email):
> OpenBSD 6.3-current (GENERIC.MP) #207: Wed Apr 18 11:37:15 MDT 2018
> 
> CPU: AMD A8-7670K Radeon R7 aka Kaveri
> 
> extent_free(819489f8,100,1000,7e281000) at extent_free+0x110
> bus_space_unmap(0,800de000,0) at bus_space_unmap+0x110
> radeon_device_fini(800de000) at radeon_device_fini+0x43
> radeondrm_detach_kms(0,800de000) at radeondrm_detach_kms+0x33
> config_detach(800dd600,800de000) at config_detach+0x14e
> radeondrm_attachhook(81bf9420) at radeondrm_attachhook+0x9d
> config_process_deferred_mountroot() at config_process_deferred_mountroot+0x56
> main(6dd84e08) at main+0x80f

See the other thread on bugs.  Not sure when a snapshot with the fix
will go out.

diff --git sys/dev/pci/drm/radeon/radeon_device.c sys/dev/pci/drm/radeon/radeon_device.c
index 9085bf845c4..ec631f66cd9 100644
--- sys/dev/pci/drm/radeon/radeon_device.c
+++ sys/dev/pci/drm/radeon/radeon_device.c
@@ -1582,7 +1582,7 @@ void radeon_device_fini(struct radeon_device *rdev)
 	rdev->rmmio = NULL;
 #else
 	if (rdev->rio_mem_size > 0)
-		bus_space_unmap(rdev->memt, rdev->rio_mem, rdev->rio_mem_size);
+		bus_space_unmap(rdev->iot, rdev->rio_mem, rdev->rio_mem_size);
 	rdev->rio_mem_size = 0;
 
 	if (rdev->rmmio_size > 0)



Re: panic in radeondrm_attachhook

2018-04-18 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 06:59:42PM -0700, Carlos Cardenas wrote:
> On Thu, Apr 19, 2018 at 09:35:19AM +1000, Jonathan Gray wrote:
> > On Wed, Apr 18, 2018 at 03:29:10PM -0700, Carlos Cardenas wrote:
> > > Howdy.
> > > 
> > > Please excuse this poor bug report, I'm not able to obtain more info as
> > > the system locks hard (below is transcribed by hand).
> > > 
> > > On a clean install, with the latest snap (as of time of email):
> > > OpenBSD 6.3-current (GENERIC.MP) #207: Wed Apr 18 11:37:15 MDT 2018
> > > 
> > > CPU: AMD A8-7670K Radeon R7 aka Kaveri
> > > 
> > > extent_free(819489f8,100,1000,7e281000) at extent_free+0x110
> > > bus_space_unmap(0,800de000,0) at bus_space_unmap+0x110
> > > radeon_device_fini(800de000) at radeon_device_fini+0x43
> > > radeondrm_detach_kms(0,800de000) at radeondrm_detach_kms+0x33
> > > config_detach(800dd600,800de000) at config_detach+0x14e
> > > radeondrm_attachhook(81bf9420) at radeondrm_attachhook+0x9d
> > > config_process_deferred_mountroot() at 
> > > config_process_deferred_mountroot+0x56
> > > main(6dd84e08) at main+0x80f
> > 
> > See the other thread on bugs.  Not sure when a snapshot with the fix
> > will go out.
> 
> I'm tracking bluhm's thread as well.
> 
> With snap #209 (which has that patch), we get further along from a clean
> install:
> 
> initializing kernel modesetting (KAVERI 0x1002:0x1313 0x1462:0x7969).
> cik_cp: Failed to load firmware "radeon/KAVERI_pfp.bin"
> error: [drm:pid0:cik_init] *ERROR* Failed to load Firmware!
> drm:pid0:radeondrm_attachhook *ERROR* Fatal error during GPU init
> ttm_pool_mm_shrink_fini: stub
> drm0 detached
> radeondrm0 detached
> vendor "ATI", unknown product 0x1313 (class display subclass VGA, rev 0xd4) at
> pci0 dev 1 function 0 not configured
> init: can't open /dev/console: Device not configured
> init: can't open /dev/console: Device not configured
> init: can't open /dev/console: Device not configured
> [repeats]

So I'm quite sure you are booting via uefi with efifb here.
There was previously no code that reprobed efifb as efifb isn't probed
via pci devices.

The following on top of radeon.diff.2 works here when removing the
firmware and booting via efiboot on a mullins system.

diff --git sys/arch/amd64/amd64/efifb.c sys/arch/amd64/amd64/efifb.c
index 609de484ae0..42ae666540a 100644
--- sys/arch/amd64/amd64/efifb.c
+++ sys/arch/amd64/amd64/efifb.c
@@ -484,6 +484,12 @@ efifb_cndetach(void)
 	efifb_console.detached = 1;
 }
 
+void
+efifb_cnreattach(void)
+{
+	efifb_console.detached = 0;
+}
+
 int
 efifb_cb_cnattach(void)
 {
diff --git sys/arch/amd64/amd64/mainbus.c sys/arch/amd64/amd64/mainbus.c
index ed6ab059329..835e798ef2b 100644
--- sys/arch/amd64/amd64/mainbus.c
+++ sys/arch/amd64/amd64/mainbus.c
@@ -261,6 +261,20 @@ mainbus_attach(struct device *parent, struct device *self, void *aux)
 #endif
 }
 
+void
+mainbus_efifb_reattach(void)
+{
+	union mainbus_attach_args mba;
+	struct device *self = device_mainbus();
+#if NEFIFB > 0
+	if (bios_efiinfo != NULL || efifb_cb_found()) {
+		efifb_cnreattach();
+		mba.mba_eaa.eaa_name = "efifb";
+		config_found(self, &mba, mainbus_print);
+	}
+#endif
+}
+
 int
 mainbus_print(void *aux, const char *pnp)
 {
diff --git sys/arch/amd64/include/efifbvar.h sys/arch/amd64/include/efifbvar.h
index f5e2bb26cae..a213811cba6 100644
--- sys/arch/amd64/include/efifbvar.h
+++ sys/arch/amd64/include/efifbvar.h
@@ -28,6 +28,7 @@ struct pci_attach_args;
 int efifb_cnattach(void);
 int efifb_is_console(struct pci_attach_args *);
 void efifb_cndetach(void);
+void efifb_cnreattach(void);
 
 int efifb_cb_found(void);
 int efifb_cb_cnattach(void);
diff --git sys/dev/pci/drm/radeon/radeon_kms.c sys/dev/pci/drm/radeon/radeon_kms.c
index 6523819ba04..ce2ac47fb84 100644
--- sys/dev/pci/drm/radeon/radeon_kms.c
+++ sys/dev/pci/drm/radeon/radeon_kms.c
@@ -47,6 +47,7 @@ extern int vga_console_attached;
 
 #ifdef __amd64__
 #include "efifb.h"
+#include 
 #endif
 
 #if NEFIFB > 0
@@ -642,6 +643,8 @@ radeondrm_attach_kms(struct device *parent, struct device *self, void *aux)
 	config_mountroot(self, radeondrm_attachhook);
 }
 
+extern void mainbus_efifb_reattach(void);
+
 int
 radeondrm_forcedetach(struct radeon_device *rdev)
 {
@@ -653,8 +656,19 @@ radeondrm_forcedetach(struct radeon_device *rdev)
 	vga_console_attached = 0;
 #endif
 
-	config_detach(&rdev->self, 0);
-	return pci_probe_device(sc, tag, NULL, NULL);
+	/* reprobe pci device for non efi systems */
+#if NEFIFB > 0
+	if (bios_efiinfo == NULL && !efifb_cb_found()) {
+#endif
+		config_detach(&rdev->self, 0);
+		return pci_probe_device(sc, tag, NULL, NULL);
+#if NEFIFB > 0
+	} else if (rdev->console) {
+		mainbus_efifb_reattach();
+	}
+#endif
+
+	return 0;
 }
 
 void



Re: panic in radeondrm_attachhook

2018-04-19 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 08:30:16PM +1000, Jonathan Gray wrote:
> On Wed, Apr 18, 2018 at 11:40:06AM +0200, Alexander Bluhm wrote:
> > Hi,
> > 
> > my i386 regression test machine crashed with the Tue Apr 17 snapshot
> > in radeondrm_attachhook().
> 
> So the machine doesn't have /etc/firmware/radeon/R100_cp.bin ?
> 
> I have an i386 laptop that loads r100 microcode which works.

If I force radeon_agp_init() to fail early and remove the firmware
I see it:

initializing kernel modesetting (RV200 0x1002:0x4C57 0x1014:0x0530).
radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
error: [drm:pid0:r100_cp_init] *ERROR* Failed to load firmware!
drm:pid0:r100_startup *ERROR* failed initializing CP (-2).
drm:pid0:r100_init *ERROR* Disabling GPU acceleration
ttm_tt_clear_mapping: flags 0x80 num_pages 1
ttm_tt_clear_mapping page 0 addr 0xd113742c
uvm_fault(0xd0ca89d8, 0xd1137000, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  pmap_page_remove_86+0x18:   cmpl    $0,0x48(%ebx)
ddb{0}> tr
pmap_page_remove_86(d113742c) at pmap_page_remove_86+0x18
ttm_tt_destroy(d3cd2b00) at ttm_tt_destroy+0x95
ttm_bo_cleanup_refs_or_queue(d3c87b28,d3c87b28,d3c86000,d0eb7ee4,d057e567) at ttm_bo_cleanup_refs_or_queue+0x2af
ttm_bo_unref(d0eb7ed8) at ttm_bo_unref+0x69
radeon_bo_unref(d3c87b28) at radeon_bo_unref+0x27
radeon_wb_fini(d3c86000) at radeon_wb_fini+0x49
r100_init(d3c86000) at r100_init+0x409
radeon_device_init(d3c86000,d3c89000,d3c8903c,90003) at radeon_device_init+0x797
radeondrm_attachhook(d3c86000) at radeondrm_attachhook+0x2b
config_process_deferred_mountroot(d3c25ccc,eb5000,ec4000,0,d02004d1) at config_process_deferred_mountroot+0x2c
uvm_fault(0xd0ca89d8, 0xd020, 0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...
ddb{0}> sh reg
ds                0x10
es                0x10
fs                0x20
gs                   0
edi                  0
esi         0xd3cd2b00    end+0x2f1db00
ebp         0xd0eb7e80    end+0x102e80
ebx         0xd113742c    end+0x38242c
edx         0xd0c8dff8    cpu_info_full_primary+0x1ff8
ecx              0x100
eax         0xa981c212
eip         0xd0364558    pmap_page_remove_86+0x18
cs                 0x8
eflags         0x10292    __ALIGN_SIZE+0xf292
esp         0xd0eb7e58    end+0x102e58
ss                0x10
pmap_page_remove_86+0x18:   cmpl    $0,0x48(%ebx)

> 
> > 
> > uvm_fault(0xd0ccfb70, 0xd11e, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  pmap_page_remove_pae+0x18:  cmpl    $0,0x48(%edi)
> > 
> > pmap_page_remove_pae(d11e0800) at pmap_page_remove_pae+0x18
> > ttm_tt_destroy(d5763980) at ttm_tt_destroy+0x61
> > ttm_bo_cleanup_refs_or_queue(d572cb28) at ttm_bo_cleanup_refs_or_queue+0x2b3
> > ttm_bo_unref(d0f61ed8) at ttm_bo_unref+0x69
> > radeon_bo_unref(d572cb28) at radeon_bo_unref+0x27
> > radeon_wb_fini(d572b000) at radeon_wb_fini+0x49
> > r100_init(d572b000) at r100_init+0x409
> > radeon_device_init(d572b000,d56ff400,d56ff43c,840001) at 
> > radeon_device_init+0x797
> > radeondrm_attachhook(d572b000) at radeondrm_attachhook+0x2b
> > config_process_deferred_mountroot() at 
> > config_process_deferred_mountroot+0x2c
> > main(0) at main+0x7bf
> > 
> > Full console output below.
> > 
> > bluhm
> > 
> > >> OpenBSD/i386 BOOT 3.31
> > boot> 
> > booting hd0a:/bsd: 8620607+2356228+188436+0+1101824 
> > [695508+98+513600+533177]=0xd5e848
> > entry point at 0x2000d4
> > 
> > [ using 1742920 bytes of bsd ELF symbol table ]
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> >   The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2018 OpenBSD. All rights reserved.  
> > https://www.OpenBSD.org
> > 
> > OpenBSD 6.3-current (GENERIC.MP) #559: Tue Apr 17 10:11:13 MDT 2018
> > dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > real mem  = 2145783808 (2046MB)
> > avail mem = 2092011520 (1995MB)
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: date 07/12/06, BIOS32 rev. 0 @ 0xfd450, SMBIOS rev. 2.51 
> > @ 0x7feea000 (35 entries)
> > bios0: vendor Phoenix Technologies LTD version "6.00" date 07/12/2006
> > bios0: Supermicro PDSM4+
> > acpi0 at bios0: rev 0
> > acpi0: sleep states S0 S1 S4 S5
> > acpi0: tables DSDT FACP MCFG APIC BOOT SSDT
> > acpi0: wakeup devices PXHA(S5) PXHB(S5) DEV3(S5) EXP1(S5) EXP5(S5) EXP6(S5) 
> > PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) USB1(S4) USB2(S4) USB3(S4) 
> > USB4(S4) EUSB(S4)
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits

Re: panic in radeondrm_attachhook

2018-04-19 Thread Jonathan Gray
On Thu, Apr 19, 2018 at 09:46:54AM -0700, Carlos Cardenas wrote:
> On Thu, Apr 19, 2018 at 03:03:51PM +1000, Jonathan Gray wrote:
> > On Wed, Apr 18, 2018 at 06:59:42PM -0700, Carlos Cardenas wrote:
> > > On Thu, Apr 19, 2018 at 09:35:19AM +1000, Jonathan Gray wrote:
> > > > On Wed, Apr 18, 2018 at 03:29:10PM -0700, Carlos Cardenas wrote:
> > > > > Howdy.
> > > > > 
> > > > > Please excuse this poor bug report, I'm not able to obtain more info 
> > > > > as
> > > > > the system locks hard (below is transcribed by hand).
> > > > > 
> > > > > On a clean install, with the latest snap (as of time of email):
> > > > > OpenBSD 6.3-current (GENERIC.MP) #207: Wed Apr 18 11:37:15 MDT 2018
> > > > > 
> > > > > CPU: AMD A8-7670K Radeon R7 aka Kaveri
> > > > > 
> > > > > extent_free(819489f8,100,1000,7e281000) at extent_free+0x110
> > > > > bus_space_unmap(0,800de000,0) at bus_space_unmap+0x110
> > > > > radeon_device_fini(800de000) at radeon_device_fini+0x43
> > > > > radeondrm_detach_kms(0,800de000) at radeondrm_detach_kms+0x33
> > > > > config_detach(800dd600,800de000) at 
> > > > > config_detach+0x14e
> > > > > radeondrm_attachhook(81bf9420) at radeondrm_attachhook+0x9d
> > > > > config_process_deferred_mountroot() at 
> > > > > config_process_deferred_mountroot+0x56
> > > > > main(6dd84e08) at main+0x80f
> > > > 
> > > > See the other thread on bugs.  Not sure when a snapshot with the fix
> > > > will go out.
> > > 
> > > I'm tracking bluhm's thread as well.
> > > 
> > > With snap #209 (which has that patch), we get further along from a clean
> > > install:
> > > 
> > > initializing kernel modesetting (KAVERI 0x1002:0x1313 0x1462:0x7969).
> > > cik_cp: Failed to load firmware "radeon/KAVERI_pfp.bin"
> > > error: [drm:pid0:cik_init] *ERROR* Failed to load Firmware!
> > > drm:pid0:radeondrm_attachhook *ERROR* Fatal error during GPU init
> > > ttm_pool_mm_shrink_fini: stub
> > > drm0 detached
> > > radeondrm0 detached
> > > vendor "ATI", unknown product 0x1313 (class display subclass VGA, rev 
> > > 0xd4) at
> > > pci0 dev 1 function 0 not configured
> > > init: can't open /dev/console: Device not configured
> > > init: can't open /dev/console: Device not configured
> > > init: can't open /dev/console: Device not configured
> > > [repeats]
> > 
> > So I'm quite sure you are booting via uefi with efifb here.
> > There was previously no code that reprobed efifb as efifb isn't probed
> > via pci devices.
> > 
> > The following on top of radeon.diff.2 works here when removing the
> > firmware and booting via efiboot on a mullins system.
> > 
> 
> That was it; attached are two dmesg's:
> * one after a clean install with no firmware loaded
> * one after first boot with firmware loaded
> 
> With the firmware loaded, there's an artifact of a vertical purple line
> along the left side of the screen (x=0) but everything seems to be ok (X
> looks nice).

Are you setting a different gop mode via efiboot/boot.conf ?

If you compile a kernel with efifb commented out (not just disabled via ukc)
do the ring tests pass?

Proper acceleration on kaveri requires building Mesa 17.3 against ports
libLLVM.  But the ring tests should still pass without that.

> 
> Thanks for the hard work getting radeon updated.
> 
> +--+
> Carlos

> OpenBSD 6.3-current (GENERIC.MP) #0: Thu Apr 19 08:52:15 PDT 2018
> los@rollo.castle:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16046288896 (15302MB)
> avail mem = 15552057344 (14831MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xecc20 (55 entries)
> bios0: vendor American Megatrends Inc. version "V1.3" date 04/13/2016
> bios0: MSI MS-7969
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG HPET UEFI IVRS SSDT SSDT CRAT 
> SSDT SSDT
> acpi0: wakeup devices SBAZ(S4) P0PC(S4) OHC1(S4) EHC1(S4) OHC2(S4) EHC2(S4) 
> OHC3(S4) EHC3(S4) OHC4(S4) XHC0(S4) XHC1(S4) PE20(S4) PE21(S4) PE23(S4) 
> PB21(S4) PB22(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 16 (boot processo

Re: panic in radeondrm_attachhook

2018-04-24 Thread Jonathan Gray
On Wed, Apr 18, 2018 at 06:55:35PM +0200, Alexander Bluhm wrote:
> On Thu, Apr 19, 2018 at 12:10:04AM +1000, Jonathan Gray wrote:
> > That's with the big radeon update diff as well?
> 
> That was current.  Boots fine.
> 
> With ~jsg/radeon.diff.2 it still crashes.  I have checked, it is
> the diff with
> 
> if (rdev->rio_mem_size > 0)
> bus_space_unmap(rdev->iot, rdev->rio_mem, rdev->rio_mem_size);
> 
> initializing kernel modesetting (RV100 0x1002:0x515E 0x1002:0x515E).
> radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
> error: [drm:pid0:r100_cp_init] *ERROR* Failed to load firmware!
> drm:pid0:r100_startup *ERROR* failed initializing CP (-2).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> uvm_fault(0xd0cae21c, 0xd11e2000, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  pmap_page_remove_pae+0x18:  cmpl$0,0x48(%edi)
> ddb{0}> trace
> pmap_page_remove_pae(d11e2768) at pmap_page_remove_pae+0x18
> ttm_tt_destroy(d5765a00) at ttm_tt_destroy+0x61
> ttm_bo_cleanup_refs_or_queue(d572eb28) at ttm_bo_cleanup_refs_or_queue+0x2b3
> ttm_bo_unref(d0f63ed8) at ttm_bo_unref+0x69
> radeon_bo_unref(d572eb28) at radeon_bo_unref+0x27
> radeon_wb_fini(d572d000) at radeon_wb_fini+0x49
> r100_init(d572d000) at r100_init+0x409
> radeon_device_init(d572d000,d5701400,d570143c,840001) at 
> radeon_device_init+0x797
> radeondrm_attachhook(d572d000) at radeondrm_attachhook+0x2b
> config_process_deferred_mountroot() at config_process_deferred_mountroot+0x2c
> main(0) at main+0x7bf

After spending some time trying to track this down I have come up with
the diff below and included it in ~jsg/radeon.diff.4.  Can you confirm
that it works for you as well?

diff --git sys/dev/pci/drm/ttm/ttm_bo_util.c sys/dev/pci/drm/ttm/ttm_bo_util.c
index 3b26d865be2..da6c459bd39 100644
--- sys/dev/pci/drm/ttm/ttm_bo_util.c
+++ sys/dev/pci/drm/ttm/ttm_bo_util.c
@@ -644,7 +644,7 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map)
vunmap(map->virtual, bo->mem.bus.size);
break;
case ttm_bo_map_kmap:
-   kunmap(map->page);
+   kunmap(map->virtual);
break;
case ttm_bo_map_premapped:
break;
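
A note on the one-line change above, as a minimal userland sketch in C.
It is a toy model, not OpenBSD's kmap()/kunmap(): every toy_* name and
the table layout are hypothetical.  The only assumption taken from the
diff is that the unmap side has to be handed the virtual address that
the map call returned, not the page that was mapped.

```c
#include <stddef.h>

/* Toy model only: none of these names exist in the OpenBSD kernel. */
struct toy_page {
	int id;
};

#define TOY_SLOTS 4

struct toy_mapping {
	void *va;		/* fake kernel virtual address, NULL if free */
	struct toy_page *pg;	/* page visible at that address */
};

static struct toy_mapping toy_table[TOY_SLOTS];
static char toy_va_space[TOY_SLOTS][16];	/* stand-in for kernel VA */

/* Map a page and return the virtual address it is visible at. */
void *
toy_kmap(struct toy_page *pg)
{
	for (int i = 0; i < TOY_SLOTS; i++) {
		if (toy_table[i].va == NULL) {
			toy_table[i].va = toy_va_space[i];
			toy_table[i].pg = pg;
			return toy_table[i].va;
		}
	}
	return NULL;	/* no free slot */
}

/*
 * Unmap by virtual address.  Returns 0 on success and -1 when addr is
 * not a mapped virtual address, which is what happens if the caller
 * passes the page pointer instead.
 */
int
toy_kunmap(void *addr)
{
	for (int i = 0; i < TOY_SLOTS; i++) {
		if (toy_table[i].va != NULL && toy_table[i].va == addr) {
			toy_table[i].va = NULL;
			toy_table[i].pg = NULL;
			return 0;
		}
	}
	return -1;
}
```

In this model, passing the page pointer to toy_kunmap() is rejected,
while passing the address returned by toy_kmap() succeeds.  A real
kernel does not get to reject the bad argument so cleanly.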



Re: panic in radeondrm_attachhook

2018-04-24 Thread Jonathan Gray
On Tue, Apr 24, 2018 at 01:53:07PM +0200, Mark Kettenis wrote:
> > Date: Tue, 24 Apr 2018 20:18:42 +1000
> > From: Jonathan Gray 
> > 
> > On Wed, Apr 18, 2018 at 06:55:35PM +0200, Alexander Bluhm wrote:
> > > On Thu, Apr 19, 2018 at 12:10:04AM +1000, Jonathan Gray wrote:
> > > > That's with the big radeon update diff as well?
> > > 
> > > That was current.  Boots fine.
> > > 
> > > With ~jsg/radeon.diff.2 it still crashes.  I have checked, it is
> > > the diff with
> > > 
> > > if (rdev->rio_mem_size > 0)
> > > bus_space_unmap(rdev->iot, rdev->rio_mem, 
> > > rdev->rio_mem_size);
> > > 
> > > initializing kernel modesetting (RV100 0x1002:0x515E 0x1002:0x515E).
> > > radeon_cp: Failed to load firmware "radeon/R100_cp.bin"
> > > error: [drm:pid0:r100_cp_init] *ERROR* Failed to load firmware!
> > > drm:pid0:r100_startup *ERROR* failed initializing CP (-2).
> > > drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> > > uvm_fault(0xd0cae21c, 0xd11e2000, 0, 1) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at  pmap_page_remove_pae+0x18:  cmpl$0,0x48(%edi)
> > > ddb{0}> trace
> > > pmap_page_remove_pae(d11e2768) at pmap_page_remove_pae+0x18
> > > ttm_tt_destroy(d5765a00) at ttm_tt_destroy+0x61
> > > ttm_bo_cleanup_refs_or_queue(d572eb28) at 
> > > ttm_bo_cleanup_refs_or_queue+0x2b3
> > > ttm_bo_unref(d0f63ed8) at ttm_bo_unref+0x69
> > > radeon_bo_unref(d572eb28) at radeon_bo_unref+0x27
> > > radeon_wb_fini(d572d000) at radeon_wb_fini+0x49
> > > r100_init(d572d000) at r100_init+0x409
> > > > radeon_device_init(d572d000,d5701400,d570143c,840001) at 
> > > > radeon_device_init+0x797
> > > radeondrm_attachhook(d572d000) at radeondrm_attachhook+0x2b
> > > config_process_deferred_mountroot() at 
> > > config_process_deferred_mountroot+0x2c
> > > main(0) at main+0x7bf
> > 
> > > After spending some time trying to track this down I have come up with
> > > the diff below and included it in ~jsg/radeon.diff.4.  Can you confirm
> > > that it works for you as well?
> 
> That fix is correct.  UVM works differently from the Linux VM system,
> so we can't really implement kunmap().  Maybe we should rename it (to
> kunmap_virt()?).

That is a good idea.  I'd like to get the radeondrm update in first though.

> 
> Cheers,
> 
> Mark
> 
> 
> > diff --git sys/dev/pci/drm/ttm/ttm_bo_util.c 
> > sys/dev/pci/drm/ttm/ttm_bo_util.c
> > index 3b26d865be2..da6c459bd39 100644
> > --- sys/dev/pci/drm/ttm/ttm_bo_util.c
> > +++ sys/dev/pci/drm/ttm/ttm_bo_util.c
> > @@ -644,7 +644,7 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map)
> > vunmap(map->virtual, bo->mem.bus.size);
> > break;
> > case ttm_bo_map_kmap:
> > -   kunmap(map->page);
> > +   kunmap(map->virtual);
> > break;
> > case ttm_bo_map_premapped:
> > break;
> > 
> > 
> 



Re: panic in radeondrm_attachhook

2018-04-24 Thread Jonathan Gray
On Tue, Apr 24, 2018 at 02:25:24PM +0200, Alexander Bluhm wrote:
> On Tue, Apr 24, 2018 at 08:18:42PM +1000, Jonathan Gray wrote:
> > After spending some time trying to track this down I have come up with
> > the diff below and included it in ~jsg/radeon.diff.4.  Can you confirm
> > that it works for you as well?
> 
> With this diff my machine boots fine.  No monitor connected, so I
> cannot test X11.  When I blindly run startx and kill it with Ctrl-C,
> I get on the console:
> 
> Can't enable IRQ/MSI because no handler is installed

This is expected, the interrupt handler is only established when the ring
tests pass and acceleration is enabled.  When init is aborted after
firmware is not found acceleration is disabled.

> 
> I have no firmware, so I do not expect that it works.  No crash is
> enough for me.

Thanks for reporting and testing the fix.

> 
> bluhm
> 
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>   
> The Regents of the University of California.  All rights reserved.
>   
> Copyright (c) 1995-2018 OpenBSD. All rights reserved.  
> https://www.OpenBSD.org  
> 
> OpenBSD 6.3-current (GENERIC.MP) #0: Tue Apr 24 14:03:04 CEST 2018
> r...@ot1.obsd-lab.genua.de:/usr/src/sys/arch/i386/compile/GENERIC.MP
> real mem  = 2145783808 (2046MB)
> avail mem = 2091991040 (1995MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: date 07/12/06, BIOS32 rev. 0 @ 0xfd450, SMBIOS rev. 2.51 @ 
> 0x7feea000 (35 entries)
> bios0: vendor Phoenix Technologies LTD version "6.00" date 07/12/2006
> bios0: Supermicro PDSM4+
> acpi0 at bios0: rev 0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP MCFG APIC BOOT SSDT
> acpi0: wakeup devices PXHA(S5) PXHB(S5) DEV3(S5) EXP1(S5) EXP5(S5) EXP6(S5) 
> PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) USB1(S4) USB2(S4) USB3(S4) 
> USB4(S4) EUSB(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0 addr 0xf000, bus 0-14
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 
> GHz
> cpu0: 
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,LAHF,PERF,SENSOR
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 266MHz
> cpu0: mwait min=64, max=64, C-substates=0.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 
> GHz
> cpu1: 
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,LONG,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,LAHF,PERF,SENSOR
> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
> ioapic1 at mainbus0: apid 3 pa 0xfecc, version 20, 24 pins
> ioapic2 at mainbus0: apid 4 pa 0xfecc0400, version 20, 24 pins
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 2 (PXHA)
> acpiprt2 at acpi0: bus 3 (PXHB)
> acpiprt3 at acpi0: bus -1 (DEV3)
> acpiprt4 at acpi0: bus 9 (EXP1)
> acpiprt5 at acpi0: bus 13 (EXP5)
> acpiprt6 at acpi0: bus 14 (EXP6)
> acpiprt7 at acpi0: bus 15 (PCIB)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpicmos0 at acpi0
> "PNP0A05" at acpi0 not configured
> acpibtn0 at acpi0: PWRB
> bios0: ROM list: 0xc/0xb000 0xcb000/0x1000 0xcc000/0x1000 0xcd000/0x2600
> ipmi at mainbus0 not configured
> cpu0: Enhanced SpeedStep disabled by BIOS
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> pchb0 at pci0 dev 0 function 0 "Intel E7230 Host" rev 0xc0
> ppb0 at pci0 dev 1 function 0 "Intel E7230 PCIE" rev 0xc0: apic 2 int 16
> pci1 at ppb0 bus 1
> ppb1 at pci1 dev 0 function 0 "Intel 6700PXH PCIE-PCIX" rev 0x09
> pci2 at ppb1 bus 2
> em0 at pci2 dev 2 function 0 "Intel 82545GM" rev 0x04: apic 3 int 4, address 
> 00:04:23:cd:41:fb
> "Intel IOxAPIC" rev 0x09 at pci1 dev 0 function 1 not configured
> ppb2 at pci1 dev 0 function 2 "Intel 6700PXH PCIE-PCIX" rev 0x09
> pci3 at ppb2 bus 3
> em1 at pci3 dev 1 function 0 "Intel 82545GM" rev 0x04: apic 4 int 0, address 
> 00:1b:21:0e:6e:8e
> ami0 at pci3 dev 3 function 0 "Symbios Logic MegaRAID" rev 0x01: apic 4 int 4
> ami0: LSI 520, 64b/lhc, FW 1L47, BIOS vG121, 128MB RAM
> ami0: 1 channels, 0 FC loops, 2 logical drives
> scsibus1 at ami0: 40 targets
> sd0 at scsibus1 targ 0 lun 0:  SCSI2 0/direct

Re: usb keyboard stopped working amd64

2018-05-21 Thread Jonathan Gray
On Mon, May 21, 2018 at 11:37:31PM +0100, Nigel Taylor wrote:
> On 05/21/18 22:59, Nigel Taylor wrote:
> > Upgraded machine, rather than default entered shell to do final task.
> > When finished typed in reboot
> > 
> > Machine came up, and reached the normal login prompt.
> > 
> > Tried to login typing on keyboard nothing happened, initial thought had
> > hung. Checked from another machine and could ssh and complete the
> > packages upgrades, in case some package was causing issue.
> > 
> > Once complete used shutdown -r now, the same happened. Did shutdown -r
> > now, this time typed bsd.rd at boot prompt, bsd.rd I could still type,
> > entered shell and tried bsd.sp same issue with not being able to type.
> > 
> > As part of the upgrade I saved previous bsd / bsd.sp / bsd.rd, so booted
> > and entered bsd.1 the previous bsd before the upgrade, The usb keyboard
> > is working again. The previous bsd was created on 20th Apr, the upgrade
> > bsd created on the 20th May.
> > 
> > The /bsd from 20th May had been running on another machine, but I normally
> > only ever ssh to that machine, or use the old serial console, rather than
> > the usb keyboard. On checking, the usb keyboard does not appear to be
> > working there either: the num lock/caps keys don't turn the lights on/off.
> > 
> > Both machines are nearly identical, running the GENERIC.MP amd64 kernel. The
> > one still running the newer /bsd is used to build packages and releases, and
> > has been up and running dpb builds without other issues. The only
> > problem seems to be the usb keyboard.
> > 
> > Going to the build machine, which has an old /bsd from 6th May, I tried that
> > and it looked like the keyboard wasn't working, but I noted the keyboard
> > lights flashed when it reached the radeondrm driver.
> > 
> > Did a boot -c and disabled radeondrm continued and cap/num lock lights
> > working. Back to upgraded machine, and did the same boot -c and disabled
> > radeondrm and usb keyboard back working.
> > 
> > However the screen doesn't look very nice, lower resolution.
> > 
> > 
> > $ dmesg | grep -i radeon
> > UKC> disable radeon
> > 228 radeondrm* at pci* dev -1 function -1 flags 0x0
> > UKC> disable radeondrm
> > 228 radeondrm* disabled
> > vga1 at pci1 dev 5 function 0 "ATI Radeon HD 4250" rev 0x00
> > azalia0 at pci1 dev 5 function 1 "ATI Radeon HD 4200 HD Audio" rev 0x00: msi
> > 
> > 
> > $ dmesg | egrep -i "radeon|modesetting"
> > radeondrm0 at pci1 dev 5 function 0 "ATI Radeon HD 4250" rev 0x00
> > drm0 at radeondrm0
> > radeondrm0: apic 2 int 18
> > azalia0 at pci1 dev 5 function 1 "ATI Radeon HD 4200 HD Audio" rev 0x00: msi
> > initializing kernel modesetting (RS880 0x1002:0x9715 0x1458:0xD000).
> > radeondrm0: 1024x768, 32bpp
> > wsdisplay0 at radeondrm0 mux 1: console (std, vt100 emulation), using wskbd0
> > 
> > 
> > From cvs log between 20th April and 6th May ...
> > 
> > revision 1.7
> > date: 2018/04/25 01:27:46;  author: jsg;  state: Exp;  lines: +906 -21;
> > commitid: sATjL4ONH9UfGvNV;
> > update ttm and radeondrm(4) to Linux 4.4.129
> > 
> > 
> > 
> 
> Update:
> 
> I found a usb to PS2 adapter, this allowed the keyboard to work. But the
> usb mouse stopped working.
> 
> Wondered about some sort of clash so moved mouse to different usb port,
> that worked, then tried keyboard in another usb port, and now have both
> keyboard and mouse working in usb ports, but some usb ports don't work
> any more.

try this

Index: cik.c
===
RCS file: /cvs/src/sys/dev/pci/drm/radeon/cik.c,v
retrieving revision 1.1
diff -u -p -r1.1 cik.c
--- cik.c   25 Apr 2018 01:27:46 -  1.1
+++ cik.c   22 May 2018 03:58:50 -
@@ -7901,6 +7901,8 @@ int cik_irq_process(struct radeon_device
 
wptr = cik_get_ih_wptr(rdev);
 
+   if (wptr == rdev->ih.rptr)
+   return IRQ_NONE;
 restart_ih:
/* is somebody else already processing irqs? */
if (atomic_xchg(&rdev->ih.lock, 1))
Index: evergreen.c
===
RCS file: /cvs/src/sys/dev/pci/drm/radeon/evergreen.c,v
retrieving revision 1.22
diff -u -p -r1.22 evergreen.c
--- evergreen.c 25 Apr 2018 01:27:46 -  1.22
+++ evergreen.c 22 May 2018 03:57:23 -
@@ -5072,6 +5072,8 @@ int evergreen_irq_process(struct radeon_
 
wptr = evergreen_get_ih_wptr(rdev);
 
+   if (wptr == rdev->ih.rptr)
+   return IRQ_NONE;
 restart_ih:
/* is somebody else already processing irqs? */
if (atomic_xchg(&rdev->ih.lock, 1))
Index: r600.c
===
RCS file: /cvs/src/sys/dev/pci/drm/radeon/r600.c,v
retrieving revision 1.22
diff -u -p -r1.22 r600.c
--- r600.c  25 Apr 2018 01:27:46 -  1.22
+++ r600.c  22 May 2018 03:57:54 -
@@ -4058,6 +4058,8 @@ int r600_irq_process(struct radeon_devic
 
wptr = r600_get_ih_wptr(rdev);
 
+   if (wptr == rdev->ih.rptr)
+   return IRQ_NONE;
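
A note on the check added in each hunk above, as a minimal userland
sketch in C.  It assumes, and the thread does not say so explicitly,
that the radeondrm interrupt line is shared with other devices, so a
handler that claims interrupts it did not raise could interfere with
them.  All toy_* names and constants are hypothetical; TOY_IRQ_NONE
only stands in for the driver's IRQ_NONE, and the ring is a simplified
model of the IH ring.

```c
#include <stdint.h>

/* Hypothetical return values; not the driver's definitions. */
enum toy_irqreturn {
	TOY_IRQ_NONE = 0,	/* interrupt was not raised by this device */
	TOY_IRQ_HANDLED = 1	/* at least one ring entry was consumed */
};

#define TOY_RING_SIZE 8		/* must be a power of two */

struct toy_ih {
	uint32_t ring[TOY_RING_SIZE];
	uint32_t rptr;		/* next entry the driver will read */
	uint32_t wptr;		/* next entry the device will write */
	unsigned int processed;	/* entries consumed so far */
};

/* Device side of the model: queue one event. */
void
toy_ih_push(struct toy_ih *ih, uint32_t ev)
{
	ih->ring[ih->wptr] = ev;
	ih->wptr = (ih->wptr + 1) & (TOY_RING_SIZE - 1);
}

/* Driver side: drain the ring, or decline the interrupt if it is empty. */
enum toy_irqreturn
toy_irq_process(struct toy_ih *ih)
{
	uint32_t wptr = ih->wptr;

	/* Same shape as the added check: nothing new means not ours. */
	if (wptr == ih->rptr)
		return TOY_IRQ_NONE;

	while (ih->rptr != wptr) {
		(void)ih->ring[ih->rptr];	/* a real handler decodes this */
		ih->processed++;
		ih->rptr = (ih->rptr + 1) & (TOY_RING_SIZE - 1);
	}
	return TOY_IRQ_HANDLED;
}
```

With an empty ring toy_irq_process() declines the interrupt; after two
toy_ih_push() calls it consumes both entries and reports them handled.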
 

Re: Lenovo M700 Tiny Type 10HY inteldrm

2018-07-20 Thread Jonathan Gray
On Fri, Jul 20, 2018 at 07:49:39AM +0100, Stuart Henderson wrote:
> On 2018/07/19 21:39, Jim wrote:
> > My apologies, I missed that section. Everything is working fine except 
> > hardware video acceleration. 
> > 
> > inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 530" rev 0x06
> > drm0 at inteldrm0
> > inteldrm0: msi
> > error: [drm:pid0:i915_firmware_load_error_print] *ERROR* failed to load 
> > firmware i915/skl_dmc_ver1.bin (-22)
> 
> Is the inteldrm firmware installed? pkg_info | grep firmware
> 
> If not, run fw_update as root (needs internet access).
> 

No, that is expected.  Firmware loading is not wired up for inteldrm and
apparently not strictly required for skylake.



Re: Kernel crash on i386 -current

2018-07-23 Thread Jonathan Gray
On Mon, Jul 23, 2018 at 11:09:28AM +0200, Alexander Bluhm wrote:
> On Sat, Jul 21, 2018 at 09:30:36PM +0200, Eivind Eide wrote:
> > There is no crash with drm disabled. I'm attaching the dmesg to this
> > mail. So the update to radeondrm is the cause.
> 
> So we know that "ATI Radeon Mobility M7" is broken with the new drm
> code.  Unfortunately I have no experience with that part of the
> kernel.  I was just searching for other i386 regressions.
> 
> Is anybody else interested in fixing this old hardware?
> Does it work with the Linux version of the driver?

The radeondrm update worked on a t42 running i386 with
radeondrm0 at pci1 dev 0 function 0 "ATI Radeon Mobility M7" rev 0x00
at the time it went in.



Re: No audio/sound

2018-08-16 Thread Jonathan Gray
On Fri, Aug 17, 2018 at 11:10:52AM +1000, sch...@gmail.com wrote:
> >Synopsis:Unable to hear any audio/sound when playing
> >Category:audio
> >Environment:
>   System  : OpenBSD 6.3
>   Details : OpenBSD 6.3 (GENERIC.MP) #8: Sat Aug  4 16:56:56 CEST 2018
>
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Unable to hear any audio. When checking with audioctl, play.errors are 
> incrementing.
> >How-To-Repeat:
>   Install OpenBSD on this hardware and attempt to play audio.
> I've also tested this on FreeBSD and was unable to hear audio here 
> either.
> When I tested in OS-X and Fedora (Linux) audio (play mp3 or ogg 
> files) plays without an issue

Apple machines always seem to require various quirks for audio.
Try this.

Index: azalia_codec.c
===
RCS file: /cvs/src/sys/dev/pci/azalia_codec.c,v
retrieving revision 1.172
diff -u -p -r1.172 azalia_codec.c
--- azalia_codec.c  28 Mar 2017 04:54:44 -  1.172
+++ azalia_codec.c  17 Aug 2018 06:22:52 -
@@ -199,6 +199,7 @@ azalia_codec_init_vtbl(codec_t *this)
case 0x10ec0885:
this->name = "Realtek ALC885";
this->qrks |= AZ_QRK_WID_CDIN_1C | AZ_QRK_WID_BEEP_1D;
+   printf("\nsubid %x\n", this->subid);
if (this->subid == 0x00a1106b ||/* APPLE_MB3 */
this->subid == 0xcb7910de ||/* APPLE_MACMINI3_1 
(line-in + hp) */
this->subid == 0x00a0106b ||/* APPLE_MB3_1 */
@@ -207,8 +208,15 @@ azalia_codec_init_vtbl(codec_t *this)
}
if (this->subid == 0x00a1106b ||
this->subid == 0xcb7910de ||/* APPLE_MACMINI3_1 
(internal spkr) */
-   this->subid == 0x00a0106b)
+   this->subid == 0x00a0106b ||
+   this->subid == 0x4200106b)
this->qrks |= AZ_QRK_WID_OVREF50;
+   if (this->subid == 0x0c00106b ||/* Mac Pro */
+   this->subid == 0x1000106b ||/* iMac 24 */
+   this->subid == 0x2800106b ||/* AppleTV */
+   this->subid == 0x3e00106b ||/* iMac 24 Aluminum */
+   this->subid == 0x4200106b)  /* Mac Pro 4,1/5,1 */
+   this->qrks |= AZ_QRK_GPIO_UNMUTE_3;
break;
case 0x10ec0888:
this->name = "Realtek ALC888";
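
The subsystem-id matching in the diff above can also be pictured as a
table lookup.  The sketch below is an alternative presentation in
userland C, not how azalia_codec.c is written.  The subsystem ids and
their comments are copied from the diff; the TOY_QRK_* bit values and
all toy_* names are placeholders, since the real AZ_QRK_* constants are
defined by the azalia driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values; the real AZ_QRK_* constants differ. */
#define TOY_QRK_WID_OVREF50	0x1u
#define TOY_QRK_GPIO_UNMUTE_3	0x2u

struct toy_quirk {
	uint32_t subid;		/* PCI subsystem id as printed by the diff */
	uint32_t qrks;		/* quirk bits to set for that id */
};

/* ALC885 entries touched by the diff (the earlier if-chain is omitted). */
static const struct toy_quirk toy_alc885_quirks[] = {
	{ 0x00a1106b, TOY_QRK_WID_OVREF50 },
	{ 0xcb7910de, TOY_QRK_WID_OVREF50 },	/* APPLE_MACMINI3_1 */
	{ 0x00a0106b, TOY_QRK_WID_OVREF50 },
	{ 0x0c00106b, TOY_QRK_GPIO_UNMUTE_3 },	/* Mac Pro */
	{ 0x1000106b, TOY_QRK_GPIO_UNMUTE_3 },	/* iMac 24 */
	{ 0x2800106b, TOY_QRK_GPIO_UNMUTE_3 },	/* AppleTV */
	{ 0x3e00106b, TOY_QRK_GPIO_UNMUTE_3 },	/* iMac 24 Aluminum */
	{ 0x4200106b,				/* Mac Pro 4,1/5,1 */
	    TOY_QRK_WID_OVREF50 | TOY_QRK_GPIO_UNMUTE_3 },
};

/* Return the OR of all quirk bits listed for subid, or 0 if unknown. */
uint32_t
toy_alc885_quirks_for(uint32_t subid)
{
	uint32_t qrks = 0;
	size_t n = sizeof(toy_alc885_quirks) / sizeof(toy_alc885_quirks[0]);

	for (size_t i = 0; i < n; i++) {
		if (toy_alc885_quirks[i].subid == subid)
			qrks |= toy_alc885_quirks[i].qrks;
	}
	return qrks;
}
```

The printf in the diff is what tells a tester which subid their machine
reports, so that an entry like these can be added for it.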



Re: vmd fails to run on ThinkPad 25

2018-09-22 Thread Jonathan Gray
On Sat, Sep 22, 2018 at 06:44:37AM -0700, Mike Larkin wrote:
> On Sat, Sep 22, 2018 at 09:39:18AM -0400, ja...@kaivo.net wrote:
> > >Synopsis:  vmd fails to run vms on ThinkPad 25
> > >Category:  system
> > >Environment:
> > System  : OpenBSD 6.3
> > Details : OpenBSD 6.3 (GENERIC.MP) #11: Thu Sep 20 16:05:37 CEST 
> > 2018
> >  
> > r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > On my ThinkPad 25, trying to start a virtual machine with vmctl fails,
> > (where debian.img is an image prepared in QEMU):
> > thinkpad-25:~$ doas vmctl start "Debian" -Lc -m 512m -d debian.img 
> > vmctl: start vm command failed: No such file or directory
> > 
> > The following is logged to /dev/console:
> > cpu2: failed to enter VMM mode
> > 
> > And these lines in /var/log/daemon:
> > Sep 21 16:32:33 thinkpad-25 vmd[10504]: Debian: create vmm ioctl failed - 
> > exiting: Input/output error
> > Sep 21 16:32:33 thinkpad-25 vmd[80021]: Debian: failed to start vm: No such 
> > file or directory
> > 
> > I have verified that VMX is enabled in the firmware, and updated it to
> > the latest version available from Lenovo. I've dual booted the computer to
> > Windows 10 and confirmed that Hyper-V works there, so it's definitely 
> > working
> > on the hardware side.
> > 
> > >How-To-Repeat:
> > $ doas rcctl enable vmd
> > $ doas rcctl start vmd
> > $ doas vmctl start 
> > >Fix:
> > Unknown
> > 
> 
> See below.
> 
> -ml
> 
> > 
> > dmesg:
> > OpenBSD 6.3 (GENERIC.MP) #11: Thu Sep 20 16:05:37 CEST 2018
> > 
> > r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 16918003712 (16134MB)
> > avail mem = 16398188544 (15638MB)
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x6f07 (62 entries)
> > bios0: vendor LENOVO version "N1QET77W (1.52 )" date 07/04/2018
> > bios0: LENOVO 20K70004US
> > acpi0 at bios0: rev 2
> > acpi0: sleep states S0 S3 S4 S5
> > acpi0: tables DSDT FACP UEFI SSDT SSDT HPET APIC MCFG ECDT SSDT SSDT BOOT 
> > BATB SLIC SSDT SSDT SSDT WSMT SSDT SSDT DBGP DBG2 MSDM SSDT SSDT DMAR ASF! 
> > FPDT UEFI
> > acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) RP02(S4) 
> > RP03(S4) RP04(S4) RP05(S4) RP06(S4) RP08(S4) RP09(S4) RP10(S4) RP11(S4) 
> > RP12(S4) RP13(S4) [...]
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpihpet0 at acpi0: 2399 Hz
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 1297.47 MHz
> > cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> > cpu0: 256KB 64b/line 8-way L2 cache
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 24MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> > cpu1 at mainbus0: apid 2 (application processor)
> > cpu1: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, 1086.37 MHz
> > cpu1: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> > cpu1: 256KB 64b/line 8-way L2 cache
> > cpu1: failed to identify
> 
> Try booting with the power cord attached or selecting high performance mode
> in the BIOS. You have CPUs that are in slow mode and failing to hatch 
> properly.
> 
> This is not a vmm/vmd issue.

In -current/snapshots amd64 cpu.c includes


revision 1.127
date: 2018/08/25 05:29:28;  author: deraadt;  state: Exp;  lines: +2 -2;  
commitid: mHZp231AYMIqHjRL;
As Intel(TM) cpus are discovered to have more bugs, more workaround MSRs
are added.  Presence of such MSRs is indicated with a feature flag, which
we probe and print at startup for each AP CPU.  EFI screen scrolling hasn't
gotten faster (yet) and 9600 baud serial console is still the same speed
as 1980.   Final piece of the puzzle is machines have more cpus, providing
more opportunity for screen scrolling and serial fifo's to fill up.  The
B

Re: amdgpio0 high interrupt rate on Matebook D

2020-01-24 Thread Jonathan Gray
On Sat, Jan 11, 2020 at 08:16:23PM -0500, James Hastings wrote:
> > On Sat, Jan 11, 2020 at 12:42:50 +0100, Mark Kettenis wrote:
> > > Date: Fri, 10 Jan 2020 13:37:38 -0500
> > > From: Bryan Steele 
> > > 
> > > On Fri, Jan 10, 2020 at 02:59:10AM -0500, James Hastings wrote:
> > > > > On Sat, Jan 04, 2020 at 06:23:44PM +0100, Mark Kettenis wrote:
> > > > > > Date: Sat, 4 Jan 2020 12:03:14 -0500
> > > > > > From: Bryan Steele 
> > > > > > 
> > > > > > On Sat, Jan 04, 2020 at 11:30:56AM -0500, Bryan Steele wrote:
> > > > > > > On Sat, Jan 04, 2020 at 05:07:04PM +0100, Mark Kettenis wrote:
> > > > > > > > > Date: Sat, 4 Jan 2020 10:52:24 -0500
> > > > > > > > > From: Bryan Steele 
> > > > > > > > > 
> > > > > > > > > I noticed an unusually high interrupt rate for amdgpio0 on my 
> > > > > > > > > Huawei
> > > > > > > > > Matebook D laptop. I'm suspecting this may be partially why 
> > > > > > > > > it apmd -A
> > > > > > > > > has been struggling, as the CPU is constantly busy so it 
> > > > > > > > > never has a
> > > > > > > > > chance to scale down.
> > > > > > > > > 
> > > > > > > > > Any ideas?
> > > > > > > > 
> > > > > > > > Please send acpidump output (all files in /var/db/acpi).
> > > > > > > > 
> > > > > > > > Try to figure out which GPIO pin is causing the interrupt.  
> > > > > > > > That may
> > > > > > > > be tricky since the interrupt fires again and again, so if you 
> > > > > > > > add a
> > > > > > > > printf in amdgpio_intr() your machine will become unusable.  
> > > > > > > > Maybe
> > > > > > > > just print the pin every 1 times:
> > > > > > > > 
> > > > > > > > static count = 0;
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > if ((count++ % 1) == 0)
> > > > > > > > printf("st %llx\n", status)
> > > > > > > > 
> > > > > > > > Cheers,
> > > > > > > > 
> > > > > > > > Mark
> > > > > > > 
> > > > > > > Thanks, for some reason it vanished on a cold boot. I'm not 
> > > > > > > sure
> > > > > > > what it was that caused it. In case it helps, sending the acpidump
> > > > > > > anyway.
> > > > > > > 
> > > > > > > If I can figure out what caused it, I'll try your other 
> > > > > > > suggestion.
> > > > > > > 
> > > > > > > Sigh.. :-)
> > > > > > 
> > > > > > Aha! I must have accidentally bumped the Touchscreen at some point, 
> > > > > > doing
> > > > > > that causes the amdgpio0 rate to spike.
> > > > > > 
> > > > > > I had sent a diff to add AMD controller support to dwiic(4) months 
> > > > > > ago,
> > > > > > I could never get interrupts to work, only polling mode. Perhaps 
> > > > > > this
> > > > > > issue explains some of that. I don't have this diff in my tree at 
> > > > > > the
> > > > > > moment, had to restore from backup.
> > > > > > 
> > > > > > Managed to login and type shutdown, lol.
> > > > > > 
> > > > > > ..
> > > > > > st 1f08
> > > > > 
> > > > > Very likely.  The AML defines an I2C device with:
> > > > > 
> > > > >Name (_DDN, "Raydium Touchscreen")  // _DDN: DOS Device 
> > > > > Name
> > > > > 
> > > > > that uses a GPIO interrupt that matches the lowest bit set in the
> > > > > status register.
> > > > >
> > > > > This suggests we may need to be a little bit more careful and mask
> > > > > interrupts for which we don't have an interrupt handler.
> > > > That is my fault. Does this diff stop your interrupt storm?
> > > 
> > > Yes. This works, thanks!
> > > 
> > > ok brynet@ (if Mark agrees)
> > 
> > I fear that this isn't the right approach.  Some of the GPIO pins
> > might be used for SMIs.  And it isn't clear to me whether disabling
> > interrupts will also stop SMIs from being generated.
> > 
> > A better strategy would be to have the interrupt handler disable pins
> > for which it sees an interrupt pending when there is no interrupt
> > handler registered.  I believe that is what Linux does.
> > 
> > James, is that something you'd like to work on?
> > 
> > Also, I don't think that this will fix the issue that claudo@ and I
> > are seeing on the m715q.  There I'm starting to suspect that the
> > problem is that the interrupt is shared with a quirky PCI device.
> > There a BIOS update (which stops amdgpio(4) from attaching) may
> > actually be the only reasonable fix.
> > 
> > Cheers,
> > 
> > Mark
> 
> I changed the interrupt routine to mask pins that do not have an
> interrupt handler registered.

no problems on t495 with
amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins

> 
> Index: dev/acpi/amdgpio.c
> ===
> RCS file: /cvs/src/sys/dev/acpi/amdgpio.c,v
> retrieving revision 1.1
> diff -u -p -u -r1.1 amdgpio.c
> --- dev/acpi/amdgpio.c23 Dec 2019 08:05:42 -  1.1
> +++ dev/acpi/amdgpio.c11 Jan 2020 23:56:02 -
> @@ -260,20 +260,28 @@ int
>  amdgpio_pin_intr(struct amdgpio_softc *sc, int pin)
>  {
>   uint32_t reg;
> + int rc = 0;
>  
>   reg = bus_space_read_4(sc->sc_memt, 

Re: amdgpio0 high interrupt rate on Matebook D

2020-01-25 Thread Jonathan Gray
On Sat, Jan 11, 2020 at 08:16:23PM -0500, James Hastings wrote:
> > On Sat, Jan 11, 2020 at 12:42:50 +0100, Mark Kettenis wrote:
> > > Date: Fri, 10 Jan 2020 13:37:38 -0500
> > > From: Bryan Steele 
> > > 
> > > On Fri, Jan 10, 2020 at 02:59:10AM -0500, James Hastings wrote:
> > > > > On Sat, Jan 04, 2020 at 06:23:44PM +0100, Mark Kettenis wrote:
> > > > > > Date: Sat, 4 Jan 2020 12:03:14 -0500
> > > > > > From: Bryan Steele 
> > > > > > 
> > > > > > On Sat, Jan 04, 2020 at 11:30:56AM -0500, Bryan Steele wrote:
> > > > > > > On Sat, Jan 04, 2020 at 05:07:04PM +0100, Mark Kettenis wrote:
> > > > > > > > > Date: Sat, 4 Jan 2020 10:52:24 -0500
> > > > > > > > > From: Bryan Steele 
> > > > > > > > > 
> > > > > > > > > I noticed an unusually high interrupt rate for amdgpio0 on my
> > > > > > > > > Huawei Matebook D laptop. I suspect this may be partially why
> > > > > > > > > apmd -A has been struggling, as the CPU is constantly busy so
> > > > > > > > > it never has a chance to scale down.
> > > > > > > > > 
> > > > > > > > > Any ideas?
> > > > > > > > 
> > > > > > > > Please send acpidump output (all files in /var/db/acpi).
> > > > > > > > 
> > > > > > > > Try to figure out which GPIO pin is causing the interrupt.  
> > > > > > > > That may
> > > > > > > > be tricky since the interrupt fires again and again, so if you 
> > > > > > > > add a
> > > > > > > > printf in amdgpio_intr() your machine will become unusable.  
> > > > > > > > Maybe
> > > > > > > > just print the pin every 1 times:
> > > > > > > > 
> > > > > > > > static count = 0;
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > if ((count++ % 1) == 0)
> > > > > > > > printf("st %llx\n", status)
> > > > > > > > 
> > > > > > > > Cheers,
> > > > > > > > 
> > > > > > > > Mark
> > > > > > > 
> > > > > > > Thanks, for some reason it vanished on a cold boot. I'm not sure
> > > > > > > what caused it. In case it helps, I'm sending the acpidump
> > > > > > > anyway.
> > > > > > > 
> > > > > > > If I can figure out what caused it, I'll try your other 
> > > > > > > suggestion.
> > > > > > > 
> > > > > > > Sigh.. :-)
> > > > > > 
> > > > > > Aha! I must have accidentally bumped the touchscreen at some
> > > > > > point; doing that causes the amdgpio0 rate to spike.
> > > > > > 
> > > > > > I had sent a diff to add AMD controller support to dwiic(4) months 
> > > > > > ago,
> > > > > > I could never get interrupts to work, only polling mode. Perhaps 
> > > > > > this
> > > > > > issue explains some of that. I don't have this diff in my tree at 
> > > > > > the
> > > > > > moment, had to restore from backup.
> > > > > > 
> > > > > > Managed to login and type shutdown, lol.
> > > > > > 
> > > > > > ..
> > > > > > st 1f08
> > > > > 
> > > > > Very likely.  The AML defines an I2C device with:
> > > > > 
> > > > >Name (_DDN, "Raydium Touchscreen")  // _DDN: DOS Device 
> > > > > Name
> > > > > 
> > > > > that uses a GPIO interrupt that matches the lowest bit set in the
> > > > > status register.
> > > > >
> > > > > This suggests we may need to be a little bit more careful and mask
> > > > > interrupts for which we don't have an interrupt handler.
> > > > That is my fault. Does this diff stop your interrupt storm?
> > > 
> > > Yes. This works, thanks!
> > > 
> > > ok brynet@ (if Mark agrees)
> > 
> > I fear that this isn't the right approach.  Some of the GPIO pins
> > might be used for SMIs.  And it isn't clear to me whether disabling
> > interrupts will also stop SMIs from being generated.
> > 
> > A better strategy would be to have the interrupt handler disable pins
> > for which it sees an interrupt pending when there is no interrupt
> > handler registered.  I believe that is what Linux does.
> > 
> > James, is that something you'd like to work on?
> > 
> > Also, I don't think that this will fix the issue that claudo@ and I
> > are seeing on the m715q.  There I'm starting to suspect that the
> > problem is that the interrupt is shared with a quirky PCI device.
> > There a BIOS update (which stops amdgpio(4) from attaching) may
> > actually be the only reasonable fix.
> > 
> > Cheers,
> > 
> > Mark
> 
> I changed the interrupt routine to mask pins that do not have an
> interrupt handler registered.

Thanks for the patch and sorry for the delay.  I have committed this.
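
For readers following the thread, the strategy described above (dispatch
pins that have a handler, mask pins that do not) can be sketched roughly
as below.  This is a simplified model and not the committed amdgpio(4)
code: struct gpio_model, gpio_model_intr() and NPINS are hypothetical
names, and the registers are plain integers rather than bus_space
accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPINS	8	/* hypothetical; real controllers have many more */

struct pin {
	int	(*func)(void *);	/* registered handler, or NULL */
	void	*arg;
};

struct gpio_model {
	uint32_t	pending;	/* bit n set: pin n has an interrupt pending */
	uint32_t	enabled;	/* bit n set: pin n may raise interrupts */
	struct pin	pins[NPINS];
};

/*
 * Service every pending, enabled pin.  A pin with a handler is
 * dispatched; a pin without one is masked so it cannot fire again.
 * Returns nonzero if any handler claimed an interrupt.
 */
int
gpio_model_intr(struct gpio_model *sc)
{
	int pin, rc = 0;

	assert(sc != NULL);
	for (pin = 0; pin < NPINS; pin++) {
		uint32_t bit = (uint32_t)1 << pin;

		if ((sc->pending & sc->enabled & bit) == 0)
			continue;
		if (sc->pins[pin].func != NULL) {
			if (sc->pins[pin].func(sc->pins[pin].arg))
				rc = 1;
		} else
			sc->enabled &= ~bit;	/* nobody listens: mask it */
		sc->pending &= ~bit;		/* acknowledge */
	}
	return rc;
}
```

In this model a pin such as the touchscreen line is masked the first
time it fires without a handler, so it cannot keep re-entering the
interrupt routine.  Pins are only touched once they actually interrupt,
which is the difference from disabling everything up front.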



Re: X window system will not start on fresh 6.6 and snapshot install

2020-01-25 Thread Jonathan Gray
On Sat, Jan 25, 2020 at 03:59:54PM +0100, Matthieu Herrb wrote:
> On Fri, Jan 24, 2020 at 11:47:02PM -0600, David Savolainen wrote:
> Hi,
> 
> > Here is the output from sendbug.  Mail isn't fully set up..
> > >Fix: What is undefined symbol 'shadowDamage'?
> 
> 
> It's an oversight. We missed the removal from this function in xserver
> back in 2016. I don't have the hardware to test it anymore.
> 
> The patch below should fix it:

ok jsg@

> 
> Index: driver/xf86-video-wildcatfb/src/wildcatfb_driver.c
> ===
> RCS file: 
> /cvs/OpenBSD/xenocara/driver/xf86-video-wildcatfb/src/wildcatfb_driver.c,v
> retrieving revision 1.13
> diff -u -p -u -r1.13 wildcatfb_driver.c
> --- driver/xf86-video-wildcatfb/src/wildcatfb_driver.c30 Jun 2019 
> 17:10:24 -  1.13
> +++ driver/xf86-video-wildcatfb/src/wildcatfb_driver.c25 Jan 2020 
> 14:57:33 -
> @@ -971,7 +971,7 @@ WildcatFBShadowUpdate(ScreenPtr pScreen,
>  {
>  ScrnInfoPtr pScrn = xf86ScreenToScrn(pScreen);
>  WildcatFBPtr fPtr = WILDCATFBPTR(pScrn);
> -RegionPtrdamage = shadowDamage (pBuf);
> +RegionPtrdamage = DamageRegion (pBuf->pDamage);
>  PixmapPtrpShadow = pBuf->pPixmap;
>  int  nbox = REGION_NUM_RECTS (damage);
>  BoxPtr   pbox = REGION_RECTS (damage);
> 
> 
> -- 
> Matthieu Herrb
> 
> 



Re: UDP/TCP Packets are sent but not received on vio(4)

2020-02-20 Thread Jonathan Gray
On Tue, Feb 18, 2020 at 12:37:10AM +0100, alexandre wrote:
> 
> > On Mon, Feb 17, 2020 at 01:31:08AM +0100, Alexandre wrote:
> > > Hello,
> > >
> > > I am running an OpenBSD/armv7 guest on a QEMU 4.2.0 "virt" machine host;
> > > see the attached file (from_qemu_virt.dts) for the fdt of the guest
> > > machine, it has only a virtio-mmio bus with a virtio-net device attached.
> > >
> > > The QEMU command line is also attached, together with source and bin for
> > > the tiny bootloader for the bsd kernel (from the distfiles of 6.6 release)
> > > used as QEMU "bios".
> > >
> > > I used the netdev user backend.
> > >
> > > Boot is OK (see dmesg.txt).
> > >
> > > The vio network interface is configured by dhclient and we have this:
> > >
> > > my# ifconfig
> > > lo0: flags=8049 mtu 32768
> > > index 3 priority 0 llprio 3
> > > groups: lo
> > > inet6 ::1 prefixlen 128
> > > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
> > > inet 127.0.0.1 netmask 0xff00
> > > vio0: flags=808843 mtu
> > > 1500
> > > lladdr 52:54:00:12:34:56
> > > index 1 priority 0 llprio 3
> > > groups: egress
> > > media: Ethernet autoselect
> > > status: active
> > > inet 10.0.2.15 netmask 0xff00 broadcast 10.0.2.255
> > > enc0: flags=0<>
> > > index 2 priority 0 llprio 3
> > > groups: enc
> > > status: active
> > > pflog0: flags=141 mtu 33168
> > > index 4 priority 0 llprio 3
> > > groups: pflog
> > >
> > > my# route -n show
> > > Routing tables
> > >
> > > Internet:
> > > Destination Gateway Flags Refs Use Mtu Prio
> > > Iface
> > > default 10.0.2.2 UGS 0 0 - 8 vio0
> > > 224/4 127.0.0.1 URS 0 0 32768 8 lo0
> > > 10.0.2/24 10.0.2.15 UCn 1 0 - 4 vio0
> > > 10.0.2.2 52:55:0a:00:02:02 UHLch 1 4 - 3
> > > vio0
> > > 10.0.2.15 52:54:00:12:34:56 UHLl 0 46 - 1 vio0
> > > 10.0.2.255 10.0.2.15 UHb 0 0 - 1 vio0
> > > 127/8 127.0.0.1 UGRS 0 0 32768 8 lo0
> > > 127.0.0.1 127.0.0.1 UHhl 1 2 32768 1 lo0
> > > [[ edited Inet6 routes ..]]
> > >
> > > Ping works OK from the guest to the host: ICMP Echo requests are correctly
> > > sent and Echo replies correctly received. It does not work from the guest
> > > to a public IP (but that's fine, it is a known limitation of QEMU net
> > > user).
> > >
> > > UDP packets are OK in the direction guest --> host, but not in reverse,
> > > host --> guest. This causes failure of DNS resolution, for instance. TCP
> > > packets have the same problem (the guest sends the SYN, which is received
> > > by the host, which sends the SYN-ACK, but the SYN-ACK is not "seen" by the
> > > OBSD guest and connect times out).
> > >
> > > What's surprising (to me !) is that packets are visible on tcpdump on the
> > > guest (with 0 packets "dropped by kernel")
> > >
> > > Steps to reproduce:
> > >
> > > on guest:
> > >
> > > my# tcpdump -w test.dump -p&
> > > [1] 13288
> > > my# tcpdump: listening on vio0, link-type EN10MB
> > > my# nc -v -u 10.0.2.2 
> > > Connection to 10.0.2.2  port [udp/*] succeeded!
> > > hello from guest (<<-- typed on the guest console)
> > >
> > > on host:
> > >
> > > $ nc -l -v -u -p 
> > > listening on 0.0.0.0: ...
> > > connect to 127.0.0.1: from localhost.localdomain:60487 
> > > (127.0.0.1:60487)
> > > hello from guest (-->> got this from the guest)
> > > hello from host (<<-- typed on the host console, NOT shown on the guest
> > > console)
> > >
> > > Now the tcpdump -neX on the guest is attached, you can see that the reply
> > > packets are seen by the kernel but forwarded to beyond "to user space". I
> > > also attached tcpdump on the guest, no difference is shown.
> > >
> > > I tried the 0x02 flags of vio (see dmesg) with no effect. The same with
> > > 0x100 or by putting the vio0 interface in promiscuous mode with tcpdump.
> > >
> > > pf has default rules (block return all, pass all flags S/SA, X11 and dpb
> > > builder blocking). Same problem with pf disabled. When appropriate log
> > > rules are configured, I see the faulty packets in pflogd journal as in the
> > > guest tcpdump.
> > >
> > >
> >
> > Maybe netstat -s -p ip and netstat -s -p udp help to find the cause of the
> > packet drops. Also check pfctl -si if one of those counters change when
> > you send UDP packets to the guest.
> >
> > I normally used qemu with the tap virtio option (using
> > -net nic,vlan=$id,macaddr=$mac,model=virtio -net tap,vlan=$id,fd=$fd),
> > never had issues with that.
> 
> 
> I confirm it is a problem with in4_cksum on ARM.
> 
> Networking is OK when I either:
> 
> * change
> 
> --- sys/dev/pv/vio.c.old
> +++ sys/dev/pv/vio.c
> - m->m_pkthdr.csum_flags = 0;
> + m->m_pkthdr.csum_flags = M_UDP_CSUM_IN_OK | M_TCP_CSUM_IN_OK;
> 
> so as to disable checksum verification by udp_input()
> 
> OR:
> 
> * use the portable C implementation of in4_cksum instead of the ARM one by 
> doing

I can not reproduce this problem on a cubox with fec(4) but can on qemu
with vio(4) (by the way you can use qemu_arm/u-boot.bin for bios with
-M virt,highmem=off).
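
For reference, a portable checksum of the kind discussed above can be
sketched as follows.  This is a minimal illustration in the style of
RFC 1071 over a flat buffer.  It is not the kernel's in4_cksum(), which
works on an mbuf chain and includes a pseudo-header; the name
cksum_portable() is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum over len bytes.  Bytes are
 * combined explicitly as big-endian 16-bit words, so the result does
 * not depend on host byte order or buffer alignment.
 */
uint16_t
cksum_portable(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A receiver can run the same routine over the data with the transmitted
checksum appended; a result of zero means the packet verifies.  An
optimised assembly version has to agree with this on every input,
including odd lengths and unaligned starts.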

It turns out the armv5te/xscale path (which w

Re: UDP/TCP Packets are sent but not received on vio(4)

2020-02-21 Thread Jonathan Gray
On Fri, Feb 21, 2020 at 04:45:01PM +1100, Jonathan Gray wrote:
> On Tue, Feb 18, 2020 at 12:37:10AM +0100, alexandre wrote:
> > 
> > > On Mon, Feb 17, 2020 at 01:31:08AM +0100, Alexandre wrote:
> > > > Hello,
> > > >
> > > > I am running an OpenBSD/armv7 guest on a QEMU 4.2.0 "virt" machine host;
> > > > see the attached file (from_qemu_virt.dts) for the fdt of the guest
> > > > machine, it has only a virtio-mmio bus with a virtio-net device 
> > > > attached.
> > > >
> > > > The QEMU command line is also attached, together with source and bin for
> > > > the tiny bootloader for the bsd kernel (from the distfiles of 6.6 
> > > > release)
> > > > used as QEMU "bios".
> > > >
> > > > I used the netdev user backend.
> > > >
> > > > Boot is OK (see dmesg.txt).
> > > >
> > > > The vio network interface is configured by dhclient and we have this:
> > > >
> > > > my# ifconfig
> > > > lo0: flags=8049 mtu 32768
> > > > index 3 priority 0 llprio 3
> > > > groups: lo
> > > > inet6 ::1 prefixlen 128
> > > > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
> > > > inet 127.0.0.1 netmask 0xff00
> > > > vio0: flags=808843 mtu
> > > > 1500
> > > > lladdr 52:54:00:12:34:56
> > > > index 1 priority 0 llprio 3
> > > > groups: egress
> > > > media: Ethernet autoselect
> > > > status: active
> > > > inet 10.0.2.15 netmask 0xff00 broadcast 10.0.2.255
> > > > enc0: flags=0<>
> > > > index 2 priority 0 llprio 3
> > > > groups: enc
> > > > status: active
> > > > pflog0: flags=141 mtu 33168
> > > > index 4 priority 0 llprio 3
> > > > groups: pflog
> > > >
> > > > my# route -n show
> > > > Routing tables
> > > >
> > > > Internet:
> > > > Destination Gateway Flags Refs Use Mtu Prio
> > > > Iface
> > > > default 10.0.2.2 UGS 0 0 - 8 vio0
> > > > 224/4 127.0.0.1 URS 0 0 32768 8 lo0
> > > > 10.0.2/24 10.0.2.15 UCn 1 0 - 4 vio0
> > > > 10.0.2.2 52:55:0a:00:02:02 UHLch 1 4 - 3
> > > > vio0
> > > > 10.0.2.15 52:54:00:12:34:56 UHLl 0 46 - 1 vio0
> > > > 10.0.2.255 10.0.2.15 UHb 0 0 - 1 vio0
> > > > 127/8 127.0.0.1 UGRS 0 0 32768 8 lo0
> > > > 127.0.0.1 127.0.0.1 UHhl 1 2 32768 1 lo0
> > > > [[ edited Inet6 routes ..]]
> > > >
> > > > Ping works OK from the guest to the host: ICMP Echo requests are
> > > > correctly sent and Echo replies correctly received. It does not work
> > > > from the guest to a public IP (but that's fine, it is a known
> > > > limitation of QEMU net user).
> > > >
> > > > UDP packets are OK in the direction guest --> host, but not in reverse,
> > > > host --> guest. This causes failure of DNS resolution, for instance. TCP
> > > > packets have the same problem (the guest sends the SYN, which is
> > > > received by the host, which sends the SYN-ACK, but the SYN-ACK is not
> > > > "seen" by the OBSD guest and connect times out).
> > > >
> > > > What's surprising (to me !) is that packets are visible on tcpdump on 
> > > > the
> > > > guest (with 0 packets "dropped by kernel")
> > > >
> > > > Steps to reproduce:
> > > >
> > > > on guest:
> > > >
> > > > my# tcpdump -w test.dump -p&
> > > > [1] 13288
> > > > my# tcpdump: listening on vio0, link-type EN10MB
> > > > my# nc -v -u 10.0.2.2 
> > > > Connection to 10.0.2.2  port [udp/*] succeeded!
> > > > hello from guest (<<-- typed on the guest console)
> > > >
> > > > on host:
> > > >
> > > > $ nc -l -v -u -p 
> > > > listening on 0.0.0.0: ...
> > > > connect to 127.0.0.1: from localhost.localdomain:60487 
> > > > (127.0.0.1:60487)
> > > > hello from guest (-->> got this from the guest)
> > > > hello from host (<<-- typed on the host console, NOT shown on the guest
> > > > console)
> > > >
> > > > Now the tcpdump -neX on 

Re: Sun V240 will not boot with generic ati video

2020-02-22 Thread Jonathan Gray
xvr-100 is listed as being supported in v240 in old sun docs.

It is a radeon rv100 class card with fcode.

kettenis@ made radeondrm(4) work on xvr-100 some time ago before the
last major drm update.  I'm not sure what the current state is.

On Sat, Feb 22, 2020 at 08:36:11PM -0600, David Savolainen wrote:
> Patrick,
> Thanks for the insight.  I was wondering if something like this might be the
> case.  Unfortunately, a V240 is too old to have PCIe slots.  I am stuck with
> XVR-600 (what I have, known to work, but outputs no video at all for some
> reason in openbsd) and similar wildcat based cards that are not well
> supported - unaccelerated 8 bit color.  Pity.
> 
> On 2/21/20 12:37 PM, Patrick Harper wrote:
> > GPU's firmware won't work with the system firmware, which uses openfirmware 
> > and not a BIOS/UEFI.
> > 
> > You could source a Sun XVR-300, which is a version of the ATI FireMV 2200 
> > PCIe loaded with appropriate firmware for your system.
> > 
> 
> 



Re: Need modified dtb and u-boot files for 11" Pinebook 1080p (from NetBSD)

2020-04-20 Thread Jonathan Gray
On Mon, Apr 20, 2020 at 11:39:20AM +0200, Alexander Shendi wrote:
> >Synopsis:Need modified dtb and u-boot files (from NetBSD) to boot
> >Category:system, arm, arm64, aarch64
> >Environment:
>   System  : OpenBSD 6.7
>   Details : OpenBSD 6.7-beta (GENERIC.MP) #557: Sun Apr 12 20:59:24 
> MDT 2020
>
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.arm64
>   Machine : arm64
> >Description:
>   1. The machine (Pinebook 11" 1080p) does not boot from SD-card. No output
>either on the screen or on the serial console,
>not even from U-Boot.
> 2. The left USB port doesn't work. It does under NetBSD. I have
>yet failed to come up with a solution.
> 3. I can provide the necessary files (*.dtb, u-boot*.bin)
>upon request
> 
> 
> >How-To-Repeat:
>   Turn off the machine, hit the power switch :)
> >Fix:
> The machine does not boot, unless the following changes

Allwinner U-Boot images have a builtin dtb.  For the pinebook a prebuilt
U-Boot image is available in the u-boot-aarch64 package.

We aren't going to build miniroots for every poorly designed arm device
which lacks firmware required to boot on a dedicated storage device.

> are made to the SD-Card:
> 1. dd if=miniroot67.fs of=/dev/rsd0c bs=1m conv=sync
> 2. dd if=u-boot-sunxi-with-spl.bin of=/dev/rsd0c bs=1024 seek=8 
> conv=sync
> 3. mount /dev/sd0i /mnt
> 4. mkdir /mnt/vendor
> 5. cp sun50i-a64-pinebook.dtb /mnt/vendor
> 6. umount /mnt
> 7. reboot
> 8. At the OpenBSD bootloader prompt interrupt the boot
>process and enter:
>- set tty fb0
>- boot
> 9. I can provide the necessary files (*.dtb, u-boot*.bin)
>upon request
> 
> 
> 
> dmesg:
> OpenBSD 6.7-beta (GENERIC.MP) #557: Sun Apr 12 20:59:24 MDT 2020
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 2015813632 (1922MB)
> avail mem = 1923813376 (1834MB)
> mainbus0 at root: Pinebook
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
> cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> cpu0: 512KB 64b/line 16-way L2 cache
> efi0 at mainbus0: UEFI 2.7
> efi0: Das U-Boot rev 0x20181100
> apm0 at mainbus0
> psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
> "osc24M_clk" at mainbus0 not configured
> "osc32k_clk" at mainbus0 not configured
> "internal-osc-clk" at mainbus0 not configured
> "sound_spdif" at mainbus0 not configured
> "spdif-out" at mainbus0 not configured
> agtimer0 at mainbus0: tick rate 24000 KHz
> simplebus0 at mainbus0: "soc"
> sxisyscon0 at simplebus0
> sxisid0 at simplebus0
> sxiccmu0 at simplebus0
> sxipio0 at simplebus0: 103 pins
> ampintc0 at simplebus0 nirq 224, ncpu 4 ipi: 0, 1: "interrupt-controller"
> sxirtc0 at simplebus0
> sxiccmu1 at simplebus0
> sxipio1 at simplebus0: 13 pins
> sxirsb0 at simplebus0
> axppmic0 at sxirsb0 addr 0x3a3: AXP803
> "de2" at simplebus0 not configured
> "dma-controller" at simplebus0 not configured
> "lcd-controller" at simplebus0 not configured
> "lcd-controller" at simplebus0 not configured
> sximmc0 at simplebus0
> sdmmc0 at sximmc0: 4-bit, sd high-speed, mmc high-speed, dma
> sximmc1 at simplebus0
> sdmmc1 at sximmc1: 4-bit, sd high-speed, mmc high-speed, dma
> sximmc2 at simplebus0
> sdmmc2 at sximmc2: 8-bit, sd high-speed, mmc high-speed, dma
> "phy" at simplebus0 not configured
> ehci0 at simplebus0
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 configuration 1 interface 0 "Generic EHCI root hub" rev 
> 2.00/1.00 addr 1
> ohci0 at simplebus0: version 1.0
> ehci1 at simplebus0
> usb1 at ehci1: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "Generic EHCI root hub" rev 
> 2.00/1.00 addr 1
> ohci1 at simplebus0: version 1.0
> com0 at simplebus0sxiccmu_ccu_reset: 0x002e
> : ns16550, no working fifo
> sxipwm0 at simplebus0
> "hdmi-phy" at simplebus0 not configured
> "interrupt-controller" at simplebus0 not configured
> sxitwi0 at simplebus0
> iic0 at sxitwi0
> "analogix,anx6345" at iic0 addr 0x38 not configured
> sxidog0 at simplebus0
> gpio0 at sxipio0: 32 pins
> gpio1 at sxipio0: 32 pins
> gpio2 at sxipio0: 32 pins
> gpio3 at sxipio0: 32 pins
> gpio4 at sxipio0: 32 pins
> gpio5 at sxipio0: 32 pins
> gpio6 at sxipio0: 32 pins
> gpio7 at sxipio0: 32 pins
> gpio8 at sxipio1: 32 pins
> usb2 at ohci0: USB revision 1.0
> uhub2 at usb2 configuration 1 interface 0 "Generic OHCI root hub" rev 
> 1.00/1.00 addr 1
> usb3 at ohci1: USB revision 1.0
> uhub3 at usb3 configuration 1 interface 0 "Generic OHCI root hub" rev 
> 1.00/1.00 addr 1
> "regulator" at mainbus0 not configured
> pwmbl0 at mainbus0
> "gpio_keys" at mainbus0 not configured
> "vcc3v3" at mainbus0 not configured
> "wifi_pwrseq" at mainbus0 not configured
> simplefb0 at mainbus0: 1920x1080, 32bpp
> wsdisplay0 at simplefb0 mux 1

Re: VGA fonts not reloading when changing back to text mode

2020-05-14 Thread Jonathan Gray
On Wed, May 13, 2020 at 11:11:56PM -0700, jo...@armadilloaerospace.com wrote:
> On a system with VGA graphics, like a VirtualBox:
> 
> Boot to X.
> ctrl-alt-F1 back to text mode.
> As root:
> wsfontload -h 8 -e ibm /usr/share/misc/pcvtfonts/vt220l.808
> wsconscfg -dF 1
> wsconscfg -t 80x50 1
> ctrl-alt-F2 to see the second screen in 80x50 mode
> (takes a while for the getty to restart).
> ctrl-alt-F5 to go back to graphics mode with X.
> ctrl-alt-F2 to go back to the 80x50 text mode.
> 
> The custom font has not been reloaded, so the screen is full of trash.
> 
> Inserting a restore in vga.c:824 fixes it:
>   vga_setfont(vc, scr);
> + vga_restore_fonts(vc);
>   vga_restore_palette(vc);
> 
> This does cause the font to be uploaded on every virtual screen switch,
> but it is only there if a custom font has already been loaded, and is
> tiny in any case.
> 
> The vga_restore_fonts() function is currently only called in vga_ioctl()
> for WSDISPLAYIO_SMODE when going from graphics to WSDISPLAYIO_MODE_EMUL,
> which should be optimal, but this never happens -- that vga_ioctl()
> happens once after boot to go into graphics mode, but never gets called
> again when switching back to text mode.  Maybe something with the
> integration of kms drivers changed the call semantics?

Thanks for the report.

In xenocara/xserver/hw/xfree86/os-support/bsd/bsd_init.c
xf86OpenConsole() does the ioctl with WSDISPLAYIO_MODE_MAPPED
xf86CloseConsole() does the ioctl with WSDISPLAYIO_MODE_EMUL

however these aren't currently done for switching when xorg is running

Testing here, xf86VTSwitchAway() is called when switching from xorg to a
VT, and xf86VTSwitchTo() is called when switching back to xorg.

Adding the ioctl to these functions makes the steps you've given end in
a non trashed VT.
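
To illustrate why the missing ioctl matters, here is a toy model of the
behaviour described in the report: fonts are re-uploaded only on a
transition from a graphics mode to emulation (text) mode.  The enum,
struct vga_model and model_smode() are hypothetical stand-ins, not the
wscons or vga(4) code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the wscons display modes. */
enum disp_mode { MODE_EMUL, MODE_MAPPED };

struct vga_model {
	enum disp_mode	mode;
	int		font_restores;	/* times the custom font was re-uploaded */
};

/* Models the SMODE ioctl: fonts are restored only on graphics -> text. */
void
model_smode(struct vga_model *vm, enum disp_mode mode)
{
	assert(vm != NULL);
	if (vm->mode != MODE_EMUL && mode == MODE_EMUL)
		vm->font_restores++;
	vm->mode = mode;
}
```

If the X server never issues the mode change on a VT switch, the model
stays in MODE_MAPPED and font_restores never increases, which matches
the trashed 80x50 screen.  Issuing it on every switch away restores the
font once per switch.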

This is clearly a dark corner of the xserver code and suffers from
trying to support multiple systems and be compatible with USL ioctls.

Index: xserver/hw/xfree86/os-support/bsd/bsd_VTsw.c
===
RCS file: /cvs/xenocara/xserver/hw/xfree86/os-support/bsd/bsd_VTsw.c,v
retrieving revision 1.5
diff -u -p -r1.5 bsd_VTsw.c
--- xserver/hw/xfree86/os-support/bsd/bsd_VTsw.c2 Apr 2016 14:25:10 
-   1.5
+++ xserver/hw/xfree86/os-support/bsd/bsd_VTsw.c14 May 2020 08:44:33 
-
@@ -72,13 +72,20 @@ xf86VTSwitchAway(void)
 {
 #if defined (SYSCONS_SUPPORT) || defined (PCVT_SUPPORT) \
 || defined(WSCONS_SUPPORT)
+int mode;
+
 if (xf86Info.consType == SYSCONS || xf86Info.consType == PCVT ||
 xf86Info.consType == WSCONS) {
 xf86Info.vtRequestsPending = FALSE;
 if (ioctl(xf86Info.consoleFd, VT_RELDISP, 1) < 0)
 return FALSE;
-else
-return TRUE;
+
+#ifdef WSCONS_SUPPORT
+mode = WSDISPLAYIO_MODE_EMUL;
+ioctl(xf86Info.consoleFd, WSDISPLAYIO_SMODE, &mode); 
+#endif
+
+return TRUE;
 }
 #endif
 return FALSE;
@@ -89,13 +96,20 @@ xf86VTSwitchTo(void)
 {
 #if defined (SYSCONS_SUPPORT) || defined (PCVT_SUPPORT) \
 || defined(WSCONS_SUPPORT)
+int mode;
+
 if (xf86Info.consType == SYSCONS || xf86Info.consType == PCVT ||
 xf86Info.consType == WSCONS) {
 xf86Info.vtRequestsPending = FALSE;
 if (ioctl(xf86Info.consoleFd, VT_RELDISP, VT_ACKACQ) < 0)
 return FALSE;
-else
-return TRUE;
+
+#ifdef WSCONS_SUPPORT
+mode = WSDISPLAYIO_MODE_MAPPED;
+ioctl(xf86Info.consoleFd, WSDISPLAYIO_SMODE, &mode);
+#endif
+
+return TRUE;
 }
 #endif
 return TRUE;



Re: Corrupted text background color on VirtualBox

2020-05-15 Thread Jonathan Gray
On Fri, May 15, 2020 at 05:03:30PM -0700, jo...@armadilloaerospace.com wrote:
> One of the first things I noticed when I tried OpenBSD in VirtualBox
> was that doing ctrl-alt-2, then ctrl-alt-1 to switch back and forth
> between virtual screens left the text mode background a dim red
> color instead of black.
> 
> Here is someone else noting it, ten years ago:
> http://daemonforums.org/showthread.php?t=4704
> 
> It doesn't happen on a real (nvidia) VGA, but unexpected behavior on
> a virtual machine can be a sign that code is doing undefined things
> on the hardware, which is worth investigating.
> 
> The call to vga_enable() at the end of vga_restore_palette() is what
> triggers the behavior.
> 
> This appears to be truly ancient code, but I don't think it was ever
> correct.
> 
> #define   vga_enable(vh) \
>   vga_raw_write(vh, 0, 0x20);
> 
> After resetting the attribute flip-flop, this is just setting the VGA
> attribute index to an illegal register value -- there aren't 32
> attribute registers.

bits 0-4 are the index register, bit 5 is some kind of an enable bit.

To quote Abrash

"bit 5 of the AC Index register should be set to 1 whenever palette RAM
is not being set (which is to say, all the time in your code, because
palette RAM should normally be set via the BIOS). When bit 5 is 0, video
data from display memory is no longer sent to palette RAM, and the
screen becomes a solid color—not normally a desirable state of affairs."

from
http://www.jagregory.com/abrash-black-book/#notes-on-setting-and-reading-registers

The Matrox G400 datasheet describes bit 5 as

"This bit controls use of the internal palette. If pas = 0, the host
CPU can read and write the palette, and the display is forced to the
overscan color. If pas = 1, the palette is used normally by the video
stream to translate color indices (CPU writes are inhibited and reads
return all `1's). Normally, the internal palette is loaded during the
blank time, since loading inhibits video translation."

from
http://www.bitsavers.org/pdf/matrox/G400SPEC_Jun1999.PDF
3-289 -> 3-290
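
Put as code, the layout of the value written to the attribute controller
index port looks roughly like this.  The macro and function names are
invented for this sketch and do not appear in vgavar.h.

```c
#include <assert.h>
#include <stdint.h>

#define AC_INDEX_MASK	0x1f	/* bits 0-4: attribute register index */
#define AC_PAS		0x20	/* bit 5: palette address source */

/* Register index selected by an index-port value. */
unsigned
ac_index(uint8_t v)
{
	return v & AC_INDEX_MASK;
}

/* Nonzero if the value leaves the palette feeding the video stream. */
int
ac_pas(uint8_t v)
{
	return (v & AC_PAS) != 0;
}

/* Build an index-port value from a register index and the pas bit. */
uint8_t
ac_make(unsigned index, int pas)
{
	assert(index <= AC_INDEX_MASK);
	return (uint8_t)(index | (pas ? AC_PAS : 0));
}
```

With this reading, the 0x20 written by vga_enable() decodes as index 0
with pas set, not as an out-of-range register number.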

> 
> The palette code doesn't even touch the attributes, so it doesn't
> have to reset the flip-flop, either.
> 
> The vga_enable() was also used, along with another unnecessary flip
> flop reset (it toggles between address and data, so after writing
> an address and data, it is back where it started), in the
> vga_attr_read and write functions.
> 
> Removing all this appears to work fine on both VirtualBox and Nvidia.
> 
> Perhaps there was errant code that wrote to the io port register at
> some point, and having it disappear into an "invalid" register hid
> another bug?
> 
> Index: vga.c
> ===
> RCS file: /cvs/src/sys/dev/ic/vga.c,v
> retrieving revision 1.69
> diff -u -p -r1.69 vga.c
> --- vga.c 17 Jun 2017 19:20:30 -  1.69
> +++ vga.c 15 May 2020 23:38:32 -
> @@ -1213,8 +1213,6 @@ vga_save_palette(struct vga_config *vc)
>   vga_raw_write(vh, VGA_DAC_READ, 0x00);
>   for (i = 0; i < 3 * 256; i++)
>   *palette++ = vga_raw_read(vh, VGA_DAC_DATA);
> -
> - vga_raw_read(vh, 0x0a); /* reset flip/flop */
>  }
>  
>  void
> @@ -1231,10 +1229,6 @@ vga_restore_palette(struct vga_config *v
>   vga_raw_write(vh, VGA_DAC_WRITE, 0x00);
>   for (i = 0; i < 3 * 256; i++)
>   vga_raw_write(vh, VGA_DAC_DATA, *palette++);
> -
> - vga_raw_read(vh, 0x0a); /* reset flip/flop */
> -
> - vga_enable(vh);
>  }
>  
>  void
> Index: vgavar.h
> ===
> RCS file: /cvs/src/sys/dev/ic/vgavar.h,v
> retrieving revision 1.13
> diff -u -p -r1.13 vgavar.h
> --- vgavar.h  26 Jul 2015 03:17:07 -  1.13
> +++ vgavar.h  15 May 2020 23:38:32 -
> @@ -96,9 +96,6 @@ static inline void _vga_gdc_write(struct
>  #define  vga_raw_write(vh, reg, value) \
>   bus_space_write_1(vh->vh_iot, vh->vh_ioh_vga, reg, value)
>  
> -#define  vga_enable(vh) \
> - vga_raw_write(vh, 0, 0x20);
> -
>  static inline u_int8_t
>  _vga_attr_read(struct vga_handle *vh, int reg)
>  {
> @@ -110,11 +107,6 @@ _vga_attr_read(struct vga_handle *vh, in
>   vga_raw_write(vh, VGA_ATC_INDEX, reg);
>   res = vga_raw_read(vh, VGA_ATC_DATAR);
>  
> - /* reset state XXX unneeded? */
> - (void) bus_space_read_1(vh->vh_iot, vh->vh_ioh_6845, 10);
> -
> - vga_enable(vh);
> -
>   return (res);
>  }
>  
> @@ -126,11 +118,6 @@ _vga_attr_write(struct vga_handle *vh, i
>  
>   vga_raw_write(vh, VGA_ATC_INDEX, reg);
>   vga_raw_write(vh, VGA_ATC_DATAW, val);
> -
> - /* reset state XXX unneeded? */
> - (void) bus_space_read_1(vh->vh_iot, vh->vh_ioh_6845, 10);
> -
> - vga_enable(vh);
>  }
>  
>  static inline u_int8_t
> 
> 
> 



Re: Corrupted text background color on VirtualBox

2020-05-16 Thread Jonathan Gray
On Fri, May 15, 2020 at 10:02:22PM -0700, jo...@armadilloaerospace.com wrote:
> Ok, new theory:
> 
> VirtualBox ignores the pas bit, so writes to the attribute data register
> after vga_enable() is called go ahead and modify the attribute palette
> index 0, but on a real vga the write is ignored.
> 
> This implies something is writing to VGA_ATC_DATAW without resetting the
> flip flop.
> 
> If I change vga_enable() to write 0x3f instead of 0x20, which is an
> actual illegal register + pas instead of text palette 0 + pas, the text
> background doesn't change.
> 
> If I explicitly write a 0 to VGA_ATC_DATAW after the vga_enable(), the
> problem also doesn't appear; the background stays black.
> 
> If I explicitly write a 3 to VGA_ATC_DATAW after the vga_enable(), the
> text background becomes cyan after the restore, which confirms that
> VirtualBox is allowing the palette registers to be modified even when
> pas is set.
> 
> Oddly, if I reset the flip-flop again after vga_enable(), which should
> make the next write just an address change instead of doing anything,
> the problem still happens.
> 
> So, I would propose:
> 
> Removing the flip flop reset and vga_enable() from vga_restore_palette()
> since setting the 256 color VGA palette doesn't touch the attribute
> registers at all.  This fixes VirtualBox.
> 
> Leave the vga_enable() code in vgavar.h for the vga_attr_read/write()
> calls, where it probably is still needed, at least for old cards.
> 
> It looks like the only place those are ever called from is
> vga_save_state() / vga_restore_state(), which only happens on ACPI
> events.  That might still cause a problem on VirtualBox, but I don't
> see a way to simulate a hibernate event, so it probably doesn't matter.
> 
> Alternately, vga_enable() could explicitly write a 0 after setting 0x20
> which would work around the VirtualBox issue and not harm anything
> else.

It turns out the flip-flop reset in the palette functions is using the
wrong address.  _vga_attr_read()/_vga_attr_write() get it right.

ioh_vga 0x3c0
vh_ioh_6845 0x3d0 (unless mono)

vga_raw_read() uses ioh_vga so incorrectly reads 0x3ca to reset
instead of 0x3da.

This was a mistake made when importing code from FreeBSD:


revision 1.48
date: 2009/02/01 14:37:22;  author: miod;  state: Exp;  lines: +78 -3;
Save the text mode color palette upon startup, and restore it when
switching consoles or when X11 exits. Almost all other operating systems
do this, and thus do not suffer from palette bugs in some X11 drivers.

From FreeBSD.


In FreeBSD they have

inb(adp->va_crtc_addr + 6)

where va_crtc_addr is set to COLOR_CRTC for non-mono

#define COLOR_CRTC  (IO_CGA + 0x04) 
#define IO_CGA  0x3D0
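
The address arithmetic can be checked with a few lines of C.  The base
values and the 0x0a offset are the ones quoted above; vga_port() and
the macro names other than COLOR_CRTC and IO_CGA are made up for this
sketch, and the 16-port window per handle is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define IOH_VGA_BASE	0x3c0	/* ioh_vga */
#define IOH_6845_BASE	0x3d0	/* vh_ioh_6845, colour */
#define FLIPFLOP_OFF	0x0a	/* offset read to reset the flip-flop */

#define IO_CGA		0x3D0
#define COLOR_CRTC	(IO_CGA + 0x04)

/* Absolute I/O port for an offset within a handle's window. */
uint16_t
vga_port(uint16_t base, uint16_t off)
{
	assert(off < 0x10);	/* assumed 16-port window per handle */
	return (uint16_t)(base + off);
}
```

vga_port(IOH_6845_BASE, FLIPFLOP_OFF) and COLOR_CRTC + 6 both come out
as 0x3da, while vga_port(IOH_VGA_BASE, FLIPFLOP_OFF) is 0x3ca, the
address the palette functions were reading by mistake.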

I agree the attempted flip-flop reset and pas setting can go:

Index: vga.c
===
RCS file: /cvs/src/sys/dev/ic/vga.c,v
retrieving revision 1.69
diff -u -p -r1.69 vga.c
--- vga.c   17 Jun 2017 19:20:30 -  1.69
+++ vga.c   16 May 2020 07:21:19 -
@@ -1213,8 +1213,6 @@ vga_save_palette(struct vga_config *vc)
vga_raw_write(vh, VGA_DAC_READ, 0x00);
for (i = 0; i < 3 * 256; i++)
*palette++ = vga_raw_read(vh, VGA_DAC_DATA);
-
-   vga_raw_read(vh, 0x0a); /* reset flip/flop */
 }
 
 void
@@ -1231,10 +1229,6 @@ vga_restore_palette(struct vga_config *v
vga_raw_write(vh, VGA_DAC_WRITE, 0x00);
for (i = 0; i < 3 * 256; i++)
vga_raw_write(vh, VGA_DAC_DATA, *palette++);
-
-   vga_raw_read(vh, 0x0a); /* reset flip/flop */
-
-   vga_enable(vh);
 }
 
 void



Re: VGA fonts not reloading when changing back to text mode

2020-05-17 Thread Jonathan Gray
On Thu, May 14, 2020 at 11:59:07AM +0200, Mark Kettenis wrote:
> > Date: Thu, 14 May 2020 19:20:00 +1000
> > From: Jonathan Gray 
> > 
> > On Wed, May 13, 2020 at 11:11:56PM -0700, jo...@armadilloaerospace.com 
> > wrote:
> > > On a system with VGA graphics, like a VirtualBox:
> > > 
> > > Boot to X.
> > > ctrl-alt-F1 back to text mode.
> > > As root:
> > > wsfontload -h 8 -e ibm /usr/share/misc/pcvtfonts/vt220l.808
> > > wsconscfg -dF 1
> > > wsconscfg -t 80x50 1
> > > ctrl-alt-F2 to see the second screen in 80x50 mode
> > > (takes a while for the getty to restart).
> > > ctrl-alt-F5 to go back to graphics mode with X.
> > > ctrl-alt-F2 to go back to the 80x50 text mode.
> > > 
> > > The custom font has not been reloaded, so the screen is full of trash.
> > > 
> > > Inserting a restore in vga.c:824 fixes it:
> > >   vga_setfont(vc, scr);
> > > + vga_restore_fonts(vc);
> > >   vga_restore_palette(vc);
> > > 
> > > This does cause the font to be uploaded on every virtual screen switch,
> > > but it is only there if a custom font has already been loaded, and is
> > > tiny in any case.
> > > 
> > > The vga_restore_fonts() function is currently only called in vga_ioctl()
> > > for WSDISPLAYIO_SMODE when going from graphics to WSDISPLAYIO_MODE_EMUL,
> > > which should be optimal, but this never happens -- that vga_ioctl()
> > > happens once after boot to go into graphics mode, but never gets called
> > > again when switching back to text mode.  Maybe something with the
> > > integration of kms drivers changed the call semantics?
> > 
> > Thanks for the report.
> > 
> > In xenocara/xserver/hw/xfree86/os-support/bsd/bsd_init.c
> > xf86OpenConsole() does the ioctl with WSDISPLAYIO_MODE_MAPPED
> > xf86CloseConsole() does the ioctl with WSDISPLAYIO_MODE_EMUL
> > 
> > however these aren't currently done for switching when xorg is running
> > 
> > Testing here xf86VTSwitchAway() is called when switching from xorg to VT
> > xf86VTSwitchTo() is called when switching back to xorg.
> > 
> > Adding the ioctl to these functions makes the steps you've given end in
> > a non trashed VT.
> > 
> > This is clearly a dark corner of the xserver code and suffers from
> > trying to support multiple systems and be compatible with USL ioctls.
> 
> I don't think this is the right approach.  Having X involved in the VT
> switch has always been a bit of an issue.  Can't be avoided completely
> since X somehow needs to be told to stop messing with the display
> hardware behind our back.  But I think it is best if the kernel can do
> all the necessary repair by itself.  Then, if X crashes, you can still
> VT-switch to another screen and have things work.
> 
> So I think John's suggestion makes sense.  The only thing that seems a
> bit wrong is that the current font is set before the fonts are
> actually restored.
> 
> Restoring the fonts on every VT-switch isn't a very big issue.
> Nothing will happen if you didn't explicitly load an alternative font.
> But if the overhead is noticeable we could only do the restore when
> switching away from a screen that has the SCR_GRAPHICS flag set to a
> screen that doesn't.

I've committed the palette fix and this patch with the order changed
as suggested by Mark to have vga_restore_fonts() (write to video memory)
before vga_setfont() (pointing the character generator at it).

Thanks for the detailed reports and patches.



Re: 6.7-release boot fails on AMD64 after installing AMD GPU firmware

2020-05-31 Thread Jonathan Gray
On Sun, May 31, 2020 at 09:52:46AM -0400, Daniel Sullivan wrote:
> OpenBSD 6.7 installed using the amd64 install67.iso (verified using
> signify-openbsd on Ubuntu 20.04).
> 
> The system successfully reboots once after installation and installs the
> firmware for the CPU's onboard GPU, but fails with the following error message
> upon rebooting again:
> 
> initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x1002:0x15D8 0xCC).

That line is not an error message, it is expected.

Can you not interact with the console when booting with the firmware
installed?
Is any text visible on the display?
Which display connector are you using?

You'll need to reboot after the firmware is installed if you've not
already done so.  So install, reboot and fw_update runs, then reboot
again.

> 
> This is the dmesg output from the first reboot:
> 
> OpenBSD 6.7 (RAMDISK_CD) #177: Thu May  7 11:19:02 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> real mem = 6372102144 (6076MB)
> avail mem = 6174957568 (5888MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xe8d60 (46 entries)
> bios0: vendor American Megatrends Inc. version "F50" date 11/28/2019
> bios0: Gigabyte Technology Co., Ltd. A320M-S2H
> acpi0 at bios0: ACPI 6.0
> acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT SSDT MCFG SSDT HPET
> UEFI IVRS SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Athlon 3000G with Radeon Vega Graphics, 3493.99 MHz, 17-18-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> ioapic0 at mainbus0: apid 5 pa 0xfec0, version 21, 24 pins
> ioapic1 at mainbus0: apid 6 pa 0xfec01000, version 21, 32 pins
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus -1 (GPP0)
> acpiprt2 at acpi0: bus -1 (GPP2)
> acpiprt3 at acpi0: bus -1 (GPP3)
> acpiprt4 at acpi0: bus -1 (GPP4)
> acpiprt5 at acpi0: bus -1 (GPP5)
> acpiprt6 at acpi0: bus -1 (GPP6)
> acpiprt7 at acpi0: bus 7 (GP17)
> acpiprt8 at acpi0: bus 8 (GP18)
> acpiprt9 at acpi0: bus 1 (GPP1)
> acpicpu at acpi0 not configured
> acpitz at acpi0 not configured
> "PNP0A08" at acpi0 not configured
> acpicmos0 at acpi0
> "PNP0C0C" at acpi0 not configured
> amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
> "AMDIF030" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "AMD 17h/1xh Root Complex" rev 0x00
> "AMD 17h/1xh IOMMU" rev 0x00 at pci0 dev 0 function 2 not configured
> pchb1 at pci0 dev 1 function 0 "AMD 17h PCIE" rev 0x00
> ppb0 at pci0 dev 1 function 2 "AMD 17h/1xh PCIE" rev 0x00: msi
> pci1 at ppb0 bus 1
> xhci0 at pci1 dev 0 function 0 vendor "AMD", unknown product 0x43bc
> rev 0x02: msi, xHCI 1.10
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev
> 3.00/1.00 addr 1
> ahci0 at pci1 dev 0 function 1 vendor "AMD", unknown product 0x43b8
> rev 0x02: msi, AHCI 1.3.1
> ahci0: port busy after first PMP probe FIS
> ahci0: port busy after first PMP probe FIS
> ahci0: port 0: 6.0Gb/s
> ahci0: port busy after first PMP probe FIS
> ahci0: port busy after first PMP probe FIS
> ahci0: port 1: 6.0Gb/s
> scsibus0 at ahci0: 32 targets
> sd0 at scsibus0 targ 0 lun 0:  
> naa.5001b448b48ef126
> sd0: 238475MB, 512 bytes/sector, 488397168 sectors, thin
> sd1 at scsibus0 targ 1 lun 0:  
> naa.50014ee211473ff3
> sd1: 3815447MB, 512 bytes/sector, 7814037168 sectors
> ppb1 at pci1 dev 0 function 2 vendor "AMD", unknown product 0x43b3 rev 0x02
> pci2 at ppb1 bus 2
> ppb2 at pci2 dev 4 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci3 at ppb2 bus 3
> ppb3 at pci2 dev 5 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci4 at ppb3 bus 4
> ppb4 at pci2 dev 6 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci5 at ppb4 bus 5
> ppb5 at pci2 dev 7 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci6 at ppb5 bus 6
> re0 at pci6 dev 0 function 0 "Realtek 8168" rev 0x0c: RTL8168G/8111G
> (0x4c00), msi, address b4:2e:99:e3:ca:f0
> rgephy0 at re0 phy 7:

Re: 6.7-release boot fails on AMD64 after installing AMD GPU firmware

2020-05-31 Thread Jonathan Gray
On Sun, May 31, 2020 at 11:35:04AM -0400, Daniel Sullivan wrote:
> Sorry, I misspoke. That's just the last line that gets printed to the console
> before the boot hangs.
> 
> I have not tried to interact with the console when it hangs at this point
> (didn't know that I could do that).
> The rest of the boot text (device discoveries, etc). is visible on the screen.
> I am using the HDMI connector on the motherboard (this processor has an
> onboard GPU).

Can you try DisplayPort or VGA?  Support for Picasso was backported into
the 4.19 tree, which was tested on laptops with eDP panels; it is possible
something was missed.

If you can build a kernel with 'option DRMDEBUG' added it may provide
more information.

> 
> This problem occurs on the second reboot after install (after the amdgpu
> firmware is installed). After disabling the amdgpu firmware using fw_update,
> the machine no longer hangs during boot.
> 

Re: 6.7-release boot fails on AMD64 after installing AMD GPU firmware

2020-06-01 Thread Jonathan Gray
The last line you see with debug enabled, if it differs, would help.
Serial output from boot is preferred but it seems your board does not
have a serial header.

On Mon, Jun 01, 2020 at 11:08:04AM -0400, Daniel Sullivan wrote:
> I can test it with a VGA connection and build a kernel with that option.
> 
> What information should I gather when I do this? If the boot hangs, then 
> what's
> the preferred method for gathering it?
> 

Re: uvm_fault panic at drm_mode_rmfb_work_dn

2020-06-08 Thread Jonathan Gray
On Mon, Jun 08, 2020 at 08:16:42PM +0200, Matthias Schmidt wrote:
> Hi,
> 
> * Matthias Schmidt wrote:
> > Hi,
> > 
> > I run 
> > 
> > OpenBSD 6.7-current (GENERIC.MP) #250: Sun Jun  7 19:48:27 MDT 2020
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > on my Thinkpad T450s.  While watching a movie mounted on a NFS share the
> > kernel panicked with the following message (transcribed by hand).  The
> > movie was played with mpv in full-screen.
> 
> The panic is reproducible with mpv and watching a video in full-screen.
> 
> If you want to reproduce without hassle, install mpv and youtube-dl
> from ports and use the latter to download https://vimeo.com/427096874 (a
> totally random video I picked because it's CC licensed).  Play it, seek a
> bit around and the kernel will panic.
> 
> Cheers
> 
>   Matthias
> 

You are using a snapshot just before the drm tree was replaced by a new
port of drm from linux 5.7.  Can you reproduce this with a newer
snapshot?

I can't reproduce this with -current on a similar Broadwell machine (x250)
with mpv defaulting to a 'gpu' video output backend.



Re: 6.7-release boot fails on AMD64 after installing AMD GPU firmware

2020-06-08 Thread Jonathan Gray
That all looks expected.  Can you try booting a kernel from a -current
snapshot?  drm in -current has just had a major update.

On Sun, Jun 07, 2020 at 03:59:29PM -0400, Daniel Sullivan wrote:
> Here's the transcribed output from the DRMDEBUG boots with both the HDMI and
> VGA connectors. Output appears to be identical (minus one time when I got a
> WARNING: CHECK AND RESET THE DATE message in between the debug lines, but 
> maybe
> I was just looking at it funny).
> 
> 
> initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x1002:0x15D8 0xCC).
> [drm] register mmio base: 0xFCC0
> [drm] register mmio size: 524288
> [drm] probing pcie caps for device 7:0:0 0x1002:0x15d8 = 3/e
> [drm] probing pcie caps for device 0:8:1 0x1022:0x15db = 3/e
> [drm] probing pcie width for device 0:8:1 0x1022:0x15db = 700d03
> [drm] add ip block number 0 
> [drm] add ip block number 1 
> [drm] add ip block number 2 
> [drm] add ip block number 3 
> [drm] add ip block number 4 
> [drm] add ip block number 5 
> [drm] add ip block number 6 
> [drm] add ip block number 7 
> [drm] add ip block number 8 
> [drm] VCN decode is enabled in VM mode
> [drm] VCN encode is enabled in VM mode
> [drm] VCN jpeg decode is enabled in VM mode
> [drm] BIOS signature incorrect 73 17
> [drm] BIOS signature incorrect 73 17
> ATOM BIOS: 113-RAVEN2-115
> [drm] vm size is 262144 GB, 3 levels, block size is 9-bit, fragment
> size is 9-bit
> drm: VRAM: 2048M 0x00F4 - 0x00F47FFF (2048M used)
> drm: GART: 1024M 0x00F5 - 0x00F53FFF
> 

Re: page fault trap in connector_bad_edid with new drm code

2020-06-09 Thread Jonathan Gray
On Tue, Jun 09, 2020 at 08:12:17AM +0200, Otto Moerbeek wrote:
> On Tue, Jun 09, 2020 at 08:01:12AM +0200, Otto Moerbeek wrote:
> 
> > On Mon, Jun 08, 2020 at 09:46:23PM +0200, Mark Kettenis wrote:
> > 
> > > > Date: Mon, 8 Jun 2020 20:27:22 +0200
> > > > From: Otto Moerbeek 
> > > > 
> > > > Hi.
> > > > 
> > > > A page fault trap happens if I boot my ThinkPad X1 6th generation in the
> > > > dock or put it in the dock afterwards. The dock has two DP monitors
> > > > connected.
> > > > 
> > > > If I change connector_bad_edid() to return immediately things seem to
> > > > work ok.
> > > > 
> > > > -Otto
> > > > 
> > > > summary of trace:
> > > > 
> > > > connector_bad_edid+0x4d
> > > > drm_do_get_edid+0x382
> > > > drm_get_edid+0x6b
> > > > intel_hdmi_set_edid+0xad
> > > > intel_hdmi_detect+0xb1
> > > > drm_helper_probe_detect+0x108
> > > > intel_encoder_hotplug+0x7f
> > > > intel_ddi_hotplug+0x54
> > > > i915_hotplug_work_func+0x245
> > > > taskq_thread+0x8d
> > > > 
> > > > full trace:
> > > > 
> > > > https://www.drijf.net/openbsd/IMG_20200608_154513.jpg
> > > 
> > > Not sure what kernel you're using.  But the instruction in that image
> > > > doesn't exist in connector_bad_edid in the kernel I just built.
> > > 
> > 
> > Strange, I do have it.  /usr/src/sys/dev/pci/drm/drm_edid.c contains
> > > no version marker, but its md5 is 3d889f9e1cb3c66cdb4eb49e9319d947
> > 
> > objdump -d snippet:
> > 
> > 
> > 3830 :
> > > 3830:   4c 8b 1d 00 00 00 00    mov    0(%rip),%r11        # 3837
> > > 3837:   4c 33 1c 24             xor    (%rsp),%r11
> > > 383b:   55                      push   %rbp
> > > 383c:   48 89 e5                mov    %rsp,%rbp
> > > 383f:   57                      push   %rdi
> > > 3840:   56                      push   %rsi
> > > 3841:   52                      push   %rdx
> > > 3842:   57                      push   %rdi
> > > 3843:   41 53                   push   %r11
> > > 3845:   41 57                   push   %r15
> > > 3847:   41 56                   push   %r14
> > > 3849:   41 55                   push   %r13
> > > 384b:   41 54                   push   %r12
> > > 384d:   53                      push   %rbx
> > > 384e:   48 83 ec 20             sub    $0x20,%rsp
> > > 3852:   41 89 d7                mov    %edx,%r15d
> > > 3855:   0f b6 46 7e             movzbl 0x7e(%rsi),%eax
> > > 3859:   48 c1 e0 07             shl    $0x7,%rax
> > > 385d:   48 89 75 a8             mov    %rsi,0xffffffffffffffa8(%rbp)
> > > 3861:   48 01 f0                add    %rsi,%rax
> > > 3864:   31 d2                   xor    %edx,%edx
> > > 3866:   b9 03 00 00 00          mov    $0x3,%ecx
> > > 386b:   eb 10                   jmp    387d
> > > 386d:   cc                      int3
> > > 386e:   cc                      int3
> > > 386f:   cc                      int3
> > > 3870:   0f b6 34 08             movzbl (%rax,%rcx,1),%esi
> > > 3874:   01 f2                   add    %esi,%edx
> > > 3876:   0f b6 d2                movzbl %dl,%edx
> > > 3879:   48 83 c1 04             add    $0x4,%rcx
> > > 387d:   0f b6 74 08 fd          movzbl 0xfffffffffffffffd(%rax,%rcx,1),%esi
> > > 3882:   01 d6                   add    %edx,%esi
> > > 3884:   0f b6 54 08 fe          movzbl 0xfffffffffffffffe(%rax,%rcx,1),%edx
> > > 3889:   01 f2                   add    %esi,%edx
> > > 388b:   0f b6 f2                movzbl %dl,%esi
> > > 388e:   0f b6 54 08 ff          movzbl 0xffffffffffffffff(%rax,%rcx,1),%edx
> > > 3893:   01 f2                   add    %esi,%edx
> > > 3895:   48 83 f9 7f             cmp    $0x7f,%rcx
> > > 3899:   75 d5                   jne    3870
> > 
> > ...
> > 
> 
> This looks like an inlined unrolled version of drm_edid_block_checksum()

gdb points to drm_edid_block_checksum() as well

0x819966ed <+77>:	movzbl -0x3(%rax,%rcx,1),%esi

(gdb) info line *0x819966ed
Line 1597 of "/sys/dev/pci/drm/drm_edid.c"
   starts at address 0x819966ed 
   and ends at 0x819966fb .

> 
> My guess is num_of_ext > num_blocks, will verify later when I'm at the 
> machine.
> 
>   -Otto
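
To make the suspected failure mode concrete, here is a minimal userland sketch.  It assumes, as the disassembly above suggests, that the code takes the extension count from byte 0x7e of the base block and checksums the block at that index.  edid_last_block() and check_edid_guard() are hypothetical names for illustration only, and the bounds check shown is not necessarily the fix that will go into drm_edid.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_LEN	128
#define EDID_EXT_COUNT	0x7e	/* byte in block 0 holding the extension count */

/* Sum of all 128 bytes of one block, truncated to 8 bits; 0 means valid. */
static uint8_t
edid_block_checksum(const uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < EDID_BLOCK_LEN; i++)
		sum += block[i];
	return sum;
}

/*
 * Hypothetical guard: return the block named by the base block's
 * extension count, or NULL if the buffer holds fewer blocks than that.
 */
static const uint8_t *
edid_last_block(const uint8_t *edid, size_t num_blocks)
{
	size_t last;

	if (num_blocks == 0)
		return NULL;
	last = edid[EDID_EXT_COUNT];
	if (last >= num_blocks)
		return NULL;	/* would read past the end of the buffer */
	return edid + last * EDID_BLOCK_LEN;
}

/* Hypothetical self-check: one claimed extension, one or two blocks read. */
static void
check_edid_guard(void)
{
	uint8_t edid[2 * EDID_BLOCK_LEN];

	memset(edid, 0, sizeof(edid));
	edid[EDID_EXT_COUNT] = 1;

	/* Only the base block was read: the claimed extension is out of range. */
	assert(edid_last_block(edid, 1) == NULL);
	/* Both blocks were read: the extension block is safe to checksum. */
	assert(edid_last_block(edid, 2) == edid + EDID_BLOCK_LEN);
	assert(edid_block_checksum(edid + EDID_BLOCK_LEN) == 0);
}
```

Without a check of this kind, a block count smaller than the claimed number of extensions means the checksum loop walks memory beyond the allocation, which would be consistent with the page fault in the trace.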



Re: page fault trap in connector_bad_edid with new drm code

2020-06-10 Thread Jonathan Gray
On Wed, Jun 10, 2020 at 12:52:31PM +0200, Mark Kettenis wrote:
> > Date: Wed, 10 Jun 2020 11:28:04 +0200
> > From: Otto Moerbeek 
> > 
> > On Tue, Jun 09, 2020 at 08:28:57PM +0200, Mark Kettenis wrote:
> > 
> > > > Date: Tue, 9 Jun 2020 20:08:42 +0200
> > > > From: Otto Moerbeek 
> > > > 
> > > > On Tue, Jun 09, 2020 at 04:19:34PM +0200, Mark Kettenis wrote:
> > > > 
> > > > > > Date: Tue, 9 Jun 2020 16:08:26 +0200
> > > > > > From: Otto Moerbeek 
> > > > > > 
> > > > > > On Tue, Jun 09, 2020 at 04:05:25PM +0200, Otto Moerbeek wrote:
> > > > > > 
> > > > > > > On Tue, Jun 09, 2020 at 04:59:17PM +1000, Jonathan Gray wrote:
> > > > > > > 

Re: drm panic

2020-06-11 Thread Jonathan Gray
On Thu, Jun 11, 2020 at 11:07:11AM +0100, Laurence Tratt wrote:
> The recent DRM update has fixed one long-ish-standing bug for me where Xorg
> sometimes would get stuck while executing Xsession (I could never work out
> why) which is really good!
> 
> However I've now had the following kernel panic several times:
> 
>   kernel: page fault trap, code=0
>   Stopped at intel_partial_pages+0xf4:moq 0x58,%rsi
> 
> Unfortunately that also seems to take out my keyboard at ddb so I have no
> further information beyond my dmesg :/

Try this.  The code in question does:

	sg_set_page(sg, NULL, I915_GTT_PAGE_SIZE, 0);
	sg_dma_address(sg) =
		i915_gem_object_get_dma_address(obj, src_idx);
	sg_dma_len(sg) = I915_GTT_PAGE_SIZE;

Our sg_set_page() calls VM_PAGE_TO_PHYS() on the page argument, which
will attempt to dereference NULL.

Index: sys/dev/pci/drm/include/linux/scatterlist.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/include/linux/scatterlist.h,v
retrieving revision 1.2
diff -u -p -r1.2 scatterlist.h
--- sys/dev/pci/drm/include/linux/scatterlist.h	8 Jun 2020 04:48:15 -0000	1.2
+++ sys/dev/pci/drm/include/linux/scatterlist.h	11 Jun 2020 10:54:05 -0000
@@ -115,7 +115,8 @@ sg_set_page(struct scatterlist *sgl, str
     unsigned int length, unsigned int offset)
 {
 	sgl->__page = page;
-	sgl->dma_address = VM_PAGE_TO_PHYS(page);
+	if (page != NULL)
+		sgl->dma_address = VM_PAGE_TO_PHYS(page);
 	sgl->offset = offset;
 	sgl->length = length;
 	sgl->end = false;



Re: i915_request_create+0x4b: uvm_fault

2020-06-13 Thread Jonathan Gray
On Sat, Jun 13, 2020 at 12:15:13PM +0100, Stuart Henderson wrote:
> Same with a newer kernel.
> 
> OpenBSD 6.7-current (GENERIC.MP) #3: Thu Jun 11 19:47:48 BST 2020
> st...@symphytum.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> 
> uvm_fault(0xfd86e2f6c120, 0x51, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  i915_request_create+0x4b:   movq0x50(%r14),%rdi
> ddb{1}> tr

0x50 is the offset of requests in the struct.
r14 is 1 in both traces and appears to be tl.
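
As a rough illustration of why a tl of 1 slips past the IS_ERR() check and faults at 0x51, here is a minimal userland sketch.  It assumes the usual Linux convention that error pointers occupy the top MAX_ERRNO (4095) values of the address space.  is_err_value(), fault_address() and check_tl_value() are hypothetical names, not functions from the drm code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO	4095	/* Linux convention for error pointers */

/* Linux-style IS_ERR() test on a raw pointer value. */
static int
is_err_value(uintptr_t p)
{
	return p >= (uintptr_t)-MAX_ERRNO;
}

/* Address touched when loading a member at member_offset from base. */
static uintptr_t
fault_address(uintptr_t base, uintptr_t member_offset)
{
	return base + member_offset;
}

/* Hypothetical self-check using the values from the trace. */
static void
check_tl_value(void)
{
	/* A real error pointer such as ERR_PTR(-EINVAL) is caught ... */
	assert(is_err_value((uintptr_t)-22));
	/* ... but a pointer value of 1 is not in the error range. */
	assert(!is_err_value(1));
	/* movq 0x50(%r14) with r14 == 1 touches 0x51, as uvm_fault reports. */
	assert(fault_address(1, 0x50) == 0x51);
}
```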

I don't yet see how that is possible, can you try this diff and tell me
if the printf triggers?

Index: sys/dev/pci/drm/i915/i915_request.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_request.c,v
retrieving revision 1.3
diff -u -p -r1.3 i915_request.c
--- sys/dev/pci/drm/i915/i915_request.c	8 Jun 2020 04:48:11 -0000	1.3
+++ sys/dev/pci/drm/i915/i915_request.c	14 Jun 2020 05:33:44 -0000
@@ -877,6 +877,11 @@ i915_request_create(struct intel_context
 	if (IS_ERR(tl))
 		return ERR_CAST(tl);
 
+	if ((vaddr_t)tl == 1) {
+		printf("%s tl == 1\n", __func__);
+		return ERR_PTR(-EINVAL);
+	}
+
 	/* Move our oldest request to the slab-cache (if not in use!) */
 	rq = list_first_entry(&tl->requests, typeof(*rq), link);
 	if (!list_is_last(&rq->link, &tl->requests))



Re: drm: uvm_fault / i915_gem_object_do_bit_17_swizzle

2020-06-14 Thread Jonathan Gray
On Sun, Jun 14, 2020 at 10:18:13AM +0200, Sebastien Marie wrote:
> Hi,
> 
> On i386, with recent locally built GENERIC.MP, I have a uvm_fault after
> starting xenodm (the system is fine as long as I don't start xenodm).
> 
> The last version of the source tree tested is
> 68a1d3c69b863445cb9e4c9789ec5053e29e77b2 (Sat Jun 13 14:00:50 2020 UTC).
> 
> Please note that the trace below is from a slightly different tree (~ Thu
> Jun 11) as I tried to first reproduce with recent tree.
> 
> Problem 100% reproducible on this particular host.

thanks for the report, should be fixed in i915_gem_fence_reg.c rev 1.3



Re: i915_request_create+0x4b: uvm_fault

2020-06-16 Thread Jonathan Gray
On Tue, Jun 16, 2020 at 06:46:41PM +0100, Stuart Henderson wrote:
> On 2020/06/15 21:50, Stuart Henderson wrote:
> > On 2020/06/14 15:45, Jonathan Gray wrote:
> > > On Sat, Jun 13, 2020 at 12:15:13PM +0100, Stuart Henderson wrote:
> > > > Same with a newer kernel.
> > > > 
> > > > OpenBSD 6.7-current (GENERIC.MP) #3: Thu Jun 11 19:47:48 BST 2020
> > > > st...@symphytum.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > > uvm_fault(0xfd86e2f6c120, 0x51, 0, 1) -> e
> > > > kernel: page fault trap, code=0
> > > > Stopped at  i915_request_create+0x4b:   movq0x50(%r14),%rdi
> > > > ddb{1}> tr
> > > 
> > > 0x50 is the offset in the struct of requests
> > > r14 in 1 in both traces and appears to be tl
> > > 
> > > I don't yet see how that is possible, can you try this diff and tell me
> > > if the printf triggers?
> > 
> > I'm running with it, hasn't triggered yet (3h uptime).
> 
> After some various reboots (tcp nfs related..) I have now seen
> it a couple of times in my current boot. The timing of the second
> one, pretty much exactly 2h after the first one, seems interesting,
> I wonder if there will be another one in 55 mins..
> 
> 2020-06-16T13:36:45.550Z symphytum /bsd: OpenBSD 6.7-current (GENERIC.MP) #3: 
> Tue Jun 16 13:35:25 BST 2020
> 2020-06-16T14:40:04.649Z symphytum /bsd: i915_request_create tl == 1
> 2020-06-16T16:40:04.413Z symphytum /bsd: i915_request_create tl == 1

The way we implement mutex_lock_interruptible() is
#define mutex_lock_interruptible(rwl)   -rw_enter(rwl, RW_WRITE | RW_INTR)

If something returned -1 we'd return 1 but all the error paths in the
functions involved should be returning positive errno values.

Can you try running with this as well to know which path is involved?

Index: dev/pci/drm/i915/gt/intel_context.h
===
RCS file: /cvs/src/sys/dev/pci/drm/i915/gt/intel_context.h,v
retrieving revision 1.1
diff -u -p -r1.1 intel_context.h
--- dev/pci/drm/i915/gt/intel_context.h 8 Jun 2020 04:48:13 -   1.1
+++ dev/pci/drm/i915/gt/intel_context.h 17 Jun 2020 05:53:14 -
@@ -145,9 +145,14 @@ intel_context_timeline_lock(struct intel
struct intel_timeline *tl = ce->timeline;
int err;
 
+   if ((vaddr_t)ce->timeline == 1)
+   printf("%s ce->timeline == 1\n", __func__);
+
err = mutex_lock_interruptible(&tl->mutex);
-   if (err)
+   if (err) {
+   printf("%s mutex_lock_interruptible() ret %d\n", __func__, err);
return ERR_PTR(err);
+   }
 
return tl;
 }



Re: i915_request_create+0x4b: uvm_fault

2020-06-21 Thread Jonathan Gray
On Sun, Jun 21, 2020 at 03:37:29PM +0100, Stuart Henderson wrote:
> On 2020/06/17 16:07, Jonathan Gray wrote:
> > On Tue, Jun 16, 2020 at 06:46:41PM +0100, Stuart Henderson wrote:
> > > On 2020/06/15 21:50, Stuart Henderson wrote:
> > > > On 2020/06/14 15:45, Jonathan Gray wrote:
> > > > > On Sat, Jun 13, 2020 at 12:15:13PM +0100, Stuart Henderson wrote:
> > > > > > Same with a newer kernel.
> > > > > > 
> > > > > > OpenBSD 6.7-current (GENERIC.MP) #3: Thu Jun 11 19:47:48 BST 2020
> > > > > > 
> > > > > > st...@symphytum.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
> > > > > > 
> > > > > > uvm_fault(0xfd86e2f6c120, 0x51, 0, 1) -> e
> > > > > > kernel: page fault trap, code=0
> > > > > > Stopped at  i915_request_create+0x4b:   movq
> > > > > > 0x50(%r14),%rdi
> > > > > > ddb{1}> tr
> > > > > 
> > > > > 0x50 is the offset in the struct of requests
> > > > > r14 in 1 in both traces and appears to be tl
> > > > > 
> > > > > I don't yet see how that is possible, can you try this diff and tell 
> > > > > me
> > > > > if the printf triggers?
> > > > 
> > > > I'm running with it, hasn't triggered yet (3h uptime).
> > > 
> > > After some various reboots (tcp nfs related..) I have now seen
> > > it a couple of times in my current boot. The timing of the second
> > > one, pretty much exactly 2h after the first one, seems interesting,
> > > I wonder if there will be another one in 55 mins..
> > > 
> > > 2020-06-16T13:36:45.550Z symphytum /bsd: OpenBSD 6.7-current (GENERIC.MP) 
> > > #3: 
> > > Tue Jun 16 13:35:25 BST 2020
> > > 2020-06-16T14:40:04.649Z symphytum /bsd: i915_request_create tl == 1
> > > 2020-06-16T16:40:04.413Z symphytum /bsd: i915_request_create tl == 1
> > 
> > The way we implement mutex_lock_interruptible() is
> > #define mutex_lock_interruptible(rwl)   -rw_enter(rwl, RW_WRITE | 
> > RW_INTR)
> > 
> > If something returned -1 we'd return 1 but all the error paths in the
> > functions involved should be returning positive errno values.
> > 
> > Can you try running with this as well to know which path is involved?
> > 
> > Index: dev/pci/drm/i915/gt/intel_context.h
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/i915/gt/intel_context.h,v
> > retrieving revision 1.1
> > diff -u -p -r1.1 intel_context.h
> > --- dev/pci/drm/i915/gt/intel_context.h 8 Jun 2020 04:48:13 -   
> > 1.1
> > +++ dev/pci/drm/i915/gt/intel_context.h 17 Jun 2020 05:53:14 -
> > @@ -145,9 +145,14 @@ intel_context_timeline_lock(struct intel
> > struct intel_timeline *tl = ce->timeline;
> > int err;
> >  
> > +   if ((vaddr_t)ce->timeline == 1)
> > +   printf("%s ce->timeline == 1\n", __func__);
> > +
> > err = mutex_lock_interruptible(&tl->mutex);
> > -   if (err)
> > +   if (err) {
> > +   printf("%s mutex_lock_interruptible() ret %d\n", __func__, err);
> > return ERR_PTR(err);
> > +   }
> >  
> > return tl;
> >  }
> > 
> 
> Finally hit it:
> 
> 2020-06-21T14:10:11.005Z symphytum /bsd: intel_context_timeline_lock 
> mutex_lock_interruptible() ret 1
> 2020-06-21T14:10:11.007Z symphytum /bsd: i915_request_create tl == 1

thanks, mutex_lock_interruptible() was not expecting ERESTART to be -1

Index: dev/pci/drm/include/linux/mutex.h
===
RCS file: /cvs/src/sys/dev/pci/drm/include/linux/mutex.h,v
retrieving revision 1.3
diff -u -p -r1.3 mutex.h
--- dev/pci/drm/include/linux/mutex.h   8 Jun 2020 04:48:15 -   1.3
+++ dev/pci/drm/include/linux/mutex.h   21 Jun 2020 15:15:38 -
@@ -10,9 +10,8 @@
 
 #define DEFINE_MUTEX(x)struct rwlock x
 
-#define mutex_lock_interruptible(rwl)  -rw_enter(rwl, RW_WRITE | RW_INTR)
 #define mutex_lock_interruptible_nested(rwl, subc) \
-   -rw_enter(rwl, RW_WRITE | RW_INTR)
+   mutex_lock_interruptible(rwl)
 #define mutex_lock(rwl)rw_enter_write(rwl)
 #define mutex_lock_nest_lock(rwl, sub) rw_enter_write(rwl)
 #define mutex_lock_nested(rwl, sub)rw_enter_write(rwl)
@@ -20,6 +19,14 @@
 #define mutex_unlock(rwl)  rw_exit_write(rwl)
 #define mutex_is_locked(rwl)   (rw_status(rwl) != 0)
 #define mutex_destroy(rwl)
+
+static inline int
+mutex_lock_interruptible(struct rwlock *rwl)
+{
+   if (rw_enter(rwl, RW_WRITE | RW_INTR) != 0)
+   return -EINTR;
+   return 0;
+}
 
 enum mutex_trylock_recursive_result {
MUTEX_TRYLOCK_FAILED,



Re: null deref in xen_intr_barrier()

2020-06-28 Thread Jonathan Gray
On Fri, Jun 26, 2020 at 01:55:28PM +, t...@daybefore.net wrote:
> >Synopsis:null deref in xen_intr_barrier()
> >Category:kernel
> >Environment:
>   System  : OpenBSD 6.7
>   Details : OpenBSD 6.7-current (GENERIC) #291: Fri Jun 26 01:56:51 
> MDT 2020
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
>   I have a system that's running as a guest under Xen; recent snapshots
> panic while bringing up the xnf(4) if. This can happen in the ramdisk kernel 
> during
> a sysupgrade, or in a GENERIC kernel running netstart.
> 
> starting network
> uvm_fault(0xfd810ed64110, 0x28, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  intr_barrier+0x6:   movq0x28(%rdi),%rdi
> ddb> bt
> intr_barrier(0) at intr_barrier+0x6
> xen_intr_barrier(8) at xen_intr_barrier+0x1f
> xnf_stop(80193000) at xnf_stop+0x4c
> xnf_ioctl(80193120,8020690c,8063b000) at xnf_ioctl+0xd2
> in_ifinit(80193120,8063b000,8000225ce400,1) at in_ifinit+0xf3
> in_ioctl_change_ifaddr(8040691a,8000225ce3f0,80193120,1) at in_ioctl_change_ifaddr+0x376
> in_ioctl(8040691a,8000225ce3f0,80193120,1) at in_ioctl+0x103
> ifioctl(fd81028cb648,8040691a,8000225ce3f0,8000225d0870) at ifioctl+0x98e
> sys_ioctl(8000225d0870,8000225ce500,8000225ce560) at sys_ioctl+0x2cb
> syscall(8000225ce5d0) at syscall+0x315
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7e8c50, count: -11
> 
> In the definition of xen_intr_barrier() in dev/pv/xen.c, we find:
> 
>   /*
>* XXX This will need to be revised once intr_barrier starts
>* using its argument.
>*/
>   intr_barrier(NULL);
> 
> intr_barrier(9) started using its argument as of this commit:
> 
>   revision 1.53
>   date: 2020/06/16 23:35:10;  author: dlg;  state: Exp;  lines: +4 -3;
>   commitid: tVYPReymzTMuPlpA;
>   make intr_barrier run sched_barrier on the cpu the interrupt pinned to.
> 
>   intr_barrier passed NULL to sched_barrier before this, which ends
>   up being the primary cpu. that's been mostly right until this point,
>   but is set to change.
> 
> 
> >How-To-Repeat:
>   Boot a snapshot from 6/17 or later on a Xen domU with an xnf network 
> interface.
> >Fix:
>   Patching xen_intr_barrier() to do what intr_barrier(NULL) used to do
> eliminates the panic:

thanks, committed



Re: X11 graphics not working in snapshots on Braswell system

2020-06-30 Thread Jonathan Gray
On Thu, Jul 25, 2019 at 08:40:34PM +1000, Jonathan Gray wrote:
> On Tue, Jul 23, 2019 at 04:32:50PM +1000, Jonathan Gray wrote:
> > On Sun, Jul 21, 2019 at 11:23:12PM +1000, Ross L Richardson wrote:
> > > On Sun, Jul 21, 2019 at 07:41:58PM +1000, Jonathan Gray wrote:
> > > > On Sat, Jul 20, 2019 at 10:32:46PM +1000, open...@rlr.id.au wrote:
> > > > > >Synopsis:X11 graphics not working in snapshots on Braswell system
> > > > > >Category:amd64
> > > >[...]
> > > > Does this backport from linux help?
> > > >[...]
> > > 
> > > No.  The box still locks up (unfortunately).
> > 
> > Thanks for reporting and trying that.  Nothing else comes to mind.
> > 
> > I've placed an order for a Braswell NUC and can hopefully reproduce and
> > debug this when that shows up.
> > 
> 
> I see the same thing.

cherryview/braswell still had trouble even after the recent port of drm
from linux 5.7.

I can run X on -current when falling back to pre-gen8 (< gen8) style rings:

Index: dev/pci/drm/i915/i915_pci.c
===
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_pci.c,v
retrieving revision 1.3
diff -u -p -r1.3 i915_pci.c
--- dev/pci/drm/i915/i915_pci.c 8 Jun 2020 04:48:11 -   1.3
+++ dev/pci/drm/i915/i915_pci.c 30 Jun 2020 15:37:20 -
@@ -588,7 +588,7 @@ static const struct intel_device_info ch
.has_runtime_pm = 1,
.has_rc6 = 1,
.has_rps = true,
-   .has_logical_ring_contexts = 1,
+   .has_logical_ring_contexts = 0,
.display.has_gmch = 1,
.ppgtt_type = INTEL_PPGTT_ALIASING,
.ppgtt_size = 32,



Re: X11 graphics not working in snapshots on Braswell system

2020-06-30 Thread Jonathan Gray
On Tue, Jun 30, 2020 at 07:10:10PM +0200, Martin Ziemer wrote:
> On Wed, Jul 01, 2020 at 01:53:14AM +1000, Jonathan Gray wrote:
> > cherryview/braswell still had trouble even after recent port of drm from
> > linux 5.7.
> > 
> > I can run X on -current with < gen8 style rings.
> > 
> > Index: dev/pci/drm/i915/i915_pci.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_pci.c,v
> > retrieving revision 1.3
> > diff -u -p -r1.3 i915_pci.c
> > --- dev/pci/drm/i915/i915_pci.c 8 Jun 2020 04:48:11 -   1.3
> > +++ dev/pci/drm/i915/i915_pci.c 30 Jun 2020 15:37:20 -
> > @@ -588,7 +588,7 @@ static const struct intel_device_info ch
> > .has_runtime_pm = 1,
> > .has_rc6 = 1,
> > .has_rps = true,
> > -   .has_logical_ring_contexts = 1,
> > +   .has_logical_ring_contexts = 0,
> > .display.has_gmch = 1,
> > .ppgtt_type = INTEL_PPGTT_ALIASING,
> > .ppgtt_size = 32,
> > 
> 
> With this patch (and driver "intel" in /etc/X11/xorg.conf) my system works. 
> Tested suspend/resume and normal browsing.

Thanks for testing, I've committed this with a comment added.



Re: boot stuck at avail mem unless boot -c \n quit\n

2020-07-03 Thread Jonathan Gray
On Fri, Jul 03, 2020 at 06:12:28AM -0500, Abel Abraham Camarillo Ojeda wrote:
> On Fri, Jul 3, 2020 at 5:33 AM Stuart Henderson  wrote:
> 
> > On 2020/07/03 03:34, Abel Abraham Camarillo Ojeda wrote:
> > > >Synopsis: boot stuck at avail mem unless boot -c \n quit\n
> > > >Category: kernel amd64
> > > >Environment:
> > > System  : OpenBSD 6.7
> > > Details : OpenBSD 6.7-current (GENERIC.MP) #305: Fri Jun 26 09:07:29
> > > MDT 2020
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > >Description:
> > > boot stucks at line:
> > >
> > > OpenBSD 6.7-current (GENERIC.MP) #305: Fri Jun 26 09:07:29 MDT 2020
> > >dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > real mem = 17028538368 (16239MB)
> > > avail mem = 16497426432 (15733MB)
> > > (stops here)
> > >
> > > (img attached)
> > >
> > > machine boots if I
> > > run 'boot -c' at bootloader and just run 'quit', if not
> > > kernel stops after 'avail mem' line
> > >
> > > machine was running may 29 snapshot without boot problems,
> > > after a upgrade to jun 26 snapshot problems started to present.
> > >
> > > I bisected and tried several snapshots between jun 26 - may 29
> > > but problem still presented, then I reinstalled the *same* may 29
> > snapshot
> > > I was using and problems were also reproducible (but they weren't
> > > before) I'm without a clue here.
> > > >How-To-Repeat:
> > > 100% reproducible, tried it a dozen times:
> > >
> > > boot machine, GENERIC.MP without boot -c
> > > >Fix:
> > > boot -c \n quit \n
> > > and machine boots...
> >
> > - Try uninstalling intel-firmware and see if that changes anything.
> >
> 
> Boots without problems after: pkg_delete intel-firmware;
> gets stuck at avail mem again if I do pkg_add intel-firmware;

It was thought that the intel breakage was limited to linux.
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/31

As that isn't the case we want the release which reverted skylake
updates:

== 20200616 Release ==
-- Updates upon 20200609 release --
Processor          Identifier  Version     Products
Model     Stepping F-MO-S/PI   Old->New
---- new platforms ----

---- updated platforms ----
SKL-U/Y   D0       6-4e-3/c0   00dc->00d6  Core Gen6 Mobile
SKL-U23e  K1       6-4e-3/c0   00dc->00d6  Core Gen6 Mobile
SKL-H/S   R0/N0    6-5e-3/36   00dc->00d6  Core Gen6; Xeon E3 v5

---- removed platforms ----

Index: Makefile
===
RCS file: /cvs/ports/sysutils/firmware/intel/Makefile,v
retrieving revision 1.23
diff -u -p -r1.23 Makefile
--- Makefile10 Jun 2020 01:36:28 -  1.23
+++ Makefile3 Jul 2020 11:22:37 -
@@ -3,7 +3,7 @@
 COMMENT=   microcode update binaries for Intel CPUs
 FW_DRIVER= intel
 
-FW_VER=20200609
+FW_VER=20200616
 EPOCH= 0
 GH_ACCOUNT=intel
 GH_PROJECT=Intel-Linux-Processor-Microcode-Data-Files
Index: distinfo
===
RCS file: /cvs/ports/sysutils/firmware/intel/distinfo,v
retrieving revision 1.16
diff -u -p -r1.16 distinfo
--- distinfo10 Jun 2020 01:36:28 -  1.16
+++ distinfo3 Jul 2020 11:22:43 -
@@ -1,2 +1,2 @@
-SHA256 (firmware/intel-20200609.tar.gz) = bFKVJlq9A6fNyBW4W/9PmDh/gTgmqIk1kEvCu8eD1eQ=
-SIZE (firmware/intel-20200609.tar.gz) = 3043809
+SHA256 (firmware/intel-20200616.tar.gz) = 60+TlCGsbN6jxYbJ2YTsUYMg8AwH6ys9Z1QwnIPJM3E=
+SIZE (firmware/intel-20200616.tar.gz) = 3036726



Re: RaspberryPi 3 Stopped at data_abort+0x68:

2020-07-03 Thread Jonathan Gray
On Sat, Jul 04, 2020 at 12:19:59AM +0200, Mark Kettenis wrote:
> 
> Anyway, I think the problem is that OF_finddevice() returns -1 if the
> node can't be found.  Does the following diff help?

mp kernel boots on rpi3 with this
ok jsg@ if you also fix mainbus_attach_framebuffer()

arm64 efi_attach() also has a bad KASSERT()

node = OF_finddevice("/chosen");
KASSERT(node);

> 
> 
> Index: arch/arm64/dev/mainbus.c
> ===
> RCS file: /cvs/src/sys/arch/arm64/dev/mainbus.c,v
> retrieving revision 1.17
> diff -u -p -r1.17 mainbus.c
> --- arch/arm64/dev/mainbus.c  17 Jun 2020 08:00:22 -  1.17
> +++ arch/arm64/dev/mainbus.c  3 Jul 2020 22:16:43 -
> @@ -316,7 +316,7 @@ mainbus_attach_cpus(struct device *self,
>   int acells, scells;
>   char buf[32];
>  
> - if (node == 0)
> + if (node == -1)
>   return;
>  
>   acells = sc->sc_acells;
> @@ -369,7 +369,7 @@ mainbus_attach_psci(struct device *self)
>   struct mainbus_softc *sc = (struct mainbus_softc *)self;
>   int node = OF_finddevice("/psci");
>  
> - if (node == 0)
> + if (node == -1)
>   return;
>  
>   sc->sc_early = 1;
> @@ -384,7 +384,8 @@ mainbus_attach_efi(struct device *self)
>   struct fdt_attach_args fa;
>   int node = OF_finddevice("/chosen");
>  
> - if (node == 0 || OF_getproplen(node, "openbsd,uefi-system-table") <= 0)
> + if (node == -1 ||
> + OF_getproplen(node, "openbsd,uefi-system-table") <= 0)
>   return;
>  
>   memset(&fa, 0, sizeof(fa));
> 
> 



Re: amdgpu - extreme instability with Radeon RX 550

2020-07-04 Thread Jonathan Gray
On Fri, Jul 03, 2020 at 11:14:02PM -0400, Joe Gidi wrote:
> Hello,
> 
> I'm encountering severe instability problems while trying to use a Radeon
> RX 550 2GB GPU with the radeongpu driver. I'm using a fresh install of the
> latest amd64 snapshot and the just-updated amdgpu-firmware-20200619. I'm
> driving a single 1920x1080 display over HDMI.
> 
> Sometimes I'm able to log in at the xenodm prompt and work for 15-30
> minutes, only for X to suddenly freeze in an apparently unrecoverable
> state. I can ssh in from another machine and reboot gracefully, but
> ctrl-alt-backspace does nothing and I'm not able to VT switch.
> 
> Other times, X will hang as soon as I start typing my username in the
> xenodm login screen. Sometimes it will restart itself, maybe a few times,
> only to hang up permanently a few seconds later.
> 
> So far I have not been able to capture any error messages that would be
> helpful in debugging. I'm including a full dmesg and Xorg.0.log at the end
> of this message.

this is more likely than not related to gpu hangs/dmafence waits etc
seen on other hardware

> 
> For lack of better ideas, I experimented with setting
> machdep.allowaperture to 1 and 2, though I don't believe that should be
> necessary these days. It made no apparent difference.

that is not needed with a drm driver doing kms

> 
> Also, maybe related, maybe not, I'm surprised to see that WebGL does not
> work in Firefox. After setting layers.acceleration.force-enable to 'true'
> and restarting Firefox, I see the following errors:
> 
> libGL error: MESA-LOADER: failed to open radeonsi (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: radeonsi
> libGL error: MESA-LOADER: failed to open swrast (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: swrast
> libGL error: MESA-LOADER: failed to open radeonsi (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: radeonsi
> libGL error: MESA-LOADER: failed to open swrast (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: swrast
> Crash Annotation GraphicsCriticalError: |[G0][GFX1-]: [OPENGL] Failed to
> init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT
> (t=0.360464) [GFX1-]: [OPENGL] Failed to init compositor with reason:
> FEATURE_FAILURE_OPENGL_CREATE_CONTEXT
> libGL error: MESA-LOADER: failed to open radeonsi (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: radeonsi
> libGL error: MESA-LOADER: failed to open swrast (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: swrast
> Crash Annotation GraphicsCriticalError: |[G0][GFX1-]: [OPENGL] Failed to
> init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT
> (t=0.360464) |[G1][GFX1-]: [OPENGL] Failed to init compositor with reason:
> FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1.09095) [GFX1-]: [OPENGL] Failed
> to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT
> libGL error: MESA-LOADER: failed to open radeonsi (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: radeonsi
> libGL error: MESA-LOADER: failed to open swrast (search paths
> /usr/X11R6/lib/modules/dri)
> libGL error: failed to load driver: swrast
> 
> I don't know why this is failing, because the modules do exist in that
> directory.

firefox seems to be doing a dlopen after it has unveil'd and can't
open libLLVM.  unveil removes visibility of parts of the filesystem,
but it has to be done in the right place.

 37357 firefox  NAMI  "/usr/lib/libLLVM.so.2.0"
 37357 firefox  RET   open -1 errno 2 No such file or directory

this can be reproduced on other hardware by forcing swrast which also
uses libLLVM

LIBGL_ALWAYS_SOFTWARE=1 firefox

this is a firefox specific problem which does not occur with chromium

> 
> Is there anything else I can try to resolve this instability? Any other
> info I can provide?

if you look in dmesg when it occurs you may see a message along the
lines of

[drm] *ERROR* ring gfx timeout, signaled seq=1417, emitted seq=1418

running top and looking at the wait channels may show 'dmafence'
or one of the other drm wait channels.

> 
> Dmesg and Xorg.0.log follow:
> 
> OpenBSD 6.7-current (GENERIC.MP) #323: Fri Jul  3 08:56:44 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 34271576064 (32683MB)
> avail mem = 33217884160 (31679MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xe6cc0 (29 entries)
> bios0: vendor American Megatrends Inc. version "P3.90" date 12/09/2019
> bios0: ASRock B450M Pro4
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT SSDT MCFG AAFT HPET SSDT
> UEFI BGRT SSDT CRAT CDIT SSDT SSDT WSMT SSDT
> acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP3(S4) GPP4(S4) GPP5(S4)
> GPP6(S4) GPP7(S

Re: amdgpu - extreme instability with Radeon RX 550

2020-07-04 Thread Jonathan Gray
On Sat, Jul 04, 2020 at 12:22:41PM +0200, Landry Breuil wrote:
> On Sat, Jul 04, 2020 at 05:58:07PM +1000, Jonathan Gray wrote:
> > On Fri, Jul 03, 2020 at 11:14:02PM -0400, Joe Gidi wrote:
> > > Hello,
> > > 
> > 
> > firefox seems to be doing a dlopen after it has unveil'd and can't
> > open libLLVM.  unveil removes visibility of parts of the filesystem,
> > but it has to be done in the right place.
> 
> i dont think firefox itself is doing this dlopen, rather MESA ?
> https://searchfox.org/mozilla-central/search?q=libllvm

firefox would be dlopen'ing libGL.so
the Mesa loader opens the dri driver which I suspect happens when
the GL context is created.

/usr/X11R6/lib/modules/dri/radeonsi_dri.so for amdgpu

in the case of radeonsi and swrast these are linked against
libLLVM (and libelf for amdgpu)

dlopen: loading: libGL.so
dlopen: libGL.so: done (success).
dlopen: loading: /usr/X11R6/lib/modules/dri/radeonsi_dri.so
 flags /usr/X11R6/lib/modules/dri/radeonsi_dri.so = 0x0
head /usr/X11R6/lib/modules/dri/radeonsi_dri.so
obj /usr/X11R6/lib/modules/dri/radeonsi_dri.so has 
/usr/X11R6/lib/modules/dri/radeonsi_dri.so as head
linking /usr/X11R6/lib/modules/dri/radeonsi_dri.so as dlopen()ed
head [/usr/X11R6/lib/modules/dri/radeonsi_dri.so]
examining: '/usr/X11R6/lib/modules/dri/radeonsi_dri.so'
loading: libelf.so.3.0 required by /usr/X11R6/lib/modules/dri/radeonsi_dri.so
dlopen: failed to open libelf.so.3.0
unload_shlib called on /usr/X11R6/lib/modules/dri/radeonsi_dri.so
unload_shlib unloading on /usr/X11R6/lib/modules/dri/radeonsi_dri.so
dlopen: /usr/X11R6/lib/modules/dri/radeonsi_dri.so: done (failed).
libGL error: MESA-LOADER: failed to open radeonsi (search paths 
/usr/X11R6/lib/modules/dri)

It works if I disable unveil in the kernel and permit DRM_IOCTL_GET_CLIENT
in pledge_ioctl_drm(), which was discussed at some point but apparently
didn't go in.  This should only be required if trying to create a
context after pledge, which happens in libdrm amdgpu init.

> 
> >  37357 firefox  NAMI  "/usr/lib/libLLVM.so.2.0"
> >  37357 firefox  RET   open -1 errno 2 No such file or directory
> > 
> > this can be reproduced on other hardware by forcing swrast which also
> > uses libLLVM
> > 
> > LIBGL_ALWAYS_SOFTWARE=1 firefox
> > 
> > this a firefox specific problem which does not occur with chromium
> 
> Since unveil doesnt allow wildcards, i guess adding '/usr/lib r' to
> /etc/firefox/unveil.gpu is the way to go. I dont have a machine with
> amdgpu, and it doesnt seem to help LIBGL_ALWAYS_SOFTWARE=1 firefox
> https://get.webgl.org here but maybe that's unrelated.
> 
> Landry
> 
> 



Re: amdgpu - extreme instability with Radeon RX 550

2020-07-04 Thread Jonathan Gray
On Sat, Jul 04, 2020 at 11:19:49AM -0400, Joe Gidi wrote:
> >> Is there anything else I can try to resolve this instability? Any other
> >> info I can provide?
> >
> > if you look in dmesg when it occurs you may see a message along the
> > lines of
> >
> > [drm] *ERROR* ring gfx timeout, signaled seq=1417, emitted seq=1418
> >
> > running top and looking at the wait channels may show 'dmafence'
> > or one of the other drm wait channels.
> 
> Hi Jonathan,
> 
> Thank you for your work on this driver! From reading the commit messages,
> it looks like it has been a monumental effort.
> 
> After playing around for quite a while, I was finally able to capture some
> output. The dmesg buffer overflowed with thousands of lines of the same
> error, finally ending with:
> 
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* ring gfx timeout, signaled seq=22, emitted seq=22
> [drm] *ERROR* Process information: process  pid 0 thread Xorg pid 17620
> [drm] *ERROR* ring comp_1.0.1 test failed (-60)
> 
> Please let me know if I can provide any other information. Would a
> donation of a video card be helpful to you?

Thanks for the offer but this kind of problem can be reproduced on other
amdgpu parts.  With both the linux 4.19 based drm and the recent port
of linux 5.7 drm 'piglit run quick ' will cause a ring timeout/hang.
piglit is a series of OpenGL/Mesa regression tests; it runs to completion
with inteldrm.



Re: RaspberryPi 3 Stopped at data_abort+0x68:

2020-07-04 Thread Jonathan Gray
On Sat, Jul 04, 2020 at 05:24:24PM +0200, Paul de Weerd wrote:
> Hi Mark,
> 
> Thanks for the reply and the diff!
> 
> On Sat, Jul 04, 2020 at 12:19:59AM +0200, Mark Kettenis wrote:
> | That probably means Paul is using somewhat broken firmware.
> 
> pkg_info says I have:
> 
> raspberrypi-firmware-1.20200212 Raspberry Pi firmware
> 
> Is there something else I should have or do?
> 
> | Anyway, I think the problem is that OF_finddevice() returns -1 if the
> | node can't be found.  Does the following diff help?
> 
> It does allow the system to boot to multiuser (so the data_abort+0x68
> is gone).  The 'failed to spin up' lines now show up a lot earlier in
> dmesg, after cpu0 attaches.  See below for full dmesg.
> 
> For completeness, the machine still doesn't shutdown properly (need to
> powercycle it to bring it back).
> 
> Cheers,
> 
> Paul
> 
> OpenBSD 6.7-current (GENERIC.MP) #2: Sat Jul  4 17:12:49 CEST 2020
> we...@pie.alm.weirdnet.nl:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 953442304 (909MB)
> avail mem = 891834368 (850MB)
> random: good seed from bootblocks
> mainbus0 at root: Raspberry Pi 3 Model B Rev 1.2
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
> cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> cpu0: 512KB 64b/line 16-way L2 cache
> cpu1 at mainbus0 mpidr 1: failed to spin up
> cpu2 at mainbus0 mpidr 2: failed to spin up
> cpu3 at mainbus0 mpidr 3: failed to spin up
> efi0 at mainbus0: UEFI 2.0.5
> efi0: Das U-boot rev 0x0

This must be quite an old version of U-Boot; the version has been encoded
in the revision for a while.  With U-Boot 2020.04:

efi0 at mainbus0: UEFI 2.8
efi0: Das U-Boot rev 0x20200400

> apm0 at mainbus0
> simplefb0 at mainbus0: 656x416, 32bpp
> wsdisplay0 at simplefb0 mux 1
> wsdisplay0: screen 0-5 added (std, vt100 emulation)
> "system" at mainbus0 not configured
> "axi" at mainbus0 not configured
> simplebus0 at mainbus0: "soc"
> bcmdmac0 at simplebus0: DMA2 DMA4 DMA5 DMA8 DMA9 DMA10
> bcmintc0 at simplebus0
> bcmmbox0 at simplebus0
> bcmgpio0 at simplebus0
> syscon0 at simplebus0: "syscon"
> bcmdog0 at simplebus0
> bcmrng0 at simplebus0
> pluart0 at simplebus0
> bcmsdhost0 at simplebus0: 250 MHz base clock
> sdmmc0 at bcmsdhost0: 4-bit, sd high-speed, mmc high-speed, dma
> com0 at simplebus0: ns16550, no working fifo
> com0: console
> "mmc" at simplebus0 not configured
> dwctwo0 at simplebus0
> "firmware" at simplebus0 not configured
> "power" at simplebus0 not configured
> "leds" at simplebus0 not configured
> "fb" at simplebus0 not configured
> "vchiq" at simplebus0 not configured
> "thermal" at simplebus0 not configured
> "local_intc" at simplebus0 not configured
> "arm-pmu" at simplebus0 not configured
> "gpiomem" at simplebus0 not configured
> agtimer0 at simplebus0: tick rate 19200 KHz
> "virtgpio" at simplebus0 not configured
> simplebus1 at mainbus0: "clocks"
> "clock" at simplebus1 not configured
> "clock" at simplebus1 not configured
> "clock" at simplebus1 not configured
> "clock" at simplebus1 not configured
> "clock" at simplebus1 not configured
> "clock" at simplebus1 not configured
> "clock" at simplebus1 not configured
> usb0 at dwctwo0: USB revision 2.0
> sdmmc0: can't enable card
> uhub0 at usb0 configuration 1 interface 0 "Broadcom DWC2 root hub" rev 
> 2.00/1.00 addr 1
> uhub1 at uhub0 port 1 configuration 1 interface 0 "Standard Microsystems 
> product 0x9514" rev 2.00/2.00 addr 2
> smsc0 at uhub1 port 1 configuration 1 interface 0 "Standard Microsystems 
> SMSC9512/14" rev 2.00/2.00 addr 3
> smsc0: address b8:27:eb:93:42:d5
> ukphy0 at smsc0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 
> 0x0001f0, model 0x000c
> urtwn0 at uhub1 port 2 configuration 1 interface 0 "Realtek 802.11n NIC" rev 
> 2.00/0.00 addr 4
> urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R, address ec:08:6b:07:c2:1c
> umass0 at uhub1 port 4 configuration 1 interface 0 "SanDisk Ultra Fit" rev 
> 2.10/1.00 addr 5
> umass0: using SCSI over Bulk-Only
> scsibus0 at umass0: 2 targets, initiator 0
> sd0 at scsibus0 targ 1 lun 0:  removable 
> serial.07815583320220107303
> sd0: 29327MB, 512 bytes/sector, 60062500 sectors
> vscsi0 at root
> scsibus1 at vscsi0: 256 targets
> softraid0 at root
> scsibus2 at softraid0: 256 targets
> bootfile: sd0a:/bsd
> boot device: sd0
> root on sd0a (59de83c8e783b811.a) swap on sd0b dump on sd0b
> WARNING: CHECK AND RESET THE DATE!
> gpio0 at bcmgpio0: 54 pins
> 
> | 
> | Index: arch/arm64/dev/mainbus.c
> | ===
> | RCS file: /cvs/src/sys/arch/arm64/dev/mainbus.c,v
> | retrieving revision 1.17
> | diff -u -p -r1.17 mainbus.c
> | --- arch/arm64/dev/mainbus.c17 Jun 2020 08:00:22 -  1.17
> | +++ arch/arm64/dev/mainbus.c3 Jul 2020 22:16:43 -
> | @@ -316,7 +316,7 @@ mainbus_attach_cpus(struct device *self,
> | int acells, scells;
> | char buf[32];
> |  
> | -   if (node == 0)
> | +   if (node == -1)
> | return;
> 

Re: RaspberryPi 3 Stopped at data_abort+0x68:

2020-07-05 Thread Jonathan Gray
On Sun, Jul 05, 2020 at 06:14:33PM +0200, Paul de Weerd wrote:
> Hi Mark,
> 
> On Sat, Jul 04, 2020 at 05:39:26PM +0200, Mark Kettenis wrote:
> | > Is there something else I should have or do?
> | 
> | Yes.  You need to actually install the firmware onto the uSD card or
> | USB device that your Pi boots from.  At this point there is no easy
> | way to do that though.
> 
> Well, these bits are on the uSD card that's in the slot on the
> front(?) of the machine.  I simply wrote the latest miniroot67.img to
> another uSD card and swapped it with the previous one.  Now I need to
> specify that I want to boot from the USB stick on the OpenBSD boot
> prompt (as previously I didn't have an sd0 for the uSD), but it boots
> fine and I now have four CPU cores (and it reboots without issue, as
> you mentioned).

You likely had the U-Boot boot order changed on your previous SD card.
To change it again, interrupt U-Boot while it is booting and run:

Hit any key to stop autoboot:  0
U-Boot> setenv boot_targets usb0 mmc0 pxe dhcp
U-Boot> saveenv
U-Boot> boot

It is also possible to modify config.txt on the SD card to enable
USB boot, but that can't be undone.

https://www.raspberrypi.org/documentation/hardware/raspberrypi/bootmodes/msd.md

> 
> Thanks!
> 
> Paul
> 
> OpenBSD 6.7-current (GENERIC.MP) #700: Sat Jul  4 13:14:53 MDT 2020
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 957657088 (913MB)
> avail mem = 895942656 (854MB)
> random: good seed from bootblocks
> mainbus0 at root: Raspberry Pi 3 Model B Rev 1.2
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
> cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> cpu0: 512KB 64b/line 16-way L2 cache
> cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4
> cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> cpu1: 512KB 64b/line 16-way L2 cache
> cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4
> cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> cpu2: 512KB 64b/line 16-way L2 cache
> cpu3 at mainbus0 mpidr 3: ARM Cortex-A53 r0p4
> cpu3: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> cpu3: 512KB 64b/line 16-way L2 cache
> efi0 at mainbus0: UEFI 2.8
> efi0: Das U-Boot rev 0x20200400
> apm0 at mainbus0
> simplefb0 at mainbus0: 656x416, 32bpp
> wsdisplay0 at simplefb0 mux 1
> wsdisplay0: screen 0-5 added (std, vt100 emulation)
> "system" at mainbus0 not configured
> "axi" at mainbus0 not configured
> simplebus0 at mainbus0: "soc"
> bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA8 DMA9 DMA10
> bcmclock0 at simplebus0
> bcmmbox0 at simplebus0
> bcmgpio0 at simplebus0
> bcmaux0 at simplebus0
> bcmintc0 at simplebus0
> bcmdog0 at simplebus0
> bcmrng0 at simplebus0
> pluart0 at simplebus0: console
> bcmsdhost0 at simplebus0: 250 MHz base clock
> sdmmc0 at bcmsdhost0: 4-bit, sd high-speed, mmc high-speed, dma
> "dsi" at simplebus0 not configured
> dwctwo0 at simplebus0
> bcmtemp0 at simplebus0
> "local_intc" at simplebus0 not configured
> sdhc0 at simplebus0
> sdhc0: SDHC 3.0, 200 MHz base clock
> sdmmc1 at sdhc0: 4-bit, sd high-speed, mmc high-speed
> simplebus1 at simplebus0: "firmware"
> "expgpio" at simplebus1 not configured
> "power" at simplebus0 not configured
> "mailbox" at simplebus0 not configured
> "gpiomem" at simplebus0 not configured
> "fb" at simplebus0 not configured
> "vcsm" at simplebus0 not configured
> "virtgpio" at simplebus0 not configured
> simplebus2 at mainbus0: "clocks"
> "clock" at simplebus2 not configured
> "clock" at simplebus2 not configured
> "phy" at mainbus0 not configured
> "arm-pmu" at mainbus0 not configured
> agtimer0 at mainbus0: tick rate 19200 KHz
> "leds" at mainbus0 not configured
> "fixedregulator_3v3" at mainbus0 not configured
> "fixedregulator_5v0" at mainbus0 not configured
> usb0 at dwctwo0: USB revision 2.0
> scsibus0 at sdmmc0: 2 targets, initiator 0
> sd0 at scsibus0 targ 1 lun 0:  removable
> sd0: 3776MB, 512 bytes/sector, 7733248 sectors
> uhub0 at usb0 configuration 1 interface 0 "Broadcom DWC2 root hub" rev 
> 2.00/1.00 addr 1
> uhub1 at uhub0 port 1 configuration 1 interface 0 "Standard Microsystems 
> product 0x9514" rev 2.00/2.00 addr 2
> bwfm0 at sdmmc1 function 1
> manufacturer 0x02d0, product 0xa9a6 at sdmmc1 function 2 not configured
> smsc0 at uhub1 port 1 configuration 1 interface 0 "Standard Microsystems 
> SMSC9512/14" rev 2.00/2.00 addr 3
> smsc0: address b8:27:eb:93:42:d5
> ukphy0 at smsc0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 
> 0x0001f0, model 0x000c
> urtwn0 at uhub1 port 2 configuration 1 interface 0 "Realtek 802.11n NIC" rev 
> 2.00/0.00 addr 4
> urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R, address ec:08:6b:07:c2:1c
> umass0 at uhub1 port 4 configuration 1 interface 0 "SanDisk Ultra Fit" rev 
> 2.10/1.00 addr 5
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd1 at scsibus1 targ 1 lun 0:  removable 
> serial.07815583320220107303
> sd1: 29327MB,

Re: Kernel Panic in current snapshot with braswell and inteldrm

2020-07-06 Thread Jonathan Gray
On Mon, Jul 06, 2020 at 11:17:34AM +0200, Martin Ziemer wrote:
> On Wed, Jul 01, 2020 at 01:47:57PM +1000, Jonathan Gray wrote:
> > On Tue, Jun 30, 2020 at 07:10:10PM +0200, Martin Ziemer wrote:
> > > With this patch (and driver "intel" in /etc/X11/xorg.conf) my system 
> > > works. 
> > > Tested suspend/resume and normal browsing.
> > 
> > Thanks for testing, I've committed this with a comment added.
> Used the kernel i tested tuesday last week until today morning
> without getting problems.
> 
> Then i switched to a daily snapshot and got a kernel panic.
> 
> The kernel is:
> OpenBSD 6.7-current (GENERIC) #317: Sun Jul  5 20:02:17 MDT 2020
> 
> The crash message is: 
> kernel: page fault trap, code=0
> Stopped at drm_atomic_set_fb_for_plane+0x56: movl 0x58(%rax),%edx
> 
> I uploaded photos from ddb:
> https://photos.app.goo.gl/eqD2wg3hZcsCZGBb7

I can't reproduce this on a braswell nuc.
Does the following diff help?

Index: sys/dev/pci/drm/drm_fb_helper.c
===
RCS file: /cvs/src/sys/dev/pci/drm/drm_fb_helper.c,v
retrieving revision 1.26
diff -u -p -r1.26 drm_fb_helper.c
--- sys/dev/pci/drm/drm_fb_helper.c 2 Jul 2020 03:31:23 -   1.26
+++ sys/dev/pci/drm/drm_fb_helper.c 6 Jul 2020 13:39:50 -
@@ -1958,7 +1958,7 @@ EXPORT_SYMBOL(drm_fb_helper_initial_conf
  */
 int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper)
 {
-	struct fb_info *fbi = fb_helper->fbdev;
+	struct fb_info *fbi;
 	int err = 0;
 
 	if (!drm_fbdev_emulation || !fb_helper)
@@ -1985,6 +1985,7 @@ int drm_fb_helper_hotplug_event(struct d
 	drm_setup_crtcs_fb(fb_helper);
 	mutex_unlock(&fb_helper->lock);
 
+	fbi = fb_helper->fbdev;
 	if (fbi->fbops && fbi->fbops->fb_set_par)
 		fbi->fbops->fb_set_par(fbi);
 	else



Re: Kernel Panic in current snapshot with braswell and inteldrm

2020-07-06 Thread Jonathan Gray
On Mon, Jul 06, 2020 at 04:29:34PM +0200, Martin Ziemer wrote:
> On Mon, Jul 06, 2020 at 11:51:10PM +1000, Jonathan Gray wrote:
> > On Mon, Jul 06, 2020 at 11:17:34AM +0200, Martin Ziemer wrote:
> > > On Wed, Jul 01, 2020 at 01:47:57PM +1000, Jonathan Gray wrote:
> > > > On Tue, Jun 30, 2020 at 07:10:10PM +0200, Martin Ziemer wrote:
> > > > > With this patch (and driver "intel" in /etc/X11/xorg.conf) my system 
> > > > > works. 
> > > > > Tested suspend/resume and normal browsing.
> > > > 
> > > > Thanks for testing, I've committed this with a comment added.
> > > Used the kernel i tested tuesday last week until today morning
> > > without getting problems.
> > > 
> > > Then i switched to a daily snapshot and got a kernel panic.
> > > 
> > > The kernel is:
> > > OpenBSD 6.7-current (GENERIC) #317: Sun Jul  5 20:02:17 MDT 2020
> > > 
> > > The crash message is: 
> > > kernel: page fault trap, code=0
> > > Stopped at drm_atomic_set_fb_for_plane+0x56: movl 0x58(%rax),%edx
> > > 
> > > I uploaded photos from ddb:
> > > https://photos.app.goo.gl/eqD2wg3hZcsCZGBb7
> > 
> > I can't reproduce this on a braswell nuc.
> > Does the following diff help?
> > 
> > Index: sys/dev/pci/drm/drm_fb_helper.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/drm_fb_helper.c,v
> > retrieving revision 1.26
> > diff -u -p -r1.26 drm_fb_helper.c
> > --- sys/dev/pci/drm/drm_fb_helper.c 2 Jul 2020 03:31:23 -   1.26
> > +++ sys/dev/pci/drm/drm_fb_helper.c 6 Jul 2020 13:39:50 -
> > @@ -1958,7 +1958,7 @@ EXPORT_SYMBOL(drm_fb_helper_initial_conf
> >   */
> >  int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper)
> >  {
> > -   struct fb_info *fbi = fb_helper->fbdev;
> > +   struct fb_info *fbi;
> > int err = 0;
> >  
> > if (!drm_fbdev_emulation || !fb_helper)
> > @@ -1985,6 +1985,7 @@ int drm_fb_helper_hotplug_event(struct d
> > drm_setup_crtcs_fb(fb_helper);
> > mutex_unlock(&fb_helper->lock);
> >  
> > +   fbi = fb_helper->fbdev;
> > if (fbi->fbops && fbi->fbops->fb_set_par)
> > fbi->fbops->fb_set_par(fbi);
> > else
> Yes, the diff helps. 
> 
> Thank you for the really fast solution!

Thanks for the report, committed.
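
What the diff changes, as far as can be read from the hunks alone: the
old initializer dereferenced fb_helper before the !fb_helper check, and
the new code reads fb_helper->fbdev only after that check and after the
locked section.  A minimal sketch of that ordering follows.  The struct
and function names are hypothetical stand-ins, not the drm code, and
the sketch makes no claim about the root cause of the reported fault.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct fb_info and struct drm_fb_helper. */
struct fb_info_sketch {
	int set_par_calls;	/* counts how often "set_par" ran */
};

struct fb_helper_sketch {
	struct fb_info_sketch *fbdev;
};

/*
 * Mirrors the ordering in the committed diff: the pointer is validated
 * first, and the fbdev member is only read afterwards.
 */
int
hotplug_event_sketch(struct fb_helper_sketch *helper)
{
	struct fb_info_sketch *fbi;

	if (helper == NULL)
		return 0;	/* nothing is dereferenced on this path */

	/* The locked setup section of the real function would run here. */

	fbi = helper->fbdev;	/* read after the check, as in the diff */
	if (fbi != NULL)
		fbi->set_par_calls++;
	return 0;
}
```

Calling hotplug_event_sketch(NULL) returns 0 without touching memory,
which is the property the reordered assignment preserves.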



Re: amdgpu error causes system to freeze

2020-07-14 Thread Jonathan Gray
On Mon, Jul 13, 2020 at 08:49:49PM -0300, Evandro Rathke wrote:
> Hello!
> I'm having issues on amdgpu on applications that require a little bit more
> 3D processing.
> The application is a port that I'm working on (Ultimaker Cura).
> On Intel GPU the problem doesn't occur.
> The issue is similar to this thread.

Yes, this is a known problem.  It can be reproduced with:

piglit run quick -t "spec@glsl-1.30@execution@texelfetch fs sampler2d 1x281-501x281" out

> 
> On the application side I got the error:
> amdgpu: amdgpu_cs_query_fence_status failed.
> amdgpu: The CS has been cancelled because the context is lost.
> 
> Evandro Rathke
> -
> DMESG output (repeated a lot of times):
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*
> VM_L2_PROTECTION_FAULT_STATUS:0x00200411
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  MORE_FAULTS: 0x1
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  WALKER_ERROR: 0x0
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  PERMISSION_FAULTS:
> 0x1
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  MAPPING_ERROR: 0x0
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  RW: 0x0
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR* [gfxhub0] retry page fault
> (src_id:0 ring:0 vmid:2 pasid:32981, for process  pid 0 thread python3.8
> pid 21348)
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*   in page starting at
> address 0x800104c0 from client 27
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*
> VM_L2_PROTECTION_FAULT_STATUS:0x00200411
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  MORE_FAULTS: 0x1
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  WALKER_ERROR: 0x0
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  PERMISSION_FAULTS:
> 0x1
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  MAPPING_ERROR: 0x0
> drm:pid32969:gmc_v9_0_process_interrupt *ERROR*  RW: 0x0
> [drm] psp command (0x5) failed and response status is (0x0007)
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> [drm] *ERROR* Failed to initialize parser -88!
> 
> -
> DMESG boot output:
> OpenBSD 6.7-current (GENERIC.MP) #339: Fri Jul 10 13:50:28 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 14941442048 (14249MB)
> avail mem = 14473564160 (13803MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xe6cc0 (24 entries)
> bios0: vendor American Megatrends Inc. version "P6.00" date 08/03/2019
> bios0: ASRock A320M-HD
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT SSDT MCFG AAFT HPET UEFI
> BGRT SSDT CRAT CDIT SSDT SSDT WSMT
> acpi0: wakeup devices GPP0(S4) GPP2(S4) GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4)
> GP17(S4) XHC0(S4) XHC1(S4) GP18(S4) GPP1(S4) PTXH(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 3 3200G with Radeon Vega Graphics, 3600.53 MHz, 17-18-01
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully
> associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully
> associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 25MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: AMD Ryzen 3 3200G with Radeon Vega Graphics, 3599.96 MHz, 17-18-01
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR

Re: Panic on boot with Hyper-V since Jun 17 snapshot

2020-07-29 Thread Jonathan Gray
On Wed, Jul 29, 2020 at 02:05:18PM +0200, Andre Stoebe wrote:
> >Synopsis:Panic on boot with Hyper-V since Jun 17 snapshot
> >Category:kernel
> >Environment:
>   System  : OpenBSD 6.7
>   Details : OpenBSD 6.7-current (GENERIC.MP) #375: Sun Jul 26 
> 11:26:37 MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Booting -current on Hyper-V with hvn(4) results in a panic after
>   "starting network".
> 
>   This seems to only affect the multi-user boot; hvn(4) still
>   works flawlessly on the ramdisk.
> 
>   Last working snapshot:
>   OpenBSD 6.7-current (GENERIC.MP) #273: Mon Jun 15 19:13:12 MDT 2020
> 
>   First non-working snapshot:
>   OpenBSD 6.7-current (GENERIC.MP) #278: Wed Jun 17 12:18:35 MDT 2020
> 
>   Below is the serial output including ddb trace and ps.
> >How-To-Repeat:
>   Boot OpenBSD-current on Hyper-V with a hvn(4) network adapter.
> >Fix:
>   Unknown. As a workaround, disabling hvn(4) via "boot -c" does
>   not result in a panic.

More fallout from intr_barrier() now using its argument.

Index: sys/dev/pv/if_hvn.c
===
RCS file: /cvs/src/sys/dev/pv/if_hvn.c,v
retrieving revision 1.41
diff -u -p -r1.41 if_hvn.c
--- sys/dev/pv/if_hvn.c 10 Jul 2020 13:26:40 -  1.41
+++ sys/dev/pv/if_hvn.c 29 Jul 2020 12:52:09 -
@@ -451,7 +451,7 @@ hvn_stop(struct hvn_softc *sc)
 	}
 
 	ifq_barrier(&ifp->if_snd);
-	intr_barrier(sc->sc_chan);
+	sched_barrier(NULL);
 
 	ifq_clr_oactive(&ifp->if_snd);
 }

> 
> NOTE: random seed is being reused.
> booting hd0a:/bsd: 14390600+3171336+344096+0+872448 
> [968918+128+1133640+857287]=0x14bde68
> entry point at 0x81001000
> [ using 2961000 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 6.7-current (GENERIC.MP) #278: Wed Jun 17 12:18:35 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4278124544 (4079MB)
> avail mem = 4133519360 (3942MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf93d0 (338 entries)
> bios0: vendor American Megatrends Inc. version "090008" date 12/07/2018
> bios0: Microsoft Corporation Virtual Machine
> acpi0 at bios0: ACPI 2.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP WAET SLIC OEM0 SRAT APIC OEMB
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihve0 at acpi0
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins, remapped
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz, 2770.10 MHz, 06-5e-03
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 166MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz, 2910.06 MHz, 06-5e-03
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz, 2910.09 MHz, 06-5e-03
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz, 2910.11 MHz, 06-5e-03
> cpu3:

Re: On boot, screen remains black after radeondrm driver initializes

2020-08-02 Thread Jonathan Gray
On Sun, Aug 02, 2020 at 01:05:37PM +0200, mgloc...@openbsd.org wrote:
> >Synopsis:On boot, screen remains black after radeondrm driver initializes
> >Category:kernel/graphics/firmware
> >Environment:
>   System  : OpenBSD 6.7
>   Details : OpenBSD 6.7-current (GENERIC.MP) #0: Sun Aug  2 09:57:37 
> CEST 2020
>
> ha...@imac.nazgul.ch:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> 
> >Description:
> Installing an OpenBSD snapshot the first time on this iMac Intel Core i3
> machine.  After the radeon firmware package has been installed and you
> reboot the system, the screen remains black after the radeondrm driver
> initializes.  The system as such comes up fine - SSH possible.
> 
> >How-To-Repeat:
> Install an OpenBSD.amd64 or OpenBSD.i386 snapshot and reboot the system
> again after the radeon firmware package has been installed.
> 
> >Fix:
> No fix found yet.
> Workaround:
> Disable the radeondrm driver so system boots without accelerated graphics 
> mode.

Try this.  RV730 is DCE3.2

Index: sys/dev/pci/drm/radeon/atombios_encoders.c
===
RCS file: /cvs/src/sys/dev/pci/drm/radeon/atombios_encoders.c,v
retrieving revision 1.15
diff -u -p -U6 -r1.15 atombios_encoders.c
--- sys/dev/pci/drm/radeon/atombios_encoders.c  8 Jun 2020 04:48:15 -   1.15
+++ sys/dev/pci/drm/radeon/atombios_encoders.c  2 Aug 2020 16:18:23 -
@@ -2191,13 +2191,14 @@ int radeon_atom_pick_dig_encoder(struct 
 	/*
 	 * On DCE32 any encoder can drive any block so usually just use crtc id,
 	 * but Apple thinks different at least on iMac10,1, so there use linkb,
 	 * otherwise the internal eDP panel will stay dark.
 	 */
 	if (ASIC_IS_DCE32(rdev)) {
-		if (dmi_match(DMI_PRODUCT_NAME, "iMac10,1"))
+		if (dmi_match(DMI_PRODUCT_NAME, "iMac10,1") ||
+		    dmi_match(DMI_PRODUCT_NAME, "iMac11,2"))
 			enc_idx = (dig->linkb) ? 1 : 0;
 		else
 			enc_idx = radeon_crtc->crtc_id;
 
 		goto assigned;
 	}
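
The quirk in the diff above can be sketched as a small table lookup.
This is only an illustration of the selection rule for the DCE3.2 case:
product_needs_linkb() and pick_dig_encoder_sketch() are hypothetical
names, and the string comparison stands in for the kernel's dmi_match().

```c
#include <stddef.h>
#include <string.h>

/* Product names taken from the diff; the table itself is illustrative. */
static const char *const linkb_products[] = {
	"iMac10,1",
	"iMac11,2",
};

/* Return 1 if the product name is on the quirk list, 0 otherwise. */
int
product_needs_linkb(const char *product)
{
	size_t i;

	for (i = 0; i < sizeof(linkb_products) / sizeof(linkb_products[0]); i++) {
		if (strcmp(product, linkb_products[i]) == 0)
			return 1;
	}
	return 0;
}

/*
 * DCE3.2 case only: quirked machines choose the encoder from the link
 * (B -> 1, A -> 0); every other machine uses the crtc id.
 */
int
pick_dig_encoder_sketch(const char *product, int linkb, int crtc_id)
{
	if (product_needs_linkb(product))
		return linkb ? 1 : 0;
	return crtc_id;
}
```

With this rule an iMac11,2 driving link B gets encoder 1 regardless of
the crtc id, while an unlisted machine keeps using the crtc id.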



Re: Acer Aspire A315 laptop firmware fails to load with 6.7

2020-08-18 Thread Jonathan Gray
On Sun, Aug 16, 2020 at 02:27:46AM +, Jon Fineman wrote:
> I was not able to do a proper sendbug report because the PC hangs.
> 
> I have an Acer Aspire A315 laptop. 6.6 works fine. When I upgrade to 6.7 
> Release or Snapshot I get the below firmware load error. Below that output is 
> the dmesg from a 6.6 boot so you can see the hardware.

The first boot after a new install will not have the firmware.
fw_update(1) will then run and install it into /etc/firmware/amdgpu/.

Does amdgpu still not attach after you've verified the files are present
and rebooted?

> 
> It finishes booting and I get a login prompt. About 30 seconds after it boots 
> it powers off. For this test to get the below /var/log/messages file I 
> installed 6.7 to a flash drive and booted from that.
> 
> Any thoughts on what might be causing this or how to get more info on it?

That appears to be an unrelated problem.

> 
> Thanks.
> 
> Jon
> 
> 
> /var/log/messages:
> 
> Aug 14 09:58:42 laptop /bsd: ugen0 at uhub2 port 3 "Lite-On Technology 
> product 0x3015" rev 2.01/0.01 addr 4
> Aug 14 09:58:42 laptop /bsd: uvideo0 at uhub2 port 4 configuration 1 
> interface 0 "KSVGA0500171401441LM06 VGA WebCam" rev 2.00/0.06 addr 5
> Aug 14 09:58:42 laptop /bsd: video0 at uvideo0
> Aug 14 09:58:42 laptop /bsd: vscsi0 at root
> Aug 14 09:58:42 laptop /bsd: scsibus3 at vscsi0: 256 targets
> Aug 14 09:58:42 laptop /bsd: softraid0 at root
> Aug 14 09:58:42 laptop /bsd: scsibus4 at softraid0: 256 targets
> Aug 14 09:58:42 laptop /bsd: root on sd1a (c5239493226aa09b.a) swap on sd1b 
> dump on sd1b
> Aug 14 09:58:42 laptop /bsd: initializing kernel modesetting (STONEY 
> 0x1002:0x98E4 0x1025:0x1192 0xDA).
> Aug 14 09:58:42 laptop /bsd: amdgpu_irq_add_domain: stub
> Aug 14 09:58:42 laptop /bsd: drm:pid0:gfx_v8_0_init_microcode *ERROR* gfx8: 
> Failed to load firmware "amdgpu/stoney_pfp.bin"
> Aug 14 09:58:42 laptop /bsd: [drm] *ERROR* Failed to load gfx firmware!
> Aug 14 09:58:42 laptop /bsd: [drm] *ERROR* sw_init of IP block  
> failed -2
> Aug 14 09:58:42 laptop /bsd: drm:pid0:amdgpu_device_init *ERROR* 
> amdgpu_device_ip_init failed
> Aug 14 09:58:42 laptop /bsd: drm:pid0:amdgpu_attachhook *ERROR* Fatal error 
> during GPU init
> Aug 14 09:58:42 laptop /bsd: efifb0 at mainbus0: 1366x768, 32bpp
> Aug 14 09:58:42 laptop /bsd: wsdisplay0 at efifb0 mux 1: console (std, vt100 
> emulation), using wskbd0
> Aug 14 09:58:42 laptop /bsd: wsdisplay0: screen 1-5 added (std, vt100 
> emulation)
> 
> 
> 
> DMESG:
> 
> OpenBSD 6.6 (GENERIC.MP) #3: Mon Jul 20 23:21:24 MDT 2020
> t...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 5978714112 (5701MB)
> avail mem = 5784797184 (5516MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xdf266000 (27 entries)
> bios0: vendor Insyde Corp. version "V1.00" date 03/20/2017
> bios0: Acer Aspire A315-21
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP MSDM BOOT HPET MCFG WDAT UEFI SSDT IVRS SSDT SSDT SSDT UEFI SPCR SSDT CRAT TPM2 FPDT ASF! WDRT VFCT SSDT APIC SSDT BGRT
> acpi0: wakeup devices GPP1(S4) GPP4(S4) GFX0(S4) GFX1(S4) GFX2(S4) GFX3(S4) GFX4(S4) EHC1(S3) XHC0(S3)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf800, bus 0-63
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 16 (boot processor)
> cpu0: AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G, 2994.79 MHz, 15-70-00
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,CPCTR,DBKP,PERFTSC,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,XSAVEOPT
> cpu0: 96KB 64b/line 3-way I-cache, 32KB 64b/line 8-way D-cache, 1MB 64b/line 16-way L2 cache
> cpu0: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 17 (application processor)
> cpu1: AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G, 2994.40 MHz, 15-70-00
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,CPCTR,DBKP,PERFTSC,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,XSAVEOPT
> cpu1: 96KB 64b/line 3-way I-cache, 32KB 64b/line 8-way D-cache, 1MB 64b/line 16-way L2 cache
> cpu1

Re: Acer Aspire A315 laptop firmware fails to load with 6.7

2020-08-18 Thread Jonathan Gray
On Tue, Aug 18, 2020 at 12:09:56PM -0400, Jon Fineman wrote:
> On Wed, 19 Aug 2020 00:06:59 +1000
> Jonathan Gray  wrote:
> 
> > On Sun, Aug 16, 2020 at 02:27:46AM +, Jon Fineman wrote:
> > > I was not able to do a proper sendbug report because the PC hangs.
> > > 
> > > I have an Acer Aspire A315 laptop. 6.6 works fine. When I upgrade
> > > to 6.7 Release or Snapshot I get the below firmware load error.
> > > Below that output is the dmesg from a 6.6 boot so you can see the
> > > hardware.  
> > 
> > The first boot after a new install will not have the firmware.
> > fw_update(1) will run and install it into /etc/firmware/amdgpu/
> > 
> > Does amdgpu still not attach after you've verified the files are
> > present and rebooted?
> 
> 
> I mounted the memory stick and verified the amd firmware files are
> present in /etc/firmware/amdgpu. I also downloaded the ones from 6.7
> and copied them to that directory.
> 
> When I booted it wanted to update the firmware for amdgpu and vmm.
> 
> Then it dies while re-ordering the kernel.
> 
> I am not sure how I would tell if the amdgpu attached or not.
> 
> /var/log/message from this test:
> 
> Aug 18 11:54:56 laptop /bsd: softraid0 at root
> Aug 18 11:54:56 laptop /bsd: scsibus4 at softraid0: 256 targets
> Aug 18 11:54:56 laptop /bsd: root on sd1a (c5239493226aa09b.a) swap on sd1b dump on sd1b
> Aug 18 11:54:56 laptop /bsd: initializing kernel modesetting (STONEY 0x1002:0x98E4 0x1025:0x1192 0xDA).
> Aug 18 11:54:56 laptop /bsd: amdgpu_irq_add_domain: stub
> Aug 18 11:54:56 laptop /bsd: amdgpu0: 1366x768, 32bpp
> Aug 18 11:54:56 laptop /bsd: wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> Aug 18 11:54:56 laptop /bsd: wsdisplay0: screen 1-5 added (std, vt100 emulation)

It has attached here.  If there is a fatal error such as missing firmware,
it will detach and efifb or vga will reclaim the console.
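
One way to tell the two cases apart is to look at which device
wsdisplay0 attached to in the log: "wsdisplay0 at amdgpu0" in the
log quoted above, versus "wsdisplay0 at efifb0" in the earlier
failing boot.  A rough sketch of that check, assuming log lines shaped
like the ones quoted in this thread; console_attached_to() is a
hypothetical helper, not an existing tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the log line shows wsdisplay0 attaching to a device whose
 * name starts with the given driver prefix (e.g. "amdgpu" matches
 * "amdgpu0"), 0 otherwise.  An empty prefix matches any attach line.
 */
int
console_attached_to(const char *line, const char *driver)
{
	static const char key[] = "wsdisplay0 at ";
	const char *p = strstr(line, key);

	if (p == NULL)
		return 0;	/* not a wsdisplay0 attach line */
	p += sizeof(key) - 1;	/* skip past "wsdisplay0 at " */
	return strncmp(p, driver, strlen(driver)) == 0;
}
```

Feeding it each line of /var/log/messages and asking for "amdgpu"
versus "efifb" gives the same answer as reading the dmesg by eye.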



Re: Panic on boot with Hyper-V since Jun 17 snapshot

2020-08-30 Thread Jonathan Gray
On Sun, Aug 30, 2020 at 12:20:50PM +0200, Andre Stoebe wrote:
> On 30.07.2020 12:05, Andre Stoebe wrote:
> > On 29.07.2020 15:01, Jonathan Gray wrote:
> >> more fallout of intr_barrier() now using the argument
> >>
> >> Index: sys/dev/pv/if_hvn.c
> >> ===
> >> RCS file: /cvs/src/sys/dev/pv/if_hvn.c,v
> >> retrieving revision 1.41
> >> diff -u -p -r1.41 if_hvn.c
> >> --- sys/dev/pv/if_hvn.c10 Jul 2020 13:26:40 -  1.41
> >> +++ sys/dev/pv/if_hvn.c29 Jul 2020 12:52:09 -
> >> @@ -451,7 +451,7 @@ hvn_stop(struct hvn_softc *sc)
> >>}
> >>  
> >>ifq_barrier(&ifp->if_snd);
> >> -  intr_barrier(sc->sc_chan);
> >> +  sched_barrier(NULL);
> >>  
> >>ifq_clr_oactive(&ifp->if_snd);
> >>  }
> > 
> > Hi Jonathan,
> > 
> > I can confirm that this fixes the panic.
> > 
> > Thanks,
> > Andre
> 
> Can this or a similar fix get committed? It's broken for some time now.

Sorry, I lost track of this.  Committed.
If people want to rework it further that can be a different diff.



Re: amd64-current: inteldrm: witness(4) warnings during boot

2020-09-16 Thread Jonathan Gray
On Wed, Sep 16, 2020 at 02:03:21PM -0500, Scott Cheloha wrote:
> Yesterday I enabled witness(4) to debug something unrelated and
> started seeing traces during boot.
> 
> My kernel is up-to-date as of yesterday night.  My inteldrm firmware
> is also up-to-date.
> 
> The traces appear during every cold boot.  Here is the most recent set
> from my last reboot, plucked from the dmesg:

Linux-style SINGLE_DEPTH_NESTING annotations do not map cleanly
to witness(4).

The drm witness traces never seem to be helpful, but there
is no way to indicate that people should not report them.
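
A toy sketch of the mismatch, assuming a checker that tracks locks by
type rather than by instance, as the "duplicate lock of same type"
messages below suggest.  This is not witness(4) code; w_acquire(),
w_release() and the class numbers are hypothetical.  A nested
acquisition of a second lock of the same type, which the Linux
annotation is meant to permit, shows up here as a duplicate.

```c
#include <string.h>

#define W_MAXCLASS 8	/* lock types tracked by this toy checker */
#define W_MAXHELD  8	/* maximum nesting depth */

enum w_result { W_OK = 0, W_REVERSAL = 1, W_DUPLICATE = 2 };

static unsigned char w_before[W_MAXCLASS][W_MAXCLASS]; /* [a][b]: a seen before b */
static int w_held[W_MAXHELD];
static int w_nheld;

void
w_reset(void)
{
	memset(w_before, 0, sizeof(w_before));
	w_nheld = 0;
}

/* Check a new acquisition of lock type cls against every held lock type. */
enum w_result
w_acquire(int cls)
{
	enum w_result res = W_OK;
	int i;

	if (cls < 0 || cls >= W_MAXCLASS || w_nheld >= W_MAXHELD)
		return W_OK;	/* out of range: not tracked */

	for (i = 0; i < w_nheld; i++) {
		int h = w_held[i];

		if (h == cls) {
			res = W_DUPLICATE;	/* same type already held */
			continue;
		}
		if (w_before[cls][h] && res == W_OK)
			res = W_REVERSAL;	/* opposite order seen earlier */
		w_before[h][cls] = 1;		/* remember h -> cls */
	}
	w_held[w_nheld++] = cls;
	return res;
}

/* Drop the most recent held entry of lock type cls, if any. */
void
w_release(int cls)
{
	int i;

	for (i = w_nheld - 1; i >= 0; i--) {
		if (w_held[i] == cls) {
			memmove(&w_held[i], &w_held[i + 1],
			    (size_t)(w_nheld - i - 1) * sizeof(w_held[0]));
			w_nheld--;
			return;
		}
	}
}
```

Taking type 0 then type 1, releasing both, and then taking them in the
opposite order reports a reversal; taking type 1 while another type 1
lock is held reports a duplicate, even if the two locks are distinct
instances and the nesting is intentional.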

> 
> root on sd0a (847a92f1384e1964.a) swap on sd0b dump on sd0b
> witness: lock order reversal:
>  1st 0xfd844b4aad98 i915_active (&ref->mutex)
>  2nd 0x8026ff10 vmlk (&vm->mutex)
> lock order "&vm->mutex"(rwlock) -> "&ref->mutex"(rwlock) first seen at:
> #0  witness_checkorder+0x466
> #1  rw_enter+0x67
> #2  i915_active_acquire+0x47
> #3  i915_vma_pin+0x2c4
> #4  i915_ggtt_pin+0x62
> #5  intel_gt_init+0xb9
> #6  i915_gem_init+0xa3
> #7  i915_driver_probe+0x821
> #8  inteldrm_attachhook+0x45
> #9  config_process_deferred_mountroot+0x6b
> #10 main+0x723
> lock order "&ref->mutex"(rwlock) -> "&vm->mutex"(rwlock) first seen at:
> #0  witness_checkorder+0x466
> #1  rw_enter+0x67
> #2  i915_vma_pin+0x250
> #3  i915_ggtt_pin+0x62
> #4  intel_ring_pin+0x65
> #5  __intel_context_active+0x33
> #6  i915_active_acquire+0x85
> #7  __intel_context_do_pin+0x3b
> #8  intel_engines_init+0x4e1
> #9  intel_gt_init+0x130
> #10 i915_gem_init+0xa3
> #11 i915_driver_probe+0x821
> #12 inteldrm_attachhook+0x45
> #13 config_process_deferred_mountroot+0x6b
> #14 main+0x723
> witness: acquiring duplicate lock of same type: "&ref->mutex"
>  1st i915_active
>  2nd i915_active
> Starting stack trace...
> witness_checkorder(80eb23c8,9,0) at witness_checkorder+0x809
> rw_enter(80eb23b8,11) at rw_enter+0x67
> i915_active_acquire(80eb23b0) at i915_active_acquire+0x47
> __intel_context_active(fd844b4aad80) at __intel_context_active+0x4d
> i915_active_acquire(fd844b4aad80) at i915_active_acquire+0x85
> __intel_context_do_pin(fd844b4aacb0) at __intel_context_do_pin+0x3b
> intel_engines_init(80272060) at intel_engines_init+0x4e1
> intel_gt_init(80272060) at intel_gt_init+0x130
> i915_gem_init(8026d000) at i915_gem_init+0xa3
> i915_driver_probe(8026d000,82029be8) at 
> i915_driver_probe+0x821
> inteldrm_attachhook(8026d000) at inteldrm_attachhook+0x45
> config_process_deferred_mountroot() at config_process_deferred_mountroot+0x6b
> main(0) at main+0x723
> end trace frame: 0x0, count: 244
> End of stack trace.
> inteldrm0: 2560x1440, 32bpp
> wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation), using wskbd0
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x230, fw ver 34.0.1, address 98:3b:8f:ef:6b:ef
> witness: acquiring duplicate lock of same type: "&wf->mutex"
>  1st wakeref.mutex
>  2nd wakeref.mutex
> Starting stack trace...
> witness_checkorder(80272500,9,0) at witness_checkorder+0x809
> rw_enter_write(802724f0) at rw_enter_write+0x43
> __intel_wakeref_get_first(802724e8) at __intel_wakeref_get_first+0x28
> __engine_unpark(8086c180) at __engine_unpark+0x46
> __intel_wakeref_get_first(8086c180) at __intel_wakeref_get_first+0x4a
> i915_active_acquire_preallocate_barrier(fd83d3bf5d88,8086c000) at 
> i915_active_acquire_preallocate_barrier+0xa8
> __intel_context_do_pin(fd83d3bf5cb8) at __intel_context_do_pin+0xa6
> i915_gem_do_execbuffer(8026d078,81364000,8000451ff000,8135a800,0)
>  at i915_gem_do_execbuffer+0x288f
> i915_gem_execbuffer2_ioctl(8026d078,8000451ff000,81364000)
>  at i915_gem_execbuffer2_ioctl+0x1da
> drm_do_ioctl(8026d078,100,80406469,8000451ff000) at 
> drm_do_ioctl+0x274
> drmioctl(15700,80406469,8000451ff000,3,8000452c2f98) at drmioctl+0xdc
> VOP_IOCTL(fd83d1e435c8,80406469,8000451ff000,3,fd8450fa8480,8000452c2f98)
>  at VOP_IOCTL+0x55
> vn_ioctl(fd83d22725f8,80406469,8000451ff000,8000452c2f98) at 
> vn_ioctl+0x75
> sys_ioctl(8000452c2f98,8000451ff110,8000451ff170) at 
> sys_ioctl+0x2d4
> syscall(8000451ff1e0) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7ca3a0, count: 241
> End of stack trace.
> witness: acquiring duplicate lock of same type: "&rq->lock"
>  1st &rq->lock
>  2nd &rq->lock
> Starting stack trace...
> witness_checkorder(8147a628,9,0) at witness_checkorder+0x809
> mtx_enter(8147a618) at mtx_enter+0x3c
> i915_active_ref(814a2bf0,81477000,8147a2e8) at 
> i915_active_ref+0x1fd
> i915_vma_move_to_active(814a2af8,8147a2e8,8018) at 
> i915_vma_move_to_active+0x5c
> i915_gem_do_execbuffer(8026d078,81364000,8000451ff000,814830

Re: 6.7 fails to boot on AMD64 after installing amdgpu firmware

2020-10-12 Thread Jonathan Gray
On Mon, Oct 12, 2020 at 07:37:28PM -0400, Ashton Fagg wrote:
> Hello,
> 
> I have an Acer Aspire 5 which has a Ryzen 3 3200 chip (with integrated
> graphics).
> 
> I installed 6.7 release - all came up fine. Upon booting for the first
> time, it suggested I run `syspatch`. I did this, and on the next reboot
> the machine hangs with "Initializing kernel modesetting (RAVEN
> 0x1002:0x1508 0x1025:0x134F 0xC4)". It never progresses any further than
> that.

This was really 0x1002:0x15d8 for picasso; amdgpu does not match on 0x1508.

> 
> I did have ddb enabled in /etc/sysctl.conf, but unfortunately I can't
> get ddb to come up (nor does it appear to actually panic).
> 
> The only way I can successfully get the machine to boot now is by
> "disable amdgpu" in the boot configuration.
> 
> No other outputs except for the laptop's internal screen are connected.
> 
> fw_update shows no updates.
> 
> I have attached dmesg, pcidump and usbdevs output below -
> however, the only way I could retrieve this was disabling amdgpu so the
> machine would boot.
> 
> (I also have acpidump output if that's of help, but wasn't sure of the
> best way to post that)
> 
> I notice that on the list there was another report of this:
> 
> https://www.mail-archive.com/bugs@openbsd.org/msg14866.html
> 
> but it does not appear there was ever a resolution. I note there's some
> suggestions in that thread that -current might be worth a try - I'll see
> if I can get that to work.
> 
> Please let me know if there is anything else I can provide.

6.7 has drm based on linux 4.19.  If you can try a snapshot (snapshots are
post 6.8 at the moment), you will be using a driver based on linux 5.7,
and fw_update will fetch more recent firmware.

> 
> Regards,
> 
> Ash
> 
> dmesg.boot
> 
> OpenBSD 6.7 (GENERIC.MP) #7: Mon Oct  5 13:32:21 MDT 2020
> 
> r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 14951661568 (14259MB)
> avail mem = 14485905408 (13814MB)
> User Kernel Config
> UKC> disable amdgpu
> 240 amdgpu* disabled
> UKC> exit
> Continuing...
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.1 @ 0xae605000 (25 entries)
> bios0: vendor Insyde Corp. version "V1.09" date 06/01/2020
> bios0: Acer Aspire A515-43
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP UEFI SSDT MSDM ASF! BOOT HPET APIC MCFG WSMT SSDT 
> VFCT SSDT TPM2 IVRS SSDT CRAT CDIT SSDT SSDT SSDT SSDT SSDT FPDT SSDT SSDT 
> BGRT
> acpi0: wakeup devices GPP0(S4) GPP1(S4) GPP2(S4) GPP3(S4) GPP4(S4) GPP5(S4) 
> GP17(S4) XHC0(S4) XHC1(S4) GP18(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpihpet0 at acpi0: 14318180 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 3 3200U with Radeon Vega Mobile Gfx, 2595.45 MHz, 17-18-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD Ryzen 3 3200U with Radeon Vega Mobile Gfx, 2595.09 MHz, 17-18-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD Ryzen 3 3200U with Radeon Vega Mobile Gfx, 2595.09 MHz, 17-18-01
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,

Re: 6.7 fails to boot on AMD64 after installing amdgpu firmware

2020-10-12 Thread Jonathan Gray
On Mon, Oct 12, 2020 at 09:58:31PM -0400, Ashton Fagg wrote:
> Ashton Fagg  writes:
> 
> > Jonathan:
> >
> > Thanks for your reply.
> >
> > Jonathan Gray  writes:
> >
> >> This was really 0x1002:0x15d8 for picasso, amdgpu does not match on 0x1508
> >
> > Ah, yes, you are right.
> >
> >> 6.7 has drm based on linux 4.19.  If you can try a snapshot (which are
> >> post 6.8 at the moment) you will be using a driver based on linux 5.7
> >> and fw_update will fetch more recent firmware.
> >
> > Ok, I will try this and report back.
> >
> > - af
> 
> Ok, so happy to report that `sysupgrade -s` to the current 6.8 snapshot
> does indeed resolve this. X also works fine.
> 
> Inspecting dmesg it does still appear there's a couple of drm related
> errors. I've attached that in case it is of some use.
> 
> Thank you for the help!

So it turns out Ryzen 3 3200U is handled as raven2 in the driver.
We backported support for picasso but not raven2 into the 4.19 drm in
6.7.  At least in this case raven2 shares a pci id with picasso, and
raven2 is detected based on the revision.

Here is a backport of a linux 5.8 commit for raven2 which should
help, from

commit 6e29c227a4976460ec6d4cc70b998e3a8c30c873
Author: Alex Deucher 
Date:   Fri May 15 14:04:17 2020 -0400

drm/amdgpu: move gpu_info parsing after common early init

We need to get the silicon revision id before we parse
the firmware in order to load the correct gpu info firmware
for raven2 variants.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1103
Acked-by: Christian König 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher 

Index: sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c
===
RCS file: /cvs/src/sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c,v
retrieving revision 1.11
diff -u -p -r1.11 amdgpu_device.c
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c  13 Jul 2020 06:25:17 -  1.11
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_device.c  13 Oct 2020 03:17:55 -
@@ -1767,10 +1767,6 @@ static int amdgpu_device_ip_early_init(s
 		return -EINVAL;
 	}
 
-	r = amdgpu_device_parse_gpu_info_fw(adev);
-	if (r)
-		return r;
-
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
 		amdgpu_discovery_get_gfx_info(adev);
 
@@ -1809,6 +1805,10 @@ static int amdgpu_device_ip_early_init(s
 		}
 		/* get the vbios after the asic_funcs are set up */
 		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON) {
+			r = amdgpu_device_parse_gpu_info_fw(adev);
+			if (r)
+				return r;
+
 			/* Read BIOS */
 			if (!amdgpu_get_bios(adev))
 				return -EINVAL;
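
Why the ordering in the diff above matters can be sketched in plain C.
This is only an illustration, not the driver's code: struct fake_adev,
early_init_common(), pick_gpu_info_fw(), the revision threshold 0x8 and
the firmware names are all assumptions made up for the example.  It just
shows that a firmware choice keyed on the silicon revision gives the
wrong answer if it runs before the revision has been read.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct fake_adev {
	uint16_t pci_device;	/* PCI product ID, known from the start */
	uint8_t  rev_id;	/* silicon revision, unset until early init */
	int      rev_valid;	/* nonzero once early_init_common() has run */
};

/* Assumed threshold: revisions at or above this count as raven2. */
#define FAKE_RAVEN2_MIN_REV	0x8

/* Pretend to read the silicon revision from the hardware. */
static void
early_init_common(struct fake_adev *adev, uint8_t hw_rev)
{
	adev->rev_id = hw_rev;
	adev->rev_valid = 1;
}

/*
 * Choose a gpu_info firmware name (names are placeholders).
 * Without a valid revision, the shared PCI ID alone cannot
 * distinguish raven2 from picasso, so picasso is assumed.
 */
static const char *
pick_gpu_info_fw(const struct fake_adev *adev)
{
	if (adev->pci_device != 0x15d8)
		return "unknown";
	if (adev->rev_valid && adev->rev_id >= FAKE_RAVEN2_MIN_REV)
		return "raven2_gpu_info";
	return "picasso_gpu_info";
}
```

Calling pick_gpu_info_fw() before early_init_common() yields the picasso
name even on a part whose revision is above the threshold; calling it
afterwards yields the raven2 name.  Moving the parse call after the
common early init has the same effect in the real driver.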



Re: 6.7 fails to boot on AMD64 after installing amdgpu firmware

2020-10-13 Thread Jonathan Gray
On Tue, Oct 13, 2020 at 06:33:20PM -0400, Ashton Fagg wrote:
> Jonathan Gray  writes:
> 
> > So it turns out Ryzen 3 3200U is handled as raven2 in the driver.
> > We backported support for picasso but not raven2 into the 4.19 drm in
> > 6.7.  At least in this case raven2 shares a pci id with picasso with
> > raven2 detected based on a revision.
> >
> > Here is a backport of a linux 5.8 commit for raven2 which should
> > help, from
> >
> > commit 6e29c227a4976460ec6d4cc70b998e3a8c30c873
> > Author: Alex Deucher 
> > Date:   Fri May 15 14:04:17 2020 -0400
> > *snip*
> 
> Jonathan,
> 
> Thank you for the insight.
> 
> I would be willing to try and submit a patch to correct this problem in
> 6.7. I'm not sure whether that would be "welcomed" as I understand 6.8
> is on the way. If you can advise that'd be great.
> 
> - ajf
> 

To clarify, the patch was for -current with linux 5.7 drm.  We won't be
backporting raven2 support to 6.7.  It will be available in 6.8.



Re: Lenovo Workstation Dock Audio device attaches to ure

2020-10-14 Thread Jonathan Gray
On Wed, Oct 14, 2020 at 08:52:51PM +0200, sh+open...@codevoid.de wrote:
> >Synopsis:Lenovo Thunderbolt 3 Workstation Dock Audio is detected as ure
> >Category:kernel, amd64
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8-current (GENERIC.MP) #102: Thu Oct  8 
> 14:41:12 MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I have connected my Lenovo X1 Carbon Gen 7 to a Lenovo Thunderbolt 3 
> Workstation Dock.
>   The connection is done via Thunderbolt cable. Booting the machine takes 
> a long time,
>   because it waits for ureX timeouts.
> 
>   The relevant dmesg part:
> 
>   ure0 at uhub4 port 2 configuration 1 interface 0 "Lenovo ThinkPad 
> Thunderbolt 3 Dock USB Audio" rev 2.00/0.91 addr 5
>   ure0: , unknown ver 00ure0: timeout waiting for chip autoload
>   ure0: timeout waiting for phy to stabilize
>   ure0: timeout waiting for phy to stabilize
>   , address 7f:00:00:00:00:00
>   uaudio0 at uhub4 port 2 configuration 1 interface 1 "Lenovo ThinkPad 
> Thunderbolt 3 Dock USB Audio" rev 2.00/0.91 addr 5
>   uaudio0: class v1, full-speed, sync, channels: 2 play, 1 rec, 4 ctls
>   audio1 at uaudio0
>   ure1 at uhub4 port 2 configuration 1 interface 3 "Lenovo ThinkPad 
> Thunderbolt 3 Dock USB Audio" rev 2.00/0.91 addr 5
>   ure1: , unknown ver 00ure1: timeout waiting for chip autoload
>   ure1: timeout waiting for phy to stabilize
>   ure1: timeout waiting for phy to stabilize
>   , address 7f:00:00:00:00:00
> 
>   These are not network devices. usbdevs -v reports:
> 
>   addr 05: 17ef:3083 Lenovo, ThinkPad Thunderbolt 3 Dock USB Audio
>full speed, power 100 mA, config 1, rev 0.91, iSerial 
> 
>driver: ure0
>driver: uaudio0
>driver: ure1
> 
>   There is actually a real ure device in the dock:
> 
>   addr 04: 17ef:3082 Realtek, ThinkPad TBT 3 Dock
>super speed, power 72 mA, config 1, rev 31.01, iSerial 
> 10156C61C
>driver: ure2
> 
>   The product IDs are defined as:
>   usbdevs.h:#define  USB_PRODUCT_LENOVO_RTL8153B_2   0x3083 /* RTL8153B */
>   usbdevs.h:#define  USB_PRODUCT_LENOVO_TB3DOCKGEN2  0x3082 /* 
> Thunderbolt 3 Dock Gen 2 */
>   and both used in if_ure.c

Thanks for the report, I'll remove it.

It was added as rtux64w10.INF has
%Lenovo-FFF.DeviceDesc% = RTL8153Bx64_S5WOL.ndi,USB\VID_17EF&PID_3083&REV_3101
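
The INF entry qualifies the product ID with a revision, which a match on
vendor and product alone cannot express.  Purely as an illustration, and
not what if_ure.c does, here is a minimal sketch of a match routine that
also compares the device revision.  struct usb_id, struct usb_dev,
match_with_rev() and the "0 means any revision" convention are
hypothetical; the revision values come from the report (0.91 for the
audio device) and from the INF line (31.01), written as BCD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match table entry; rev == 0 means "any revision". */
struct usb_id {
	uint16_t vendor;
	uint16_t product;
	uint16_t rev;
};

/* Hypothetical description of an attached device. */
struct usb_dev {
	uint16_t vendor;
	uint16_t product;
	uint16_t rev;
};

static const struct usb_id ids[] = {
	{ 0x17ef, 0x3082, 0 },		/* dock ethernet, any revision */
	{ 0x17ef, 0x3083, 0x3101 },	/* only REV_3101, as in the INF line */
};

/* Return 1 if the device matches a table entry, 0 otherwise. */
static int
match_with_rev(const struct usb_dev *d)
{
	size_t i;

	for (i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
		if (ids[i].vendor != d->vendor ||
		    ids[i].product != d->product)
			continue;
		if (ids[i].rev == 0 || ids[i].rev == d->rev)
			return 1;
	}
	return 0;
}
```

With such a table the audio device (17ef:3083 rev 0.91) would not match,
while a 17ef:3083 device at rev 31.01 would.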

> 
> >How-To-Repeat:
>   Attach an OpenBSD -current (I did not test 6.7) to a Lenovo Thunderbolt 
> 3 Dock Gen 2
>   via Thunderbolt/USBC connector. Boot and watch...
> >Fix:
>   Removing the USB_PRODUCT_LENOVO_RTL8153B_2 product ID helps, but that's 
> obviously not
>   a good solution as someone might have this product ID as network card.
> 
> 
> dmesg:
> OpenBSD 6.8-current (GENERIC.MP) #102: Thu Oct  8 14:41:12 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16959836160 (16174MB)
> avail mem = 16430788608 (15669MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.1 @ 0x77d49000 (64 entries)
> bios0: vendor LENOVO version "N2HET55P (1.38 )" date 08/24/2020
> bios0: LENOVO 20QD003MGE
> acpi0 at bios0: ACPI 6.1
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 UEFI SSDT HPET APIC 
> MCFG ECDT SSDT SSDT SSDT BOOT SLIC SSDT LPIT WSMT SSDT DBGP DBG2 MSDM BATB 
> DMAR NHLT FPDT UEFI
> acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) 
> RP02(S4) PXSX(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
> PXSX(S4) RP07(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 2399 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, 7509.93 MHz, 06-8e-0c
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i7-8565U CPU

Re: Page fault using the amdgpu driver

2020-10-15 Thread Jonathan Gray
On Thu, Oct 08, 2020 at 03:29:49PM +0200, mel...@marvinborner.de wrote:
> >Synopsis:Page fault when watching 2k videos
> >Category:kernel
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8-current (GENERIC.MP) #99: Wed Oct  7 22:24:13 
> MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> The whole system crashes when I'm using the GPU too much (e.g. watching 2k 
> videos on YouTube). While they run smooth at the start, graphical artifacts 
> appear a few minutes later ending with a complete crash to the kernel 
> console. The logs are full of no-retry page-faults (see below).
> I'm using an AMD Ryzen 2400g (APU) with a AB350N-Gaming WIFI Motherboard and 
> the latest OpenBSD snapshot, connected via HDMI to my 2K 27" monitor.
> 
> By the way: The amdgpu driver didn't work at all for me in 6.7 and it crashed 
> with "/bsd: [drm] *ERROR* construct: Invalid Connector ObjectID from Adapter 
> Service for connector index:1! type 0 expected 3", so the 6.8 update is still 
> a huge improvement as it works somewhat at least - thanks for your work! :)
> >How-To-Repeat:
> Watching high quality videos, generally doing GPU-extensive tasks.
> >Fix:
> General GPU usage works great and smooth, so not doing GPU-extensive tasks is 
> a fix.

The most direct way I am aware of to reproduce the no-retry page fault
situation is with piglit from ports:

piglit run quick -t "spec@glsl-1.30@execution@texelfetch fs sampler2d 1x281-501x281" out

With the following diff it no longer occurs in testing on vega10 / Vega 56.

diff --git sys/dev/pci/drm/drm_mm.c sys/dev/pci/drm/drm_mm.c
index 58a1afe504b..94c837ae655 100644
--- sys/dev/pci/drm/drm_mm.c
+++ sys/dev/pci/drm/drm_mm.c
@@ -93,19 +93,11 @@
  * some basic allocator dumpers for debugging.
  *
  * Note that this range allocator is not thread-safe, drivers need to protect
- * modifications with their on locking. The idea behind this is that for a full
+ * modifications with their own locking. The idea behind this is that for a full
  * memory manager additional data needs to be protected anyway, hence internal
  * locking would be fully redundant.
  */
 
-static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
-   u64 size,
-   u64 alignment,
-   unsigned long color,
-   u64 start,
-   u64 end,
-   enum drm_mm_search_flags flags);
-
 #ifdef CONFIG_DRM_DEBUG_MM
 #include 
 
@@ -115,25 +107,19 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_
 static noinline void save_stack(struct drm_mm_node *node)
 {
unsigned long entries[STACKDEPTH];
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = STACKDEPTH,
-   .skip = 1
-   };
+   unsigned int n;
 
-   save_stack_trace(&trace);
-   if (trace.nr_entries != 0 &&
-   trace.entries[trace.nr_entries-1] == ULONG_MAX)
-   trace.nr_entries--;
+   n = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
 
/* May be called under spinlock, so avoid sleeping */
-   node->stack = depot_save_stack(&trace, GFP_NOWAIT);
+   node->stack = stack_depot_save(entries, n, GFP_NOWAIT);
 }
 
 static void show_leaks(struct drm_mm *mm)
 {
struct drm_mm_node *node;
-   unsigned long entries[STACKDEPTH];
+   unsigned long *entries;
+   unsigned int nr_entries;
char *buf;
 
buf = kmalloc(BUFSZ, GFP_KERNEL);
@@ -141,19 +127,14 @@ static void show_leaks(struct drm_mm *mm)
return;
 
list_for_each_entry(node, drm_mm_nodes(mm), node_list) {
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = STACKDEPTH
-   };
-
if (!node->stack) {
DRM_ERROR("node [%08llx + %08llx]: unknown owner\n",
  node->start, node->size);
continue;
}
 
-   depot_fetch_stack(node->stack, &trace);
-   snprint_stack_trace(buf, BUFSZ, &trace, 0);
+   nr_entries = stack_depot_fetch(node->stack, &entries);
+   stack_trace_snprint(buf, BUFSZ, entries, nr_entries, 0);
DRM_ERROR("node [%08llx + %08llx]: inserted at\n%s",
  node->start, node->size, buf);
}
@@ -176,39 +157,85 @@ INTERVAL_TREE_DEFINE(struct drm_mm_node, rb,
 u64, __subtree_last,
 START, LAST, static inline, drm_mm_interval_tree)
 

Re: AMD Firepro m5950 blank screen with radeondrm

2020-10-15 Thread Jonathan Gray
On Wed, Oct 14, 2020 at 01:51:46PM +, Jeremy Broadway wrote:
> >Synopsis:  AMD Firepro m5950 blank screen with radeondrm
> >Category:  kernel
> >Environment:
> System  : OpenBSD 6.7
> Details : OpenBSD 6.7 (GENERIC.MP) #7: Mon Oct  5 13:32:21 MDT 
> 2020
> 
> r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> I have a dell m4600 with an amd firepro m5950, which should be near
> identical to the radeon hd 6730m/6770m card.  It is based on the whistler
> architecture which appears to be the mobile variant of the turks
> architecture.
> 
> Booting with the radeondrm driver the screen goes blank when it gets
> to kernel modesetting.  It is also trying to set my laptop display,
> which is a 1920x1080 panel to 1024x768 32bpp.  The system still
> starts up and is accessible over the network, only the screen is
> blank including when switching to other tty's.
> 
> initializing kernel modesetting (TURKS 0x1002:0x6740 0x1028:0x04A3 0x00).
> radeondrm0: 1024x768, 32bpp
> 
> All firmware is up to date and booting with radeondrm disabled works
> as expected with vesa mode displaying correctly.  I was not able to
> find any resolutions in my searching and blind logging in and
> starting X as some folks did to resolve a similar radeondrm
> problem does not work for me.  The current 6.8 snapshot also yields
> the same results.
> 
> I do not see whistler supported in radeon(4) but since it is
> supposedly a variant of turks which is listed as supported and the
> card seems to be detected as turks I figured I'd give it a shot and
> see if some minimal change may be able to support it, if not, no
> worries.

1024x768 is the mode used when nothing could be probed.
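
As a rough illustration of that fallback (a hypothetical sketch, not the
radeondrm code): struct mode, pick_mode() and the choice of taking the
first probed mode as the preferred one are assumptions for the example.

```c
#include <stddef.h>

/* Hypothetical display mode: just a resolution. */
struct mode {
	int width;
	int height;
};

/*
 * Return the first probed mode, or 1024x768 when the probe
 * found nothing (no EDID, no connected output detected).
 */
static struct mode
pick_mode(const struct mode *probed, size_t nprobed)
{
	struct mode fallback = { 1024, 768 };

	if (probed == NULL || nprobed == 0)
		return fallback;
	return probed[0];
}
```

So a 1920x1080 panel ending up at 1024x768 suggests the panel's
connector was never probed successfully.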

Does using an external display work?

Is there an option in the bios to change to Intel graphics?

If you can build a kernel with 'option DRMDEBUG' there should be verbose
messages on probing outputs which may provide some hints.  If you do this
use -current.

> 
> Thanks,
> Jeremy
> 
> >How-To-Repeat:
> boot normally with an amd firepro m5950 video card or
> potentially any card with a whistler architecture
> >Fix:
> Unsure, work around of boot -c disable radeondrm
> 
> 
> dmesg:
> OpenBSD 6.7 (GENERIC.MP) #7: Mon Oct  5 13:32:21 MDT 2020
> 
> r...@syspatch-67-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 34297933824 (32709MB)
> avail mem = 33245851648 (31705MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf1f40 (108 entries)
> bios0: vendor Dell Inc. version "A17" date 05/12/2017
> bios0: Dell Inc. Precision M4600
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC TCPA SSDT MCFG HPET BOOT SSDT SSDT DMAR SLIC SSDT
> acpi0: wakeup devices EHC1(S3) EHC2(S3) HDEF(S4) GLAN(S4) RP01(S4)
> PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4)
> RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 2195.35 MHz, 06-2a-07
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 2195.02 MHz, 06-2a-07
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 2195.02 MHz, 06-2a-07
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: 

Re: AMD Firepro m5950 blank screen with radeondrm

2020-10-18 Thread Jonathan Gray
On Thu, Oct 15, 2020 at 04:30:00PM +, Jeremy Broadway wrote:
> >On Thu, Oct 15, 2020 at 8:01 AM Jonathan Gray  wrote:
> >
> > 1024x768 is the mode used when nothing could be probed.
> >
> > Does using an external display work?
> >
> > Is there an option in the bios to change to Intel graphics?
> >
> > If you can build a kernel with 'option DRMDEBUG' there should be verbose
> > messages on probing outputs which may provide some hints.  If you do this
> > use -current.
> >
> 
> Booting with an external display plugged into the displayport output, DP-1,
> works, I did not have a hdmi or vga cable handy to test those outputs.  If
> the system was booted with xenodm started without being connected to
> the external display it did not display once I plugged it in until i stopped
> xenodm, interestingly enough though when stopping xenodm it did flash
> the console output on the laptop display before moving over to the
> external display. There is no option to disable the dedicated card in the
> bios.
> 
> This thing is a brick and the battery is dead so it's not moving away from
> my desk, so this is good enough for me, however in case it can help
> anyone else on older dell's that may experience a similar issue below
> is the dmesg output with option DRMDEBUG on -current for when it was
> booted without being plugged into an external monitor.  I'd be willing to
> test any changes if you wanted to pursue it.

I don't understand why a connector with LVDS doesn't show up.

I would expect something like this, as seen on a T500 with RV635:

[drm] radeon atom DIG backlight initialized
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   DVI-I-1
[drm]   HPD3
[drm]   DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
[drm]   Encoders:
[drm] DFP1: INTERNAL_UNIPHY
[drm] Connector 1:
[drm]   LVDS-1
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm] LCD1: INTERNAL_KLDSCP_LVTMA
[drm] Connector 2:
[drm]   DP-1
[drm]   HPD1
[drm]   DDC: 0x7e20 0x7e20 0x7e24 0x7e24 0x7e28 0x7e28 0x7e2c 0x7e2c
[drm]   Encoders:
[drm] DFP2: INTERNAL_UNIPHY
[drm] Connector 3:
[drm]   VGA-1
[drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[drm]   Encoders:
[drm] CRT1: INTERNAL_KLDSCP_DAC1

Can you try switching the boot mode in the BIOS from UEFI to Legacy/CSM and
reinstalling?

> 
> Thanks for the help!
> Jeremy
> 
> OpenBSD 6.8-current (CUSTOM) #0: Thu Oct 15 09:57:05 EDT 2020
> jeremy@m4600:/usr/src/sys/arch/amd64/compile/CUSTOM
> real mem = 34297933824 (32709MB)
> avail mem = 33243414528 (31703MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf1f40 (108 entries)
> bios0: vendor Dell Inc. version "A17" date 05/12/2017
> bios0: Dell Inc. Precision M4600
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC TCPA SSDT MCFG HPET BOOT SSDT SSDT DMAR SLIC SSDT
> acpi0: wakeup devices EHC1(S3) EHC2(S3) HDEF(S4) GLAN(S4) RP01(S4)
> PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4)
> RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 2195.32 MHz, 06-2a-07
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf800, bus 0-63
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus -1 (P0P1)
> acpiprt2 at acpi0: bus 2 (RP01)
> acpiprt3 at acpi0: bus 3 (RP02)
> acpiprt4 at acpi0: bus 4 (RP03)
> acpiprt5 at acpi0: bus 10 (RP04)
> acpiprt6 at acpi0: bus -1 (RP05)
> acpiprt7 at acpi0: bus -1 (RP06)
> acpiprt8 at acpi0: bus -1 (RP07)
> acpiprt9 at acpi0

Re: AMDGPU crash at boot

2020-10-19 Thread Jonathan Gray
On Mon, Oct 19, 2020 at 05:58:25PM +0200, mel...@marvinborner.de wrote:
> >Synopsis:AMDGPU crashes with a no-retry page fault at boot, before Xorg 
> >starts
> >Category:kernel
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8-current (GENERIC.MP) #120: Sun Oct 18 
> 09:31:14 MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> Once my PC (AMD Ryzen 2400G) starts, just before the blue text switches to 
> the grey text (when the graphical driver gets started, in this case AMDGPU), 
> the whole system crashes with a no-retry page fault which is caused by a drm 
> timeout (see attached log), I think.
> >How-To-Repeat:
> Boot a Ryzen 2400G based PC
> >Fix:
> Disable AMDGPU in the kernel config.
> Switch to a older snapshot where the graphics worked but crashed quite often.

You perhaps have the drm_mm changes that were in snapshots but not the
fault changes:


revision 1.22
date: 2020/10/18 09:22:32;  author: kettenis;  state: Exp;  lines: +193 -89;  
commitid: xJRTpypWydiF9FMQ;
Fix several bugs in the TTM page fault handler and properly integrate all the
changes made to Linux 5.7.  Pointed out by jsg@.

ok jsg@


Try a more recent snapshot.

> 
> 
> dmesg:
> OpenBSD 6.8-current (RAMDISK_CD) #113: Sun Oct 18 09:39:01 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> real mem = 14961664000 (14268MB)
> avail mem = 14504198144 (13832MB)
> random: good seed from bootblocks
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xe8d60 (48 entries)
> bios0: vendor American Megatrends Inc. version "F50a" date 11/27/2019
> bios0: Gigabyte Technology Co., Ltd. AB350N-Gaming WIFI
> acpi0 at bios0: ACPI 6.0
> acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT SSDT MCFG SSDT HPET UEFI 
> TPM2 SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT SSDT
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 5 2400G with Radeon Vega Graphics, 3593.94 MHz, 17-11-00
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> cpu at mainbus0: not configured
> essage repeated 6 times
> ioapic0 at mainbus0: apid 9 pa 0xfec0, version 21, 24 pins
> ioapic1 at mainbus0: apid 10 pa 0xfec01000, version 21, 32 pins
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus -1 (GPP0)
> acpiprt2 at acpi0: bus 9 (GPP2)
> acpiprt3 at acpi0: bus -1 (GPP3)
> acpiprt4 at acpi0: bus -1 (GPP4)
> acpiprt5 at acpi0: bus -1 (GPP5)
> acpiprt6 at acpi0: bus -1 (GPP6)
> acpiprt7 at acpi0: bus 10 (GP17)
> acpiprt8 at acpi0: bus 11 (GP18)
> acpiprt9 at acpi0: bus 1 (GPP1)
> acpipci0 at acpi0 PCI0: 0x0010 0x0011 0x
> acpicmos0 at acpi0
> "PNP0C0C" at acpi0 not configured
> amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
> "MSFT0101" at acpi0 not configured
> "AMDIF030" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> "PNP0C14" at acpi0 not configured
> acpicpu at acpi0 not configured
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "AMD 17h/1xh Root Complex" rev 0x00
> pchb1 at pci0 dev 1 function 0 "AMD 17h PCIE" rev 0x00
> ppb0 at pci0 dev 1 function 2 "AMD 17h/1xh PCIE" rev 0x00: msi
> pci1 at ppb0 bus 1
> xhci0 at pci1 dev 0 function 0 "AMD 300 Series xHCI" rev 0x02: msi, xHCI 1.10
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 
> addr 1
> ahci0 at pci1 dev 0 function 1 "AMD 300 Series SATA" rev 0x02: msi, AHCI 1.3.1
> scsibus0 at ahci0: 32 targets
> ppb1 at pci1 dev 0 function 2 vendor "AMD", unknown product 0x43b2 rev 0x02
> pci2 at ppb1 bus 2
> ppb2 at pci2 dev 0 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci3 at ppb2 bus 3
> ppb3 at pci2 dev 1 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci4 at ppb3 bus 4
> ppb4 at pci2 dev 4 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci5 at ppb4 bus 5
> ppb5 at pci2 dev 5 function 0 "AMD 300 Series PCIE" rev 0x02: msi
> pci6 at ppb5 bus 6
> iwm0 at pci6 dev 0 function 0 "Intel Dual Band Wireless AC 3165" rev 0x81, msi
>

Re: system freezes since recent drm_mm changes

2020-10-22 Thread Jonathan Gray
On Thu, Oct 22, 2020 at 05:37:43PM -0400, Joe Gidi wrote:
> 
> >Synopsis:System freezes with radeondrm since latest drm_mm changes
> >Category:kernel
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8-current (GENERIC.MP) #131: Thu Oct 22 09:52:11
> MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I'm seeing system freezes while using Firefox since the recent drm_mm
> commits.
>   The system will freeze up, /var/log/messages will show many instances 
> of the
>   following lines, the screen goes black, and I have to reset the system.
> 
> Oct 22 17:18:39 ryzen /bsd: drm:pid23792:radeon_ring_test_lockup *ERROR*
> ring 0 stalled for more than 10040msec
> Oct 22 17:18:39 ryzen /bsd: drm:pid23792:radeon_fence_check_lockup
> *WARNING* GPU lockup (current fence id 0x00021e1f last fence id
> 0x00021e28 on ring 0)
> Oct 22 17:18:39 ryzen /bsd: drm:pid23792:radeon_ring_test_lockup *ERROR*
> ring 4 stalled for more than 1msec
> Oct 22 17:18:39 ryzen /bsd: drm:pid23792:radeon_fence_check_lockup
> *WARNING* GPU lockup (current fence id 0x0001e20a last fence id
> 0x0001e210 on ring 4)

Does reverting the drm_mm and drm_vma commits change this?

diff --git sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c 
sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c
index de0afe174d1..8014d760f69 100644
--- sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c
+++ sys/dev/pci/drm/amd/amdgpu/amdgpu_ttm.c
@@ -217,7 +217,6 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 static int amdgpu_verify_access(struct ttm_buffer_object *bo, struct file 
*filp)
 {
struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
-   struct drm_file *file_priv = (void *)filp;
 
/*
 * Don't verify access for KFD BOs. They don't have a GEM
@@ -228,7 +227,7 @@ static int amdgpu_verify_access(struct ttm_buffer_object 
*bo, struct file *filp)
 
if (amdgpu_ttm_tt_get_usermm(bo->ttm))
return -EPERM;
-   return drm_vma_node_verify_access(&abo->tbo.base.vma_node, file_priv);
+   return drm_vma_node_verify_access(&abo->tbo.base.vma_node, filp);
 }
 
 /**
diff --git sys/dev/pci/drm/drm_gem.c sys/dev/pci/drm/drm_gem.c
index 729b9e921ed..1a6f898cfda 100644
--- sys/dev/pci/drm/drm_gem.c
+++ sys/dev/pci/drm/drm_gem.c
@@ -198,7 +198,7 @@ udv_attach_drm(dev_t device, vm_prot_t accessprot, voff_t 
off, vsize_t size)
if (!obj)
return NULL;
 
-   if (!drm_vma_node_is_allowed(node, priv)) {
+   if (!drm_vma_node_is_allowed(node, filp)) {
drm_gem_object_put_unlocked(obj);
return NULL;
}
@@ -439,7 +439,7 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
dev->driver->gem_close_object(obj, file_priv);
 
drm_gem_remove_prime_handles(obj, file_priv);
-   drm_vma_node_revoke(&obj->vma_node, file_priv);
+   drm_vma_node_revoke(&obj->vma_node, file_priv->filp);
 
drm_gem_object_handle_put_unlocked(obj);
 
@@ -583,7 +583,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 
handle = ret;
 
-   ret = drm_vma_node_allow(&obj->vma_node, file_priv);
+   ret = drm_vma_node_allow(&obj->vma_node, file_priv->filp);
if (ret)
goto err_remove;
 
@@ -601,7 +601,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
return 0;
 
 err_revoke:
-   drm_vma_node_revoke(&obj->vma_node, file_priv);
+   drm_vma_node_revoke(&obj->vma_node, file_priv->filp);
 err_remove:
spin_lock(&file_priv->table_lock);
idr_remove(&file_priv->object_idr, handle);
diff --git sys/dev/pci/drm/drm_mm.c sys/dev/pci/drm/drm_mm.c
index 17c3fffcd0d..58a1afe504b 100644
--- sys/dev/pci/drm/drm_mm.c
+++ sys/dev/pci/drm/drm_mm.c
@@ -93,11 +93,19 @@
  * some basic allocator dumpers for debugging.
  *
  * Note that this range allocator is not thread-safe, drivers need to protect
- * modifications with their own locking. The idea behind this is that for a 
full
+ * modifications with their on locking. The idea behind this is that for a full
  * memory manager additional data needs to be protected anyway, hence internal
  * locking would be fully redundant.
  */
 
+static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct 
drm_mm *mm,
+   u64 size,
+   u64 alignment,
+   unsigned long color,
+

Re: Errors with radeon graphic card after install 68- xenodm won't work

2020-10-26 Thread Jonathan Gray
On Sun, Oct 25, 2020 at 09:45:53PM +0100, Jean-Louis ABRAHAM wrote:
> Dear OpenBSD Team
> 
>  
> 
> This mail has been generated with sendbug and I have manually added some 
> attachments to give as much infos as possible.
> 
> Of course, if you need more infos, please let me know.
> 
> Just hope my issue is not due to not reading the docs...
> 
>  
> 
> Regards
> 
> Jean-Louis
> 
>  
> 
> >Synopsis:    X error with radeon graphic card - xenodm won't work
> >Category:    amd64 xenodm
> >Environment:
>     System  : OpenBSD 6.8
>     Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
>              
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>     Architecture: OpenBSD.amd64
>     Machine : amd64
> >Description:
>     xenodm doesn't work; xenodm.log and X.org.log contain error messages.
> >How-To-Repeat:
>     at each reboot xenodm fails. However startx works.

> [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> [drm] *ERROR* radeon: cp isn't working (-22).
> drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> drm:pid0:r100_init *ERROR* Disabling GPU acceleration

With acceleration disabled, the ati driver error path seems to result in:

X:/usr/X11R6/lib/modules/drivers/radeon_drv.so: undefined symbol 'exaGetPixmapDriverPrivate'

This may be related to
https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/-/commit/c0eb5dbd9c1db6b6d5b1574bcd8c584170d7ab54

The modesetting driver is used with startx, which is why you don't see
that there.

I am not sure why the ring test fails on ES1000.  I have had another
report, off-list, of it failing on a system with an ES1000.



Re: Errors with radeon graphic card after install 68- xenodm won't work

2020-10-26 Thread Jonathan Gray
On Mon, Oct 26, 2020 at 10:40:50AM +0100, Matthieu Herrb wrote:
> On Mon, Oct 26, 2020 at 07:24:47PM +1100, Jonathan Gray wrote:
> > On Sun, Oct 25, 2020 at 09:45:53PM +0100, Jean-Louis ABRAHAM wrote:
> > > Dear OpenBSD Team
> > > 
> > >  
> > > 
> > > This mail has been generated with sendbug and I have manually added some 
> > > attachments to give as much infos as possible.
> > > 
> > > Of course, if you need more infos, please let me know.
> > > 
> > > Just hope my issue is not due to not reading the docs...
> > > 
> > >  
> > > 
> > > Regards
> > > 
> > > Jean-Louis
> > > 
> > >  
> > > 
> > > >Synopsis:    X error with radeon graphic card - xenodm won't work
> > > >Category:    amd64 xenodm
> > > >Environment:
> > >     System  : OpenBSD 6.8
> > >     Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 
> > > 2020
> > >              
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >     Architecture: OpenBSD.amd64
> > >     Machine : amd64
> > > >Description:
> > >     xenodm doesn't work; xenodm.log and X.org.log contain error messages.
> > > >How-To-Repeat:
> > >     at each reboot xenodm fails. However startx works.
> > 
> > > [drm] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
> > > [drm] *ERROR* radeon: cp isn't working (-22).
> > > drm:pid0:r100_startup *ERROR* failed initializing CP (-22).
> > > drm:pid0:r100_init *ERROR* Disabling GPU acceleration
> > 
> > With acceleration disabled the ati driver error path seems to result in
> > X:/usr/X11R6/lib/modules/drivers/radeon_drv.so: undefined symbol 
> > 'exaGetPixmapDriverPrivate'
> > this may be related to
> > https://gitlab.freedesktop.org/xorg/driver/xf86-video-ati/-/commit/c0eb5dbd9c1db6b6d5b1574bcd8c584170d7ab54
> 
> Yes that looks like the problem... We can probably merge this commit.

I can reproduce the failure using an xorg.conf with 'Option "Accel" "off"'.

It needs two commits to fix:
c0eb5dbd9c1db6b6d5b1574bcd8c584170d7ab54 Don't crash X server if GPU acceleration is not available
4d84cf438e7f1bebf0053035ef0292e9fed257d1 Handle NULL fb_ptr in pixmap_get_fb

Index: src/radeon.h
===
RCS file: /cvs/xenocara/driver/xf86-video-ati/src/radeon.h,v
retrieving revision 1.22
diff -u -p -r1.22 radeon.h
--- src/radeon.h26 Oct 2019 09:37:25 -  1.22
+++ src/radeon.h26 Oct 2020 12:19:40 -
@@ -790,8 +790,8 @@ static inline Bool radeon_set_pixmap_bo(
 
 static inline struct radeon_buffer *radeon_get_pixmap_bo(PixmapPtr pPix)
 {
-#ifdef USE_GLAMOR
 RADEONInfoPtr info = RADEONPTR(xf86ScreenToScrn(pPix->drawable.pScreen));
+#ifdef USE_GLAMOR
 
 if (info->use_glamor) {
struct radeon_pixmap *priv;
@@ -799,7 +799,7 @@ static inline struct radeon_buffer *rade
return priv ? priv->bo : NULL;
 } else
 #endif
-{
+if (info->accelOn) {
struct radeon_exa_pixmap_priv *driver_priv;
driver_priv = exaGetPixmapDriverPrivate(pPix);
return driver_priv ? driver_priv->bo : NULL;
@@ -896,7 +896,7 @@ radeon_pixmap_get_fb(PixmapPtr pix)
   handle);
 }
 
-return *fb_ptr;
+return fb_ptr ? *fb_ptr : NULL;
 }
 
 



Re: init: single user shell terminated, restarting on i386 snap

2020-12-12 Thread Jonathan Gray
On Sat, Dec 12, 2020 at 11:30:14PM +0100, Alexander Bluhm wrote:
> On Fri, Dec 11, 2020 at 03:14:52PM -0500, Johan Huldtgren wrote:
> > init: single user shell terminated, restarting
> > init: single user shell terminated, restarting
> 
> The problem is that libc setjmp tries to save the MXCSR register.
> 
> > cpu0: Geode(TM) Integrated Processor by AMD PCS ("AuthenticAMD" 586-class) 
> > 500 MHz, 05-0a-02
> > cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX,MMXX,3DNOW2,3DNOW
> 
> This processor has no SSE support, so accessing MXCSR fails.
> 
> I tried several variants to detect SSE support during runtime in
> libc.  None of them was working in a nice way.  So I suggest to
> remove the MXCSR bits.  For regress/lib/libc/setjmp-fpu it is enough
> to save the FPU CW register.
> 
> i386 compiler does not use SSE by default.  There is some code in
> libm/arch/i387/fenv.c that may access MXCSR.  Can we assume that
> programs working with MXCSR will care about the state itself or not
> use setjmp?
> 
> ok?
> 
> bluhm
> 
> Index: lib/libc/arch/i386/gen/_setjmp.S
> ===
> RCS file: /mount/openbsd/cvs/src/lib/libc/arch/i386/gen/_setjmp.S,v
> retrieving revision 1.8
> diff -u -p -r1.8 _setjmp.S
> --- lib/libc/arch/i386/gen/_setjmp.S  6 Dec 2020 18:13:15 -   1.8
> +++ lib/libc/arch/i386/gen/_setjmp.S  12 Dec 2020 22:03:57 -
> @@ -63,7 +63,6 @@ ENTRY(_setjmp)
>   movl%ecx,(_JB_EBP * 4)(%eax)
>   movl%esi,(_JB_ESI * 4)(%eax)
>   movl%edi,(_JB_EDI * 4)(%eax)
> - stmxcsr (_JB_MXCSR * 4)(%eax)
>   fnstcw  (_JB_FCW * 4)(%eax)
>   xorl%eax,%eax
>   ret
> @@ -75,7 +74,6 @@ ENTRY(_longjmp)
>   addl$__jmpxor-1b,%ecx   # load cookie address
>   movl4(%esp),%edx# parameter, pointer to env
>   movl8(%esp),%eax# parameter, val
> - ldmxcsr (_JB_MXCSR * 4)(%edx)
>   fldcw   (_JB_FCW * 4)(%edx)
>   movl(_JB_EBX * 4)(%edx),%ebx
>   movl(_JB_ESP * 4)(%edx),%esi
> Index: lib/libc/arch/i386/gen/setjmp.S
> ===
> RCS file: /mount/openbsd/cvs/src/lib/libc/arch/i386/gen/setjmp.S,v
> retrieving revision 1.13
> diff -u -p -r1.13 setjmp.S
> --- lib/libc/arch/i386/gen/setjmp.S   6 Dec 2020 18:13:15 -   1.13
> +++ lib/libc/arch/i386/gen/setjmp.S   12 Dec 2020 22:04:01 -
> @@ -78,7 +78,6 @@ ENTRY(setjmp)
>   movl8(%edx),%edx# load eip cookie over cookie address
>   xorl0(%esp),%edx# caller address
>   movl%edx,(_JB_EIP * 4)(%ecx)
> - stmxcsr (_JB_MXCSR * 4)(%ecx)
>   fnstcw  (_JB_FCW * 4)(%ecx)
>   xorl%eax,%eax
>   ret
> @@ -97,7 +96,6 @@ ENTRY(longjmp)
>  
>   movl4(%esp),%edx# parameter, pointer to env
>   movl8(%esp),%eax# parameter, val
> - ldmxcsr (_JB_MXCSR * 4)(%edx)
>   fldcw   (_JB_FCW * 4)(%edx)
>   movl(_JB_EBX * 4)(%edx),%ebx
>   movl(_JB_ESP * 4)(%edx),%esi
> Index: lib/libc/arch/i386/gen/sigsetjmp.S
> ===
> RCS file: /mount/openbsd/cvs/src/lib/libc/arch/i386/gen/sigsetjmp.S,v
> retrieving revision 1.12
> diff -u -p -r1.12 sigsetjmp.S
> --- lib/libc/arch/i386/gen/sigsetjmp.S6 Dec 2020 18:13:15 -   
> 1.12
> +++ lib/libc/arch/i386/gen/sigsetjmp.S12 Dec 2020 22:04:05 -
> @@ -67,7 +67,6 @@ ENTRY(sigsetjmp)
>   movl4(%edx),%edx# load eip cookie over cookie address
>   xorl0(%esp),%edx
>   movl%edx,(_JB_EIP * 4)(%ecx)
> - stmxcsr (_JB_MXCSR * 4)(%ecx)
>   fnstcw  (_JB_FCW * 4)(%ecx)
>   xorl%eax,%eax
>   ret
> @@ -91,7 +90,6 @@ ENTRY(siglongjmp)
>  
>   movl4(%esp),%edx# reload in case sigprocmask failed
>   movl8(%esp),%eax# parameter, val
> - ldmxcsr (_JB_MXCSR * 4)(%edx)
>   fldcw   (_JB_FCW * 4)(%edx)
>   movl(_JB_EBX * 4)(%edx),%ebx
>   movl(_JB_ESP * 4)(%edx),%esi
> Index: sys/arch/i386/include/setjmp.h
> ===
> RCS file: /mount/openbsd/cvs/src/sys/arch/i386/include/setjmp.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 setjmp.h
> --- sys/arch/i386/include/setjmp.h6 Dec 2020 15:31:30 -   1.3
> +++ sys/arch/i386/include/setjmp.h12 Dec 2020 22:04:32 -
> @@ -13,7 +13,6 @@
>  #define _JB_EDI  5
>  #define _JB_SIGMASK  6
>  #define _JB_SIGFLAG  7
> -#define _JB_MXCSR8
> -#define _JB_FCW  9
> +#define _JB_FCW  8
>  
>  #define _JBLEN   10  /* size, in longs, of a jmp_buf */

ok jsg@

While _JBLEN could change to 9 after this, it was historically 10.
Wouldn't changing it break the ABI?
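
To illustrate the ABI concern, here is a minimal sketch, assuming jmp_buf
is an array of _JBLEN longs (as the "size, in longs, of a jmp_buf" comment
suggests).  The struct and function names are made up for illustration and
are not from libc.  Any object that embeds a jmp_buf changes size and
layout if _JBLEN shrinks, so code built against the old header would
disagree with code built against the new one.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the historical length and the reduced one. */
#define JBLEN_OLD 10
#define JBLEN_NEW 9

typedef long jmp_buf_old[JBLEN_OLD];
typedef long jmp_buf_new[JBLEN_NEW];

/* A made-up application struct that embeds a jmp_buf before another field. */
struct ctx_old {
	jmp_buf_old env;
	long cookie;
};

struct ctx_new {
	jmp_buf_new env;
	long cookie;
};

/* Offset of the field that follows the jmp_buf under each definition. */
size_t
cookie_offset_old(void)
{
	return offsetof(struct ctx_old, cookie);
}

size_t
cookie_offset_new(void)
{
	return offsetof(struct ctx_new, cookie);
}
```

With _JBLEN left at 10 the last slot is simply unused, which costs one
long per jmp_buf but keeps existing binaries and libraries in agreement.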



Re: X11 freezes with Radeon RX 560 and other issues

2020-12-18 Thread Jonathan Gray
On Fri, Dec 18, 2020 at 01:37:00PM +0100, Alex Raschi wrote:
> >Synopsis:X11 freezes with Radeon RX 560 and other issues
> >Category:system kernel amd64
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8-current (GENERIC.MP) #218: Wed Dec  9 
> 23:06:07 MST 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   X11 freezes after some usage with amdgpu driver leaving the screen
>   unresponsive, this happen with both xf86-video-amdgpu and
>   xf86-video-modesetting. With modesetting the system is generally more
>   stable and lasts longer. Sometimes i am able to switch to a tty and
>   restart xenodm otherwise i need to reboot.
> 
>   With xf86-video-amdgpu libGL is unable to use DRI3 and falls back to
>   DRI2, this does not happen with xf86-video-modesetting. I noticed this
>   also on a intel computer but i cannot try with modesetting there
>   because the gpu supports only opengl 1.4. I added a sample output of
>   `LIBGL_DEBUG=verbose glxinfo` with both drivers at the end. I tried
>   also with `Option "DRI" "3"` but without luck.
> 
>   With xf86-video-amdgpu if the monitor turns off for some time (for
>   example with dpms/screensaver) i cannot turn it on again anymore (for
>   example by typing or moving mouse), this again does not happen with
>   xf86-video-modesetting. When this occurs the kernel and X11 do not
>   write anything, the logs are the same as the ones below without the
>   error messages.

Not sure what is going on there; I don't see that here.

> 
>   On Linux mesa creates $XDG_CACHE_HOME/mesa_shader_cache but on OpenBSD
>   this does not happen, i know this is just an optimization but i wonder
>   if it was disabled or simply doesn't work.

The shader cache is disabled to avoid pledge/unveil violations with web
browsers.

> 
>   I had a file in /etc/X11/xorg.conf.d with inside:
> 
>   Section "ServerFlags"
>   Option "DontVTSwitch" "1"
>   Option "DontZap" "1"
>   EndSection
> 
>   But after removing it and leaving the xorg.conf.d directory empty i
>   still have the freeze issue. I initially thought this file was the
>   problem. During the last month i tried the multiple snapshots of
>   -current.
> 
>   Related:
>   https://marc.info/?l=openbsd-bugs&m=160736093218686&w=2
>   https://marc.info/?l=openbsd-bugs&m=159383256417523&w=2
> >How-To-Repeat:
>   X11 can freeze near boot (after xenodm login) or after some hours/days,
>   the most problematic usage seems to be using opengl/acceleration
>   programs like games and fat browsers. Sometimes running 2 games for
>   some seconds is enough.
> >Fix:
>   With xf86-video-modesetting there are no issues with DRI3/monitor and
>   it is more stable than xf86-video-amdgpu, however X11 freezes with this
>   driver too. A complete fix is unknown.

The ring timeouts/GPU hangs you see show up on other amdgpu parts as well
and can also be triggered with piglit.  Newer Mesa releases don't seem
to help.  It seems to be a problem specific to OpenBSD, but it isn't
clear what is causing it.



Re: bsd.rd hangs on boot; bsd.mp works

2020-12-20 Thread Jonathan Gray
On Sun, Dec 20, 2020 at 06:27:33AM +, James Cook wrote:
> On Sat, Dec 19, 2020 at 07:33:42AM +, James Cook wrote:
> > > Suggestions are welcome. In the meantime I am slowly trying to debug
> > > this myself, mostly as a learning exercise. I've successfully built my
> > > own bsd.rd (using the instructions on the release(8) man page) with the
> > > intention of adding some debug output to narrow down where it's
> > > getting stuck, but I don't know my way around the kernel code.
> > 
> > Minor progress: I have determined that the kernel gets at least as far
> > as exec-ing the init process (more precisely, calling sys_execve in
> > init_main.c).
> 
> I found out init gets stuck calling sleep(2) in setctty in
> sbin/init/init.c. (Details below on how I determined that.)
> 
> Any idea what could cause a call to sleep to just hang indefinitely?
> 

Can you try hpet instead of tsc?

Either run

	sysctl kern.timecounter.hardware=acpihpet0

or build a kernel with something like this, which gives hpet a higher
quality (and so a higher priority) than tsc:

Index: sys/dev/acpi/acpihpet.c
===
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.24
diff -u -p -r1.24 acpihpet.c
--- sys/dev/acpi/acpihpet.c 6 Jul 2020 13:33:08 -   1.24
+++ sys/dev/acpi/acpihpet.c 20 Dec 2020 09:32:40 -
@@ -45,7 +45,7 @@ static struct timecounter hpet_timecount
0x, /* counter_mask (32 bits) */
0,  /* frequency */
0,  /* name */
-   1000,   /* quality */
+   2500,   /* quality */
NULL,   /* private bits */
0,  /* expose to user */
 };
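
For readers unfamiliar with the quality field, here is a rough sketch of
why raising it changes which timecounter is used.  It assumes the kernel
prefers the registered timecounter with the highest non-negative quality.
The struct, the function name, and the tsc value of 2000 used below are
assumptions for illustration, not the kernel's actual code.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a timecounter: only name and quality. */
struct tc_sketch {
	const char *name;
	int quality;
};

/*
 * Return the candidate with the highest quality, or NULL if there is none.
 * Entries with negative quality are skipped (assumption: such counters are
 * never selected automatically).  On a tie the earlier entry wins.
 */
const struct tc_sketch *
tc_pick_best(const struct tc_sketch *tcs, size_t n)
{
	const struct tc_sketch *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tcs[i].quality < 0)
			continue;
		if (best == NULL || tcs[i].quality > best->quality)
			best = &tcs[i];
	}
	return best;
}
```

With an assumed tsc quality of 2000, an hpet quality of 1000 loses to tsc
while 2500 wins, which is the effect the diff above is after.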



Re: bsd.rd hangs on boot; bsd.mp works

2020-12-21 Thread Jonathan Gray
On Tue, Dec 22, 2020 at 02:56:58AM +, James Cook wrote:
> On Sun, Dec 20, 2020 at 06:52:39PM +, James Cook wrote:
> > On Sun, Dec 20, 2020 at 08:41:00PM +1100, Jonathan Gray wrote:
> > > On Sun, Dec 20, 2020 at 06:27:33AM +, James Cook wrote:
> > > > On Sat, Dec 19, 2020 at 07:33:42AM +, James Cook wrote:
> > > > > > Suggestions are welcome. In the meantime I am slowly trying to debug
> > > > > > this myself, mostly as a learning exercise. I've successfully built 
> > > > > > my
> > > > > > own bsd.rd (using the instructions on the release(8) man page) with 
> > > > > > the
> > > > > > intention of adding some debug output to narrow down where it's
> > > > > > getting stuck, but I don't know my way around the kernel code.
> > > > > 
> > > > > Minor progress: I have determined that the kernel gets at least as far
> > > > > as exec-ing the init process (more precisely, calling sys_execve in
> > > > > init_main.c).
> > > > 
> > > > I found out init gets stuck calling sleep(2) in setctty in
> > > > sbin/init/init.c. (Details below on how I determined that.)
> > > > 
> > > > Any idea what could cause a call to sleep to just hang indefinitely?
> > > > 
> > > 
> > > Can you try hpet instead of tsc?
> > > 
> > > Either sysctl kern.timecounter.hardware=acpihpet0
> > 
> > With (my custom) bsd.rd, that sysctl does not seem to exist, and it still 
> > hangs
> > on boot with the below code change.
> > 
> > With bsd.sp and bsd.mp, the sysctl was already set to acpihpet0. If I set 
> > it to
> > tsc, then "sleep 1" gets somewhat slower but doesn't take on the order of
> > minutes to an hour like it does with bsd.rd.
> > 
> > If it's relevant: I noticed the time reported by the "date" command advances
> > normally with bsd.sp/bsd.mp and acpihpet0 (I ran two date commands 15s apart
> > according to my phone timer, and got dates 15s apart). With the sysctl set 
> > to
> > tsc, or with bsd.rd, the time returned by date advances slowly.
> > 
> > (With bsd.sp and bsd.mp, I tried "disable acpihpet*" at the boot_config 
> > prompt,
> > but the sysctl was still set to acpihpet0 when I booted.)
> 
> Sorry Jonathan, I somehow didn't include you in my reply, which is
> quoted above.
> 
> I have a bit more data to share.
> 
> It appears bsd.rd's tsleep_nsec is sleeping for about 2.5x as long as
> needed, but its uptime increases at only 1/2500 the rate of real time,
> so sys_nanosleep is calling tsleep_nsec about 1000 times to compensate.
> (I didn't wait around for all 1000 loop iterations).
> 
> By contrast, bsd.sp's tsleep_nsec sleeps for about 4.2x as long as
> requested, but it is able to measure real time accurately.
> 
> Based on my previous email, I guess bsd.rd is using tsc and bsd.sp is
> using hpet.
> 
> In more detail:
> 
> I added some debug output to the sys_nanosleep function, printing the
> "start" and "end" values returned by getnanouptime on each iteration of
> the loop. Diff at bottom of email.
> 
> I've summarized in these two tables and explained them in the text
> below.
> 
> 
> bsd.rd:
> 
> requested nsecs | actual time per loop iteration | end - start
> ++-
>  20 | 5s | 2ms
> 
> 
> bsd.sp:
> 
> requested nsecs | end - start (plausibly matches actual time)
> +
>  10 |   4.22s
>  20 |   8.40s
>  50 |  20.93s
> 100 |  41.82s
> 
> 
> requested nsecs:
> 
> This is the "nsecs" variable in the inner loop, matching what was
> passed to the nanosleep system call. In the case of bsd.rd, I list the
> value from the first loop iteration; for bsd.sp, there was always just
> one iteration.
> 
> 
> end - start:
> 
> This is the difference between the "stop" and "start" timespecs, filled
> by getnanouptime before and after the call to tsleep_nsec.
> 
> Note that for bsd.rd, it only goes up by 2ms, only 1/1000 of what was
> requested.
> 
> For bsd.sp, it seems to be about 0.04 + 4.18 * (requested time).
> 
> 
> "actual time per loop iteration":
> 
> For bsd.rd, each loop iteration takes about 5s (as measured 

Re: bsd.rd hangs on boot; bsd.mp works

2020-12-22 Thread Jonathan Gray
On Tue, Dec 22, 2020 at 06:30:48PM +, James Cook wrote:
> > +   case 0xa6: /* Coffeelake mobile */
> 
> The laptop's CPU is an i7-10710U, which I think is in the Comet Lake
> series, not Coffee Lake.

Yes, 0xa6 is Comet Lake.

But we should really do what FreeBSD and Linux do and fall back to
cpuid 0x16, as Intel keeps creating new Skylake variants.

The frequency from cpuid 0x15 is in Hz; from 0x16 it is in MHz.

Untested, as I don't have any >= Skylake machines.
If you can add a printf to check that the value is sane, that would
be helpful.

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.21
diff -u -p -r1.21 tsc.c
--- sys/arch/amd64/amd64/tsc.c  6 Sep 2020 20:50:00 -   1.21
+++ sys/arch/amd64/amd64/tsc.c  23 Dec 2020 00:52:03 -
@@ -66,14 +66,16 @@ tsc_freq_cpuid(struct cpu_info *ci)
eax = ebx = khz = dummy = 0;
CPUID(0x15, eax, ebx, khz, dummy);
khz /= 1000;
-   if (khz == 0) {
+   /*
+* Fallback to 'Processor Base Frequency' from cpuid 0x16 when
+* 'nominal frequency of the core crystal clock' from cpuid 0x15
+* is 0 on >= Skylake
+*/
+   if (khz == 0 && cpuid_level >= 0x16) {
+   CPUID(0x16, khz, dummy, dummy, dummy);
+   khz *= 1000;
+   } else if (khz == 0) {
switch (ci->ci_model) {
-   case 0x4e: /* Skylake mobile */
-   case 0x5e: /* Skylake desktop */
-   case 0x8e: /* Kabylake mobile */
-   case 0x9e: /* Kabylake desktop */
-   khz = 24000; /* 24.0 MHz */
-   break;
case 0x5f: /* Atom Denverton */
khz = 25000; /* 25.0 MHz */
break;



Re: bsd.rd hangs on boot; bsd.mp works

2020-12-23 Thread Jonathan Gray
On Wed, Dec 23, 2020 at 12:31:10PM +1100, Jonathan Gray wrote:
> On Tue, Dec 22, 2020 at 06:30:48PM +, James Cook wrote:
> > > + case 0xa6: /* Coffeelake mobile */
> > 
> > The laptop's CPU is an i7-10710U, which I think is in the Comet Lake
> > series, not Coffee Lake.
> 
> Yes 0xa6 is comet lake.
> 
> But we should really do what FreeBSD and Linux do and fallback to
> cpuid 0x16 as Intel keeps creating new skylake variants.
> 
> The frequency from cpuid 0x15 is Hz, from 0x16 it is MHz.
> 
> Untested as I don't have any >= skylake machines.
> If you can add a printf to check the value is sane that would
> be helpful.

As noticed by tb@, the last diff wasn't quite right: the base frequency
from cpuid 0x16 also has to be scaled by the eax/ebx ratio from cpuid 0x15.

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.21
diff -u -p -r1.21 tsc.c
--- sys/arch/amd64/amd64/tsc.c  6 Sep 2020 20:50:00 -   1.21
+++ sys/arch/amd64/amd64/tsc.c  23 Dec 2020 12:25:32 -
@@ -66,14 +66,16 @@ tsc_freq_cpuid(struct cpu_info *ci)
eax = ebx = khz = dummy = 0;
CPUID(0x15, eax, ebx, khz, dummy);
khz /= 1000;
-   if (khz == 0) {
+   /*
+* Fallback to 'Processor Base Frequency' from cpuid 0x16 when
+* 'nominal frequency of the core crystal clock' from cpuid 0x15
+* is 0 on >= Skylake
+*/
+   if (khz == 0 && cpuid_level >= 0x16) {
+   CPUID(0x16, khz, dummy, dummy, dummy);
+   khz = khz * 1000 * eax / ebx;
+   } else if (khz == 0) {
switch (ci->ci_model) {
-   case 0x4e: /* Skylake mobile */
-   case 0x5e: /* Skylake desktop */
-   case 0x8e: /* Kabylake mobile */
-   case 0x9e: /* Kabylake desktop */
-   khz = 24000; /* 24.0 MHz */
-   break;
case 0x5f: /* Atom Denverton */
khz = 25000; /* 25.0 MHz */
break;
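
The arithmetic in the fallback can be checked in isolation.  The sketch
below is not the kernel code; the function name is made up, and it assumes
the base frequency from cpuid 0x16 is close enough to the TSC frequency to
recover the crystal clock through the cpuid 0x15 ratio.  It also masks
EAX of leaf 0x16 to its low 16 bits, since the upper bits are documented
as reserved.

```c
#include <stdint.h>

/*
 * Estimate the core crystal clock in kHz from:
 *   base_mhz : cpuid 0x16 EAX, processor base frequency in MHz
 *   denom    : cpuid 0x15 EAX, denominator of the TSC/crystal ratio
 *   numer    : cpuid 0x15 EBX, numerator of the TSC/crystal ratio
 * Returns 0 when the ratio is not enumerated.
 */
uint32_t
crystal_khz_from_base(uint32_t base_mhz, uint32_t denom, uint32_t numer)
{
	if (denom == 0 || numer == 0)
		return 0;

	/* Only bits 15:0 of leaf 0x16 EAX carry the base frequency. */
	base_mhz &= 0xffff;

	/* MHz -> kHz, then undo the TSC/crystal ratio (numer / denom). */
	return (uint32_t)((uint64_t)base_mhz * 1000 * denom / numer);
}
```

For example, a base frequency of 1600 MHz with a ratio of 134/2 gives
1600 * 1000 * 2 / 134 = 23880 kHz after integer truncation, slightly
below a nominal 24 MHz crystal.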



Re: bsd.rd hangs on boot; bsd.mp works

2020-12-23 Thread Jonathan Gray
On Wed, Dec 23, 2020 at 08:43:10PM +, James Cook wrote:
> On Wed, Dec 23, 2020 at 11:47:05PM +1100, Jonathan Gray wrote:
> > On Wed, Dec 23, 2020 at 12:31:10PM +1100, Jonathan Gray wrote:
> > > On Tue, Dec 22, 2020 at 06:30:48PM +, James Cook wrote:
> > > > > + case 0xa6: /* Coffeelake mobile */
> > > > 
> > > > The laptop's CPU is an i7-10710U, which I think is in the Comet Lake
> > > > series, not Coffee Lake.
> > > 
> > > Yes 0xa6 is comet lake.
> > > 
> > > But we should really do what FreeBSD and Linux do and fallback to
> > > cpuid 0x16 as Intel keeps creating new skylake variants.
> > > 
> > > The frequency from cpuid 0x15 is Hz, from 0x16 it is MHz.
> > > 
> > > Untested as I don't have any >= skylake machines.
> > > If you can add a printf to check the value is sane that would
> > > be helpful.
> > 
> > As noticed by tb@ the last diff wasn't quite right:
> > 
> > Index: sys/arch/amd64/amd64/tsc.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > retrieving revision 1.21
> > diff -u -p -r1.21 tsc.c
> > --- sys/arch/amd64/amd64/tsc.c  6 Sep 2020 20:50:00 -   1.21
> > +++ sys/arch/amd64/amd64/tsc.c  23 Dec 2020 12:25:32 -
> > @@ -66,14 +66,16 @@ tsc_freq_cpuid(struct cpu_info *ci)
> > eax = ebx = khz = dummy = 0;
> > CPUID(0x15, eax, ebx, khz, dummy);
> > khz /= 1000;
> > -   if (khz == 0) {
> > +   /*
> > +* Fallback to 'Processor Base Frequency' from cpuid 0x16 when
> > +* 'nominal frequency of the core crystal clock' from cpuid 0x15
> > +* is 0 on >= Skylake
> > +*/
> > +   if (khz == 0 && cpuid_level >= 0x16) {
> > +   CPUID(0x16, khz, dummy, dummy, dummy);
> > +   khz = khz * 1000 * eax / ebx;
> > +   } else if (khz == 0) {
> > switch (ci->ci_model) {
> > -   case 0x4e: /* Skylake mobile */
> > -   case 0x5e: /* Skylake desktop */
> > -   case 0x8e: /* Kabylake mobile */
> > -   case 0x9e: /* Kabylake desktop */
> > -   khz = 24000; /* 24.0 MHz */
> > -   break;
> > case 0x5f: /* Atom Denverton */
> > khz = 25000; /* 25.0 MHz */
> > break;
> 
> The patch works (I tested bsd.rd; sleep and date both behave right).
> 
> Based on added printfs, it ends up with a khz of 23880, computed as
> 1600 * 1000 * 2 / 134.
> 
> For reference, I've attached dmesg, and the diff (applied on top of
> your diff) with the relevant printfs.
> 
> I notice the upper 16 bits of EAX for leaf 0x16 are described as
> "Reserved=0" in the intel manual. Should they be masked out?

Thanks for testing again.  I committed case additions instead of this,
for two reasons: kettenis@ could not get a Skylake machine to sync the
clock with it, and there is reason to believe future processors won't
report a frequency of 0, as Cannon Lake, Ice Lake, and Tiger Lake don't.

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.21
diff -u -p -r1.21 tsc.c
--- sys/arch/amd64/amd64/tsc.c  6 Sep 2020 20:50:00 -   1.21
+++ sys/arch/amd64/amd64/tsc.c  24 Dec 2020 01:59:19 -
@@ -72,6 +72,8 @@ tsc_freq_cpuid(struct cpu_info *ci)
case 0x5e: /* Skylake desktop */
case 0x8e: /* Kabylake mobile */
case 0x9e: /* Kabylake desktop */
+   case 0xa5: /* CML-H CML-S62 CML-S102 */
+   case 0xa6: /* CML-U62 */
khz = 24000; /* 24.0 MHz */
break;
case 0x5f: /* Atom Denverton */
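
As a rough illustration of the table approach, the sketch below maps a
model number to a crystal frequency and derives the TSC frequency from the
cpuid 0x15 ratio.  The function names are made up and this is not the
code in tsc.c; only the model numbers and the 24.0/25.0 MHz values are
taken from the diff.

```c
#include <stdint.h>

/*
 * Crystal clock in kHz for models where cpuid 0x15 ECX reads 0.
 * Returns 0 for models not in the table.
 */
uint32_t
crystal_khz_for_model(uint32_t model)
{
	switch (model) {
	case 0x4e: /* Skylake mobile */
	case 0x5e: /* Skylake desktop */
	case 0x8e: /* Kabylake mobile */
	case 0x9e: /* Kabylake desktop */
	case 0xa5: /* CML-H CML-S62 CML-S102 */
	case 0xa6: /* CML-U62 */
		return 24000; /* 24.0 MHz */
	case 0x5f: /* Atom Denverton */
		return 25000; /* 25.0 MHz */
	default:
		return 0;
	}
}

/*
 * TSC frequency in kHz: crystal * numer / denom, where denom and numer
 * are EAX and EBX of cpuid 0x15.  Returns 0 if the ratio is missing.
 */
uint64_t
tsc_khz(uint32_t crystal_khz, uint32_t denom, uint32_t numer)
{
	if (denom == 0 || numer == 0)
		return 0;
	return (uint64_t)crystal_khz * numer / denom;
}
```

With model 0xa6 and a ratio of 0x86/2 (134/2), this gives
24000 * 134 / 2 = 1608000 kHz, close to a 1600 MHz base frequency.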

i5-10210U 000806EC 06-8e-0c CML-U42

http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel00806EC_CometLake_CPUID2.txt

CPUID 0015: 0002-00B0-- [88.00x / 0]
CPUID 0016: 0834-1068-0064- [2100 / 4200 / 100]

i7-10710U 000A0660 06-a6-00 CML-U62

http://users.atw.hu/instlatx64/GenuineIntel/GenuineIntel00A0660_CometLake_CPUID1.txt

CPUID 0015: 0002-0086-- [67.00x / 0]
CPUID 0016: 0640-125C-0064- [1600 / 4700 / 100]

i5-106

Re: bsd.rd hangs on boot; bsd.mp works

2020-12-23 Thread Jonathan Gray
On Thu, Dec 24, 2020 at 12:42:23AM +, James Cook wrote:
> On Wed, Dec 23, 2020 at 08:43:10PM +, James Cook wrote:
> > On Wed, Dec 23, 2020 at 11:47:05PM +1100, Jonathan Gray wrote:
> > > On Wed, Dec 23, 2020 at 12:31:10PM +1100, Jonathan Gray wrote:
> > > > On Tue, Dec 22, 2020 at 06:30:48PM +, James Cook wrote:
> > > > > > +   case 0xa6: /* Coffeelake mobile */
> > > > > 
> > > > > The laptop's CPU is an i7-10710U, which I think is in the Comet Lake
> > > > > series, not Coffee Lake.
> > > > 
> > > > Yes 0xa6 is comet lake.
> > > > 
> > > > But we should really do what FreeBSD and Linux do and fallback to
> > > > cpuid 0x16 as Intel keeps creating new skylake variants.
> > > > 
> > > > The frequency from cpuid 0x15 is Hz, from 0x16 it is MHz.
> > > > 
> > > > Untested as I don't have any >= skylake machines.
> > > > If you can add a printf to check the value is sane that would
> > > > be helpful.
> > > 
> > > As noticed by tb@ the last diff wasn't quite right:
> > > 
> > > Index: sys/arch/amd64/amd64/tsc.c
> > > ===
> > > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > > retrieving revision 1.21
> > > diff -u -p -r1.21 tsc.c
> > > --- sys/arch/amd64/amd64/tsc.c6 Sep 2020 20:50:00 -   1.21
> > > +++ sys/arch/amd64/amd64/tsc.c23 Dec 2020 12:25:32 -
> > > @@ -66,14 +66,16 @@ tsc_freq_cpuid(struct cpu_info *ci)
> > >   eax = ebx = khz = dummy = 0;
> > >   CPUID(0x15, eax, ebx, khz, dummy);
> > >   khz /= 1000;
> > > - if (khz == 0) {
> > > + /*
> > > +  * Fallback to 'Processor Base Frequency' from cpuid 0x16 when
> > > +  * 'nominal frequency of the core crystal clock' from cpuid 0x15
> > > +  * is 0 on >= Skylake
> > > +  */
> > > + if (khz == 0 && cpuid_level >= 0x16) {
> > > + CPUID(0x16, khz, dummy, dummy, dummy);
> > > + khz = khz * 1000 * eax / ebx;
> > > + } else if (khz == 0) {
> > >   switch (ci->ci_model) {
> > > - case 0x4e: /* Skylake mobile */
> > > - case 0x5e: /* Skylake desktop */
> > > - case 0x8e: /* Kabylake mobile */
> > > - case 0x9e: /* Kabylake desktop */
> > > - khz = 24000; /* 24.0 MHz */
> > > - break;
> > >   case 0x5f: /* Atom Denverton */
> > >   khz = 25000; /* 25.0 MHz */
> > >   break;
> > 
> > The patch works (I tested bsd.rd; sleep and date both behave right).
> > 
> > Based on added printfs, it ends up with a khz of 23880, computed as
> > 1600 * 1000 * 2 / 134.
> 
> I noticed something strange about the hw.setperf and hw.cpuspeed
> sysctls. I don't know if they're related to the original bug or its
> fix.
> 
> My hw.cpuspeed sysctl starts out at 16264, which seems way too high.
> This page
> 
>   
> https://ark.intel.com/content/www/us/en/ark/products/196448/intel-core-i7-10710u-processor-12m-cache-up-to-4-70-ghz.html
> 
> claims a "Max Turbo Frequency" of 4.70 GHz.
> 
> hw.setperf seems to start out at 1320, as indicated by sysctl output
> when I change it. Of course, if I lower it, I can't bring it back to
> 1320. If I set it to 100, hw.cpuspeed is a more plausible 1601.

The max + 1 state is the turbo state.

As for why cpuspeed is so high, have a look at cpu_freq() in
sys/arch/amd64/amd64/identcpu.c.  Can you confirm whether
cpu_freq_ctr() returns a non-zero value?  If so, there is something wrong
with the performance counter method of getting cpuspeed.

> 
> This is mostly measured with bsd.sp with the above change plus all my
> printfs. I also briefly tried with bsd.mp and confirmed hw.setperf
> seems to start at 1320 there. I haven't tried testing the actual
> performance, but none of this seems to cause time distortion, at least.
> 
> Manually transcribed session with bsd.sp, the above patch, and my
> printfs:
> 
> nomad# sysctl hw.cpuspeed
> hw.cpuspeed=16264
> nomad# sysctl hw.setperf=0
> hw.setperf: 1320 -> 0
> nomad# sysctl hw.cpuspeed
> hw.cpuspeed=400
> nomad# sysctl hw.setperf=100
> hw.setperf: 0 -> 100
> nomad# sysctl hw.cpuspeed
> hw.cpuspeed=1601
> nomad# sysctl hw.setperf=1320
> sysctl: hw.setperf: Invalid argument
> 
> -- 
> James
> 
> 



Re: Cannot install - "WARNING: CHECK AND RESET THE DATE!"

2020-12-26 Thread Jonathan Gray
On Sat, Dec 26, 2020 at 11:36:10PM +0100, d1.and...@tuta.io wrote:
> Hi!
> 
> Whenever I try to install OpenBSD on my host, the console freezes before I 
> can ask for a shell or an installation with the message "WARNING: CHECK AND 
> RESET THE DATE!".
> 
> I'd like to send some logs but I don't know how since my issue prevents me 
> doing other stuff.
> 
> Here are my system infos: Lenovo ThinkPad P1 Gen 3, Intel i7-10750H, 8GB of 
> RAM.  The exact same issue is also manifesting on my friend's identical 
> ThinkPad.
> 
> Here is what I've done so far:
> - Updated my machine's firmware.
> - Set my time in the BIOS to the current date (2020-12-26) at the current UTC 
> time, as specified in the manual. (Also in my BIOS, under the 'UEFI BIOS 
> version' there's the parameter 'UEFI BIOS date: 2020-09-09' I don't think 
> that's related though).
> - Tried redownloading the install68.img (also checking the hash) and 
> reburning multiple times on my USBs, with dd, echo and Etcher.
> - Tried the most recent snapshot, same issue.
> - Grasping onto nothingness for answers I tried disabling kvm, Hyperthreading 
> and multi-core processing in my BIOS.
> 
> I'm kinda desperate at this point. As I said, I'd like to provide more infos, 
> so let me know if I can in some magic way.

You will need to use a snapshot on these new Comet Lake machines, which
should include the following commit to sys/arch/amd64/amd64/tsc.c

i7-10750H likely has a cpuid model of 0xa5


revision 1.22
date: 2020/12/24 04:20:48;  author: jsg;  state: Exp;  lines: +3 -1;  commitid: U1nedAeXF0flsb1r;
handle reported core clock frequency of 0 on newer Intel Comet Lake

The 'nominal core crystal clock frequency' from cpuid 0x15 is 0 on
Intel model 0xa5 (CML-H CML-S62 CML-S102) and 0xa6 (CML-U62). So act as
if 24 MHz was reported like we do on other Skylake/Kaby Lake variants.
Comet Lake processors with model 0x8e (CML-U42 CML-Y42) use the same model
number used by Kaby Lake and many other parts which was already handled.

While we could approximate the crystal frequency with 'Processor Base
Frequency' from cpuid 0x16 eax like FreeBSD and Linux do, kettenis@ couldn't
get ntpd to sync a clock on a Skylake machine with:
CPUID 0x15: eax=2, ebx=134, khz=0
CPUID 0x16: eax=1600, ebx=1600, ecx=100, edx=0
with reported crystal frequency changing from 24000 kHz to 23880 kHz
(cpuid 0x16 eax * 1000 * cpuid 0x15 eax / cpuid 0x15 ebx) and
TSC frequency changing from 1608000000 to 1599960000.

Cannon Lake, Ice Lake, and Tiger Lake are known to return non-zero
frequency in cpuid 0x15 so hopefully no other model ids have to be added.

James Cook reported hangs on bsd.rd with i7-10710U 06-a6-00 (CML-U62)
(which does not have acpihpet) but not with bsd.mp (which does) and has
confirmed that both approaches fixed the problem.
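
The arithmetic in the commit message can be checked with a small sketch. This is not the tsc.c code; the helper names below are hypothetical, and the only inputs assumed are the values quoted above (cpuid 0x15 eax=2, ebx=134; cpuid 0x16 eax=1600 MHz).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from tsc.c): approximate the core crystal
 * clock in kHz from the cpuid 0x16 base frequency in MHz and the
 * cpuid 0x15 TSC/crystal ratio, which is ebx / eax.
 */
static inline uint32_t
crystal_khz_from_base(uint32_t base_mhz, uint32_t ratio_eax, uint32_t ratio_ebx)
{
	if (ratio_ebx == 0)
		return 0;	/* ratio not enumerated */
	return (uint32_t)((uint64_t)base_mhz * 1000 * ratio_eax / ratio_ebx);
}

/*
 * Hypothetical helper: TSC frequency in kHz implied by a crystal
 * frequency in kHz and the same cpuid 0x15 ratio.
 */
static inline uint64_t
tsc_khz_from_crystal(uint32_t crystal_khz, uint32_t ratio_eax, uint32_t ratio_ebx)
{
	if (ratio_eax == 0)
		return 0;	/* ratio not enumerated */
	return (uint64_t)crystal_khz * ratio_ebx / ratio_eax;
}
```

With those inputs, crystal_khz_from_base(1600, 2, 134) gives 23880 because the integer division truncates, and tsc_khz_from_crystal() gives 1608000 kHz for a 24000 kHz crystal versus 1599960 kHz for 23880 kHz, a difference of roughly 0.5%.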




Re: bsd.rd hangs on boot; bsd.mp works

2020-12-26 Thread Jonathan Gray
On Sun, Dec 27, 2020 at 04:34:32AM +0000, James Cook wrote:
> On Thu, Dec 24, 2020 at 03:34:33PM +1100, Jonathan Gray wrote:
> > On Thu, Dec 24, 2020 at 12:42:23AM +0000, James Cook wrote:
> > > On Wed, Dec 23, 2020 at 08:43:10PM +0000, James Cook wrote:
> > > > On Wed, Dec 23, 2020 at 11:47:05PM +1100, Jonathan Gray wrote:
> > > > > On Wed, Dec 23, 2020 at 12:31:10PM +1100, Jonathan Gray wrote:
> > > > > > On Tue, Dec 22, 2020 at 06:30:48PM +0000, James Cook wrote:
> > > > > > > > +   case 0xa6: /* Coffeelake mobile */
> > > > > > > 
> > > > > > > The laptop's CPU is an i7-10710U, which I think is in the
> > > > > > > Comet Lake series, not Coffee Lake.
> > > > > > 
> > > > > > Yes 0xa6 is comet lake.
> > > > > > 
> > > > > > But we should really do what FreeBSD and Linux do and fall back to
> > > > > > cpuid 0x16 as Intel keeps creating new skylake variants.
> > > > > > 
> > > > > > The frequency from cpuid 0x15 is Hz, from 0x16 it is MHz.
> > > > > > 
> > > > > > Untested as I don't have any >= skylake machines.
> > > > > > If you can add a printf to check the value is sane that would
> > > > > > be helpful.
> > > > > 
> > > > > As noticed by tb@ the last diff wasn't quite right:
> > > > > 
> > > > > Index: sys/arch/amd64/amd64/tsc.c
> > > > > ===
> > > > > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > > > > retrieving revision 1.21
> > > > > diff -u -p -r1.21 tsc.c
> > > > > --- sys/arch/amd64/amd64/tsc.c	6 Sep 2020 20:50:00 -0000	1.21
> > > > > +++ sys/arch/amd64/amd64/tsc.c	23 Dec 2020 12:25:32 -0000
> > > > > @@ -66,14 +66,16 @@ tsc_freq_cpuid(struct cpu_info *ci)
> > > > >   eax = ebx = khz = dummy = 0;
> > > > >   CPUID(0x15, eax, ebx, khz, dummy);
> > > > >   khz /= 1000;
> > > > > - if (khz == 0) {
> > > > > + /*
> > > > > +  * Fallback to 'Processor Base Frequency' from cpuid 0x16 when
> > > > > +  * 'nominal frequency of the core crystal clock' from cpuid 0x15
> > > > > +  * is 0 on >= Skylake
> > > > > +  */
> > > > > + if (khz == 0 && cpuid_level >= 0x16) {
> > > > > + CPUID(0x16, khz, dummy, dummy, dummy);
> > > > > + khz = khz * 1000 * eax / ebx;
> > > > > + } else if (khz == 0) {
> > > > >   switch (ci->ci_model) {
> > > > > - case 0x4e: /* Skylake mobile */
> > > > > - case 0x5e: /* Skylake desktop */
> > > > > - case 0x8e: /* Kabylake mobile */
> > > > > - case 0x9e: /* Kabylake desktop */
> > > > > - khz = 24000; /* 24.0 MHz */
> > > > > - break;
> > > > >   case 0x5f: /* Atom Denverton */
> > > > >   khz = 25000; /* 25.0 MHz */
> > > > >   break;
> > > > 
> > > > The patch works (I tested bsd.rd; sleep and date both behave right).
> > > > 
> > > > Based on added printfs, it ends up with a khz of 23880, computed as
> > > > 1600 * 1000 * 2 / 134.
> > > 
> > > I noticed something strange about the hw.setperf and hw.cpuspeed
> > > sysctls. I don't know if they're related to the original bug or its
> > > fix.
> > > 
> > > My hw.cpuspeed sysctl starts out at 16264, which seems way too high.
> > > This page
> > > 
> > >   
> > > https://ark.intel.com/content/www/us/en/ark/products/196448/intel-core-i7-10710u-processor-12m-cache-up-to-4-70-ghz.html
> > > 
> > > claims a "Max Turbo Frequency" of 4.70 GHz.
> > > 
> > > hw.setperf seems to start out at 1320, as indicated by s

Re: Failed to boot install image of OpenBSD 6.8

2020-12-27 Thread Jonathan Gray
On Sun, Dec 27, 2020 at 11:50:18PM +0000, Douglas S. wrote:
> Hi. I'm using this email to report the bug because I haven't been able to 
> install OpenBSD and use its bug reporting tool.
> 
> I'm using install68.img that I've written to a USB stick. The image checksum 
> and signature were verified and no problems were found.
> 
> So I shut down my computer and booted from that USB stick. I see a lot of logs 
> and then it stops right here.
> 
> (see picture attached)
> 
> At this point the system appears to be completely frozen. The keyboard lights 
> (CAPS and others) don't respond.
> 
> Hardware information:
> Motherboard: Gigabyte B460M DS3H
> Processor: 10th gen. Intel Core i3-10100
> RAM: 16 GB HyperX DDR4 2400 MHz
> Connected through: Ethernet cable (no WiFi)
> Storage: Kingston A2000 NVMe M.2 SSD
> Graphics card: AMD RX 560
> UEFI (secure boot disabled)

You will need to use a snapshot on these new Comet Lake machines, which
should include the following commit to sys/arch/amd64/amd64/tsc.c

i3-10100 likely has a cpuid model of 0xa5


revision 1.22
date: 2020/12/24 04:20:48;  author: jsg;  state: Exp;  lines: +3 -1;  commitid: U1nedAeXF0flsb1r;
handle reported core clock frequency of 0 on newer Intel Comet Lake

The 'nominal core crystal clock frequency' from cpuid 0x15 is 0 on
Intel model 0xa5 (CML-H CML-S62 CML-S102) and 0xa6 (CML-U62). So act as
if 24 MHz was reported like we do on other Skylake/Kaby Lake variants.
Comet Lake processors with model 0x8e (CML-U42 CML-Y42) use the same model
number used by Kaby Lake and many other parts which was already handled.

While we could approximate the crystal frequency with 'Processor Base
Frequency' from cpuid 0x16 eax like FreeBSD and Linux do, kettenis@ couldn't
get ntpd to sync a clock on a Skylake machine with:
CPUID 0x15: eax=2, ebx=134, khz=0
CPUID 0x16: eax=1600, ebx=1600, ecx=100, edx=0
with reported crystal frequency changing from 24000 kHz to 23880 kHz
(cpuid 0x16 eax * 1000 * cpuid 0x15 eax / cpuid 0x15 ebx) and
TSC frequency changing from 1608000000 to 1599960000.

Cannon Lake, Ice Lake, and Tiger Lake are known to return non-zero
frequency in cpuid 0x15 so hopefully no other model ids have to be added.

James Cook reported hangs on bsd.rd with i7-10710U 06-a6-00 (CML-U62)
(which does not have acpihpet) but not with bsd.mp (which does) and has
confirmed that both approaches fixed the problem.




Re: bsd.rd hangs on boot; bsd.mp works

2020-12-27 Thread Jonathan Gray
On Sun, Dec 27, 2020 at 07:48:58PM +0000, James Cook wrote:
> > from your earlier bsd.mp dmesg:
> > 
> > cpu0: Enhanced SpeedStep 16268 MHz: speeds: 1601, 1600, 1500, 1400, 1300, 
> > 1200, 1100, 1000, 900, 800, 700, 600, 500, 400 MHz
> > 
> > 1601 is variable/turbo speed the others are fixed.  When running in
> > turbo mode getting the current frequency involves msrs which
> > hw.cpuspeed doesn't do. The initial 'x MHz:' is from cpuspeed not from
> > the acpi table.
> > 
> > On modern machines these values are from acpicpu(4)/acpi _PSS.
> > 
> > Can you show the output of running with the following diff to dump the
> > performance counter control values?
> > 
> > MSR_PERF_FIXED_CTR_CTRL 0x38d 
> > MSR_PERF_GLOBAL_CTRL 0x38f
> > 
> > on a broadwell laptop this shows
> > 
> > cpu0 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > cpu0 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > cpu0 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > cpu0 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > cpu1 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > cpu1 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > cpu1 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > cpu1 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > cpu2 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > cpu2 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > cpu2 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > cpu2 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > cpu3 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > cpu3 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > cpu3 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > cpu3 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> 
> Here's dmesg with that patch.

Thanks, here is another diff to try.  It enables the counter for all
rings rather than just ring 0, fixes the clearing of the control field
afterwards, and prints the values read out of the counter.

Index: sys/arch/amd64/amd64/identcpu.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.117
diff -u -p -r1.117 identcpu.c
--- sys/arch/amd64/amd64/identcpu.c	13 Sep 2020 05:57:28 -0000	1.117
+++ sys/arch/amd64/amd64/identcpu.c	28 Dec 2020 02:03:27 -0000
@@ -411,6 +411,9 @@ via_update_sensor(void *args)
 }
 #endif
 
+#define MSR_PERF_GLOBAL_STATUS 0x38e
+#define MSR_PERF_GLOBAL_INUSE 0x392
+
 uint64_t
 cpu_freq_ctr(struct cpu_info *ci)
 {
@@ -421,13 +424,22 @@ cpu_freq_ctr(struct cpu_info *ci)
CPUIDEDX_NUM_FC(cpu_perf_edx) <= 1)
return (0);
 
+   if ((cpu_perf_eax & CPUIDEAX_VERID) > 3) {
+   msr = rdmsr(MSR_PERF_GLOBAL_INUSE);
+   printf("%s %s MSR_PERF_GLOBAL_INUSE 0x%llx\n", ci->ci_dev->dv_xname,
+   __func__, msr);
+   }
+   msr = rdmsr(MSR_PERF_GLOBAL_STATUS);
+   printf("%s %s MSR_PERF_GLOBAL_STATUS before 0x%llx\n", ci->ci_dev->dv_xname,
+   __func__, msr);
+
msr = rdmsr(MSR_PERF_FIXED_CTR_CTRL);
if (msr & MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK)) {
/* some hypervisor is dicking us around */
return (0);
}
 
-   msr |= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_1);
+   msr |= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_ANY);
wrmsr(MSR_PERF_FIXED_CTR_CTRL, msr);
 
msr = rdmsr(MSR_PERF_GLOBAL_CTRL) | MSR_PERF_GLOBAL_CTR1_EN;
@@ -437,13 +449,20 @@ cpu_freq_ctr(struct cpu_info *ci)
delay(10);
count = rdmsr(MSR_PERF_FIXED_CTR1);
 
+   msr = rdmsr(MSR_PERF_GLOBAL_STATUS);
+   printf("%s %s MSR_PERF_GLOBAL_STATUS after 0x%llx\n", ci->ci_dev->dv_xname,
+   __func__, msr);
+
msr = rdmsr(MSR_PERF_FIXED_CTR_CTRL);
-   msr &= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK);
+   msr &= ~MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK);
wrmsr(MSR_PERF_FIXED_CTR_CTRL, msr);
 
msr = rdmsr(MSR_PERF_GLOBAL_CTRL);
msr &= ~MSR_PERF_GLOBAL_CTR1_EN;
wrmsr(MSR_PERF_GLOBAL_CTRL, msr);
+
+   printf("%s %s count %lld last_count %lld freq %lld\n", ci->ci_dev->dv_xname,
+   __func__, count, last_count, ((count - last_count) * 10));
 
return ((count - last_count) * 10);
 }
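
The "fixes clearing a value" part of the diff is the change from `msr &= MASK` to `msr &= ~MASK`. A minimal sketch of why that matters follows. The macro names and the 4-bits-per-counter layout with a 2-bit enable field are assumptions made for illustration; they are not copied from specialreg.h.

```c
#include <stdint.h>

/*
 * Illustrative layout (assumed): each fixed counter owns a 4-bit
 * field in the control register, and the low 2 bits of that field
 * select which rings are counted.  All names here are hypothetical.
 */
#define FC_FIELD(i, v)	((uint64_t)(v) << (4 * (i)))
#define FC_MASK		0x3	/* enable bits of one field */
#define FC_ANY		0x3	/* count in all rings */

/* Old form: keeps ONLY counter i's enable bits, clears everything else. */
static inline uint64_t
clear_field_buggy(uint64_t msr, int i)
{
	return msr & FC_FIELD(i, FC_MASK);
}

/* Fixed form: clears counter i's enable bits, keeps everything else. */
static inline uint64_t
clear_field_fixed(uint64_t msr, int i)
{
	return msr & ~FC_FIELD(i, FC_MASK);
}
```

Starting from 0x33 (counters 0 and 1 both enabled for all rings), the old form applied to counter 1 yields 0x30, which leaves counter 1 running and wipes counter 0. The fixed form yields 0x03, which disables only counter 1.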



Re: bsd.rd hangs on boot; bsd.mp works

2020-12-27 Thread Jonathan Gray
On Mon, Dec 28, 2020 at 04:31:09AM +0000, James Cook wrote:
> On Mon, Dec 28, 2020 at 01:19:19PM +1100, Jonathan Gray wrote:
> > On Sun, Dec 27, 2020 at 07:48:58PM +0000, James Cook wrote:
> > > > from your earlier bsd.mp dmesg:
> > > > 
> > > > cpu0: Enhanced SpeedStep 16268 MHz: speeds: 1601, 1600, 1500, 1400, 
> > > > 1300, 1200, 1100, 1000, 900, 800, 700, 600, 500, 400 MHz
> > > > 
> > > > 1601 is variable/turbo speed the others are fixed.  When running in
> > > > turbo mode getting the current frequency involves msrs which
> > > > hw.cpuspeed doesn't do. The initial 'x MHz:' is from cpuspeed not from
> > > > the acpi table.
> > > > 
> > > > On modern machines these values are from acpicpu(4)/acpi _PSS.
> > > > 
> > > > Can you show the output of running with the following diff to dump the
> > > > performance counter control values?
> > > > 
> > > > MSR_PERF_FIXED_CTR_CTRL 0x38d 
> > > > MSR_PERF_GLOBAL_CTRL 0x38f
> > > > 
> > > > on a broadwell laptop this shows
> > > > 
> > > > cpu0 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > > > cpu0 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > > > cpu0 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > > > cpu0 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > > > cpu1 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > > > cpu1 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > > > cpu1 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > > > cpu1 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > > > cpu2 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > > > cpu2 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > > > cpu2 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > > > cpu2 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > > > cpu3 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603
> > > > cpu3 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3
> > > > cpu3 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0
> > > > cpu3 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf
> > > 
> > > Here's dmesg with that patch.
> > 
> > Thanks, here is another diff to try.  It sets the bit to be enabled
> > not just for ring 0, fixes clearing a value and shows the values
> > read out of the counter.
> > 
> > Index: sys/arch/amd64/amd64/identcpu.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
> > retrieving revision 1.117
> > diff -u -p -r1.117 identcpu.c
> > --- sys/arch/amd64/amd64/identcpu.c	13 Sep 2020 05:57:28 -0000	1.117
> > +++ sys/arch/amd64/amd64/identcpu.c	28 Dec 2020 02:03:27 -0000
> > @@ -411,6 +411,9 @@ via_update_sensor(void *args)
> >  }
> >  #endif
> >  
> > +#define MSR_PERF_GLOBAL_STATUS 0x38e
> > +#define MSR_PERF_GLOBAL_INUSE 0x392
> > +
> >  uint64_t
> >  cpu_freq_ctr(struct cpu_info *ci)
> >  {
> > @@ -421,13 +424,22 @@ cpu_freq_ctr(struct cpu_info *ci)
> > CPUIDEDX_NUM_FC(cpu_perf_edx) <= 1)
> > return (0);
> >  
> > +   if ((cpu_perf_eax & CPUIDEAX_VERID) > 3) {
> > +   msr = rdmsr(MSR_PERF_GLOBAL_INUSE);
> > +   printf("%s %s MSR_PERF_GLOBAL_INUSE 0x%llx\n", ci->ci_dev->dv_xname,
> > +   __func__, msr);
> > +   }
> > +   msr = rdmsr(MSR_PERF_GLOBAL_STATUS);
> > +   printf("%s %s MSR_PERF_GLOBAL_STATUS before 0x%llx\n", ci->ci_dev->dv_xname,
> > +   __func__, msr);
> > +
> > msr = rdmsr(MSR_PERF_FIXED_CTR_CTRL);
> > if (msr & MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK)) {
> > /* some hypervisor is dicking us around */
> > return (0);
> > }
> >  
> > -   msr |= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_1);
> > +   msr |= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_ANY);
> > wrmsr(MSR_PERF_FIXED_CTR_CTRL, msr);
> >  
> > msr = rdmsr(MSR_PERF_GLOBAL_CTRL) | MSR_PERF_GLOBAL_CTR1_EN;
> > @@ -437,13 +449,20 @@ cpu_freq_ctr(struct cpu_info *ci)
> > delay(10);
> > count = rdmsr(MSR_PERF_FIXED_CTR1);
> >  
> > +   msr = rdmsr(MSR_PERF_GLOBAL_STATUS);
> > +   printf("%s %s MSR_PERF_GLOBAL_STATUS after 0x%llx\n", ci->ci_dev->dv_xname,
> > +   __func__, msr);
> > +
> >   

Re: Fwd: Re: Protectli FW1 with Intel 82583V - Interfaces errors and latency spike issue

2021-01-06 Thread Jonathan Gray
On Tue, Jan 05, 2021 at 10:28:20PM -1000, st...@wdwd.me wrote:
> I tested with a Protectli FW1 router (dmesg below) forwarding packets
> between two test machines. The latency spikes occur when running headless
> beginning with this commit:

As the interrupt is handled via msi it wouldn't be a shared interrupt
related problem.

Perhaps some drm kernel thread, but I can't think of anything that would
be doing work with no display connected.

Can you show 'vmstat -iz' output?

> 
> commit 78cd60329a9b42e2a8e91bb88c3d556b4e420e89
> Author: jsg 
> Date:   Mon Sep 9 01:35:43 2019 +0000
> 
> When no display outputs are connected on boot linux 4.19 drm relies on
> deferred setup to handle the console framebuffer whereas linux 4.4 drm
> created a 1024x768 console framebuffer in this situation.
> 
> As we only handle setting up rasops and wsdisplay on attach go back to
> the old behaviour for now so a display can be connected after booting
> with none attached to interact with the console.
> 
> This partly reverts linux commit
> drm/fb-helper: Support deferred setup
> ca91a2758fcef6635626993557dd51cfbb6dd134
> 
> Reported and tested by Marcus MERIGHI.
> Tested by and ok kettenis@
> 
> diff --git sys/dev/pci/drm/drm_fb_helper.c sys/dev/pci/drm/drm_fb_helper.c
> index 8e134187f6c..d95b8acbebd 100644
> --- sys/dev/pci/drm/drm_fb_helper.c
> +++ sys/dev/pci/drm/drm_fb_helper.c
> @@ -2002,15 +2002,23 @@ static int drm_fb_helper_single_fb_probe(struct
> drm_fb_helper *fb_helper,
> }
> 
> if (crtc_count == 0 || sizes.fb_width == -1 || sizes.fb_height ==
> -1) {
> +#ifdef __linux__
> DRM_INFO("Cannot find any crtc or sizes\n");
> 
> /* First time: disable all crtc's.. */
> -#ifdef notyet
> /* XXX calling this hangs boot with no connected outputs */
> if (!fb_helper->deferred_setup /* &&
> SPLAY_EMPTY(fb_helper->dev->files) */)
> restore_fbdev_mode(fb_helper);
> -#endif
> return -EAGAIN;
> +#else
> +   /*
> +* hmm everyone went away - assume VGA cable just fell out
> +* and will come back later.
> +*/
> +   DRM_INFO("Cannot find any crtc or sizes - going
> 1024x768\n");
> +   sizes.fb_width = sizes.surface_width = 1024;
> +   sizes.fb_height = sizes.surface_height = 768;
> +#endif
> }
> 
> /* Handle our overallocation */
> 
> 
> Here are the test results:
> 
> OpenBSD 6.5
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.444/0.892/2.103/0.200 ms
> 
> OpenBSD 6.6
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.371/2.625/173.884/13.270 ms
> 
> OpenBSD 6.6 with commit reversed
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.418/0.567/1.303/0.093 ms
> 
> OpenBSD 6.7
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.392/1.312/170.699/4.993 ms
> 
> OpenBSD 6.7 with commit reversed
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.397/0.545/1.263/0.071 ms
> 
> OpenBSD 6.8
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.402/1.144/97.022/2.396 ms
> 
> OpenBSD 6.8 with commit reversed
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.466/0.935/2.116/0.239 ms
> 
> OpenBSD 2021-01-03 snapshot
> --- 192.168.2.100 ping statistics ---
> 3600 packets transmitted, 3600 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.405/1.192/79.021/2.304 ms
> 
> 
> OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4164894720 (3971MB)
> avail mem = 4023623680 (3837MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xebfd0 (51 entries)
> bios0: vendor American Megatrends Inc. version "5.6.5" date 05/14/2019
> bios0: Protectli FW1
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG LPIT HPET SSDT SSDT SSDT UEFI
> acpi0: wakeup devices PS2K(S0) PS2M(S0) XHC1(S4) RP01(S4) PXSX(S4) RP02(S4)
> PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0

Re: A sparc oddity (hair-pulling bug)

2021-01-06 Thread Jonathan Gray
On Wed, Jan 06, 2021 at 07:14:08PM +0000, Miod Vallat wrote:
> I have been confused by a very strange issue on sparc64 over the last
> few days, and I can't figure out its cause.
> 
> What happens is that cold boots work, but warm boots (i.e. rebooting)
> almost always fail like this:
> 
> Rebooting with command: boot  
> Boot device: disk  File and args: 
> OpenBSD IEEE 1275 Bootblock 2.1
> ..>> OpenBSD BOOT 1.20
> Trying bsd...
> NOTE: random seed is being reused.
> Booting /sbus@1f,0/SUNW,fas@e,880/sd@0,0:a/bsd
> 9956944@0x100+4528@0x197ee50+191788@0x1c0+4002516@0x1c2ed2c 
> symbols @ 0xffde0400 476705+165+641112+443003 start=0x100
> [ using 1562024 bytes of bsd ELF symbol table ]
> Fast Data Access MMU Miss
> ok 
> 
> However, sometimes, a bsd.rd -> bsd transition works (i.e. boot bsd.rd,
> install or upgrade or even do nothing, then reboot to bsd). But most of
> the time it fails in the same way. And once it has failed, there does
> not seem to be a way to boot a kernel without having to poweroff
> (reset-all will not help).
> 
> At first I thought this was a subtle relinking problem, but it isn't.
> From the prom, I have been able to get this trace:
>   mtx_enter+0x58
>   msgbuf_putchar+0x2c
>   initmsgbuf+0x80
>   pmap_bootstrap+0x140
>   bootstrap+0x18c
> 
> This is on an Ultra 1, thus a single-processor machine. The code for
> mtx_enter() is:
> 
>   void
>   mtx_enter(struct mutex *mtx)
>   {
>   struct cpu_info *ci = curcpu();
> 
>   /* Avoid deadlocks after panic or in DDB */
>   if (panicstr || db_active)
>   return;
> 
>   WITNESS_CHECKORDER(MUTEX_LOCK_OBJECT(mtx),
>   LOP_EXCLUSIVE | LOP_NEWORDER, NULL);
> 
>   #ifdef DIAGNOSTIC
>   if (__predict_false(mtx->mtx_owner == ci))
>   panic("mtx %p: locking against myself", mtx);
>   #endif
> 
>   if (mtx->mtx_wantipl != IPL_NONE)
>   mtx->mtx_oldipl = splraise(mtx->mtx_wantipl);
> 
>   mtx->mtx_owner = ci;
> 
>   #ifdef DIAGNOSTIC
>   ci->ci_mutex_level++;
>   #endif
>   WITNESS_LOCK(MUTEX_LOCK_OBJECT(mtx), LOP_EXCLUSIVE);
>   }
> 
> and the "Fast Data Access MMU Miss" occurs on the
>   ci->ci_mutex_level++;
> line.
> 
> It turns out that, being a single-processor kernel, ci == CPUINFO_VA ==
> 0xe0018000 (KERNEND + 64KB + 32KB).
> 
> And the prom tells me that:
> 
>   ok e0018000 map?
>   VA:e0018000 
>   G:0 W:0 P:0 E:0 CV:0 CP:0 L:0 Soft1:0 PA[40:13]:0 PA:0 
>   Diag:0 Soft2:0 IE:0 NFO:0 Size:0 V:0 
>   Invalid
> 
> Is this a PROM mapping which got invalidated by mistake, or a mapping
> which ought to have been set up by the boot blocks but is no longer set
> up correctly? I see no obvious change to blame about this in the last
> few releases.
> 
> Any ideas on where to look or what to try to get to understand that
> problem better?

I saw something like this on a V120.

Booting /pci@1f,0/pci@1/scsi@8/disk@0,0:a/bsd.1105
9901216@0x100+2912@0x19714a0+191348@0x1c0+4002956@0x1c2eb74
symbols @ 0xfee82400 479089+165+641136+442948 start=0x100
[ using 1564376 bytes of bsd ELF symbol table ]
Fast Data Access MMU Miss
ok

instead of
9910440@0x100+1880@0x19738a8+188572@0x1c0+4005732@0x1c2e09c
symbols @ 0xfee80400 481463+165+641016+442891 start=0x100
[ using 1566568 bytes of bsd ELF symbol table ]
console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
...

(This occurred with bsd; it did not occur with bsd.rd.)

Booting a known good kernel was not enough to clear this state
or even reset-all at the ok prompt.  I had to do power-off at
ok prompt and poweron at lom prompt.

I think the window in which this started to occur is something like:

bad
OpenBSD 6.8-current (GENERIC) #510: Thu Oct 29 19:58:32 MDT 2020

good
OpenBSD 6.8-current (GENERIC) #508: Thu Oct 29 06:05:29 MDT 2020



Re: Stopped at tsc_delay+0x63: lfence

2021-01-27 Thread Jonathan Gray
On Wed, Jan 27, 2021 at 07:11:49AM +0100, alf wrote:
> Hello,
> 
> while trying to upgrade one of our machines to 6.8 we experienced a
> repeatable crash while booting (bsd.rd + install went fine).
> 
> The machine in question is a:
> ...
> hw.vendor=HP
> hw.product=ProLiant DL360 G7
> hw.serialno=CZ3451KJW6
> hw.uuid=3637-3738-435a-3334-35314b4a5736
> hw.physmem=8562860032
> hw.usermem=8562847744
> hw.ncpufound=12
> hw.allowpowerdown=1
> hw.perfpolicy=manual
> hw.smt=0
> hw.ncpuonline=6
> ...
> 
> Since this is a production machine we downgraded to 6.7 (upgrade from
> 6.6 which it was running before went flawlessly).
> 
> Find below the dmesg of the 6.8 kernel, 6.8-current and finally the
> 6.7 kernel. For the 6.8* I also provided 'trace' and 'show registers'
> output.
> 
> I hope this is enough info to get an idea of what was going on.
> I'll happily will provide additional info if needed.
> 
> Alf
> 
> cpu0: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, 2667.08 MHz, 06-2c-02
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN

> initializing kernel modesetting (RV100 0x1002:0x515E 0x103C:0x31FB 0x02).
> NMI ... going to debugger
> Stopped at  tsc_delay+0x63: lfence
> ddb{0}> trace
> tsc_delay(1) at tsc_delay+0x63
> r100_ring_test(801a4000,801a5858) at r100_ring_test+0x277
> r100_cp_init(801a4000,10) at r100_cp_init+0x5a1
> r100_startup(801a4000) at r100_startup+0x535
> r100_init(801a4000) at r100_init+0x4ac
> radeon_device_init(801a4000,80196800,80196840,840001) at radeon_device_init+0x944
> radeondrm_attachhook(801a4000) at radeondrm_attachhook+0x36
> config_process_deferred_mountroot() at config_process_deferred_mountroot+0x6b
> main(0) at main+0x723
> end trace frame: 0x0, count: -9

I don't understand why an lfence would cause an nmi.

Does it still occur with the below diff to change lfence;rdtsc to rdtscp?
This requires RDTSCP which your machine has but bluhm's machine does not.

Perhaps it is related to some kind of watchdog timer?  Can you check if
the ilo event log has any relevant information?

Index: sys/arch/amd64/include/cpufunc.h
===
RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
retrieving revision 1.36
diff -u -p -r1.36 cpufunc.h
--- sys/arch/amd64/include/cpufunc.h	13 Sep 2020 11:53:16 -0000	1.36
+++ sys/arch/amd64/include/cpufunc.h	28 Jan 2021 00:47:16 -0000
@@ -307,7 +307,8 @@ rdtsc_lfence(void)
 {
uint32_t hi, lo;
 
-   __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
+// __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
+   __asm volatile("rdtscp" : "=d" (hi), "=a" (lo) :: "ecx");
return (((uint64_t)hi << 32) | (uint64_t) lo);
 }
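
For testing outside the kernel, the two TSC read variants from the diff can be sketched as standalone userland functions. The function names are hypothetical, the non-x86 branch is only a stub so the file still compiles elsewhere, and the rdtscp variant assumes the CPU advertises RDTSCP.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Read the TSC after an lfence, as the existing rdtsc_lfence() does. */
static inline uint64_t
tsc_read_lfence(void)
{
	uint32_t hi, lo;

	__asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
	return (((uint64_t)hi << 32) | (uint64_t)lo);
}

/*
 * Read the TSC with rdtscp, which waits for earlier instructions to
 * execute and also writes IA32_TSC_AUX into ecx (hence the clobber).
 * Only call this when the CPU reports RDTSCP.
 */
static inline uint64_t
tsc_read_rdtscp(void)
{
	uint32_t hi, lo;

	__asm volatile("rdtscp" : "=d" (hi), "=a" (lo) : : "ecx");
	return (((uint64_t)hi << 32) | (uint64_t)lo);
}
#else
/* Stubs for other architectures: no TSC, always return 0. */
static inline uint64_t tsc_read_lfence(void) { return 0; }
static inline uint64_t tsc_read_rdtscp(void) { return 0; }
#endif
```

Calling either function twice in a row on the same CPU should give a second value that is not smaller than the first. That makes a quick sanity check before timing anything with them.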
 



Re: Stopped at tsc_delay+0x63: lfence

2021-01-27 Thread Jonathan Gray
On Thu, Jan 28, 2021 at 03:02:39AM +0000, Stuart Henderson wrote:
> On 2021/01/28 12:21, Jonathan Gray wrote:
> > > NMI ... going to debugger
> > > Stopped at  tsc_delay+0x63: lfence
> > > ddb{0}> trace
> > > tsc_delay(1) at tsc_delay+0x63
> > > r100_ring_test(801a4000,801a5858) at r100_ring_test+0x277
> > > r100_cp_init(801a4000,10) at r100_cp_init+0x5a1
> > > r100_startup(801a4000) at r100_startup+0x535
> > > r100_init(801a4000) at r100_init+0x4ac
> > > > radeon_device_init(801a4000,80196800,80196840,840001) at radeon_device_init+0x944
> > > radeondrm_attachhook(801a4000) at radeondrm_attachhook+0x36
> > > > config_process_deferred_mountroot() at config_process_deferred_mountroot+0x6b
> > > main(0) at main+0x723
> > > end trace frame: 0x0, count: -9
> > 
> > I don't understand why an lfence would cause an nmi.
> 
> I was thinking that it might not be the lfence triggering it but
> something that happened just before connected with the video init,
> and it's just that the tsc_delay/lfence is what's running when it hit ..

Right, as cheloha noted, the other trace points at rdtsc.

> 
> > Does it still occur with the below diff to change lfence;rdtsc to rdtscp?
> > This requires RDTSCP which your machine has but bluhm's machine does not.
> > 
> > Perhaps it is related to some kind of watchdog timer?  Can you check if
> > the ilo event log has any relevant information?
> > 
> > Index: sys/arch/amd64/include/cpufunc.h
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
> > retrieving revision 1.36
> > diff -u -p -r1.36 cpufunc.h
> > --- sys/arch/amd64/include/cpufunc.h13 Sep 2020 11:53:16 -  
> > 1.36
> > +++ sys/arch/amd64/include/cpufunc.h28 Jan 2021 00:47:16 -
> > @@ -307,7 +307,8 @@ rdtsc_lfence(void)
> >  {
> > uint32_t hi, lo;
> >  
> > -   __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
> > +// __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
> > +   __asm volatile("rdtscp" : "=d" (hi), "=a" (lo) :: "ecx");
> > return (((uint64_t)hi << 32) | (uint64_t) lo);
> >  }
> >  
> > 
> 
> 



Re: Packages missing

2021-01-31 Thread Jonathan Gray
On Sun, Jan 31, 2021 at 09:25:43AM +0100, Alessandro Ricci wrote:
> It appears that at least two binary packages are missing.
> I'm talking about games/nblood and games/eduke32.
> pkg_info displays nothing; I'm running a recent snapshot, amd64.
> I checked cdn.openbsd.org and a few other mirrors, inside packages/
> and packages-stable/ under different archs (amd64, i386) and they
> aren't there.
> Ports are present, are discussed in mailing lists, are also listed at
> www.openports.pl.
> Am I wrong?
> Could be other packages missing?
> Thanks.

Packages for eduke32 and nblood are not on the mirrors because both ports have

PERMIT_PACKAGE ="BUILD engine license is not compatible with GPLv2."



Re: radeondrm AMD A6-3410MX APU with Radeon HD

2021-02-12 Thread Jonathan Gray
On Sat, Feb 13, 2021 at 05:24:17AM +0200, V S wrote:
> > Synopsis:  system lockup during drm stage
> > Category:   radeondrm
> > Environment:
> 
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9-beta (GENERIC.MP) #331: Thu Feb 11 20:28:45 
> MST 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> > Description:
>   During booting after radeondrm gets loaded screen goes black following
> the 100% cpu FAN sound, suspecting kernel panic.

Are you able to connect an external display to confirm that is the case?
Failing that, or a serial console, it should also be possible to stop the
kernel entering ddb on panic and get a trace via /var/crash as described
in crash(8).

As this likely occurs before the rc script runs you will need to use a
kernel with the following diff or

boot> boot -d
ddb> w db_panic 0
ddb> c

Index: sys/kern/subr_prf.c
===
RCS file: /cvs/src/sys/kern/subr_prf.c,v
retrieving revision 1.102
diff -u -p -r1.102 subr_prf.c
--- sys/kern/subr_prf.c 28 Nov 2020 17:53:05 -  1.102
+++ sys/kern/subr_prf.c 13 Feb 2021 06:08:57 -
@@ -104,7 +104,7 @@ const   char *faultstr; /* page fault stri
 /*
  * Enter ddb on panic.
  */
-int	db_panic = 1;
+int	db_panic = 0;
 
 /*
  * db_console controls if we can be able to enter ddb by a special key



Re: radeondrm AMD A6-3410MX APU with Radeon HD

2021-02-13 Thread Jonathan Gray
On Sat, Feb 13, 2021 at 10:24:42AM +0200, V S wrote:
> Please have in mind that I have retyped these by hand from the screen and the 
> first example might
> not be acurate as I've retyped it of a blurry photo I made. But the other 
> reboot attempts using
> 
> boot> boot -d
> ddb> w db_panic 0
> ddb> c
> 
> got different messages on screen some of them might be more informant than 
> others as the last one of these three
> only had very few whitebackground places at ' [s]c[]p[]i[]ew[]s[]kb[]d[]0a ' 
> (substituting with [] for those places
> 
> And I also made photos of these in case those could be more useful than this. 
> And I'm writing these because there
> appears to only be a single file in /var/crash thats called minfree 
> containing text 4096.
> So in the time being this all I have. I could reproduce some more of these 
> messages assuming they're useful. I'll read
> these man pages related to gathering useful information in such cases for 
> now. Unless you've got other suggestions

If you can see a stack trace on the screen the first few lines of
at func_one+0x...
at func_two+0x...

or a photo would help.

I'm surprised the stack trace shows on the screen with db_panic 0,
given that you mentioned it previously booted to a black screen.

> 
> 
> 0)r a t ondrm0: 1280x768, 32bpp
> wsdisplay at radeondrm0 mux 1: consol e  ( s t d,   vh an le r u+l0axtfif n
> ), using wskbd0
> acpi_thread(8002d170) at wsdisplay0: screen 1-5 added (std, vt100 
> emulation)
> acr i d t hore d r + 0 x 1 b 8 R
> KeSn
> d trace frame: 0x0, count: 253
> End of stack trace.
> syncing disks... done
> 
> dumping to dev 4,1 offset 492575
> 
> 
> 
> 
> 
> //-
> 
> 
> radeondrm0: 1280x768, 32bpp
> awcspdiiesp _ta py n a d l e a de o nd
>  0 mux 1: console (std, vt100 emulation), using wskbd0
> acpi_thread(8002d170) at wsdisplay0: screen 1-5 added (std, vt100 
> acol_ t h r e a
> +0x1r a   d
> a n d r m t : oURa
> g: 0x0, count: 253n
> End of stack trace.
> syncing disks... done
> 
> dumping to dev 4,1 offset 492575
> dump
> 
> 
> 
> 
> 
> 
> 
> 
> //
> 
> 
> radeondrm0: 1280x768, 32bpp
> wsdisplay0 at radeon drm0 mux 1: console (std, vt100 emulation), a sc p i ew 
> s kb d 0a
> ndler(800028400,16,8013b000) at acpiec_gpehandler+0xff
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> radeondrm1: TURKS
> acpi_thread(8002d170) at acpi_thread+0x1b8
> end trace frame: 0x0, count: 253
> End of stack trace.
> syncing disks... done
> 
> dumping to dev 4,1 offset 492575
> dump
> 
> > > Are you able to connect an external display to confirm that is the case?
> > > Failing that or serial it should also be possible to disable panic and
> > > get a trace via /var/crash as described in crash(8).
> > > 
> > > As this likely occurs before the rc script runs you will need to use a
> > > kernel with the following diff or
> > > 
> > > boot> boot -d
> > > ddb> w db_panic 0
> > > ddb> c
> > > 
> > > Index: sys/kern/subr_prf.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/subr_prf.c,v
> > > retrieving revision 1.102
> > > diff -u -p -r1.102 subr_prf.c
> > > --- sys/kern/subr_prf.c   28 Nov 2020 17:53:05 -  1.102
> > > +++ sys/kern/subr_prf.c   13 Feb 2021 06:08:57 -
> > > @@ -104,7 +104,7 @@ const char *faultstr; /* page fault stri
> > >   /*
> > >* Enter ddb on panic.
> > >*/
> > > -int  db_panic = 1;
> > > +int  db_panic = 0;
> > >   /*
> > >* db_console controls if we can be able to enter ddb by a special key
> 
> 



Re: radeondrm AMD A6-3410MX APU with Radeon HD

2021-02-13 Thread Jonathan Gray
On Sat, Feb 13, 2021 at 03:01:21PM +0200, V S wrote:
> 
> > If you can see a stack trace on the screen the first few lines of
> > at func_one+0x...
> > at func_two+0x...
> > 
> > or a photo would help.
> > 
> > I'm surprised the stack trace shows on the screen with db_panic 0,
> > given that you mentioned it previously booted to a black screen.
> > 
> > 
> 
> 
> The output on the screen seems to be different each time, but repeatedly
> providing pieces of similar
> information and seemingly randomly some text is distorted with those white
> areas which are not
> always in perfect shape, but I include this URL below which has a bit more
> consistent piece of
> text that fits your described pattern. So previous 3 examples were done on
> yesterdays snapshot,
> this one after getting another snapshot upgrade an hour or so ago.
> 
> 
> I'm trying to grasp for how could I provide more information consulting
> someone on IRC in offtopic
> channel as I'm also reading man pages for crash and related. And as crash(8)
> provides an example
> of looking for information with pattern "function+04711" or similar,
> something like that is always visible
> on the screen with your suggested boot options.
> 
> 
> Disabling swap.encrypt does not seem to aid in genrating anything at
> /car/crash. But to follow with the
> example I see that I need to get sources and do stuff that's beyond me
> especially while any building
> on that machine takes centuries. SO I'll continue reading these man pages
> and try to understand how
> exactly to get more information on this out of the system.
> 
> 
> I could reinstall back to 6.8 unsure whether that or these snapshots use
> would be more useful in solving
> this. I'll just keep it on snapshots for now. Also I forgot to clarify that
> connecting external monitor when
> screen is blacked out results in just same black screen on that external
> monitor, with not responsive
> keyboard, ctrl_alt_esc does not do anything and I was also trying to adjust
> ddb.console to 1
> 
> that still did not let ctrl_alt_esc to happen from the point which can be
> seen in the picture.
> 
> 
> P.S. I havent really used mailing lists before and previously I've attempted
> to send this with an image
> attachment, but havent noticed it being delivered for a while now and
> someone on #openbsd told me
> it might be better to use an upload service, so I'm resending this with an
> URL instead of attachment.
> 
> https://0x0.st/-XcC.jpg

The 'KASSERT(sc->sc_ecbusy == 0)' in acpiec_gpehandler() fails.

Can you try disabling acpivideo instead of radeondrm?



Re: firefox pledge violation

2021-02-19 Thread Jonathan Gray
On Fri, Feb 19, 2021 at 11:31:25AM +0100, Landry Breuil wrote:
> On Fri, Feb 19, 2021 at 11:17:35AM +0100, Martin Pieuchot wrote:
> > Firefox from -current, tab crashes, kernel says:
> > 
> > firefox[86270]: pledge "", syscall 289
> 
> maybe the drm update triggers again the codepaths leading to shm calls
> prevented by pledge/unveil ?
> try fiddling with the various knobs forcing/disabling acceleration ?
> depends on the gfx chipset ?

$ LIBGL_ALWAYS_SOFTWARE=true firefox
go to https://get.webgl.org/
firefox[71928]: pledge "", syscall 289

$ LIBGL_ALWAYS_SOFTWARE=true chrome
chrome[4]: pledge "", syscall 289
chrome[85649]: pledge "", syscall 289
chrome[23547]: pledge "", syscall 289

> 
> anyway, nothing i can do here, i'm just the guy pushing random buttons
> to have updates.
> 
> you know better than me how to debug that :)
> 
> 



Re: heads up: amdgpu_gem_userptr_ioctl: stub

2021-02-20 Thread Jonathan Gray
On Sat, Feb 20, 2021 at 02:13:26PM +0100, Benjamin Baier wrote:
> This is just a heads up, that i'm hitting a stubbed out function.
> 
> While testing a port update (love2d) 
> Not quite sure how, the only GEM related ioctl, that is called,
> is DRM_IOCTL_GEM_CLOSE according to ktrace.

The userptr ioctls are known not to be implemented.  I've removed the
warning which radeondrm and inteldrm didn't have in the same place.

Thanks for the report.



Re: iostream headers and _POSIX_C_SOURCE

2021-03-09 Thread Jonathan Gray
On Tue, Mar 09, 2021 at 01:41:25AM -0700, Anthony J. Bentley wrote:
> Hi,
> 
> When updating a port I came across a weird compile error within iostream
> if _POSIX_C_SOURCE is set. 200112 and 200809 both error out in different
> ways.

This looks like the same problem as
https://marc.info/?l=openbsd-bugs&m=157758838031146&w=2

> 
> $ cat foo.cc
> #include 
> int main() {}
> $ clang++ -D_POSIX_C_SOURCE=200112 foo.cc
> In file included from foo.cc:1:
> In file included from /usr/include/c++/v1/iostream:37:
> In file included from /usr/include/c++/v1/ios:215:
> In file included from /usr/include/c++/v1/__locale:32:
> In file included from /usr/include/c++/v1/support/newlib/xlocale.h:25:
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:23:64: error: 
> unknown
>   type name 'locale_t'
> char **endptr, locale_t) {
>^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:28:65: error: 
> unknown
>   type name 'locale_t'
>  char **endptr, locale_t) {
> ^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:33:71: error: 
> unknown
>   type name 'locale_t'
>   ...char **endptr, locale_t) {
> ^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:38:54: error: 
> unknown
>   type name 'locale_t'
> strtoll_l(const char *nptr, char **endptr, int base, locale_t) {
>  ^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:43:55: error: 
> unknown
>   type name 'locale_t'
> strtoull_l(const char *nptr, char **endptr, int base, locale_t) {
>   ^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:48:60: error: 
> unknown
>   type name 'locale_t'
> wcstoll_l(const wchar_t *nptr, wchar_t **endptr, int base, locale_t) {
>^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:53:61: error: 
> unknown
>   type name 'locale_t'
> wcstoull_l(const wchar_t *nptr, wchar_t **endptr, int base, locale_t) {
> ^
> /usr/include/c++/v1/support/xlocale/__strtonum_fallback.h:58:74: error: 
> unknown
>   type name 'locale_t'
>   ...wchar_t **endptr, locale_t) {
>^
> In file included from foo.cc:1:
> In file included from /usr/include/c++/v1/iostream:37:
> In file included from /usr/include/c++/v1/ios:215:
> /usr/include/c++/v1/__locale:54:25: error: unknown type name 'locale_t'
>   __libcpp_locale_guard(locale_t& __loc) : __old_loc_(uselocale(__loc)) {}
> ^
> /usr/include/c++/v1/__locale:62:3: error: unknown type name 'locale_t'
>   locale_t __old_loc_;
>   ^
> /usr/include/c++/v1/__locale:132:20: error: use of undeclared identifier
>   'LC_COLLATE_MASK'
> collate  = LC_COLLATE_MASK,
>^
> /usr/include/c++/v1/__locale:133:20: error: use of undeclared identifier
>   'LC_CTYPE_MASK'
> ctype= LC_CTYPE_MASK,
>^
> /usr/include/c++/v1/__locale:134:20: error: use of undeclared identifier
>   'LC_MONETARY_MASK'
> monetary = LC_MONETARY_MASK,
>^
> /usr/include/c++/v1/__locale:135:20: error: use of undeclared identifier
>   'LC_NUMERIC_MASK'
> numeric  = LC_NUMERIC_MASK,
>^
> /usr/include/c++/v1/__locale:136:20: error: use of undeclared identifier
>   'LC_TIME_MASK'
> time = LC_TIME_MASK,
>^
> /usr/include/c++/v1/__locale:137:20: error: use of undeclared identifier
>   'LC_MESSAGES_MASK'
> messages = LC_MESSAGES_MASK,
>^
> /usr/include/c++/v1/__locale:349:5: error: unknown type name 'locale_t'; did 
> you
>   mean 'locale'?
> locale_t __l;
> ^
> /usr/include/c++/v1/__locale:121:24: note: 'locale' declared here
> class _LIBCPP_TYPE_VIS locale
>^
> /usr/include/c++/v1/__locale:368:5: error: unknown type name 'locale_t'; did 
> you
>   mean 'locale'?
> locale_t __l;
> ^
> /usr/include/c++/v1/__locale:121:24: note: 'locale' declared here
> class _LIBCPP_TYPE_VIS locale
>^
> /usr/include/c++/v1/__locale:740:5: error: unknown type name 'locale_t'; did 
> you
>   mean 'locale'?
> locale_t __l;
> ^
> /usr/include/c++/v1/__locale:121:24: note: 'locale' declared here
> class _LIBCPP_TYPE_VIS locale
>^
> fatal error: too many errors emitted, stopping now [-ferror-limit=]
> 20 errors generated.
> 
> $ clang++ -D_POSIX_C_SOURCE=200809 foo.cc 
> In file included from foo.cc:1:
> In file included from /usr/include/c++/v1/iostream:39:
> In file included from /usr/include/c++/

Re: AMD Ryzen based Asus ZENBOOK 14 UM433DA-PURE4 14" panic at first boot post install - how to debug further

2021-04-29 Thread Jonathan Gray
On Thu, Apr 29, 2021 at 10:27:49PM +0200, Peter Nicolai Mathias Hansteen wrote:
> I just spent the evening trying to work around an odd error that happens 
> after an apparently straightforward install on a new laptop.
> 
> The most useful info I can offer is that the install proceeds with no 
> complaints, but on first boot this happens:
> 
> 
> 
> 
> Followed by
> 
> 
> 
> This is a modern laptop so things such as serial consoles are not easily 
> available.
> 
> How do I go about supplying useful information here? I tried but failed to 
> collect useful things such as dmesg (trying to write to the install USB only 
> corrupts and so forth).
> 
> The store’s return policy is friendly enough that I can have this one lying 
> around at least a few days (actually 60 but I suspect the lady of the house 
> will not be quite as accommodating)
> 
> All the best,
> Peter
> 
> PS in case the attachments do not survive the pics are also up at 
> https://www.bsdly.net/~peter/20210429_190606.jpg 
>  and 
> https://www.bsdly.net/~peter/20210429_190645.jpg 
> 
> 

acpicpu0 at acpi0kernel: integer divide fault trap, code=0
Stopped at acpi_gasio+0x36: idivl %r8d,%eax
acpi_gasio(x,1,0,0.0.0) at acpi_gasio+0x36
acpi_write_pmreg(x,2,0,3,x,3) at acpi_write_pmreg+0xba
acpi_write_pmreg(x,10,0,3,x,x) at acpi_write_pmreg+0x18f
acpicpu_attach(...) at acpicpu_attach+0x1c3

acpi_gasio+0x36 is /usr/src/sys/dev/acpi/acpi.c line 265

acpi_gasio(struct acpi_softc *sc, int iodir, int iospace, uint64_t address,
int access_size, int len, void *buffer)
..
KASSERT((len % access_size) == 0);

acpi_write_pmreg+0xba is /usr/src/sys/dev/acpi/acpi.c line 1541

	/*
	 * For Hardware-reduced ACPI we also emulate PM1A_CNT using
	 * SLEEP_CONTROL_REG.
	 */
	if (sc->sc_hw_reduced && reg == ACPIREG_PM1A_CNT) {
		uint8_t value = (regval >> 8);

		KASSERT(offset == 0);
		acpi_gasio(sc, ACPI_IOWRITE,
		    sc->sc_fadt->sleep_control_reg.address_space_id,
		    sc->sc_fadt->sleep_control_reg.address,
		    sc->sc_fadt->sleep_control_reg.register_bit_width / 8,
		    sc->sc_fadt->sleep_control_reg.access_size, &value);
		return;
	}

If you can build a kernel on another machine try

Index: sys/dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.397
diff -u -p -r1.397 acpi.c
--- sys/dev/acpi/acpi.c 15 Mar 2021 22:44:57 -  1.397
+++ sys/dev/acpi/acpi.c 30 Apr 2021 01:57:00 -
@@ -262,6 +262,11 @@ acpi_gasio(struct acpi_softc *sc, int io
dnprintf(50, "gasio: %.2x 0x%.8llx %s\n",
iospace, address, (iodir == ACPI_IOWRITE) ? "write" : "read");
 
+   if (access_size == 0) {
+   printf("%s: invalid size 0\n", DEVNAME(sc));
+   return -1;
+   }
+
KASSERT((len % access_size) == 0);
 
pb = (uint8_t *)buffer;
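
To illustrate why the check has to sit in front of the KASSERT, here is a
small standalone sketch.  toy_gasio_check() and toy_access_size() are
hypothetical stand-ins, not kernel functions, and the idea that the FADT
on this laptop reports a register_bit_width below 8 is only an assumption;
nothing in the report confirms the actual value.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the start of acpi_gasio(): reject an
 * access_size of 0 before it is used as a divisor.
 * Returns -1 on a zero size, 0 otherwise.
 */
static int
toy_gasio_check(int access_size, int len)
{
	if (access_size == 0) {
		fprintf(stderr, "toy_gasio_check: invalid size 0\n");
		return -1;
	}

	/* Only safe once access_size is known to be non-zero. */
	assert((len % access_size) == 0);
	return 0;
}

/*
 * The caller passes register_bit_width / 8 in the access_size slot.
 * Integer division truncates, so any width below 8 bits yields 0.
 */
static int
toy_access_size(int register_bit_width)
{
	return register_bit_width / 8;
}
```

With the guard in place, toy_gasio_check(toy_access_size(0), 1) returns -1
instead of reaching the modulo.  Without it, the `%` is the instruction
that raises the integer divide fault.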



Re: AMD Ryzen based Asus ZENBOOK 14 UM433DA-PURE4 14" panic at first boot post install - how to debug further

2021-04-30 Thread Jonathan Gray
On Fri, Apr 30, 2021 at 01:51:11PM +0200, Peter N. M. Hansteen wrote:
> On Fri, Apr 30, 2021 at 12:00:05PM +1000, Jonathan Gray wrote:
> > 
> > If you can build a kernel on another machine try
> > 
> > Index: sys/dev/acpi/acpi.c
> > ===
> > RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
> > retrieving revision 1.397
> > diff -u -p -r1.397 acpi.c
> > --- sys/dev/acpi/acpi.c 15 Mar 2021 22:44:57 -  1.397
> > +++ sys/dev/acpi/acpi.c 30 Apr 2021 01:57:00 -
> > @@ -262,6 +262,11 @@ acpi_gasio(struct acpi_softc *sc, int io
> > dnprintf(50, "gasio: %.2x 0x%.8llx %s\n",
> > iospace, address, (iodir == ACPI_IOWRITE) ? "write" : "read");
> >  
> > +   if (access_size == 0) {
> > +   printf("%s: invalid size 0\n", DEVNAME(sc));
> > +   return -1;
> > +   }
> > +
> > KASSERT((len % access_size) == 0);
> >  
> > pb = (uint8_t *)buffer;
> 
> I'm building with that now on the older machine. I wonder, is this change
> small and non-intrusive enough that we could hope it makes it into an amd64
> snapshot soon?
> 
> (I fully appreciate why developers want faster machines :))
> 
> - Peter

If you can boot bsd.rd you should be able to mount / and fetch a kernel
over http without having to build a snapshot.

Or perhaps boot -c and disable acpicpu* if that doesn't break anything.



Re: 6.9-stable drops to ddb on 2nd reboot

2021-05-04 Thread Jonathan Gray
On Tue, May 04, 2021 at 10:27:05AM +0200, Matthias Pressfreund wrote:
> >Environment:
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9 (GENERIC.MP) #473: Mon Apr 19 10:40:28 MDT 
> 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   After the first boot into a fresh 6.9-stable installation succeeded, 
> the system dropped to ddb from the 2nd reboot on. I was connected through the 
> serial console only, no monitor connected ...
> 
> 
> 2nd reboot:
> -
> >> OpenBSD/amd64 BOOT 3.53
> boot>
> booting sr0a:/bsd: 14419240+3220496+344096+0+1175552 
> [1011652+128+1145856+866050]=0x152a750
> entry point at 0x81001000
> [ using 3024720 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2021 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 6.9 (GENERIC.MP) #473: Mon Apr 19 10:40:28 MDT 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 410880 (3918MB)
> avail mem = 396832 (3785MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xecef0 (51 entries)
> bios0: vendor American Megatrends Inc. version "D7530A09" date 06/18/2019
> bios0: MiTAC PD14RI
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT SSDT UEFI LPIT CSRT
> acpi0: wakeup devices PS2K(S0) PS2M(S0) BRCM(S0) XHC1(S4) HDEF(S4) RP01(S4) 
> PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Pentium(R) CPU N3700 @ 1.60GHz, 1600.45 MHz, 06-4c-03
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu0: 1MB 64b/line 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 80MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Pentium(R) CPU N3700 @ 1.60GHz, 1599.97 MHz, 06-4c-03
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu1: 1MB 64b/line 16-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Pentium(R) CPU N3700 @ 1.60GHz, 1599.97 MHz, 06-4c-03
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu2: 1MB 64b/line 16-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Pentium(R) CPU N3700 @ 1.60GHz, 1599.97 MHz, 06-4c-03
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu3: 1MB 64b/line 16-way L2 cache
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 115 pins
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xe000, bus 0-255
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (RP01)
> acpiprt2 at acpi0: bus 2 (RP02)
> acpiprt3 at acpi0: bus -1 (RP03)
> acpiprt4 at acpi0: bus -1 (RP04)
> acpiec0 at acpi0: not present
> acpicmos0 at acpi0
> acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
> "BCM43241" at acpi0 not configured
> acpibtn0 at acpi0: SLPB
> "PNP0C0B" at acpi0 not configured
> acpicpu0 at acpi0: C2(10@500 mwait.1@0x58), C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: C2(10@500 mwait.1@0x58), C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: C2(10@500 mwait.1@0x58), C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: C2(10@500 mwait.1@0x58), C1

Re: inteldrm: cleanup_done timed out with external monitor

2019-05-27 Thread Jonathan Gray
On Mon, May 27, 2019 at 10:22:12PM +0200, Sven M. Hallberg wrote:
> Sven M. Hallberg on Fri, May 24 2019:
> >>Fix:
> > ?
> 
> I discovered the immediate culprit to be these lines in
> /sys/dev/pci/drm/i915/intel_display.c:
> 
> static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> {
> ...
> 
> /*
>  * Defer the cleanup of the old state to a separate worker to not
>  * impede the current task (userspace for blocking modesets) that
>  * are executed inline. For out-of-line asynchronous modesets/flips,
>  * deferring to a new worker seems overkill, but we would place a
>  * schedule point (cond_resched()) here anyway to keep latencies
>  * down. 
>  */
> INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
> schedule_work(&state->commit_work);
> }
> 
> The comment seems to indicate that the cleanup work is expected to be
> performed (almost) immediately as if called directly. So, as a
> workaround, replacing the last line with the direct call
> 
> intel_atomic_cleanup_work(&state->commit_work);
> 
> makes things work fine for me.
> 
> As for when and why the scheduled tasks do or don't run, my best theory
> is that there is a subtle difference between Linux workqueues and BSD
> taskqs (which are used to emulate the former here).
> Also, the "commits" where cleanup is delayed happen to be those which
> perform actual modesetting. This could of course be pure coincidence.

The timing is apparently tight in Linux as well: in later versions
they switched to using 'system_highpri_wq'.  INIT_WORK/schedule_work
results in the task being added to 'system_wq', which for us is a drm
specific taskq.

If we just created another taskq for 'system_highpri_wq' the same thing
would likely happen.

Linux deferred cleanup to avoid lock contention/deadlock.  If you
raise the timeout in drm_atomic_helper.c stall_checks() does that
also avoid the 'cleanup_done timed out' messages at some point?
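
As a rough model of the difference between the deferred cleanup and the
direct call used in the workaround, here is a single-threaded toy queue.
Every toy_* name is invented for this sketch; it is not the drm taskq
code and does not model priorities or the timeout in stall_checks().

```c
#include <assert.h>
#include <stddef.h>

struct toy_work {
	void (*func)(struct toy_work *);
	struct toy_work *next;
};

struct toy_state {
	struct toy_work commit_work;	/* first member, see cast below */
	int cleanup_done;
};

static struct toy_work *toy_head;
static struct toy_work **toy_tailp = &toy_head;

/* Queue the work item; nothing runs until toy_run_queue() is called. */
static void
toy_schedule_work(struct toy_work *w)
{
	assert(w->func != NULL);
	w->next = NULL;
	*toy_tailp = w;
	toy_tailp = &w->next;
}

/* Drain the queue in FIFO order, like a worker finally getting CPU. */
static void
toy_run_queue(void)
{
	struct toy_work *w;

	while ((w = toy_head) != NULL) {
		toy_head = w->next;
		if (toy_head == NULL)
			toy_tailp = &toy_head;
		w->func(w);
	}
}

/* Cleanup handler: mark the owning state as cleaned up. */
static void
toy_cleanup_work(struct toy_work *w)
{
	struct toy_state *s = (struct toy_state *)w;

	s->cleanup_done = 1;
}
```

After toy_schedule_work() the cleanup_done flag stays 0 until the queue is
drained, so a later commit that waits on it can time out if the worker is
starved.  Calling toy_cleanup_work() directly sets the flag at once, at
the cost of doing the cleanup in the caller's context.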

commit 41db645a33e775855aeeec1a437d5c1e24ff6c88
Author: Chris Wilson 
Date:   Thu Jul 12 12:57:29 2018 +0100

drm/i915: Bump priority of clean up work

We require that we keep the list of outstanding work short so that we do
not "leak" memory while pageflipping under stress. However that system
stress may delay kernel workers virtually indefinitely, which incurs the
pageflips stall and eventually hit a timeout waiting for the cleanup.

Try to combat CPU starvation of our short-lived cleanup workers by
switching to a high priority workqueue.

Testcase: igt/kms_cursor_legacy/all-pipes-torture-move
References: https://bugs.freedesktop.org/show_bug.cgi?id=107122
Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
Reviewed-by: Mika Kuoppala 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20180712115729.3506-1-ch...@chris-wilson.co.uk

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 53e7a7e75384..366ff66e9279 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12738,7 +12738,7 @@ static void intel_atomic_commit_tail(struct 
drm_atomic_state *state)
 * down.
 */
INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
-   schedule_work(&state->commit_work);
+   queue_work(system_highpri_wq, &state->commit_work);
 }
 
 static void intel_atomic_commit_work(struct work_struct *work)

commit 8d52e447807b350b98ffb4e64bc2fcc1f181c5be
Author: Chris Wilson 
Date:   Sat Jun 23 11:39:51 2018 +0100

drm/i915: Defer modeset cleanup to a secondary task

If we avoid cleaning up the old state immediately in
intel_atomic_commit_tail() and defer it to a second task, we can avoid
taking heavily contended locks when the caller is ready to procede.
Subsequent modesets will wait for the cleanup operation (either directly
via the ordered modeset wq or indirectly through the atomic helperr)
which keeps the number of inflight cleanup tasks in check.

As an example, during reset an immediate modeset is performed to disable
the displays before the HW is reset, which must avoid struct_mutex to
avoid recursion. Moving the cleanup to a separate task, defers acquiring
the struct_mutex to after the GPU is running again, allowing it to
complete. Even in a few patches time (optimist!) when we no longer
require struct_mutex to unpin the framebuffers, it will still be good
practice to minimise the number of contention points along reset. The
mutex dependency still exists (as one modeset flushes the other), but in
the short term it resolves the deadlock for simple reset cases.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101600
Signed-off-by: Chris Wilson 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20180623103951.23889-1-ch...@chris-wilson.co.u
