Re: one USB port stops working after unhibernate

2024-07-31 Thread Mark Kettenis
> Date: Wed, 31 Jul 2024 11:23:08 +0200
> From: Mark Kettenis 
> 
> > Date: Mon, 22 Jul 2024 22:33:39 +
> > From: Lucas Gabriel Vuotto 
> > 
> > On Mon, Jul 22, 2024 at 10:52:19PM GMT, Mark Kettenis wrote:
> > > However, it is entirely plausible that the breakage is caused by either:
> > > 
> > >   dev/acpi/acpipwrres.c rev. 1.14
> > 
> > this is the one I tried backing out with no success.
> > 
> > > or
> > > 
> > >   dev/acpi/acpi.c rev. 1.434
> > 
> > backing only this one out didn't work either
> > 
> > > so it might be worth trying to revert those changes.
> > 
> > backing out *both files* did make it work.
> 
> Hmm, that doesn't make a lot of sense.
> 
> However, I think I can reproduce the problem on one of my laptops.
> Not immediately obvious what's wrong, but I'll dig deeper.

Can you try the diff below?  Fixes my issue...


Index: dev/acpi/acpipwrres.c
===
RCS file: /cvs/src/sys/dev/acpi/acpipwrres.c,v
diff -u -p -r1.14 acpipwrres.c
--- dev/acpi/acpipwrres.c   14 Jul 2024 10:48:55 -  1.14
+++ dev/acpi/acpipwrres.c   31 Jul 2024 20:36:02 -
@@ -148,7 +148,7 @@ acpipwrres_activate(struct device *self,
struct acpipwrres_softc *sc = (struct acpipwrres_softc *)self;
 
switch (act) {
-   case DVACT_SUSPEND:
+   case DVACT_POWERDOWN:
if (sc->sc_cons_ref == 0 && sc->sc_state != ACPIPWRRES_OFF) {
aml_evalname(sc->sc_acpi, sc->sc_devnode, "_OFF", 0,
NULL, NULL);
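
For readers following along, the effect of the one-line change can be sketched outside the kernel.  This is a minimal model, not the driver: the enum values, struct pwrres and pwrres_activate() are hypothetical stand-ins for the DVACT_* actions, the softc and acpipwrres_activate().  It only shows that an unreferenced resource is now left alone on suspend and switched off on powerdown.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's DVACT_* autoconf actions. */
enum dev_action { ACT_SUSPEND, ACT_POWERDOWN, ACT_RESUME };

/* Hypothetical, trimmed-down model of a power resource softc. */
struct pwrres {
	int	cons_ref;	/* number of consumers referencing the resource */
	bool	on;		/* current state of the resource */
};

/* Switch off an unreferenced resource, but only on powerdown. */
static int
pwrres_activate(struct pwrres *pr, enum dev_action act)
{
	switch (act) {
	case ACT_POWERDOWN:
		if (pr->cons_ref == 0 && pr->on)
			pr->on = false;	/* stands in for evaluating _OFF */
		break;
	default:
		/* Suspend and resume leave the resource untouched here. */
		break;
	}
	return 0;
}
```

With this model, a suspend action on an unreferenced resource is a no-op, and a resource that still has consumers stays on for every action.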



Re: one USB port stops working after unhibernate

2024-07-31 Thread Mark Kettenis
> Date: Mon, 22 Jul 2024 22:33:39 +
> From: Lucas Gabriel Vuotto 
> 
> On Mon, Jul 22, 2024 at 10:52:19PM GMT, Mark Kettenis wrote:
> > However, it is entirely plausible that the breakage is caused by either:
> > 
> >   dev/acpi/acpipwrres.c rev. 1.14
> 
> this is the one I tried backing out with no success.
> 
> > or
> > 
> >   dev/acpi/acpi.c rev. 1.434
> 
> backing only this one out didn't work either
> 
> > so it might be worth trying to revert those changes.
> 
> backing out *both files* did make it work.

Hmm, that doesn't make a lot of sense.

However, I think I can reproduce the problem on one of my laptops.
Not immediately obvious what's wrong, but I'll dig deeper.



Re: one USB port stops working after unhibernate

2024-07-22 Thread Mark Kettenis
> Date: Mon, 22 Jul 2024 20:30:20 +
> From: Lucas Gabriel Vuotto 
> 
> On Mon, Jul 22, 2024 at 12:50:03PM GMT, Mike Larkin wrote:
> > does zzz (suspend, lowercase z) work properly?
> 
> It does, regarding USB. (the "wakeup event: GPE 0x0" is still present.)

That "wakeup event" message is harmless.

However, it is entirely plausible that the breakage is caused by either:

  dev/acpi/acpipwrres.c rev. 1.14

or

  dev/acpi/acpi.c rev. 1.434

so it might be worth trying to revert those changes.



mg: auto-indent-mode gets indentation wrong [PATCH]

2024-07-02 Thread Mark Willson

Hi,

When the tab width is other than 8, auto-indent-mode can
miscompute the indentation column.  E.g. when (set-tab-width 4):


    
    
    .
    ^
    |
    Cursor positioned here following return after 

This seems to be due to the value 8 being hard-coded in the
doindent function in util.c.  This patch appears to fix the issue:

--- util.c.orig    Tue Jul  2 08:10:09 2024
+++ util.c Tue Jul  2 08:11:32 2024
@@ -354,9 +354,10 @@

 if (curbp->b_flag & BFNOTAB)
         return (linsert(cols, ' '));
-    if ((n = cols / 8) != 0 && linsert(n, '\t') == FALSE)
+    if ((n = cols / curwp->w_bufp->b_tabw) != 0 &&
+        linsert(n, '\t') == FALSE)
             return (FALSE);
-    if ((n = cols % 8) != 0 && linsert(n, ' ') == FALSE)
+    if ((n = cols % curwp->w_bufp->b_tabw) != 0 && linsert(n, ' ') == FALSE)
         return (FALSE);
 return (TRUE);
 }

Best Regards,
Mark

--
Mark Willson
mark.will...@hydrus.org.uk
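
The arithmetic behind the patch above can be sketched in isolation.  The struct and the split_indent() helper are hypothetical and exist only for illustration; mg's doindent() inserts the characters directly with linsert().  The sketch assumes a positive tab width.

```c
/* Hypothetical result type: how many tabs and spaces to insert. */
struct indent {
	int	tabs;
	int	spaces;
};

/*
 * Split a target indentation column into leading tabs and trailing
 * spaces for a given tab width (assumed to be greater than zero).
 */
static struct indent
split_indent(int cols, int tabw)
{
	struct indent in;

	in.tabs = cols / tabw;		/* whole tab stops that fit */
	in.spaces = cols % tabw;	/* remainder made up with spaces */
	return in;
}
```

For column 10 and a tab width of 4 this gives two tabs and two spaces.  Dividing by a hard-coded 8 gives one tab and two spaces instead, which a width-4 display renders at column 6 rather than 10.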



Re: strmode should take a mode_t instead of int.

2024-06-22 Thread Mark Kettenis
> Date: Sat, 22 Jun 2024 15:40:14 +0200
> From: Otto Moerbeek 
> 
> On Thu, Jun 20, 2024 at 09:17:38AM +0200, Otto Moerbeek wrote:
> 
> > On Wed, Jun 19, 2024 at 06:44:56PM +0200, Theo Buehler wrote:
> > 
> > > These are the ports using strmode.
> > > 
> > > archivers/libarchive
> > > archivers/libtar
> > > editors/emacs
> > > games/gemrb
> > > math/octave
> > > misc/findutils
> > > net/lftp
> > > security/ssh-ldap-helper
> > > shells/ksh93
> > > sysutils/bfs
> > > sysutils/colorls
> > > sysutils/coreutils
> > > sysutils/lnav
> > > sysutils/tarsnap
> > > 
> > > Given the short list and the nature of the change, I don't think it's
> > > necessary to run a bulk, but inspecting a few of them would be good,
> > > especially libarchive and coreutils are depended upon by a lot of ports.
> > > And there's emacs in this list.
> > 
> > New diff, taking the suggestion (but not all of it: the implementation
> > can use mode_t as it includes sys/types.h).
> > 
> > I tested base + coreutils + emacs builds with this.
> 
> ping...

ok kettenis@

> > Index: include/string.h
> > ===
> > RCS file: /home/cvs/src/include/string.h,v
> > diff -u -p -r1.32 string.h
> > --- include/string.h    5 Sep 2017 03:16:13 -   1.32
> > +++ include/string.h    20 Jun 2024 07:13:03 -
> > @@ -37,7 +37,7 @@
> >  
> >  #include 
> >  #include 
> > -#include <machine/_types.h>
> > +#include <sys/_types.h>
> >  
> >  /*
> >   * POSIX mandates that certain string functions not present in ISO C
> > @@ -128,7 +128,7 @@ size_t   strlcat(char *, const char *, si
> > __attribute__ ((__bounded__(__string__,1,3)));
> >  size_t  strlcpy(char *, const char *, size_t)
> > __attribute__ ((__bounded__(__string__,1,3)));
> > -void    strmode(int, char *);
> > +void    strmode(__mode_t, char *);
> >  char   *strsep(char **, const char *);
> >  int timingsafe_bcmp(const void *, const void *, size_t);
> >  int timingsafe_memcmp(const void *, const void *, size_t);
> > Index: lib/libc/string/strmode.c
> > ===
> > RCS file: /home/cvs/src/lib/libc/string/strmode.c,v
> > diff -u -p -r1.8 strmode.c
> > --- lib/libc/string/strmode.c   31 Aug 2015 02:53:57 -  1.8
> > +++ lib/libc/string/strmode.c   20 Jun 2024 07:13:03 -
> > @@ -32,10 +32,8 @@
> >  #include 
> >  #include 
> >  
> > -/* XXX mode should be mode_t */
> > -
> >  void
> > -strmode(int mode, char *p)
> > +strmode(mode_t mode, char *p)
> >  {
> >  /* print type */
> > switch (mode & S_IFMT) {
> > 
> 
> 
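
To illustrate a function with the corrected parameter type, here is a minimal sketch of a strmode(3)-like helper that takes a mode_t.  mode_to_string() is a hypothetical, simplified relative of strmode: it handles only directories and regular files and ignores the setuid, setgid and sticky bits, so it is not a replacement for the libc function.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical, simplified relative of strmode(3).  Writes one file
 * type character and nine permission characters, NUL-terminated, into
 * p, which must have room for at least 11 bytes.
 */
static void
mode_to_string(mode_t mode, char *p)
{
	static const char rwx[] = "rwxrwxrwx";
	int i;

	/* print type */
	if (S_ISDIR(mode))
		*p++ = 'd';
	else if (S_ISREG(mode))
		*p++ = '-';
	else
		*p++ = '?';

	/* print permissions, from owner read (0400) down to other exec (0001) */
	for (i = 0; i < 9; i++)
		*p++ = (mode & (mode_t)(0400 >> i)) ? rwx[i] : '-';
	*p = '\0';
}
```

Because the parameter is mode_t, a caller can pass st_mode from struct stat without any conversion to a signed int.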



Re: strmode should take a mode_t instead of int.

2024-06-19 Thread Mark Kettenis
> Date: Wed, 19 Jun 2024 15:17:05 +0200
> From: Otto Moerbeek 
> 
> On Tue, Jun 18, 2024 at 10:00:20PM -0700, Collin Funk wrote:
> 
> > Hi,
> > 
> > I noticed that strmode(3) says that the first argument should be
> > mode_t. OpenBSD declares it with int which is not compatible since
> > mode_t appears to be unsigned, from what I can tell.
> > 
> > NetBSD fixed this a long time ago and FreeBSD did the same before the
> > 14.0 release.
> > 
> > Apologies for the lack of diff, I don't have access to an OpenBSD
> > machine at the moment. I think something like this would work though:
> > 
> > In sys/_types.h:
> 
> I think this snippet should be in sys/types.h.
> 
> > 
> > #ifndef _MODE_T_DEFINED_
> > #define _MODE_T_DEFINED_
> > typedef __mode_t mode_t;
> > #endif
> > 
> > and then in string.h:
> 
> This part is not going to work as string.h includes machine/_types.h
> but not sys/_types.h (or sys/types.h for that matter). FreeBSD
> modified it to include sys/_types.h.
> 
> > #ifndef _MODE_T_DEFINED_
> > #define _MODE_T_DEFINED_
> > typedef __mode_t mode_t;
> > #endif
> > void strmode(mode_t, char *);
> > 
> > Thanks,
> > Collin
> > 
> 
> Additionally, the implementation in src/lib/libc/string/strmode.c
> needs to start using mode_t.
> 
> Building base now with the diff below. So far so good.
> 
> But this is trickier than you would think. Modifying string.h to include
> more could have unwanted side effects for applications.
> 
>   -Otto
> 
> Index: include/string.h
> ===
> RCS file: /home/cvs/src/include/string.h,v
> diff -u -p -r1.32 string.h
> --- include/string.h  5 Sep 2017 03:16:13 -   1.32
> +++ include/string.h  19 Jun 2024 13:11:42 -
> @@ -37,7 +37,7 @@
>  
>  #include 
>  #include 
> -#include <machine/_types.h>
> +#include <sys/_types.h>
>  
>  /*
>   * POSIX mandates that certain string functions not present in ISO C
> @@ -128,7 +128,11 @@ size_t strlcat(char *, const char *, si
>   __attribute__ ((__bounded__(__string__,1,3)));
>  size_t strlcpy(char *, const char *, size_t)
>   __attribute__ ((__bounded__(__string__,1,3)));
> -void  strmode(int, char *);
> +#ifndef _MODE_T_DEFINED_
> +#define _MODE_T_DEFINED_
> +typedef __mode_t mode_t;
> +#endif

It may be safer to drop this bit...

> +void  strmode(mode_t, char *);

...and use __mode_t in the prototype and implementation.

>  char *strsep(char **, const char *);
>  int   timingsafe_bcmp(const void *, const void *, size_t);
>  int   timingsafe_memcmp(const void *, const void *, size_t);
> Index: lib/libc/string/strmode.c
> ===
> RCS file: /home/cvs/src/lib/libc/string/strmode.c,v
> diff -u -p -r1.8 strmode.c
> --- lib/libc/string/strmode.c 31 Aug 2015 02:53:57 -  1.8
> +++ lib/libc/string/strmode.c 19 Jun 2024 13:11:42 -
> @@ -32,10 +32,8 @@
>  #include 
>  #include 
>  
> -/* XXX mode should be mode_t */
> -
>  void
> -strmode(int mode, char *p)
> +strmode(mode_t mode, char *p)
>  {
>/* print type */
>   switch (mode & S_IFMT) {
> Index: sys/sys/types.h
> ===
> RCS file: /home/cvs/src/sys/sys/types.h,v
> diff -u -p -r1.49 types.h
> --- sys/sys/types.h   6 Aug 2022 13:31:13 -   1.49
> +++ sys/sys/types.h   19 Jun 2024 13:11:43 -
> @@ -140,7 +140,10 @@ typedef  __gid_t gid_t;  /* group id */
>  typedef  __id_t  id_t;   /* may contain pid, uid or gid */
>  typedef  __ino_t ino_t;  /* inode number */
>  typedef  __key_t key_t;  /* IPC key (for Sys V IPC) */
> +#ifndef _MODE_T_DEFINED_
> +#define _MODE_T_DEFINED_
>  typedef  __mode_t mode_t; /* permissions */
> +#endif
>  typedef  __nlink_t   nlink_t;/* link count */
>  typedef  __rlim_t rlim_t; /* resource limit */
>  typedef  __segsz_t   segsz_t;/* segment size */
> 
> 



Re: Mac Studio hangs; locking problems on WITNESS/MP_LOCKDEBUG kernels

2024-06-19 Thread Mark Kettenis
> From: Dana Koch 
> Date: Tue, 18 Jun 2024 23:34:07 -0400

Hi Dana,

Thanks for the report.  I have an M2 Pro Mac Mini that is very
reliable.  And I believe there are folks using machines with M2 Max
without issues as well.  So these issues are likely specific to the M2
Ultra SoC.

The fact that "mach ddbcpu X" doesn't work for X > 17 makes me wonder
if there is something subtly wrong with interrupts on the M2 Ultra.
I'll need to see if I can find out more.

One thing that would help me investigate further is "eeprom -p" output
for this machine.

Thanks,

Mark

> >Synopsis: Mac Studio hangs; locking problems on WITNESS/MP_LOCKDEBUG kernels
> >Category: kernel
> >Environment:
> System  : OpenBSD 7.5
> Details : OpenBSD 7.5-current (GENERIC.MP) #69: Wed Jun 12 04:43:28 MDT 2024
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.arm64
> Machine : arm64
> >Description:
> System can hang and be unresponsive on a Mac Studio (M2, Ultra),
> either soon after boot passes to userland during/after "starting
> network", or under load. When on a kernel with MP_LOCKDEBUG and
> WITNESS options turned on, these points will trigger locking-related
> panics.
> 
> When trying to bisect and boot onto different kernel binaries built
> from different points of time, the system may successfully pass the
> "starting network" point at boot without panic'ing, but may instead
> panic at some other seemingly random point under load. There appeared
> to be no good correlation between commits at different points of time
> and the reliability of these locking-related panics happening. (FWIW,
> I did not bisect far back enough such that I would need to completely
> wipe and downgrade userland.)
> 
> See below for ddb session fragments.
> 
> >How-To-Repeat:
> * Build a recent kernel with MP_LOCKDEBUG and WITNESS options turned on.
> * Disable apldrm(4), since display output is currently not working
> with this device enabled (separate problem).
> * Boot on this new kernel.
> * If the system does not panic after "starting network", building a
> kernel with `make -j24` will often trigger a similar locking-related
> panic instead.
> 
> >Fix:
> Workarounds:
> * use a single-processor kernel;
> * non-WITNESS/MP_LOCKDEBUG kernels will obviously not panic, but can still hang
> 
> ddb fragments:
> 1. during "make -j24" (oddly, mach ddbcpu X did not seem to give
> information for X > 17 when there are 24 processors)
> ddb{16}> show panic
>  cpu0: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(
> uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/op
> enbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
>  cpu22: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/o
> penbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
>  cpu20: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/o
> penbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
>  cpu19: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/o
> penbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
>  cpu17: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/o
> penbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
> *cpu16: acquiring blockable sleep lock with spinlock or critical section held 
> (
> kernel_lock) _lock
>  cpu15: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/o
> penbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
>  cpu14: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/src/o
> penbsd/openbsd-src/sys/uvm/uvm_vnode.c", line 953
>  cpu11: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held
> (uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/home/dana/sr

Re: Unchartevice 6640MA dmesg + AHCI MSI quirk

2024-06-16 Thread Mark Kettenis
> Date: Sun, 16 Jun 2024 16:37:10 +
> From: Klemens Nanni 
> 
> GENERIC cpu(4) fix and pcidevs have been committed.
> Now only this ahci(4) quirk is pending to fix the SSD.
> 
> Neither Linux nor FreeBSD seem to have AHCI and/or MSI specific
> quirks for this, but contrary to OpenBSD they can boot and use the
> SSD.
> 
> Until there's a better way, disabling MSI for that specific
> AHCI controller, as is already done for specific Intel MacBooks, makes
> snapshots usable for me.
> 
> Feedback? Objection? OK?

Since other devices seem to work fine with MSI, this is the right fix.

ok kettenis@

> Index: sys/dev/pci/ahci_pci.c
> ===
> RCS file: /cvs/src/sys/dev/pci/ahci_pci.c,v
> diff -u -p -r1.17 ahci_pci.c
> --- sys/dev/pci/ahci_pci.c    24 May 2024 06:02:53 -  1.17
> +++ sys/dev/pci/ahci_pci.c    15 Jun 2024 20:19:30 -
> @@ -71,6 +71,8 @@ int ahci_intel_attach(struct ahci_soft
>   struct pci_attach_args *);
>  int  ahci_samsung_attach(struct ahci_softc *,
>   struct pci_attach_args *);
> +int  ahci_storx_attach(struct ahci_softc *,
> + struct pci_attach_args *);
>  
>  static const struct ahci_device ahci_devices[] = {
>   { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_HUDSON2_SATA_1,
> @@ -148,7 +150,10 @@ static const struct ahci_device ahci_dev
>   NULL,   ahci_samsung_attach },
>  
>   { PCI_VENDOR_VIATECH,   PCI_PRODUCT_VIATECH_VT8251_SATA,
> -   ahci_no_match,ahci_vt8251_attach }
> +   ahci_no_match,ahci_vt8251_attach },
> +
> + { PCI_VENDOR_ZHAOXIN,   PCI_PRODUCT_ZHAOXIN_STORX_AHCI,
> +   NULL, ahci_storx_attach },
>  };
>  
>  int  ahci_pci_match(struct device *, void *, void *);
> @@ -279,6 +284,19 @@ ahci_samsung_attach(struct ahci_softc *s
>* as the XP941 SSD controller.
>* https://bugzilla.kernel.org/show_bug.cgi?id=60731
>* https://bugzilla.kernel.org/show_bug.cgi?id=89171
> +  */
> + sc->sc_flags |= AHCI_F_NO_MSI;
> +
> + return (0);
> +}
> +
> +int
> +ahci_storx_attach(struct ahci_softc *sc, struct pci_attach_args *pa)
> +{
> + /*
> +  * Disable MSI with the ZX-100/ZX-200/ZX-E StorX AHCI Controller
> +  * in the Unchartevice 6640MA notebook, otherwise ahci(4) hangs
> +  * with SATA speed set to "Gen3" in BIOS.
>*/
>   sc->sc_flags |= AHCI_F_NO_MSI;
>  
> 
> 
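
The table-driven quirk mechanism that the diff extends can be sketched in a few lines of standalone C.  Everything here is a simplified model: struct softc, struct quirk, F_NO_MSI and the function names are hypothetical stand-ins for ahci_softc, ahci_device, AHCI_F_NO_MSI and the attach hooks.  The only values taken from this thread are the PCI IDs 0x1d17 and 0x9083.

```c
#include <stddef.h>
#include <stdint.h>

#define F_NO_MSI	0x01	/* hypothetical flag, modelled on AHCI_F_NO_MSI */

/* Hypothetical, trimmed-down driver state. */
struct softc {
	int	flags;
};

/* One table entry: a PCI vendor/product pair and its attach hook. */
struct quirk {
	uint16_t	vendor;
	uint16_t	product;
	int		(*attach)(struct softc *);
};

/* Attach hook that tells the driver not to use MSI. */
static int
no_msi_attach(struct softc *sc)
{
	sc->flags |= F_NO_MSI;
	return 0;
}

/* IDs as reported in this thread: vendor 0x1d17, product 0x9083. */
static const struct quirk quirks[] = {
	{ 0x1d17, 0x9083, no_msi_attach },
};

/* Find the table entry for a device, or NULL if it has no quirk. */
static const struct quirk *
quirk_lookup(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++)
		if (quirks[i].vendor == vendor && quirks[i].product == product)
			return &quirks[i];
	return NULL;
}

/* Run the attach hook for a device if one is registered. */
static int
apply_quirk(struct softc *sc, uint16_t vendor, uint16_t product)
{
	const struct quirk *q = quirk_lookup(vendor, product);

	if (q != NULL && q->attach != NULL)
		return q->attach(sc);
	return 0;
}
```

Keeping the quirk in a per-device table means other controllers are unaffected and continue to use MSI.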



Re: Unchartevice 6640MA dmesg + AHCI MSI quirk

2024-06-15 Thread Mark Kettenis
> Date: Sat, 15 Jun 2024 21:35:30 +
> From: Klemens Nanni 
> 
> Unchartevice 6640MA notebook with amd64 ZHAOXIN KaiXian KX-6640MA CPU.
> 
> https://unchartevice.ru
> https://www.devicekb.com/hardware/pci-vendors/ven_1d17-dev_9083
> 
> The BIOS offers one of three SATA speed modes:
> - Gen1/2: bsd.rd reaches installer, but SSD does not attach
>   ahci0: device not communicating on port 0
> - Gen3:   bsd.rd attaches SSD and hangs after
>   scsibus2 at softraid0: 256 targets
> 
> Diff below adds PCI IDs and disables MSI for the AHCI controller,
> then bsd.rd attaches the SSD and successfully installs to it:
> 
> |@@
> |-ahci0 at pci0 dev 15 function 0 unknown vendor 0x1d17 product 0x9083 rev 0x01: msi, AHCI 1.3.1
> |-ahci0: device not communicating on port 0
> |+ahci0 at pci0 dev 15 function 0 "Zhaoxin StorX AHCI" rev 0x01: apic 9 int 21, AHCI 1.3.1
> |+ahci0: port 0: 6.0Gb/s
> | scsibus0 at ahci0: 32 targets
> |+sd0 at scsibus0 targ 0 lun 0:  naa.5000
> |+sd0: 244198MB, 512 bytes/sector, 500118192 sectors, thin
> |@@
> |-sd0 at scsibus1 targ 1 lun 0:  removable serial.07815583810735aa43ca
> |-sd0: 29340MB, 512 bytes/sector, 60088320 sectors
> |+sd1 at scsibus1 targ 1 lun 0:  removable serial.07815583810735aa43ca
> |+sd1: 29340MB, 512 bytes/sector, 60088320 sectors
> 
> Feedback? Objection? OK? on either pcidevs or ahci?

MSIs work for the other devices (e.g. iwm(4) or xhci(4))?

> GENERIC and GENERIC.MP both hang with this diff after
>   cpu0: 4MB 64b/line 16-way L2 cache
> but I haven't looked into that yet.
> 
> 
> Index: dev/pci/pcidevs
> ===
> RCS file: /cvs/src/sys/dev/pci/pcidevs,v
> diff -u -p -r1.2076 pcidevs
> --- dev/pci/pcidevs   22 May 2024 16:24:59 -  1.2076
> +++ dev/pci/pcidevs   15 Jun 2024 20:11:14 -
> @@ -352,6 +352,7 @@ vendor ROCKCHIP 0x1d87  Rockchip
>  vendor   LONGSYS 0x1d97  Longsys
>  vendor   TEKRAM2 0x1de1  Tekram
>  vendor   AMPERE  0x1def  Ampere
> +vendor   ZHAOXIN 0x1d17  Zhaoxin
>  vendor   KIOXIA  0x1e0f  Kioxia
>  vendor   YMTC0x1e49  YMTC
>  vendor   SSSTC   0x1e95  SSSTC
> @@ -10038,6 +10039,9 @@ product YMTC PC005 0x1001  PC005
>  
>  /* Zeinet products */
>  product ZEINET 1221  0x0001  1221
> +
> +/* Zhaoxin products */
> +product ZHAOXIN STORX_AHCI   0x9083  StorX AHCI
>  
>  /* Ziatech products */
>  product ZIATECH ZT8905   0x8905  PCI-ST32
> Index: dev/pci/ahci_pci.c
> ===
> RCS file: /cvs/src/sys/dev/pci/ahci_pci.c,v
> diff -u -p -r1.17 ahci_pci.c
> --- dev/pci/ahci_pci.c    24 May 2024 06:02:53 -  1.17
> +++ dev/pci/ahci_pci.c    15 Jun 2024 20:19:30 -
> @@ -71,6 +71,8 @@ int ahci_intel_attach(struct ahci_soft
>   struct pci_attach_args *);
>  int  ahci_samsung_attach(struct ahci_softc *,
>   struct pci_attach_args *);
> +int  ahci_storx_attach(struct ahci_softc *,
> + struct pci_attach_args *);
>  
>  static const struct ahci_device ahci_devices[] = {
>   { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_HUDSON2_SATA_1,
> @@ -148,7 +150,10 @@ static const struct ahci_device ahci_dev
>   NULL,   ahci_samsung_attach },
>  
>   { PCI_VENDOR_VIATECH,   PCI_PRODUCT_VIATECH_VT8251_SATA,
> -   ahci_no_match,ahci_vt8251_attach }
> +   ahci_no_match,ahci_vt8251_attach },
> +
> + { PCI_VENDOR_ZHAOXIN,   PCI_PRODUCT_ZHAOXIN_STORX_AHCI,
> +   NULL, ahci_storx_attach },
>  };
>  
>  int  ahci_pci_match(struct device *, void *, void *);
> @@ -279,6 +284,19 @@ ahci_samsung_attach(struct ahci_softc *s
>* as the XP941 SSD controller.
>* https://bugzilla.kernel.org/show_bug.cgi?id=60731
>* https://bugzilla.kernel.org/show_bug.cgi?id=89171
> +  */
> + sc->sc_flags |= AHCI_F_NO_MSI;
> +
> + return (0);
> +}
> +
> +int
> +ahci_storx_attach(struct ahci_softc *sc, struct pci_attach_args *pa)
> +{
> + /*
> +  * Disable MSI with the ZX-100/ZX-200/ZX-E StorX AHCI Controller
> +  * in the Unchartevice 6640MA notebook, otherwise ahci(4) hangs
> +  * with SATA speed set to "Gen3" in BIOS.
>*/
>   sc->sc_flags |= AHCI_F_NO_MSI;
>  
> 
> 
> OpenBSD 7.5-current (RAMDISK_CD) #127: Fri Jun 14 09:55:04 MDT 2024
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> real mem = 8026165248 (7654MB)
> avail mem = 7778664448 (7418MB)
> random: good seed from bootblocks
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.1 @ 0x9648a000 (37 entries)
> bios0: vendor Byosoft version "R20" date 05/06/2022
> bios0: IP3 Tech. ZEN1
> acpi0 at bios0: ACPI 6.0
> acpi0: tables DSDT FACP SLIC UEFI HPET APIC MCFG DMAR FPDT BGRT
> 

Re: powerpc64/pmap.c trouble report

2024-06-10 Thread Mark Kettenis
> From: Eric Grosse 
> Date: Mon, 10 Jun 2024 11:17:23 -0700
>
> A bit of progress: in a kernel built with George Koehler's suggested
> replacement of isync by sync and with uncommented GENERIC.MP
> option MP_LOCKDEBUG
> option WITNESS
> and (getting rid of any dependence on Go) generating load
> by running make -j64 build in /usr/src, I fairly quickly get a panic:
> 
> panic: acquiring blockable sleep lock with spinlock or critical
> section held (rwlock) kmmaplk
> Stopped at  panic+0x134:ori r0,r0,0x0
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  484082  12380 21  0x1803  07  cc
>  104407  11776 21  0x1803  03  cc
>  182886  45104 21  0x1803  06  cc
>  385115  79953 21  0x1803  02  cc
>  158768  20484  0 0x14000  0x2005  zerothread
> *413982  47818  0 0x14000  0x2001  reaper
> panic+0x134
> witness_checkorder+0x954
> rw_enter_read+0x8c
> vm_map_lock_read_ln+0x38
> uvmfault_lookup+0x114
> uvm_fault_check+0x68
> uvm_fault+0x12c
> trap+0x7a4
> trapagain+0x4
> --- trap (type 0x300) ---
> pmap_remove+0x120
> uvm_unmap_kill_entry_withlock+0x1c0
> uvm_map_teardown+0x184
> uvmspace_free+0x70
> uvm_exit+0x38
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{1}>

dv@ posted a very similar crash that happened on amd64 a few days ago:

https://marc.info/?l=openbsd-bugs&m=171761402618985&w=2



Re: witness panic: "acquiring blockable sleep lock..." from reaper

2024-06-06 Thread Mark Kettenis
> From: Dave Voutila 
> Date: Wed, 05 Jun 2024 14:56:45 -0400
> 
> >Synopsis: witness panic: acquiring blockable sleep lock with spinlock
>or critical section held (rwlock) vmmaplk
> >Category:
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 CEST 2024
>
> dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
> Was running a vmm test on some dual-socket intel xeon hardware with
> Witness enabled when I hit this panic. I've hit it now twice with the
> same panic from the reaper tearing down uvm maps.
> 
> This is using a kernel built locally (because of Witness) where the last
> commit was Wed Jun 5 13:36:28 2024 UTC.
> 
> Abbreviated backtrace from prior to witness_checkorder on CPU 4:
> 
> rw_enter_read(...) at +0x50
> uvmfault_lookup(..., 0) at +0x8a
> uvm_fault_check(...) at +0x36
> uvm_fault(0x827d1558, 0x8001, 0, 1) at +0xfb
> kpageflttrap(0x8000594811f0, 0x80010039) at +0x158
> kerntrap() at +0xaf
> alltraps_kern_meltdown() at +0x7b
> pmap_remove_ptes(...) at +0x16e
> pmap_do_remove(...) at +0x2db
> uvm_unmap_kill_entry_withlock(..., ..., 0) at +0x14b
> uvm_map_teardown(...) at +0x1c4

So somehow pmap_remove_ptes() is accessing a (likely bogus) userland
address here.  That shouldn't happen; I suspect your page tables are
corrupt.

If your system supported SMAP you would have seen a

  "attempt to access user address 0x80010039 in supervisor mode"

panic.  But your system doesn't.  So you go down an unexpected error
path with a mutex held, with the witness panic as a consequence.  This
probably would have produced a:

  "uvm_fault(...) -> ..."

panic on a non-witness kernel.
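
The rule that turns this into a witness panic can be sketched as a toy model.  None of this is the real WITNESS code: struct cpu_state and the toy_* functions are hypothetical and track only a per-CPU count of held mutexes, which is enough to show why a fault taken with a mutex held trips the check once the fault handler tries to take a sleeping rwlock.

```c
#include <stdbool.h>

/* Hypothetical per-CPU bookkeeping; the real WITNESS code tracks far more. */
struct cpu_state {
	int	spinlocks_held;	/* mutexes currently held on this CPU */
};

/* Taking a mutex (a spinning lock) bumps the per-CPU count. */
static void
toy_mutex_enter(struct cpu_state *cs)
{
	cs->spinlocks_held++;
}

/* Releasing a mutex drops the per-CPU count again. */
static void
toy_mutex_leave(struct cpu_state *cs)
{
	cs->spinlocks_held--;
}

/*
 * A sleeping lock such as an rwlock may only be acquired when no
 * spinning lock is held.  Returns false in the situation where a
 * witness kernel would panic.
 */
static bool
toy_sleep_lock_ok(const struct cpu_state *cs)
{
	return cs->spinlocks_held == 0;
}
```

In the backtrace above, the pmap code holds its mutexes when the unexpected fault occurs, so the fault path's read lock on the map is the acquisition that fails this check.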

> "show all locks" output:
> 
> CPU 4:
> exclusive mutex &(curpg)->mdpage.pv_mtx
> exclusive mutex >pm_mtx
> Process 45917 (reaper) thread ...
> exclusive rwlock vmmaplk
> exclusive mutex &(curpg)->mdpage.pv_mtx
> exclusive mutex >pm_mtx
> 
> "show all procs /o" output abbreviated:
> uid   cpu   command
> 107   12vmd
> 0 4 reaper
> 0 6 softnet0
> 0 0 softclock
> 
> 
> >How-To-Repeat:
> 
> I've been trying to isolate (unrelated?) amap and anon pool corruption
> caused by vmm on dual-socket Intel hardware. I'm booting ramdisk kernels
> and disk-based vms, letting them boot a bit, and tearing them down.
> 
> >Fix:
> ???
> 
> dmesg:
> OpenBSD 7.5-current (GENERIC.MP) #5: Wed Jun  5 20:07:42 CEST 2024
> 
> dv@current1.openbsd.amsterdam:/home/dv/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 412202078208 (393106MB)
> avail mem = 396673601536 (378297MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7a32f000 (77 entries)
> bios0: vendor Dell Inc. version "2.19.0" date 12/12/2023
> bios0: Dell Inc. PowerEdge R630
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S5
> acpi0: tables DSDT FACP MCEJ WD__ SLIC HPET APIC MCFG MSCT SLIT SRAT SSDT 
> SSDT SSDT PRAD DMAR HEST BERT ERST EINJ
> acpi0: wakeup devices PCI0(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) 
> BR2D(S4) BR3A(S4) BR3B(S4) BR3C(S4) BR3D(S4) XHC_(S0) RP02(S4) RP03(S4) 
> RP05(S4) RP08(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.02 MHz, 06-3f-02, patch 
> 0049
> cpu0: cpuid 1 
> edx=bfebfbff
>  
> ecx=77fefbff
> cpu0: cpuid 6 eax=77 ecx=9
> cpu0: cpuid 7.0 
> ebx=37ab 
> edx=9c000400
> cpu0: cpuid a vers=3, gp=4, gpwidth=48, ff=3, ffwidth=48
> cpu0: cpuid d.1 eax=1
> cpu0: cpuid 8001 edx=2c100800 ecx=21
> cpu0: cpuid 8007 edx=100
> cpu0: MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 20MB 64b/line 20-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
> cpu1 at mainbus0: apid 16 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.40 MHz, 06-3f-02, patch 
> 0049
> cpu1: smt 0, core 0, package 1
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.05 MHz, 06-3f-02, patch 
> 0049
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 18 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2401.63 MHz, 06-3f-02, patch 
> 0049
> cpu3: smt 0, core 1, package 1
> cpu4 at mainbus0: apid 4 (application processor)
> cpu4: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2400.08 MHz, 06-3f-02, patch 
> 0049
> cpu4: smt 0, core 2, package 0
> cpu5 at mainbus0: apid 20 

Re: Upgrade to OpenBSD 7.5 broke the bsd game of cribbage

2024-06-01 Thread Mark Jamsek
Otto Moerbeek  wrote:
> On Wed, May 29, 2024 at 08:05:14AM +0200, Otto Moerbeek wrote:
> 
> > On Mon, May 27, 2024 at 09:21:34PM -0500, Don Wilburn wrote:
> > 
> > > Dear OpenBSD,
> > > 
> > > I recently upgraded from version 7.4 to 7.5.  This broke the old cribbage
> > > game.  This is included with OpenBSD, if you choose to install the games.
> > > 
> > > I'm not a programmer, but I promise you this happened because ncurses was
> > > updated from version 5.7 to 6.4
> > > 
> > > The problem:
> > > 
> > > Normally the game gives prompts for play options and cards.  It's supposed
> > > to leave the prompt after the response, then advance to a new line.  This
> > > gives a brief history of selections
> > > 
> > > Now, starting with the third prompt (cut the cards), the prompts disappear
> > > when a response key is pressed.  This ruins the game. The effect is obvious,
> > > even if you don't know how to play cribbage.
> > > 
> > > It would be even more obvious if you have an older system to compare with a
> > > current v7.5 system.
> > > 
> > > This happened to linux bsd-games many years ago.  A search will indicate
> > > that I filed this same bug with Gentoo linux over 9 years ago.  Linux
> > > classic bsd-games has been unmaintained since before that time.  This is
> > > where I observed that the bug happened with a ncurses update.  Nobody
> > > pursued the solution.
> > > 
> > > I don't have the skills to butcher the game code to work with the
> > > update of ncurses.  Likewise, I don't know how to use a debugger or write a
> > > sample program to replicate the effect.  I can't demonstrate WHY ncurses is
> > > the problem.  Maybe it's the C compiler's fault?
> > > 
> > > I still play this obsolete command line game.  It's nostalgia, I guess.  I
> > > know OpenBSD developers have really important things to maintain.   If
> > > someone could spare some time for this little bug, I'd be happy.  Maybe it
> > > could be delegated to a student?
> > > 
> > > Thanks for reading,  DW
> > > 
> > 
> > One remains a student forever.
> > 
> > Try this, it does not try to cut corners with switching windows.
> 
> No response from the original reporter.
> 
> Is anybody else interested in testing/reviewing?
> 
>   -Otto

Hi Otto,

I can confirm the behaviour reported by Don Wilburn and that your diff
fixes the issue. I have no idea how to play cribbage, but as Don noted,
the impact is obvious.

FWIW, your fix makes sense to me. A changed line runs to 86 columns as
annotated inline, but in the cribbage tree there seem to be instances
where such lines are reflowed to fit within 80 and others where they aren't.


> > 
> > Index: io.c
> > ===
> > RCS file: /home/cvs/src/games/cribbage/io.c,v
> > diff -u -p -r1.22 io.c
> > --- io.c10 Jan 2016 13:35:09 -  1.22
> > +++ io.c29 May 2024 06:00:03 -
> > @@ -505,14 +505,11 @@ get_line(void)
> >  {
> > size_t pos;
> > int c, oy, ox;
> > -   WINDOW *oscr;
> >  
> > -   oscr = stdscr;
> > -   stdscr = Msgwin;
> > -   getyx(stdscr, oy, ox);
> > -   refresh();
> > +   getyx(Msgwin, oy, ox);
> > +   wrefresh(Msgwin);
> > /* loop reading in the string, and put it in a temporary buffer */
> > -   for (pos = 0; (c = readchar()) != '\n'; clrtoeol(), refresh()) {
> > > +   for (pos = 0; (c = readchar()) != '\n'; wclrtoeol(Msgwin), wrefresh(Msgwin)) {

The above line runs to 86 columns, perhaps:

for (pos = 0; (c = readchar()) != '\n';
wclrtoeol(Msgwin), wrefresh(Msgwin)) {

> > if (c == -1)
> > continue;
> > if (c == ' ' && (pos == 0 || linebuf[pos - 1] == ' '))
> > @@ -522,13 +519,13 @@ get_line(void)
> > int i;
> > pos--;
> > for (i = strlen(unctrl(linebuf[pos])); i; i--)
> > -   addch('\b');
> > +   waddch(Msgwin, '\b');
> > }
> > continue;
> > }
> >     if (c == killchar()) {
> > pos = 0;
> > -   move(oy, ox);
> > +   wmove(Msgwin, oy, ox);
> > continue;
> > }
> > if (pos >= LINESIZE - 1 || !(isalnum(c) || c == ' ')) {
> > @@ -538,12 +535,11 @@ get_line(void)
> > if (islower(c))
> > c = toupper(c);
> > linebuf[pos++] = c;
> > -   addstr(unctrl(c));
> > +   waddstr(Msgwin, unctrl(c));
> > Mpos++;
> > }
> > while (pos < sizeof(linebuf))
> > linebuf[pos++] = '\0';
> > -   stdscr = oscr;
> > return (linebuf);
> >  }
> >  
> > 


-- 
Mark Jamsek <https://bsdbox.org>
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68



[PATCH] mg: endless loop using replace-regexp [MEA CULPA]

2024-05-03 Thread Mark Willson
Hi Folks,

The mg command 'regexp-replace "^.*$" ""' enters an endless loop (until
memory exhausted).  This behaviour also occurs in query-replace-regexp
with the "!" option.

This is due to a change I suggested to re_forwsrch, that is, not moving
dot when the line is empty (re_search.c 1.35). The reason for this
change was to ensure the replacement took effect on the starting line of
a replace-regexp, even if the line was empty. Well, the patch did that but
caused this worse issue instead.
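
To illustrate why not moving dot is fatal here, a standalone sketch using
POSIX regexec(3).  This is not mg's code and the function name is made up:
"^.*$" matches an empty line with a zero-length match, so a search that
restarts from the same position keeps finding the same match.

```c
#include <regex.h>

/*
 * Hypothetical helper, not part of mg: count how often "^.*$" matches
 * `line' when the search position is never advanced.  Stops after
 * `limit' tries so that the demonstration itself terminates.
 */
static int
count_matches_without_advancing(const char *line, int limit)
{
	regex_t re;
	regmatch_t m[1];
	int n = 0;

	if (regcomp(&re, "^.*$", REG_EXTENDED | REG_NEWLINE) != 0)
		return (-1);
	while (n < limit && regexec(&re, line, 1, m, 0) == 0) {
		/* A non-empty match would consume text; stop counting. */
		if (m[0].rm_eo != m[0].rm_so)
			break;
		/* Zero-length match: nothing consumed, same match again. */
		n++;
	}
	regfree(&re);
	return (n);
}
```

With an empty line the loop only stops because of the limit, so
count_matches_without_advancing("", 1000) uses up all 1000 tries.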

Here's yet another suggested patch (3 of 3) to fix the issue:

--- re_search.c.origFri Apr 12 09:32:15 2024
+++ re_search.c Fri Apr 12 14:00:54 2024
@@ -146,6 +146,11 @@
return (ABORT);
ewprintf("Query replacing %s with %s:", re_pat, news);

+   /* If dot on empty line, instruct re_forwsrch to not advance
+* line
+*/
+   if (curwp->w_doto == 0 && curwp->w_dotp->l_used == 0)
+   curwp->w_doto = -1;
/*
 * Search forward repeatedly, checking each time whether to insert
 * or not.  The "!" case makes the check always true, so it gets put
@@ -220,6 +225,8 @@
EFNUL | EFNEW | EFCR, re_pat) == NULL)
 return (ABORT);

+   if (curwp->w_doto == 0 && curwp->w_dotp->l_used == 0)
+   curwp->w_doto = -1;
while (re_forwsrch() == TRUE) {
plen = regex_match[0].rm_eo - regex_match[0].rm_so;
if (re_doreplace((RSIZE)plen, news) == FALSE)
@@ -231,7 +238,7 @@
update(CMODE);
if (!inmacro)
ewprintf("(%d replacement(s) done)", rcnt);
-
+
return(TRUE);
 }
@@ -339,17 +346,24 @@
tbo = curwp->w_doto;
tdotline = curwp->w_dotline;

-   if (tbo == clp->l_used)
+   if (tbo == clp->l_used) {
/*
 * Don't start matching past end of line -- must move to
-* beginning of next line, unless line is empty or at
-* end of file.
+* beginning of next line, unless at end of file.
 */
-   if (clp != curbp->b_headp && llength(clp) != 0) {
+   if (clp != curbp->b_headp) {
clp = lforw(clp);
tdotline++;
tbo = 0;
}
+   }
+   else if (tbo < 0) {
+   /* Don't advance to next line when dot on empty line;
+* reset tbo to correct value.
+*/
+   tbo = 0;
+   }
+
/*
 * Note this loop does not process the last line, but this editor
 * always makes the last line empty so this is good.

Best Regards.
Mark



Re: lock order reversal in soreceive and NFS

2024-04-30 Thread Mark Kettenis
> Date: Tue, 30 Apr 2024 16:18:31 +0300
> From: Vitaliy Makkoveev 
> 
> On Tue, Apr 30, 2024 at 11:08:13AM +0200, Martin Pieuchot wrote:
> > 
> > On the other side, would that make sense to have a NET_LOCK()-free
> > sysctl path?
> > 
> 
> To me it's better to remove uvm_vslock() from network related sysctl
> paths. uvm_vslock() used to avoid context switch in the uiomove() call
> to not break kernel lock protected data. It is not required for netlock
> protected network stuff.

I don't think uvm_vslock() plays a role in the lock order reversal
being discussed here.

> So, I propose to resurrect my sysctl unlocking diff. To push it forward
> i386 should be stable while performing dpb(1) build...

Ditto.



Re: lock order reversal in soreceive and NFS

2024-04-30 Thread Mark Kettenis
> Date: Tue, 30 Apr 2024 11:08:13 +0200
> From: Martin Pieuchot 
> 
> On 27/04/24(Sat) 13:44, Visa Hankala wrote:
> > On Tue, Apr 23, 2024 at 02:48:32PM +0200, Martin Pieuchot wrote:
> > > [...]
> > > I agree.  Now I'd be very grateful if someone could dig into WITNESS to
> > > figure out why we see such reports.  Are these false positive or are we
> > > missing data from the code path that we think are incorrect?
> > 
> > WITNESS currently cannot show lock cycles longer than two locks.
> > To fix this, WITNESS needs to do a path search in the lock order graph.
> 
> Lovely!
> 
> > However, there is also something else wrong in WITNESS, possibly
> > related to situations where the kernel lock comes between two rwlocks
> > in the lock order. I still need to study this more.
> 
> I greatly appreciate your dedication in this area.
> 
> > Below is a patch that adds the cycle search and printing. The patch
> > also tweaks a few prints to show more context.
> 
> This is ok mpi@
> 
> > With the patch, the nfsnode-vmmaplk reversal looks like this:
> 
> So the issue here is due to NFS entering the network stack after the
> VFS.  Alexander, Vitaly are we far from a NET_LOCK()-free sosend()?
> Is something we should consider?
> 
> On the other side, would that make sense to have a NET_LOCK()-free
> sysctl path?

I don't think it is necessary to make sysctl calls NET_LOCK()-free,
but it does make a lot of sense to avoid calling copyin() and
copyout() while holding locks as much as possible.

For the network sysctl, that means that instead of something like:

  NET_LOCK();
  ...
  sysctl_int_bounded(oldp, oldlenp, newp, newlen, ...); /* copyin/copyout */
  ...
  NET_UNLOCK();

we really should be doing something like:

  sysctl_int_bounded(NULL, 0, newp, newlen, ...); /* copyin */
  NET_LOCK();
  ...
  NET_UNLOCK();
  sysctl_int_bounded(oldp, oldlenp, NULL, 0, ...); /* copyout */
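
A userland sketch of that ordering, with the assumptions stated up front:
a pthread mutex stands in for NET_LOCK(), memcpy() stands in for
copyin()/copyout(), and the function and variable names are invented for
illustration.  It is not the kernel's sysctl code.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins for the netlock and the state it protects. */
static pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;
static int net_value = 10;

/*
 * Return the old value through oldp and/or set a new one from newp.
 * "User" memory is only touched while net_lock is not held, so a page
 * fault during the copy can never happen with the lock taken.
 */
static int
sysctl_int_unlocked_copy(int *oldp, const int *newp, int minv, int maxv)
{
	int newval = 0, oldval;

	if (newp != NULL) {
		memcpy(&newval, newp, sizeof(newval));	/* "copyin" */
		if (newval < minv || newval > maxv)
			return (-1);
	}

	pthread_mutex_lock(&net_lock);
	oldval = net_value;
	if (newp != NULL)
		net_value = newval;
	pthread_mutex_unlock(&net_lock);

	if (oldp != NULL)
		memcpy(oldp, &oldval, sizeof(oldval));	/* "copyout" */
	return (0);
}
```

The price is a local copy of the value on each side of the locked
section; in exchange the locked section never faults.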


> > witness: lock order reversal:
> >  1st 0xfd8126deacf8 vmmaplk (>lock)
> >  2nd 0x800039831948 nfsnode (>n_lock)
> > lock order [1] vmmaplk (>lock) -> [2] nfsnode (>n_lock)
> > #0  rw_enter+0x6d
> > #1  rrw_enter+0x5e
> > #2  VOP_LOCK+0x5f
> > #3  vn_lock+0xbc
> > #4  vn_rdwr+0x83
> > #5  vndstrategy+0x2ca
> > #6  physio+0x204
> > #7  spec_write+0x9e
> > #8  VOP_WRITE+0x6e
> > #9  vn_write+0x100
> > #10 dofilewritev+0x143
> > #11 sys_pwrite+0x60
> > #12 syscall+0x588
> > #13 Xsyscall+0x128
> > lock order [2] nfsnode (>n_lock) -> [3] netlock (netlock)
> > #0  rw_enter_read+0x50
> > #1  solock_shared+0x3a
> > #2  sosend+0x10c
> > #3  nfs_send+0x8d
> > #4  nfs_request+0x258
> > #5  nfs_getattr+0xcb
> > #6  VOP_GETATTR+0x55
> > #7  mountnfs+0x37c
> > #8  nfs_mount+0x125
> > #9  sys_mount+0x343
> > #10 syscall+0x561
> > #11 Xsyscall+0x128
> > lock order [3] netlock (netlock) -> [1] vmmaplk (>lock)
> > #0  rw_enter_read+0x50
> > #1  uvmfault_lookup+0x8a
> > #2  uvm_fault_check+0x36
> > #3  uvm_fault+0xfb
> > #4  kpageflttrap+0x158
> > #5  kerntrap+0x94
> > #6  alltraps_kern_meltdown+0x7b
> > #7  _copyin+0x62
> > #8  sysctl_bounded_arr+0x83
> > #9  tcp_sysctl+0x546
> > #10 sys_sysctl+0x17b
> > #11 syscall+0x561
> > #12 Xsyscall+0x128
> > 
> > 
> > Index: kern/subr_witness.c
> > ===
> > RCS file: src/sys/kern/subr_witness.c,v
> > retrieving revision 1.50
> > diff -u -p -r1.50 subr_witness.c
> > --- kern/subr_witness.c 30 May 2023 08:30:01 -  1.50
> > +++ kern/subr_witness.c 27 Apr 2024 13:08:43 -
> > @@ -369,6 +369,13 @@ static struct witness_lock_order_data  *w
> > struct witness *child);
> >  static voidwitness_list_lock(struct lock_instance *instance,
> > int (*prnt)(const char *fmt, ...));
> > +static voidwitness_print_cycle(int(*prnt)(const char *fmt, ...),
> > +   struct witness *parent, struct witness *child);
> > +static voidwitness_print_cycle_edge(int(*prnt)(const char *fmt, 
> > ...),
> > +   struct witness *parent, struct witness *child,
> > +   int step, int last);
> > +static int witness_search(struct witness *w, struct witness *target,
> > +   struct witness **path, int depth, int *remaining);
> >  static voidwitness_setflag(struct lock_object *lock, int flag, int 
> > set);
> >  
> >  /*
> > @@ -652,8 +659,9 @@ witness_ddb_display_descendants(int(*prn
> >  
> > for (i = 0; i < indent; i++)
> > prnt(" ");
> > -   prnt("%s (type: %s, depth: %d)",
> > -w->w_type->lt_name, w->w_class->lc_name, w->w_ddb_level);
> > +   prnt("%s (%s) (type: %s, depth: %d)",
> > +   w->w_subtype, w->w_type->lt_name,
> > +   w->w_class->lc_name, w->w_ddb_level);
> > if (w->w_displayed) {
> > prnt(" -- (already displayed)\n");
> > return;
> > @@ -719,7 +727,8 @@ witness_ddb_display(int(*prnt)(const cha
> > SLIST_FOREACH(w, _all, w_list) {
> > if 

Re: sysupgrade boot.bin apply m1 boot failure

2024-04-29 Thread Mark Kettenis
> Date: Mon, 29 Apr 2024 12:58:25 -0600 (MDT)
> From: bo...@plexuscomp.com
> 
> >Synopsis:sysupgrade to latest snap results in bootloop, had to replace 
> >boot.bin
> >Category:system aarch64
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5-current (GENERIC.MP) #19: Sun Apr 28 13:44:22 
> MDT 2024
>
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.arm64
>   Machine : arm64
> >Description:
>   Upgraded my m1 macbook air to the latest snapshot.
> After the installation, reboot, I see the mac logo, asahi logo, no 
> OpenBSD logo, then it reboots and repeats.
> I copied /m1n1/boot.bin from another asahi efi partition to the 
> OpenBSD m1n1 partition and it boots again. 
> >How-To-Repeat:
>   Install a snapshot on a mac?
> >Fix:
>   Use a boot.bin from asahi
> 
> 
> dmesg:
> OpenBSD 7.5-current (GENERIC.MP) #19: Sun Apr 28 13:44:22 MDT 2024
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 16379801600 (15620MB)
> avail mem = 15738245120 (15009MB)
> random: good seed from bootblocks
> mainbus0 at root: Apple MacBook Air (M1, 2020)
> efi0 at mainbus0: UEFI 2.10
> efi0: Das U-Boot rev 0x20230700
> cpu0 at mainbus0 mpidr 0: Apple Icestorm r1p1
> cpu0: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu0: 4096KB 128b/line 16-way L2 cache
> cpu0: 
> TLBIOS+IRANGE,TS+AXFLAG,FHM,DP,SHA3,RDM,Atomic,CRC32,SHA2+SHA512,SHA1,AES+PMULL,SPECRES,SB,FRINTTS,GPI,LRCPC+LDAPUR,FCMA,JSCVT,API+PAC,DPB,SpecSEI,PAN+ATS1E1,LO,HPDS,VH,CSV3,CSV2,DIT,SSBS+MSR
> cpu1 at mainbus0 mpidr 1: Apple Icestorm r1p1
> cpu1: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu1: 4096KB 128b/line 16-way L2 cache
> cpu2 at mainbus0 mpidr 2: Apple Icestorm r1p1
> cpu2: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu2: 4096KB 128b/line 16-way L2 cache
> cpu3 at mainbus0 mpidr 3: Apple Icestorm r1p1
> cpu3: 128KB 64b/line 8-way L1 VIPT I-cache, 64KB 64b/line 8-way L1 D-cache
> cpu3: 4096KB 128b/line 16-way L2 cache
> cpu4 at mainbus0 mpidr 10100: Apple Firestorm r1p1
> cpu4: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu4: 12288KB 128b/line 12-way L2 cache
> cpu5 at mainbus0 mpidr 10101: Apple Firestorm r1p1
> cpu5: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu5: 12288KB 128b/line 12-way L2 cache
> cpu6 at mainbus0 mpidr 10102: Apple Firestorm r1p1
> cpu6: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu6: 12288KB 128b/line 12-way L2 cache
> cpu7 at mainbus0 mpidr 10103: Apple Firestorm r1p1
> cpu7: 192KB 64b/line 6-way L1 VIPT I-cache, 128KB 64b/line 8-way L1 D-cache
> cpu7: 12288KB 128b/line 12-way L2 cache
> "asc-firmware" at mainbus0 not configured
> "asc-firmware" at mainbus0 not configured
> "framebuffer" at mainbus0 not configured
> "region95" at mainbus0 not configured
> "region94" at mainbus0 not configured
> "region57" at mainbus0 not configured
> "dcp_data" at mainbus0 not configured
> "uat-handoff" at mainbus0 not configured
> "uat-pagetables" at mainbus0 not configured
> "uat-ttbs" at mainbus0 not configured
> "isp-heap" at mainbus0 not configured
> apm0 at mainbus0
> "opp-table-0" at mainbus0 not configured
> "opp-table-1" at mainbus0 not configured
> "opp-table-gpu" at mainbus0 not configured
> agtimer0 at mainbus0: 24000 kHz
> "pmu-e" at mainbus0 not configured
> "pmu-p" at mainbus0 not configured
> "clock-ref" at mainbus0 not configured
> "clock-120m" at mainbus0 not configured
> "clock-200m" at mainbus0 not configured
> "clock-disp0" at mainbus0 not configured
> "clock-dispext0" at mainbus0 not configured
> "clock-ref-nco" at mainbus0 not configured
> simplebus0 at mainbus0: "soc"
> aplpmgr0 at simplebus0
> aplpmgr1 at simplebus0
> aplmbox0 at simplebus0
> apldart0 at simplebus0: 32 bits
> apldart1 at simplebus0: 32 bits, locked
> apldart2 at simplebus0: 32 bits, locked
> aplmbox1 at simplebus0
> apldart3 at simplebus0: 32 bits, bypass
> apldart4 at simplebus0: 32 bits
> apldart5 at simplebus0: 32 bits
> apldart6 at simplebus0: 32 bits, bypass
> aplintc0 at simplebus0 nirq 896 ndie 1
> aplpinctrl0 at simplebus0
> aplpinctrl1 at simplebus0
> apldog0 at simplebus0
> aplmbox2 at simplebus0
> aplpinctrl2 at simplebus0
> aplpinctrl3 at simplebus0
> aplmbox3 at simplebus0
> aplefuse0 at simplebus0
> apldart7 at simplebus0: 32 bits, bypass
> apldart8 at simplebus0: 32 bits, bypass
> apldart9 at simplebus0: 32 bits, bypass
> apldart10 at simplebus0: 32 bits, bypass
> apldart11 at simplebus0: 32 bits
> "gpu" at simplebus0 not configured
> aplcpu0 at simplebus0
> aplcpu1 at simplebus0
> apldcp0 at simplebus0
> apldrm0 at simplebus0
> drm0 at apldrm0
> "isp" at simplebus0 not configured
> apliic0 at simplebus0
> iic0 at apliic0
> tipd0 at iic0 addr 0x38
> tipd1 at iic0 addr 0x3f
> apliic1 at 

Re: lock order reversal in soreceive and NFS

2024-04-22 Thread Mark Kettenis
> Date: Mon, 22 Apr 2024 15:39:55 +0200
> From: Alexander Bluhm 
> 
> Hi,
> 
> I see a witness lock order reversal warning with soreceive.  It
> happens during NFS regress tests.  In /var/log/messages is more
> context from regress.
> 
> Apr 22 03:18:08 ot29 /bsd: uid 0 on 
> /mnt/regress-ffs/fstest_49fd035b8230791792326afb0604868b: out of inodes
> Apr 22 03:18:21 ot29 mountd[6781]: Bad exports list line 
> /mnt/regress-nfs-server
> Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
> Apr 22 03:19:08 ot29 /bsd:  1st 0xfd85c8ae12a8 vmmaplk (>lock)
> Apr 22 03:19:08 ot29 /bsd:  2nd 0x80004c488c78 nfsnode (>n_lock)
> Apr 22 03:19:08 ot29 /bsd: lock order data w2 -> w1 missing
> Apr 22 03:19:08 ot29 /bsd: lock order ">lock"(rwlock) -> 
> ">n_lock"(rrwlock) first seen at:
> Apr 22 03:19:08 ot29 /bsd: #0  rw_enter+0x6d
> Apr 22 03:19:08 ot29 /bsd: #1  rrw_enter+0x5e
> Apr 22 03:19:08 ot29 /bsd: #2  VOP_LOCK+0x5f
> Apr 22 03:19:08 ot29 /bsd: #3  vn_lock+0xbc
> Apr 22 03:19:08 ot29 /bsd: #4  vn_rdwr+0x83
> Apr 22 03:19:08 ot29 /bsd: #5  vndstrategy+0x2ca
> Apr 22 03:19:08 ot29 /bsd: #6  physio+0x204
> Apr 22 03:19:08 ot29 /bsd: #7  spec_write+0x9e
> Apr 22 03:19:08 ot29 /bsd: #8  VOP_WRITE+0x45
> Apr 22 03:19:08 ot29 /bsd: #9  vn_write+0x100
> Apr 22 03:19:08 ot29 /bsd: #10 dofilewritev+0x14e
> Apr 22 03:19:08 ot29 /bsd: #11 sys_pwrite+0x60
> Apr 22 03:19:08 ot29 /bsd: #12 syscall+0x588
> Apr 22 03:19:08 ot29 /bsd: #13 Xsyscall+0x128

You're not talking about this one, are you?

> Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
> Apr 22 03:19:08 ot29 /bsd:  1st 0xfd85c8ae12a8 vmmaplk (>lock)
> Apr 22 03:19:08 ot29 /bsd:  2nd 0x80002ec41860 sbufrcv 
> (>so_rcv.sb_lock)
> Apr 22 03:19:08 ot29 /bsd: lock order ">so_rcv.sb_lock"(rwlock) -> 
> ">lock"(rwlock) first seen at:
> Apr 22 03:19:08 ot29 /bsd: #0  rw_enter_read+0x50
> Apr 22 03:19:08 ot29 /bsd: #1  uvmfault_lookup+0x8a
> Apr 22 03:19:08 ot29 /bsd: #2  uvm_fault_check+0x36
> Apr 22 03:19:08 ot29 /bsd: #3  uvm_fault+0xfb
> Apr 22 03:19:08 ot29 /bsd: #4  kpageflttrap+0x158
> Apr 22 03:19:08 ot29 /bsd: #5  kerntrap+0x94
> Apr 22 03:19:08 ot29 /bsd: #6  alltraps_kern_meltdown+0x7b
> Apr 22 03:19:08 ot29 /bsd: #7  copyout+0x57
> Apr 22 03:19:08 ot29 /bsd: #8  soreceive+0x99a
> Apr 22 03:19:08 ot29 /bsd: #9  recvit+0x1fd
> Apr 22 03:19:08 ot29 /bsd: #10 sys_recvfrom+0xa4
> Apr 22 03:19:08 ot29 /bsd: #11 syscall+0x588
> Apr 22 03:19:08 ot29 /bsd: #12 Xsyscall+0x128
> Apr 22 03:19:08 ot29 /bsd: lock order data w1 -> w2 missing

Unfortunately we don't see the backtrace for the reverse lock order.
So it is hard to say something sensible.  Without more information I'd
say that taking ">so_rcv.sb_lock" before ">lock" is the
correct lock order.

> Apr 22 03:22:27 ot29 /bsd: uid 0 on 
> /mnt/regress-nfs-client/fstest_3372ae0ca77c9470440ef577e4f5e16e: file system 
> full
> Apr 22 03:22:30 ot29 /bsd: uid 0 on 
> /mnt/regress-nfs-client/fstest_632a6ba698de06560b4c93617b00808d: out of inodes
> 
> According to timestamp it is regress/sys/ffs.
> make -C /usr/src/regress/sys/ffs/nfs run-chmod
> triggers it.
> 
> I already reported in a thread on tech@, but the issue is independent
> of the diff over there.  Let's start a fresh discussion.
> 
> bluhm
> 
> 



assertion failures in relayd after config reload

2024-04-11 Thread Mark Johnston
>Synopsis:  assertion failures in relayd after config reload
>Category:  system
>Environment:
System  : OpenBSD 7.5
Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
When running relayd on FreeBSD, one user reports that relayd occasionally
exits with an internal error, typically "pfe: pfe_dispatch_hce: desynchronized"
or "relay: relay_dispatch_pfe: invalid host id".  The error occurs when
relayd's configuration file is reloaded.

I reproduced the problem on OpenBSD 7.5, see below for a simple reproducer.
In particular, there appears to be a race condition involving three processes:
the PFE, the HCE and the parent process.  The race goes something like this:

1. The parent process receives SIGHUP, which causes IMSG_CTL_RESET messages
   to be sent to the PFE and HCE.
2. The HCE sends some IMSG_HOST_STATUS messages to the PFE, following a health
   check.  These messages include a count of the number of checks since the
   last reset.  The PFE keeps track of this count; if it receives a message
   with a different count from what it expects, it'll exit.
3. The PFE receives IMSG_CTL_RESET and purges all of its state.  Then it
   reloads the configuration and repopulates it.
4. The PFE receives some of the IMSG_HOST_STATUS messages sent in step 2.
   The check counts contained in those messages are stale, causing errors.

This occurs only rarely in production, but it's easy to reproduce by
spamming relayd with poll and reload commands in parallel, see below.
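
The sequence above can be modelled in a few lines.  This is only a toy
model under my own assumptions, not relayd's code: the struct and function
names are invented, and the comparison rule (bump the local count, then
compare it with the count carried in the message) is a simplification of
the check described in step 2.

```c
/* Hypothetical model of the PFE's per-host view and an HCE message. */
struct pfe_host {
	unsigned long check_cnt;	/* checks seen since last reset */
};

struct host_status {
	unsigned long check_cnt;	/* checks counted by the sender */
};

/* Step 3: the PFE purges its state on IMSG_CTL_RESET. */
static void
pfe_reset(struct pfe_host *h)
{
	h->check_cnt = 0;
}

/*
 * Steps 2 and 4: the PFE consumes one status message.
 * Returns 0 if the counts agree, -1 where the real daemon would exit
 * with "desynchronized".
 */
static int
pfe_handle_status(struct pfe_host *h, const struct host_status *st)
{
	h->check_cnt++;
	return (h->check_cnt == st->check_cnt ? 0 : -1);
}
```

A message with count 1 is accepted by a fresh host.  If a message with
count 2 is already in flight when pfe_reset() runs, the PFE counts 1 on
arrival, the message says 2, and the check fails.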

>How-To-Repeat:
1. Set up a web server, in this example it's at 127.0.0.1:80.

2. Configure relayd using the following configuration:
---
table  {
  127.0.0.1
}

redirect "test" {
  listen on 127.0.0.1 tcp port 8080
  forward to  port 80 mode roundrobin check http "/" host acs-areq code 200
  session timeout 600
}
---

3. Start relayd.  Let the following two loops run in parallel:

while true; do kill -HUP $(pgrep relayd); sleep 0.01; done
while true; do relayctl poll; sleep 0.01; done

4. Observe relayd exiting with an error.

>Fix:
I don't have a patch for this yet.  I'm happy to work on it, especially
given some suggestions as to what shape the solution should have.


dmesg:
OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 489816064 (467MB)
avail mem = 454209536 (433MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x1fbcf000 (11 entries)
bios0: vendor BHYVE version "14.0" date 10/17/2021
bios0: FreeBSD BHYVE
efi0 at bios0: UEFI 2.7
efi0: BHYVE rev 0x1
acpi0 at bios0: ACPI 5.1
acpi0: sleep states S5
acpi0: tables DSDT FACP APIC HPET MCFG SPCR BGRT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 9 7950X3D 16-Core Processor, 4197.26 MHz, 19-61-02
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,SKINIT,TCE,TOPEXT,DBKP,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,AVX512F,AVX512DQ,RDSEED,SMAP,AVX512CD,SHA,AVX512BW,AVX512VL,XSAVEOPT
cpu0: 0KB 64b/line 1-way D-cache, 0KB 64b/line 1-way L2 cache, 0KB 64b/line 
1-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 134MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Ryzen 9 7950X3D 16-Core Processor, 4197.36 MHz, 19-61-02
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,SKINIT,TCE,TOPEXT,DBKP,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,AVX512F,AVX512DQ,RDSEED,SMAP,AVX512CD,SHA,AVX512BW,AVX512VL,XSAVEOPT
cpu1: 0KB 64b/line 1-way D-cache, 0KB 64b/line 1-way L2 cache, 0KB 64b/line 
1-way L3 cache
cpu1: smt 0, core 0, package 1
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 32 pins
acpihpet0 at acpi0: 16777216 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255
acpiprt0 at acpi0: bus 0 (PC00)
acpipci0 at acpi0 PC00
com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte 

Re: t945s hangs on ttyflags -a

2024-03-31 Thread Mark Kettenis
> Date: Sun, 31 Mar 2024 10:47:53 +0200
> From: Landry Breuil 
> 
> On Sun, Mar 31, 2024 at 09:30:05AM +0200, Landry Breuil wrote:
> > hi,
> > 
> > istr this has been discussed/fixed at some point and it used to work
> > last year, but the t495s i have here on -current hangs at ttyflags -a in
> > /etc/rc, commenting it again allows boot to succeed.
> > 
> > dmesg attached with -current. i dont boot that machine often enough, so
> > the regression window is .. large.. guess i'll try bisecting.
> > 
> > last known working: #1463: Wed Nov 22 21:13:03 MST 2023.
> 
> after bisecting a bit, i'm puzzled because it seems ttyflags -a hangs
> only happen when a spurious com0 is found in dmesg:
> 
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: probed fifo depth: 0 bytes
> 
> but that device isnt present in the working boots from various kernel
> versions (tried kernels from end of december to 1 feb so far)
> 
> it's enough to test with boot -s and ttyflags -a, i think i triggered it
> once with a kernel from #1587: Sat Dec 30 22:44:51 MST 2023, next boots
> on the same kernel were okay..
> 
> I've tried differentiating cold boots vs reboots, but that didn't help.

Bleah.  Those ISA-style probes are not very sophisticated and if
vendors are stupid enough to put something at the same address as the
legacy COM ports, it may be detected as a phantom port.

Maybe we should stop doing the ISA probes on systems where ACPI tells
us there are no legacy devices.  Although I'm not sure if that would
help your system.

Can you send me the files from /var/db/acpi?



Re: M2 Pro 2023 works, but stuck with our apple-boot firmware

2024-03-31 Thread Mark Kettenis
> Date: Sun, 31 Mar 2024 13:23:41 +
> From: Klemens Nanni 
> 
> Default snapshot install works with the initial UEFI/u-boot from macOS/Asahi.
> 
> After manual fw_update(8) via urndis(4) tethering to install apple-boot-1.2
> and cold reboot, it still boots the initial UEFI/u-boot and works.
> 
> Once I run sysupgrade(8), after the upgrade the boot firmware is switched to
> our apple-boot (visible via tobhe's OpenBSD logo) which gets stuck before
> reaching our bootloader.
> 
> First time using Apple silicon, so I don't have a clue yet what's going on.
> 
> Loose transcription, picture attached.
> 
> Chip-ID: 0x6020
> 
>   OS FW version: 13.5 (iBoot-8422.141.2)
>   System FW version: unknown (iBoot 10151.101.3)
>   [...]
>   Initialization complete.
>   Checking for payloads...
>   Devicetree compatible value: apple,j416s
>   Found a gzip compressed payload at 0x100041dc200
>   Uncompressing... 272386 bytes uncompressed to 562704 bytes
>   Found a kernel at 0x10006a0
>   Found a variable at 0x1000421ea02: chosen.asahi,efi-system-partition=...
>   No more payloads at 0x1000421ea19
>   ERROR: Kernel found but not devicetree for apple,j416s available.

Looks like I missed hooking up the devicetree for your model to the
build.  Instead I added apple,j414s twice :(.

Looks like the last PLIST update was botched as well.

Diff below should fix things.  Stuart, what are the chances of
updating the firmware for the release?


Index: sysutils/u-boot-asahi/Makefile
===
RCS file: /cvs/ports/sysutils/u-boot-asahi/Makefile,v
retrieving revision 1.15
diff -u -p -r1.15 Makefile
--- sysutils/u-boot-asahi/Makefile  8 Jan 2024 19:59:11 -   1.15
+++ sysutils/u-boot-asahi/Makefile  31 Mar 2024 16:15:34 -
@@ -6,6 +6,7 @@ VERSION=2024.01
 GH_ACCOUNT=AsahiLinux
 GH_PROJECT=u-boot
 GH_TAGNAME=openbsd-v${VERSION}
+REVISION=  0
 
 PKGNAME=   u-boot-asahi-${VERSION:S/-/./g}
 
Index: sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
===
RCS file: sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
diff -N sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sysutils/u-boot-asahi/patches/patch-arch_arm_dts_Makefile   31 Mar 2024 
16:15:34 -
@@ -0,0 +1,12 @@
+Index: arch/arm/dts/Makefile
+--- arch/arm/dts/Makefile.orig
++++ arch/arm/dts/Makefile
+@@ -40,7 +40,7 @@ dtb-$(CONFIG_ARCH_APPLE) += \
+   t6001-j375c.dtb \
+   t6002-j375d.dtb \
+   t6020-j414s.dtb \
+-  t6020-j414s.dtb \
++  t6020-j416s.dtb \
+   t6020-j474s.dtb \
+   t6021-j414c.dtb \
+   t6021-j416c.dtb \
Index: sysutils/u-boot-asahi/pkg/PLIST
===
RCS file: /cvs/ports/sysutils/u-boot-asahi/pkg/PLIST,v
retrieving revision 1.4
diff -u -p -r1.4 PLIST
--- sysutils/u-boot-asahi/pkg/PLIST 3 Dec 2023 22:55:16 -   1.4
+++ sysutils/u-boot-asahi/pkg/PLIST 31 Mar 2024 16:15:34 -
@@ -9,10 +9,13 @@ share/u-boot/apple_m1/dts/t6001-j316c.dt
 share/u-boot/apple_m1/dts/t6001-j375c.dtb
 share/u-boot/apple_m1/dts/t6002-j375d.dtb
 share/u-boot/apple_m1/dts/t6020-j414s.dtb
+share/u-boot/apple_m1/dts/t6020-j416s.dtb
 share/u-boot/apple_m1/dts/t6020-j474s.dtb
 share/u-boot/apple_m1/dts/t6021-j414c.dtb
 share/u-boot/apple_m1/dts/t6021-j416c.dtb
+share/u-boot/apple_m1/dts/t6021-j475c.dtb
 share/u-boot/apple_m1/dts/t6022-j180d.dtb
+share/u-boot/apple_m1/dts/t6022-j475d.dtb
 share/u-boot/apple_m1/dts/t8103-j274.dtb
 share/u-boot/apple_m1/dts/t8103-j293.dtb
 share/u-boot/apple_m1/dts/t8103-j313.dtb
Index: sysutils/firmware/apple-boot/Makefile
===
RCS file: /cvs/ports/sysutils/firmware/apple-boot/Makefile,v
retrieving revision 1.16
diff -u -p -r1.16 Makefile
--- sysutils/firmware/apple-boot/Makefile   8 Jan 2024 20:00:31 -   
1.16
+++ sysutils/firmware/apple-boot/Makefile   31 Mar 2024 16:15:34 -
@@ -1,5 +1,5 @@
 FW_DRIVER= apple-boot
-FW_VER=1.2
+FW_VER=1.3
 
 WRKDIST=   ${WRKDIR}
 DISTFILES=
@@ -10,7 +10,7 @@ PERMIT_PACKAGE= firmware
 PERMIT_DISTFILES= Yes
 
 BUILD_DEPENDS= m1n1-=1.4.11:sysutils/m1n1:build \
-   u-boot-asahi-=2024.01:sysutils/u-boot-asahi:build
+   u-boot-asahi-=2024.01p0:sysutils/u-boot-asahi:build
 
 ASAHI_BUILD=   ${WRKSRC}/sysutils/u-boot-asahi/u-boot-*/build
 M1N1_BUILD=${WRKSRC}/sysutils/m1n1/m1n1-*/build



Re: dwqe ifconfig down panic

2024-03-28 Thread Mark Kettenis
> Date: Thu, 28 Mar 2024 23:06:13 +0100
> From: Stefan Sperling 
> 
> On Wed, Mar 27, 2024 at 02:08:27PM +0100, Stefan Sperling wrote:
> > On Tue, Mar 26, 2024 at 11:05:49PM +0100, Patrick Wildt wrote:
> > > On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > > > Hi,
> > > > 
> > > > When doing flood ping transmit from a machine and simultaneously
> > > > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
> >  
> > > * Don't run TX/RX proc in case the interface is down?
> > 
> > The RX path already has a corresponding check. But the Tx path does not.
> > 
> > If the problem is a race involving mbufs freed via dwqe_down() and
> > mbufs freed via dwqe_tx_proc() then this simple tweak might help.
> 
> With this patch bluhm's test machine has survived 30 minutes of
> flood ping + ifconfig down/up in a loop. Without the patch the
> machine crashes within a few seconds.
> 
> I understand that there could be an issue in intr_barrier() which
> gets papered over by this patch. However the patch does avoid the
> crash and it is trivial to revert when testing the effectiveness
> of any potential intr_barrier() fixes.
> 
> ok?

Since we already do this in the rx path, I think this is fine.

ok kettenis@

> > diff /usr/src
> > commit - 029d0a842cd8a317375b31145383409491d345e7
> > path + /usr/src
> > blob - 97f874d2edf74a009a811455fbf37ca56f725eef
> > file + sys/dev/ic/dwqe.c
> > --- sys/dev/ic/dwqe.c
> > +++ sys/dev/ic/dwqe.c
> > @@ -593,6 +593,9 @@ dwqe_tx_proc(struct dwqe_softc *sc)
> > struct dwqe_buf *txb;
> > int idx, txfree;
> >  
> > +   if ((ifp->if_flags & IFF_RUNNING) == 0)
> > +   return;
> > +
> > bus_dmamap_sync(sc->sc_dmat, DWQE_DMA_MAP(sc->sc_txring), 0,
> > DWQE_DMA_LEN(sc->sc_txring),
> > BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> > > 
> > 
> > 
> 
> 



Re: dwqe ifconfig down panic

2024-03-27 Thread Mark Kettenis
> Date: Tue, 26 Mar 2024 23:05:49 +0100
> From: Patrick Wildt 
> 
> On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > Hi,
> > 
> > When doing flood ping transmit from a machine and simultaneously
> > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
> > 
> > dwqe_down() contains an interrupt barrier, but somehow it does not
> > work.  Immediately after Xspllower() a transmit interrupt is
> > processed.
> > 
> > bluhm
> 
> Unfortunately I can't see it in the dmesg, but I wonder: Is it MSIs?
> Maybe the edge-triggered interrupt stays in the controller because it
> isn't cleared.  But things you could try are:
> 
> * Clear the IRQ status in addition to disabling them.  This might not
>   do something in case the MSI is already in the IRQ, there are no
>   takebacks.  But then maybe when the interrupt fires, the code path
>   sees the cleared status and doesn't run the tx/rx proc.
> * Don't run TX/RX proc in case the interface is down?

Another thing...  Is that intr_barrier() called while we're at
IPL_NET?  That might not have the desired effect if intr_barrier()
runs on the same CPU that is handling the interrupts for the device.

And I fear that would be an issue in other drivers too...

> > kernel: protection fault trap, code=0
> > Stopped at  m_tag_delete_chain+0x30:movq0(%rsi),%rax
> > 
> > ddb{0}> trace
> > m_tag_delete_chain(fd806bfa5300) at m_tag_delete_chain+0x30
> > m_free(fd806bfa5300) at m_free+0x9e
> > m_freem(fd806bfa5300) at m_freem+0x38
> > dwqe_tx_proc(80304800) at dwqe_tx_proc+0x194
> > dwqe_intr(80304800) at dwqe_intr+0x9b
> > intr_handler(80003f86e760,805f4f80) at intr_handler+0x72
> > Xintr_ioapic_edge36_untramp() at Xintr_ioapic_edge36_untramp+0x18f
> > Xspllower() at Xspllower+0x1d
> > dwqe_ioctl(80304870,80206910,80003f86e990) at dwqe_ioctl+0x18c
> > ifioctl(fd81ffabe1e8,80206910,80003f86e990,80003f94e550) at 
> > ifioctl+0x726
> > sys_ioctl(80003f94e550,80003f86eb50,80003f86eac0) at 
> > sys_ioctl+0x2af
> > syscall(80003f86eb50) at syscall+0x55b
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x73ef48509270, count: -13
> > 
> > ddb{0}> show register
> > rdi   0xfd806bfa5300
> > rsi   0xdeafbeaddeafbead
> > rbp   0x80003f86e5f0
> > rbx0xf40
> > rdx0
> > rcx0
> > rax   0xab56__ALIGN_SIZE+0x9b56
> > r8  0x90
> > r9 0x24634ac__kernel_rodata_phys+0x3624ac
> > r10   0xe676ed611cc13e4f
> > r11   0xd2619954b795f246
> > r12   0x81110f48
> > r13   0xfd807282
> > r14   0xfd806bfa5300
> > r15   0xfd805f6def00
> > rip   0x81daae80m_tag_delete_chain+0x30
> > cs   0x8
> > rflags   0x10282__ALIGN_SIZE+0xf282
> > rsp   0x80003f86e5d0
> > ss  0x10
> > m_tag_delete_chain+0x30:movq0(%rsi),%rax
> > 
> > ddb{0}> x/s version
> > version:OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 
> > 2024\012
> > r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012
> > 
> > ddb{0}> ps
> >PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > *70039   16536  80360  0  7   0x803ifconfig
> >  41531  214934  36719 51  3   0x8100033  netlock   ping
> > 
> > OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 2024
> > r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 8038207488 (7665MB)
> > avail mem = 7773556736 (7413MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x769c7000 (85 entries)
> > bios0: vendor American Megatrends Inc. version "1.02.10" date 06/27/2022
> > efi0 at bios0: UEFI 2.7
> > efi0: American Megatrends rev 0x50013
> > acpi0 at bios0: ACPI 6.2
> > acpi0: sleep states S0 S5
> > acpi0: tablesfg0: addr 0xc000, bus 0-255
> > acpihpet0 at acpi0: 1920 Hz
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel Atom(R) x6425RE Processor @ 1.90GHz, 1895.90 MHz, 06-96-01, 
> > patch 0017
> > cpu0: 
> > 

Re: pcidevs_data.h relies on pcireg.h despite not including it.

2024-03-26 Thread Mark Kettenis
> Date: Tue, 26 Mar 2024 17:08:24 +
> From: Gibson Pilconis 
> 
> If it was exclusively accessible within the kernel I'd agree that it
> probably isn't necessary, the trouble is that it is also accessible
> by userland programs. pcidump is an example of a userland program
> that is part of the OpenBSD project and includes the header, and
> that file acts as a reference implementation of sorts for any program
> that needs to discover PCI devices.
> 
> Notwithstanding that, if all it takes to avoid potential confusion
> and wasted time among developers is a one line include statement,
> does the header file having limited uses within the kernel really
> make it not worth doing?
> 
> 
> I'll even fix it myself and submit a patch. I'd just hate to see
> this quirk be neither remedied nor documented in some form.
> 
> 
> Let me know what you guys think though.

It is not a public header and it is just fine as-is for its use with
the kernel and pcidump.



mg: endless loop using replace-regexp [PATCH]

2024-03-26 Thread Mark Willson
Hi Folks,

The mg command 'replace-regexp "^.*$" ""' enters an endless loop (until
memory is exhausted).  This behaviour also occurs in query-replace-regexp
with the "!" option.

This is due to a change I suggested to re_forwsrch, that is, not moving
dot when the line is empty (re_search.c 1.35). The reason for this
change was to ensure the replacement took effect on the starting line of
a replace-regexp, even if the line was empty.

Here's a patch to fix the issue, which is much simpler than my
previous suggestion. It merely reverses the change in re_search:

--- re_search.c.orig	Mon Mar 11 16:05:29 2024
+++ re_search.c Tue Mar 26 14:20:37 2024
@@ -342,10 +342,9 @@
if (tbo == clp->l_used)
/*
 * Don't start matching past end of line -- must move to
-* beginning of next line, unless line is empty or at
-* end of file.
+* beginning of next line, unless at end of file.
 */
-   if (clp != curbp->b_headp && llength(clp) != 0) {
+   if (clp != curbp->b_headp) {
clp = lforw(clp);
tdotline++;
tbo = 0;

As noted, this leaves a small defect, but I think that's a better
outcome than an endless loop.

Best Regards,
Mark
--
Mark Willson | Email: mark.will...@hydrus.org.uk




Re: arm64 mbp M2 pro, screen blanks and won't restore after inactivity in X

2024-03-08 Thread Mark Kettenis
> Date: Fri, 8 Mar 2024 20:29:35 +
> From: Stuart Henderson 
> 
> On 2024/03/08 14:34, Kenneth Westerback wrote:
> > I see the same/similar behaviour on my M1 MacMini. i.e. when screen blanks
> > it won't come back until I reboot.
> > 
> > Monitor is connected via HDMI. Happy to provide more details/info/tests if
> > same deemed useful.
> 
> Just tried xset s off, which I think _may_ be helping.

I haven't done a ton of testing myself.  Mostly just having machines
sit idle in xenodm.  There were some reports that quickly changing the
display brightness causes similar firmware hangs.  So I wonder if
doing something more complicated in X would trigger the issue.

I think tobhe@ has spent a bit more time on his m2 macbook air.  But
maybe he doesn't have the automatic screen blanking enabled.



Re: Gajim msyscall error

2024-03-03 Thread Mark Kettenis
> From: "Theo de Raadt" 
> Date: Sun, 03 Mar 2024 08:20:33 -0700
> 
> It almost feels as if libc.so equivalency should be closer to
> _dl_find_shlib(),
> 
> (in particular, meaning searchpath[0] in _dl_find_shlib() coming
> from lpath in _dl_load_shlib()
> 
> Is testing for this in loader.c not the right place, and that
> code should be moved to a deeper place, reached by more variations?

Yes, the diff below would make more sense.  Anyway, probably something
to do after the next release?

> The thing that would break is if someone dlopen() of
> "libc.so.not-a-system-library", and that is a real .so but not a real
> full libc; imagine it just contains 1 stub function which isn't a
> system call.  it would now fail to load that stub function.  So maybe
> it is better if we force the applications to request "libc.so".

Index: libexec/ld.so/library_subr.c
===
RCS file: /cvs/src/libexec/ld.so/library_subr.c,v
retrieving revision 1.55
diff -u -p -r1.55 library_subr.c
--- libexec/ld.so/library_subr.c27 Apr 2023 12:27:56 -  1.55
+++ libexec/ld.so/library_subr.c3 Mar 2024 16:44:33 -
@@ -321,6 +321,11 @@ _dl_load_shlib(const char *libname, elf_
try_any_minor = 0;
ignore_hints = 0;
 
+   if (_dl_strncmp(libname, "libc.so.", 8) == 0) {
+   if (_dl_libcname)
+   libname = _dl_libcname;
+   }
+
if (_dl_strchr(libname, '/')) {
char *paths[2];
char *lpath, *lname;
Index: libexec/ld.so/loader.c
===
RCS file: /cvs/src/libexec/ld.so/loader.c,v
retrieving revision 1.223
diff -u -p -r1.223 loader.c
--- libexec/ld.so/loader.c  22 Jan 2024 02:08:31 -  1.223
+++ libexec/ld.so/loader.c  3 Mar 2024 16:44:33 -
@@ -406,10 +406,6 @@ _dl_load_dep_libs(elf_object_t *object, 
liblist[randomlist[loop]].dynp->d_un.d_val;
DL_DEB(("loading: %s required by %s\n", libname,
dynobj->load_name));
-   if (_dl_strncmp(libname, "libc.so.", 8) == 0) {
-   if (_dl_libcname)
-   libname = _dl_libcname;
-   }
depobj = _dl_load_shlib(libname, dynobj,
OBJTYPE_LIB, depflags, nodelete);
if (depobj == 0) {
Index: libexec/ld.so/resolve.h
===
RCS file: /cvs/src/libexec/ld.so/resolve.h,v
retrieving revision 1.107
diff -u -p -r1.107 resolve.h
--- libexec/ld.so/resolve.h 16 Jan 2024 19:07:31 -  1.107
+++ libexec/ld.so/resolve.h 3 Mar 2024 16:44:33 -
@@ -376,6 +376,7 @@ extern char **_dl_libpath;
 extern int _dl_bindnow;
 extern int _dl_traceld;
 extern int _dl_debug;
+extern const char *_dl_libcname;
 
 extern char *_dl_preload;
 extern char *_dl_tracefmt1;



Re: Gajim msyscall error

2024-03-03 Thread Mark Kettenis
> Date: Sun, 3 Mar 2024 14:35:09 +
> From: Stuart Henderson 
> 
> On 2024/03/03 14:29, Stuart Henderson wrote:
> > On 2024/03/03 13:19, Lucas Gabriel Vuotto wrote:
> > > On Sun, Mar 03, 2024 at 11:58:51AM +, Stuart Henderson wrote:
> > > > On 2024/03/02 14:46, Theo de Raadt wrote:
> > > > > Is this a situation where two libc's are being loaded into the address
> > > > > space?  And the 2nd one is refused for pinsyscalls & msyscall, etc 
> > > > > etc.
> > > > 
> > > > It seems the most likely cause. Console output from running with
> > > > LD_DEBUG set in the environment would probably confirm (and would be
> > > > more useful than kdump).
> > > 
> > > See end of this mail.
> > > 
> > > > I can't replicate it here on a system with new libc (I only tried
> > > > starting gajim and poking in the UI, not connecting to any servers).
> > > 
> > > ftr, I don't even get to the UI.
> > 
> > Ah, I can replicate if I ldconfig -R.
> > 
> > > > I'm a bit surprised why a mixture of libs would happen there at all
> > > > (unless something had been rebuilt locally) but don't see another reason
> > > > to hit the msyscall error.
> > > 
> > > Nothing has been locally rebuilt.
> > > 
> > > LD_DEBUG indeed shows that libc.so.98.0 is loaded and libc.so.99.0 is
> > > attempted to load.
> > 
> > 
> > > dlsym: gtk_get_minor_version in /usr/local/lib/libgtk-3.so.2201.0: 
> > > 0x17287b9f300
> > > dlsym: gtk_get_micro_version in /usr/local/lib/libgtk-3.so.2201.0: 
> > > 0x17287b9f330
> > > dlsym: pango_version_string in /usr/local/lib/libpango-1.0.so.3801.4: 
> > > 0x172ed038d60
> > > dlopen: loading: libc.so.99.0
> > > msyscall 1732a806000 a8000 error
> > 
> > Coming from ...
> > 
> > Breakpoint 1.1, dlopen (libname=0x98b61cf06e0 "libc.so.99.0", flags=2) at 
> > /usr/src/libexec/ld.so/dlfcn.c:64
> > 64  if (flags & ~OK_FLAGS) {
> > (gdb) bt
> > #0  dlopen (libname=0x98b61cf06e0 "libc.so.99.0", flags=2) at 
> > /usr/src/libexec/ld.so/dlfcn.c:64
> > #1  0x098b93dc7d01 in py_dl_open () from 
> > /usr/local/lib/python3.10/lib-dynload/_ctypes.cpython-310.so
> > #2  0x098bb0dc1bc1 in cfunction_call () from 
> > /usr/local/lib/libpython3.10.so.0.0
> > #3  0x098bb0d6a132 in _PyObject_MakeTpCall () from 
> > /usr/local/lib/libpython3.10.so.0.0
> > 
> > 
> > so something is doing dlopen("libc.so.99.0", RTLD_NOW) ...
> > 
> > (gdb) py-bt
> > Traceback (most recent call first):
> >   
> >   File "/usr/local/lib/python3.10/ctypes/__init__.py", line 374, in __init__
> > self._handle = _dlopen(self._name, mode)
> >   File "/usr/local/lib/python3.10/site-packages/gajim/main.py", line 147, 
> > in _set_proc_title
> > libc = CDLL(find_library('c'))
> >   File "/usr/local/lib/python3.10/site-packages/gajim/main.py", line 168, 
> > in run
> > _set_proc_title()
> >   File "/usr/local/bin/gajim", line 8, in 
> > sys.exit(run())
> > 
> > aha: gajim is calling setproctitle via ctypes, which dlopen()'s libc.so
> > (without a specific version number). ld.so is picking the latest and
> > loading it, but libc.so.98.0 was already loaded, so we hit msyscall
> > error.
> 
> oh, it's not ld.so which is picking the latest version, it's python's
> ctypes code, which parses the output of "ldconfig -r" to decide.
> 
> I don't think there's anything we can sanely do in ld.so to work
> around this.

We could do something like this.  Still not 100% foolproof though.

Index: libexec/ld.so/dlfcn.c
===
RCS file: /cvs/src/libexec/ld.so/dlfcn.c,v
retrieving revision 1.117
diff -u -p -r1.117 dlfcn.c
--- libexec/ld.so/dlfcn.c   22 Jan 2024 02:08:31 -  1.117
+++ libexec/ld.so/dlfcn.c   3 Mar 2024 15:10:22 -
@@ -68,6 +68,10 @@ dlopen(const char *libname, int flags)
 
if (libname == NULL)
return RTLD_DEFAULT;
+   if (_dl_strncmp(libname, "libc.so.", 8) == 0) {
+   if (_dl_libcname)
+   libname = _dl_libcname;
+   }
 
if ((flags & RTLD_TRACE) == RTLD_TRACE) {
_dl_traceld = 1;
Index: libexec/ld.so/resolve.h
===
RCS file: /cvs/src/libexec/ld.so/resolve.h,v
retrieving revision 1.107
diff -u -p -r1.107 resolve.h
--- libexec/ld.so/resolve.h 16 Jan 2024 19:07:31 -  1.107
+++ libexec/ld.so/resolve.h 3 Mar 2024 15:10:22 -
@@ -376,6 +376,7 @@ extern char **_dl_libpath;
 extern int _dl_bindnow;
 extern int _dl_traceld;
 extern int _dl_debug;
+extern const char *_dl_libcname;
 
 extern char *_dl_preload;
 extern char *_dl_tracefmt1;



Re: Gajim msyscall error

2024-03-03 Thread Mark Kettenis
> Date: Sun, 3 Mar 2024 13:19:36 +
> From: Lucas Gabriel Vuotto 
> 
> On Sun, Mar 03, 2024 at 11:58:51AM +, Stuart Henderson wrote:
> > On 2024/03/02 14:46, Theo de Raadt wrote:
> > > Is this a situation where two libc's are being loaded into the address
> > > space?  And the 2nd one is refused for pinsyscalls & msyscall, etc etc.
> > 
> > It seems the most likely cause. Console output from running with
> > LD_DEBUG set in the environment would probably confirm (and would be
> > more useful than kdump).
> 
> See end of this mail.
> 
> > I can't replicate it here on a system with new libc (I only tried
> > starting gajim and poking in the UI, not connecting to any servers).
> 
> ftr, I don't even get to the UI.
> 
> > > We solved that for most programs.  Something special about python?
> > 
> > Not sure. I assume it's because external Python modules are dlopen()'d
> > and perhaps there could be some edge case in the "only load one libc"
> > code in ld.so.
> > 
> > I'm a bit surprised why a mixture of libs would happen there at all
> > (unless something had been rebuilt locally) but don't see another reason
> > to hit the msyscall error.
> 
> Nothing has been locally rebuilt.
> 
> LD_DEBUG indeed shows that libc.so.98.0 is loaded and libc.so.99.0 is
> attempted to load.

So something is explicitly dlopening libc.so.99.0.  You can't beat stupid...

Do we have any clue where in the dependency chain this happens?

> ld.so loading: 'python3.10'
> exe load offset:  0x1706abe9000
> objname [/usr/local/bin/python3.10], dynp 0x1706abebc78, objtype 2 lbase 
> 1706abe9000, obase 1706abe9000
>  flags /usr/local/bin/python3.10 = 0x800
> head /usr/local/bin/python3.10
> obj /usr/local/bin/python3.10 has /usr/local/bin/python3.10 as head
> examining: '/usr/local/bin/python3.10'
> loading: libm.so.10.1 required by /usr/local/bin/python3.10
> objname [/usr/lib/libm.so.10.1], dynp 0x172f2b42668, objtype 3 lbase 
> 172f2b13000, obase 172f2b13000
>  flags /usr/lib/libm.so.10.1 = 0x0
> obj /usr/lib/libm.so.10.1 has /usr/local/bin/python3.10 as head
> loading: libpython3.10.so.0.0 required by /usr/local/bin/python3.10
> objname [/usr/local/lib/libpython3.10.so.0.0], dynp 0x1727f4bb248, objtype 3 
> lbase 1727f11d000, obase 1727f11d000
>  flags /usr/local/lib/libpython3.10.so.0.0 = 0x0
> obj /usr/local/lib/libpython3.10.so.0.0 has /usr/local/bin/python3.10 as head
> loading: libintl.so.8.0 required by /usr/local/bin/python3.10
> objname [/usr/local/lib/libintl.so.8.0], dynp 0x1729bbd1478, objtype 3 lbase 
> 1729bbb1000, obase 1729bbb1000
>  flags /usr/local/lib/libintl.so.8.0 = 0x0
> obj /usr/local/lib/libintl.so.8.0 has /usr/local/bin/python3.10 as head
> loading: libpthread.so.27.1 required by /usr/local/bin/python3.10
> objname [/usr/lib/libpthread.so.27.1], dynp 0x1733800eb78, objtype 3 lbase 
> 17338004000, obase 17338004000
>  flags /usr/lib/libpthread.so.27.1 = 0x8
> obj /usr/lib/libpthread.so.27.1 has /usr/local/bin/python3.10 as head
> loading: libutil.so.18.0 required by /usr/local/bin/python3.10
> objname [/usr/lib/libutil.so.18.0], dynp 0x1735d3e7230, objtype 3 lbase 
> 1735d3d1000, obase 1735d3d1000
>  flags /usr/lib/libutil.so.18.0 = 0x0
> obj /usr/lib/libutil.so.18.0 has /usr/local/bin/python3.10 as head
> loading: libc.so.98.0 required by /usr/local/bin/python3.10
> objname [/usr/lib/libc.so.98.0], dynp 0x1733679f5d8, objtype 3 lbase 
> 173366bb000, obase 173366bb000
>  flags /usr/lib/libc.so.98.0 = 0x21
> obj /usr/lib/libc.so.98.0 has /usr/local/bin/python3.10 as head
> linking dep /usr/local/lib/libpython3.10.so.0.0 as child of 
> /usr/local/bin/python3.10
> linking dep /usr/local/lib/libintl.so.8.0 as child of 
> /usr/local/bin/python3.10
> objname /usr/lib/libpthread.so.27.1 is nodelete
> linking dep /usr/lib/libpthread.so.27.1 as child of /usr/local/bin/python3.10
> linking dep /usr/lib/libutil.so.18.0 as child of /usr/local/bin/python3.10
> linking dep /usr/lib/libm.so.10.1 as child of /usr/local/bin/python3.10
> linking dep /usr/lib/libc.so.98.0 as child of /usr/local/bin/python3.10
> examining: '/usr/local/lib/libpython3.10.so.0.0'
> loading: libutil.so.18.0 required by /usr/local/lib/libpython3.10.so.0.0
> loading: libm.so.10.1 required by /usr/local/lib/libpython3.10.so.0.0
> loading: libpthread.so.27.1 required by /usr/local/lib/libpython3.10.so.0.0
> loading: libintl.so.8.0 required by /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/local/lib/libintl.so.8.0 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/lib/libpthread.so.27.1 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/lib/libutil.so.18.0 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> linking dep /usr/lib/libm.so.10.1 as child of 
> /usr/local/lib/libpython3.10.so.0.0
> examining: '/usr/local/lib/libintl.so.8.0'
> loading: libiconv.so.7.1 required by /usr/local/lib/libintl.so.8.0
> objname [/usr/local/lib/libiconv.so.7.1], dynp 0x172d4183598, objtype 3 lbase 
> 172d4073000, 

Re: panic: kernel diagnostic assertion "p->p_wchan == NULL" failed

2024-02-28 Thread Mark Kettenis
> Date: Wed, 28 Feb 2024 16:16:09 +0300
> From: Vitaliy Makkoveev 
> 
> On Wed, Feb 28, 2024 at 12:36:26PM +0100, Claudio Jeker wrote:
> > On Wed, Feb 28, 2024 at 12:26:43PM +0100, Marko Cupać wrote:
> > > Hi,
> > > 
> > > thank you for looking into it, and for the advice.
> > > 
> > > On Wed, 28 Feb 2024 10:13:06 +
> > > Stuart Henderson  wrote:
> > > 
> > > > Please try to re-type at least the most important bits from a
> > > > screenshot so readers can quickly see which subsystems are involved.
> > > 
> > > Below is manual transcript of whole screenshot, hopefully no typos.
> > > 
> > > If you have any advice on what should I do if it happens again in order
> > > to get as much info for debuggers as possible, please let me know.
> > > 
> > > splassert: assertwaitok: want 0 have 4
> > > panic: kernel diagnostic assertion "p->p_wchan == NULL" failed: file 
> > > "/usr/src/sys/kern/kern_sched.c", line 373
> > > Stopped at db_enter+0x14: popq %rbp
> > >TIDPID  UID   PRFLAGS  PFLAGS  CPU  COMMAND
> > > 199248  36172  577  0x10   01  openvpn
> > > 490874  474460   0x14000   0x2002  wg_handshake
> > >  71544   93110   0x14000   0x2003  softnet0
> > > db_enter() at db_enter+0x14
> > > panic(820a4b9f) at panic+0xc3
> > > __assert(82121fcb,8209ae5f,175,82092fbf) at 
> > > assert+0x29
> > > sched_chooseproc() at sched_chooseproc+0x26d
> > > mi_switch() at mi_switch+0x17f
> > > sleep_finish(0,1) at sleep_finish+0x107
> > > rw_enter(88003cf0,2) at rw_enter+0x1ad
> > > noise_remote_ready(88003bf0) at noise_remote_ready+0x33
> > > wg_qstart(fff80a622a8) at wg_qstart+0x18c
> > > ifq_serialize(80a622a8,80a62390) at ifq_serialize+0xfd
> > > hfsc_deferred(80a62000) at hfsc_deferred+0x68
> > > softclock_process_tick_timeout(8115e248,1) at 
> > > softclock_process_tick_timeout+0xfb
> > > softclock(0) at softclock+0xb8
> > > softintr_dispatch(0) at softintr_dispatch+0xeb
> > > end trace frame: 0x800020dbc730, count:0
> > > 
> > 
> > WTF! wg(4) is just broken. How the hell should a sleeping rw_lock work
> > when called from inside a timeout aka softclock? This is interrupt context;
> > code is not allowed to sleep there.
> > 
> 
> Not only wg(4). Depending on interface queue usage, ifq_start() schedules
> (*if_qstart)() or calls it, so all the interfaces which use rwlock(9) in
> their (*if_qstart)() handler are at risk.
> 
> What about always scheduling (*if_qstart)()?

Why would you want to introduce additional latency?

> Index: sys/net/hfsc.c
> ===
> RCS file: /cvs/src/sys/net/hfsc.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 hfsc.c
> --- sys/net/hfsc.c11 Apr 2023 00:45:09 -  1.49
> +++ sys/net/hfsc.c28 Feb 2024 13:15:22 -
> @@ -953,8 +953,7 @@ hfsc_deferred(void *arg)
>   if (!HFSC_ENABLED(ifq))
>   return;
>  
> - if (!ifq_empty(ifq))
> - ifq_start(ifq);
> + ifq_start_deferred(ifq);
>  
>   hif = ifq_q_enter(&ifp->if_snd, ifq_hfsc_ops);
>   if (hif == NULL)
> Index: sys/net/ifq.c
> ===
> RCS file: /cvs/src/sys/net/ifq.c,v
> retrieving revision 1.53
> diff -u -p -r1.53 ifq.c
> --- sys/net/ifq.c 10 Nov 2023 15:51:24 -  1.53
> +++ sys/net/ifq.c 28 Feb 2024 13:15:22 -
> @@ -133,6 +133,12 @@ ifq_start(struct ifqueue *ifq)
>   } else
>   task_add(ifq->ifq_softnet, &ifq->ifq_bundle);
>  }
> +void
> +ifq_start_deferred(struct ifqueue *ifq)
> +{
> + if (ifq_len(ifq))
> + task_add(ifq->ifq_softnet, &ifq->ifq_bundle);
> +}
>  
>  void
>  ifq_start_task(void *p)
> Index: sys/net/ifq.h
> ===
> RCS file: /cvs/src/sys/net/ifq.h,v
> retrieving revision 1.41
> diff -u -p -r1.41 ifq.h
> --- sys/net/ifq.h 10 Nov 2023 15:51:24 -  1.41
> +++ sys/net/ifq.h 28 Feb 2024 13:15:22 -
> @@ -430,6 +430,7 @@ void   ifq_destroy(struct ifqueue *);
>  void  ifq_add_data(struct ifqueue *, struct if_data *);
>  int   ifq_enqueue(struct ifqueue *, struct mbuf *);
>  void  ifq_start(struct ifqueue *);
> +void  ifq_start_deferred(struct ifqueue *);
>  struct mbuf  *ifq_deq_begin(struct ifqueue *);
>  void  ifq_deq_commit(struct ifqueue *, struct mbuf *);
>  void  ifq_deq_rollback(struct ifqueue *, struct mbuf *);
> 
> 



Re: Different lm attaching?

2024-02-14 Thread Mark Kettenis
> Date: Wed, 14 Feb 2024 10:48:15 +
> From: Laurence Tratt 
> 
> It seems that I have two (at least) lm devices on my motherboard and that
> it's random which attaches. Here are the two I've seen:
> 
>   lm0 at isa0 port 0x290/8: W83627DHG
>   lm0 at isa0 port 0x290/8: NCT6792D
> 
> The W83627DHG gives one fan reading, with an obviously incorrect value:
> 
>   $ sysctl hw|grep fan
>   hw.sensors.lm0.fan0=56250 RPM
> 
> The NCT6792D gave more than one, and seemingly correct, fan readings in
> `sysctl hw`. From memory there were at least fan readings for the CPU and
> rear fan, both were showing in the range 350-600 RPM when idling, and as
> soon as I made the CPU do some work the readings went up, and when the CPU
> stopped doing some work the readings went down.
> 
> The reason I'm being vague about that is that I have only noticed the
> NCT6792D attaching once, so I can't give those fan readings now. AFAICT the
> W83627DHG nearly always attaches: out of 19 dmesgs I've (accidentally)
> stored over many months, only one contains "lm0...NCT6792D".
> 
> I'm attaching a dmesg from a kernel built from -current yesterday in case
> this is useful, though it's from an W83627DHG attach.

It is probably a misdetection.  There are some heuristics involved in
detecting the chip.  And there may even be a 2nd agent here (IPMI,
SMM) that may interfere with the code that tries to detect the chip.

Not much we can do about that.

> OpenBSD 7.4-current (GENERIC.MP) #18: Tue Feb 13 16:07:15 GMT 2024
> ltr...@overdrive.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 68431765504 (65261MB)
> avail mem = 66336387072 (63263MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.5 @ 0x75a58000 (104 entries)
> bios0: vendor American Megatrends Inc. version "1801" date 12/08/2023
> bios0: ASUS ROG STRIX Z790-H GAMING WIFI
> efi0 at bios0: UEFI 2.8
> efi0: American Megatrends rev 0x5001b
> acpi0 at bios0: ACPI 6.4
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT FIDT SSDT SSDT SSDT SSDT HPET APIC MCFG SSDT 
> NHLT LPIT SSDT SSDT DBGP DBG2 SSDT DMAR FPDT SSDT SSDT SSDT UEFI UEFI BGRT 
> WPBT TPM2 PHAT WSMT
> acpi0: wakeup devices PEG1(S4) PEGP(S4) PEGP(S4) PEG0(S4) PEGP(S4) RP09(S4) 
> PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) PXSX(S4) RP13(S4) 
> PXSX(S4) RP14(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 13th Gen Intel(R) Core(TM) i9-13900K, 5902.40 MHz, 06-b7-01, patch 
> 011f
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,RRSBA,OVERCLOCK,GDS_NO,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 36MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.1.0.1.0.1, IBE
> cpu1 at mainbus0: apid 8 (application processor)
> cpu1: 13th Gen Intel(R) Core(TM) i9-13900K, 5902.54 MHz, 06-b7-01, patch 
> 011f
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,TAA_NO,MISC_PKG_CT,ENERGY_FILT,DOITM,SBDR_SSDP_N,FBSDP_NO,PSDP_NO,RRSBA,OVERCLOCK,GDS_NO,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 2MB 64b/line 
> 16-way L2 cache, 36MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 4, package 0
> cpu2 at mainbus0: apid 16 (application processor)
> cpu2: 13th Gen Intel(R) Core(TM) i9-13900K, 5802.40 MHz, 06-b7-01, patch 
> 011f
> cpu2: 
> 

Re: TSO em(4) problem

2024-01-28 Thread Mark Kettenis
> Date: Sun, 28 Jan 2024 10:44:25 +0100
> From: Marcus Glocker 
> 
> On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote:
> 
> > On 27.1.2024. 21:01, Marcus Glocker wrote:
> > > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
> > > 
> > >> On 26.1.2024. 21:56, Marcus Glocker wrote:
> > >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> > >>>
> > >>>> I've managed to reproduce the TSO em problem on another setup,
> > >>>> unfortunately production.
> > >>>>
> > >>>> Setup is very simple
> > >>>>
> > >>>> em0 - carp <- uplink
> > >>>> em1 - pfsync
> > >>>> ix1 - vlans - carp
> > >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> > >>> that box?  You can mask the IPs if it's too sensitive.
> > >>>
> > >>> I still try to reproduce the issue here, and for now I can't.
> > >>> Maybe in your full ifconfig output I can see some specifics about your
> > >>> configuration, which makes it more likely to reproduce the issue here.
> > >>>
> > >> Hi,
> > >>
> > >> here's ifconfig from second setup where watchdog is triggered much 
> > >> faster.
> > >> Originally in this setup uplink is ix0, I've changed that to em0 to see
> > >> whether the problem would be the same as in the other setup, and it is.
> > >> That's good because this is a pfsync setup for students and I can do
> > >> whatever I want with it :)
> > > Thanks.
> > > 
> > > But still, I can do whatever I want on my em(4) I210 box, carp(4),
> > > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
> > > you are seeing :-(  I'm not sure if this is something related to your
> > > I350.
> > > 
> > > Also, I can't understand why the watchdog still triggers when you disable
> > > TSO by setting net.inet.tcp.tso=0.
> > > 
> > > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
> > > while EM_TSO_SIZE (65535) is one byte less, can you please apply this
> > > diff to -current and test it?  I doubt it will make a difference, but
> > > I'm running a bit out of ideas here.
> > 
> > 
> > Hi,
> > 
> > with this diff I'm still getting em watchdog
> > 
> > Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185
> > TDT 120
> 
> Thanks for testing again.
> 
> I think we might have a generic problem with TSO with the current em(4)
> code and some chips.  Referring to this recent FreeBSD commit.
> 
> e1000: disable TSO on lem(4) and em(4):
> Disable TSO on lem(4) and em(4) until a ring stall can be debugged.
> https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180
> 
> Can you try this diff to specifically disable TSO for I350 please?
> 
> We will need to discuss internally which way to go.  I see those
> options currently:
> 
> - Entirely pull out the TSO diff.
> - Leave the TSO code in but disable TSO for now (what FreeBSD did).
> - Leave the TSO code in but disable TSO only for chips we see issues
>   with (this diff).

Frankly, I think it is time to just pull the diff.  Between this issue
and the sparc64 unaligned access thing there is just too much breakage
for relatively little gain (since this is only a gigabit Ethernet).

Cheers,

Mark


> Index: if_em.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_em.c,v
> diff -u -p -u -p -r1.370 if_em.c
> --- if_em.c   31 Dec 2023 08:42:33 -  1.370
> +++ if_em.c   28 Jan 2024 09:30:59 -
> @@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc)
>   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
>   ifp->if_capabilities |= IFCAP_CSUM_IPv4;
>   ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
> - ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
> + /* XXX: Enabling TSO on I350 causes watchdogs */
> + if (sc->hw.mac_type != em_i350)
> + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
>   }
>  
>   /* 
> 
> 



Re: Trouble with console on UART

2024-01-27 Thread Mark Kettenis
> Date: Sat, 27 Jan 2024 18:49:02 +
> From: Mikolaj Kucharski 
> 
> Mark,
> 
> On Sat, Jan 27, 2024 at 03:05:10PM +0100, Mark Kettenis wrote:
> > > Date: Sat, 27 Jan 2024 22:47:05 +0900
> > > From: stephane Tranchemer 
> > > 
> > > Hello Jonathan,
> > > 
> > > made a kernel with the patch and here is what I get on dmesg:
> > > 
> > > puc0 at pci0 dev 26 function 0 "Intel C3000 UART" rev 0x11: ports: 16 com
> > > com4 at puc0 port 0 apic 2 int 16: ns16550a, 16 byte fifo
> > > 
> > > so now it seems to get the com port, however when I type "set tty com4" 
> > > on the boot prompt I get the same result as before, the system freezes 
> > > (or more accurately the input/output goes into limbo).
> > > 
> > > Am I missing something here?
> > 
> > Try typing "mach comaddr 0xe060" before "set tty com4".
> > 
> 
> How did you know that address to specify?

From the pcidump output the OP sent me.



Re: Trouble with console on UART

2024-01-27 Thread Mark Kettenis
> Date: Sat, 27 Jan 2024 22:47:05 +0900
> From: stephane Tranchemer 
> 
> Hello Jonathan,
> 
> made a kernel with the patch and here is what I get on dmesg:
> 
> puc0 at pci0 dev 26 function 0 "Intel C3000 UART" rev 0x11: ports: 16 com
> com4 at puc0 port 0 apic 2 int 16: ns16550a, 16 byte fifo
> 
> so now it seems to get the com port, however when I type "set tty com4" 
> on the boot prompt I get the same result as before, the system freezes 
> (or more accurately the input/output goes into limbo).
> 
> Am I missing something here?

Try typing "mach comaddr 0xe060" before "set tty com4".



Re: Trouble with console on UART

2024-01-27 Thread Mark Kettenis
> Date: Sat, 27 Jan 2024 19:54:43 +0900 (JST)
> From: stran...@free.fr
> 
> >Synopsis:Console is lost at boot when com0 is on a UART PCI  
> >Category:system amd64
> >Environment:
>   System  : OpenBSD 7.4
>   Details : OpenBSD 7.4 (GENERIC.MP) #2: Fri Dec  8 15:39:04 MST 2023
>
> r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> Looking for a replacement for Soekris or PCengines machines, I chose a Qotom 
> mini-pc featured in a Servethehome video.
> 
> I chose the 8GB RAM 256GB SSD, Q20321G9 C3558R model
> 
> My intent is to use it as a OpenBSD router, so once I get it I started to 
> play with it.
> 
> Making a USB boot key from install74.img with Etcher (on a windows 
> workstation, sue me) I booted without problem after setting up the boot order 
> in the Bios/UEFI. Interestingly it comes with a preinstalled Windows install 
> without activation number on the SSD, well I just flushed it all.
> 
> The 2.5G and 10 SFP+ interfaces are seen as igc and ix interfaces, great.
> 
> Now there is the problem I stumbled into, it is the console port.
> 
> first, it is not enabled by default, you have to go into the Bios/UEFI to 
> enable it (meaning connecting a USB keyboard and a VGA monitor) and it 
> presents as such in the menus with a toggle to Enable/Disable:
> COM0(Pci Bus0,Dev26,Func0) 
> and also some nice options to change like the type of console or speed.
> 
> Doing so you get your display redirected on the console, fantastic.
> 
> However when you boot your OpenBSD you get this on the console:
> Using drive 0, partition 3.
> Loading..probing: pc0 mem[620K 993M 928M 91M 852K 3M 6144M a20=on]
> disk: hd0+
> >> OpenBSD/amd64 BOOT 3.65
> boot>booting hd0a:/bsd: 17241420+4137992+368672+0+1241088 
> [1340879+128+1321080+101331
> 
> And nothing more, your main display is on the VGA monitor, expected since the 
> redirecting of the tty on the console is not done.
> 
> In all logic I then tried to boot OpenBSD with 
> set tty com0
> But when doing this here is what you get:
> boot> set tty com0
> switching console to com0
> 
> And that's it... no more access to your keyboard and the console is lost.
> 
> Booting the OS completely here's what we can see on dmesg
> "Intel C3000 UART" rev 0x11 at pci0 dev 26 function 0 not configured
> 
> So it seems that from the moment you tell it to use the com0 port you 
> lose all access... this UART thing is not properly recognized.
> 
> For comparison on a PCengine machine:
> com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> The com port there is ISA bus
> 
> Is there something I'm missing to catch the console or enable it in OpenBSD, 
> or is it a non-supported trouble.

Pretty much the latter.

Now it may be possible to turn that into supported trouble with some
minor changes to the code if you're able to test some patches for me.

As a first step can you send me:

* The full dmesg for this machine
* The acpidump output for this machine (tar up everything in /var/db/acpi)
* The output of pcidump -vxxx

Cheers,

Mark



Re: openbsd 7.4 after install kernel panic on m2 macbook

2024-01-15 Thread Mark Kettenis
> From: 소랑개 
> Date: Mon, 15 Jan 2024 12:57:30 +0900
> 
> hi
> 
> reporting problem about openbsd 7.4 on arm64 m2 macbook.
> 
> i used asahi install script and install complete.
> 
> but after booting..
> 
> show kernel panic all time
> 
> check my screenshot
> 
> thank you

The current asahi install script installs a newer version of the
touchpad firmware that the OpenBSD 7.4 release kernel can't handle.
This is fixed in -current, so the easiest thing to do is to just
install a snapshot.  There are some nice goodies in the pipeline for
-current so unless you absolutely need to run a stable release, this is
a good idea anyway!



Re: Supported iwn device is not configured on ARM64

2024-01-15 Thread Mark Kettenis
> Date: Mon, 15 Jan 2024 00:17:53 -0800
> From: Mike Larkin 
> 
> On Mon, Jan 15, 2024 at 08:58:52AM +0100, Mizsei Zoltán wrote:
> > Thanks, that did the trick, see new dmesg below. Would it be possible to 
> > enable iwn* in the upstream sources?
> >
> > Best Regards,
> > --Zoltan
> >
> 
> I think that should be doable. Mark, Patrick, any objections (and if no, do we
> want iwm in there too?)

If we add iwn(4), we probably should add iwm(4) too.

I think I had some worries that these Intel wireless cards were
somehow closely tied to Intel chipsets and therefore adding them made
only sense for amd64.  But iwx(4) works and if iwn(4) works, I think
we can safely assume that iwm(4) should work as well.

So no objection from me.

> > linkstar$ uname -a
> > OpenBSD linkstar.extrowerk.com 7.4 GENERIC.MP#1 arm64
> > linkstar$ dmesg
> > OpenBSD 7.4 (GENERIC.MP) #1: Mon Jan 15 04:02:12 CET 2024
> > szil...@linkstar.extrowerk.com:/sys/arch/arm64/compile/GENERIC.MP
> > real mem  = 3959590912 (3776MB)
> > avail mem = 3759493120 (3585MB)
> > random: good seed from bootblocks
> > mainbus0 at root: HINLINK OPC-H68K Board
> > psci0 at mainbus0: PSCI 1.1, SMCCC 1.2, SYSTEM_SUSPEND
> > efi0 at mainbus0: UEFI 2.7
> > efi0: EDK2 rev 0x1
> > smbios0 at efi0: SMBIOS 3.3.0
> > smbios0: vendor EDK2 version "miq" date 12/16/2023
> > smbios0: Firefly Firefly ROC-RK3568-PC
> > cpu0 at mainbus0 mpidr 0: ARM Cortex-A55 r2p0
> > cpu0: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu0: 512KB 64b/line 16-way L2 cache
> > cpu0: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > cpu1 at mainbus0 mpidr 100: ARM Cortex-A55 r2p0
> > cpu1: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu1: 512KB 64b/line 16-way L2 cache
> > cpu1: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > cpu2 at mainbus0 mpidr 200: ARM Cortex-A55 r2p0
> > cpu2: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu2: 512KB 64b/line 16-way L2 cache
> > cpu2: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > cpu3 at mainbus0 mpidr 300: ARM Cortex-A55 r2p0
> > cpu3: 32KB 64b/line 4-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu3: 512KB 64b/line 16-way L2 cache
> > cpu3: 
> > DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL,LRCPC,DPB,ASID16,PAN+ATS1E1,LO,HPDS,VH,HAFDBS,SBSS
> > scmi0 at mainbus0: SCMI 2.0
> > apm0 at mainbus0
> > agintc0 at mainbus0 mbi shift 4:4 nirq 352 nredist 4 ipi: 0, 1, 2: 
> > "interrupt-controller"
> > syscon0 at mainbus0: "syscon"
> > rkiovd0 at syscon0
> > syscon1 at mainbus0: "syscon"
> > syscon2 at mainbus0: "syscon"
> > syscon3 at mainbus0: "syscon"
> > syscon4 at mainbus0: "syscon"
> > syscon5 at mainbus0: "syscon"
> > syscon6 at mainbus0: "syscon"
> > rkclock0 at mainbus0: PMUCRU
> > rkclock1 at mainbus0: CRU
> > syscon7 at mainbus0: "power-management"
> > "power-controller" at syscon7 not configured
> > syscon8 at mainbus0: "qos"
> > syscon9 at mainbus0: "qos"
> > syscon10 at mainbus0: "qos"
> > syscon11 at mainbus0: "qos"
> > syscon12 at mainbus0: "qos"
> > syscon13 at mainbus0: "qos"
> > syscon14 at mainbus0: "qos"
> > syscon15 at mainbus0: "qos"
> > syscon16 at mainbus0: "qos"
> > syscon17 at mainbus0: "qos"
> > syscon18 at mainbus0: "qos"
> > syscon19 at mainbus0: "qos"
> > syscon20 at mainbus0: "qos"
> > syscon21 at mainbus0: "qos"
> > syscon22 at mainbus0: "qos"
> > syscon23 at mainbus0: "qos"
> > syscon24 at mainbus0: "qos"
> > syscon25 at mainbus0: "qos"
> > syscon26 at mainbus0: "qos"
> > syscon27 at mainbus0: "qos"
> > syscon28 at mainbus0: "qos"
> > syscon29 at mainbus0: "qos"
> > syscon30 at mainbus0: "qos"
> > syscon31 at mainbus0: "qos"
> > rkcomphy0 at mainbus0
> > rkcomphy1 at mainbus0
> > rkusbphy0 at mainbus0: phy 0
> > rkusbphy1 at mainbus0: phy 1
> > rkpinctrl0 at mainbus0: "pinctrl"
> > rkgpio0 at rkpinctrl0
> > rkgpio1 at rkpinctrl0
> > rkgpio2 at rkpinctrl0
> > rkgpio3 at rkpi

Re: vmm guest crash in vio

2024-01-09 Thread Mark Kettenis
> From: Dave Voutila 
> Date: Tue, 09 Jan 2024 09:19:56 -0500
> 
> Stefan Fritsch  writes:
> 
> > On 08.01.24 22:24, Alexander Bluhm wrote:
> >> Hi,
> >> When running a guest in vmm and doing ifconfig operations on vio
> >> interface, I can crash the guest.
> >> I run these loops in the guest:
> >> while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
> >> while doas ifconfig vio1 -inet; do :; done
> >> while doas ifconfig vio1 down; do :; done
> >> And from host I ping the guest:
> >> ping -f 10.188.234.74
> >
> > I suspect there is a race condition in vmd. The vio(4) kernel driver
> > resets the device and then frees all the mbufs from the tx and rx
> > rings. If vmd continues doing dma for a bit after the reset, this
> > could result in corruption. From this code in vmd's vionet.c
> >
> > case VIODEV_MSG_IO_WRITE:
> > /* Write IO: no reply needed */
> > if (handle_io_write(, dev) == 1)
> > virtio_assert_pic_irq(dev, 0);
> > break;
> >
> > it looks like the main vmd process will just send a pio write message
> > to the vionet process but does not wait for the vionet process to
> > actually execute the device reset. The pio write instruction in the
> > vcpu must complete after the device reset is complete.
> 
> Are you saying we need to wait for the emulation of the OUT instruction
> that the vcpu is executing? I don't believe we should be blocking the
> vcpu here as that's not how port io works with real hardware. It makes
> no sense to block on an OUT until the device finishes emulation.

Well, I/O address space is highly synchronous.  See 16.6 "Ordering
I/O" in the Intel SDM.  There it clearly states that execution of the
next instruction after an OUT instruction is delayed until the store
completes.  Now that isn't necessarily the same as completing all
device emulation for the device.  But it does mean the store has to
reach the device register before the next instruction gets executed.

Yes, this is slow.  Avoid I/O address space if you can; use
Memory-Mapped I/O instead.
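
To make the ordering requirement concrete, here is a minimal sketch of a
port write that does not return until the store has reached the device
register.  It is not vmd's code: the message layout, the two descriptors
and the function names (pio_write_post, pio_write_wait, device_handle_one)
are all made up for illustration, and the device side only stores the
value rather than running any emulation.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define NREGS 8

/* Hypothetical message format; not vmd's actual imsg layout. */
struct pio_msg {
	uint16_t reg;	/* register index within the device */
	uint32_t val;	/* value stored by the guest's OUT */
};

/* vcpu side, step 1: forward the store to the device process. */
static int
pio_write_post(int req_fd, uint16_t reg, uint32_t val)
{
	struct pio_msg m;

	memset(&m, 0, sizeof(m));
	m.reg = reg;
	m.val = val;
	return write(req_fd, &m, sizeof(m)) == (ssize_t)sizeof(m) ? 0 : -1;
}

/* vcpu side, step 2: block until the device reports the store landed. */
static int
pio_write_wait(int ack_fd)
{
	char ack;

	return read(ack_fd, &ack, 1) == 1 ? 0 : -1;
}

/* device side: apply one store to the register file, then acknowledge. */
static int
device_handle_one(int req_fd, int ack_fd, uint32_t regs[NREGS])
{
	struct pio_msg m;
	char ack = 1;

	if (read(req_fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
		return -1;
	if (m.reg >= NREGS)
		return -1;
	regs[m.reg] = m.val;	/* the store reaches the register here */
	return write(ack_fd, &ack, 1) == 1 ? 0 : -1;
}
```

In this sketch the vcpu would call pio_write_post() followed by
pio_write_wait() before executing the next guest instruction.  The
acknowledgement is sent as soon as the register is updated, so any longer
device work (such as a full reset) could still run after it.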

> I *do* think there could be something wrong in the device status
> register emulation, but blocking the vcpu on an OUT isn't the way to
> solve this. In fact, that's what previously happened before I split
> device emulation out into subprocesses...so if there's a bug in the
> emulation logic, it was hiding it.
> 
> >
> > I could not reproduce this issue with kvm/qemu.
> >
> 
> Thanks!
> 
> >
> >> Then I see various kind of mbuf corruption:
> >> kernel: protection fault trap, code=0
> >> Stopped at  pool_do_put+0xc9:   movq0x8(%rcx),%rcx
> >> ddb> trace
> >> pool_do_put(82519e30,fd807db89000) at pool_do_put+0xc9
> >> pool_put(82519e30,fd807db89000) at pool_put+0x53
> >> m_extfree(fd807d330300) at m_extfree+0xa5
> >> m_free(fd807d330300) at m_free+0x97
> >> soreceive(fd806f33ac88,0,80002a3e97f8,0,0,80002a3e9724,76299c7990301bf1) at soreceive+0xa3e
> >> soo_read(fd807ed4a168,80002a3e97f8,0) at soo_read+0x4a
> >> dofilereadv(80002a399548,7,80002a3e97f8,0,80002a3e98c0) at dofilereadv+0x143
> >> sys_read(80002a399548,80002a3e9870,80002a3e98c0) at sys_read+0x55
> >> syscall(80002a3e9930) at syscall+0x33a
> >> Xsyscall() at Xsyscall+0x128
> >> end of kernel
> >> end trace frame: 0x7469f8836930, count: -10
> >> pool_do_put(8259a500,fd807e7fa800) at pool_do_put+0xc9
> >> pool_put(8259a500,fd807e7fa800) at pool_put+0x53
> >> m_extfree(fd807f838a00) at m_extfree+0xa5
> >> m_free(fd807f838a00) at m_free+0x97
> >> m_freem(fd807f838a00) at m_freem+0x38
> >> vio_txeof(80030118) at vio_txeof+0x11d
> >> vio_tx_intr(80030118) at vio_tx_intr+0x31
> >> virtio_check_vqs(80024800) at virtio_check_vqs+0x102
> >> virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
> >> intr_handler(80002a52dae0,80081000) at intr_handler+0x3c
> >> Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> >> Xspllower() at Xspllower+0x1d
> >> vio_ioctl(800822a8,80206910,80002a52dd00) at vio_ioctl+0x16a
> >> ifioctl(fd807c0ba7a0,80206910,80002a52dd00,80002a41c810) at ifioctl+0x721
> >> sys_ioctl(80002a41c810,80002a52de00,80002a52de50) at sys_ioctl+0x2ab
> >> syscall(80002a52dec0) at syscall+0x33a
> >> Xsyscall() at Xsyscall+0x128
> >> end of kernel
> >> end trace frame: 0x7b3d36d55eb0, count: -17
> >> panic: pool_do_get: mcl2k free list modified: page 0xfd80068bd000; item addr 0xfd80068bf800; offset 0x0=0xa != 0x83dcdb591c6b8bf
> >> Stopped at  db_enter+0x14:  popq%rbp
> >>      TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >> *143851  19121      0         0x3          0    0  ifconfig
> >> db_enter() at db_enter+0x14
> >> 

Re: panic: aml_die aml_loadtable:3746 when booting 7.4/amd64 under Hyper-V/UEFI

2023-12-21 Thread Mark Kettenis
> Date: Wed, 20 Dec 2023 12:00:47 +0100
> From: Henryk Paluch 
> 
> Hello!
> 
>  > Ah, cool.  That is a bit of a hack though.  I did look into what is
>  > needed to fix this properly.  If I send you a diff, can you test it?
> 
> Feel free to send me patches. I will test them.
> 
> Best regards
>--Henryk Paluch

Can you try the attached diff?


Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.425
diff -u -p -r1.425 acpi.c
--- dev/acpi/acpi.c 8 Jul 2023 08:01:10 -   1.425
+++ dev/acpi/acpi.c 21 Dec 2023 16:37:18 -
@@ -1104,16 +1104,16 @@ acpi_attach_common(struct acpi_softc *sc
printf(" !DSDT");
 
p_dsdt = entry->q_table;
-   acpi_parse_aml(sc, p_dsdt->aml, p_dsdt->hdr_length -
-   sizeof(p_dsdt->hdr));
+   acpi_parse_aml(sc, NULL, p_dsdt->aml,
+   p_dsdt->hdr_length - sizeof(p_dsdt->hdr));
 
/* Load SSDT's */
SIMPLEQ_FOREACH(entry, &sc->sc_tables, q_next) {
if (memcmp(entry->q_table, SSDT_SIG,
sizeof(SSDT_SIG) - 1) == 0) {
p_dsdt = entry->q_table;
-   acpi_parse_aml(sc, p_dsdt->aml, p_dsdt->hdr_length -
-   sizeof(p_dsdt->hdr));
+   acpi_parse_aml(sc, NULL, p_dsdt->aml,
+   p_dsdt->hdr_length - sizeof(p_dsdt->hdr));
}
}
 
Index: dev/acpi/dsdt.c
===
RCS file: /cvs/src/sys/dev/acpi/dsdt.c,v
retrieving revision 1.264
diff -u -p -r1.264 dsdt.c
--- dev/acpi/dsdt.c 9 Dec 2021 20:21:35 -   1.264
+++ dev/acpi/dsdt.c 21 Dec 2023 16:37:18 -
@@ -634,8 +634,9 @@ __aml_search(struct aml_node *root, uint
 
SIMPLEQ_INIT(>son);
SIMPLEQ_INSERT_TAIL(>son, node, sib);
+   return node;
}
-   return node;
+   return NULL;
 }
 
 /* Get absolute pathname of AML node */
@@ -3742,8 +3743,6 @@ aml_loadtable(struct acpi_softc *sc, con
struct acpi_dsdt *p_dsdt;
struct acpi_q *entry;
 
-   if (strlen(rootpath) > 0)
-   aml_die("LoadTable: RootPathString unsupported");
if (strlen(parameterpath) > 0)
aml_die("LoadTable: ParameterPathString unsupported");
 
@@ -3755,8 +3754,8 @@ aml_loadtable(struct acpi_softc *sc, con
strncmp(hdr->oemtableid, oemtableid,
sizeof(hdr->oemtableid)) == 0) {
p_dsdt = entry->q_table;
-   acpi_parse_aml(sc, p_dsdt->aml, p_dsdt->hdr_length -
-   sizeof(p_dsdt->hdr));
+   acpi_parse_aml(sc, rootpath, p_dsdt->aml,
+   p_dsdt->hdr_length - sizeof(p_dsdt->hdr));
return aml_allocvalue(AML_OBJTYPE_DDBHANDLE, 0, 0);
}
}
@@ -4520,10 +4519,18 @@ parse_error:
 }
 
 int
-acpi_parse_aml(struct acpi_softc *sc, uint8_t *start, uint32_t length)
+acpi_parse_aml(struct acpi_softc *sc, const char *rootpath,
+uint8_t *start, uint32_t length)
 {
+   struct aml_node *root = &aml_root;
struct aml_scope *scope;
struct aml_value res;
+
+   if (rootpath) {
+   root = aml_searchname(&aml_root, rootpath);
+   if (root == NULL)
+   aml_die("Invalid RootPathName %s\n", rootpath);
+   }
 
aml_root.start = start;
memset(&res, 0, sizeof(res));
Index: dev/acpi/dsdt.h
===
RCS file: /cvs/src/sys/dev/acpi/dsdt.h,v
retrieving revision 1.80
diff -u -p -r1.80 dsdt.h
--- dev/acpi/dsdt.h 2 Apr 2023 11:32:48 -   1.80
+++ dev/acpi/dsdt.h 21 Dec 2023 16:37:18 -
@@ -56,8 +56,8 @@ void  aml_walktree(struct aml_node *);
 
 void   aml_find_node(struct aml_node *, const char *,
int (*)(struct aml_node *, void *), void *);
-intacpi_parse_aml(struct acpi_softc *, u_int8_t *,
-   uint32_t);
+intacpi_parse_aml(struct acpi_softc *, const char *,
+   u_int8_t *, uint32_t);
 void   aml_register_notify(struct aml_node *, const char *,
int (*)(struct aml_node *, int, void *), void *,
int);
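
For readers following the __aml_search hunk: after the change, the node is
returned from inside the create branch and NULL is returned when nothing
was found and nothing was created.  The sketch below shows that
"look up or create" shape on a toy sibling list.  It is not the dsdt.c
code; struct node and search() are simplified stand-ins invented for this
example.

```c
#include <stdlib.h>
#include <string.h>

/* Toy namespace node; only loosely modelled on an AML node. */
struct node {
	char name[5];		/* 4-character name segment plus NUL */
	struct node *next;	/* next sibling */
	struct node *son;	/* first child */
};

/*
 * Find the child of root called name.  If it does not exist, append a
 * new child when create is non-zero, otherwise report failure with NULL.
 */
static struct node *
search(struct node *root, const char *name, int create)
{
	struct node *n, **tail = &root->son;

	for (n = root->son; n != NULL; n = n->next) {
		if (strncmp(n->name, name, 4) == 0)
			return n;
		tail = &n->next;
	}
	if (create) {
		n = calloc(1, sizeof(*n));
		if (n == NULL)
			return NULL;
		strncpy(n->name, name, 4);	/* name[4] stays NUL */
		*tail = n;
		return n;
	}
	return NULL;	/* not found and not asked to create */
}
```

A caller resolving a root path can then treat a NULL result as "invalid
path" and fail loudly, which is the pattern the acpi_parse_aml hunk uses
with aml_searchname().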



Re: panic: aml_die aml_loadtable:3746 when booting 7.4/amd64 under Hyper-V/UEFI

2023-12-20 Thread Mark Kettenis
> Date: Wed, 20 Dec 2023 09:30:45 +0100
> From: Henryk Paluch 
> 
> Hello!
> 
> Problem fixed! I resolved ACPI panic when booting OpenBSD7.4 as guest VM 
> under Hyper-V Server 2012R2 in UEFI (Generation 2) mode with this simple 
> patch:
> 
> --- usr/src/sys/dev/acpi/dsdt.c.orig  Tue Dec 19 07:49:12 2023
> +++ usr/src/sys/dev/acpi/dsdt.c   Wed Dec 20 07:43:05 2023
> @@ -3742,7 +3742,7 @@
>   struct acpi_dsdt *p_dsdt;
>   struct acpi_q *entry;
> 
> - if (strlen(rootpath) > 0)
> + if (strlen(rootpath) > 1 || ( strlen(rootpath)==1 && *rootpath != '\\') 
> )
>   aml_die("LoadTable: RootPathString unsupported");
>   if (strlen(parameterpath) > 0)
>   aml_die("LoadTable: ParameterPathString unsupported");


Ah, cool.  That is a bit of a hack though.  I did look into what is
needed to fix this properly.  If I send you a diff, can you test it?
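
The condition in the quoted workaround boils down to accepting a
RootPathString that is either empty or exactly "\" (the namespace root).
A minimal sketch of that check as a standalone predicate follows; the
helper name rootpath_is_root is invented for illustration and does not
exist in dsdt.c.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 when a LoadTable RootPathString refers
 * to the namespace root, i.e. it is empty or the single character '\'.
 */
static int
rootpath_is_root(const char *rootpath)
{
	return rootpath[0] == '\0' || strcmp(rootpath, "\\") == 0;
}
```

Written this way the workaround's test would read
"if (!rootpath_is_root(rootpath)) aml_die(...)".  Any other path is still
rejected, so tables that ask to be loaded below a specific node would
continue to panic.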

> The 7.4 kernel booted fine and I was able to install OpenBSD over serial 
> console (I was unable to make working efifb0 console). Here are relevant 
> boot messages from ACPI:
> 
> acpihve0 at acpi0
> "ACPI0004" at acpi0 not configured
> "VMBus" at acpi0 not configured
> "Hyper_V_Gen_Counter_V1" at acpi0 not configured
> acpicmos0 at acpi0
> com0 at acpi0 UAR1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at acpi0 UAR2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
> acpicpu at acpi0 not configured
> pvbus0 at mainbus0: Hyper-V 6.3
> hyperv0 at pvbus0: protocol 3.0, features 0xc7f
> hyperv0: heartbeat, kvp, shutdown, timesync
> hvn0 at hyperv0 channel 12: NVS 5.0 NDIS 6.30, address 00:15:5d:00:33:03
> hvs0 at hyperv0 channel 13: scsi, protocol 6.0
> scsibus0 at hvs0: 2 targets
> sd0 at scsibus0 targ 0 lun 0:  
> naa.600224806339816dd00df20d64df290b
> sd0: 20480MB, 512 bytes/sector, 41943040 sectors, thin
> sd1 at scsibus0 targ 0 lun 1:  
> naa.60022480d40507eafb74508ae0298284
> sd1: 664MB, 512 bytes/sector, 1360832 sectors, thin
> pci0 at mainbus0 bus 0
> isa0 at mainbus0
> efifb0 at mainbus0: 1024x768, 32bpp
> wsdisplay at efifb0 not configured
> softraid0 at root
> scsibus1 at softraid0: 256 targets
> root on rd0a swap on rd0b dump on rd0b
> 
> 
> Unpatched kernel panics on this:
> 
>  > acpihve0 at acpi0
>  > LoadTable: RootPathString unsupported
>  > 0034 Called: \_SB_._INI
>  > 0034 Called: \_SB_._INI
>  > panic: aml_die aml_loadtable:3746
> 
> Here is relevant part of DSDT ACPI table that causes panic on stock 
> kernel (dumped and decompiled on similar Linux guest with Intel acpi tools):
> 
> /*
>   * Intel ACPI Component Architecture
>   * AML/ASL+ Disassembler version 20220331 (64-bit version)
>   * Copyright (c) 2000 - 2022 Intel Corporation
>   *
>   * Disassembling to symbolic ASL+ operators
>   *
>   * Disassembly of dsdt.dat, Tue Dec 19 19:55:19 2023
>   *
>   * Original Table Header:
>   * Signature"DSDT"
>   * Length   0x0D8E (3470)
>   * Revision 0x02
>   * Checksum 0x65
>   * OEM ID   "MSFTVM"
>   * OEM Table ID "DSDT01"
>   * OEM Revision 0x0001 (1)
>   * Compiler ID  "MSFT"
>   * Compiler Version 0x0400 (67108864)
>   */
> DefinitionBlock ("", "DSDT", 2, "MSFTVM", "DSDT01", 0x0001)
> {
>  Scope (_SB)
>  {
>  Method (_INI, 0, NotSerialized)  // _INI: Initialize
>  {
>  If ((SCFG > Zero))
>  {
>  LoadTable ("OEM1", "MSFTVM", "UARTS", "\\", "", Zero)
>  }
> 
>  If ((BFLG & 0x02))
>  {
>  LoadTable ("OEMP", "MSFTVM", "SPCI", "\\", "", Zero)
>  }
>  }
>  }
> 
>  OperationRegion (BIOS, SystemMemory, 0x7FDD7000, 0xFF)
>  Field (BIOS, ByteAcc, NoLock, Preserve)
> 
> 
> 
> Personally I'm fine with that solution. Hoping that it may help anybody 
> else using OpenBSD on Hyper-V Gen2 mode.
> 
> Best regards
>--Henryk Paluch
> 
> 
> On 12/19/23 18:36, Henryk Paluch wrote:
> > Hello!
> > 
> > I was able to gather additional information using a small patch for the stock 
> > OpenBSD 7.4/amd64 kernel to print a few debug messages. Additionally I 
> > used  ACPI tools under Linux guest (under same Hyper-V in UEFI mode) to 
> > get details on ACPI tables.
> > 
> > The kernel patch is this primitive:
> > 
> > diff -u /usr/src/sys/dev/acpi/dsdt.c{.orig,}
> > --- /usr/src/sys/dev/acpi/dsdt.c.orig    Tue Dec 19 07:49:12 2023
> > +++ /usr/src/sys/dev/acpi/dsdt.c    Tue Dec 19 17:59:29 2023
> > @@ -3742,11 +3742,23 @@
> >   struct acpi_dsdt *p_dsdt;
> >   struct acpi_q *entry;
> > 
> > -    if (strlen(rootpath) > 0)
> > -    aml_die("LoadTable: RootPathString unsupported");
> > +    printf("HP4\n");
> > +    if (strlen(rootpath) > 0){
> > +    aml_showvalue(parameterdata);
> > +    aml_die("LoadTable: RootPathString unsupported: rootpath='%s', "
> > +    "sign='%s', oemid='%s', oemtableid='%s', "
> > +    "ppath='%s'\n",
> 

Re: arm64 panic: malloc: out of space in kmem_map

2023-11-15 Thread Mark Kettenis
> Date: Thu, 9 Nov 2023 13:21:09 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> During make build my arm64 machine with 32 CPUs crashed.

Next time this happens, please include "show malloc" output.

> ddb{24}> x/s version
> version:OpenBSD 7.4-current (GENERIC.MP) #16: Fri Nov  3 21:38:55 MDT 
> 2023\012
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP\012
> 
> ddb{24}> show panic
>  cpu0: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 698
>  cpu31: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu30: pool_do_get: pted free list modified: page 0xff81baba8000; item 
> addr 0xff81baba8298; offset 0x10=0x19ebd001
>  cpu29: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 895
>  cpu28: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu27: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu26: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu25: pool_do_get: vp: page empty
>  cpu24: pool_do_get: vp: page empty
>  cpu23: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu22: pool_do_get: pted: page empty
> *cpu21: malloc: out of space in kmem_map
>  cpu20: pool_do_get: rwobjpl: page empty
>  cpu19: pool_do_get: anonpl: page empty
>  cpu18: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
>  cpu17: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu16: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu15: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu14: pool_do_get: vp: page empty
>  cpu13: pmap_pte_insert: have a pted, but missing a vp for 4afaaf2c3 va pmap 
> 0xff81aa0685e8
>  cpu12: pool_do_get: vp: page empty
>  cpu10: attempt to access user address 0x30 from EL1
>  cpu9: pool_do_put: pted: double pool_put: 0xff81afa52f30
>  cpu8: pool_do_get: anonpl: page empty
>  cpu7: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu6: pool_do_get: anonpl: page empty
>  cpu5: pool_do_get: vp: page empty
>  cpu4: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu3: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu2: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
> rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
> "/usr/src/sys/uvm/uvm_vnode.c", line 953
>  cpu1: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
> rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", 
> line 698
> 

[PATCH] mg: batch mode leaves terminal in insane mode

2023-10-23 Thread Mark Willson
Hi Folks,

When mg(1) is run in batch mode (-b), at exit the terminal is left in a
non-sane state. Entering ^Jstty sane^J at the shell prompt corrects the
display. This behaviour was observed on OpenBSD 7.4.

The following patch fixes the issue:

--- main.c.orig 2023-10-23 19:07:36.007621269 +0100
+++ main.c  2023-10-23 19:08:15.622234480 +0100
@@ -167,8 +167,10 @@
ffclose(ffp, NULL);
}

-   if (batch)
+   if (batch) {
+   vttidy();
return (0);
+   }

/*
 * Now ensure any default buffer modes from the startup file are

Best Regards,
Mark
--
Mark Willson
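
As background for the patch above, the general idea of "put the terminal
back before every return" can be sketched with plain termios calls.  This
is not how mg implements vttidy(), which does more than restore modes; the
names run_with_tty_restore, batch_job and jobs_run are invented for this
example.

```c
#include <termios.h>

static int jobs_run;	/* counts how often the sample job ran */

/* Stand-in for the batch work; a real job might change tty modes. */
static int
batch_job(void)
{
	jobs_run++;
	return 0;
}

/*
 * Save the terminal modes of fd, run body, and restore the modes on the
 * way out.  If fd is not a terminal, tcgetattr() fails and the body is
 * simply run without any restore step.
 */
static int
run_with_tty_restore(int fd, int (*body)(void))
{
	struct termios saved;
	int have_tty, rv;

	have_tty = (tcgetattr(fd, &saved) == 0);
	rv = body();
	if (have_tty)
		(void)tcsetattr(fd, TCSAFLUSH, &saved);
	return rv;
}
```

A caller would use it as run_with_tty_restore(STDIN_FILENO, batch_job).
Funnelling the early return through one wrapper like this is an
alternative to adding a cleanup call at each return site.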



Re: FS bit on sstatus csr set on riscv64

2023-09-21 Thread Mark Kettenis
> Date: Thu, 21 Sep 2023 10:23:45 +0200
> From: "Peter J. Philipp" 
> 
> Hi,
> 
> I don't know if it's the same on Sifive based CPU's but on the D1
> (doesn't boot beyond main() yet) the FS bits are set.  These are floating
> point indicators, and I thought these should be off?  In my debugs I have
> found this:
> 
> 10100111 p
> 80026100
> 
> that is the respective binary and hex register that the CSR gave on my D1.
> I have turned this off in locore.S by unsetting the bits in CSR.  it's
> just 2 instructions more.
> 
> Please have a look in page 39 of this RISCV-privileged (2021) document:
> https://mainrechner.de/riscv-privileged-20211203.pdf
> 
> It is the same bit offset in mstatus and sstatus.
> 
> On the D1 after the CPU is reset the FP bits go back to 0, meaning that on
> its depressive boot-life the FS bits have been turned on.
> 
> to check this I would add a debugging printf high in pmap_bootstrap() that
> looks like so:
> 
>status = csr_read(sstatus);
>printf("sstatus: %lX\n", status);
> 
> Principally I can do this too but it would take me some time changing source
> trees and recompiling.
> 
> To turn floating point off, I have set this in locore.S:
> 
>     /* turn off any possible FP bits set */
>     li      t0, SSTATUS_FS_MASK
>     csrc    sstatus, t0
> 
> under the pagetable END.
> 
> Best Regards,
> -peter
> 
> PS: If you would like me to keep D1 stuff to myself without relaying findings 
>   back to you let me know.  I know we don't use floating point code
>   in the kernel whatsoever.  Am I wrong?

Right.  This probably fixes itself later, but it is probably best to
clear this early on.  We do clear the FS bits for the secondary CPUs
in cpu_start_secondary().

Need to think about what the best place to do this would be.  But somewhere in
initriscv() is probably good enough.
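
For reference, the FS field occupies bits 14:13 of sstatus in the RISC-V
privileged specification (0 = Off, 1 = Initial, 2 = Clean, 3 = Dirty).
The sketch below decodes and clears that field on a plain integer, which
is handy for checking a value printed from a debug printf.  The macro and
function names are local to this example and are not the kernel's.

```c
#include <stdint.h>

/* FS field position per the privileged spec; names local to this sketch. */
#define FS_SHIFT	13
#define FS_MASK		(UINT64_C(3) << FS_SHIFT)

/* Extract the 2-bit FS state from an sstatus value. */
static unsigned int
sstatus_fs(uint64_t sstatus)
{
	return (unsigned int)((sstatus & FS_MASK) >> FS_SHIFT);
}

/* Return the sstatus value with FS forced to Off. */
static uint64_t
sstatus_clear_fs(uint64_t sstatus)
{
	return sstatus & ~FS_MASK;
}
```

Applied to the value 0x80026100 quoted above, sstatus_fs() yields 3
(Dirty), and clearing the field gives 0x80020100.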



Re: RISCV - physmem is an address not pages in locore.S

2023-09-17 Thread Mark Kettenis
> Date: Sun, 17 Sep 2023 12:40:29 +0200
> From: "Peter J. Philipp" 

Sorry Peter,

But this doesn't make any sense to me.  Your C code is just as
unreadable as the assembly code ;)

And your explanation doesn't make sense.  The code works fine on
existing hardware supported by OpenBSD.  Your previous mails were also
high on speculation and low on facts.

Cheers,

Mark

> Hi OpenBSD/riscv64'ers!
> 
> After a week of debugging a different issue I noticed this issue with the 
> L2 cache in locore.S:
> 
> The physical address of the base boot memory is held in register s9,
> and this is shifted by the L2 cache code by 21 to the right.  In order to
> make 2 MiB offsets.  However, I have found in my research that the algorithm
> is flawed a little.  It expects pages not an address on s9.  I wrote this
> program to understand the algorithm better.  And I wrote it in C and it should
> be an exact duplication of the asm code.  Please point out if it isn't.
> 
> Here is the output.  I'm attaching the program after this it's colour coded
> so you can see it better.  As you can see with the first output there is
> bits in the PTE beyond PPN[1] in PPN[2], in the L2 cache.  In the second 
> output which ends at the same address the bits are perfectly aligned in 
> PPN[1].
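
For reference, the arithmetic of the loop under discussion can be sketched
in C as follows.  It assumes the Sv39 layout (2 MiB per L2 entry, PPN[1]
starting at PTE bit 19, PPN[2] at bit 28) and takes the flag bits as an
opaque parameter; the function names are invented for this example and
the constants are assumptions, not copied from the kernel headers.

```c
#include <stdint.h>

#define L2_SHIFT	21	/* assumed: 2 MiB mapped per L2 entry */
#define PTE_PPN1_S	19	/* assumed: PPN[1] starts at PTE bit 19 */
#define L2_ENTRIES	512

/* Build one L2 entry for the 2 MiB frame that contains pa. */
static uint64_t
l2_pte(uint64_t pa, uint64_t flags)
{
	return ((pa >> L2_SHIFT) << PTE_PPN1_S) | flags;
}

/* Fill a table with 512 consecutive 2 MiB mappings starting at base. */
static void
fill_l2(uint64_t tbl[L2_ENTRIES], uint64_t base, uint64_t flags)
{
	uint64_t i;

	for (i = 0; i < L2_ENTRIES; i++)
		tbl[i] = l2_pte(base + (i << L2_SHIFT), flags);
}

/* Field extractors for inspecting the result. */
static unsigned int
pte_ppn1(uint64_t pte)
{
	return (unsigned int)((pte >> 19) & 0x1ff);
}

static uint64_t
pte_ppn2(uint64_t pte)
{
	return (pte >> 28) & 0x3ffffff;
}
```

For a 2 MiB aligned address, shifting right by 21 and left by 19 gives the
same result as (pa >> 12) << 10, so the entry holds the full physical page
number.  PPN[1] is only 9 bits wide, so PPN[2] is non-zero whenever the
physical address is at or above 1 GiB.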
> 
> pjp@polarstern$ ./l2shit | tail
> sd 1FB80003(0110111011) to 1014FB0
> sd 1FC3(011111) to 1014FB8
> sd 1FC80003(0111001011) to 1014FC0
> sd 1FD3(0111010011) to 1014FC8
> sd 1FD80003(0111011011) to 1014FD0
> sd 1FE3(000011) to 1014FD8
> sd 1FE80003(001011) to 1014FE0
> sd 1FF3(010011) to 1014FE8
> sd 1FF80003(011011) to 1014FF0
> sd 2003(100011) to 1014FF8
> pjp@polarstern$ ./l2shit pages | tail  
> sd 0FB3(0010110011) to 1014FB0
> sd 0FB80003(0010111011) to 1014FB8
> sd 0FC3(001111) to 1014FC0
> sd 0FC80003(0011001011) to 1014FC8
> sd 0FD3(0011010011) to 1014FD0
> sd 0FD80003(0011011011) to 1014FD8
> sd 0FE3(0011100011) to 1014FE0
> sd 0FE80003(0011101011) to 1014FE8
> sd 0FF3(000011) to 1014FF0
> sd 0FF80003(001011) to 1014FF8
> 
> 
> /*
> 
>     lla     s1, pagetable_l2
>     srli    t4, s9, L2_SHIFT
>     li      t2, 512
>     add     t3, t4, t2
>     li      t0, (PTE_KERN | PTE_X)
> 1:
>     slli    t2, t4, PTE_PPN1_S
>     or      t5, t0, t2
>     sd      t5, (s1)
>     addi    s1, s1, PTE_SIZE
> 
>     addi    t4, t4, 1
>     bltu    t4, t3, 1b
> 
> 
> */
> 
> #include <sys/types.h>
> #include <stdio.h>
> #include <string.h>
> 
> #define P_KERN0x1 /* not real */
> #define P_X   0x2 /* not real */
> 
> char *
> binary(ulong t5)
> {
>   static char ret[1280];
>   int i = 0;
> 
>   ret[0] = '\0';
> 
>   for (i = 53; i >= 0; i--) {
>   switch (i) {
>   case (53 - 26):
>   strlcat(ret,"", sizeof(ret));
>   break;
>   case (53 - 26 - 9):
>   strlcat(ret,"", sizeof(ret));
>   break;
>   case (53 - 26 - 9 - 9):
>   strlcat(ret,"", sizeof(ret));
>   break;
>   default:
>   //strlcat(ret,"", sizeof(ret));
>   break;
>   }
> 
>   if (t5 & (1UL << i)) {
>   strlcat(ret, "1", sizeof(ret));
>   } else {
>   strlcat(ret, "0", sizeof(ret));
>   }
>   }
>   
>   return (&ret[0]);
> }
>   
>   

Re: sysupgrade doesn't work headless on Thinkcentre m910q

2023-09-09 Thread Mark Kettenis
a few seconds
>   later. I don't know if the upgrade issue could be to do with a screwy
>   ACPI implementation?)
> 
>   For now I've been working around this using the manual "untar it over
>   the running system" method, as I need this box to be headless.
> 
>   Let me know if there's more info I can supply. Cheers.

Unlikely to get fixed unless you manage to capture the dmesg of the
hang somehow on serial console.  The dmesg suggests that your machine
may support AMT, which means you could try setting up Serial-over-LAN.

Cheers,

Mark

> dmesg:
> OpenBSD 7.3-current (GENERIC.MP) #1352: Wed Aug 23 10:44:51 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17045000192 (16255MB)
> avail mem = 16508690432 (15743MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xdcd7f000 (88 entries)
> bios0: vendor LENOVO version "M1AKT56A" date 07/27/2022
> bios0: LENOVO 10MUS2UG00
> efi0 at bios0: UEFI 2.5
> efi0: American Megatrends rev 0x5000c
> acpi0 at bios0: ACPI 6.1
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT MCFG SSDT FIDT SLIC MSDM SSDT SSDT HPET 
> SSDT UEFI SSDT LPIT WSMT SSDT SSDT DBGP DBG2 LUFT ASF!
> acpi0: wakeup devices PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PEGP(S4) 
> SIO1(S3) RP09(S4) PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) 
> PXSX(S4) RP13(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz, 2394.42 MHz, 06-5e-03, patch 
> 00f0
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 4-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz, 2394.42 MHz, 06-5e-03, patch 
> 00f0
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 4-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz, 2394.42 MHz, 06-5e-03, patch 
> 00f0
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 4-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz, 2394.42 MHz, 06-5e-03, patch 
> 00f0
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,L

Re: RLIMIT_CPU doesn't work reliably on mostly idle systems

2023-08-29 Thread Mark Kettenis
> Date: Tue, 29 Aug 2023 19:15:14 +0200
> From: Claudio Jeker 
> 
> On Tue, Aug 29, 2023 at 01:01:10AM +, Eric Wong wrote:
> > >Synopsis: RLIMIT_CPU doesn't work reliably on mostly idle systems
> > >Category: system
> > >Environment:
> > System  : OpenBSD 7.3
> > Details : OpenBSD 7.3 (GENERIC.MP) #1242: Sat Mar 25 18:04:31 MDT 
> > 2023
> >  
> > dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.octeon
> > Machine : octeon
> > >Description:
> > 
> > RLIMIT_CPU doesn't work reliably when few/no syscalls are made on an
> > otherwise idle system (aside from the test process using up CPU).
> > It can take 20-50s to fire SIGKILL with rlim_max=9 (and the SIGXCPU
> > from rlim_cur=1 won't even fire).
> > 
> > I can reproduce this on a private amd64 VM and also on gcc231
> > on GCC compiler farm .
> > I can't reproduce this on a busy system like gcc220 on cfarm,
> > however.
> 
> Thanks for the report. There is indeed an issue in how the CPU time is
> accounted on an idle system. The below diff is a possible fix.
> 
> In roundrobin() force a resched and therefore mi_switch() when
> SPCF_SHOULDYIELD is set. On an idle CPU mi_switch() will just do all
> accounting bits but skip the expensive cpu_switchto() since the proc
> remains the same.

A bit of a hack, but probably better than trying to account for
spc_runtime at the point where we check the limit.

Also this will call smr_idle() sooner, which may help speed up smr?

ok kettenis@

> Index: kern/sched_bsd.c
> ===
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> retrieving revision 1.84
> diff -u -p -r1.84 sched_bsd.c
> --- kern/sched_bsd.c  29 Aug 2023 16:19:34 -  1.84
> +++ kern/sched_bsd.c  29 Aug 2023 16:20:03 -
> @@ -106,7 +106,7 @@ roundrobin(struct clockintr *cl, void *c
>   }
>   }
>  
> - if (spc->spc_nrun)
> + if (spc->spc_nrun || spc->spc_schedflags & SPCF_SHOULDYIELD)
>   need_resched(ci);
>  }
>  
> 
> 
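
For anyone who wants to check the behaviour from the report before and
after the diff, here is a minimal userland sketch. The helper names
(set_cpu_limit, get_cpu_limit, spin_forever) are made up for this
example; only setrlimit(2) and getrlimit(2) are real interfaces. The
limits rlim_cur=1 and rlim_max=9 are the values quoted in the report.

```c
#include <sys/resource.h>

/* Hypothetical helper: apply soft/hard CPU-time limits (in seconds). */
int
set_cpu_limit(rlim_t soft, rlim_t hard)
{
	struct rlimit rl = { .rlim_cur = soft, .rlim_max = hard };

	return setrlimit(RLIMIT_CPU, &rl);
}

/* Hypothetical helper: read the current limits back; 0 on success. */
int
get_cpu_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CPU, &rl) == -1)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}

/*
 * Burn CPU in userland without making any syscalls.  After
 * set_cpu_limit(1, 9) the process should receive SIGXCPU after about
 * one second of CPU time and SIGKILL after about nine.
 */
void
spin_forever(void)
{
	volatile unsigned long n = 0;

	for (;;)
		n++;
}
```

Calling set_cpu_limit(1, 9) followed by spin_forever() from main() and
running the result under time(1) on an otherwise idle machine should
show how long the kernel takes to deliver the signals.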



Re: vmd amd64 snapshot, crash in acpiopen triggerred by apm -b

2023-08-06 Thread Mark Kettenis
> Date: Sun, 6 Aug 2023 14:36:30 +0200
> From: Tobias Heider 
> 
> On Sun, Aug 06, 2023 at 07:55:40AM +0200, Anton Lindqvist wrote:
> > On Sat, Aug 05, 2023 at 10:08:53PM +0200, xavie...@mailoo.org wrote:
> > > Hi,
> > > 
> > > I run a 2G/100G virtual machine at openbsd.amsterdam freshly upgraded
> > > from stable to the latest snapshot and I've figured out the panic
> > > by the two steps detailed there:
> > > 
> > > > 1. The system has a root @reboot crontab entry that starts a tmux
> > > > session in the background (so always detached from a TTY during the
> > > > whole procedure) + a /root/.tmux.conf which is a copy of my usual
> > > > tmux config, which appears to call a script that does `apm -b` (we have
> > > > our quick workaround by removing it).
> > > 
> > > > The tmux session and the programs run inside it started just fine, with
> > > > the exception of the tmux session itself. By attaching to that special
> > > > session created @reboot, I noticed that tmux had somehow fallen back to
> > > > the built-in default config (green bottom status bar and default
> > > > keybindings), which indicated to me that something went wrong.
> > > > 
> > > > 2. It's only when I started tmux manually that the .tmux.conf calling
> > > > `apm -b` triggered the crash:
> > > 
> > > # tmux ^M
> > > campfire.01:ksh*    <--   my "on-top" status-bar was loaded this time
> > > uvm_fault(0xfd8078416cf0, 0x39c, 0, 2) -> e
> > > kernel: page fault trap, code=2
> > > Stopped at  acpiopen+0x85:  orb $0x1,0x39c(%r13)
> > >     TID    PID    UID PRFLAGS PFLAGS  CPU  COMMAND
> > > *173406  19781  0 0x2  0    0  apm
> > > acpiopen(5300,1,2000,80000b08) at acpiopen+0x85
> > > spec_open(800021648598) at spec_open+0xe0
> > > > VOP_OPEN(fd803bb6bcb0,1,fd80691bf550,80000b08) at VOP_OPEN+0x4e
> > > > vn_open(8000216487b0,1,0) at vn_open+0x275
> > > > doopenat(80000b08,ff9c,f9805daef3b,0,0,800021648980) at doopenat+0x1d1
> > > syscall(8000216489f0) at syscall+0x364
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x775e645c8040, count: 8
> > 
> > This looks like a regression introduced in the recent acpi_apm.c
> > extraction in which the ENXIO short circuit got lost in
> > acpi{open,close,ioctl}.
> > 
> > 
> > https://github.com/openbsd/src/commit/c75690924c3df592a3a5078fe57c951f808a8350
> > 
> 
> Urgh yes, thanks for tracking this down.  We are clearly missing at
> least a few checks here. I am working on getting this reproduced
> meanwhile here is a first diff to hopefully fix the crash.

ok kettenis@

> Index: dev/acpi/acpi_apm.c
> ===
> RCS file: /mount/openbsd/cvs/src/sys/dev/acpi/acpi_apm.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 acpi_apm.c
> --- dev/acpi/acpi_apm.c   8 Jul 2023 14:44:43 -   1.2
> +++ dev/acpi/acpi_apm.c   6 Aug 2023 12:29:56 -
> @@ -47,6 +47,9 @@ acpiopen(dev_t dev, int flag, int mode, 
>   struct acpi_softc *sc = acpi_softc;
>   int s;
>  
> + if (sc == NULL)
> + return (ENXIO);
> +
>   s = splbio();
>   switch (APMDEV(dev)) {
>   case APMDEV_CTL:
> @@ -82,6 +85,9 @@ acpiclose(dev_t dev, int flag, int mode,
>   struct acpi_softc *sc = acpi_softc;
>   int s;
>  
> + if (sc == NULL)
> + return (ENXIO);
> +
>   s = splbio();
>   switch (APMDEV(dev)) {
>   case APMDEV_CTL:
> @@ -106,6 +112,9 @@ acpiioctl(dev_t dev, u_long cmd, caddr_t
>   struct apm_power_info *pi = (struct apm_power_info *)data;
>   int s;
>  
> + if (sc == NULL)
> + return (ENXIO);
> +
>   s = splbio();
>   /* fake APM */
>   switch (cmd) {
> @@ -167,6 +176,9 @@ acpikqfilter(dev_t dev, struct knote *kn
>  {
>   struct acpi_softc *sc = acpi_softc;
>   int s;
> +
> + if (sc == NULL)
> + return (ENXIO);
>  
>   switch (kn->kn_filter) {
>   case EVFILT_READ:
> 
> 
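
The fault address 0x39c in the trace is what you would expect from a
store through a NULL softc pointer plus a field offset, which is what
the added checks guard against. A minimal userland sketch of the same
pattern follows; struct demo_softc, demo_softc_ptr and demo_open are
hypothetical stand-ins for this example, not the kernel's acpi_softc
or acpiopen().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver softc; not the kernel structure. */
struct demo_softc {
	int sc_flags;
};

/* Stays NULL for as long as the driver has not attached. */
struct demo_softc *demo_softc_ptr = NULL;

/* Model of an open routine that bails out before touching the softc. */
int
demo_open(void)
{
	struct demo_softc *sc = demo_softc_ptr;

	if (sc == NULL)
		return (ENXIO);

	sc->sc_flags |= 0x1;	/* this store would fault if sc were NULL */
	return (0);
}
```

Without the early return, the store into sc_flags is the userland
equivalent of the "orb $0x1,0x39c(%r13)" instruction in the trace.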



Re: taskq_next_work: page fault trap when staring Xfce

2023-08-02 Thread Mark Kettenis
> Date: Wed, 2 Aug 2023 14:11:36 +1000
> From: Jonathan Gray 
> 
> On Mon, Jul 31, 2023 at 10:48:12PM +1000, Jonathan Gray wrote:
> > On Sun, Jul 30, 2023 at 03:21:47PM +0900, YASUOKA Masahiko wrote:
> > > Hello,
> > > 
> > > > I got a new vaio last week; the machine seems to have the same graphics
> > > 
> > >   inteldrm0 at pci0 dev 2 function 0 "Intel Graphics" rev 0x04
> > >   drm0 at inteldrm0
> > >   inteldrm0: msi, ALDERLAKE_P, gen 12
> > > 
> > > > and has the same problem.  I found having Option "PageFlip" "off" in
> > > > /etc/X11/xorg.conf can work around the problem.
> > > 
> > >   Section "Device"
> > >   Identifier  "Card0"
> > >   Driver  "modesetting"
> > >   BusID   "PCI:0:2:0"
> > >   Option  "PageFlip" "off"
> > >   EndSection
> > 
> > running GENERIC I got the following with xfce.
> > 
> > matches the trace in an earlier report from sthen@
> > > https://marc.info/?l=openbsd-bugs&m=168234057913478&w=2
> > 
> > dpt_insert_entries+0xbc: movl 0x34(%r8),%r10d
> > r8  0x81938fe0
> > r10 0x1000
> > 
> > >    0x81ab0bc3 <+179>:   mov    %r8,%rcx
> > >    0x81ab0bc6 <+182>:   add    $0x20,%rcx
> > >    0x81ab0bca <+186>:   je     0x81ab0be8
> > >    0x81ab0bcc <+188>:   mov    0x34(%r8),%r10d
> > >    0x81ab0bd0 <+192>:   test   %r10d,%r10d
> > >    0x81ab0bd3 <+195>:   je     0x81ab0be8
> > 
> > 
> > (gdb) info line *0x81ab0bcc
> > Line 34 of "/sys/dev/pci/drm/i915/i915_scatterlist.h"
> >starts at address 0x81ab0bc1 
> >and ends at 0x81ab0bd5 .
> > 
> > if (dma && s.sgp && sg_dma_len(s.sgp) == 0) {
> > 
> > dpt_insert_entries+0xbc
> > dpt_bind_vma+0x64
> > i915_vma_bind+0x317
> > i915_vma_pin_ww+0x44b
> > intel_plane_pin_fb+0x25c
> > intel_prepare_plane_pin_fb+0x12c
> > drm_atomic_helper_prepare_planes+0x5b
> > intel_atomic_commit+0xda
> > drm_atomic_helper_page_flip+0x77
> > drm_mode_page_flip_ioctl+0x466
> > drm_do_ioctl+0x285
> > drmioctl+0xdc
> > VOP_IOCTL+0x57
> > vn_ioctl+0x6c
> 
> The fix is to not reset the end of list marker when
> assigning a page.

The Linux version retains the end marker, so this fix appears to be correct.

ok kettenis@

> Index: sys/dev/pci/drm/include/linux/scatterlist.h
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/include/linux/scatterlist.h,v
> retrieving revision 1.5
> diff -u -p -r1.5 scatterlist.h
> --- sys/dev/pci/drm/include/linux/scatterlist.h   1 Jan 2023 01:34:58 -   1.5
> +++ sys/dev/pci/drm/include/linux/scatterlist.h   2 Aug 2023 04:02:02 -
> @@ -119,7 +119,6 @@ sg_set_page(struct scatterlist *sgl, str
>   sgl->dma_address = page ? VM_PAGE_TO_PHYS(page) : 0;
>   sgl->offset = offset;
>   sgl->length = length;
> - sgl->end = false;
>  }
>  
>  #define sg_dma_address(sg)   ((sg)->dma_address)
> 
> 
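
To illustrate why clearing the end marker on assignment is a problem,
here is a simplified userland model. struct demo_sg and the demo_sg_*
functions are hypothetical names for this sketch; the real struct
scatterlist carries more fields and the real walkers look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scatterlist entry. */
struct demo_sg {
	size_t	length;
	bool	end;
};

/* Initialise n entries and mark the last one as the end of the list. */
void
demo_sg_init(struct demo_sg *sgl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		sgl[i].length = 0;
		sgl[i].end = (i == n - 1);
	}
}

/* Assign a length; clear_end models the line removed by the diff. */
void
demo_sg_set(struct demo_sg *sg, size_t length, bool clear_end)
{
	sg->length = length;
	if (clear_end)
		sg->end = false;
}

/*
 * Walk the list until the end marker, giving up after limit entries.
 * Returns the number of entries visited, or limit + 1 if no end
 * marker was found within the limit.
 */
size_t
demo_sg_count(const struct demo_sg *sgl, size_t limit)
{
	for (size_t i = 0; i < limit; i++)
		if (sgl[i].end)
			return i + 1;
	return limit + 1;
}
```

With clear_end set to false the walk stops at the last entry. Once the
last entry is assigned with clear_end set to true the marker is gone,
and a walker without the artificial limit used here would run past the
end of the array.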



Re: Samsung NVMe M.2 SSD 970 EVO Plus fails to attach on VisionFive 2 (JH7110 SoC) board

2023-07-28 Thread Mark Kettenis
> Date: Fri, 28 Jul 2023 14:32:30 +0200
> From: develo...@robert-palm.de
> 
> Zitat von Miguel Landaeta :
> 
> >> Synopsis:  Samsung NVMe M.2 SSD 970 EVO Plus fails to attach on VisionFive 2 (JH7110 SoC) board
> >> Category:  riscv64
> >> Environment:
> > System  : OpenBSD 7.3
> > Details : OpenBSD 7.3-current (GENERIC.MP) #377: Fri Jul 14  
> > 04:39:21 MDT 2023
> >  
> > dera...@riscv64.openbsd.org:/usr/src/sys/arch/riscv64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.riscv64
> > Machine : riscv64
> >> Description:
> > Samsung NVMe M.2 SSD 970 EVO Plus fails to attach on VisionFive 2 (JH7110 SoC) board
> >
> >
> > I just got a Samsung NVMe M.2 SSD 970 EVO Plus to test the recently added
> > support for PCIE devices to JH7110 SoC but it has not been working correctly
> > with this disk.
> >
> > The behavior I'm observing is a little erratic, the NVMe disk only attached
> > correctly like in 1 of 10 or more boot attempts.
> >
> > Only a couple of times worked OK, but most of the times one of the following
> > is observed:
> >
> > - No nvme0 device detected during autoconf phase, nothing related to the
> >   device shows up in dmesg and no sd0 device is attached. When this
> >   happens the board boots OK and SD/MMC devices are detected and attached.
> >
> > - nvme0 device is detected during autoconf, sd0 device attaches but boot
> >   hangs. Looks like kernel never reaches diskconf() or if it reached it
> >   something is preventing the kernel from printing the typical message:
> >
> > root on sd0a (062aeb9d33543517.a) swap on sd0b dump on sd0b
> >
> > - nvme0 device appears in dmesg but the device fails to attach with the
> >   following message:
> >
> > nvme0 at pci3 dev 0 function 0 "Samsung SM981/PM981 NVMe" rev 0x00: unable to map registers
> >
> > - To work around this I'm just booting the kernel with -c option to disable
> >   nvme driver in UKC and proceed with the boot.
> >
> >
> > I tried to debug more by building a kernel with DEBUG option set to gather
> > more info but unfortunately if I boot such a kernel my board gets stuck very
> > early in the boot process just after printing how much real memory is
> > available.
> >
> > I'm more than happy to provide more info if required or to try patches if
> > that helps to troubleshoot the issue.
> >
> > Thanks.
> > Miguel.
> >
> 
> 
> Someone assumes it has to do with a delay:
> 
> At a guess, the very large and very fast (very high power) NVMe
> devices draw so much current that they are glitching the power and
> clocks of the VF2, and it needs an extra delay beyond what the
> specification suggests for them both to stabilise before the NVMe
> can be accessed.
> 
> http://forum.rvspace.org/t/unlocking-new-possibilities-starfive-visionfive-2-sbc-now-supports-tianocore-edk-ii-uefi/2779/44?u=rpx
> 
> 
> Is this the place to look for in OpenBSD ?
> 
> https://github.com/openbsd/src/blob/master/sys/dev/ic/nvme.c
> 
> Maybe anybody knows how to change this delay ?

Might be worth trying a kernel with this diff then:

Index: arch/riscv64/dev/stfpcie.c
===
RCS file: /cvs/src/sys/arch/riscv64/dev/stfpcie.c,v
retrieving revision 1.1
diff -u -p -r1.1 stfpcie.c
--- arch/riscv64/dev/stfpcie.c  8 Jul 2023 10:06:13 -   1.1
+++ arch/riscv64/dev/stfpcie.c  28 Jul 2023 13:19:28 -
@@ -430,7 +430,7 @@ stfpcie_attach(struct device *parent, st
 * active at least 100ms after power up.  Since we may have
 * just powered on the device, play it safe and use 100ms.
 */
-   delay(100000);
+   delay(300000);
 
/* Deassert PERST#. */
gpio_controller_set_pin(reset_gpio, 0);



Re: could there be a breach of license in efiboot?

2023-07-10 Thread Mark Kettenis
> Date: Mon, 10 Jul 2023 08:44:20 +0100
> From: Stuart Henderson 
> 
> On 2023/07/10 05:22, Peter J. Philipp wrote:
> > Redistributions in binary form must reproduce the above copyright
> > notice, this list of conditions and the following disclaimer in
> > the documentation and/or other materials provided with the
> > distribution.
> 
> > This should be included on all the efiboot distributions on install disks.
> 
> IANAL, but I don't get anything from that text suggesting that it has to
> be included _on_ the install image, just "provided with".
> 
> Seems to me that the source tree, which includes that list, is provided
> with the distribution.
> 
> > Here is another license:
> > 
> > https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/stand/efi/include/efi.h?rev=1.1&content-type=text/plain
> > 
> > /*++
> > 
> > Copyright (c)  1999 - 2002 Intel Corporation. All rights reserved
> > This software and associated documentation (if any) is furnished
> > under a license and may only be used or copied in accordance
> > with the terms of the license. Except as permitted by such
> > license, no part of this software or documentation may be
> > reproduced, stored in a retrieval system, or transmitted in any
> > form or by any means without the express written consent of
> > Intel Corporation.
> 
> This refers to "a license" but doesn't state it, they're talking about
> the same one mentioned above aren't they?

Yes.  IIRC this is code that originally came from Intel's
"Tiano" EFI implementation that evolved into EDK and EDKII and other
projects under the TianoCore umbrella.  At the point in time that code
was taken by FreeBSD it was licensed under the license in the README
quoted above.  It has been relicensed a few times under a BSD-2-Clause
and BSD-2-Clause-Patent license.

> (I'm not sure efi.h really has anything copyrightable in it anyway, though?)

Well, most of the other headers carry the same notice.  The headers
themselves are certainly copyrightable.  But all they provide is the
UEFI interface definitions.  So it could be argued that no actual code
under this license ends up in our EFI boot loader.

We should probably replace this code with something newer at some
point.  Not only because of the somewhat obscure license but also
because we'll need newer UEFI features at some point.  For the kernel
we already have .  Extending that one to include the
bits that we use in our EFI bootloaders shouldn't be too much work.

Cheers,

Mark



Re: pardon me

2023-07-07 Thread Mark Kettenis
> Date: Fri, 7 Jul 2023 15:30:37 +0200
> From: "Peter J. Philipp" 
> 
> I'm looking into considering adding pins for the mango pi SBC (riscv64) and
> noticed this little file that has no license:
> 
> --->
> riscv64# head /sys/dev/fdt/sxipio_pins.h
> /* Public Domain */
> 
> 
> const struct sxipio_pin sun4i_a10_pins[] = {
> { SXIPIO_PIN(A, 0), {
> { "gpio_in", 0 },
> { "gpio_out", 1 },
> { "emac", 2 },
> { "spi1", 3 },
> { "uart2", 4 },
> <---
> 
> Where does this file come from?  how is it generated?

https://github.com/kettenis/sxipins

> If anyone also knows the pins for the mango pi D1 in form of
> documentation anywhere (perhaps you're working on it or not) and
> wants to share I'd be grateful.

The docs for the allwinner SoCs tend to be publicly available and
contain the information about the pins.



Re: ifconfig sbar hang

2023-06-28 Thread Mark Kettenis
> Date: Mon, 26 Jun 2023 22:28:42 +0200
> From: Alexander Bluhm 
> 
> Hi,
> 
> I have an ifconfig on ix(4) that hangs in "sbar" wait queue during
> "starting network" while booting.
> 
> load: 3.00  cmd: ifconfig 52949 [sbar] 0.01u 0.05s 0% 78k
> 
> ddb{0}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  52949  250855  50082  0  3 0x3  sbar  ifconfig
>  50082  468479  32384  0  30x10008b  sigsusp   sh
>  52583  256132  23859 77  30x100092  kqreaddhcpleased
>  26314 670  23859 77  30x100092  kqreaddhcpleased
>  23859  213684  1  0  30x80  kqreaddhcpleased
>   1084  413649  97426115  30x100092  kqreadslaacd
>  79640  480435  97426115  30x100092  kqreadslaacd
>  97426  244636  1  0  30x100080  kqreadslaacd
>  32384  389946  1  0  30x10008b  sigsusp   sh
>  25127  139046  0  0  3 0x14200  bored smr
>  38562   94707  0  0  3 0x14200  pgzerozerothread
>  27589   65355  0  0  3 0x14200  aiodoned  aiodoned
>  20876  273172  0  0  3 0x14200  syncerupdate
>  35865  394897  0  0  3 0x14200  cleaner   cleaner
>  89296   37410  0  0  3 0x14200  reaperreaper
>   4195   18701  0  0  3 0x14200  pgdaemon  pagedaemon
>  70794   65241  0  0  3 0x14200  usbtskusbtask
>  42580  105576  0  0  3 0x14200  usbatsk   usbatsk
>  969136418  0  0  3  0x40014200  acpi0 acpi0
>  43860  163896  0  0  1 0x14200idle7
>   9928  477713  0  0  7  0x40014200idle6
>  19947  457773  0  0  7  0x40014200idle5
>  71017  110610  0  0  7  0x40014200idle4
>  73733  294276  0  0  7  0x40014200idle3
>  73085  302072  0  0  7  0x40014200idle2
>  89634  211435  0  0  7  0x40014200idle1
>  45877  221411  0  0  2  0x40014200sensors
>  41433  306787  0  0  3 0x14200  bored softnet3
>  85227  338038  0  0  3 0x14200  bored softnet2
>  72032  215983  0  0  3 0x14200  netlock   softnet1
>  32550  351943  0  0  3 0x14200  bored softnet0
>  11993  408132  0  0  2  0x40014200systqmp
>  58738  210334  0  0  3 0x14200  netlock   systq
>  70352  115696  0  0  3  0x40014200  netlock   softclock
> *95768  350377  0  0  7  0x40014200idle0
>  1  298699  0  0  30x82  wait  init
>  0   0 -1  0  3 0x10200  scheduler swapper
> 
> ifconfig holds the netlock, I guess this prevents progress.

What does a WITNESS kernel report?

> ddb{0}> trace /t 0t250855
> sleep_finish(8000248a3928,1) at sleep_finish+0x102
> cond_wait(8000248a39c0,8207c985) at cond_wait+0x64
> sched_barrier(80002253fff0) at sched_barrier+0x77
> ixgbe_stop(80776000) at ixgbe_stop+0x1f7
> ixgbe_init(80776000) at ixgbe_init+0x36
> ixgbe_ioctl(80776048,8020690c,80842500) at ixgbe_ioctl+0x13e
> in_ifinit(80776048,80842500,8000248a3cf0,1) at in_ifinit+0xf3
> in_ioctl_change_ifaddr(8040691a,8000248a3ce0,80776048) at in_ioctl_change_ifaddr+0x390
> ifioctl(fd8746c878f8,8040691a,8000248a3ce0,80002487ab00) at ifioctl+0x988
> sys_ioctl(80002487ab00,8000248a3df0,8000248a3e50) at sys_ioctl+0x2c4
> syscall(8000248a3ec0) at syscall+0x3d4
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x74aea7fb4da0, count: -12
> 
> systqmp is here, it may wait for the scheduler lock.
> 
> ddb{0}> trace /t 0t408132
> sched_barrier_task(8000248a39b8) at sched_barrier_task+0x7e
> taskq_thread(824ac758) at taskq_thread+0x100
> end trace frame: 0x0, count: -2
> 
> sensors thread seems to wait for scheduler lock, too.
> 
> ddb{0}> trace /t 0t221411
> sched_peg_curproc(80002253fff0) at sched_peg_curproc+0x69
> cpu_hz_update_sensor(80002253fff0) at cpu_hz_update_sensor+0x21
> sensor_task_work(80366700) at sensor_task_work+0x48
> taskq_thread(80362100) at taskq_thread+0x100
> 
> ddb{0}> show struct __mp_lock sched_lock
> struct sched_lock at 0x8250fa54 (520 bytes) {mpl_cpus = 10144565, mpl_ticket = 0x9acb36, mpl_users = 0x9acb35}
> 
> systq is blocked by netlock
> 
> ddb{0}> trace /t 0t210334
> sleep_finish(8000247ab030,1) at sleep_finish+0x102
> rw_enter(824a4fe8,1) at rw_enter+0x1cf
> pf_purge(824bb760) at pf_purge+0x38
> taskq_thread(824ac708) at taskq_thread+0x100
> end trace frame: 0x0, count: -4
> 
> bluhm
> 
> 



Re: ARM64 installation with new snapshots not possible any longer

2023-06-20 Thread Mark Kettenis
> Date: Tue, 20 Jun 2023 09:31:58 +0200
> From: develo...@robert-palm.de
> 
> Hi,
> 
> I noticed that an ARM64 installation with latest snapshots is not  
> possible any longer in hetzner cloud arm64 instances (ampere altra).
> 
> Last snapshot working is  
> https://ftp.hostserver.de/archive/2023-06-18-0105/snapshots/arm64/miniroot73.img
> 
> Later snapshots get stuck at "scsibus1 at softraid0: 256 targets".
> 
> So it does not get to "root on rd0a swap on rd0b dump on rd0b" any more.
> 
> Think there were 2 or 3 changes related to arm64 between (17-Jun-2023  
> 06:36) and (18-Jun-2023 19:59) that might cause this.
> 
> Please, can you have a look at it?

The most likely candidate is:

  CVSROOT:/cvs
  Module name:src
  Changes by: kette...@cvs.openbsd.org2023/06/18 10:25:21
  
  Modified files:
  sys/arch/arm64/dev: agintc.c 
  
  Log message:
  Remove spurious comment.
  
  ok patrick@

Can you try reverting that change and see if the resulting kernel boots?

Also, I'd like to understand why you're hitting this case.  Can you
show a dmesg from the last working kernel?



Re: wsdisplay_switch2: not switching

2023-05-28 Thread Mark Kettenis
> Date: Sun, 28 May 2023 12:08:35 +
> From: Klemens Nanni 
> 
> Snapshots with 'disable inteldrm' to reduce corruption/hangs on an
> Intel T14 gen 3 always print the following on shutdown/reboot:
> 
>   syncing disks... done
>   wsdisplay_switch2: not switching
>   rebooting...
> 
> Unmodified bsd.mp does not show this.
> 
> It is always a single "wsdisplay_switch2: not switching" line, i.e. never
> "wsdisplay_switch1" or "wsdisplay_switch3" as wsdisplay also provides.
> 
> I do not observe any other misbehaviour wrt. this, reboot/shutdown works.
> 
> Is this a bug or expected behaviour when manually forcing efifb(4) in UKC?
> The wsdisplay code returns EINVAL when logging this, so it reads like an
> error case to me, but I don't know anything about wsdisplay.

Should not happen, but the code in question is a bit of a maze that even
I don't understand.

Feel free to debug what is going wrong.

> OpenBSD 7.3-current (GENERIC.MP) #1203: Sat May 27 09:44:55 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 51214807040 (48842MB)
> avail mem = 49642991616 (47343MB)
> User Kernel Config
> UKC> disable inteldrm
> 240 inteldrm* disabled
> UKC> exit
> Continuing...
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
> bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
> bios0: LENOVO 21AHCTO1WW
> efi0 at bios0: UEFI 2.7
> efi0: Lenovo rev 0x1110
> acpi0 at bios0: ACPI 6.3
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT 
> SSDT SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB 
> DMAR SSDT SSDT SSDT ASF! BGRT PHAT UEFI FPDT
> acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
> XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
> RP03(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 10-way L2 cache, 18MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
> cpu1 at mainbus0: apid 8 (application processor)
> cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 10-way L2 cache, 18MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 4, package 0
> cpu2 at mainbus0: apid 16 (application processor)
> cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 10-way L2 cache, 18MB 64b/line 12-way L3 cache
> cpu2: smt 0, core 8, package 0
> cpu3 at mainbus0: apid 24 (application processor)
> cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
> cpu3: 
> 

Re: Hetzner arm64 Cloud

2023-04-18 Thread Mark Kettenis
> Date: Tue, 18 Apr 2023 16:21:54 +1000
> From: David Gwynne 
> 
> On Sun, Apr 16, 2023 at 11:39:33PM +0200, Patrick Wildt wrote:
> > You can also simply dd the image to /dev/sda and reboot, but that still
> > doesn't solve the problem.  The bootup is hard to debug because the
> > console is KVM and uses viogpu.  As soon as we exit the EFI bootservices
> > the framebuffer is shut down for whatever reason.  Means we can only get
> > access to it again through viogpu, which happens pretty late.  I wish we
> > had a serial console, because Qemu/edk2 can do it, they just don't make
> > it available.  This is gonna be "fun" to debug without serial.
> 
> i dont think the problem here is booting openbsd, but if it were the
> diff below might help.
> 
> this diff teaches BOOTAA64.EFI to load files from the EFI System
> Partition that the boot loader was run from. this means you can go
> "boot esp0a:bsd.rd" at the boot> prompt and get into the installer.
> 
> i wrote this cos i wanted another option for getting openbsd installed
> on machines where the boot loader and driver support arent that
> great yet. i can imagine it being useful for upgrading the OS on a
> system where it's difficult to plug install media in, or repartitioning
> or overwriting the disk is risky. especially if you also just want to
> check how well the hardware is supported in openbsd before making
> changes.

Oh, I've wanted this for a while.  This would allow us to integrate
OpenBSD in the Asahi installer by providing a zip file with our
bootloader and bsd.rd.  The installer will unzip that on the ESP for
us, so with a little bit of additional logic in our bootloader that
could boot straight into the OpenBSD installer.

Code looks reasonable to me.  We probably should add this to the armv7
and riscv64 bootloaders as well, but we can worry about that later.

ok kettenis@

> > On Sat, Apr 15, 2023 at 11:33:39AM +0100, Chris Narkiewicz wrote:
> > > 
> > > I asked Hetzner to import install73.img and mounted it as VM CD-ROM,
> > > but it doesn't boot. I'm not sure if this is a bug either.
> > > 
> > > Cheers,
> > > Chris Narkiewic
> > > 
> > > On Thu, 2023-04-13 at 16:16 +, Mikolaj Kucharski wrote:
> > > > Hi,
> > > > 
> > > > I'm not sure does this belong to bugs@
> > > > 
> > > > However what I used in the past was Yaifo and I still use it every
> > > > few
> > > > years, but it takes too much effort to rebase it to -current, so I
> > > > didn't touch it for few years now, but for me it worked really
> > > > nicely.
> > > > 
> > > > https://github.com/jedisct1/yaifo
> > > > 
> > > > 
> > > > On Thu, Apr 13, 2023 at 09:00:23AM +0200, Peter J. Philipp wrote:
> > > > > Hi,
> > > > > 
> > > > > Yesterday hetzner.com came out with arm64 cloud instances, I tried
> > > > > one out.
> > > > > > Here is what I found.  The images they give you a choice of does
> > > > > > not include
> > > > > > OpenBSD, so I had to get a ubuntu OS.  That's fine the EFI
> > > > > > partition was
> > > > > > already mounted.  Through trialing this I found the best way of
> > > > > > getting the
> > > > > > OpenBSD loader to boot was the following way:
> > > > > > 
> > > > > > 1. place miniroot73.img on the EFI partition root (/boot/efi/)
> > > > > > 2. reboot
> > > > > > 3. press escape to get to the BIOS, there is 3 options one is a
> > > > > > configuration
> > > > > >    option under 1, enter it.  I'm working off memory here I didn't
> > > > > > save
> > > > > >    anything so take it with a grain of salt on exactness.  In this
> > > > > > option is
> > > > > >    an option to create a RAM drive from a file, go there and enter
> > > > > > the
> > > > > >    miniroot73.img (45MB).  The down arrows didn't work in this BIOS
> > > > > > so it was
> > > > > >    great that it wrapped around going up.
> > > > > > 4. next go back into the main bios screen by pressing escape.
> > > > > > There is option
> > > > > >    3 for boot options, enter it.  There is a boot from file option
> > > > > > enter it.
> > > > > >    Select the RAM drive and maneuver your way to the bootaa64.efi
> > > > > > file.  Press
> > > > > >    enter.
> > > > > > 5. OpenBSD loader now loads.  ls displays bsd and bsd.rd, the
> > > > > > console is on
> > > > > >    comcons0 or something like that.  Switching to fb0 works too.
> > > > > > Then when
> > > > > >    pressing boot a blank screen happens.  Waiting a while no
> > > > > > prompts and I
> > > > > >    didn't try to blind type anything.  Doing this again with fb0
> > > > > > doesn't
> > > > > >    work either.
> > > > > > 6. Full stop, I didn't get further.
> > > > > > 
> > > > > > I then deleted my instance as ubuntu is not good enough for me.  I
> > > > > > guess we'll
> > > > > > have to wait until the pros get to it.  Thanks!
> > > > > 
> > > > > Best Regards,
> > > > > -peter
> > > > > 
> > > > 
> > > 
> > > -- 
> > > +44 7502 415 180 (Phone, Signal, WhatsApp)
> > > @ezaquarii:etacassiopeiae.net (Matrix)
> 
> Index: conf.c
> ===
> RCS file: 

Re: Dell Wyse 3040 acpitz vs tipmic

2023-03-02 Thread Mark Kettenis
> Date: Mon, 27 Feb 2023 10:00:25 +1000
> From: David Gwynne 
> 
> On Sun, Feb 26, 2023 at 01:28:04PM +0100, Mark Kettenis wrote:
> > > Date: Sun, 26 Feb 2023 18:13:18 +1000
> > > From: David Gwynne 
> 
> yeesh, i should have proofread my email before i sent it. sorry about
> making it harder to read than it should have been.
> 
> > > i picked a couple of Dell Wyse 3040 boxes, which are very cute, i
> > > like them a lot. however, i have to disable acpitz to be able to
> > > use them because the driver gets stuck during attach.
> > > 
> > > > acpitz_attach does a read of all the temperatures. the read
> > > > of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
> > > > tipmic_thermal_opreg_handler has a loop on line 335 waiting for
> > > > sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
> > > > acpitz_attach is running while the kernel is cold, and it appears that
> > > the interrupt handler never runs, so that value never changes, and
> > > acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
> > > do anything. thanks to patrick for helping me on the acpi side of things
> > > so we could figure this out.
> > 
> > A better approach might be to make sure that while we're cold,
> > tipmic_thermal_opreg_handler() polls for completion.  Something like:
> > 
> > > while (sc->sc_stat_adc == 0) {
> > > 	if (cold) {
> > > 		delay(1000);
> > > 		tipmic_intr(sc);
> > > 	} else {
> > > 		if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> > > 		    SEC_TO_NSEC(1))) {
> > > 			...
> > > 		}
> > > 	}
> > > }
> > 
> > 
> > > i tried deferring basically all of acpitz_attach to when kthreads are
> > > running, and that works well enough to get to userland.
> > > 
> > > is that reasonable?
> > 
> > The problem is that you can't really know whether AML accesses the
> > opregion while cold.
> 
> good point. the diff below works in this situation and is less
> intrusive.

ok kettenis@

> > > also, shortly after dwiic complains about short reads and the kernel
> > > locks up again. i'll have to plug it in and transcribe the exact
> > > errors. i think that's a separate problem though.
> > 
> > Yes, dwiic(4) has always been somewhat problematic.  Transactions seem
> > to fail randomly on some platforms like the atom system you're looking
> > at but also on my Ampere eMAG system.
> 
> fun. i managed to catch some of the dwiic stuff via dmesg before
> it locked up:
> 
> dwiic0: timed out waiting for tx_empty intr
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x5b
> dwiic0: timed out waiting for tx_empty intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x5a
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x50
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for bus idle
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for bus idle
> 
> Index: tipmic.c
> ===
> RCS file: /cvs/src/sys/dev/acpi/tipmic.c,v
> retrieving revision 1.7
> diff -u -p -r1.7 tipmic.c
> --- tipmic.c  6 Apr 2022 18:59:27 -   1.7
> +++ tipmic.c  26 Feb 2023 23:56:04 -
> @@ -276,6 +276,25 @@ struct tipmic_regmap tipmic_thermal_regm
>   { 0x18, TIPMIC_SYSTEMP_HI, TIPMIC_SYSTEMP_LO }
>  };
>  
> +static int
> +tipmic_wait_adc(struct tipmic_softc *sc)
> +{
> + int i;
> +
> + if (!cold) {
> + return (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> + SEC_TO_NSEC(1)));
> + }
> +
> 

Re: Dell Wyse 3040 acpitz vs tipmic

2023-02-26 Thread Mark Kettenis
> Date: Sun, 26 Feb 2023 18:13:18 +1000
> From: David Gwynne 
> 
> i picked a couple of Dell Wyse 3040 boxes, which are very cute, i
> like them a lot. however, i have to disable acpitz to be able to
> use them because the driver gets stuck during attach.
> 
> acpitz_attach does a read of all the temperatures. the read
> of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
> tipmic_thermal_opreg_handler has a loop on line 335 waiting for
> sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
> acpitz_attach is running while the kernel is cold, and it appears that
> the interrupt handler never runs, so that value never changes, and
> acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
> do anything. thanks to patrick for helping me on the acpi side of things
> so we could figure this out.

A better approach might be to make sure that while we're cold,
tipmic_thermal_opreg_handler() polls for completion.  Something like:

while (sc->sc_stat_adc == 0) {
	if (cold) {
		delay(1000);
		tipmic_intr(sc);
	} else {
		if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
		    SEC_TO_NSEC(1))) {
			...
		}
	}
}
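
The polling idea above can be tried outside the kernel.  What follows
is only a userland sketch: fake_softc, fake_intr() and wait_adc() are
made-up names, "cold" is an ordinary variable standing in for the
kernel flag, and the sleeping path is left out on purpose.  It shows
the one point that matters here: while cold, nothing else will run the
interrupt handler, so the wait loop has to call it by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real tipmic(4) softc. */
struct fake_softc {
	int sc_stat_adc;	/* set by the handler once the ADC is done */
	int polls_until_ready;	/* simulated hardware latency, in polls */
};

static bool cold = true;	/* models the kernel's "cold" flag */

/* Models the interrupt handler: latch completion once the hardware is ready. */
static void
fake_intr(struct fake_softc *sc)
{
	if (sc->polls_until_ready > 0)
		sc->polls_until_ready--;
	if (sc->polls_until_ready == 0)
		sc->sc_stat_adc = 1;
}

/*
 * Wait for ADC completion by polling.  Returns 0 on success and -1 on
 * timeout (or when not cold, since the sleeping path is not modelled).
 */
static int
wait_adc(struct fake_softc *sc, int max_polls)
{
	int i;

	if (!cold)
		return -1;

	for (i = 0; i < max_polls; i++) {
		if (sc->sc_stat_adc != 0)
			return 0;
		/* a real driver would delay(1000) here before polling */
		fake_intr(sc);
	}
	return sc->sc_stat_adc != 0 ? 0 : -1;
}
```

Bounding the number of polls keeps a dead device from hanging autoconf
forever, which is the failure mode described in the original report.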


> i tried deferring basically all of acpitz_attach to when kthreads are
> running, and that works well enough to get to userland.
> 
> is that reasonable?

The problem is that you can't really know whether AML accesses the
opregion while cold.

> also, shortly after dwiic complains about short reads and the kernel
> locks up again. i'll have to plug it in and transcribe the exact
> errors. i think that's a separate problem though.

Yes, dwiic(4) has always been somewhat problematic.  Transactions seem
to fail randomly on some platforms like the atom system you're looking
at but also on my Ampere eMAG system.


> OpenBSD 7.2-current (GENERIC.MP) #1071: Wed Feb 22 17:34:56 MST 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 2018418688 (1924MB)
> avail mem = 1937928192 (1848MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7a9f4000 (50 entries)
> bios0: vendor Dell Inc. version "1.2.5" date 08/20/2018
> bios0: Dell Inc. Wyse 3040 Thin Client
> efi0 at bios0: UEFI 2.4
> efi0: American Megatrends rev 0x5000b
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT SSDT UEFI SSDT HPET 
> SSDT SSDT SSDT LPIT BCFG PRAM CSRT WDAT
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.02 MHz, 06-4c-04
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 79MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.03 MHz, 06-4c-04
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu1: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.04 MHz, 06-4c-04
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu2: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 16-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.07 MHz, 06-4c-04
> cpu3: 
> 

Re: possible underflow (wrap) in tcpdump/print-domain.c

2023-02-26 Thread Mark Jamsek
   goto trunc;
> > > - while (nscount-- && cp < snapend) {
> > > + while (nscount-- > 0 && cp < snapend) {
> > >   putchar(',');
> > >   if ((cp = ns_rprint(cp, bp, is_mdns)) 
> > > == NULL)
> > >   goto trunc;
> > > @@ -723,11 +723,11 @@ ns_print(const u_char *bp, u_int length,
> > >   }
> > >   if (nscount > 0)
> > >   goto trunc;
> > > - if (cp < snapend && arcount--) {
> > > + if (cp < snapend && arcount-- > 0) {
> > >   printf(" ar:");
> > >   if ((cp = ns_rprint(cp, bp, is_mdns)) == NULL)
> > >   goto trunc;
> > > - while (cp < snapend && arcount--) {
> > > + while (cp < snapend && arcount-- > 0) {
> > >   putchar(',');
> > >   if ((cp = ns_rprint(cp, bp, is_mdns)) 
> > > == NULL)
> > >   goto trunc;
> > > 
> > > 
> > 
> > While not pretty there is nothing wrong with the current code.
> > 
> > All 4 variables qdcount, ancount, nscount, arcount are only decremented by
> > one and always checked for 0. So there is no way any of the 4 values
> > become negative. Also the EXTRACT_16BITS() puts a uint16_t into an int
> > which never overflows on OpenBSD. int is always 32bit on OpenBSD.
> > 
> > Still it may make sense to apply the diff just to be explicit about the
> > count values.
> > 
> > -- 
> > :wq Claudio
> 
> Hi Claudio,
> 
> I ask you to look closer.  Perhaps I didn't explain the problem best. 
> Let's take variable qdcount:
> 
>603 while (qdcount--) {
> 
> After this while() qdcount wraps, and that's fine if it isn't reused!  But in
> the same function below it is tested again (at which point it is negative).

This second use of qdcount is in the `else` block; that is, in this
routine, qdcount will only be used in the above `if (DNS_QR(np))` block,
or here--not in both. So I think Claudio is correct.
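
For anyone following along, here is a small standalone sketch of what
the two loop forms leave behind in the counter.  The helper names are
invented for illustration and this is not code from print-domain.c.
Both loops run the same number of times for a non-negative count and
both leave -1 behind; the difference only shows if the leftover value
is tested again.

```c
#include <assert.h>

/* Run a "while (n--)" loop and return the value left in n afterwards. */
static int
drain_plain(int n, int *iterations)
{
	*iterations = 0;
	while (n--)
		(*iterations)++;
	return n;
}

/* Same loop with the explicit "> 0" test from the proposed diff. */
static int
drain_checked(int n, int *iterations)
{
	*iterations = 0;
	while (n-- > 0)
		(*iterations)++;
	return n;
}

/* Truth value a later "if (n--)" would see for a leftover counter. */
static int
retest_plain(int n)
{
	return n != 0;
}

/* Truth value a later "if (n-- > 0)" would see for a leftover counter. */
static int
retest_checked(int n)
{
	return n > 0;
}
```

So a leftover -1 passes the plain retest and fails the checked one,
which only matters on a code path that actually reaches both tests.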

>686 if (qdcount--) {
> 
> and
>  
>690 while (cp < snapend && qdcount--) {
> 
> 
> Best Regards,
> -peter
> 

-- 
Mark Jamsek 
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68


Re: aplaudio(4) Causing Boot Panic

2023-02-25 Thread Mark Kettenis
> Date: Fri, 24 Feb 2023 18:28:56 -0600
> From: Amada Mackey 

Is this a 7.2-RELEASE kernel?  If so, you're probably better off
trying a snapshot (7.2-CURRENT) kernel.

> Date: 02/24/2023 11:30PM UTC
> Version: OpenBSD 7.2 arm64
> Hardware: Apple Macbook Air M2 2022 (Chip ID 0x8112)
> Message: 'panic: attempt to access user address 0x70 from EL1'
> Affected Item: aplaudio(4), aplmca(4)
> 
> 
> Steps to Reproduce:
> 
> 1. Follow the installation guide to setup UEFI on Apple Silicon
> 2. Boot into install72.img (or miniroot72.img) from USB via U-Boot
> 3. Select (S)hell option
> 3. Partition disk
> 4. Setup full-disk encryption according to installation guide
> 5. Exit installer shell
> 6. Follow (I)nstall option
> 7. Reboot
> 8. Enter encryption password
> 9. Await panic shortly into boot process
> 
> 
> Transcribed Logs/Traces/Info (IMAGES ATTACHED):
> 
>  > show panic
> panic: attempt to access user address 0x70 from EL1
> Stopped at    panic+0x160:    cmp    w21, #0x0
>      TID  PID  UID   PAFLAGS    PFLAGS   CPU COMMAND
> *    0    0    0    0x1    0x200    0K swapper
> 
>  > trace
> db_enter() at panic+0x15c
> panic() at do_el1h_sync+0x1f8
> do_el1h_sync() at handle_el1h_sync+0x6c
> handle_el1h_sync() at aplmca_dai_init+0x70
> aplmca_dai_init() at aplmca_dai_init+0x70
> aplmca_alloc_cluster() at aplaudio_attach+0xd4
> aplaudio_attach() at config_attach+0x214
> config_attach() at mainbus_attach_node+0x2c4
> mainbus_attach_node() at mainbus_attach+0x2d0
> mainbus_attach() at config_attach+0x214
> config_attach() at cpu_configure+0x2c
> cpu_configure() at main+0x310
> main() at virtdone+0x70
> 
>  > ps
>      PID  TID  PPID UID  S    FLAGS     WAIT   COMMAND
> *    0    0    -1    0    7    0x10200  swapper
> 
>  > machine cpuinfo
> 
> *    0: ddb
>   1: stopping
>   2: stopping
>   3: stopping
>   4: stopping
>   5: stopping
>   6: stopping
>   7: stopping
> 
> 
> 
> -
> Amada L. Mackey
> University of Texas at Austin
> Cybersecurity Risk Analyst - Information Security Office



Re: redmi laptop keyboard problem

2023-02-24 Thread Mark Kettenis
> Date: Fri, 24 Feb 2023 21:38:50 +0300
> From: Mikhail 
> 
> On Thu, Feb 23, 2023 at 05:46:04PM +0300, Mikhail wrote:
> > On Thu, Feb 16, 2023 at 02:34:11PM +0300, Mikhail wrote:
> > > We have a redmi laptop where I want to install OpenBSD current, but
> > > the keyboard there is not functional, install image boots fine, but
> > > when I try to press any key, after a delay of 1-2 seconds, I see a
> > > repetitive echo on the screen. For example, I'd like to answer 'i' for
> > > the initial installer question, but instead of 'i' I get 'iiiiiii',
> > > pressing backspace removes all seven i's.
> > > 
> > > External USB keyboard works fine, also native keyboard works fine in
> > > boot> prompt.
> > > 
> > > Currently I have only webmail access, so I'd better include the dmesg
> > > as attachment, otherwise gmail will insert line breaks or fix it
> > > another way if I paste it directly.
> > 
> > I tried to use latest ubuntu on this laptop and keyboard didn't work
> > with it also, Kali linux worked fine though.
> > 
> > After some googling I came up with the following patch to linux kernel:
> > https://lore.kernel.org/all/20220712020058.90374-1-gch981...@gmail.com/
> > 
> > I compiled linux 6.1.12 on Kali with and without the patch and I can
> > confirm that without the patch my keyboard becomes non-functional.
> > 
> > The laptop is Redmi Book Pro 14 2022.
> 
> DSDT defines KBC0's (PNP0303, a keyboard) IRQ as
> 
> IRQ (Edge, ActiveLow, Shared, )
> 
> and pckbc_isa_attach defaults to ActiveHigh. As the link in my previous
> email says:
> 
> > There's an active low keyboard IRQ on AMD Ryzen 6000 and it will stay
> > this way on newer platforms.

It's not a PeeCee!

> 
> With the inlined patch I'm able to use native laptop keyboard, but I'm
> sure it will break other keyboards.

Yes, and it doesn't even make sense.  The interrupt is edge triggered
not level triggered.  It's just that the signal has the wrong
polarity.

> Does anyone have an idea how to improve it?

pckbc@acpi may be part of the solution.  At least there we'll be able
to look at what the DSDT says about this interrupt and configure it
accordingly.  But apparently on older hardware the DSDT is full of
lies.
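
To make the distinction concrete: trigger mode and polarity are two
independent properties of an interrupt.  The sketch below uses made-up
flag bits (IRQF_EDGE and IRQF_ACTIVE_LOW are hypothetical and are not
the ACPI resource encoding) and only illustrates that a driver reading
the firmware description should decode both, rather than changing the
trigger mode to paper over a polarity mismatch.

```c
#include <assert.h>
#include <stdint.h>

enum trigger { TRIG_LEVEL, TRIG_EDGE };
enum polarity { POL_HIGH, POL_LOW };

/* Hypothetical flag bits for this sketch; not the ACPI encoding. */
#define IRQF_EDGE	0x01
#define IRQF_ACTIVE_LOW	0x02

struct irq_config {
	enum trigger trig;
	enum polarity pol;
};

/* Decode the two properties independently of each other. */
static struct irq_config
irq_decode(uint8_t flags)
{
	struct irq_config c;

	c.trig = (flags & IRQF_EDGE) ? TRIG_EDGE : TRIG_LEVEL;
	c.pol = (flags & IRQF_ACTIVE_LOW) ? POL_LOW : POL_HIGH;
	return c;
}
```

With this model the DSDT entry quoted above (Edge, ActiveLow) and the
legacy default (edge, active high) differ only in the polarity field.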

I really wonder how these things happen.  Cause presumably older
versions of Windows didn't look at what the DSDT says about the
polarity, which meant that vendors released DSDTs with the wrong
polarity.  But now Windows does look at what the DSDT says?  Does that
mean that running newer Windows versions on these older laptops
doesn't work anymore?


> diff --git a/sys/dev/isa/pckbc_isa.c b/sys/dev/isa/pckbc_isa.c
> index e94fd7e52..ca7ec6c9f 100644
> --- a/sys/dev/isa/pckbc_isa.c
> +++ b/sys/dev/isa/pckbc_isa.c
> @@ -140,7 +140,7 @@ pckbc_isa_attach(struct device *parent, struct device 
> *self, void *aux)
>  
>   for (slot = 0; slot < PCKBC_NSLOTS; slot++) {
>   rv = isa_intr_establish(ia->ia_ic, ia->ipa_irq[slot].num,
> - IST_EDGE, IPL_TTY, pckbcintr, sc, sc->sc_dv.dv_xname);
> + IST_LEVEL, IPL_TTY, pckbcintr, sc, sc->sc_dv.dv_xname);
>   if (rv == NULL) {
>   printf("%s: unable to establish interrupt for irq %d\n",
>   sc->sc_dv.dv_xname, ia->ipa_irq[slot].num);
> 
> 



Re: bbolt can freeze 7.2 from userspace

2023-02-20 Thread Mark Kettenis
> Date: Mon, 20 Feb 2023 09:43:10 +0100
> From: Martin Pieuchot 
> 
> On 20/02/23(Mon) 03:59, Renato Aguiar wrote:
> > [...] 
> > I can't reproduce it anymore with this patch on 7.2-stable :)
> 
> Thanks a lot for testing!  Here's a better fix from Chuck Silvers.
> That's what I believe we should commit.
> 
> The idea is to prevent siblings from modifying the vm_map by marking
> it as "busy" in msync(2) instead of holding the exclusive lock while
> sleeping.  This lets siblings make progress and stops possible writers.
> 
> Could you all confirm this also prevents the deadlock?  Thanks!

Been running the bbolt test on my m1 mac mini for hours now and it
didn't hang.

Diff makes sense to me.

ok kettenis@
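
For readers who want the locking pattern spelled out, here is a toy,
single-threaded model of it.  Everything in it (toy_map and the toy_*
functions) is invented for illustration; it is not the uvm code, it
has no real blocking, and it ignores the fact that the thread that
marked the map busy may itself relock it.  It only shows the order of
operations: take the exclusive lock, mark the map busy, drop the lock,
do the slow work, then clear busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a map lock with a "busy" marker; not struct vm_map. */
struct toy_map {
	bool write_locked;
	int readers;
	bool busy;
};

/* Writers are refused while the map is locked, read-locked or busy. */
static bool
toy_try_write_lock(struct toy_map *m)
{
	if (m->write_locked || m->readers > 0 || m->busy)
		return false;
	m->write_locked = true;
	return true;
}

static void
toy_write_unlock(struct toy_map *m)
{
	m->write_locked = false;
}

/* Readers are only refused while a writer holds the lock. */
static bool
toy_try_read_lock(struct toy_map *m)
{
	if (m->write_locked)
		return false;
	m->readers++;
	return true;
}

static void
toy_read_unlock(struct toy_map *m)
{
	m->readers--;
}

/* Must be called with the write lock held, as in the diff. */
static void
toy_busy(struct toy_map *m)
{
	assert(m->write_locked);
	m->busy = true;
}

static void
toy_unbusy(struct toy_map *m)
{
	m->busy = false;
}
```

While busy is set, readers still get in and writers are turned away,
which matches the description quoted above.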

> Index: uvm/uvm_map.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> retrieving revision 1.312
> diff -u -p -r1.312 uvm_map.c
> --- uvm/uvm_map.c 13 Feb 2023 14:52:55 -  1.312
> +++ uvm/uvm_map.c 20 Feb 2023 08:10:39 -
> @@ -4569,8 +4569,7 @@ fail:
>   * => never a need to flush amap layer since the anonymous memory has
>   *   no permanent home, but may deactivate pages there
>   * => called from sys_msync() and sys_madvise()
> - * => caller must not write-lock map (read OK).
> - * => we may sleep while cleaning if SYNCIO [with map read-locked]
> + * => caller must not have map locked
>   */
>  
>  int
> @@ -4592,25 +4591,27 @@ uvm_map_clean(struct vm_map *map, vaddr_
>   if (start > end || start < map->min_offset || end > map->max_offset)
>   return EINVAL;
>  
> - vm_map_lock_read(map);
> + vm_map_lock(map);
>   first = uvm_map_entrybyaddr(&map->addr, start);
>  
>   /* Make a first pass to check for holes. */
>   for (entry = first; entry != NULL && entry->start < end;
>   entry = RBT_NEXT(uvm_map_addr, entry)) {
>   if (UVM_ET_ISSUBMAP(entry)) {
> - vm_map_unlock_read(map);
> + vm_map_unlock(map);
>   return EINVAL;
>   }
>   if (UVM_ET_ISSUBMAP(entry) ||
>   UVM_ET_ISHOLE(entry) ||
>   (entry->end < end &&
>   VMMAP_FREE_END(entry) != entry->end)) {
> - vm_map_unlock_read(map);
> + vm_map_unlock(map);
>   return EFAULT;
>   }
>   }
>  
> + vm_map_busy(map);
> + vm_map_unlock(map);
>   error = 0;
>   for (entry = first; entry != NULL && entry->start < end;
>   entry = RBT_NEXT(uvm_map_addr, entry)) {
> @@ -4722,7 +4723,7 @@ flush_object:
>   }
>   }
>  
> - vm_map_unlock_read(map);
> + vm_map_unbusy(map);
>   return error;
>  }
>  
> 
> 



Re: [acpi] wrong ECDT EC_ID handling

2023-02-18 Thread Mark Kettenis
> Date: Sat, 18 Feb 2023 18:47:10 +0300
> From: Mikhail 

The problem here is that if the firmware provided an ECDT table, its
AML may use the EC right from the start.  But that won't be possible
if you bail out early just because the EC_ID doesn't match.  So hence
the following question:

Do the addresses described by EC_CONTROL and EC_DATA match the ones
described by the _CRS() method for EC in the AML namespace on the
problematic machine?
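
The comparison being asked for is simple enough to write down.  The
sketch below assumes the two pairs of register addresses have already
been extracted by hand (from the ECDT and from the decoded _CRS); the
struct and function names are made up and nothing here parses real
ACPI tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two EC register addresses. */
struct ec_regs {
	uint64_t control;	/* command/status register */
	uint64_t data;		/* data register */
};

/* True if the ECDT and _CRS describe the same pair of registers. */
static bool
ec_regs_match(const struct ec_regs *ecdt, const struct ec_regs *crs)
{
	return ecdt->control == crs->control && ecdt->data == crs->data;
}
```

If the pairs match, the ECDT still describes the right hardware even
though its EC_ID does not resolve to a node in the namespace.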


> On Thu, Feb 09, 2023 at 09:20:10PM +0300, Mikhail wrote:
> > >Synopsis:  wrong ECDT EC_ID handling
> > >Category:  acpi
> > >Environment:
> > System  : OpenBSD 7.2
> > Details : OpenBSD 7.2-current (GENERIC.MP) #1021: Sun Feb  5 
> > 09:52:50 MST 2023
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Currently the kernel doesn't check if EC_ID presented in
> > ECDT is correct, it looks like wrong EC_ID is fairly common
> > mistake in (at least) Lenovo firmware. As consequences -
> > CapsLock LED, brightness keys and apm battery status doesn't
> > work. Similar problem affects at least one more person:
> > https://marc.info/?l=openbsd-tech&m=166654588612920&w=2 [he is cc'ed]
> > 
> > I asked for BIOS update from semi-official support forum
> > forums.lenovo.com and from official supp...@lenovo.com - in first case
> > there no meaningful answer for about 4 months and second one sent to
> > partner's service-center for assistance.
> > 
> > I think it's hopeless to wait for ECDT correction, or ECDT
> > removal from the vendor, so I decided to propose a patch for the
> > OS.
> > 
> > >How-To-Repeat:
> > Test on Lenovo IdeaPad 3 14itl05, BIOS GCCN32WW
> > >Fix:
> > I propose to add a check for wrong EC_ID, and if the check fails - do
> > not attach ECDT, we still will attach EC after that, but with another
> > procedure, inlined patch makes my CapsLock LED, brightness buttons and
> > apm battery status work.
> > 
> > diff /usr/src
> > commit - b7cf571f83522f53df8a14fa01dcbeff8df0f02a
> > path + /usr/src
> > blob - 5ef24d5179de52d5321e578b3b73dd9524e7c1de
> > file + sys/dev/acpi/acpiec.c
> > --- sys/dev/acpi/acpiec.c
> > +++ sys/dev/acpi/acpiec.c
> > @@ -429,6 +429,14 @@ acpiec_getcrs(struct acpiec_softc *sc, struct acpi_att
> >  
> > /* Check if this is ECDT initialization */
> > if (ecdt) {
> > +   /* Get devnode from header */
> > +   sc->sc_devnode = aml_searchname(sc->sc_acpi->sc_root,
> > +   ecdt->ec_id);
> > +   if (sc->sc_devnode == NULL) {
> > +   printf("acpiec wrong ECDT EC_ID, broken BIOS\n");
> > +   return (1);
> > +   }
> > +
> > /* Get GPE, Data and Control segments */
> > sc->sc_gpe = ecdt->gpe_bit;
> >  
> > @@ -444,10 +452,6 @@ acpiec_getcrs(struct acpiec_softc *sc, struct acpi_att
> > sc->sc_data_bt = sc->sc_acpi->sc_memt;
> > sc->sc_ec_data = ecdt->ec_data.address;
> >  
> > -   /* Get devnode from header */
> > -   sc->sc_devnode = aml_searchname(sc->sc_acpi->sc_root,
> > -   ecdt->ec_id);
> > -
> > goto ecdtdone;
> > }
> 
> ping
> 
> 



Re: sys_pselect assertion "timo || _kernel_lock_held()" failed

2023-02-13 Thread Mark Kettenis
0x10
> > fs                  0x20
> > gs                     0
> > edi           0xd0c60cae    acx100_txpower_maxim+0xee2e
> > esi                    0
> > ebp           0xf5b0a8bc
> > ebx           0xf55674f0
> > edx                0x3fd
> > ecx           0x25394315
> > eax                 0x79
> > eip           0xd061e714    db_enter+0x4
> > cs                   0x8
> > eflags             0x202
> > esp           0xf5b0a8bc
> > ss                  0x10
> > db_enter+0x4:   popl    %ebp
> > 
> > ddb{9}> x/s version
> > version:OpenBSD 7.2-current (GENERIC.MP) #0: Mon Feb 13 16:33:03 
> > CET 2023\012
> > r...@ot4.obsd-lab.genua.de:/usr/src/sys/arch/i386/compile/GENERIC.MP\012
> > 
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2023 OpenBSD. All rights reserved.  
> > https://www.OpenBSD.org
> > 
> > OpenBSD 7.2-current (GENERIC.MP) #0: Mon Feb 13 16:33:03 CET 2023
> > r...@ot4.obsd-lab.genua.de:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > real mem  = 3211833344 (3063MB)
> > avail mem = 3136073728 (2990MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: date 07/19/11, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.6 
> > @ 0x99c00 (88 entries)
> > bios0: vendor American Megatrends Inc. version "2.0c"
> > bios0: Supermicro GS0700
> > acpi0 at bios0: ACPI 3.0
> > acpi0: sleep states S0 S1 S4 S5
> > acpi0: tables DSDT FACP APIC MCFG SLIT SPMI OEMB HPET DMAR SSDT
> > acpi0: wakeup devices NPE1(S4) NPE2(S4) NPE3(S4) NPE4(S4) NPE5(S4) NPE6(S4) 
> > NPE7(S4) NPE8(S4) NPE9(S4) NPEA(S4) P0P1(S4) USB0(S4) USB1(S4) USB2(S4) 
> > USB5(S4) EUSB(S4) [...]
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz ("GenuineIntel" 686-class) 3.47 
> > GHz, 06-2c-02
> > cpu0: 
> > FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 133MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> > cpu1 at mainbus0: apid 2 (application processor)
> > cpu1: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz ("GenuineIntel" 686-class) 3.47 
> > GHz, 06-2c-02
> > cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC[-- MARK -- Mon Feb 13 
> > 18:30:00 2023]
> > ,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> > cpu2 at mainbus0: apid 4 (application processor)
> > cpu2: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz ("GenuineIntel" 686-class) 3.47 
> > GHz, 06-2c-02
> > cpu2: 
> > FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> > cpu3 at mainbus0: apid 16 (application processor)
> > cpu3: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz ("GenuineIntel" 686-class) 3.47 
> > GHz, 06-2c-02
> > cpu3: 
> > FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> > cpu4 at mainbus0: apid 18 (application processor)
> > cpu4: Intel(R) Xeon(R) CPU X5690 @ 3.47GHz ("GenuineIntel" 686-class) 3.47 
> > GHz, 06-2c-02
> > cpu4: 
> > FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AES,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,MELTDOWN
> > cpu5 at mainbus0: apid 20 (application pr

Re: bbolt can freeze 7.2 from userspace

2023-01-29 Thread Mark Kettenis
> Date: Sun, 29 Jan 2023 12:31:22 +0100
> From: Martin Pieuchot 
> 
> On 23/01/23(Mon) 22:57, David Hill wrote:
> > On 1/20/23 09:02, Martin Pieuchot wrote:
> > > > [...] 
> > > > Ran it 20 times and all completed and passed.  I was also able to 
> > > > interrupt
> > > > it as well.   no issues.
> > > > 
> > > > Excellent!
> > > 
> > > Here's the best fix I could come up with.  We mark the VM map as "busy"
> > > during the page fault just before the faulting thread releases the shared
> > > lock.  This ensures no other thread will grab an exclusive lock until the
> > > fault is finished.
> > > 
> > > I couldn't trigger the reproducer with this, can you?
> > 
> > Yes, same result as before.  This patch does not seem to help.
> 
> Is it the same as before?  I doubt it is.  On a 4-CPU machine I can't
> trigger the race described in this thread.  On a 8-CPU one I now see all
> threads sleeping on "thrsleep" except one in "kqread" and one in "wait".

I'm also seeing bbolt.test processes sleeping on "vmmaplk", "vmmapbsy"
and "uvn_flsh", just like without the diff :(.  Well, maybe the
"vmmapbsy" one is new...



Re: mail(1) "save" command straying from POSIX for missing filename

2022-12-18 Thread Mark Jamsek
On 22-12-18 09:29PM, Jason McIntyre wrote:
> On Fri, Dec 16, 2022 at 02:21:41AM +, Tim Chase wrote:
> > According to the POSIX definitions for mail(1) & mailx(1), the
> > (s)ave command should save to "mbox" if the filename is not specified
> > 
> > > Save the specified messages in the file named by the pathname
> > > file, or the mbox if the file argument is omitted
> > 
> > (newer spec)
> > https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html#tag_20_75_13_33
> > 
> > > s [file]
> > >  Save the message in the named file (mbox is default).
> > 
> > (older spec)
> > https://pubs.opengroup.org/onlinepubs/7908799/xcu/mail.html#tag_001_014_1339
> > 
> > 
> > 
> > However, when exercising this functionality, mail(1) on OpenBSD
> > (also tested on FreeBSD where the same issue manifests[1]) doesn't
> > support this:
> > 
> >   demo$ echo test | mail -s "test" demo # send self a message
> >   demo$ mail
> >   Mail version 8.1 6/6/93.  Type ? for help.
> >   "/var/mail/demo": 1 message 1 new
> >   >N  1 d...@localhost.my.do  Thu Dec 15 19:34  19/775   "test"
> >   & s
> >   No file specified.
> > 
> > While I'm not positive on the solution, I think it involves tweaking
> > the save1() function in src/usr.bin/mail/cmd2.c such that instead
> > of failing if it can't snarf(), it should set `file` to "mbox" or
> > "&" so that expand() points to the mbox as required by POSIX.
> > 
> > -tkc
> > 
> > [1]
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268402
> > 
> 
> hi.
> 
> currently mail(1) has these entries in FILES:
> 
> FILES
>/var/mail/* post office (unless overridden
>by the MAIL environment
>variable)
>~/mbox  user's old mail
> 
> isn't it the case that openbsd uses mailboxes in /var/mail by default,
> instead of ~/mbox, as displayed?
> 
> it seems that mail(1) is really out of date regarding default mail spool
> entries, but i may well have misunderstood the situation. once it's
> clear, i can see if we need a code fix (out of my hands) or doc fix.
> 
> jmc

I agree with Brian and think behaviour comports with mail(1). Unexamined
mail remains in the post office (i.e., /var/mail/*), and examined mail,
as noted by Brian, is deposited to the user's mbox file (i.e., ~/mbox):

  You can end a mail session with the quit (q) command.  Messages
  which have been examined go to your mbox file unless they have
  been deleted, in which case they are discarded.  Unexamined
  messages go back to the post office (see the -f option above).

If the session is aborted with e(x)it, however, changes are discarded
thus mail remains unexamined and left in the post office.

Regarding the OP's case, (s)ave is also consistent with mail(1) except
that line and char count are not currently echoed (the below diff adds
this to the output):

  save  (s) Takes a message list and a filename and appends each message
in turn to the end of the file.  The filename in quotes,
followed by the line count and character count is echoed on the
user's terminal.

I guess it's a question of whether we want to change this and make it
comply with POSIX so that "s" with no args saves the current message to
the user's mbox file as the diff upthread does. I think it's handy, but
that behaviour might have been omitted on purpose. And, tbh, "s" isn't
really that much more convenient than "s &".

diff d956567b8a83e77dcbaa40d1038b81c18ca02b19 0628dc730fed3c76f8e1cec17dd1c90c3a58aa75
commit - d956567b8a83e77dcbaa40d1038b81c18ca02b19
commit + 0628dc730fed3c76f8e1cec17dd1c90c3a58aa75
blob - 54b30bc153cd53765a7d15eb0430246c9b46fb17
blob + 7a0650ddcf690e3b2f61550007cf40819b308f94
--- usr.bin/mail/cmd2.c
+++ usr.bin/mail/cmd2.c
@@ -146,6 +146,8 @@ save1(char *str, int mark, char *cmd, struct ignoretab
 {
    struct message *mp;
char *file, *disp;
+   off_t sz = 0;
+   int nlines = 0;
int f, *msgvec, *ip;
FILE *obuf;
 
@@ -182,6 +184,8 @@ save1(char *str, int mark, char *cmd, struct ignoretab
(void)Fclose(obuf);
return(1);
}
+   nlines += mp->m_lines;
+   sz += mp->m_size;
if (mark)
mp->m_flag |= MSAVED;
}
@@ -189,7 +193,7 @@ save1(char *str, int mark, char *cmd, struct ignoretab
if (ferror(obuf))
warn("%s", file);
(void)Fclose(obuf);
-   printf("%s\n", disp);
+   printf("%s %d/%lld\n", disp, nlines, (long long)sz);
return(0);
 }
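
As a standalone illustration of what the diff adds, here is a minimal
sketch of the accumulation and the resulting status line. The struct
msg_stat type and format_saved() are hypothetical stand-ins for struct
message and the printf() at the end of save1(); only the summing of
m_lines and m_size and the "%s %d/%lld" format come from the diff.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for the m_lines/m_size fields of struct message. */
struct msg_stat {
	int	lines;
	off_t	size;
};

/*
 * Sum line and byte counts over the saved messages and format the
 * status string the way the diff does: "<disp> <lines>/<chars>".
 * Returns whatever snprintf() returns.
 */
static int
format_saved(char *buf, size_t buflen, const char *disp,
    const struct msg_stat *msgs, size_t nmsgs)
{
	off_t sz = 0;
	int nlines = 0;
	size_t i;

	for (i = 0; i < nmsgs; i++) {
		nlines += msgs[i].lines;
		sz += msgs[i].size;
	}
	return snprintf(buf, buflen, "%s %d/%lld", disp, nlines,
	    (long long)sz);
}
```

For the single 19-line, 775-byte message in Tim's transcript, calling
this with disp set to "[Appended]" would produce "[Appended] 19/775".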
 

-- 
Mark Jamsek 
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68


Re: mail(1) "save" command straying from POSIX for missing filename

2022-12-16 Thread Mark Jamsek
On 22-12-16 02:21AM, Tim Chase wrote:
> According to the POSIX definitions for mail(1) & mailx(1), the
> (s)ave command should save to "mbox" if the filename is not specified
>
> ...
>
> However, when exercising this functionality, mail(1) on OpenBSD
> (also tested on FreeBSD where the same issue manifests[1]) doesn't
> support this:
> 
>   demo$ echo test | mail -s "test" demo # send self a message
>   demo$ mail
>   Mail version 8.1 6/6/93.  Type ? for help.
>   "/var/mail/demo": 1 message 1 new
>   >N  1 d...@localhost.my.do  Thu Dec 15 19:34  19/775   "test"
>   & s
>   No file specified.

Current behaviour comports with the mail(1) manual page, so support for
this may be intentionally elided; I'm not sure. In either case, here's
a minimal diff making the change.

Index: cmd2.c
===
RCS file: /cvs/src/usr.bin/mail/cmd2.c,v
retrieving revision 1.22
diff -u -p -r1.22 cmd2.c
--- cmd2.c  16 Oct 2015 17:56:07 -  1.22
+++ cmd2.c  16 Dec 2022 12:59:21 -
@@ -139,6 +139,7 @@ copycmd(void *v)
 
 /*
  * Save/copy the indicated messages at the end of the passed file name.
+ * If no file name is specified, default to user mbox.
  * If mark is true, mark the message "saved."
  */
 int
@@ -208,10 +209,11 @@ swrite(void *v)
 /*
  * Snarf the file from the end of the command line and
  * return a pointer to it.  If there is no file attached,
- * just return NULL.  Put a null in front of the file
+ * return the mbox file.  Put a null in front of the file
  * name so that the message list processing won't see it,
- * unless the file name is the only thing on the line, in
- * which case, return 0 in the reference flag variable.
+ * unless the file name is the only thing on the line, or
+ * no file was attached, in which case, return 0 in the
+ * reference flag variable.
  */
 char *
 snarf(char *linebuf, int *flag)
@@ -234,8 +236,8 @@ snarf(char *linebuf, int *flag)
while (cp > linebuf && !isspace((unsigned char)*cp))
cp--;
if (*cp == '\0') {
-   puts("No file specified.");
-   return(NULL);
+   *flag = 0;
+   return(expand("&"));
}
if (isspace((unsigned char)*cp))
*cp++ = 0;
Index: mail.1
===
RCS file: /cvs/src/usr.bin/mail/mail.1,v
retrieving revision 1.83
diff -u -p -r1.83 mail.1
--- mail.1  31 Mar 2022 17:27:25 -  1.83
+++ mail.1  16 Dec 2022 12:59:22 -
@@ -633,6 +633,9 @@ retained fields.
 .Pq Ic s
 Takes a message list and a filename and appends each message in
 turn to the end of the file.
+If filename is omitted, the
+.Ar mbox
+file is used.
 The filename in quotes, followed by the line
 count and character count is echoed on the user's terminal.
 .It Ic saveignore
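
The changed snarf() logic can be sketched in isolation as follows. This
is a simplified illustration, not the code from cmd2.c: the name
snarf_or_default() is hypothetical, and where the diff calls
expand("&") to obtain the mbox path, the sketch just hands back a
default string supplied by the caller.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return the last whitespace-delimited word of line (the file name),
 * or dflt if the line holds no word at all.  *has_list is set to 1
 * when something (a message list) precedes the file name, else to 0.
 * The line is modified in place: trailing blanks are stripped and the
 * file name is split off with a NUL.
 */
static const char *
snarf_or_default(char *line, const char *dflt, int *has_list)
{
	size_t len = strlen(line);
	char *cp;

	/* Strip away trailing blanks. */
	while (len > 0 && isspace((unsigned char)line[len - 1]))
		line[--len] = '\0';

	/* Nothing left: no file name given, fall back to the default. */
	if (len == 0) {
		*has_list = 0;
		return dflt;
	}

	/* Search backwards for the beginning of the file name. */
	cp = line + len - 1;
	while (cp > line && !isspace((unsigned char)*cp))
		cp--;
	if (isspace((unsigned char)*cp)) {
		*cp++ = '\0';
		*has_list = 1;
	} else
		*has_list = 0;
	return cp;
}
```

With "1 2 out.txt" this returns "out.txt" and leaves "1 2" as the
message list; with an empty or all-blank line it returns the default,
which is where the diff substitutes the user's mbox.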

-- 
Mark Jamsek 
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68


Re: ACPI "Undefined scope"

2022-11-29 Thread Mark Kettenis
> Date: Tue, 29 Nov 2022 08:16:57 +
> From: Laurence Tratt 
> 
> I have been trying out a newish AMD machine (7900x with integrated graphics
> on an MSI board). At a basic level it works, though there an awful lot of
> "not configured"s! That might partly be because the ACPI parser/evaluator
> seems to choke:
> 
>   acpi0 at bios0: ACPI 6.4Undefined scope: \\_SB_.PCI0.GPP7.UP00.DP40.UP00.DP68

Sloppily written AML, but nothing to worry about really.

> "AMDI0052" at acpi0 not configured

Doesn't seem to do anything.  Probably just there to make some
Windows driver attach.

> "MSFT8000" at acpi0 not configured

That seems to be a thing to give user-mode access to an i2c bus in
Windows:

  
https://learn.microsoft.com/en-us/windows/uwp/devices-sensors/enable-usermode-access

> "AMDIF031" at acpi0 not configured

That is some sort of new GPIO controller that we don't support yet.
Not used on your machine though.

You might want to report that uaudio thing separately.



Re: deadlock in ifconfig

2022-11-21 Thread Mark Kettenis
> Date: Mon, 21 Nov 2022 20:28:35 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> Some of my test machines hang while booting userland.
> 
> starting network
> -> here it hangs
> load: 0.02  cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k
> 
> ddb shows these two processes.
> 
>  81303  375320  89140  0  3 0x3  sbar  ifconfig
>  48135  157353  0  0  3 0x14200  netlock   systqmp
> 
> ddb{0}> trace /t 0t375320
> sleep_finish(800022d31318,1) at sleep_finish+0xfe
> cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54
> sched_barrier(800022512ff0) at sched_barrier+0x73
> ixgbe_stop(80118000) at ixgbe_stop+0x1f7
> ixgbe_init(80118000) at ixgbe_init+0x32
> ixgbe_ioctl(80118048,8020690c,8022ec00) at ixgbe_ioctl+0x13a
> in_ifinit(80118048,8022ec00,800022d31740,1) at in_ifinit+0xef
> in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at in_ioctl_change_ifaddr+0x3a4
> in_control(fd81901dc740,8040691a,800022d31730,80118048) at in_control+0x75
> ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at ifioctl+0x982
> sys_ioctl(800022d6,800022d31840,800022d318a0) at sys_ioctl+0x2c4
> syscall(800022d31910) at syscall+0x384
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7d94a0, count: -13
> 
> ddb{0}> trace /t 0t157353
> sleep_finish(800022ca8b70,1) at sleep_finish+0xfe
> rw_enter(822b4f80,1) at rw_enter+0x1cb
> pf_purge(0) at pf_purge+0x1d
> taskq_thread(822ac568) at taskq_thread+0x100
> end trace frame: 0x0, count: -4
> 
> ifconfig waits for the sched_barrier_task() on the systqmp task
> queue while holding the netlock.  pf_purge() runs on the systqmp
> task queue and is waiting for the netlock.  The netlock has been
> taken by ifconfig in in_ioctl_change_ifaddr().
> 
> The problem has been introduced when pf_purge() was moved from systq
> to systqmp.
> https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2

I'd say pf_purge should be moved to its own taskq.

ix(4) holding netlock while calling sched_barrier() is probably
wrong too.
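
To make the cycle explicit, here is a small wait-for sketch. It is only
a model of the situation bluhm describes, not kernel code: the task
numbering and has_deadlock() are made up for illustration. Each task
waits on at most one other task, and a deadlock shows up as a chain
that never ends.

```c
#include <stddef.h>

/*
 * waits_for[i] is the index of the task that task i is blocked on,
 * or -1 if task i is not blocked.  Returns 1 if following the chain
 * from start never reaches an unblocked task, i.e. start can never
 * make progress.
 */
static int
has_deadlock(const int *waits_for, size_t ntasks, size_t start)
{
	size_t steps;
	int cur = (int)start;

	for (steps = 0; steps <= ntasks; steps++) {
		cur = waits_for[cur];
		if (cur < 0)
			return 0;	/* chain ends at a runnable task */
		if ((size_t)cur == start)
			return 1;	/* came back to where we began */
	}
	return 1;	/* more hops than tasks: stuck behind a cycle */
}
```

Numbering ifconfig as 0, the systqmp thread as 1 and a dedicated purge
thread as 2: today the table is { 1, 0, -1 } (ifconfig waits for the
barrier on systqmp, systqmp is stuck in pf_purge() waiting for the
netlock ifconfig holds), which is a cycle.  With pf_purge on its own
taskq the table becomes { 1, -1, 0 }: the barrier task can run,
ifconfig finishes and releases the netlock, and the purge proceeds.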



Re: 7.2 sysupgrade of VM to snapshot panic

2022-11-18 Thread Mark Kettenis
> Date: Fri, 18 Nov 2022 09:18:52 -0800
> From: Mike Larkin 
> 
> On Fri, Nov 18, 2022 at 12:37:48AM +0100, Mike Fischer wrote:
> > On a host running OpenBSD 7.2 stable, amd64, all updates & patches using 
> > vmd I have a VM, configured with 1 GB RAM, 40 GB virtual disk, network 
> > access direct through host bridge0 (FAQ option #4). The VM has also been 
> > installed with OpenBSD 7.2 stable + patches.
> >
> > For the first time in my life I wanted to try upgrading to -current. This 
> > is what happened:
> >
> > 20221118T003040 root@vm2:~# sysupgrade -s
> > Fetching from https://cdn.openbsd.org/pub/OpenBSD/snapshots/amd64/
> > SHA256.sig   100% |*|  2144   00:00
> > Signature Verified
> > INSTALL.amd64 100% || 43554   00:00
> > base72.tgz   100% |*|   332 MB00:50
> > bsd  100% |*| 22479 KB00:04
> > bsd.mp   100% |*| 22584 KB00:04
> > bsd.rd   100% |*|  4547 KB00:01
> > comp72.tgz   100% |*| 75037 KB00:12
> > game72.tgz   100% |*|  2745 KB00:01
> > man72.tgz100% |*|  7609 KB00:02
> > xbase72.tgz  100% |*| 52858 KB00:09
> > xfont72.tgz  100% |*| 22967 KB00:04
> > xserv72.tgz  100% |*| 14815 KB00:03
> > xshare72.tgz 100% |*|  4573 KB00:01
> > Verifying sets.
> > Fetching updated firmware.
> > fw_update: added none; updated none; kept none
> > Upgrading.
> > syncing disks... done
> > vmmci0: powerdown
> > rebooting...
> > Using drive 0, partition 3.
> > Loading..
> > probing: pc0 com0 mem[638K 1022M a20=on]
> > disk: hd0+
> > >> OpenBSD/amd64 BOOT 3.55
> > upgrade detected: switching to /bsd.upgrade
> > |
> > com0: 115200 baud
> > switching console to com0
> > >> OpenBSD/amd64 BOOT 3.55
> > boot>
> > booting hd0a:/bsd.upgrade: 3916484+1643520+3882152+0+704512 [109+439944+293419]=0xa624a8
> > entry point at 0x81001000
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
> > https://www.OpenBSD.org
> >
> > OpenBSD 7.2-current (RAMDISK_CD) #797: Thu Nov 17 08:26:28 MST 2022
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
> > real mem = 1056952320 (1007MB)
> > avail mem = 1020960768 (973MB)
> > random: good seed from bootblocks
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36e0 (10 entries)
> > bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
> > bios0: OpenBSD VMM
> > acpi at bios0 not configured
> > cpu0 at mainbus0: (uniprocessor)
> > fatal protection fault in supervisor mode
> > trap type 4 code  rip 811d7322 cs 8 rflags 10202 cr2 0 cpl e rsp 81a06d10
> > gsbase 0x818f7ff0  kgsbase 0x0
> > panic: trap type 4, code=, pc=811d7322
> >
> > The operating system has halted.
> > Please press any key to reboot.
> >
> >
> > Note: I tried this a few times with identical results.
> >
> > Is the snapshot broken?
> > Or are snapshots not supported on vmd VMs?
> > Or am I doing something wrong?
> >
> 
> Not sure if this was a one-off problem with that snapshot or not, but I just
> tested snapshot 800 (18 nov) and it works fine here on similar hardware in 
> vmd.
> 
> You might try the sysupgrade again.

A sysupgrade of the guest won't help.  A -current guest will not run
on a -release host running vmd(8) on most AMD hardware because
-current uses an MSR that isn't passed through by the -release vmd(8).

So a sysupgrade of the host (to a snapshot) will fix this.



Re: ACPI 6.4 Could not convert 1 to 4 panic

2022-11-13 Thread Mark Kettenis
> Date: Sun, 13 Nov 2022 16:05:36 +0300
> From: Mikhail 
> 
> On Sun, Nov 13, 2022 at 04:25:00PM +1100, ja...@tubnor.net wrote:
> > 
> > 
> > > -Original Message-
> > > From: Mark Kettenis 
> > > Sent: Saturday, 12 November 2022 11:00 PM
> > > To: ja...@tubnor.net
> > > Cc: bugs@openbsd.org
> > > Subject: Re: ACPI 6.4 Could not convert 1 to 4 panic
> > > 
> > > 
> > > Could you boot a normal kernel (i.e. not a ramdisk kernel) without
> > > Mikhail's diff and show me the output? A screen image is ok.
> > 
> > No problems. This was a sysupgrade to the latest -current from the patched
> > to unpatched GENERIC.MP kernel.
>  
> Offending code is in SSDT.8:
> 
> Scope (\_SB.PC00.PEG0) {
> 
>   [...]
> Method (_STA, 0, NotSerialized)  // _STA: Status
> {
> If ((PG0E == One))
> {
> Return (0x0F)
> }
> 
> Return (Zero)
> }
> [...]
> }
> 
> PG0E is defined as External/UnknownObj. The problem is that the
> DSDT has two definitions of PG0E: one is a field unit under the
> "DefinitionBlock" root, the other a package under "Scope (_SB)". My
> suspicion is that they meant "\PG0E" and simply forgot to specify the
> scope explicitly, because comparing a package to One doesn't make sense.

Thanks.  Yes I agree with that analysis.
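
For the archives, the lookup that bites here can be sketched like this.
It models the ACPI rule that a single-segment name is searched in the
current scope and then in each parent scope up to the root; the flat
table, struct ns_entry and ns_lookup() are made up for illustration and
have nothing to do with how dsdt.c stores the namespace.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flat namespace entry: absolute path plus object type. */
struct ns_entry {
	const char *path;
	const char *type;
};

/*
 * Resolve a single-segment name starting in scope and walking up
 * towards the root "\".  Returns the first match or NULL.
 */
static const struct ns_entry *
ns_lookup(const struct ns_entry *ns, size_t n, const char *scope,
    const char *name)
{
	char prefix[128], path[128];
	char *dot;
	size_t i;

	if (strlen(scope) >= sizeof(prefix))
		return NULL;
	strcpy(prefix, scope);

	for (;;) {
		if (strcmp(prefix, "\\") == 0)
			snprintf(path, sizeof(path), "\\%s", name);
		else
			snprintf(path, sizeof(path), "%s.%s", prefix, name);
		for (i = 0; i < n; i++)
			if (strcmp(ns[i].path, path) == 0)
				return &ns[i];
		if (strcmp(prefix, "\\") == 0)
			return NULL;	/* searched the root, give up */
		/* Drop the last path segment to move to the parent. */
		dot = strrchr(prefix, '.');
		if (dot != NULL)
			*dot = '\0';
		else
			strcpy(prefix, "\\");
	}
}
```

With both "\PG0E" (field unit) and "\_SB.PG0E" (package) present, a
lookup of "PG0E" from "\_SB.PC00.PEG0" stops at "\_SB.PG0E" and never
reaches the root, so the _STA method ends up comparing a package to
One.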

> So the patch I sent works, but the whole situation looks less like a
> bug or unimplemented functionality in OpenBSD and more like a bug in
> the vendor's ASL, and I am not sure what the policy is for including
> such workarounds in the tree.

Not sure we have a policy.  But it does seem to indicate that adding
an implicit conversion from Package to Integer isn't the right
approach.  Need to think a bit more about this, but maybe the right
thing to do is having _STA() return 0 if an unexpected AML failure
occurs.



Re: ACPI 6.4 Could not convert 1 to 4 panic

2022-11-12 Thread Mark Kettenis
> From: 
> Date: Sun, 6 Nov 2022 11:29:47 +1100
> 
> > -Original Message-
> > From: Mark Kettenis 
> > Sent: Saturday, 5 November 2022 8:44 PM
> > To: ja...@tubnor.net
> > Cc: bugs@openbsd.org
> > Subject: Re: ACPI 6.4 Could not convert 1 to 4 panic
> > 
> > > From: 
> > > Date: Sat, 5 Nov 2022 18:47:23 +1100
> > 
> > Hi Jason,
> > 
> > > Can you send us the acpidump output (all the files in /var/db/acpi)
> > > for this machine?
> 
> No problems. Let me know if there is anything else I can help with. Cheers!

Still trying to get some context here as the ACPI standard explicitly
describes what conversions are allowed.

Could you boot a normal kernel (i.e. not a ramdisk kernel) without
Mikhail's diff and show me the output?  A screen image is ok.

Thanks,

Mark



Re: bse(4) media/link bug

2022-11-07 Thread Mark Kettenis
> Date: Mon, 7 Nov 2022 13:24:24 +
> From: Martin Pieuchot 
> 
> On 07/11/22(Mon) 13:20, Martin Pieuchot wrote:
> > On a raspberry pi4, with the following configuration :
> > 
> > $ cat /etc/hostname.bse0 
> > dhcp
> > 
> > ...and with the cable directly connected to my laptop (amd64 w/ em(4)) I
> > have to force the media type, with the command below, to make it work.
> > 
> > # ifconfig bse0 media 1000baseT mediaopt full-duplex
> 
> Actually it is worse than that.  It's completely broken and I can't use
> it.

People have complained about this before.  I can't reproduce it and
therefore I can't fix it.



Re: ACPI 6.4 Could not convert 1 to 4 panic

2022-11-05 Thread Mark Kettenis
> From: 
> Date: Sat, 5 Nov 2022 18:47:23 +1100

Hi Jason,

Can you send us the acpidump output (all the files in /var/db/acpi)
for this machine?

> > -Original Message-
> > From: Mikhail 
> > Sent: Wednesday, 2 November 2022 6:43 PM
> > To: bugs@openbsd.org
> > Cc: ja...@tubnor.net
> > Subject: Re: ACPI 6.4 Could not convert 1 to 4 panic
> > 
> > 
> > Wasn't able to test it, since I don't own the hardware and of course there
> > could be more issues even if that one is fixed with the patch.
> > 
> > > I think it'd be good to have the patch for the archives, in case anyone
> > > googles the error message.
> > 
> > diff /usr/src
> > commit - ba77ede935ace61278da5c3474c6951e0a606318
> > path + /usr/src
> > blob - 1a5694c9e4b77cd1223f26d81d8e3c11fd341adb
> > file + sys/dev/acpi/dsdt.c
> > --- sys/dev/acpi/dsdt.c
> > +++ sys/dev/acpi/dsdt.c
> > @@ -2035,6 +2035,16 @@ aml_convert(struct aml_value *a, int ctype, int
> > clen)
> > return a;
> > }
> > switch (ctype) {
> > +   case AML_OBJTYPE_PACKAGE:
> > +   dnprintf(10,"convert to package\n");
> > +   switch (a->type) {
> > +   case AML_OBJTYPE_INTEGER:
> > +   c = aml_allocvalue(AML_OBJTYPE_PACKAGE, 1,
> > NULL);
> > +   _aml_setvalue(c->v_package[0],
> > AML_OBJTYPE_INTEGER,
> > +   a->v_integer, NULL);
> > +   break;
> > +   }
> > +   break;
> > case AML_OBJTYPE_BUFFER:
> > dnprintf(10,"convert to buffer\n");
> > switch (a->type) {
> 
> Thanks for the patch Mikhail. This fixed the ACPI issue and the system fully
> boots now. Complete installation from a release(8) build and the system runs
> as expected.
> 
> I have attached the complete dmesg below if there is any other hardware
> features that need to be considered. Hopefully this can be committed to
> -current. Thanks again!
> 
> OpenBSD 7.2-current (GENERIC.MP) #2: Fri Nov  4 20:46:44 AEDT 2022
>  
> mrbuil...@o-snap.in.tubnor.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16862752768 (16081MB)
> avail mem = 16334270464 (15577MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x43d17000 (140 entries)
> bios0: vendor LENOVO version "M41KT32A" date 09/12/2022
> bios0: LENOVO 11T8S03M00
> efi0 at bios0: UEFI 2.8
> efi0: American Megatrends rev 0x50018
> acpi0 at bios0: ACPI 6.4
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT FIDT SSDT SSDT SSDT SSDT SSDT HPET APIC MCFG
> SSDT UEFI NHLT LPIT SSDT SSDT DBGP DBG2 SSDT DMAR SSDT SSDT SSDT SSDT LUFT
> TPM2 PHAT FPDT BGRT WSMT
> acpi0: wakeup devices PEG1(S4) PEGP(S4) PEGP(S4) PEGP(S4) PEGP(S4) SIO1(S3)
> RP09(S4) PXSX(S4) RP10(S4) PXSX(S4) RP11(S4) PXSX(S4) RP12(S4) PXSX(S4)
> RP13(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: 12th Gen Intel(R) Core(TM) i5-12400, 4390.47 MHz, 06-97-02
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 10-way L2 cache, 18MB 64b/line 9-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.1.0.1.0.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: 12th Gen Intel(R) Core(TM) i5-12400, 4390.47 MHz, 06-97-02
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 10-way L2 cache, 18MB 64b/line 9-way L3 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: 12th Gen Intel(R) Core(TM) i5-12400, 4388.68 MHz, 06-97-02
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLU
> 

Re: Panic on Dell Precision T1600, BIOS A21 (stopped at efi_attach+0x171)

2022-10-19 Thread Mark Kettenis
> From: Claudio Miranda 
> Date: Wed, 19 Oct 2022 13:28:25 -0400
> 
> >
> > Wow:
> >
> >   efi0 at bios0: UEFI 2.0
> >
> > that is ancient.  I also found
> >
> >   https://docs.oracle.com/cd/E26502_01/html/E28978/hardw.html
> >
> > so clearly the UEFI BIOS has bugs.  Using UEFI instead of the legacy
> > BIOS on a machine that old may not be the wisest choice.  But I think
> > we can just avoid using UEFI in the kernel in this case.
> >
> > Diff below should fix it.
> >
> > ok?
> >
> >
> > Index: arch/amd64/amd64/efi_machdep.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/efi_machdep.c,v
> > retrieving revision 1.1
> > diff -u -p -r1.1 efi_machdep.c
> > --- arch/amd64/amd64/efi_machdep.c  16 Oct 2022 15:03:39 -  1.1
> > +++ arch/amd64/amd64/efi_machdep.c  19 Oct 2022 17:18:19 -
> > @@ -112,6 +112,10 @@ efi_attach(struct device *parent, struct
> > printf(".%d", minor % 10);
> > printf("\n");
> >
> > +   /* Early implementations can be buggy. */
> > +   if (major < 2 || (major == 2 && minor < 10))
> > +   return;
> > +
> > if ((bios_efiinfo->flags & BEI_64BIT) == 0)
> > return;
> >
> > Index: arch/arm64/dev/efi_machdep.c
> > ===
> > RCS file: /cvs/src/sys/arch/arm64/dev/efi_machdep.c,v
> > retrieving revision 1.2
> > diff -u -p -r1.2 efi_machdep.c
> > --- arch/arm64/dev/efi_machdep.c12 Oct 2022 13:39:50 -  1.2
> > +++ arch/arm64/dev/efi_machdep.c19 Oct 2022 17:18:19 -
> > @@ -118,6 +118,10 @@ efi_attach(struct device *parent, struct
> > printf(".%d", minor % 10);
> > printf("\n");
> >
> > +   /* Early implementations can be buggy. */
> > +   if (major < 2 || (major == 2 && minor < 10))
> > +   return;
> > +
> > efi_map_runtime(sc);
> >
> > /*
> >
> 
> Heh, yeah. :-) I figured I'd give it a go with UEFI on this old beast
> and it worked without that kind of issue until Monday of this week.
> I'll see how I can apply this diff if possible since I can't boot at
> all, unless it will be included in an upcoming snapshot.

I expect it to be committed fairly quickly.



Re: Panic on Dell Precision T1600, BIOS A21 (stopped at efi_attach+0x171)

2022-10-19 Thread Mark Kettenis
> From: Claudio Miranda 
> Date: Wed, 19 Oct 2022 12:07:50 -0400
> 
> Greetings,
> 
> I'm getting a kernel panic on a Dell Precision T1600 with BIOS A21
> which is the latest revision from Dell for this system. This all
> started as of the #793 snapshot of -current on Monday, October 17 at
> 10:16:43 MDT. I've attached pictures of the kernel panic on boot as
> well as the panic info, trace info, and dmesg info. Prior to this
> snapshot, the system was booting OpenBSD without issue. Unfortunately,
> I'm only able to provide pictures of the information needed. Any help
> is greatly appreciated.
> 
> Thanks,
> 
> Claudio

Wow:

  efi0 at bios0: UEFI 2.0

that is ancient.  I also found

  https://docs.oracle.com/cd/E26502_01/html/E28978/hardw.html

so clearly the UEFI BIOS has bugs.  Using UEFI instead of the legacy
BIOS on a machine that old may not be the wisest choice.  But I think
we can just avoid using UEFI in the kernel in this case.

Diff below should fix it.

ok?


Index: arch/amd64/amd64/efi_machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/efi_machdep.c,v
retrieving revision 1.1
diff -u -p -r1.1 efi_machdep.c
--- arch/amd64/amd64/efi_machdep.c  16 Oct 2022 15:03:39 -  1.1
+++ arch/amd64/amd64/efi_machdep.c  19 Oct 2022 17:18:19 -
@@ -112,6 +112,10 @@ efi_attach(struct device *parent, struct
printf(".%d", minor % 10);
printf("\n");
 
+   /* Early implementations can be buggy. */
+   if (major < 2 || (major == 2 && minor < 10))
+   return;
+
if ((bios_efiinfo->flags & BEI_64BIT) == 0)
return;
 
Index: arch/arm64/dev/efi_machdep.c
===
RCS file: /cvs/src/sys/arch/arm64/dev/efi_machdep.c,v
retrieving revision 1.2
diff -u -p -r1.2 efi_machdep.c
--- arch/arm64/dev/efi_machdep.c12 Oct 2022 13:39:50 -  1.2
+++ arch/arm64/dev/efi_machdep.c19 Oct 2022 17:18:19 -
@@ -118,6 +118,10 @@ efi_attach(struct device *parent, struct
printf(".%d", minor % 10);
printf("\n");
 
+   /* Early implementations can be buggy. */
+   if (major < 2 || (major == 2 && minor < 10))
+   return;
+
efi_map_runtime(sc);
 
/*
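
The version test in the diff can be tried out on its own with the
sketch below. It assumes the usual UEFI encoding of the system table
revision (major version in the upper 16 bits; the lower 16 bits hold
the minor version times ten plus an optional sub-minor digit), which
is also what the printf(".%d", minor % 10) in the context lines
suggests. The function name efi_too_old() is made up for the example.

```c
#include <stdint.h>

/*
 * Return 1 if the given system table revision is older than
 * UEFI 2.1, i.e. if the check added by the diff would bail out.
 */
static int
efi_too_old(uint32_t revision)
{
	uint32_t major = revision >> 16;
	uint32_t minor = revision & 0xffff;

	return (major < 2 || (major == 2 && minor < 10));
}
```

Under that encoding the UEFI 2.0 firmware of the T1600 is
(2 << 16) | 0 and gets rejected, while something like UEFI 2.8,
(2 << 16) | 80, is still used.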



Re: time keeping on armv7

2022-09-25 Thread Mark Kettenis
> Date: Tue, 20 Sep 2022 14:04:14 +
> From: Miod Vallat 
> 
> I recently installed OpenBSD to a PandaBoard (the original, not
> PandaBoard ES) and noticed that the clock was very quickly getting
> behind, with ntpd unable to cope.
> 
> The following extremely crude diff fixes it, but probably at the expense
> of breaking other omap systems. Is there a better way to figure out what
> is the real system clock frequency?

Hmm, in the device tree world it appears that there should be a node
with a "arm,cortex-a9-global-timer" compatible string that references
a clock which will provide the actual frequency the clock is running
at.

However, I don't think the omap4 device trees have such a node.  I
think this means that Linux doesn't actually use the global timer and
uses the private timer instead.  That timer is represented in the
device tree by a node with the "arm,cortex-a9-twd-timer" compatible.

I think jsg@ is right.  The clock rate will be the output rate of
mpu_periphclk.  It looks like that clock has a clock-output-names
property that could be used to look up the frequency, although we
currently don't have any infrastructure to look up clocks by name.
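
To put a number on the drift, here is a back-of-the-envelope sketch.
It assumes, as Miod's diff does, that the timer on this board really
ticks at 300 MHz while the kernel was told 400 MHz; whether 300 MHz is
the true mpu_periphclk rate is exactly the open question. The helper
name drift_ms_per_sec() is made up.

```c
#include <stdint.h>

/*
 * Milliseconds of system time lost per second of real time when a
 * counter that really ticks at actual_hz is assumed to tick at
 * assumed_hz.  A negative result means the clock runs fast.
 */
static int64_t
drift_ms_per_sec(uint64_t assumed_hz, uint64_t actual_hz)
{
	return (1000 * ((int64_t)assumed_hz - (int64_t)actual_hz)) /
	    (int64_t)assumed_hz;
}
```

With 400 MHz assumed and 300 MHz actual this gives 250 ms lost per
second, or 15 minutes per hour, which is far more than ntpd can
correct by slewing.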

> Index: sys/arch/armv7/omap/omapid.c
> ===
> RCS file: /OpenBSD/src/sys/arch/armv7/omap/omapid.c,v
> retrieving revision 1.5
> diff -u -p -u -p -r1.5 omapid.c
> --- sys/arch/armv7/omap/omapid.c  24 Oct 2021 17:52:27 -  1.5
> +++ sys/arch/armv7/omap/omapid.c  20 Sep 2022 13:54:01 -
> @@ -83,9 +83,12 @@ omapid_attach(struct device *parent, str
>   rev = bus_space_read_4(sc->sc_iot, sc->sc_ioh, O4_ID_CODE);
>   switch ((rev >> 12) & 0x) {
>   case 0xB852:
> - case 0xB95C:
>   board = "omap4430";
>   newclockrate = 400 * 1000 * 1000;
> + break;
> + case 0xB95C:
> + board = "omap4430";
> + newclockrate = 300 * 1000 * 1000;
>   break;
>   case 0xB94E:
>   board = "omap4460";
> 
> 
> 
> OpenBSD 7.2 (GENERIC) #11: Tue Sep 20 13:18:51 GMT 2022
> m...@enfer.gentiane.org:/usr/src/sys/arch/armv7/compile/GENERIC
> real mem  = 1021243392 (973MB)
> avail mem = 992374784 (946MB)
> random: boothowto does not indicate good seed
> mainbus0 at root: TI OMAP4 PandaBoard
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A9 r1p2
> cpu0: 32KB 32b/line 4-way L1 VIPT I-cache, 32KB 32b/line 4-way L1 D-cache
> cortex0 at mainbus0
> amptimer0 at cortex0: 396000 kHz
> armliicc0 at cortex0: rtl 4 waymask: 0x000f
> omap0 at mainbus0
> omapid0 at omap0: omap4430
> amptimer0: adjusting clock: new rate 30 kHz
> prcm0 at omap0 rev 0.0
> ampintc0 at mainbus0 nirq 160, ncpu 2: "interrupt-controller"
> omwugen0 at mainbus0
> simplebus0 at mainbus0: "ocp"
> omsysc0 at simplebus0: "target-module"
> omsysc1 at simplebus0: "target-module"
> omsysc2 at simplebus0: "target-module"
> omsysc3 at simplebus0: "target-module"
> omsysc4 at simplebus0: "target-module"
> omsysc5 at simplebus0: "target-module"
> omsysc6 at simplebus0: "target-module"
> simplebus1 at simplebus0: "l4"
> simplebus2 at simplebus1: "cm1"
> omcm0 at simplebus2: "mpuss_cm"
> omclock0 at omcm0: "clk"
> omcm1 at simplebus2: "tesla_cm"
> omclock1 at omcm1: "clk"
> omcm2 at simplebus2: "abe_cm"
> omclock2 at omcm2: "clk"
> simplebus3 at simplebus1: "cm2"
> omcm3 at simplebus3: "l4_ao_cm"
> omclock3 at omcm3: "clk"
> omcm4 at simplebus3: "l3_1_cm"
> omclock4 at omcm4: "clk"
> omcm5 at simplebus3: "l3_2_cm"
> omclock5 at omcm5: "clk"
> omcm6 at simplebus3: "ducati_cm"
> omclock6 at omcm6: "clk"
> omcm7 at simplebus3: "l3_dma_cm"
> omclock7 at omcm7: "clk"
> omcm8 at simplebus3: "l3_emif_cm"
> omclock8 at omcm8: "clk"
> omcm9 at simplebus3: "d2d_cm"
> omclock9 at omcm9: "clk"
> omcm10 at simplebus3: "l4_cfg_cm"
> omclock10 at omcm10: "clk"
> omcm11 at simplebus3: "l3_instr_cm"
> omclock11 at omcm11: "clk"
> omcm12 at simplebus3: "ivahd_cm"
> omclock12 at omcm12: "clk"
> omcm13 at simplebus3: "iss_cm"
> omclock13 at omcm13: "clk"
> omcm14 at simplebus3: "l3_dss_cm"
> omclock14 at omcm14: "clk"
> omcm15 at simplebus3: "l3_gfx_cm"
> omclock15 at omcm15: "clk"
> omcm16 at simplebus3: "l3_init_cm"
> omclock16 at omcm16: "clk"
> omcm17 at simplebus3: "l4_per_cm"
> omclock17 at omcm17: "clk"
> simplebus4 at simplebus1: "scm"
> syscon0 at simplebus4: "scm_conf"
> simplebus5 at simplebus1: "scm"
> syscon1 at simplebus5: "omap4_padconf_global"
> pinctrl0 at simplebus5
> simplebus6 at simplebus1: "l4"
> "counter" at simplebus6 not configured
> "prm" at simplebus6 not configured
> "scrm" at simplebus6 not configured
> "scm" at simplebus6 not configured
> simplebus7 at simplebus6: "padconf"
> pinctrl1 at simplebus7
> "ocmcram" at simplebus0 not configured
> "dma-controller" at simplebus0 not configured
> omgpio0 at simplebus0: rev 0.1
> gpio0 at 

Re: Missing <monetary.h> and strfmon()/strfmon_l()

2022-08-23 Thread Mark Kettenis
> Date: Wed, 17 Aug 2022 11:30:04 +0200
> From: Ingo Schwarze 
> 
> QUESTION TO PORTERS:
> Would providing <monetary.h>, strfmon(3), and strfmon_l(3)
> in our libc make porters' lives easier, or are these interfaces
> used so rarely in real-world programs that it does not matter?

Note that these interfaces have been made part of POSIX proper some time ago:

  https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/monetary.h.html

Given that we aim to be POSIX-compatible, we probably should add a
(minimal) implementation of these functions.

> Hi John,
> 
> John Zaitseff wrote on Wed, Aug 17, 2022 at 12:29:20PM +1000:
> 
> > Apologies in advance if I am sending this to the wrong list...
> 
> Since strfmon(3) is a useless API and this discussion is exclusively
> about compatibility, asking on ports@ would have been better,
> more accurately targeting the intended audience, but bugs@ is not
> outright wrong either because OpenBSD aims to support POSIX unless
> there are specific reasons to not support something, and <monetary.h>,
> strfmon(3), and strfmon_l(3) are specified by POSIX.
> 
> However, strfmon(3) is even more ill-designed than other POSIX
> locale-related functions:
> 
>  - strfmon(3) is designed to interpret the same floating-point
>number differently depending on the user's locale(1).
>But whether a user owns 42.00 Turkish Lira or 42.00 Pound Sterling
>is *not* a matter of personally preferred output conventions.
>Consequently, i can hardly imagine any situation where using
>    strfmon*(3) might make any sense.  Using these functions will
>usually misrepresent the currency owned by the user, causing
>wrong output.
> 
>  - Arguably, you can use the "!" flag to suppress the misfeature
>of having the currency symbol depend on the user's locale,
>but then, strfmon(3) mostly duplicates functionality already
>    provided by printf(3) with a very small number of gratuitous
>variations, so i see no conceivable motivation for using the
>interface with "!" either.
> 
> For those reasons, i think using <monetary.h> is a terrible idea in the
> first place and we should not add it to our libc if we can avoid it.
> 
> That said, even if an API is abominable (like this one),
> support can sometimes be considered *if* it is used by enough
> ports(7) that its absence causes pain for OpenBSD porters.
> The only condition in such cases is that a dummy version can be
> provided that poses no security risks.  I do not doubt that
> would be possible for strfmon(3) and strfmon_l(3).
> 
> > A few years ago, Frederic Cambus packaged Star Traders, my simple
> > game of interstellar trading, for OpenBSD ("trader").  In doing so,
> > he bundled FreeBSD's version of strfmon() as that function is
> > required by my program.
> 
> My personal recommendation would be to stop using the bad function
> in your program.
> 
> > Longer term, however, could OpenBSD include <monetary.h>, strfmon()
> > and strfmon_l(), possibly by copying these from the latest version
> > of FreeBSD?
> 
> Well, as usual, the FreeBSD version of locale functions is seriously
> bloated, so if porters tell me that lack of the function causes
> pain for them, i would radically strip it down before commit.
> 
> Yours,
>   Ingo
> 
> 



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-14 Thread Mark Kettenis
> From: kiltz 
> Date: Tue, 14 Jun 2022 13:27:03 +0200
> 
> Dear Mark,
> first of all, thanks again for your efforts!
> After testing the new snapshot, in total 64 cores are recognized (see  
> attached screenshot) before the kernel panics again.
> Since the Ampere Altra Q80-33 processor has 80 cores, I suspect
> there are only two possibilities: either the kernel hits a brick wall
> of sorts after initializing 64 cores, or the limit was somehow set
> to 64 cores?
> Best wishes,

Hi Stefan,

I don't understand what's happening here.  To help us out can you:

1. Boot the machine with a single-processor kernel by typing "bsd.sp"
   at the boot> prompt?

2. Send me the output of the "eeprom -p" command.

3. Send me the files in the /var/db/acpi directory.

4. Send the dmesg output.

Feel free to send me (kette...@openbsd.org) and Patrick
(patr...@openbsd.org) that output in private if you have concerns
sending it to a public mailing list.

Thanks,

Mark



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis
> From: kiltz 
> Date: Mon, 13 Jun 2022 18:12:27 +0200
> 
> Dear Mark,
> first of all, thank you very much for your explanations, the diff
> and, indeed, the ultra swift reply!
> That helps us a lot already.
> A snapshot with a higher value of max CPUs out of the box, of course,  
> would be the proverbial icing on the cake.
> Probably a strange question, but I'll hazard it anyway: should we
> monitor the /pub/OpenBSD/snapshots directory, or is there a quicker
> way to find out what your fellow developers think?
> Again, many thanks for your help and best wishes,

Hi Stefan,

Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
will have it.  You can easily tell, since all 80 CPUs should attach
with that diff.

Cheers,

Mark

> - 
> Dr.-Ing. Stefan Kiltz
> 
> Otto-von-Guericke University of Magdeburg
> ITI Research Group on
> Multimedia and Security
> Universitaetsplatz 2
> 39106 Magdeburg
> Germany
> 
> Tel: +49-391-67-52838
> Fax: +49-391-67-18110
> 
> eMail: ki...@iti.cs.uni-magdeburg.de
> 
> 
> 
> 
> 
> On 13 Jun 2022, at 17:20, Mark Kettenis wrote:
> 
> >> From: kiltz 
> >> Date: Mon, 13 Jun 2022 14:46:39 +0200
> >
> > Hi Stefan,
> >
> >> Dear kind people at OpenBSD.org,
> >> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30
> >> with the following specifications:
> >>
> >>Ampere Altra Q80-33 processor  (80 Cores, 3,3 GHz)
> >>512 GB RAM (3200 MHz ECC-reg.)
> >>2 x 480 GB SSD SATA 6 Gb/s 2,5''
> >>Dual-Port 1 GbE (RJ-45)
> >>IPMI 2.0 Baseboard Management Controller (BMC)
> >> 1 x PCIe4.0 x16 (FHHL)
> >>1 x PCIe3.0 x16 OCP2.0 (occupied)
> >>1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> >>
> >> We tried both:
> >> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> >> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> >>
> >> The repeatable result is a working install in single CPU/Core
> >> installation mode, cpu panic after first reboot with mp kernel. We use
> >> the serial to LAN console provided by the IPMI/BMC card.
> >> Attached you will find screenshots from:
> >>
> >> - the last 49 columns of the reboot into mp kernel
> >> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13
> >> 13-51-00.png),
> >> - the ddb trace output (Screenshot ddb_trace_2022-06-13  
> >> 14-02-11.png),
> >> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> >> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13
> >> 14-04-28.png)
> >> - the ddb show registers output (Screenshot ddb_show_registers_at
> >> 2022-06-13 14-06-34.png)
> >>
> >> Due to the nature of the early boot panic, the kernel output is not
> >> accessible to us.
> >>
> >> Interestingly, FreeBSD only supports them in their current release,
> >> the stable fails with a similar panic. They seem to have found a fix
> >> of sorts. But we very much prefer OpenBSD for the firewalling role of
> >> aforementioned system.
> >>
> >> Of course we support your effort so if you need more info from us
> >> regarding the circumstances, we will happily try and supply the
> >> required information.
> >
> > The immediate problem is that OpenBSD currently supports a maximum of
> > 32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
> > 128.  You could try building a GENERIC.MP kernel with this diff after
> > booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
> > my fellow developers think about bumping MAXCPUS.  Depending on the
> > outcome of that a snapshot with this change may be available in a few
> > days.
> >
> > I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
> > very well but I guess there is only one way to find out...
> >
> > Cheers,
> >
> > Mark
> >
> >
> > Index: arch/arm64/include/cpu.h
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 cpu.h
> > --- arch/arm64/include/cpu.h	23 Mar 2022 23:36:35 -0000	1.25
> > +++ arch/arm64/include/cpu.h	13 Jun 2022 15:09:32 -0000
> > @@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
> >  #define CPU_INFO_FOREACH(cii, ci)	for (cii = 0, ci = cpu_info_list; \
> >  					ci != NULL; ci = ci->ci_next)
> >  #define CPU_INFO_UNIT(ci)	((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
> > -#define MAXCPUS	32
> > +#define MAXCPUS	128
> >
> > extern struct cpu_info *cpu_info[MAXCPUS];
> >
> 
> 



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis
> From: kiltz 
> Date: Mon, 13 Jun 2022 14:46:39 +0200

Hi Stefan,

> Dear kind people at OpenBSD.org,
> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30  
> with the following specifications:
> 
>   Ampere Altra Q80-33 processor  (80 Cores, 3,3 GHz)
>   512 GB RAM (3200 MHz ECC-reg.)
>   2 x 480 GB SSD SATA 6 Gb/s 2,5''
>   Dual-Port 1 GbE (RJ-45)
>   IPMI 2.0 Baseboard Management Controller (BMC)
>  1 x PCIe4.0 x16 (FHHL)
>   1 x PCIe3.0 x16 OCP2.0 (occupied)
>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> 
> We tried both:
> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> 
> The repeatable result is a working install in single CPU/Core  
> installation mode, cpu panic after first reboot with mp kernel. We use  
> the serial to LAN console provided by the IPMI/BMC card.
> Attached you will find screenshots from:
> 
> - the last 49 columns of the reboot into mp kernel  
> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13  
> 13-51-00.png),
> - the ddb trace output (Screenshot ddb_trace_2022-06-13 14-02-11.png),
> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13  
> 14-04-28.png)
> - the ddb show registers output (Screenshot ddb_show_registers_at  
> 2022-06-13 14-06-34.png)
> 
> Due to the nature of the early boot panic, the kernel output is not  
> accessible to us.
> 
> Interestingly, FreeBSD only supports them in their current release,  
> the stable fails with a similar panic. They seem to have found a fix  
> of sorts. But we very much prefer OpenBSD for the firewalling role of  
> aforementioned system.
> 
> Of course we support your effort so if you need more info from us  
> regarding the circumstances, we will happily try and supply the  
> required information.

The immediate problem is that OpenBSD currently supports a maximum of
32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
128.  You could try building a GENERIC.MP kernel with this diff after
booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
my fellow developers think about bumping MAXCPUS.  Depending on the
outcome of that a snapshot with this change may be available in a few
days.

I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
very well but I guess there is only one way to find out...

Cheers,

Mark


Index: arch/arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.25
diff -u -p -r1.25 cpu.h
--- arch/arm64/include/cpu.h	23 Mar 2022 23:36:35 -0000	1.25
+++ arch/arm64/include/cpu.h	13 Jun 2022 15:09:32 -0000
@@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
 #define CPU_INFO_FOREACH(cii, ci)	for (cii = 0, ci = cpu_info_list; \
 					ci != NULL; ci = ci->ci_next)
 #define CPU_INFO_UNIT(ci)	((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
-#define MAXCPUS	32
+#define MAXCPUS	128
 
 extern struct cpu_info *cpu_info[MAXCPUS];
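
As a rough illustration of why the MAXCPUS value matters, the sketch
below models a fixed-size per-CPU table like cpu_info[MAXCPUS].  It is
not the kernel's attach code: struct cpu_slot, attach_cpus() and the
MAXCPUS_OLD/MAXCPUS_NEW names are made up for this example, and the
real kernel may react differently when the table is full.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCPUS_OLD	32	/* limit before the diff */
#define MAXCPUS_NEW	128	/* limit proposed by the diff */

/* Hypothetical stand-in for struct cpu_info. */
struct cpu_slot {
	int unit;
};

/*
 * Fill table[0..limit) with up to `present' CPUs and return how many
 * actually fit.  CPUs beyond the limit are simply not attached here.
 */
size_t
attach_cpus(struct cpu_slot *table, size_t limit, size_t present)
{
	size_t n;

	assert(table != NULL);
	for (n = 0; n < present && n < limit; n++)
		table[n].unit = (int)n;
	return n;
}
```

For an 80-core machine, attach_cpus(table, MAXCPUS_OLD, 80) returns 32,
while attach_cpus(table, MAXCPUS_NEW, 80) returns 80.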
 



Re: sparc64: Open Firmware stack corruption / "no space for symbol table"

2022-05-25 Thread Mark Kettenis
> Date: Mon, 16 May 2022 00:13:12 +0200
> From: Harold Gutch 
> 
> Hi,
> 
> over the last months there have been multiple reports of sparc64 not
> booting with OF_map_phys() calls failing, see, e.g., the thread
> https://marc.info/?t=16437199371=1=2 .
> 
> In early December 2021, writing to disk from Open Firmware was disabled,
> but at least in Qemu I still get this error when booting the most
> recent miniroot snapshot,
> https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/miniroot71.img .
> 
> I believe the actual reason is a bug in the OF_map_phys() function of
> ofwboot (in Locore.c) which does not correspond to the Open Firmware
> documentation.  As a result, the Open Firmware stack is garbled, and
> some images just happen to have the right values where OF_map_phys()
> reads something it believes to be a return value and end up
> successfully booting nonetheless.
> 
> The attached patch fixes that (and makes the corresponding change to the
> kernel call in ofw_machdep.c).  After rebuilding ofwboot with it and
> injecting that in miniroot71.img, it successfully boots in Qemu.
> 
> I don't have sparc64 hardware readily available and was thus unable to
> verify this on hardware.  The bug was inherited from NetBSD where this
> started showing up with a compiler change roughly 1 year ago, and
> there the patch helped for both Qemu and hardware, see also
> https://gnats.netbsd.org/56829 .
> 
> 
> cheers,
>   Harold

Hi Harold,

Thanks for the diff.  I've made some further adjustments, in
particular fixing the return type of the OF_map_phys() function.  The
diff is currently in snapshots and will probably be committed soon.

Thanks again,

Mark


Index: arch/sparc64/sparc64/ofw_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/ofw_machdep.c,v
retrieving revision 1.34
diff -u -p -r1.34 ofw_machdep.c
--- arch/sparc64/sparc64/ofw_machdep.c	28 Aug 2018 00:00:42 -0000	1.34
+++ arch/sparc64/sparc64/ofw_machdep.c	24 May 2022 20:42:41 -0000
@@ -350,8 +350,6 @@ prom_map_phys(paddr, size, vaddr, mode)
 		cell_t vaddr;
 		cell_t phys_hi;
 		cell_t phys_lo;
-		cell_t status;
-		cell_t retaddr;
 	} args;
 
 	if (mmuh == -1 && ((mmuh = get_mmu_handle()) == -1)) {
@@ -360,7 +358,7 @@ prom_map_phys(paddr, size, vaddr, mode)
 	}
 	args.name = ADR2CELL("call-method");
 	args.nargs = 7;
-	args.nreturns = 1;
+	args.nreturns = 0;
 	args.method = ADR2CELL("map");
 	args.ihandle = HDL2CELL(mmuh);
 	args.mode = mode;
@@ -368,12 +366,7 @@ prom_map_phys(paddr, size, vaddr, mode)
 	args.vaddr = ADR2CELL(vaddr);
 	args.phys_hi = HDQ2CELL_HI(paddr);
 	args.phys_lo = HDQ2CELL_LO(paddr);
-
-	if (openfirmware(&args) == -1)
-		return -1;
-	if (args.status)
-		return -1;
-	return (int)args.retaddr;
+	return openfirmware(&args);
 }
 
 
Index: arch/sparc64/stand/ofwboot/Locore.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/stand/ofwboot/Locore.c,v
retrieving revision 1.16
diff -u -p -r1.16 Locore.c
--- arch/sparc64/stand/ofwboot/Locore.c	31 Dec 2018 11:44:57 -0000	1.16
+++ arch/sparc64/stand/ofwboot/Locore.c	24 May 2022 20:42:41 -0000
@@ -46,7 +46,7 @@
 static vaddr_t OF_claim_virt(vaddr_t vaddr, int len);
 static vaddr_t OF_alloc_virt(int len, int align);
 static int OF_free_virt(vaddr_t vaddr, int len);
-static vaddr_t OF_map_phys(paddr_t paddr, off_t size, vaddr_t vaddr, int mode);
+static int OF_map_phys(paddr_t paddr, off_t size, vaddr_t vaddr, int mode);
 static paddr_t OF_alloc_phys(int len, int align);
 static int OF_free_phys(paddr_t paddr, int len);
 
@@ -438,7 +438,7 @@ OF_free_virt(vaddr_t vaddr, int len)
  *
  * Only works while the prom is actively mapping us.
  */
-static vaddr_t
+static int
 OF_map_phys(paddr_t paddr, off_t size, vaddr_t vaddr, int mode)
 {
 	struct {
@@ -452,13 +452,11 @@ OF_map_phys(paddr_t paddr, off_t size, v
 		cell_t vaddr;
 		cell_t paddr_hi;
 		cell_t paddr_lo;
-		cell_t status;
-		cell_t retaddr;
 	} args;
 
 	args.name = ADR2CELL("call-method");
 	args.nargs = 7;
-	args.nreturns = 1;
+	args.nreturns = 0;
 	args.method = ADR2CELL("map");
 	args.ihandle = HDL2CELL(mmuh);
 	args.mode = mode;
@@ -466,12 +464,7 @@ OF_map_phys(paddr_t paddr, off_t size, v
 	args.vaddr = ADR2CELL(vaddr);
 	args.paddr_hi = HDQ2CELL_HI(paddr);
 	args.paddr_lo = HDQ2CELL_LO(paddr);
-
-	if (openfirmware(&args) == -1)
-		return -1;
-	if (args.status)
-		return -1;
-	return (vaddr_t)args.retaddr;
+	return openfirmware(&args);
 }
 }
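
The shape of this change can be checked with a little arithmetic.  An
Open Firmware client-interface call passes an array of cells: service
name, argument count, return count, then the argument cells and the
return cells.  The sketch below is a model under stated assumptions,
not the ofwboot source: cell_t is taken to be 64 bits, the field lists
are reconstructed from the diff context (a size field is assumed to sit
between mode and vaddr), and cells_declared()/cells_in() are
hypothetical helpers.  It only shows that the old structure carried two
return fields while declaring nreturns = 1, whereas the new one carries
none and declares 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cell_t;	/* assumption: 64-bit firmware cells */

/* Argument array before the diff: two trailing return fields. */
struct map_args_old {
	cell_t name, nargs, nreturns;
	cell_t method, ihandle, mode, size, vaddr, paddr_hi, paddr_lo;
	cell_t status, retaddr;
};

/* Argument array after the diff: no return fields at all. */
struct map_args_new {
	cell_t name, nargs, nreturns;
	cell_t method, ihandle, mode, size, vaddr, paddr_hi, paddr_lo;
};

/* Cells the header claims: 3 header cells + arguments + returns. */
size_t
cells_declared(size_t nargs, size_t nreturns)
{
	return 3 + nargs + nreturns;
}

/* Cells a structure of the given size actually occupies. */
size_t
cells_in(size_t bytes)
{
	assert(bytes % sizeof(cell_t) == 0);
	return bytes / sizeof(cell_t);
}
```

Under these assumptions cells_in(sizeof(struct map_args_old)) is 12
while cells_declared(7, 1) is 11; after the change both sides come to
10.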
 
 



Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-05-21 Thread Mark Kettenis
> Date: Sat, 21 May 2022 13:13:19 -0400
> From: Johan Huldtgren 
> 
> On 2022/05/21 12:43, Mark Kettenis wrote:
> >> Date: Sat, 21 May 2022 12:36:03 -0400
> >> From: Johan Huldtgren 
> >>
> >> hello,
> >>
> >> On 2022/05/21 12:08, Mark Kettenis wrote:
> >>>> Date: Sat, 21 May 2022 10:31:37 -0400
> >>>> From: Johan Huldtgren 
> >>>>
> >>>> hello,
> >>>>
> >>>> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> >>>> this host boot. I wrote much of this e-mail while going through it,
> >>>> so while we know now what the issue is I'm leaving my responses in
> >>>> case it sheds light on anything.
> >>>
> >>> So it seems your machine incorrectly advertises a serial port that
> >>> doesn't actually exist:
> >>>
> >>>> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> >>>> com1: probed fifo depth: 0 bytes
> >>
> >> I think you're right, Crystal asked about it in a previous
> >> mail which I didn't get a chance to respond to, but I do not
> >> see com1 being reported in the 7.0 dmesg from last night nor
> >> in any older dmesgs I've been able to dig up and I don't
> >> believe anything with this hardware has changed as long as I've
> >> had it.
> >>
> >>> This may be a bug in our ACPI code.  Can you send the contents of
> >>> /var/db/acpi on your machine?
> >>
> >> [root@www ~]# ls -al /var/db/acpi/
> >> total 164
> >> drwxr-xr-x   2 root  wheel    512 May 20 21:26 ./
> >> drwxr-xr-x  15 root  wheel   1024 May 21 06:10 ../
> >> -rw-r--r--   1 root  wheel    146 May 21 06:55 APIC.3
> >> -rw-r--r--   1 root  wheel    120 May 21 06:55 DMAR.12
> >> -rw-r--r--   1 root  wheel  44470 May 21 06:55 DSDT.2
> >> -rw-r--r--   1 root  wheel    244 May 21 06:55 FACP.1
> >> -rw-r--r--   1 root  wheel     68 May 21 06:55 FPDT.4
> >> -rw-r--r--   1 root  wheel     56 May 21 06:55 HPET.7
> >> -rw-r--r--   1 root  wheel     60 May 21 06:55 MCFG.5
> >> -rw-r--r--   1 root  wheel    190 May 21 06:55 PRAD.6
> >> -rw-r--r--   1 root  wheel     80 Sep 17  2019 RSDT.0
> >> -rw-r--r--   1 root  wheel     64 May 21 06:55 SPMI.9
> >> -rw-r--r--   1 root  wheel   2468 May 21 06:55 SSDT.10
> >> -rw-r--r--   1 root  wheel   2696 May 21 06:55 SSDT.11
> >> -rw-r--r--   1 root  wheel    877 May 21 06:55 SSDT.8
> >> -rw-r--r--   1 root  wheel    124 May 21 06:55 XSDT.0
> >> -rw-r--r--   1 root  wheel   2520 May 21 06:55 headers
> >>
> >> Do you need the files? I can tar that directory up and
> >> make it available.
> > 
> > Right we need all of those.
> 
> http://www.huldtgren.com/panics/20220520/acpi.tgz

It looks as if the ACPI AML is properly checking that the UART is
enabled in the NCT6776F SuperIO chip.  Can you build a kernel with the
diff below and mail the dmesg from that kernel?


Index: dev/acpi/acpi.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.413
diff -u -p -r1.413 acpi.c
--- dev/acpi/acpi.c	17 Feb 2022 00:21:40 -0000	1.413
+++ dev/acpi/acpi.c	21 May 2022 18:20:20 -0000
@@ -3095,6 +3095,7 @@ acpi_foundhid(struct aml_node *node, voi
 		return (0);
 
 	sta = acpi_getsta(sc, node->parent);
+	printf("_STA: 0x%02llx\n", sta);
 	if ((sta & (STA_PRESENT | STA_ENABLED)) != (STA_PRESENT | STA_ENABLED))
 		return (0);
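
For readers unfamiliar with _STA, the test being instrumented here is a
plain bit-mask check.  The sketch below is a standalone illustration,
assuming the usual ACPI meaning of the two low bits (bit 0: present,
bit 1: enabled); sta_usable() is a hypothetical helper, not a function
in acpi.c.

```c
#include <assert.h>
#include <stdint.h>

#define STA_PRESENT	(1L << 0)	/* device is present */
#define STA_ENABLED	(1L << 1)	/* device is enabled and decoding */

/* Both bits together make up the mask the check compares against. */
static_assert((STA_PRESENT | STA_ENABLED) == 0x3, "unexpected mask");

/* Return 1 only if both the present and the enabled bit are set. */
int
sta_usable(int64_t sta)
{
	const int64_t want = STA_PRESENT | STA_ENABLED;

	return (sta & want) == want;
}
```

A value of 0x0f passes the check, while 0x0d (present but not enabled)
and 0x00 do not, so the printed _STA value shows directly whether the
firmware claims the UART is enabled.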
 



Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-05-21 Thread Mark Kettenis
> Date: Sat, 21 May 2022 12:36:03 -0400
> From: Johan Huldtgren 
> 
> hello,
> 
> On 2022/05/21 12:08, Mark Kettenis wrote:
> >> Date: Sat, 21 May 2022 10:31:37 -0400
> >> From: Johan Huldtgren 
> >>
> >> hello,
> >>
> >> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> >> this host boot. I wrote much of this e-mail while going through it,
> >> so while we know now what the issue is I'm leaving my responses in
> >> case it sheds light on anything.
> > 
> > So it seems your machine incorrectly advertises a serial port that
> > doesn't actually exist:
> > 
> >> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> >> com1: probed fifo depth: 0 bytes
> 
> I think you're right, Crystal asked about it in a previous
> mail which I didn't get a chance to respond to, but I do not
> see com1 being reported in the 7.0 dmesg from last night nor
> in any older dmesgs I've been able to dig up and I don't
> believe anything with this hardware has changed as long as I've
> had it.
> 
> > This may be a bug in our ACPI code.  Can you send the contents of
> > /var/db/acpi on your machine?
> 
> [root@www ~]# ls -al /var/db/acpi/
> total 164
> drwxr-xr-x   2 root  wheel    512 May 20 21:26 ./
> drwxr-xr-x  15 root  wheel   1024 May 21 06:10 ../
> -rw-r--r--   1 root  wheel    146 May 21 06:55 APIC.3
> -rw-r--r--   1 root  wheel    120 May 21 06:55 DMAR.12
> -rw-r--r--   1 root  wheel  44470 May 21 06:55 DSDT.2
> -rw-r--r--   1 root  wheel    244 May 21 06:55 FACP.1
> -rw-r--r--   1 root  wheel     68 May 21 06:55 FPDT.4
> -rw-r--r--   1 root  wheel     56 May 21 06:55 HPET.7
> -rw-r--r--   1 root  wheel     60 May 21 06:55 MCFG.5
> -rw-r--r--   1 root  wheel    190 May 21 06:55 PRAD.6
> -rw-r--r--   1 root  wheel     80 Sep 17  2019 RSDT.0
> -rw-r--r--   1 root  wheel     64 May 21 06:55 SPMI.9
> -rw-r--r--   1 root  wheel   2468 May 21 06:55 SSDT.10
> -rw-r--r--   1 root  wheel   2696 May 21 06:55 SSDT.11
> -rw-r--r--   1 root  wheel    877 May 21 06:55 SSDT.8
> -rw-r--r--   1 root  wheel    124 May 21 06:55 XSDT.0
> -rw-r--r--   1 root  wheel   2520 May 21 06:55 headers
> 
> Do you need the files? I can tar that directory up and
> make it available.

Right we need all of those.



Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-05-21 Thread Mark Kettenis
> Date: Sat, 21 May 2022 10:31:37 -0400
> From: Johan Huldtgren 
> 
> hello,
> 
> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> this host boot. I wrote much of this e-mail while going through it,
> so while we know now what the issue is I'm leaving my responses in
> case it sheds light on anything.

So it seems your machine incorrectly advertises a serial port that
doesn't actually exist:

> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> com1: probed fifo depth: 0 bytes

This may be a bug in our ACPI code.  Can you send the contents of
/var/db/acpi on your machine?



Re: uhid spam: uhidev_intr: bad repid 33

2022-05-09 Thread Mark Kettenis
> Date: Mon, 9 May 2022 17:44:29 +0100
> From: Stuart Henderson 
> 
> I have a USB combi keyboard/trackpad thing which is triggering "bad
> repid 33" frequently while attached (between a couple of times a minute,
> and once every few minutes). It does work but it's annoying.
> 
> Presumably this is because it has non-contiguous report IDs?

That shouldn't be a problem.

> Anyone have an idea how to handle it?

No.  But showing dmesg output might help.

> Bus 000 Device 002: ID 045e:0800 Microsoft Corp. 
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               2.00
>   bDeviceClass            0 (Defined at Interface level)
>   bDeviceSubClass         0
>   bDeviceProtocol         0
>   bMaxPacketSize0        64
>   idVendor           0x045e Microsoft Corp.
>   idProduct          0x0800
>   bcdDevice            9.44
>   iManufacturer           1 Microsoft
>   iProduct                2 Microsoft? Nano Transceiver v2.0
>   iSerial                 0
>   bNumConfigurations      1
>   Configuration Descriptor:
>     bLength                 9
>     bDescriptorType         2
>     wTotalLength           84
>     bNumInterfaces          3
>     bConfigurationValue     1
>     iConfiguration          0
>     bmAttributes         0xa0
>       (Bus Powered)
>       Remote Wakeup
>     MaxPower              100mA
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        0
>       bAlternateSetting       0
>       bNumEndpoints           1
>       bInterfaceClass         3 Human Interface Device
>       bInterfaceSubClass      1 Boot Interface Subclass
>       bInterfaceProtocol      1 Keyboard
>       iInterface              0
>         HID Device Descriptor:
>           bLength                 9
>           bDescriptorType        33
>           bcdHID               1.11
>           bCountryCode            0 Not supported
>           bNumDescriptors         1
>           bDescriptorType        34 Report
>           wDescriptorLength      57
>   Report Descriptor: (length is 57)
> Item(Global): Usage Page, data= [ 0x01 ] 1
> Generic Desktop Controls
> Item(Local ): Usage, data= [ 0x06 ] 6
> Keyboard
> Item(Main  ): Collection, data= [ 0x01 ] 1
> Application
> Item(Global): Usage Page, data= [ 0x08 ] 8
> LEDs
> Item(Local ): Usage Minimum, data= [ 0x01 ] 1
> NumLock
> Item(Local ): Usage Maximum, data= [ 0x03 ] 3
> Scroll Lock
> Item(Global): Logical Minimum, data= [ 0x00 ] 0
> Item(Global): Logical Maximum, data= [ 0x01 ] 1
> Item(Global): Report Size, data= [ 0x01 ] 1
> Item(Global): Report Count, data= [ 0x03 ] 3
> Item(Main  ): Output, data= [ 0x02 ] 2
> Data Variable Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Report Count, data= [ 0x05 ] 5
> Item(Main  ): Output, data= [ 0x01 ] 1
> Constant Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Usage Page, data= [ 0x07 ] 7
> Keyboard
> Item(Local ): Usage Minimum, data= [ 0xe0 0x00 ] 224
> Control Left
> Item(Local ): Usage Maximum, data= [ 0xe7 0x00 ] 231
> GUI Right
> Item(Global): Report Count, data= [ 0x08 ] 8
> Item(Main  ): Input, data= [ 0x02 ] 2
> Data Variable Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Global): Report Size, data= [ 0x08 ] 8
> Item(Global): Report Count, data= [ 0x01 ] 1
> Item(Main  ): Input, data= [ 0x01 ] 1
> Constant Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Local ): Usage Minimum, data= [ 0x00 ] 0
> No Event
> Item(Local ): Usage Maximum, data= [ 0x91 0x00 ] 145
> LANG 2 (Hanja Conversion, Korea)
> Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
> Item(Global): Report Count, data= [ 0x06 ] 6
> Item(Main  ): Input, data= [ 0x00 ] 0
> Data Array Absolute No_Wrap Linear
> Preferred_State No_Null_Position Non_Volatile 
> Bitfield
> Item(Main  ): End Collection, data=none
>   Endpoint Descriptor:
>  

Re: macppc panic: vref used where vget required

2022-05-04 Thread Mark Kettenis
> Date: Wed, 4 May 2022 17:58:14 +0200
> From: Martin Pieuchot 
> 
> On 04/05/22(Wed) 09:16, Sebastien Marie wrote:
> > [...] 
> > we don't have any vclean label ("vclean (inactive)" or "vclean (active)"),
> > so vclean() was not called in this timeframe.
> 
> So we are narrowing down the issue:
> 
> 1. A file is opened
> 2. Then mmaped
> 3. Some of its pages are swapped to disk

Hmm, why does this happen?  Is this because the mmap(2) was done using
MAP_PRIVATE?  But then what's the point of setting
UVM_VNODE_CANPERSIST?

> 4. The process dies, closing the file
> 5. The reaper calls uvn_detach() on the vnode which has UVM_VNODE_CANPERSIST
>   . This releases the last reference to the vnode without syncing the pages
>   -> the vnode ends up on the free list
> 6. The page daemon tries to sync the pages, grabbing a reference on the vnode
>   which has already been recycled.
> 
> I don't understand the mechanism around UVM_VNODE_CANPERSIST.  I looked
> for missing uvm_vnp_uncache() and found the following two.  I doubt
> those are the ones triggering the bug because they are in NFS & softdep.
> 
> So my question is should UVM_VNODE_CANPERSIST be cleared at some point
> in this scenario?  If so, when?
> 
> What is the interaction between this flag and mmap pages which are on
> swap?  In other words, is it safe to call vrele(9) in uvn_detach() if
> uvn_flush() hasn't been called with PGO_FREE|PGO_ALLPAGES?  If yes, why?
> 
> What is this flag supposed to say?  Why is it always cleared before
> VOP_REMOVE() & VOP_RENAME()?
> 
> Index: nfs/nfs_serv.c
> ===
> RCS file: /cvs/src/sys/nfs/nfs_serv.c,v
> retrieving revision 1.120
> diff -u -p -r1.120 nfs_serv.c
> --- nfs/nfs_serv.c11 Mar 2021 13:31:35 -  1.120
> +++ nfs/nfs_serv.c4 May 2022 15:29:06 -
> @@ -1488,6 +1488,9 @@ nfsrv_rename(struct nfsrv_descript *nfsd
>   error = -1;
>  out:
>   if (!error) {
> + if (tvp) {
> + (void)uvm_vnp_uncache(tvp);
> + }
>   error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, &fromnd.ni_cnd,
>  tond.ni_dvp, tond.ni_vp, &tond.ni_cnd);
>   } else {
> Index: ufs/ffs/ffs_inode.c
> ===
> RCS file: /cvs/src/sys/ufs/ffs/ffs_inode.c,v
> retrieving revision 1.81
> diff -u -p -r1.81 ffs_inode.c
> --- ufs/ffs/ffs_inode.c   12 Dec 2021 09:14:59 -  1.81
> +++ ufs/ffs/ffs_inode.c   4 May 2022 15:32:15 -
> @@ -172,11 +172,12 @@ ffs_truncate(struct inode *oip, off_t le
>   if (length > fs->fs_maxfilesize)
>   return (EFBIG);
>  
> - uvm_vnp_setsize(ovp, length);
>   oip->i_ci.ci_lasta = oip->i_ci.ci_clen 
>   = oip->i_ci.ci_cstart = oip->i_ci.ci_lastw = 0;
>  
>   if (DOINGSOFTDEP(ovp)) {
> + uvm_vnp_setsize(ovp, length);
> + (void) uvm_vnp_uncache(ovp);
>   if (length > 0 || softdep_slowdown(ovp)) {
>   /*
>* If a file is only partially truncated, then
> 
> 



Re: bse: null dereference in genet_rxintr()

2022-05-02 Thread Mark Kettenis
> Date: Mon, 2 May 2022 07:15:51 +0200
> From: Anton Lindqvist 
> 
> On Mon, May 02, 2022 at 12:32:24AM +0200, Mark Kettenis wrote:
> > > Date: Sun, 1 May 2022 20:13:57 +0200
> > > From: Anton Lindqvist 
> > > 
> > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote:
> > > > > Date: Tue, 19 Apr 2022 07:32:36 +0200
> > > > > From: Anton Lindqvist 
> > > > > 
> > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > > > >Category:  arm64
> > > > > > >Environment:
> > > > > > System  : OpenBSD 7.1
> > > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 MDT 2022
> > > > > > 
> > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > > > 
> > > > > > Architecture: OpenBSD.arm64
> > > > > > Machine : arm64
> > > > > > >Description:
> > > > > > 
> > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries to
> > > > > > start the bse network interface:
> > > > > > 
> > > > > > panic: attempt to access user address 0x38 from EL1
> > > > > > Stopped at  panic+0x160:cmp w21, #0x0
> > > > > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > > > > > * 0  0  0 0x1  0x2000K swapper
> > > > > > db_enter() at panic+0x15c
> > > > > > panic() at do_el1h_sync+0x1f8
> > > > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > > > genet_rxintr() at genet_intr+0x74
> > > > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > > > ampintc_splx() at genet_ioctl+0x158
> > > > > > genet_ioctl() at ifioctl+0x308
> > > > > > ifioctl() at nfs_boot_init+0xc0
> > > > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > > > nfs_mountroot() at main+0x464
> > > > > > main() at virtdone+0x70
> > > > > > 
> > > > > > >Fix:
> > > > > > 
> > > > > > The mbuf associated with the current index is NULL. I noticed that the
> > > > > > NetBSD driver allocates mbufs for each ring entry in genet_setup_dma().
> > > > > > But even with that in place the same panic still occurs. Enabling
> > > > > > GENET_DEBUG shows that the total is quite high:
> > > > > > 
> > > > > > RX pidx=ca07 total=51463
> > > > > >
> > > > > > 
> > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference
> > > > > > will still happen after doing more than 256 iterations in
> > > > > > genet_rxintr() since we will start accessing mbufs cleared by the
> > > > > > previous iteration.
> > > > > > 
> > > > > > Here's a diff with what I've tried so far. The KASSERT() is just
> > > > > > capturing the problem at an earlier stage. Any pointers would be much
> > > > > > appreciated.
> > > > > 
> > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > > > ignored by the hardware. That's why I ended up with a large amount of
> > > > > mbufs available in genet_rxintr() since the software and hardware
> > > > > state was out of sync. Honoring any existing value makes the problem
> > > > > go away and matches what u-boot[1] does as well.
> > > > 
> > > > Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
> > > > that writing 0 doesn't work.  But even that works for me.  So I'

Re: bse: null dereference in genet_rxintr()

2022-05-01 Thread Mark Kettenis
> Date: Sun, 1 May 2022 20:13:57 +0200
> From: Anton Lindqvist 
> 
> On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 19 Apr 2022 07:32:36 +0200
> > > From: Anton Lindqvist 
> > > 
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > >Category:  arm64
> > > > >Environment:
> > > > System  : OpenBSD 7.1
> > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 MDT 2022
> > > > 
> > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.arm64
> > > > Machine : arm64
> > > > >Description:
> > > > 
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries to
> > > > start the bse network interface:
> > > > 
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at  panic+0x160:cmp w21, #0x0
> > > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > > > * 0  0  0 0x1  0x2000K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > > 
> > > > >Fix:
> > > > 
> > > > The mbuf associated with the current index is NULL. I noticed that the
> > > > NetBSD driver allocates mbufs for each ring entry in genet_setup_dma().
> > > > But even with that in place the same panic still occurs. Enabling
> > > > GENET_DEBUG shows that the total is quite high:
> > > > 
> > > > RX pidx=ca07 total=51463
> > > >
> > > > 
> > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference
> > > > will still happen after doing more than 256 iterations in
> > > > genet_rxintr() since we will start accessing mbufs cleared by the
> > > > previous iteration.
> > > > 
> > > > Here's a diff with what I've tried so far. The KASSERT() is just
> > > > capturing the problem at an earlier stage. Any pointers would be much
> > > > appreciated.
> > > 
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> > 
> > Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
> > that writing 0 doesn't work.  But even that works for me.  So I'm
> > puzzled.
> > 
> > > The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> > 
> > Well, what the code does is setting the "prod" index ahead of the
> > "cons" index to simulate a full ring.  And then when we (partially)
> > fill the ring we increase "cons" to make descriptors available to the
> > hardware.  This seems to work on my hardware and I've never seen the
>

Re: bse: null dereference in genet_rxintr()

2022-04-30 Thread Mark Kettenis
> Date: Tue, 19 Apr 2022 07:32:36 +0200
> From: Anton Lindqvist 
> 
> On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > >Synopsis:  bse: null dereference in genet_rxintr()
> > >Category:  arm64
> > >Environment:
> > System  : OpenBSD 7.1
> > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 MDT 2022
> > 
> > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.arm64
> > Machine : arm64
> > >Description:
> > 
> > Booting my rpi4 often but not always causes a panic while rc(8) tries to
> > start the bse network interface:
> > 
> > panic: attempt to access user address 0x38 from EL1
> > Stopped at  panic+0x160:cmp w21, #0x0
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > * 0  0  0 0x1  0x2000K swapper
> > db_enter() at panic+0x15c
> > panic() at do_el1h_sync+0x1f8
> > do_el1h_sync() at handle_el1h_sync+0x6c
> > handle_el1h_sync() at genet_rxintr+0x120
> > genet_rxintr() at genet_intr+0x74
> > genet_intr() at ampintc_irq_handler+0x14c
> > ampintc_irq_handler() at arm_cpu_irq+0x30
> > arm_cpu_irq() at handle_el1h_irq+0x6c
> > handle_el1h_irq() at ampintc_splx+0x80
> > ampintc_splx() at genet_ioctl+0x158
> > genet_ioctl() at ifioctl+0x308
> > ifioctl() at nfs_boot_init+0xc0
> > nfs_boot_init() at nfs_mountroot+0x3c
> > nfs_mountroot() at main+0x464
> > main() at virtdone+0x70
> > 
> > >Fix:
> > 
> > The mbuf associated with the current index is NULL. I noticed that the
> > NetBSD driver allocates mbufs for each ring entry in genet_setup_dma().
> > But even with that in place the same panic still occurs. Enabling
> > GENET_DEBUG shows that the total is quite high:
> > 
> > RX pidx=ca07 total=51463
> >
> > 
> > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference
> > will still happen after doing more than 256 iterations in genet_rxintr()
> > since we will start accessing mbufs cleared by the previous iteration.
> > 
> > Here's a diff with what I've tried so far. The KASSERT() is just capturing
> > the problem at an earlier stage. Any pointers would be much appreciated.
> 
> Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> ignored by the hardware. That's why I ended up with a large amount of
> mbufs available in genet_rxintr() since the software and hardware state
> was out of sync. Honoring any existing value makes the problem go away
> and matches what u-boot[1] does as well.

Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
that writing 0 doesn't work.  But even that works for me.  So I'm
puzzled.

> The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> carefully selected as they ensure that the rx ring is filled with at
> least the configured low watermark number of mbufs. However, instead of
> being forced to ensure a pidx - cidx delta above 0 on the first
> invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> passed as the max argument to if_rxr_get() which will clamp the value
> anyway.

Well, what the code does is setting the "prod" index ahead of the
"cons" index to simulate a full ring.  And then when we (partially)
fill the ring we increase "cons" to make descriptors available to the
hardware.  This seems to work on my hardware and I've never seen the
crash you're seeing.
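
To make the index arithmetic in this thread concrete, here is a minimal
sketch of how the number of pending RX descriptors falls out of the
producer and consumer indices.  It is not the driver code: the helper
names are hypothetical, the 16-bit wrap mask is an assumption about how
the indices behave, and the consumer value 0x100 is only inferred from
the reported "pidx=ca07 total=51463" (0xca07 - 51463 = 256).

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE	256u	/* GENET_DMA_DESC_COUNT in the report */
#define INDEX_MASK	0xffffu	/* assumption: indices wrap at 16 bits */

/* Hypothetical helper: how far the producer is ahead of the consumer. */
static uint32_t
ring_pending(uint32_t pidx, uint32_t cidx)
{
	return (pidx - cidx) & INDEX_MASK;
}

/*
 * Hypothetical helper: returns 1 when the pending count fits in the
 * ring, 0 when software and hardware state have drifted apart.
 */
static int
ring_in_sync(uint32_t pidx, uint32_t cidx)
{
	return ring_pending(pidx, cidx) <= RING_SIZE;
}
```

With the reported values, ring_pending(0xca07, 0x100) yields 51463,
far more than the 256 descriptors in the ring, so ring_in_sync()
returns 0.  A loop that trusts that count walks over entries whose
mbuf pointer was already cleared.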



Re: Witness lock-order reversal in radeondrm

2022-04-27 Thread Mark Kettenis
> Date: Wed, 27 Apr 2022 13:52:28 -0400 (EDT)
> From: d...@sisu.io
> 
> >Synopsis:  Witness lock order reversal in radeondrm
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.1
>   Details : OpenBSD 7.1-current (CUSTOM.MP) #14: Wed Apr 27 13:22:39 EDT 2022
>
> d...@minmin.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> 
> Noticed this on a fresh kernel I built while hacking on some vmm/vmd
> stuff. Probably related to the MP_LOCKDEBUG spinout I reported
> previously and since bumped the spinout counter to INT_MAX.

I doubt it.

The missing lock order data means we can't know for sure, but since
pretty much all of the drm code still runs under the kernel lock, lock
order reversals aren't necessarily problematic.
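
For readers unfamiliar with the term, a lock order reversal report
means that two locks were seen being taken in one order and later in
the opposite order.  The following is a rough sketch of that
bookkeeping only, assuming small integer lock ids; the names are
hypothetical and this is not how witness(4) is actually implemented.

```c
#include <assert.h>

#define MAX_LOCKS	8

/* seen[a][b] != 0 means lock b was once acquired while a was held. */
static unsigned char seen[MAX_LOCKS][MAX_LOCKS];

/*
 * Record "wanted acquired while held is held".  Returns 1 if the
 * opposite order was recorded earlier, i.e. a reversal.
 */
static int
order_record(int held, int wanted)
{
	assert(held >= 0 && held < MAX_LOCKS);
	assert(wanted >= 0 && wanted < MAX_LOCKS);
	assert(held != wanted);

	seen[held][wanted] = 1;
	return seen[wanted][held];
}
```

A reversal flagged this way only says that both orders were observed.
If both code paths are serialized by an outer lock, as with the kernel
lock mentioned above, the two orders cannot actually deadlock against
each other.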

> The following was in my kernel buffer after rebooting and logging
> into my box:
> 
> witness: lock order reversal:
>  1st 0xfd91bffda1d0 uobjlk (>vmobjlock)
>  2nd 0x80430b78 mclk (>pm.mclk_lock)
> lock order data w2 -> w1 missing
> lock order ">vmobjlock"(rwlock) -> ">pm.mclk_lock"(rwlock) first 
> seen at:
> #0  rw_enter_read+0x38
> #1  radeon_gem_fault+0x4e
> #2  uvm_fault+0x179
> #3  upageflttrap+0x62
> #4  usertrap+0x129
> #5  recall_trap+0x8
> 
> >How-To-Repeat:
> My kernel config:
> 
> include "arch/amd64/conf/GENERIC"
> 
> #option   VMM_DEBUG
> optionMULTIPROCESSOR
> optionMP_LOCKDEBUG
> optionWITNESS
> 
> cpu*  at mainbus?
> 
> >Fix:
>  TBD
> 
> -dv
> 
> dmesg:
> OpenBSD 7.1-current (CUSTOM.MP) #14: Wed Apr 27 13:22:39 EDT 2022
> d...@minmin.sisu.home:/usr/src/sys/arch/amd64/compile/CUSTOM.MP
> real mem = 85769121792 (81795MB)
> avail mem = 82520170496 (78697MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xec0f0 (105 entries)
> bios0: vendor Dell Inc. version "A32" date 09/25/2019
> bios0: Dell Inc. Precision Tower 7810
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG UEFI HPET MSCT SLIT SRAT SRAT 
> WDDT SSDT NITR SLIC MSDM DMAR ASF!
> acpi0: wakeup devices IP2P(S3) RP01(S4) RP02(S4) RP03(S4) RP04(S4) RP06(S4) 
> RP07(S4) RP08(S4) BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) BR2D(S4) 
> BR3A(S4) BR3B(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.63 MHz, 06-3f-02
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.16 MHz, 06-3f-02
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: disabling user TSC (skew=9173028)
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.16 MHz, 06-3f-02
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: disabling user TSC (skew=9880086)
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz, 3392.16 MHz, 06-3f-02
> cpu3: 
> 

Re: X11 GLAMOUR acceleration is broken on the Thinkpad X41T.

2022-04-27 Thread Mark Kettenis
> Date: Wed, 27 Apr 2022 18:30:52 +1000
> From: Jonathan Gray 
> 
> On Wed, Apr 27, 2022 at 03:14:50PM +1000, Jonathan Gray wrote:
> > On Wed, Apr 27, 2022 at 12:53:37PM +1000, Jonathan Gray wrote:
> > > On Tue, Apr 26, 2022 at 12:51:10PM +0100, james palmer wrote:
> > > > That fixes things, thanks :)
> > > > 
> > > > Maybe the default should be to not use glamour if hardware cannot be 
> > > > scanned. Then again, not many people will be using hardware this old so 
> > > > it might not be worth it.
> > > > 
> > > > - James
> > > 
> > > When pci can not be scanned the wscons display type is used to
> > > decide if modesetting is used.
> > > 
> > > Using startx on x40 (i855 with gen 2 graphics) modesetting does not use
> > > glamor due to the advertised opengl version.
> > > 
> > > [   340.854] (II) modeset(0): glamor: Ignoring GL < 2.1, falling back to 
> > > GLES.
> > > [   340.855] (EE) modeset(0): glamor: Failed to create GL or GLES2 
> > > contexts
> > > [   340.985] (II) modeset(0): glamor initialization failed
> > > 
> > > This check in xenocara/xserver/glamor/glamor_egl.c glamor_egl_init()
> > > could be changed to include intel gen 3 hardware.
> > > 
> > > intel should be the preferred driver for this hardware.  I'll see if I
> > > can come up with a patch to get the pci vid/pid out of a drm device.
> > 
> > The diff below does that but startx will still result in the modesetting
> > driver being used.  I suspect that is due to libpciaccess use in
> > xf86-video-intel.
> 
> The problem on intel gen 3 is that it falls back to GLES.
> The max OpenGL compat profile for gen 3 is 1.4
> 
> With this diff startx works with modesetting and the llvmpipe
> Mesa driver is used on
> inteldrm0: apic 1 int 16, I945GM, gen 3

Would be interesting to see what upstream thinks about this.

Any clue why falling back to GLES causes issues?

> Index: xserver/glamor/glamor_egl.c
> ===
> RCS file: /cvs/xenocara/xserver/glamor/glamor_egl.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 glamor_egl.c
> --- xserver/glamor/glamor_egl.c   11 Nov 2021 09:03:03 -  1.11
> +++ xserver/glamor/glamor_egl.c   27 Apr 2022 08:17:15 -
> @@ -1016,9 +1016,10 @@ glamor_egl_init(ScrnInfoPtr scrn, int fd
>  
>  if (epoxy_gl_version() < 21) {
>  xf86DrvMsg(scrn->scrnIndex, X_INFO,
> -   "glamor: Ignoring GL < 2.1, falling back to GLES.\n");
> +   "glamor: Ignoring GL < 2.1\n");
>  eglDestroyContext(glamor_egl->display, glamor_egl->context);
>  glamor_egl->context = EGL_NO_CONTEXT;
> +goto error;
>  }
>  }
>  
> 
> 
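
The behaviour change in the quoted diff can be summarised with a small
decision sketch.  The enum and function below are hypothetical and
only model the version check; the one fact taken from libepoxy is that
epoxy_gl_version() reports 10 * major + minor, so OpenGL 1.4 is 14 and
2.1 is 21.

```c
#include <assert.h>

/* Hypothetical outcome of the desktop GL version check. */
enum glamor_path {
	GLAMOR_USE_GL,		/* desktop GL context is good enough */
	GLAMOR_TRY_GLES,	/* drop the context and try GLES2 next */
	GLAMOR_FAIL		/* give up on glamor (goto error) */
};

/*
 * gl_version uses the epoxy_gl_version() encoding.  'patched' selects
 * the behaviour with the diff applied.
 */
static enum glamor_path
glamor_choose_path(int gl_version, int patched)
{
	assert(gl_version > 0);

	if (gl_version >= 21)
		return GLAMOR_USE_GL;
	return patched ? GLAMOR_FAIL : GLAMOR_TRY_GLES;
}
```

For gen 3 hardware with a 1.4 compat profile, the unpatched path
returns GLAMOR_TRY_GLES and the patched path returns GLAMOR_FAIL, so
glamor initialization stops instead of going on to GLES.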



Re: VPS hang running ttyflags -a after 7.1 upgrade

2022-04-26 Thread Mark Kettenis
> Date: Tue, 26 Apr 2022 07:24:22 +0200
> From: Anton Lindqvist 
> 
> On Tue, Apr 26, 2022 at 02:32:22AM +, Lucas wrote:
> > >Synopsis:  `ttyflags -a` hangs the system
> > >Category:  tty?
> > >Environment:
> > System  : OpenBSD 7.1
> > Details : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 
> > 2022
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > After an upgrade to 7.1, /etc/rc hangs when running `ttyflags
> > -a`. I don't have a 7.0 dmesg of this machine, but I do have
> > other machines in that provider running 7.0 without the same
> > specs (4 vCPUs here vs 1 vCPU) and they boot fine. The dmesg in
> > those machines doesn't show any ^com line. I can share the dmesg
> > of one of those, and I can attempt an upgrade if it can help to
> > better diagnose the problem. I can try some kernel patches
> > too.
> > >How-To-Repeat:
> > Upgrade to 7.1 in this provider
> > >Fix:
> > Comment out `ttyflags -a` call in line 393 of /etc/rc.
> > 
> > dmesg:
> > OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 4278169600 (4079MB)
> > avail mem = 4131217408 (3939MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5810 (13 entries)
> > bios0: vendor SeaBIOS version "1.12.0-1" date 04/01/2014
> > bios0: QEMU Standard PC (Q35 + ICH9, 2009)
> > acpi0 at bios0: ACPI 1.0
> > acpi0: sleep states S3 S4 S5
> > acpi0: tables DSDT FACP SSDT APIC HPET MCFG
> > acpi0: wakeup devices
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 577.66 MHz, 06-3c-03
> > cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 999MHz
> > cpu1 at mainbus0: apid 1 (application processor)
> > cpu1: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 626.20 MHz, 06-3c-03
> > cpu1: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu1: smt 0, core 0, package 1
> > cpu2 at mainbus0: apid 2 (application processor)
> > cpu2: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 575.47 MHz, 06-3c-03
> > cpu2: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu2: smt 0, core 0, package 2
> > cpu3 at mainbus0: apid 3 (application processor)
> > cpu3: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz, 523.71 MHz, 06-3c-03
> > cpu3: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,UMIP,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
> > cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
> > 64b/line 16-way L2 cache
> > cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> > cpu3: DTLB 255 4KB entries 

Re: Boot(8) timeouts take excessively long on OnLogic Helix 500.

2022-04-25 Thread Mark Kettenis
> From: Dan Cross 
> Date: Sun, 24 Apr 2022 21:12:29 -0400

On a machine of this vintage you probably shouldn't boot using the
legacy BIOS.  Try UEFI mode instead.

> >Synopsis: Boot(8) timeouts take excessively long on OnLogic Helix 500.
> >Category:  boot, amd64
> >Environment:
> System  : OpenBSD 7.1
> Details : OpenBSD 7.1-current (GENERIC.MP) #9: Thu Apr  7
> 15:59:04 UTC 2022
>  cr...@samudra.gajendra.net:
> /usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> On the OnLogic Helix 500, and possibly other models in
> this series of industrial machines (amd64), the timeout
> at the 'boot>' prompt takes excessively long: on
> the order of 30 *minutes*.
> 
> What is happening is that the code in sys/stand/boot/cmd.c
> has logic to only sample the time source every 1000
> iterations of the keystroke probe loop.  However, on
> these machines, the keystroke probe function (`cnischar`
> defined in /sys/lib/libsa/cons.c) takes a very long
> time: one or two seconds.
> 
> It is not entirely clear why the `cnischar` is so slow;
> this function results in a call to `pc_getc` such that
> it makes the BIOS "int 16h" call with `%ah` set to 1,
> which "gets the state of the keyboard buffer".  That
> BIOS call clears the zero flag if a key was pressed and
> `pc_getc` sets %ax if Z is not set (via a `setnz`
> instruction in inline assembler).  The function returns
> this result (actually the low byte of that result,
> but the result is the same).  One must assume that the
> BIOS call is slow on this machine.
> 
> >How-To-Repeat:
> Install OpenBSD/amd64 on an OnLogic Helix 500.  Reboot.
> Observe that the timeout at the 'boot>' prompt takes
> many minutes.  A keystroke will be recognized reasonably
> quickly, however.
> 
> Note: I have not tried all configurations of local PC
> console and serial console to see if there's some
> configuration that is faster.
> 
> >Fix:
> The logic in cmd.c limiting probing the BIOS clock to
> every thousand iterations of the loop was added in 1999
> (CVS commit #1.44 of that file:
> 
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/stand/boot/cmd.c.diff?r1=1.43&r2=1.44&f=h
> ).
> 
> That commit added a comment saying, "check for timeout
> expiration less often (for some very constrained
> archs)".  Sadly, I had no luck trying to track down the
> context around this change.
> 
> However, one wonders how relevant that remains almost a
> quarter century later.  Moreover, this is in
> single-threaded, early boot code.  What else does the
> machine have to do at this point?  It was not clear what
> was wrong with calling the BIOS clock routine so often,
> so my solution was to effectively undo revision 1.44, and
> simply check the timeout on each iteration of the
> loop.  Please see the following patch:
> 
> -->BEGIN PATCH<--
> Index: cmd.c
> ===
> RCS file: /cvs/src/sys/stand/boot/cmd.c,v
> retrieving revision 1.68
> diff -u -p -r1.68 cmd.c
> --- cmd.c   24 Oct 2021 17:49:19 -  1.68
> +++ cmd.c   25 Apr 2022 00:57:24 -
> @@ -248,7 +248,6 @@ readline(char *buf, size_t n, int to)
> 
> /* Only do timeout if greater than 0 */
> if (to > 0) {
> -   u_long i = 0;
> time_t tt = getsecs() + to;
>  #ifdef DEBUG
> if (debug > 2)
> @@ -256,9 +255,8 @@ readline(char *buf, size_t n, int to)
>  #endif
> /* check for timeout expiration less often
>(for some very constrained archs) */
> -   while (!cnischar())
> -   if (!(i++ % 1000) && (getsecs() >= tt))
> -   break;
> +   while (getsecs() < tt && !cnischar())
> +   ;
> 
> if (!cnischar()) {
> strlcpy(buf, "boot", 5);
> -->END PATCH<--
> 
> Of course, there could be other approaches, such as
> tracking down why the BIOS call is slow in the first
> place, but for such a special case it hardly seemed
> worth it, and with this in place, boot time is
> acceptably fast again.  Given that the use case might
> be rather long in the tooth at this point anyhow, it
> seemed useful to send it upstream instead of floating
>
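
The "30 minutes" figure in the report follows from simple arithmetic,
which the sketch below replays.  It is a model of the loop structure
only, not the boot code: the function name is hypothetical, the
per-probe cost of 1 or 2 seconds is the reporter's estimate, and the
5 second timeout is just an example value.

```c
#include <assert.h>

/*
 * Hypothetical model of the wait loop: every keystroke probe costs
 * probe_secs, and the clock is compared against the deadline only on
 * every stride-th iteration.  Returns the seconds elapsed when the
 * timeout is finally noticed.
 */
static long
seconds_until_timeout_seen(long timeout, long stride, long probe_secs)
{
	long elapsed = 0;
	long i = 0;

	assert(timeout > 0 && stride > 0 && probe_secs > 0);

	for (;;) {
		elapsed += probe_secs;	/* one slow keystroke probe */
		if (!(i++ % stride) && elapsed >= timeout)
			return elapsed;
	}
}
```

With a stride of 1000 and a 1 second probe, a 5 second timeout is only
noticed after 1001 seconds (about 17 minutes); with a 2 second probe
it takes 2002 seconds (about 33 minutes).  Checking the clock on every
iteration (stride 1) brings this back to 5 and 6 seconds.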

Re: bse: null dereference in genet_rxintr()

2022-04-21 Thread Mark Kettenis
> Date: Wed, 20 Apr 2022 18:14:57 +0200
> From: Anton Lindqvist 
> 
> On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote:
> > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote:
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis:  bse: null dereference in genet_rxintr()
> > > > >Category:  arm64
> > > > >Environment:
> > > > System  : OpenBSD 7.1
> > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 MDT 2022
> > > > 
> > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.arm64
> > > > Machine : arm64
> > > > >Description:
> > > > 
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries
> > > > to start the bse network interface:
> > > > 
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at  panic+0x160:cmp w21, #0x0
> > > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > > > * 0  0  0 0x1  0x2000K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > > 
> > > > >Fix:
> > > > 
> > > > The mbuf associated with the current index is NULL. I noticed that the
> > > > NetBSD driver allocates mbufs for each ring entry in genet_setup_dma().
> > > > But even with that in place the same panic still occurs. Enabling
> > > > GENET_DEBUG shows that the total is quite high:
> > > > 
> > > > RX pidx=ca07 total=51463
> > > >
> > > > 
> > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null
> > > > dereference will still happen after doing more than 256 iterations in
> > > > genet_rxintr() since we will start accessing mbufs cleared by the
> > > > previous iteration.
> > > > 
> > > > Here's a diff with what I've tried so far. The KASSERT() is just
> > > > capturing the problem at an earlier stage. Any pointers would be much
> > > > appreciated.
> > > 
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> > > 
> > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() were probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> > > 
> > > > Also, I've seen up to 8 mbufs being available per rx interrupt which is
> > > > odd as only a smaller number of rx ring entries are actually populated. Not
> > > > sure if the driver is missing some interrupt threshold configuration.
> > > Increasing the rx ring low watermark to 8 "solved" it for now.
> > > Otherwise, the same null dereference occurs while trying to access empty
> > > mbuf ring entries.
> > > 
> > > Worth mentioning is that the NetBSD driver does not suffer from the same
> > > problem as they keep all rx ring entries populated all the time.
> > > 
> > > Looking for feedback and OKs at this point.
> > > 
> > > [1] 
> > > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404
> > 
> > While putting more pressure on the network I'm seeing up to 100 mbufs
> > being available per rx interrupt. Could it simply be explained by the
> > hardware operating under the assumption that all ring entries are
> > available? Even if instructing the hardware about the actual amount of
> > available ring entries would require the driver to keep it in sync
> > whenever the if_rxr_*() implementation decides to adjust the ring.
> > 
> > Moving to if_rxr_init(RX_DESC_COUNT, RX_DESC_COUNT) essentially making
> > all 256 ring entries always available makes the driver stable.
> 
> Here's the diff I've been running lately which is deemed to be stable.
> Changes since last 

Re: cpu clock stuck at maximum speed when running on battery on Lenovo X1 Carbon 8th gen.

2022-03-20 Thread Mark Kettenis
> Date: Fri, 18 Mar 2022 17:22:35 +
> From: "Nicola Dell'Uomo" 
> 
> Hi Mark,
> 
> apparently both commands succeed: hw.perfpolicy is set to manual and
> hw.setperf is set to the chosen value; but cpu clock is still stuck
> @2100.
> 
> I noticed an increase in cpu temp and a drop in battery life;
> however I read that some people are experiencing crashes with intel
> graphic driver, so I'm not totally sure these performance troubles
> are exclusively due to my cpu clock speed.  On average battery life
> passed from 6-8 hours to 2-4.

I think the diff below will fix your issue.


Index: dev/acpi/acpiac.c
===
RCS file: /cvs/src/sys/dev/acpi/acpiac.c,v
retrieving revision 1.34
diff -u -p -r1.34 acpiac.c
--- dev/acpi/acpiac.c   30 Oct 2021 23:24:47 -  1.34
+++ dev/acpi/acpiac.c   20 Mar 2022 21:31:54 -
@@ -118,9 +118,11 @@ void
 acpiac_refresh(void *arg)
 {
struct acpiac_softc *sc = arg;
+   extern int hw_power;
 
acpiac_getpsr(sc);
sc->sc_sens[0].value = sc->sc_ac_stat;
+   hw_power = (sc->sc_ac_stat == PSR_ONLINE);
 }
 
 int
@@ -142,7 +144,6 @@ int
 acpiac_notify(struct aml_node *node, int notify_type, void *arg)
 {
struct acpiac_softc *sc = arg;
-   extern int hw_power;
 
dnprintf(10, "acpiac_notify: %.2x %s\n", notify_type,
DEVNAME(sc));
@@ -162,6 +163,5 @@ acpiac_notify(struct aml_node *node, int
dnprintf(10, "A/C status: %d\n", sc->sc_ac_stat);
break;
}
-   hw_power = (sc->sc_ac_stat == PSR_ONLINE);
return (0);
 }



Re: cpu clock stuck at maximum speed when running on battery on Lenovo X1 Carbon 8th gen.

2022-03-18 Thread Mark Kettenis
> Date: Fri, 18 Mar 2022 16:05:06 +
> From: "Nicola Dell'Uomo" 

So what does apm(8) say?  And sysctl hw.power?

On modern Intel and AMD CPUs the CPU "speed" isn't really all that
relevant for how much power your machine consumes.  But OpenBSD is
still supposed to switch to the lower speed when idle and running on
battery power.  But make sure your machine is really idle by killing
any applications that show up with a significant CPU percentage in
top.

> Synopsis: cpu clock is stuck when cpu idles; root can't change cpu clock 
> speed by apm(8) or sysctl(8).
> 
> Category: system
> Environment:
> System : OpenBSD 7.1
> Details : OpenBSD 7.1-beta (GENERIC.MP) #422: Tue Mar 15 11:28:22 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> Description:
> Since GENERIC.MP#416 cpu clock is stuck at maximum speed when cpu idles and 
> laptop runs on battery; moreover root can't manually lower speed via apm(8) 
> or sysctl(8). This problem is still present in GENERIC.MP#422.
> How-To-Repeat:
> Run GENERIC.MP#416 and higher and check cpu speed by apm(8) or sysctl(8) when 
> cpu idles; run as root 'apm -L' or 'sysctl hw.perfpolicy=manual && sysctl 
> hw.setperf=20'.
> Fix:
> No known workarounds.
> 
> SENDBUG: dmesg, pcidump, acpidump and usbdevs are attached.
> SENDBUG: Feel free to delete or use the -D flag if they contain sensitive 
> information.
> 
> dmesg:
> OpenBSD 7.1-beta (GENERIC.MP) #422: Tue Mar 15 11:28:22 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 16937349120 (16152MB)
> avail mem = 16406769664 (15646MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.2 @ 0xc66ac000 (69 entries)
> bios0: vendor LENOVO version "N2WET34W (1.24 )" date 12/23/2021
> bios0: LENOVO 20U9CTO1WW
> acpi0 at bios0: ACPI 6.1
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 SSDT HPET APIC MCFG 
> ECDT SSDT SSDT SSDT NHLT BOOT SSDT LPIT WSMT SSDT DBGP DBG2 MSDM BATB DMAR 
> BGRT UEFI FPDT
> acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) 
> RP02(S4) PXSX(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
> PXSX(S4) RP07(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 2399 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1784.74 MHz, 06-8e-0c
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1558.97 MHz, 06-8e-0c
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1336.40 MHz, 06-8e-0c
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 1204.18 MHz, 06-8e-0c
> cpu3: 
> 

Re: powerpc64 crash in uvm_mapent_alloc pool_get

2022-03-09 Thread Mark Kettenis
> Date: Wed, 9 Mar 2022 11:01:04 +0100
> From: Alexander Bluhm 

Not sure what happened here.  It is a kernel read access that failed
because the page isn't in the page tables.  Hard to tell why, but the
address looks legit.
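
For readers decoding the trap registers in the quoted report, here is a
minimal sketch in C of how the DSISR cause bits and the faulting address
relate.  The masks follow the Power ISA DSISR layout; the helper names are
made up for illustration, and it assumes the logged dsisr stands for
0x40000000, since the archive appears to have shortened several hex values.

```c
#include <stdint.h>

/* DSISR cause bits for a data storage interrupt (trap type 0x300). */
#define DSISR_NOTFOUND	0x40000000u	/* no translation found for the address */
#define DSISR_PROTFAULT	0x08000000u	/* access not permitted by protection */
#define DSISR_STORE	0x02000000u	/* set for a store, clear for a load */

/* Hypothetical helper: name the cause encoded in DSISR. */
const char *
dsi_cause(uint32_t dsisr)
{
	if (dsisr & DSISR_NOTFOUND)
		return "translation not found";
	if (dsisr & DSISR_PROTFAULT)
		return "protection fault";
	return "other";
}

/* Hypothetical helper: nonzero if the faulting access was a store. */
int
dsi_is_store(uint32_t dsisr)
{
	return (dsisr & DSISR_STORE) != 0;
}

/* Effective address of a load such as "ld r4,32(r27)": base plus displacement. */
uint64_t
load_ea(uint64_t base, int16_t disp)
{
	return base + (uint64_t)(int64_t)disp;
}
```

Under that assumption dsi_cause() reports a missing translation and
dsi_is_store() returns 0, which is consistent with a read from an unmapped
page.  The low bits of dar (0x20) also match the displacement 32 in the
faulting instruction.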

> Hi,
> 
> While building clang, my powerpc64 crashed.  It did not panic,
> don't know why it went to ddb.  Console output:
> 
> [-- MARK -- Wed Mar  9 08:05:00 2022]
> dar 0xfd7f0020 dsisr 0x4000
> trap type 300 srr1 90009032 at 1411eb8 lr 1411e94
> Stopped at  pool_do_get+0xa8:   ld r4,32(r27)
> 
> ddb{1}> show panic
> the kernel did not panic
> 
> ddb{1}> x/s version
> version:OpenBSD 7.1-beta (GENERIC.MP) #0: Tue Mar  8 14:28:42 CET 2022\012
> r...@ot27.obsd-lab.genua.de:/usr/src/sys/arch/powerpc64/compile/GENERIC.MP\012
> 
> ddb{1}> trace
> pool_do_get+0xa8
> pool_get+0xd4
> uvm_mapent_alloc+0x22c
> uvm_map_clip_start+0xa0
> uvm_map_protect+0x3b4
> sys_mprotect+0x1a0
> syscall+0x384
> trap+0x5dc
> trapagain+0x4
> --- syscall (number 74) ---
> End of kernel: 0xbffc9520 lr 0x46b8295c0
> 
> ddb{1}> show register
> r0 0x1411e94pool_do_get+0x84
> r10xc0007d4157f0
> r2 0x1aa.TOC.
> r30xfd7f
> r40xfd7f
> r5   0x7
> r6 0x1aacdb8cpu_info+0xd08
> r7 0x1aacdb8cpu_info+0xd08
> r8 0x1b837e8db_active
> r90x90001032
> r10   0x10329000
> r110
> r120
> r13  0x4366a6ab8
> r14 0x19
> r15 0x18
> r16 0x14
> r170x3ff
> r180x7ff
> r190
> r20  0x7
> r21   0xfffd
> r22  0xc
> r230
> r240
> r25   0xc0007cdd6600
> r26   0xc0007cdd6640
> r27   0xfd7f
> r28  0x1
> r29   0xc0007d415934
> r300x1b58aa0uvm_map_entry_pool
> r31   0x9200f932
> lr 0x1411e94pool_do_get+0x84
> cr0x442c8208
> xer   0x2004
> ctr0x1415850pool_lock_mtx_assert_locked
> iar0x1411eb8pool_do_get+0xa8
> msr   0x90009032
> dar   0xfd7f0020
> dsisr 0x4000
> pool_do_get+0xa8:   ld r4,32(r27)
> 
> ddb{1}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  69905   86556  38626 21  30x12  biowait   rm
> *19486   99871  89486 21  7 0x2c++
>  89486  373818  38626 21  30x10008a  sigsusp   sh
>  70042  457393  98642 21  7 0x2c++
>  98642  215955  38626 21  30x10008a  sigsusp   sh
>  38626  487218  25201 21  30x10008a  sigsusp   make
>  25201  233884  11747 21  30x10008a  sigsusp   sh
>  11747  289720  54644 21  30x10008a  sigsusp   make
>  54644  320707  86612 21  30x10008a  sigsusp   sh
>  86612  470459  70954 21  30x10008a  sigsusp   make
>  70954  360477  98614 21  30x10008a  sigsusp   sh
>  98614  102871  46607 21  30x10008a  sigsusp   make
>  46607   82897   4326 21  30x10008a  sigsusp   sh
>   4326  521564  39944 21  30x10008a  sigsusp   make
>  39944   29261  23064  0  30x10008a  sigsusp   sh
>  23064  114995  42774  0  30x10008a  sigsusp   make
>  42774  513573  49356  0  30x10008a  sigsusp   make
>  49356   29591  5  0  30x10008a  sigsusp   ksh
>  5  444122  84159  0  30x9a  kqreadsshd
>  16291  124821  1  0  30x80  mfsidlmount_mfs
>  68319  151430  1  0  30x100083  ttyin getty
>  57451  206235  1  0  30x100098  kqreadcron
>  65588  388973  1 99  3   0x1100090  kqreadsndiod
>  12015   62286  1110  30x100090  kqreadsndiod
>  26835  364696  10259 95  3   0x1100092  kqreadsmtpd
>  65551  247250  10259103  3   0x1100092  kqread

Re: witness: acquiring duplicate lock of same type: ">vmobjlock"

2022-02-16 Thread Mark Kettenis
> Date: Wed, 16 Feb 2022 21:13:03 +
> From: Klemens Nanni 
> 
> Unmodified -current with WITNESS enabled booting into X on my X230:
> 
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> witness: acquiring duplicate lock of same type: ">vmobjlock"
>  1st uobjlk
>  2nd uobjlk
> Starting stack trace...
> witness_checkorder(fd83b625f9b0,9,0) at witness_checkorder+0x8ac
> rw_enter(fd83b625f9a0,1) at rw_enter+0x68
> uvm_obj_wire(fd843c39e948,0,4,800033b70428) at uvm_obj_wire+0x46
> shmem_get_pages(88008500) at shmem_get_pages+0xb8
> __i915_gem_object_get_pages(88008500) at __i915_gem_object_get_pages+0x6d
> i915_gem_fault(88008500,800033b707c0,10009b000,a43d6b1c000,800033b70740,1,35ba896911df1241,800aa078,800aa178) at i915_gem_fault+0x203
> drm_fault(800033b707c0,a43d6b1c000,800033b70740,1,0,0,7eca45006f70ee0,800033b707c0) at drm_fault+0x156
> uvm_fault(fd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
> upageflttrap(800033b70920,a43d6b1c000) at upageflttrap+0x62
> usertrap(800033b70920) at usertrap+0x129
> recall_trap() at recall_trap+0x8
> end of kernel
> end trace frame: 0x7f7dc7c0, count: 246
> End of stack trace.
> 
> The system works fine (unless booted with kern.witness.watch=3), so I'm
> posting it here for reference -- haven't had time to look into this.

Yes, this is expected.  The graphics buffers are implemented as a uvm
object and this object is backed by an anonymous memory uvm_object
(aobj).  So I think the vmobjlock needs a RW_DUPOK flag.
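
To illustrate what a "duplicates OK" flag changes, here is a small userland
model in C of a witness-style check.  It is only a sketch: the structures,
MODEL_DUPOK and model_acquire() are hypothetical stand-ins, not the kernel's
witness or rwlock code.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_DUPOK	0x01	/* hypothetical "same-type nesting is fine" flag */
#define MAX_HELD	8

struct model_lock {
	const char *type;	/* lock type name, e.g. "uobjlk" */
	int flags;
};

struct held_set {
	const struct model_lock *held[MAX_HELD];
	int n;
};

/*
 * Record that lock l is being acquired.  Returns true if the model would
 * warn about a duplicate lock of the same type already being held.
 */
bool
model_acquire(struct held_set *hs, const struct model_lock *l)
{
	bool warn = false;
	int i;

	for (i = 0; i < hs->n; i++) {
		if (hs->held[i] != l &&
		    strcmp(hs->held[i]->type, l->type) == 0 &&
		    (l->flags & MODEL_DUPOK) == 0)
			warn = true;
	}
	if (hs->n < MAX_HELD)
		hs->held[hs->n++] = l;
	return warn;
}
```

In this model, taking a second "uobjlk" while one is held warns, and the
same acquisition with MODEL_DUPOK set on the second lock does not.  That
mirrors the situation described above, where one uvm object's lock is
taken while the lock of the aobj backing it is also needed.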

> Looking at bugs@ I see Jan Stary's report from 08.02.22 unrelatedly
> containing it in "C2 state not recognized on Thinkpad T420s when on AC".
> 
> X230 dmesg follows.
> 
> OpenBSD 7.0-current (GENERIC.MP) #0: Wed Feb 16 21:14:45 CET 2022
> kn@eru:/home/kn/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17118130176 (16325MB)
> avail mem = 16450445312 (15688MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbff31020 (17 entries)
> bios0: vendor coreboot version "CBET4000 x230-seabios" date 01/07/2020
> bios0: LENOVO 2325A95
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
> acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3) LID_(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf000, bus 0-63
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.47 MHz, 06-3a-09
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.12 MHz, 06-3a-09
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 2594.11 MHz, 06-3a-09
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu3: 256KB 64b/line 8-way L2 cache
> 

Re: C2 state not recognized on Thinkpad T420s when on AC

2022-02-11 Thread Mark Kettenis
> Date: Thu, 10 Feb 2022 23:46:43 -0800
> From: guent...@openbsd.org
> 
> On Thu, 10 Feb 2022, Jan Stary wrote:
> > > > > When you build a kernel with this, do please add ACPI_DEBUG to your
> > > > > kernel config, so we can see more details about what the firmware
> > > > > is telling us.
> > > 
> > > Full dmesg below, without ACPI_DEBUG.
> > > 
> > > Also below, full /var/log/messages with ACPI_DEBUG,
> > > as it spams dmesg so much that /var/run/dmesg.boot
> > > does not really contain the booting kernel device messages,
> > > being rolled off by the storm of ACPI_DEBUG messages.
> > > (Is there a way to increase that buffer,
> > > so that dmesg.boot would hold everything?)
> > > Of course, this is only after syslogd has started;
> > > hopefully the acpicpu events are there.
> > > 
> > > Both contain a log of the same scenario: cold start the machine on AC,
> > > plug AC out, in, out, in; shutdown with the power button.
> > 
> > With MSGBUFSIZE cranked up,
> > here is a dmesg containing all,
> > up to before the shutdown.
> 
> Uh, wow, I had forgotten how horrifically verbose ACPI_DEBUG was.  I'm 
> half inclined to delete all the uses of ACPI_DEBUG from acpicpu.c and use 
> a different #define for them.

Go for it.

> That said, the data shows the expected 0x81 notifications (and no 0x80 
> notifications) on the CPU objects, and the values appear to be accurately 
> parsed by acpicpu.c.  Whew.
> 
> 
> So here's a revised diff that tries to make it safe for ACPI to notify us 
> that a CPU's _CST has changed while that cpu is entering idle.  Revert the 
> previous diff before trying to apply this one.  Please give it a shot; no 
> need for ACPI_DEBUG now!
> 
> 
> Philip
> 
> 
> Index: sys/dev/acpi/acpicpu.c
> ===
> RCS file: /data/src/openbsd/src/sys/dev/acpi/acpicpu.c,v
> retrieving revision 1.91
> diff -u -p -r1.91 acpicpu.c
> --- sys/dev/acpi/acpicpu.c9 Jan 2022 05:42:37 -   1.91
> +++ sys/dev/acpi/acpicpu.c11 Feb 2022 07:19:11 -
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -80,6 +81,7 @@ voidacpicpu_setperf_ppc_change(struct a
>  #define CST_FLAG_FALLBACK0x4000  /* fallback for broken _CST */
>  #define CST_FLAG_SKIP0x8000  /* state is worse choice */
>  
> +#define FLAGS_NOCST  0x01
>  #define FLAGS_MWAIT_ONLY 0x02
>  #define FLAGS_BMCHECK0x04
>  #define FLAGS_NOTHROTTLE 0x08
> @@ -130,6 +132,11 @@ struct acpicpu_softc {
>   struct cpu_info *sc_ci;
>   SLIST_HEAD(,acpi_cstate) sc_cstates;
>  
> + /* sc_mtx protects sc_cstates_active and sc_mwait_only */
> + struct mutexsc_mtx;
> + struct acpi_cstate  *sc_cstates_active;
> + int sc_mwait_only;
> +
>   bus_space_tag_t sc_iot;
>   bus_space_handle_t  sc_ioh;
>  
> @@ -161,10 +168,12 @@ struct acpicpu_softc {
>  
>  void acpicpu_add_cstatepkg(struct aml_value *, void *);
>  void acpicpu_add_cdeppkg(struct aml_value *, void *);
> +void acpicpu_cst_activate(struct acpicpu_softc *);
>  int  acpicpu_getppc(struct acpicpu_softc *);
>  int  acpicpu_getpct(struct acpicpu_softc *);
>  int  acpicpu_getpss(struct acpicpu_softc *);
>  int  acpicpu_getcst(struct acpicpu_softc *);
> +void acpicpu_free_states(struct acpi_cstate *);
>  void acpicpu_getcst_from_fadt(struct acpicpu_softc *);
>  void acpicpu_print_one_cst(struct acpi_cstate *_cx);
>  void acpicpu_print_cst(struct acpicpu_softc *_sc);
> @@ -511,10 +520,10 @@ acpicpu_getcst(struct acpicpu_softc *sc)
>   int use_nonmwait;
>  
>   /* delete the existing list */
> - while ((cx = SLIST_FIRST(&sc->sc_cstates)) != NULL) {
> - SLIST_REMOVE_HEAD(&sc->sc_cstates, link);
> - free(cx, M_DEVBUF, sizeof(*cx));
> - }
> + cx = SLIST_FIRST(&sc->sc_cstates);
> + SLIST_INIT(&sc->sc_cstates);
> + if (cx != sc->sc_cstates_active)
> + acpicpu_free_states(cx);
>  
>   /* provide a fallback C1-via-halt in case _CST's C1 is bogus */
>   acpicpu_add_cstate(sc, ACPI_STATE_C1, CST_METH_HALT,
> @@ -526,17 +535,18 @@ acpicpu_getcst(struct acpicpu_softc *sc)
>   aml_foreachpkg(, 1, acpicpu_add_cstatepkg, sc);
>   aml_freevalue();
>  
> + use_nonmwait = 0;
> +
>   /* only have fallback state?  then no _CST objects were understood */
>   cx = SLIST_FIRST(&sc->sc_cstates);
>   if (cx->flags & CST_FLAG_FALLBACK)
> - return (1);
> + goto done;
>  
>   /*
>* Skip states >= C2 if the CPU's LAPIC timer stops in deep
>* states (i.e., it doesn't have the 'ARAT' bit set).
>* Also keep track if all the states we'll use use mwait.
>*/
> - use_nonmwait = 0;
>   while ((next_cx = SLIST_NEXT(cx, link)) != NULL) {
>   if (cx->state > 1 &&
>   
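
The diff above is cut off in the archive, so the body of
acpicpu_free_states() is not visible.  The following userland sketch in C
only illustrates the general pattern the visible hunk suggests: detach the
whole list in one step, then free the detached chain later if nothing still
uses it.  The names cstate_detach_all() and cstate_free_chain() are
hypothetical, malloc/free stand in for the kernel allocator, and the mutex
from the diff is not modelled.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct cstate {
	int state;
	SLIST_ENTRY(cstate) link;
};
SLIST_HEAD(cstate_list, cstate);

/* Detach every entry from the list head and return the old first entry. */
struct cstate *
cstate_detach_all(struct cstate_list *head)
{
	struct cstate *first = SLIST_FIRST(head);

	SLIST_INIT(head);
	return first;
}

/* Free a detached chain; returns the number of entries freed. */
int
cstate_free_chain(struct cstate *cx)
{
	int n = 0;

	while (cx != NULL) {
		struct cstate *next = SLIST_NEXT(cx, link);

		free(cx);
		cx = next;
		n++;
	}
	return n;
}
```

A caller would compare the detached chain against whatever the idle path is
currently using and only call cstate_free_chain() when they differ, as the
"cx != sc->sc_cstates_active" test in the hunk does.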

Re: pcidump -v panic in OpenBSD 7.0 on Samsung NC215S

2022-02-04 Thread Mark Kettenis
> Date: Fri, 4 Feb 2022 19:33:08 +
> From: Miod Vallat 
> 
> > After printing information via "doas pcidump -v" on device PCI
> > 0:0:27:0 "Intel 82801GB HD Audio", kernel panics. Sorry, I used OCR
> > software to recognize the text from the photo of the screen, maybe
> > there are some errors in hex numbers. Photo is attached.
> 
> The following diff, while not fixing the cause of the problem, ought to
> prevent the kernel from panicking.

I don't think making that call silently fail is a good idea though.

> Does audio (azalia0) work correctly on your system?
> 
> Miod
> 
> Index: amd64/pci/pci_machdep.c
> ===
> RCS file: /OpenBSD/src/sys/arch/amd64/pci/pci_machdep.c,v
> retrieving revision 1.77
> diff -u -p -r1.77 pci_machdep.c
> --- amd64/pci/pci_machdep.c   11 Mar 2021 11:16:55 -  1.77
> +++ amd64/pci/pci_machdep.c   4 Feb 2022 19:31:36 -
> @@ -213,15 +213,14 @@ pci_conf_size(pci_chipset_tag_t pc, pcit
>   return PCI_CONFIG_SPACE_SIZE;
>  }
>  
> -void
> +int
>  pci_mcfg_map_bus(int bus)
>  {
>   if (pci_mcfgh[bus])
> - return;
> + return 0;
>  
> - if (bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> - 0, &pci_mcfgh[bus]))
> - panic("pci_conf_read: cannot map mcfg space");
> + return bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> + 0, &pci_mcfgh[bus]);
>  }
>  
>  pcireg_t
> @@ -235,7 +234,8 @@ pci_conf_read(pci_chipset_tag_t pc, pcit
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return 0xffffffff;
>   data = bus_space_read_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag & 0x000ff00) << 4 | reg);
>   return data;
> @@ -261,7 +261,8 @@ pci_conf_write(pci_chipset_tag_t pc, pci
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return;
>   bus_space_write_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag & 0x000ff00) << 4 | reg, data);
>   return;
> Index: i386/pci/pci_machdep.c
> ===
> RCS file: /OpenBSD/src/sys/arch/i386/pci/pci_machdep.c,v
> retrieving revision 1.87
> diff -u -p -r1.87 pci_machdep.c
> --- i386/pci/pci_machdep.c11 Mar 2021 11:16:57 -  1.87
> +++ i386/pci/pci_machdep.c4 Feb 2022 19:31:36 -
> @@ -127,7 +127,7 @@ bus_addr_t pci_mcfg_addr;
>  int pci_mcfg_min_bus, pci_mcfg_max_bus;
>  bus_space_tag_t pci_mcfgt = I386_BUS_SPACE_MEM;
>  bus_space_handle_t pci_mcfgh[256];
> -void pci_mcfg_map_bus(int);
> +int pci_mcfg_map_bus(int);
>  
>  struct mutex pci_conf_lock = MUTEX_INITIALIZER(IPL_HIGH);
>  
> @@ -420,15 +420,14 @@ pci_conf_size(pci_chipset_tag_t pc, pcit
>   return PCI_CONFIG_SPACE_SIZE;
>  }
>  
> -void
> +int
>  pci_mcfg_map_bus(int bus)
>  {
>   if (pci_mcfgh[bus])
> - return;
> + return 0;
>  
> - if (bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> - 0, &pci_mcfgh[bus]))
> - panic("pci_conf_read: cannot map mcfg space");
> + return bus_space_map(pci_mcfgt, pci_mcfg_addr + (bus << 20), 1 << 20,
> + 0, &pci_mcfgh[bus]);
>  }
>  
>  pcireg_t
> @@ -442,7 +441,8 @@ pci_conf_read(pci_chipset_tag_t pc, pcit
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return 0xffffffff;
>   data = bus_space_read_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag.mode1 & 0x000ff00) << 4 | reg);
>   return data;
> @@ -480,7 +480,8 @@ pci_conf_write(pci_chipset_tag_t pc, pci
>   if (pci_mcfg_addr && reg >= PCI_CONFIG_SPACE_SIZE) {
>   pci_decompose_tag(pc, tag, &bus, NULL, NULL);
>   if (bus >= pci_mcfg_min_bus && bus <= pci_mcfg_max_bus) {
> - pci_mcfg_map_bus(bus);
> + if (pci_mcfg_map_bus(bus) != 0)
> + return;
>   bus_space_write_4(pci_mcfgt, pci_mcfgh[bus],
>   (tag.mode1 & 0x000ff00) << 4 | reg, data);
>   return;
> 
> 
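
As background for the address arithmetic in the diff above, here is a
minimal sketch in C of how a memory-mapped (MCFG/ECAM) config access is
located: each bus gets a 1MB window, and the device/function bits of the tag
are shifted into place within it.  The helper names are hypothetical, and
the tag layout (bus << 16 | dev << 11 | func << 8) is assumed to be the
usual mode-1 encoding.

```c
#include <stdint.h>

/* Assumed mode-1 style tag: bus in bits 16-23, device 11-15, function 8-10. */
uint32_t
model_make_tag(unsigned bus, unsigned dev, unsigned func)
{
	return (bus << 16) | (dev << 11) | (func << 8);
}

/* Start of the 1MB window for one bus, as in pci_mcfg_addr + (bus << 20). */
uint64_t
model_bus_window(uint64_t mcfg_base, unsigned bus)
{
	return mcfg_base + ((uint64_t)bus << 20);
}

/*
 * Offset of a register inside the bus window: shifting the device and
 * function bits left by 4 turns dev << 11 | func << 8 into
 * dev << 15 | func << 12, leaving 12 bits for the register offset.
 */
uint32_t
model_window_offset(uint32_t tag, unsigned reg)
{
	return ((tag & 0x0000ff00u) << 4) | reg;
}
```

For device 27, function 0 and register 0x100 this gives offset 0xd8100
within the bus window.  Extended registers (reg >= PCI_CONFIG_SPACE_SIZE)
are only reachable this way, which is why a failed mapping has to be
handled somehow in pci_conf_read() and pci_conf_write().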



Re: sparc64 dlopen data access fault

2022-02-03 Thread Mark Kettenis
> Date: Wed, 2 Feb 2022 16:19:10 -0800
> From: guent...@openbsd.org
> 
> On Wed, 2 Feb 2022, Alexander Bluhm wrote:
> > On Wed, Feb 02, 2022 at 07:53:59PM +, Miod Vallat wrote:
> > > > Hi,
> > > > 
> > > > On my sparc64 machine regress/lib/libpthread triggers a panic.  It
> > > > happened with Feb 1 and Jan 31 snapshot.  Jan 29 snapshot panicked
> > > > somewhere else.  Test and console output below.
> > > > 
> > > > *cpu1: pmap_enter: access_type exceeds prot
> > > > 
> > > > bluhm
> > > 
> > > Does the following diff help?
> > 
> > Unfortunately not.  Same panic.
> 
> That suggests this is probably from the __HAVE_PMAP_MPSAFE_ENTER_COW 
> change.  Can you try this diff, mirroring miod's?
> 
> (Perhaps sparc64 has correct break-before-make semantics, I'm not wise 
> enough in sparc64 pmap to know)

I don't think it has.  Anyway,

ok kettenis@

for the diff.

> Index: uvm/uvm_fault.c
> ===
> RCS file: /data/src/openbsd/src/sys/uvm/uvm_fault.c,v
> retrieving revision 1.125
> diff -u -p -r1.125 uvm_fault.c
> --- uvm/uvm_fault.c   1 Feb 2022 08:38:53 -   1.125
> +++ uvm/uvm_fault.c   3 Feb 2022 00:16:26 -
> @@ -1022,8 +1022,10 @@ uvm_fault_upper(struct uvm_faultinfo *uf
>* uvm does it by inserting the new mapping RO and
>* letting it fault again.
>*/
> - if (P_HASSIBLING(curproc))
> + if (P_HASSIBLING(curproc)) {
>   flt->enter_prot &= ~PROT_WRITE;
> + flt->access_type &= ~PROT_WRITE;
> + }
>  #endif
>  
>   /*
> 
> 
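
The panic message "access_type exceeds prot" suggests an invariant of the
form: every bit requested in access_type must also be present in the
protection being entered.  A minimal userland sketch in C of why the diff
clears PROT_WRITE in both fields follows; struct fault_model and the
function names are hypothetical, only PROT_READ and PROT_WRITE come from
<sys/mman.h>.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical model of the two fields touched by the diff. */
struct fault_model {
	int enter_prot;		/* protection the mapping will be entered with */
	int access_type;	/* access that caused the fault */
};

/* The assumed invariant: no access bit outside the entered protection. */
bool
access_within_prot(const struct fault_model *f)
{
	return (f->access_type & ~f->enter_prot) == 0;
}

/* Before the diff: only the entered protection is downgraded. */
void
downgrade_prot_only(struct fault_model *f)
{
	f->enter_prot &= ~PROT_WRITE;
}

/* After the diff: protection and access type are downgraded together. */
void
downgrade_both(struct fault_model *f)
{
	f->enter_prot &= ~PROT_WRITE;
	f->access_type &= ~PROT_WRITE;
}
```

Starting from a write fault on a read/write mapping, downgrade_prot_only()
leaves access_type asking for write on a read-only entry, so the invariant
fails; downgrade_both() keeps it intact.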


