Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Patrick Wildt
On Mon, Jun 13, 2022 at 07:44:24PM +0200, Mark Kettenis wrote:
> > From: kiltz 
> > Date: Mon, 13 Jun 2022 18:12:27 +0200
> > 
> > Dear Mark,
> > first of all, thank you very much for your explanations, the diff
> > and, indeed, the ultra-swift reply!
> > That helps us a lot already.
> > A snapshot with a higher MAXCPUS value out of the box would, of course,
> > be the proverbial icing on the cake.
> > It's probably a strange question, but I'll hazard it anyway: should we
> > monitor the /pub/OpenBSD/snapshots directory, or is there a quicker way
> > to find out what your fellow developers think?
> > Again, many thanks for your help and best wishes,
> 
> Hi Stefan,
> 
> Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
> will have it.  You can easily tell, since all 80 CPUs should attach
> with that diff.
> 
> Cheers,
> 
> Mark

And it's nice to hear that the SP install already worked.  I remember
booting it up on an Oracle machine with an Ampere Altra which led
to messages like

agintcmsi0 at agintc0: unsupported type 0x001700026f31

See http://ix.io/3GEX

While I had a diff somewhere to 'fix that', I never got the timer
interrupt to fire.

That you already had an SP install means all that should be fine.
If this change/new snap works, I'd be interested to read a full
dmesg!

Cheers,
Patrick



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis
> From: kiltz 
> Date: Mon, 13 Jun 2022 18:12:27 +0200
> 
> Dear Mark,
> first of all, thank you very much for your explanations, the diff
> and, indeed, the ultra-swift reply!
> That helps us a lot already.
> A snapshot with a higher MAXCPUS value out of the box would, of course,
> be the proverbial icing on the cake.
> It's probably a strange question, but I'll hazard it anyway: should we
> monitor the /pub/OpenBSD/snapshots directory, or is there a quicker way
> to find out what your fellow developers think?
> Again, many thanks for your help and best wishes,

Hi Stefan,

Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
will have it.  You can easily tell, since all 80 CPUs should attach
with that diff.

Cheers,

Mark

> -- 
> Dr.-Ing. Stefan Kiltz
> 
> Otto-von-Guericke University of Magdeburg
> ITI Research Group on
> Multimedia and Security
> Universitaetsplatz 2
> 39106 Magdeburg
> Germany
> 
> Tel: +49-391-67-52838
> Fax: +49-391-67-18110
> 
> eMail: ki...@iti.cs.uni-magdeburg.de
> 
> 
> 
> 
> 
> On 13 Jun 2022, at 17:20, Mark Kettenis wrote:
> 
> >> From: kiltz 
> >> Date: Mon, 13 Jun 2022 14:46:39 +0200
> >
> > Hi Stefan,
> >
> >> Dear kind people at OpenBSD.org,
> >> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30
> >> with the following specifications:
> >>
> > >>   Ampere Altra Q80-33 processor (80 cores, 3.3 GHz)
> > >>   512 GB RAM (3200 MHz ECC-reg.)
> > >>   2 x 480 GB SSD SATA 6 Gb/s 2.5"
> > >>   Dual-Port 1 GbE (RJ-45)
> > >>   IPMI 2.0 Baseboard Management Controller (BMC)
> > >>   1 x PCIe4.0 x16 (FHHL)
> > >>   1 x PCIe3.0 x16 OCP2.0 (occupied)
> > >>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> >>
> >> We tried both:
> >> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> >> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> >>
> > >> The result is reproducible: the install works in single-CPU/core
> > >> mode, but the kernel panics after the first reboot into the MP kernel.
> > >> We use the serial-to-LAN console provided by the IPMI/BMC card.
> >> Attached you will find screenshots from:
> >>
> >> - the last 49 columns of the reboot into mp kernel
> >> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13
> >> 13-51-00.png),
> >> - the ddb trace output (Screenshot ddb_trace_2022-06-13  
> >> 14-02-11.png),
> >> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> >> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13
> >> 14-04-28.png)
> >> - the ddb show registers output (Screenshot ddb_show_registers_at
> >> 2022-06-13 14-06-34.png)
> >>
> >> Due to the nature of the early boot panic, the kernel output is not
> >> accessible to us.
> >>
> > >> Interestingly, FreeBSD only supports these machines in its current
> > >> release; the stable release fails with a similar panic. They seem to
> > >> have found a fix of sorts. But we very much prefer OpenBSD for the
> > >> firewalling role of the aforementioned system.
> > >>
> > >> Of course we support your effort, so if you need more information
> > >> about the circumstances, we will happily try to supply it.
> >
> > The immediate problem is that OpenBSD currently supports a maximum of
> > 32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
> > 128.  You could try building a GENERIC.MP kernel with this diff after
> > booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
> > my fellow developers think about bumping MAXCPUS.  Depending on the
> > outcome, a snapshot with this change may be available in a few
> > days.
> >
> > I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
> > very well but I guess there is only one way to find out...
> >
> > Cheers,
> >
> > Mark
> >
> >
> > Index: arch/arm64/include/cpu.h
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 cpu.h
> > --- arch/arm64/include/cpu.h	23 Mar 2022 23:36:35 -0000	1.25
> > +++ arch/arm64/include/cpu.h	13 Jun 2022 15:09:32 -0000
> > @@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
> >  #define CPU_INFO_FOREACH(cii, ci)	for (cii = 0, ci = cpu_info_list; \
> >  					    ci != NULL; ci = ci->ci_next)
> >  #define CPU_INFO_UNIT(ci)	((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
> > -#define MAXCPUS		32
> > +#define MAXCPUS		128
> >  
> >  extern struct cpu_info *cpu_info[MAXCPUS];
> >  
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.14 (Darwin)
> 
> iEYEARECAAYFAmKnYesACgkQuLKZPfaiT0iDDgCfXC6QIWGHzkMyWxPKHCaTkYwR
> AXUAnjLiJX1RyuqrMejk4AT2s5X99fmi
> =pRhT
> -----END PGP SIGNATURE-----
> 



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread kiltz

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear Mark,
first of all, thank you very much for your explanations, the diff
and, indeed, the ultra-swift reply!

That helps us a lot already.
A snapshot with a higher MAXCPUS value out of the box would, of course,
be the proverbial icing on the cake.
It's probably a strange question, but I'll hazard it anyway: should we
monitor the /pub/OpenBSD/snapshots directory, or is there a quicker way
to find out what your fellow developers think?

Again, many thanks for your help and best wishes,

 Stefan

-- 
Dr.-Ing. Stefan Kiltz

Otto-von-Guericke University of Magdeburg
ITI Research Group on
Multimedia and Security
Universitaetsplatz 2
39106 Magdeburg
Germany

Tel: +49-391-67-52838
Fax: +49-391-67-18110

eMail: ki...@iti.cs.uni-magdeburg.de





On 13 Jun 2022, at 17:20, Mark Kettenis wrote:

> > From: kiltz
> > Date: Mon, 13 Jun 2022 14:46:39 +0200
>
> Hi Stefan,
>
> > Dear kind people at OpenBSD.org,
> > we want to run OpenBSD as a firewall system on a Gigabyte R152_P30
> > with the following specifications:
> >
> >   Ampere Altra Q80-33 processor (80 cores, 3.3 GHz)
> >   512 GB RAM (3200 MHz ECC-reg.)
> >   2 x 480 GB SSD SATA 6 Gb/s 2.5"
> >   Dual-Port 1 GbE (RJ-45)
> >   IPMI 2.0 Baseboard Management Controller (BMC)
> >   1 x PCIe4.0 x16 (FHHL)
> >   1 x PCIe3.0 x16 OCP2.0 (occupied)
> >   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> >
> > We tried both:
> > - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> > - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> >
> > The result is reproducible: the install works in single-CPU/core
> > mode, but the kernel panics after the first reboot into the MP kernel.
> > We use the serial-to-LAN console provided by the IPMI/BMC card.
> > Attached you will find screenshots from:
> >
> > - the last 49 columns of the reboot into mp kernel
> > (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13
> > 13-51-00.png),
> > - the ddb trace output (Screenshot ddb_trace_2022-06-13 14-02-11.png),
> > - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> > - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13
> > 14-04-28.png)
> > - the ddb show registers output (Screenshot ddb_show_registers_at
> > 2022-06-13 14-06-34.png)
> >
> > Due to the nature of the early boot panic, the kernel output is not
> > accessible to us.
> >
> > Interestingly, FreeBSD only supports these machines in its current
> > release; the stable release fails with a similar panic. They seem to
> > have found a fix of sorts. But we very much prefer OpenBSD for the
> > firewalling role of the aforementioned system.
> >
> > Of course we support your effort, so if you need more information
> > about the circumstances, we will happily try to supply it.
>
> The immediate problem is that OpenBSD currently supports a maximum of
> 32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
> 128.  You could try building a GENERIC.MP kernel with this diff after
> booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
> my fellow developers think about bumping MAXCPUS.  Depending on the
> outcome, a snapshot with this change may be available in a few
> days.
>
> I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
> very well but I guess there is only one way to find out...
>
> Cheers,
>
> Mark
>
>
> Index: arch/arm64/include/cpu.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> retrieving revision 1.25
> diff -u -p -r1.25 cpu.h
> --- arch/arm64/include/cpu.h	23 Mar 2022 23:36:35 -0000	1.25
> +++ arch/arm64/include/cpu.h	13 Jun 2022 15:09:32 -0000
> @@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
>  #define CPU_INFO_FOREACH(cii, ci)	for (cii = 0, ci = cpu_info_list; \
>  					    ci != NULL; ci = ci->ci_next)
>  #define CPU_INFO_UNIT(ci)	((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
> -#define MAXCPUS		32
> +#define MAXCPUS		128
>  
>  extern struct cpu_info *cpu_info[MAXCPUS];



-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.14 (Darwin)

iEYEARECAAYFAmKnYesACgkQuLKZPfaiT0iDDgCfXC6QIWGHzkMyWxPKHCaTkYwR
AXUAnjLiJX1RyuqrMejk4AT2s5X99fmi
=pRhT
-----END PGP SIGNATURE-----



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis
> From: kiltz 
> Date: Mon, 13 Jun 2022 14:46:39 +0200

Hi Stefan,

> Dear kind people at OpenBSD.org,
> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30  
> with the following specifications:
> 
>   Ampere Altra Q80-33 processor (80 cores, 3.3 GHz)
>   512 GB RAM (3200 MHz ECC-reg.)
>   2 x 480 GB SSD SATA 6 Gb/s 2.5"
>   Dual-Port 1 GbE (RJ-45)
>   IPMI 2.0 Baseboard Management Controller (BMC)
>   1 x PCIe4.0 x16 (FHHL)
>   1 x PCIe3.0 x16 OCP2.0 (occupied)
>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> 
> We tried both:
> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> 
> The result is reproducible: the install works in single-CPU/core
> mode, but the kernel panics after the first reboot into the MP kernel.
> We use the serial-to-LAN console provided by the IPMI/BMC card.
> Attached you will find screenshots from:
> 
> - the last 49 columns of the reboot into mp kernel  
> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13  
> 13-51-00.png),
> - the ddb trace output (Screenshot ddb_trace_2022-06-13 14-02-11.png),
> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13  
> 14-04-28.png)
> - the ddb show registers output (Screenshot ddb_show_registers_at  
> 2022-06-13 14-06-34.png)
> 
> Due to the nature of the early boot panic, the kernel output is not  
> accessible to us.
> 
> Interestingly, FreeBSD only supports these machines in its current
> release; the stable release fails with a similar panic. They seem to
> have found a fix of sorts. But we very much prefer OpenBSD for the
> firewalling role of the aforementioned system.
> 
> Of course we support your effort, so if you need more information
> about the circumstances, we will happily try to supply it.

The immediate problem is that OpenBSD currently supports a maximum of
32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
128.  You could try building a GENERIC.MP kernel with this diff after
booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
my fellow developers think about bumping MAXCPUS.  Depending on the
outcome, a snapshot with this change may be available in a few
days.

I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
very well but I guess there is only one way to find out...

Cheers,

Mark


Index: arch/arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.25
diff -u -p -r1.25 cpu.h
--- arch/arm64/include/cpu.h	23 Mar 2022 23:36:35 -0000	1.25
+++ arch/arm64/include/cpu.h	13 Jun 2022 15:09:32 -0000
@@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
 #define CPU_INFO_FOREACH(cii, ci)	for (cii = 0, ci = cpu_info_list; \
 					    ci != NULL; ci = ci->ci_next)
 #define CPU_INFO_UNIT(ci)	((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
-#define MAXCPUS		32
+#define MAXCPUS		128
 
 extern struct cpu_info *cpu_info[MAXCPUS];
 
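The limit Mark describes comes down to a table sized at compile time: `cpu_info[MAXCPUS]` has room for at most MAXCPUS entries, so an 80-core Altra cannot fit into 32 slots. Below is a minimal, hypothetical C model of that constraint, not OpenBSD's actual attach code. `cpu_register()`, `cpu_attach_all()` and `cpu_info_store` are invented for illustration, and the sketch makes no claim about the exact code path that panicked on this machine.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical, simplified model (NOT OpenBSD's cpu_attach()): per-CPU
 * state lives in a table sized at compile time, like cpu_info[MAXCPUS].
 */
#define MAXCPUS	128		/* the diff raises this from 32 */

struct cpu_info {
	int	ci_cpuid;		/* logical CPU number */
};

static struct cpu_info cpu_info_store[MAXCPUS];
static struct cpu_info *cpu_info[MAXCPUS];

/*
 * Record CPU `unit` in the table.  Returns 0 on success and -1 if the
 * unit number does not fit; without this check, unit >= MAXCPUS would
 * write past the end of cpu_info[].
 */
static int
cpu_register(int unit)
{
	if (unit < 0 || unit >= MAXCPUS)
		return -1;
	cpu_info_store[unit].ci_cpuid = unit;
	cpu_info[unit] = &cpu_info_store[unit];
	assert(cpu_info[unit]->ci_cpuid == unit);
	return 0;
}

/* Try to register `ncpus` CPUs and return how many actually fit. */
static int
cpu_attach_all(int ncpus)
{
	int attached = 0;

	for (int unit = 0; unit < ncpus; unit++) {
		if (cpu_register(unit) != 0) {
			printf("cpu%d: exceeds MAXCPUS (%d), not attached\n",
			    unit, MAXCPUS);
			continue;
		}
		attached++;
	}
	return attached;
}
```

With MAXCPUS at 128, `cpu_attach_all(80)` registers all 80 CPUs; with the old value of 32 the same call would return 32, which is why a machine like this needs the larger table.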



Re: System upgraded from 7.0 to 7.1 hangs after fs mounts

2022-06-13 Thread Johan Huldtgren
On 2022-05-21 15:33, Johan Huldtgren wrote:
> On 2022/05/21 14:23, Mark Kettenis wrote:
> >> Date: Sat, 21 May 2022 13:13:19 -0400
> >> From: Johan Huldtgren 
> >>
> >> On 2022/05/21 12:43, Mark Kettenis wrote:
> >>>> Date: Sat, 21 May 2022 12:36:03 -0400
> >>>> From: Johan Huldtgren
> >>>>
> >>>> hello,
> >>>>
> >>>> On 2022/05/21 12:08, Mark Kettenis wrote:
> >>>>>> Date: Sat, 21 May 2022 10:31:37 -0400
> >>>>>> From: Johan Huldtgren
> >>>>>>
> >>>>>> hello,
> >>>>>>
> >>>>>> Details below, but commenting out 'ttyflags -a' from /etc/rc lets
> >>>>>> this host boot. I wrote much of this e-mail while going through it,
> >>>>>> so while we know now what the issue is I'm leaving my responses in
> >>>>>> case it sheds light on anything.
> >>>>>
> >>>>> So it seems your machine incorrectly advertises a serial port that
> >>>>> doesn't actually exist:
> >>>>>
> >>>>>> com1 at acpi0 UAR1 addr 0x2f8/0x8 irq 3: ti16750, 64 byte fifo
> >>>>>> com1: probed fifo depth: 0 bytes
> >>>>
> >>>> I think you're right. Crystal asked about it in a previous
> >>>> mail, which I didn't get a chance to respond to, but I do not
> >>>> see com1 being reported in the 7.0 dmesg from last night, nor
> >>>> in any older dmesgs I've been able to dig up, and I don't
> >>>> believe anything about this hardware has changed in the time
> >>>> I've had it.
> >>>>
> >>>>> This may be a bug in our ACPI code.  Can you send the contents of
> >>>>> /var/db/acpi on your machine?
> >>>>
> >>>> [root@www ~]# ls -al /var/db/acpi/
> >>>> total 164
> >>>> drwxr-xr-x   2 root  wheel    512 May 20 21:26 ./
> >>>> drwxr-xr-x  15 root  wheel   1024 May 21 06:10 ../
> >>>> -rw-r--r--   1 root  wheel    146 May 21 06:55 APIC.3
> >>>> -rw-r--r--   1 root  wheel    120 May 21 06:55 DMAR.12
> >>>> -rw-r--r--   1 root  wheel  44470 May 21 06:55 DSDT.2
> >>>> -rw-r--r--   1 root  wheel    244 May 21 06:55 FACP.1
> >>>> -rw-r--r--   1 root  wheel     68 May 21 06:55 FPDT.4
> >>>> -rw-r--r--   1 root  wheel     56 May 21 06:55 HPET.7
> >>>> -rw-r--r--   1 root  wheel     60 May 21 06:55 MCFG.5
> >>>> -rw-r--r--   1 root  wheel    190 May 21 06:55 PRAD.6
> >>>> -rw-r--r--   1 root  wheel     80 Sep 17  2019 RSDT.0
> >>>> -rw-r--r--   1 root  wheel     64 May 21 06:55 SPMI.9
> >>>> -rw-r--r--   1 root  wheel   2468 May 21 06:55 SSDT.10
> >>>> -rw-r--r--   1 root  wheel   2696 May 21 06:55 SSDT.11
> >>>> -rw-r--r--   1 root  wheel    877 May 21 06:55 SSDT.8
> >>>> -rw-r--r--   1 root  wheel    124 May 21 06:55 XSDT.0
> >>>> -rw-r--r--   1 root  wheel   2520 May 21 06:55 headers
> >>>>
> >>>> Do you need the files? I can tar that directory up and
> >>>> make it available.
> >>>
> >>> Right we need all of those.
> >>
> >> http://www.huldtgren.com/panics/20220520/acpi.tgz
> > 
> > It looks as if the ACPI AML is properly checking that the UART is
> > enabled in the NCT6776F SuperIO chip.  Can you build a kernel with the
> > diff below and mail the dmesg from that kernel?
> > 
> > 
> > Index: dev/acpi/acpi.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
> > retrieving revision 1.413
> > diff -u -p -r1.413 acpi.c
> > --- dev/acpi/acpi.c	17 Feb 2022 00:21:40 -0000	1.413
> > +++ dev/acpi/acpi.c	21 May 2022 18:20:20 -0000
> > @@ -3095,6 +3095,7 @@ acpi_foundhid(struct aml_node *node, voi
> > return (0);
> >  
> > sta = acpi_getsta(sc, node->parent);
> > +   printf("_STA: 0x%02llx\n", sta);
> > if ((sta & (STA_PRESENT | STA_ENABLED)) != (STA_PRESENT | STA_ENABLED))
> > return (0);
> >  
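The debug diff prints each device's `_STA` value just before the present-and-enabled test in `acpi_foundhid()`. To help read those `_STA: 0x..` lines, here is a minimal C sketch that decodes the bits as laid out in the ACPI specification's `_STA` (Device Status) definition and applies the same check as the diff. The helper names `sta_attachable()`, `sta_describe()` and `sta_examples()` are hypothetical, as are the macro names other than `STA_PRESENT` and `STA_ENABLED`; the sketch assumes `sta` is a 64-bit integer, as the `%llx` format in the diff suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * _STA (Device Status) bits as defined by the ACPI specification.
 * STA_PRESENT and STA_ENABLED match the names used in acpi_foundhid();
 * the other macro names are chosen here for illustration.
 */
#define STA_PRESENT	(1 << 0)	/* device is present */
#define STA_ENABLED	(1 << 1)	/* enabled and decoding its resources */
#define STA_SHOW_UI	(1 << 2)	/* should be shown in the UI */
#define STA_DEV_OK	(1 << 3)	/* functioning properly */
#define STA_BATTERY	(1 << 4)	/* battery present (battery devices) */

/* Same condition as acpi_foundhid(): attach only if present AND enabled. */
static int
sta_attachable(int64_t sta)
{
	const int64_t need = STA_PRESENT | STA_ENABLED;

	return (sta & need) == need;
}

/* Decode a value printed by the "_STA: 0x%02llx" debug line. */
static void
sta_describe(int64_t sta)
{
	printf("_STA 0x%02llx:%s%s%s%s => %s\n",
	    (unsigned long long)sta,
	    (sta & STA_PRESENT) ? " present" : " not-present",
	    (sta & STA_ENABLED) ? " enabled" : " disabled",
	    (sta & STA_SHOW_UI) ? " show-ui" : "",
	    (sta & STA_DEV_OK) ? " functioning" : "",
	    sta_attachable(sta) ? "attach" : "skip");
}

/* Worked examples: 0x0f has both required bits, 0x0d lacks ENABLED. */
static void
sta_examples(void)
{
	assert(sta_attachable(0x0f));
	assert(!sta_attachable(0x0d));
	assert(!sta_attachable(0x00));
	sta_describe(0x0f);
	sta_describe(0x0d);
}
```

Whether UAR1 on this Supermicro board actually reports both bits is exactly what the debug output is meant to reveal; the excerpt here does not include that value.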

Did this provide any clues as to what is going on? If not, and this
hardware is just odd and the workaround is simply to comment out the
ttyflags line in /etc/rc, I'll add that to my upgrade notes for
this machine.

thanks,

.jh
 
> OpenBSD 7.1-current (GENERIC.MP) #1: Sat May 21 15:19:34 EDT 2022
> jo...@xasthur.home.huldtgren.net:/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17127677952 (16334MB)
> avail mem = 16591237120 (15822MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb4c0 (54 entries)
> bios0: vendor American Megatrends Inc. version "2.00" date 05/08/2012
> bios0: Supermicro X9SCD
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S4 S5
> acpi0: tables DSDT FACP APIC FPDT MCFG PRAD HPET SSDT SPMI SSDT SSDT DMAR
> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) P0P1(S4) USB1(S4) USB2(S4) 
> USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) RP01(S4) PXSX(S4) RP02(S4) 
> PXSX(S4) RP03(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz, 3392.86 MHz, 06-3a-09
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSS