Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-12-14 Thread Laurent CARON

Hi Wouter,

Please keep in mind that the storage controller (Perc H755) is _not_ yet 
supported by OpenBSD.


Cheers,

Laurent

On 14/12/2023 at 15:43, Wouter Prins wrote:

Thank you Laurent and Claudio,

We have an identical setup (hardware specs); I am sure we will need this in 
the near future. :)


/Wouter

On Thu, Dec 14, 2023 at 3:08 PM Claudio Jeker wrote:


On Tue, Nov 28, 2023 at 05:55:03PM +0100, Laurent CARON wrote:
>
> On 28/11/2023 at 17:46, Claudio Jeker wrote:
> > The problem is that the symbol nkmempages moved into .bss and is
> > therefore no longer modifiable by config(8). I think you can still
> > use ukc via boot -c to alter it (but that is not sticky).
> >
> > The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
> > config file (or option NKMEMPAGES_MAX=131072). See also options(4).
> >
> > The long-term solution is to fix this properly. All of this was
> > built when computers had 100MB of memory, not 100GB.
> >
>
> Got it. Thanks.
>
> It means I'll stick with this kernel for now and see if it helps (it
> seems promising for now).
>
> Is there a way you can submit this patch (option NKMEMPAGES=131072)
> to the current branch?

A better calculation logic for nkmempages was added to -current.
On most 64-bit archs nkmempages now scales to much larger values.

See https://marc.info/?l=openbsd-cvs&m=170255507530513&w=2 for more
details.

-- 
:wq Claudio




--
Wouter Prins
w...@null0.nl



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-12-14 Thread Wouter Prins
Thank you Laurent and Claudio,

We have an identical setup (hardware specs); I am sure we will need this in the
near future. :)

/Wouter

On Thu, Dec 14, 2023 at 3:08 PM Claudio Jeker wrote:

> On Tue, Nov 28, 2023 at 05:55:03PM +0100, Laurent CARON wrote:
> >
> > On 28/11/2023 at 17:46, Claudio Jeker wrote:
> > > The problem is that the symbol nkmempages moved into .bss and is
> > > therefore no longer modifiable by config(8). I think you can still
> > > use ukc via boot -c to alter it (but that is not sticky).
> > >
> > > The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
> > > config file (or option NKMEMPAGES_MAX=131072). See also options(4).
> > >
> > > The long-term solution is to fix this properly. All of this was
> > > built when computers had 100MB of memory, not 100GB.
> > >
> >
> > Got it. Thanks.
> >
> > It means I'll stick with this kernel for now and see if it helps (it
> > seems promising for now).
> >
> > Is there a way you can submit this patch (option NKMEMPAGES=131072)
> > to the current branch?
>
> A better calculation logic for nkmempages was added to -current.
> On most 64-bit archs nkmempages now scales to much larger values.
>
> See https://marc.info/?l=openbsd-cvs&m=170255507530513&w=2 for more
> details.
>
> --
> :wq Claudio
>
>

-- 
Wouter Prins
w...@null0.nl


Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-12-14 Thread Claudio Jeker
On Tue, Nov 28, 2023 at 05:55:03PM +0100, Laurent CARON wrote:
> 
> On 28/11/2023 at 17:46, Claudio Jeker wrote:
> > The problem is that the symbol nkmempages moved into .bss and is therefore
> > no longer modifiable by config(8). I think you can still use ukc via
> > boot -c to alter it (but that is not sticky).
> > 
> > The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
> > config file (or option NKMEMPAGES_MAX=131072). See also options(4).
> > 
> > The long-term solution is to fix this properly. All of this was built when
> > computers had 100MB of memory, not 100GB.
> > 
> 
> Got it. Thanks.
> 
> It means I'll stick with this kernel for now and see if it helps (it seems
> promising for now).
> 
> Is there a way you can submit this patch (option NKMEMPAGES=131072) to the
> current branch?

A better calculation logic for nkmempages was added to -current.
On most 64-bit archs nkmempages now scales to much larger values.

See https://marc.info/?l=openbsd-cvs&m=170255507530513&w=2 for more
details.

-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Laurent CARON



On 28/11/2023 at 17:46, Claudio Jeker wrote:

The problem is that the symbol nkmempages moved into .bss and is therefore
no longer modifiable by config(8). I think you can still use ukc via
boot -c to alter it (but that is not sticky).

The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
config file (or option NKMEMPAGES_MAX=131072). See also options(4).

The long-term solution is to fix this properly. All of this was built when
computers had 100MB of memory, not 100GB.



Got it. Thanks.

It means I'll stick with this kernel for now and see if it helps (it 
seems promising for now).


Is there a way you can submit this patch (option NKMEMPAGES=131072) to 
the current branch?


Thanks


Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Claudio Jeker
On Tue, Nov 28, 2023 at 04:50:05PM +0100, Laurent CARON wrote:
> On 28/11/2023 at 12:12, Claudio Jeker wrote:
> > So the problem is that the malloc space is filled by
> > a) 26540K of devbuf -- because of the multiqueue support in ixl
> > b) 63493K of ACPI -- what the heck ACPI?!?
> > and then there is not enough space for rtable. A full table requires
> > in your example 50816K of rtable malloc space.
> > 
> > Now on amd64 all of this needs to fit into 128MB which is impossible.
> > 
> > You can use config(8) and bsd.re-config(5) to adjust the nkmempg variable
> > to something like 131072 (which is 4 times the default size).
> > This can be verified with `sysctl vm.nkmempages`
> > 
> > Now ixl(4) and ACPI should not be such pigs but in the end 128MB of kernel
> > malloc space is just stupidly small on a system with 128GB of memory.
> 
> 
> Hi Claudio,
> 
> Thanks.
> 
> I bumped nkmempg to 131072
> 
> 
> 
> # config -e -o bsd.new /bsd
> 
> ukc> nkmempg 131072
> 
> quit
> 
> 
> 
> Then rebooted with the very same issue.
> 
> It seems the nkmempg variable is not properly taken into account, since
> 'sysctl vm.nkmempages' still shows 32768 after reboot:
> 
> 
> 
> # sysctl vm.nkmempages
> vm.nkmempages=32768
> 
> 
> 
> # config -e -o bsd.new /bsd
> OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
>     r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> Enter 'help' for information
> ukc> nkmempg
> nkmempages = 262144
> 
> Modifying /etc/bsd.re-config and rebooting (twice) didn't help either.
> 
> I had to recompile the kernel (after modifying /usr/src/sys/kern/kern_malloc.c
> with '#define NKMEMPAGES 262144'); the issue has not occurred since.
> 
> Do you recommend using this approach to mitigate the issue, or is there a
> more 'long term' fix?

The problem is that the symbol nkmempages moved into .bss and is therefore
no longer modifiable by config(8). I think you can still use ukc via
boot -c to alter it (but that is not sticky).

The alternative is to set "option NKMEMPAGES=131072" in your GENERIC
config file (or option NKMEMPAGES_MAX=131072). See also options(4).

The long-term solution is to fix this properly. All of this was built when
computers had 100MB of memory, not 100GB.
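
For illustration only, a minimal sketch of what such a config could look
like. It assumes a separate config file that includes GENERIC.MP rather
than an edited GENERIC; the file name CUSTOM.MP and the helper function
are hypothetical and not taken from this thread.

```shell
#!/bin/sh
# Print a kernel configuration that layers a single option on top of
# GENERIC.MP. The include path is relative to /usr/src/sys, in the same
# way GENERIC.MP itself includes GENERIC (assumption: nested includes
# behave the same for a custom file).
custom_kernel_config() {
    pages=$1
    cat <<EOF
include "arch/amd64/conf/GENERIC.MP"

option NKMEMPAGES=${pages}
EOF
}

# Show the configuration for the value suggested above.
custom_kernel_config 131072
```

The output would be saved as something like
/usr/src/sys/arch/amd64/conf/CUSTOM.MP and then built with config(8) and
make in the usual way for a custom kernel; options(4) and the kernel
build documentation are the authoritative references.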

-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Laurent CARON

On 28/11/2023 at 12:12, Claudio Jeker wrote:

So the problem is that the malloc space is filled by
a) 26540K of devbuf -- because of the multiqueue support in ixl
b) 63493K of ACPI -- what the heck ACPI?!?
and then there is not enough space for rtable. A full table requires
in your example 50816K of rtable malloc space.

Now on amd64 all of this needs to fit into 128MB which is impossible.

You can use config(8) and bsd.re-config(5) to adjust the nkmempg variable
to something like 131072 (which is 4 times the default size).
This can be verified with `sysctl vm.nkmempages`

Now ixl(4) and ACPI should not be such pigs but in the end 128MB of kernel
malloc space is just stupidly small on a system with 128GB of memory.



Hi Claudio,

Thanks.

I bumped nkmempg to 131072



# config -e -o bsd.new /bsd

ukc> nkmempg 131072

quit



Then rebooted with the very same issue.

It seems the nkmempg variable is not properly taken into account, since 
'sysctl vm.nkmempages' still shows 32768 after reboot:




# sysctl vm.nkmempages
vm.nkmempages=32768
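
A check like this can be scripted. The following is a minimal sketch,
assuming sysctl(8) prints a single name=value line as shown above; the
helper names sysctl_value and check_nkmempages are hypothetical.

```shell
#!/bin/sh
# Extract the value from a "name=value" line as printed by sysctl(8).
sysctl_value() {
    printf '%s\n' "$1" | sed 's/^[^=]*=//'
}

# Compare the reported page count against the expected one.
# $1 = expected page count, $2 = a line such as "vm.nkmempages=32768".
check_nkmempages() {
    expected=$1
    reported=$(sysctl_value "$2")
    if [ "$reported" -eq "$expected" ]; then
        echo "ok: nkmempages=$reported"
    else
        echo "mismatch: expected $expected, got $reported"
        return 1
    fi
}
```

On the affected machine this could be called after a reboot as
check_nkmempages 131072 "$(sysctl vm.nkmempages)".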



# config -e -o bsd.new /bsd
OpenBSD 7.4 (GENERIC.MP) #0: Sun Oct 22 12:13:42 MDT 2023
    r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
Enter 'help' for information
ukc> nkmempg
nkmempages = 262144


Modifying /etc/bsd.re-config and rebooting (twice) didn't help either.

I had to recompile the kernel (after modifying 
/usr/src/sys/kern/kern_malloc.c with '#define NKMEMPAGES 262144'); the 
issue has not occurred since.


Do you recommend using this approach to mitigate the issue, or is there a 
more 'long term' fix?


Thanks

Laurent



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Claudio Jeker
On Mon, Nov 27, 2023 at 05:51:25PM +0100, Laurent CARON wrote:
> Please find attached the relevant info:
> 
> vmstat-m_SP_with_bgpd -> vmstat -m SP with bgpd
> 
> vmstat-m_SMP_without_bgpd -> vmstat -m SMP without bgpd
> 
> vmstat-m_SMP_with_bgpd_0{01..11} -> vmstat -m SMP with bgpd until crash.
> 
> 
> Thanks
> 
> Laurent
> 
> On 27/11/2023 at 17:10, Claudio Jeker wrote:
> > vmstat -m

So the problem is that the malloc space is filled by
a) 26540K of devbuf -- because of the multiqueue support in ixl
b) 63493K of ACPI -- what the heck ACPI?!?
and then there is not enough space for rtable. A full table requires
in your example 50816K of rtable malloc space.

Now on amd64 all of this needs to fit into 128MB which is impossible.
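
To make the arithmetic explicit, here is a small shell sketch. It assumes
4 KiB pages on amd64, so that 32768 pages (the vm.nkmempages value
reported in this thread) equals 128MB; the variable and function names
are illustrative only.

```shell
#!/bin/sh
# Size of one kernel memory page in KiB (assumption: amd64, 4 KiB pages).
PAGE_KIB=4

# Convert a page count to KiB.
pages_to_kib() {
    echo $(( $1 * $PAGE_KIB ))
}

# malloc usage figures quoted from the vmstat -m output, in KiB.
devbuf=26540
acpi=63493
rtable=50816

needed=$(( devbuf + acpi + rtable ))
default_kib=$(pages_to_kib 32768)
raised_kib=$(pages_to_kib 131072)

echo "needed:  ${needed}K"
echo "default: ${default_kib}K"
echo "raised:  ${raised_kib}K"
```

These three consumers alone add up to more than the 128MB limit, before
any other malloc type is counted.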

You can use config(8) and bsd.re-config(5) to adjust the nkmempg variable
to something like 131072 (which is 4 times the default size).
This can be verified with `sysctl vm.nkmempages`

Now ixl(4) and ACPI should not be such pigs but in the end 128MB of kernel
malloc space is just stupidly small on a system with 128GB of memory.
-- 
:wq Claudio



Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-28 Thread Laurent CARON

Hi Claudio,

Should you need remote access to the server, this is of course possible.

On 27/11/2023 at 17:51, Laurent CARON wrote:


Please find attached the relevant info:

vmstat-m_SP_with_bgpd -> vmstat -m SP with bgpd

vmstat-m_SMP_without_bgpd -> vmstat -m SMP without bgpd

vmstat-m_SMP_with_bgpd_0{01..11} -> vmstat -m SMP with bgpd until crash.


Thanks

Laurent

On 27/11/2023 at 17:10, Claudio Jeker wrote:

vmstat -m


Re: OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-27 Thread Claudio Jeker
On Mon, Nov 27, 2023 at 04:57:35PM +0100, Laurent CARON wrote:
> Hi,
> 
> I'm currently migrating a BGPd server.
> 
> Specs of "old" machine:
> 
> - Dell R720 with Intel(R) Xeon(R) CPU E5-2637 v2 and 16GB RAM
> 
> - SMP Kernel (default)
> 
> - BGPd runs fine with 5 full views
> 
> - X710 NIC (ixl) 4 port interface
> 
> Specs of "new" machine:
> 
> - Dell R750xs with Intel(R) Xeon(R) Gold 6334 CPU @ 3.60GHz and 128GB RAM
> 
> - SMP Kernel (default)
> 
> - X710 NIC (ixl) 2 nics with 2 ports each
> 
> - BGPd crashes with "panic: malloc: out of space in kmem_map" (please see
> screenshot).
> 
> - When launching 'bgpd -dv' on the console, logs are showing:
> 
> send_rtmsg: action 1, prefix 179.62.148.0/24: No buffer space available
> send_rtmsg: action 1, prefix 176.59.72.0/23: No buffer space available
> send_rtmsg: action 1, prefix 176.59.70.0/23: No buffer space available
> send_rtmsg: action 1, prefix 176.59.74.0/23: No buffer space available
> send_rtmsg: action 1, prefix 185.78.92.0/22: No buffer space available
> send_rtmsg: action 1, prefix 176.59.64.0/23: No buffer space available
> send_rtmsg: action 1, prefix 176.59.66.0/23: No buffer space available
> 
> .
> 
> send_rtmsg: action 1, prefix 31.132.21.0/24: No buffer space available
> send_rtmsg: action 1, prefix 38.94.167.0/24: No buffer space available
> 
> then the machine crashes after having processed a few thousand prefixes.
> 
> When using the SP (boot /bsd.sp) kernel, the issue doesn't arise.
> 
> Do you have any pointers for solving this issue?

Please send vmstat -m output of the affected machine.
The problem is probably the multiqueue support in ixl(4) that consumes too
much memory.
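
To spot the largest consumers quickly, the output can be filtered. This
is a rough sketch that assumes the "Memory statistics by type" section
lists each type as "Type InUse MemUse ..." with MemUse written like
26540K; the function name top_malloc_types is hypothetical.

```shell
#!/bin/sh
# Read vmstat -m output on stdin and print the N largest malloc types
# by MemUse (default 5), as "KiB<TAB>type", largest first.
top_malloc_types() {
    awk '
    {
        # MemUse is taken to be the first field of the form <digits>K
        # that directly follows a plain integer (the InUse column).
        for (i = 3; i <= NF; i++) {
            if ($i ~ /^[0-9]+K$/ && $(i - 1) ~ /^[0-9]+$/) {
                # Everything before InUse is the type name; it may
                # contain spaces.
                name = $1
                for (j = 2; j <= i - 2; j++) name = name " " $j
                kib = $i
                sub(/K$/, "", kib)
                print kib "\t" name
                break
            }
        }
    }' | sort -rn | head -n "${1:-5}"
}
```

Usage would be along the lines of: vmstat -m | top_malloc_types 5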

-- 
:wq Claudio



OpenBSD SMP - BGPd - send_rtmsg: action 1, prefix A.B.C.D/24: No buffer space available - panic: malloc: out of space in kmem_map

2023-11-27 Thread Laurent CARON

Hi,

I'm currently migrating a BGPd server.

Specs of "old" machine:

- Dell R720 with Intel(R) Xeon(R) CPU E5-2637 v2 and 16GB RAM

- SMP Kernel (default)

- BGPd runs fine with 5 full views

- X710 NIC (ixl) 4 port interface

Specs of "new" machine:

- Dell R750xs with Intel(R) Xeon(R) Gold 6334 CPU @ 3.60GHz and 128GB RAM

- SMP Kernel (default)

- X710 NIC (ixl) 2 nics with 2 ports each

- BGPd crashes with "panic: malloc: out of space in kmem_map" (please 
see screenshot).


- When launching 'bgpd -dv' on the console, logs are showing:

send_rtmsg: action 1, prefix 179.62.148.0/24: No buffer space available
send_rtmsg: action 1, prefix 176.59.72.0/23: No buffer space available
send_rtmsg: action 1, prefix 176.59.70.0/23: No buffer space available
send_rtmsg: action 1, prefix 176.59.74.0/23: No buffer space available
send_rtmsg: action 1, prefix 185.78.92.0/22: No buffer space available
send_rtmsg: action 1, prefix 176.59.64.0/23: No buffer space available
send_rtmsg: action 1, prefix 176.59.66.0/23: No buffer space available

.

send_rtmsg: action 1, prefix 31.132.21.0/24: No buffer space available
send_rtmsg: action 1, prefix 38.94.167.0/24: No buffer space available

then the machine crashes after having processed a few thousand prefixes.

When using the SP (boot /bsd.sp) kernel, the issue doesn't arise.

Do you have any pointers for solving this issue?

Thanks