Re: increase softint_bytes

2017-11-24 Thread 6bone

On Wed, 22 Nov 2017, Masanobu SAITOH wrote:


Date: Wed, 22 Nov 2017 11:41:59 +0900
From: Masanobu SAITOH <msai...@execsw.org>
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: msai...@execsw.org, tech-kern@NetBSD.org
Subject: Re: increase softint_bytes

On 2017/11/21 15:35, Masanobu SAITOH wrote:

On 2017/11/20 17:28, Masanobu SAITOH wrote:

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

 http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


Updated patch

 http://www.netbsd.org/~msaitoh/vlan-20171121-0.dif

 Fix compile error (sorry)

 Revert if_wmreg.h 1.104 and if_wm.c 1.542


Committed in -current.


-current now boots without error on my server.

Regards
Uwe


Re: increase softint_bytes

2017-11-23 Thread Thor Lancelot Simon
On Thu, Nov 16, 2017 at 10:25:54PM +0100, Jarom??r Dole??ek wrote:
> 
> If I count correctly, with current 8192 bytes the system supports some 100
> softints, which seems to be on quite low side - more advanced hardware and
> drivers usually use queue and softint per cpu, so it can quickly run out on
> system with e.g. 32 cores.

Indeed, the same driver (ixg) used to routinely run my FreeBSD systems
out of their equivalent to a softint -- 2 dual-port cards, 8-core CPU, and
*wham*.  I believe this was adjusted over there so that they allocate
based on the number of cores with a cap set by total RAM.  And a higher
default.

Thor


Re: increase softint_bytes

2017-11-22 Thread Masanobu SAITOH

On 2017/11/21 15:35, Masanobu SAITOH wrote:

On 2017/11/20 17:28, Masanobu SAITOH wrote:

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

 http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


Updated patch

 http://www.netbsd.org/~msaitoh/vlan-20171121-0.dif

 Fix compile error (sorry)

 Revert if_wmreg.h 1.104 and if_wm.c 1.542


 Committed in -current.


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-21 Thread Masanobu SAITOH

On 2017/11/21 17:03, matthew green wrote:

   It'll take a little time to write this change. And, it's low level
and important code, so it will take a time to test the stability
before sending pullup request. So,

   0) Apply the following change to -current.


Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.43
diff -u -p -r1.43 kern_softint.c
--- kern_softint.c  4 Jul 2016 04:20:14 -   1.43
+++ kern_softint.c  21 Nov 2017 06:41:35 -
@@ -217,7 +217,7 @@ typedef struct softcpu {
   
   static void	softint_thread(void *);
   
-u_int		softint_bytes = 8192;

+u_int  softint_bytes = 16384;
   u_intsoftint_timing;
   static u_int softint_max;
   static kmutex_t  softint_lock;


   1) Sent the pullup request to netbsd-8

   2) Write auto-resize code and commit.

   3) If it's stable, send the pullup request to netbsd-8.


sounds good to me.  i would even be ok with a switch
to 16k or even 32k (platform basis?)) for netbsd-8 or


 Committed with 32K.

 Thanks!



earlier right now and allow the resize code to live in
-current until -9.

thanks!


.mrg.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


re: increase softint_bytes

2017-11-21 Thread matthew green
>   It'll take a little time to write this change. And, it's low level
> and important code, so it will take a time to test the stability
> before sending pullup request. So,
> 
>   0) Apply the following change to -current.
> 
> 
> Index: kern_softint.c
> ===
> RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
> retrieving revision 1.43
> diff -u -p -r1.43 kern_softint.c
> --- kern_softint.c4 Jul 2016 04:20:14 -   1.43
> +++ kern_softint.c21 Nov 2017 06:41:35 -
> @@ -217,7 +217,7 @@ typedef struct softcpu {
>   
>   static void softint_thread(void *);
>   
> -u_intsoftint_bytes = 8192;
> +u_intsoftint_bytes = 16384;
>   u_int   softint_timing;
>   static u_intsoftint_max;
>   static kmutex_t softint_lock;
> 
> 
>   1) Sent the pullup request to netbsd-8
> 
>   2) Write auto-resize code and commit.
> 
>   3) If it's stable, send the pullup request to netbsd-8.

sounds good to me.  i would even be ok with a switch
to 16k or even 32k (platform basis?)) for netbsd-8 or
earlier right now and allow the resize code to live in
-current until -9.

thanks!


.mrg.


Re: increase softint_bytes

2017-11-20 Thread Martin Husemann
On Tue, Nov 21, 2017 at 03:45:26PM +0900, Masanobu SAITOH wrote:
>  0) Apply the following change to -current.
> 
> 
> Index: kern_softint.c
> ===
> RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
> retrieving revision 1.43
> diff -u -p -r1.43 kern_softint.c
> --- kern_softint.c4 Jul 2016 04:20:14 -   1.43
> +++ kern_softint.c21 Nov 2017 06:41:35 -
> @@ -217,7 +217,7 @@ typedef struct softcpu {
>  static void  softint_thread(void *);
> -u_intsoftint_bytes = 8192;
> +u_intsoftint_bytes = 16384;
>  u_intsoftint_timing;
>  static u_int softint_max;
>  static kmutex_t  softint_lock;
> 
> 
>  1) Sent the pullup request to netbsd-8
> 
>  2) Write auto-resize code and commit.
> 
>  3) If it's stable, send the pullup request to netbsd-8.
> 
> 
>  OK?

Sounds like a great plan!

Martin


Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH



can't this be fixed by making it dynamic?


  It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


  I'm sorry. I misread kern_softint.c. The return value is not directly point to
the area, but it's offset of the area, so it would be easy to resize it.

  I'll try to modify the code to do auto-resize.


 It'll take a little time to write this change. And, it's low level
and important code, so it will take a time to test the stability
before sending pullup request. So,

 0) Apply the following change to -current.


Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.43
diff -u -p -r1.43 kern_softint.c
--- kern_softint.c  4 Jul 2016 04:20:14 -   1.43
+++ kern_softint.c  21 Nov 2017 06:41:35 -
@@ -217,7 +217,7 @@ typedef struct softcpu {
 
 static void	softint_thread(void *);
 
-u_int		softint_bytes = 8192;

+u_int  softint_bytes = 16384;
 u_int  softint_timing;
 static u_int   softint_max;
 static kmutex_tsoftint_lock;


 1) Sent the pullup request to netbsd-8

 2) Write auto-resize code and commit.

 3) If it's stable, send the pullup request to netbsd-8.


 OK?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-17 Thread 6bone

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Re: increase softint_bytes

2017-11-17 Thread Masanobu SAITOH

Hi, all.

On 2017/11/17 15:35, Masanobu SAITOH wrote:

Hi, mrg.

On 2017/11/17 5:05, matthew green wrote:

Masanobu SAITOH writes:

   Hi, all.

   Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?


can't this be fixed by making it dynamic?


  It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


 I'm sorry. I misread kern_softint.c. The return value is not directly point to
the area, but it's offset of the area, so it would be easy to resize it.

 I'll try to modify the code to do auto-resize.






 ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.


The code is:

ip_init(void)
{
    const struct protosw *pr;

    in_init();
    sysctl_net_inet_ip_setup(NULL);

    pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
    KASSERT(pr != NULL);

    ip_pktq = pktq_create(IFQ_MAXLEN, ipintr, NULL);
    KASSERT(ip_pktq != NULL); <== This.






the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.


  I think this KASSERT() is reasonable.


  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.


bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
pseudo interfaces which is created by ifconfig command allocate
softint. Some of them may be created tens, hundreds or more.
So I think it's worth to check the current number.




thanks.


.mrg.







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-17 Thread Manuel Bouyer
On Fri, Nov 17, 2017 at 03:35:48PM +0900, Masanobu SAITOH wrote:
> bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
> pseudo interfaces which is created by ifconfig command allocate
> softint. Some of them may be created tens, hundreds or more.
> So I think it's worth to check the current number.

I have dom0 Xen with about 50 vlans and 50 bridges at work.
They're stuck to netbsd-6 because netbsd-7 runs out of softints
with that many interfaces.

As I understand it, things changed in this area in netbsd-8 and 
it should support that many vlans/bridges, but don't put the barrier too low.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: increase softint_bytes

2017-11-17 Thread Martin Husemann
On Fri, Nov 17, 2017 at 03:43:57PM +0900, Masanobu SAITOH wrote:
>  Each driver which call softint_establish() keep the return value, so
> it's very hard without big modification.

We could make the "void *cookie" returned by softint_establish(9) an
index (internally) instead (w/o changing ABI or API), so the value
would stay valid on reallocations of the array.

The realllocation could be handled lazy per cpu (that is: note the new
target size globaly, have the local cpu handle the reallocation at
next time a softint is established/disesatablished/scheduled).

But maybe I'm overlooking something.

Martin


Re: increase softint_bytes

2017-11-16 Thread Jaromír Doleček
softint_establish() already fails if there is no space, and prints a
WARNING in this case. The caller however needs to check for the failure,
which ixgbe(4) mostly doesn't. Not sure if that is really the root cause,
the panic seems to be actually in different area, and I didn't actually see
the warning in the dmesg.

Making softint size fully dynamic is unfortunately not straighforward, it
would need to reallocate the softint memory on all CPUs in a way that any
serviced softints would continue working during this time. It would be
simple however to adjust it based on % of physical memory, instead of fixed
value.

If I count correctly, with current 8192 bytes the system supports some 100
softints, which seems to be on quite low side - more advanced hardware and
drivers usually use queue and softint per cpu, so it can quickly run out on
system with e.g. 32 cores.

I agree that the sysctl seems somewhat unnecessary, since the limit is only
relevant during boot anyway, so it's not very helpful.

Jaromir

2017-11-16 21:05 GMT+01:00 matthew green :

> Masanobu SAITOH writes:
> >   Hi, all.
> >
> >   Some device drivers now allocate a lot of softints.
> > See:
> >
> >   http://mail-index.netbsd.org/current-users/2017/11/09/
> msg032581.html
> >
> > To avoid this panic, I wrote the following patch:
> >
> >   http://www.netbsd.org/~msaitoh/softint-20171116-0.dif
> >
> > Summary:
> >
> >   - Increase the default size from 8192 bytes to 32768 bytes.
> >   - Add new option SOFTINT_BYTES to change the value.
> >   - Add two new read-only sysctls kern.softint.{max,count}
> >
> > Any comment?
>
> can't this be fixed by making it dynamic?  ie, fix it properly so
> that it never has this problem again -- if in 10 years we're adding
> 5x as many softints as today, we'll hit it again.
>
> i also don't see why the crash can't be fixed to become an error.
> the code tries to avoid adding more than it can, but soemthing must
> be wrong for the crash to occur anyway.  ie, at worst we'd fix it
> to return an error instead of crashing.
>
> if that is done, the sysctl exports seem unneccesary as well.
>
> thanks.
>
>
> .mrg.
>


Re: increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

On 2017/11/17 6:25, Jaromír Doleček wrote:

softint_establish() already fails if there is no space, and prints a WARNING in 
this case. The caller however needs to check for the failure, which ixgbe(4) 
mostly doesn't.


No. ixgbe() checks the return value of if_initialize() and return
correctly. That panic is not in ixgbe(4) driver but in ip_input.c::ip_init().


 Not sure if that is really the root cause, the panic seems to be actually in 
different area, and I didn't actually see the warning in the dmesg.


Making softint size fully dynamic is unfortunately not straighforward, it would 
need to reallocate the softint memory on all CPUs in a way that any serviced 
softints would continue working during this time.


 Each driver which call softint_establish() keep the return value, so
it's very hard without big modification.


It would be simple however to adjust it based on % of physical memory, instead 
of fixed value.


It might be good, but it's only small area, so fixed value might be OK.
And not so many people create a lot of pseudo interface, so I think it's
OK to make the max value as a kernel option to increase.



If I count correctly, with current 8192 bytes the system supports some 100 
softints, which seems to be on quite low side - more advanced hardware and 
drivers usually use queue and softint per cpu, so it can quickly run out on 
system with e.g. 32 cores.

I agree that the sysctl seems somewhat unnecessary, since the limit is only 
relevant during boot anyway,


As I wrote, it's incorrect because ifconfig xxxN create allocates softint.



so it's not very helpful.
Jaromir

2017-11-16 21:05 GMT+01:00 matthew green >:

Masanobu SAITOH writes:
>   Hi, all.
>
>   Some device drivers now allocate a lot of softints.
> See:
>
>   http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html 

>
> To avoid this panic, I wrote the following patch:
>
>   http://www.netbsd.org/~msaitoh/softint-20171116-0.dif 

>
> Summary:
>
>       - Increase the default size from 8192 bytes to 32768 bytes.
>       - Add new option SOFTINT_BYTES to change the value.
>       - Add two new read-only sysctls kern.softint.{max,count}
>
> Any comment?

can't this be fixed by making it dynamic?  ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.
the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.

thanks.


.mrg.





--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

Hi, mrg.

On 2017/11/17 5:05, matthew green wrote:

Masanobu SAITOH writes:

   Hi, all.

   Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?


can't this be fixed by making it dynamic?


 It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


 ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.


The code is:

ip_init(void)
{
const struct protosw *pr;

in_init();
sysctl_net_inet_ip_setup(NULL);

pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
KASSERT(pr != NULL);

ip_pktq = pktq_create(IFQ_MAXLEN, ipintr, NULL);
KASSERT(ip_pktq != NULL); <== This.






the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.


 I think this KASSERT() is reasonable.


  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.


bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
pseudo interfaces which is created by ifconfig command allocate
softint. Some of them may be created tens, hundreds or more.
So I think it's worth to check the current number.




thanks.


.mrg.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


re: increase softint_bytes

2017-11-16 Thread matthew green
Masanobu SAITOH writes:
>   Hi, all.
> 
>   Some device drivers now allocate a lot of softints.
> See:
> 
>   http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html
> 
> To avoid this panic, I wrote the following patch:
> 
>   http://www.netbsd.org/~msaitoh/softint-20171116-0.dif
> 
> Summary:
> 
>   - Increase the default size from 8192 bytes to 32768 bytes.
>   - Add new option SOFTINT_BYTES to change the value.
>   - Add two new read-only sysctls kern.softint.{max,count}
> 
> Any comment?

can't this be fixed by making it dynamic?  ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.
the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.

thanks.


.mrg.