Re: timeout.9: document kclock timeouts + a bunch of other changes

2021-06-23 Thread Jason McIntyre
On Wed, Jun 23, 2021 at 06:57:00PM -0500, Scott Cheloha wrote:
> Hi,
> 
> I want to document kclock timeouts so others can use them.
> 

morning. reads fine, except one issue:

> 
> Index: share/man/man9/timeout.9
> ===
> RCS file: /cvs/src/share/man/man9/timeout.9,v
> retrieving revision 1.53
> diff -u -p -r1.53 timeout.9
> --- share/man/man9/timeout.9  11 May 2021 13:29:25 -  1.53
> +++ share/man/man9/timeout.9  23 Jun 2021 23:39:04 -
> @@ -44,7 +44,7 @@
>  .Nm timeout_triggered ,
>  .Nm TIMEOUT_INITIALIZER ,
>  .Nm TIMEOUT_INITIALIZER_FLAGS
> -.Nd execute a function after a specified period of time
> +.Nd execute a function in the future
>  .Sh SYNOPSIS
>  .In sys/types.h
>  .In sys/timeout.h
> @@ -55,12 +55,13 @@
>  .Fa "struct timeout *to"
>  .Fa "void (*fn)(void *)"
>  .Fa "void *arg"
> +.Fa "int kclock"
>  .Fa "int flags"
>  .Fc
>  .Ft void
>  .Fn timeout_set_proc "struct timeout *to" "void (*fn)(void *)" "void *arg"
>  .Ft int
> -.Fn timeout_add "struct timeout *to" "int ticks"
> +.Fn timeout_add "struct timeout *to" "int nticks"
>  .Ft int
>  .Fn timeout_del "struct timeout *to"
>  .Ft int
> @@ -83,174 +84,218 @@
>  .Fn timeout_add_usec "struct timeout *to" "int usec"
>  .Ft int
>  .Fn timeout_add_nsec "struct timeout *to" "int nsec"
> -.Fn TIMEOUT_INITIALIZER "void (*fn)(void *)" "void *arg"
> -.Fn TIMEOUT_INITIALIZER_FLAGS "void (*fn)(void *)" "void *arg" "int flags"
> +.Ft int
> +.Fn timeout_in_nsec "struct timeout *to" "uint64_t nsecs"
> +.Ft int
> +.Fn timeout_at_ts "struct timeout *to" "const struct timespec *abs"
> +.Fo TIMEOUT_INITIALIZER
> +.Fa "void (*fn)(void *)"
> +.Fa "void *arg"
> +.Fc
> +.Fo TIMEOUT_INITIALIZER_FLAGS
> +.Fa "void (*fn)(void *)"
> +.Fa "void *arg"
> +.Fa "int kclock"
> +.Fa "int flags"
> +.Fc
>  .Sh DESCRIPTION
>  The
>  .Nm timeout
> -API provides a mechanism to execute a function at a given time.
> -The granularity of the time is limited by the granularity of the
> -.Xr hardclock 9
> -timer which executes
> -.Xr hz 9
> -times a second.
> +API provides a mechanism to schedule an arbitrary function for asynchronous
> +execution in the future.
>  .Pp
> -It is the responsibility of the caller to provide these functions with
> -pre-allocated timeout structures.
> +All state is encapsulated in a caller-allocated timeout structure
> +.Pq hereafter, a Qo timeout Qc .
> +A timeout must be initialized before it may be used as input to other
> +functions in the API.
>  .Pp
>  The
>  .Fn timeout_set
> -function prepares the timeout structure
> -.Fa to
> -to be used in future calls to
> -.Fn timeout_add
> -and
> -.Fn timeout_del .
> -The timeout will be prepared to call the function specified by the
> +function initializes the timeout
> +.Fa to .
> +When executed,
> +the timeout will call the function
>  .Fa fn
> -argument with a
> -.Fa void *
> -argument given in the
> +with
>  .Fa arg
> -argument.
> -Once initialized, the
> -.Fa to
> -structure can be used repeatedly in
> -.Fn timeout_add
> -and
> -.Fn timeout_del
> -and does not need to be reinitialized unless
> -the function called and/or its argument must change.
> +as its first parameter.
> +The timeout is implicitly scheduled against the
> +.Dv KCLOCK_NONE
> +clock and is not configured with any additional flags.
>  .Pp
>  The
>  .Fn timeout_set_flags
>  function is similar to
> -.Fn timeout_set
> -but it additionally accepts the bitwise OR of zero or more of the
> -following
> +.Fn timeout_set ,
> +except that it takes two additional parameters:
> +.Bl -tag -width kclock
> +.It Fa kclock
> +The timeout is scheduled against the given
> +.Fa kclock,

you need a space between kclock and the comma

jmc

> +which must be one of the following:
> +.Bl -tag -width KCLOCK_UPTIME
> +.It Dv KCLOCK_NONE
> +Low resolution tick-based clock.
> +The granularity of this clock is limited by the
> +.Xr hardclock 9 ,
> +which executes roughly
> +.Xr hz 9
> +times per second.
> +.It Dv KCLOCK_UPTIME
> +The uptime clock.
> +Counts the time elapsed since the system booted.
> +.El
> +.It Fa flags
> +The timeout's behavior may be configured with the bitwise OR of
> +zero or more of the following
>  .Fa flags :
> -.Bl -tag -width TIMEOUT_PROC -offset indent
> +.Bl -tag -width TIMEOUT_PROC
>  .It Dv TIMEOUT_PROC
> -Runs the timeout in a process context instead of the default
> +Execute the timeout in a process context instead of the default
>  .Dv IPL_SOFTCLOCK
>  interrupt context.
>  .El
> +.El
>  .Pp
>  The
>  .Fn timeout_set_proc
> -function is similar to
> +function is equivalent to
> +.Fn timeout_set ,
> +except that the given timeout is configured with the
> +.Dv TIMEOUT_PROC
> +flag.
> +.Pp
> +A timeout may also be initialized statically.
> +The
> +.Fn TIMEOUT_INITIALIZER
> +macro is equivalent to the
>  .Fn timeout_set
> -but it runs the timeout in a process context instead of the default
> -.Dv IPL_SOFTCLOCK
> -interrupt context.
> +function,
> +and the
> +.Fn 

Re: if_etherbridge.c vs. parallel forwarding

2021-06-23 Thread David Gwynne
On Sat, Jun 19, 2021 at 12:32:04AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> skip reading if you are not interested in L2 switching combined
> with bluhm's diff [1], which enables parallel forwarding.
> 
> Hrvoje gave it a try and soon discovered some panics.  The diff below
> fixes a panic with the following stack trace:

nice.

> login: panic: kernel diagnostic assertion "smr->smr_func == NULL"\
> failed: file "/home/sasha/src.sashan/sys/kern/kern_smr.c", line 247
> Stopped at  db_enter+0x10:  popq%rbp
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>  168970  15734  0 0x14000  0x2003  softnet
>   92200  58362  0 0x14000  0x2005  softnet
>  195539  36092  0 0x14000  0x2002  softnet
> *162819  18587  0 0x14000  0x2001  softnet
> db_enter() at db_enter+0x10
> panic(81e0483c) at panic+0xbf
> __assert(81e6d854,81e6b008,f7,81e6b033) at __assert+0x2b
> smr_call_impl(fd83936c7160,810ef0f0,fd83936c7100,0) \
> at smr_call_impl+0xd4
> veb_port_input(80082048,fd80cccaef00,90e2ba33b4a1,8015f900)\
> at veb_port_input+0x2fa
> ether_input(80082048,fd80cccaef00) at ether_input+0xf5
> if_input_process(80082048,800022c62388) at if_input_process+0x6f
> ifiq_process(80082458) at ifiq_process+0x69
> taskq_thread(8002f080) at taskq_thread+0x81
> end trace frame: 0x0, count: 6
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> 
> Hrvoje knows all details [2] how to wire things up to trigger the
> crash. I'm just using all HW and scripts he kindly provided me
> to reproduce those panics reliably.
> 
> I think the crash comes from the combination of SMR and reference
> counting done with atomic ops. Let's assume two CPUs
> are trying to update the same entry.
> 
> Let's assume one CPU (cpu0) just found oebe using ebl_find() at line 311:
> 
> 310 smr_read_enter();
> 311 oebe = ebl_find(ebl, eba);
> 312 if (oebe == NULL)
> 313 new = 1;
> 314 else {
> 315 if (oebe->ebe_age != now)
> 316 oebe->ebe_age = now;
> 317 
> 318 /* does this entry need to be replaced? */
> 319 if (oebe->ebe_type == EBE_DYNAMIC &&
> 320 !eb_port_eq(eb, oebe->ebe_port, port)) {
> 321 new = 1;
> 322 ebe_take(oebe);
> 323 } else
> 
> A few ticks later the other CPU (cpu1) runs ahead. It has just removed
> the same entry found by cpu0 at line 360:
> 353 mtx_enter(&eb->eb_lock);
> 354 num = eb->eb_num + (oebe == NULL);
> 355 if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {
> 356 /* we won, do the update */
> 357 ebl_insert(ebl, nebe);
> 358 
> 359 if (oebe != NULL) {
> 360 ebl_remove(ebl, oebe);
> 361 ebt_replace(eb, oebe, nebe);
> 362 
> 
> let's further assume cpu1 reaches line 389:
> 383 if (oebe != NULL) {
> 384 /*
> 385  * the old entry could be referenced in
> 386  * multiple places, including an smr read
> 387  * section, so release it properly.
> 388  */
> 389 ebe_rele(oebe);
> 390 }
> before cpu0 reaches line 322 (ebe_take()). If that happens, cpu1 drops
> the last reference to oebe, which is in fact shared between cpu1 and cpu0.
> If cpu1 sees the reference count is zero, it does smr_call(), which schedules
> ebe_free() on oebe, so oebe can be freed when cpu0 is done with it.
> 
> A few ticks later cpu0 reaches line 322 and takes its reference to oebe.
> cpu0 then enters the critical section and sees it lost the race (because
> ebt_insert() != oebe). cpu0 continues at line 389. It drops the last
> reference and calls smr_call(). That trips the assert, because ebe_free()
> was already scheduled by cpu1.
> 
> The diff below fixes the flaw by introducing a `cebe` (conflicting ebe)
> local variable:

i think your real fix is where you stop taking an oebe reference
from the smr critical section.

> 345 mtx_enter(&eb->eb_lock);
> 346 num = eb->eb_num + (oebe == NULL);
> 347 cebe = NULL;
> 348 if (num <= eb->eb_max) {
> 349 cebe = ebt_insert(eb, nebe);
> 350 
> 351 if (cebe == NULL) {
> 352 /* nebe got inserted without conflict */
> 353 eb->eb_num++;
> 354 ebl_insert(ebl, nebe);
> 355 nebe = NULL;
> 356 } else if ((oebe != NULL) && (oebe == cebe)) {
> 357 /* we won, do the update */
> 358 ebl_insert(ebl, nebe);
> 359 

timeout.9: document kclock timeouts + a bunch of other changes

2021-06-23 Thread Scott Cheloha
Hi,

I want to document kclock timeouts so others can use them.

I think timeout.9 itself could use some cleanup, both to reflect the
current state of the API and to improve the documentation in general.

This patch includes two small API changes:

1. Combine timeout_set_kclock() and timeout_set_flags(9) into a single
   function, timeout_set_flags(9).  There are no callers for
   timeout_set_flags(9).  There is only one caller for
   timeout_set_kclock().  The caller is updated in the patch.

2. Combine TIMEOUT_INITIALIZER_KCLOCK() and TIMEOUT_INITIALIZER_FLAGS(9)
   into a single macro, TIMEOUT_INITIALIZER_FLAGS(9).  There are no users
   for either macro.

Before we get to the changelist, there are a couple naming things
still nagging at me that I'd like feedback (bikeshedding) on before
the kclock timeout API documentation is committed and, you know,
frozen.  Feel free to weigh in on these:

- Maybe "KCLOCK_TICK" is a better name for the tick-based schedule
  than "KCLOCK_NONE"?

- Maybe "timeout_rel_nsec(9)" is a better function name than
  "timeout_in_nsec(9)"?  The former spells out that the interface
  schedules relative timeouts.  I'm worried the latter is overly
  clever.

  There are no callers for timeout_in_nsec().  A name change would
  be trivial.

- Maybe "timeout_abs_ts(9)" is a better name than "timeout_at_ts(9)"?
  The former spells out that the interface schedules absolute timeouts.
  I'm worried the latter is overly clever.

  There is only one caller for timeout_at_ts().  Easy to change.

- It'd save a little space if we truncated the external flags
  namespace from "TIMEOUT_" to "TO_".  My thinking is with a shorter
  prefix you could fit more flags on a line.  Think:

timeout_set_flags(..., TO_PROC | TO_MPSAFE | TO_FOO)

  versus:

timeout_set_flags(..., TIMEOUT_PROC | TIMEOUT_MPSAFE | TIMEOUT_FOO);

  We would then change "TIMEOUT_PROC" to "TO_PROC".  There are no
  users of the flag outside of kern_timeout.c, so the change would
  be trivial.
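For concreteness, here is a sketch of how the patched API would read from a
driver.  The names (timeout_set_flags() with a kclock argument, KCLOCK_UPTIME,
TIMEOUT_PROC, timeout_in_nsec()) are as they stand in this patch and may still
change per the bikeshedding above; foo_softc and foo_timeout_fn are
hypothetical.

```c
/*
 * Sketch only: uses the API names as proposed in this patch,
 * which may yet be renamed.
 */
struct foo_softc {
	struct timeout sc_tmo;
};

void
foo_attach(struct foo_softc *sc)
{
	/* high-resolution timeout against the uptime clock, run in
	 * process context */
	timeout_set_flags(&sc->sc_tmo, foo_timeout_fn, sc,
	    KCLOCK_UPTIME, TIMEOUT_PROC);

	/* fire roughly 500ms from now */
	timeout_in_nsec(&sc->sc_tmo, 500 * 1000 * 1000ULL);
}
```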

--

Anyway, here is the list of changes to timeout.9 from top to bottom
with some notes where appropriate to explain my thinking:

- Change the short summary to "in the future".  The phrase "after a
  specified period of time" is inaccurate and a little clunky.

- Update parameters for timeout_set_flags(9).

- Add timeout_in_nsec(9) and timeout_at_ts(9) to the SYNOPSIS.

- Update parameters for TIMEOUT_INITIALIZER_FLAGS(9).

- Reuse the new short summary in the first paragraph of the DESCRIPTION.
  Move talk about granularity limitations down to the new description
  of KCLOCK_NONE.  Note that the execution is asynchronous.

- In paragraph 2 of the DESCRIPTION, talk a bit about the timeout
  object and what it is for.  Talk about how a timeout must be
  initialized before you can use it.

- ... we can then simplify the description of timeout_set(9) in the
  next paragraph.  Note what timeout_set(9) now does implicitly.

- Update documentation for timeout_set_flags(9).  Talk about the
  available kclocks.  Move the flag list into a sublist.

- Tweak the timeout_set_proc(9) paragraph to refer to the TIMEOUT_PROC
  flag instead of duplicating a description of what the flag does from
  earlier.

- Move the description of the static initialization macros up to
  this point in the document.  I think they should be documented
  immediately after the runtime initialization functions.

- Add a paragraph explaining that the functions used to schedule the
  timeout vary with the timeout's kclock before talking about the
  scheduling functions.  We need to note this.  It's an error to
  mix and match functions.

- Simplify the description of timeout_add(9).  Only talk about what
  it does.  Move rules about reinitializing a scheduled timeout
  further down.  Move details about rescheduling a timeout further
  down.

- Move the description of timeout_add_sec(9) etc. up to immediately
  follow the timeout_add(9) paragraph.  They are wrapper functions,
  they are relevant *here*.

- Add a paragraph describing timeout_in_nsec(9) and timeout_at_ts(9).

- ... now that we are done describing the scheduling interfaces, talk
  about the rules for reinitializing a timeout.

- ... and then talk about how rescheduling a timeout works.

- Clean up the cancellation function documentation a bit (timeout_del,
  timeout_del_barrier, timeout_barrier).  Prefer "blocks" to "waits", as
  the latter is a bit ambiguous.

- Note when a timeout's pending status is cleared in the
  timeout_pending(9) paragraph. 

- Add a caveat that the underlying memory needs to be zeroed to
  the timeout_initialized(9) paragraph.

- Make the timeout_triggered(9) paragraph a bit more general.
  "Rescheduling" instead of "timeout_add(9)", "cancelling" instead of
  "timeout_del(9)". 

- Add timeout_in_nsec(9) and timeout_at_ts(9) to the CONTEXT section.

- Generalize our description of timeout execution in the CONTEXT
  section.

- Add timeout_in_nsec(9) and timeout_at_ts(9) to 

Re: Bad comparison between double and uint64_t

2021-06-23 Thread Masato Asou
Hi Ali,

In this case, ULLONG_MAX is implicitly cast to double, isn't it?

Do you have any problems if you don't cast to double? 
--
ASOU Masato

From: Ali Farzanrad 
Date: Wed, 23 Jun 2021 20:24:27 +

> Oh, my bad, sorry.
> I assumed that val is always an integer.
> I guess it is better to ignore val == ULLONG_MAX:
> 
> ===
> RCS file: /cvs/src/sbin/disklabel/editor.c,v
> retrieving revision 1.368
> diff -u -p -r1.368 editor.c
> --- editor.c  2021/05/30 19:02:30 1.368
> +++ editor.c  2021/06/23 20:23:03
> @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
>   }
>  
>   val *= factor / DEV_BSIZE;
> - if (val > ULLONG_MAX)
> + if (val >= (double)ULLONG_MAX)
>   return (-1);
>   *n = val;
>   return (0);
> @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char **un
>  {
>   errno = 0;
>   *val = strtod(buf, unit);
> - if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
> + if (errno == ERANGE || *val < 0 || *val >= (double)ULLONG_MAX)
>   return (-1);/* too big/small */
>   if (*val == 0 && *unit == buf)
>   return (-1);/* No conversion performed. */
> 
> Ali Farzanrad  wrote:
>> Hi tech@,
>> 
>> disklabel shows a warning at build time which might be important.
>> The following diff will suppress the warning.
>> 
>> ===
>> RCS file: /cvs/src/sbin/disklabel/editor.c,v
>> retrieving revision 1.368
>> diff -u -p -r1.368 editor.c
>> --- editor.c 2021/05/30 19:02:30 1.368
>> +++ editor.c 2021/06/23 19:25:45
>> @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
>>  }
>>  
>>  val *= factor / DEV_BSIZE;
>> -if (val > ULLONG_MAX)
>> +if (val != (double)(u_int64_t)val)
>>  return (-1);
>>  *n = val;
>>  return (0);
>> @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char **un
>>  {
>>  errno = 0;
>>  *val = strtod(buf, unit);
>> -if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
>> +if (errno == ERANGE || *val < 0 || *val != (double)(u_int64_t)*val)
>>  return (-1);/* too big/small */
>>  if (*val == 0 && *unit == buf)
>>  return (-1);/* No conversion performed. */
>> 
>> 
> 



Re: Bad comparison between double and uint64_t

2021-06-23 Thread Ali Farzanrad
Oh, my bad, sorry.
I assumed that val is always an integer.
I guess it is better to ignore val == ULLONG_MAX:

===
RCS file: /cvs/src/sbin/disklabel/editor.c,v
retrieving revision 1.368
diff -u -p -r1.368 editor.c
--- editor.c2021/05/30 19:02:30 1.368
+++ editor.c2021/06/23 20:23:03
@@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
}
 
val *= factor / DEV_BSIZE;
-   if (val > ULLONG_MAX)
+   if (val >= (double)ULLONG_MAX)
return (-1);
*n = val;
return (0);
@@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char **un
 {
errno = 0;
*val = strtod(buf, unit);
-   if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
+   if (errno == ERANGE || *val < 0 || *val >= (double)ULLONG_MAX)
return (-1);/* too big/small */
if (*val == 0 && *unit == buf)
return (-1);/* No conversion performed. */

Ali Farzanrad  wrote:
> Hi tech@,
> 
> disklabel shows a warning at build time which might be important.
> The following diff will suppress the warning.
> 
> ===
> RCS file: /cvs/src/sbin/disklabel/editor.c,v
> retrieving revision 1.368
> diff -u -p -r1.368 editor.c
> --- editor.c  2021/05/30 19:02:30 1.368
> +++ editor.c  2021/06/23 19:25:45
> @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
>   }
>  
>   val *= factor / DEV_BSIZE;
> - if (val > ULLONG_MAX)
> + if (val != (double)(u_int64_t)val)
>   return (-1);
>   *n = val;
>   return (0);
> @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char **un
>  {
>   errno = 0;
>   *val = strtod(buf, unit);
> - if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
> + if (errno == ERANGE || *val < 0 || *val != (double)(u_int64_t)*val)
>   return (-1);/* too big/small */
>   if (*val == 0 && *unit == buf)
>   return (-1);/* No conversion performed. */
> 
> 



Bad comparison between double and uint64_t

2021-06-23 Thread Ali Farzanrad
Hi tech@,

disklabel shows a warning at build time which might be important.
The following diff will suppress the warning.

===
RCS file: /cvs/src/sbin/disklabel/editor.c,v
retrieving revision 1.368
diff -u -p -r1.368 editor.c
--- editor.c2021/05/30 19:02:30 1.368
+++ editor.c2021/06/23 19:25:45
@@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
}
 
val *= factor / DEV_BSIZE;
-   if (val > ULLONG_MAX)
+   if (val != (double)(u_int64_t)val)
return (-1);
*n = val;
return (0);
@@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char **un
 {
errno = 0;
*val = strtod(buf, unit);
-   if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
+   if (errno == ERANGE || *val < 0 || *val != (double)(u_int64_t)*val)
return (-1);/* too big/small */
if (*val == 0 && *unit == buf)
return (-1);/* No conversion performed. */



Re: amd64: softintr_dispatch: remove kernel lock

2021-06-23 Thread Visa Hankala
On Wed, Jun 23, 2021 at 05:39:05PM +0200, Mark Kettenis wrote:
> > Date: Wed, 23 Jun 2021 15:32:03 +
> > From: Visa Hankala 
> > 
> > On Wed, Jun 23, 2021 at 05:15:05PM +0200, Mark Kettenis wrote:
> > > > Date: Wed, 23 Jun 2021 14:56:45 +
> > > > From: Visa Hankala 
> > > > 
> > > > On Tue, Jun 22, 2021 at 09:46:22AM -0500, Scott Cheloha wrote:
> > > > > On Mon, Jun 21, 2021 at 02:04:30PM +, Visa Hankala wrote:
> > > > > > On Thu, May 27, 2021 at 07:40:26PM -0500, Scott Cheloha wrote:
> > > > > > > On Sun, May 23, 2021 at 09:05:24AM +, Visa Hankala wrote:
> > > > > > > > When a CPU starts processing a soft interrupt, it reserves the 
> > > > > > > > handler
> > > > > > > > to prevent concurrent execution. If the soft interrupt gets 
> > > > > > > > rescheduled
> > > > > > > > during processing, the handler is run again by the same CPU. 
> > > > > > > > This breaks
> > > > > > > > FIFO ordering, though.
> > > > > > > 
> > > > > > > If you want to preserve FIFO you can reinsert the handler at the 
> > > > > > > queue
> > > > > > > tail.  That would be more fair.
> > > > > > > 
> > > > > > > If FIFO is the current behavior I think we ought to keep it.
> > > > > > 
> > > > > > I have updated the patch to preserve the FIFO order.
> > > > > > 
> > > > > > > > +STAILQ_HEAD(x86_soft_intr_queue, x86_soft_intrhand);
> > > > > > > > +
> > > > > > > > +struct x86_soft_intr_queue softintr_queue[X86_NSOFTINTR];
> > > > > > > 
> > > > > > > Why did we switch to STAILQ?  I know we don't have very many
> > > > > > > softintr_disestablish() calls but isn't O(1) removal worth the 
> > > > > > > extra
> > > > > > > pointer?
> > > > > > 
> > > > > > I used STAILQ because it avoids the hassle of updating the list 
> > > > > > nodes'
> > > > > > back pointers. softintr_disestablish() with multiple items pending 
> > > > > > in
> > > > > > the queue is very rare in comparison to the normal 
> > > > > > softintr_schedule() /
> > > > > > softintr_dispatch() cycle.
> > > > > > 
> > > > > > However, I have changed the code back to using TAILQ.
> > > > > 
> > > > > This looks good to me.  I mean, it looked good before, but it still
> > > > > looks good.
> > > > > 
> > > > > I will run with it for a few days.
> > > > > 
> > > > > Assuming I hit no issues I'll come back with an OK.
> > > > > 
> > > > > Is there an easy way to exercise this code from userspace?  There
> > > > > aren't many softintr users.
> > > > > 
> > > > > Maybe audio drivers?
> > > > 
> > > > audio(4) is one option with a relatively high rate of scheduling.
> > > > Serial communications drivers, such as com(4), might be useful for
> > > > testing too.
> > > > 
> > > > softintr_disestablish() can be exercised with uaudio(4) and ucom(4)
> > > > for example.
> > > > 
> > > > I am still uncertain whether the barrier in softintr_disestablish()
> > > > is fully safe. The typical detach-side users are audio_detach(),
> > > > com_detach() and usb_detach(). They should be fine because the
> > > > surrounding code may sleep. However, sbus(4) worries me because it
> > > > invokes softintr_disestablish() from PCMCIA intr_disestablish callback,
> > > > and I do not know how wild the usage contexts can be. sbus(4) is
> > > > specific to sparc64, though.
> > > 
> > > Surprise-removal is a thing for PCI as well as PCMCIA and USB.  And in
> > > the PCI case this will call com_detach() and therefore
> > > softintr_disestablish() from interrupt context, where you can't sleep.
> > > 
> > > So I don't think that using some sort of barrier that sleeps is an
> > > option.
> > 
> > Well, com_detach() does things that may sleep, so then the existing code
> > seems wrong.
> 
> Hmm, actually, it seems I misremembered and PCI hotplug remove runs in
> a task (see dev/pci/ppb.c).  So maybe it is ok.
> 
> > I will revise the diff so that it spins rather than sleeps when a handler
> > is active.
> 
> That wouldn't work on non-MP kernels, would it?

Barrier-like removal is not reliable in interrupt context because an
interrupt might have preempted the soft interrupt handler that the
interrupt handler is about to remove. This applies to both MP and non-MP
kernels.

On a non-MP kernel, spinning, or sleeping, is not actually needed.
Pending soft interrupts should run immediately when the system priority
level allows them. If SPL is none on entry to softintr_disestablish(),
the handler should have finished already. If SPL blocks soft interrupts
on entry, the dequeueing clears any pending soft interrupt.



Re: amd64: softintr_dispatch: remove kernel lock

2021-06-23 Thread Mark Kettenis
> Date: Wed, 23 Jun 2021 15:32:03 +
> From: Visa Hankala 
> 
> On Wed, Jun 23, 2021 at 05:15:05PM +0200, Mark Kettenis wrote:
> > > Date: Wed, 23 Jun 2021 14:56:45 +
> > > From: Visa Hankala 
> > > 
> > > On Tue, Jun 22, 2021 at 09:46:22AM -0500, Scott Cheloha wrote:
> > > > On Mon, Jun 21, 2021 at 02:04:30PM +, Visa Hankala wrote:
> > > > > On Thu, May 27, 2021 at 07:40:26PM -0500, Scott Cheloha wrote:
> > > > > > On Sun, May 23, 2021 at 09:05:24AM +, Visa Hankala wrote:
> > > > > > > When a CPU starts processing a soft interrupt, it reserves the 
> > > > > > > handler
> > > > > > > to prevent concurrent execution. If the soft interrupt gets 
> > > > > > > rescheduled
> > > > > > > during processing, the handler is run again by the same CPU. This 
> > > > > > > breaks
> > > > > > > FIFO ordering, though.
> > > > > > 
> > > > > > If you want to preserve FIFO you can reinsert the handler at the 
> > > > > > queue
> > > > > > tail.  That would be more fair.
> > > > > > 
> > > > > > If FIFO is the current behavior I think we ought to keep it.
> > > > > 
> > > > > I have updated the patch to preserve the FIFO order.
> > > > > 
> > > > > > > +STAILQ_HEAD(x86_soft_intr_queue, x86_soft_intrhand);
> > > > > > > +
> > > > > > > +struct x86_soft_intr_queue softintr_queue[X86_NSOFTINTR];
> > > > > > 
> > > > > > Why did we switch to STAILQ?  I know we don't have very many
> > > > > > softintr_disestablish() calls but isn't O(1) removal worth the extra
> > > > > > pointer?
> > > > > 
> > > > > I used STAILQ because it avoids the hassle of updating the list nodes'
> > > > > back pointers. softintr_disestablish() with multiple items pending in
> > > > > the queue is very rare in comparison to the normal 
> > > > > softintr_schedule() /
> > > > > softintr_dispatch() cycle.
> > > > > 
> > > > > However, I have changed the code back to using TAILQ.
> > > > 
> > > > This looks good to me.  I mean, it looked good before, but it still
> > > > looks good.
> > > > 
> > > > I will run with it for a few days.
> > > > 
> > > > Assuming I hit no issues I'll come back with an OK.
> > > > 
> > > > Is there an easy way to exercise this code from userspace?  There
> > > > aren't many softintr users.
> > > > 
> > > > Maybe audio drivers?
> > > 
> > > audio(4) is one option with a relatively high rate of scheduling.
> > > Serial communications drivers, such as com(4), might be useful for
> > > testing too.
> > > 
> > > softintr_disestablish() can be exercised with uaudio(4) and ucom(4)
> > > for example.
> > > 
> > > I am still uncertain whether the barrier in softintr_disestablish()
> > > is fully safe. The typical detach-side users are audio_detach(),
> > > com_detach() and usb_detach(). They should be fine because the
> > > surrounding code may sleep. However, sbus(4) worries me because it
> > > invokes softintr_disestablish() from PCMCIA intr_disestablish callback,
> > > and I do not know how wild the usage contexts can be. sbus(4) is
> > > specific to sparc64, though.
> > 
> > Surprise-removal is a thing for PCI as well as PCMCIA and USB.  And in
> > the PCI case this will call com_detach() and therefore
> > softintr_disestablish() from interrupt context, where you can't sleep.
> > 
> > So I don't think that using some sort of barrier that sleeps is an
> > option.
> 
> Well, com_detach() does things that may sleep, so then the existing code
> seems wrong.

Hmm, actually, it seems I misremembered and PCI hotplug remove runs in
a task (see dev/pci/ppb.c).  So maybe it is ok.

> I will revise the diff so that it spins rather than sleeps when a handler
> is active.

That wouldn't work on non-MP kernels, would it?



Re: amd64: softintr_dispatch: remove kernel lock

2021-06-23 Thread Visa Hankala
On Wed, Jun 23, 2021 at 05:15:05PM +0200, Mark Kettenis wrote:
> > Date: Wed, 23 Jun 2021 14:56:45 +
> > From: Visa Hankala 
> > 
> > On Tue, Jun 22, 2021 at 09:46:22AM -0500, Scott Cheloha wrote:
> > > On Mon, Jun 21, 2021 at 02:04:30PM +, Visa Hankala wrote:
> > > > On Thu, May 27, 2021 at 07:40:26PM -0500, Scott Cheloha wrote:
> > > > > On Sun, May 23, 2021 at 09:05:24AM +, Visa Hankala wrote:
> > > > > > When a CPU starts processing a soft interrupt, it reserves the 
> > > > > > handler
> > > > > > to prevent concurrent execution. If the soft interrupt gets 
> > > > > > rescheduled
> > > > > > during processing, the handler is run again by the same CPU. This 
> > > > > > breaks
> > > > > > FIFO ordering, though.
> > > > > 
> > > > > If you want to preserve FIFO you can reinsert the handler at the queue
> > > > > tail.  That would be more fair.
> > > > > 
> > > > > If FIFO is the current behavior I think we ought to keep it.
> > > > 
> > > > I have updated the patch to preserve the FIFO order.
> > > > 
> > > > > > +STAILQ_HEAD(x86_soft_intr_queue, x86_soft_intrhand);
> > > > > > +
> > > > > > +struct x86_soft_intr_queue softintr_queue[X86_NSOFTINTR];
> > > > > 
> > > > > Why did we switch to STAILQ?  I know we don't have very many
> > > > > softintr_disestablish() calls but isn't O(1) removal worth the extra
> > > > > pointer?
> > > > 
> > > > I used STAILQ because it avoids the hassle of updating the list nodes'
> > > > back pointers. softintr_disestablish() with multiple items pending in
> > > > the queue is very rare in comparison to the normal softintr_schedule() /
> > > > softintr_dispatch() cycle.
> > > > 
> > > > However, I have changed the code back to using TAILQ.
> > > 
> > > This looks good to me.  I mean, it looked good before, but it still
> > > looks good.
> > > 
> > > I will run with it for a few days.
> > > 
> > > Assuming I hit no issues I'll come back with an OK.
> > > 
> > > Is there an easy way to exercise this code from userspace?  There
> > > aren't many softintr users.
> > > 
> > > Maybe audio drivers?
> > 
> > audio(4) is one option with a relatively high rate of scheduling.
> > Serial communications drivers, such as com(4), might be useful for
> > testing too.
> > 
> > softintr_disestablish() can be exercised with uaudio(4) and ucom(4)
> > for example.
> > 
> > I am still uncertain whether the barrier in softintr_disestablish()
> > is fully safe. The typical detach-side users are audio_detach(),
> > com_detach() and usb_detach(). They should be fine because the
> > surrounding code may sleep. However, sbus(4) worries me because it
> > invokes softintr_disestablish() from PCMCIA intr_disestablish callback,
> > and I do not know how wild the usage contexts can be. sbus(4) is
> > specific to sparc64, though.
> 
> Surprise-removal is a thing for PCI as well as PCMCIA and USB.  And in
> the PCI case this will call com_detach() and therefore
> softintr_disestablish() from interrupt context, where you can't sleep.
> 
> So I don't think that using some sort of barrier that sleeps is an
> option.

Well, com_detach() does things that may sleep, so then the existing code
seems wrong.

I will revise the diff so that it spins rather than sleeps when a handler
is active.



Re: amd64: softintr_dispatch: remove kernel lock

2021-06-23 Thread Mark Kettenis
> Date: Wed, 23 Jun 2021 14:56:45 +
> From: Visa Hankala 
> 
> On Tue, Jun 22, 2021 at 09:46:22AM -0500, Scott Cheloha wrote:
> > On Mon, Jun 21, 2021 at 02:04:30PM +, Visa Hankala wrote:
> > > On Thu, May 27, 2021 at 07:40:26PM -0500, Scott Cheloha wrote:
> > > > On Sun, May 23, 2021 at 09:05:24AM +, Visa Hankala wrote:
> > > > > When a CPU starts processing a soft interrupt, it reserves the handler
> > > > > to prevent concurrent execution. If the soft interrupt gets 
> > > > > rescheduled
> > > > > during processing, the handler is run again by the same CPU. This 
> > > > > breaks
> > > > > FIFO ordering, though.
> > > > 
> > > > If you want to preserve FIFO you can reinsert the handler at the queue
> > > > tail.  That would be more fair.
> > > > 
> > > > If FIFO is the current behavior I think we ought to keep it.
> > > 
> > > I have updated the patch to preserve the FIFO order.
> > > 
> > > > > +STAILQ_HEAD(x86_soft_intr_queue, x86_soft_intrhand);
> > > > > +
> > > > > +struct x86_soft_intr_queue softintr_queue[X86_NSOFTINTR];
> > > > 
> > > > Why did we switch to STAILQ?  I know we don't have very many
> > > > softintr_disestablish() calls but isn't O(1) removal worth the extra
> > > > pointer?
> > > 
> > > I used STAILQ because it avoids the hassle of updating the list nodes'
> > > back pointers. softintr_disestablish() with multiple items pending in
> > > the queue is very rare in comparison to the normal softintr_schedule() /
> > > softintr_dispatch() cycle.
> > > 
> > > However, I have changed the code back to using TAILQ.
> > 
> > This looks good to me.  I mean, it looked good before, but it still
> > looks good.
> > 
> > I will run with it for a few days.
> > 
> > Assuming I hit no issues I'll come back with an OK.
> > 
> > Is there an easy way to exercise this code from userspace?  There
> > aren't many softintr users.
> > 
> > Maybe audio drivers?
> 
> audio(4) is one option with a relatively high rate of scheduling.
> Serial communications drivers, such as com(4), might be useful for
> testing too.
> 
> softintr_disestablish() can be exercised with uaudio(4) and ucom(4)
> for example.
> 
> I am still uncertain whether the barrier in softintr_disestablish()
> is fully safe. The typical detach-side users are audio_detach(),
> com_detach() and usb_detach(). They should be fine because the
> surrounding code may sleep. However, sbus(4) worries me because it
> invokes softintr_disestablish() from PCMCIA intr_disestablish callback,
> and I do not know how wild the usage contexts can be. sbus(4) is
> specific to sparc64, though.

Surprise-removal is a thing for PCI as well as PCMCIA and USB.  And in
the PCI case this will call com_detach() and therefore
softintr_disestablish() from interrupt context, where you can't sleep.

So I don't think that using some sort of barrier that sleeps is an
option.



Re: amd64: softintr_dispatch: remove kernel lock

2021-06-23 Thread Visa Hankala
On Tue, Jun 22, 2021 at 09:46:22AM -0500, Scott Cheloha wrote:
> On Mon, Jun 21, 2021 at 02:04:30PM +, Visa Hankala wrote:
> > On Thu, May 27, 2021 at 07:40:26PM -0500, Scott Cheloha wrote:
> > > On Sun, May 23, 2021 at 09:05:24AM +, Visa Hankala wrote:
> > > > When a CPU starts processing a soft interrupt, it reserves the handler
> > > > to prevent concurrent execution. If the soft interrupt gets rescheduled
> > > > during processing, the handler is run again by the same CPU. This breaks
> > > > FIFO ordering, though.
> > > 
> > > If you want to preserve FIFO you can reinsert the handler at the queue
> > > tail.  That would be more fair.
> > > 
> > > If FIFO is the current behavior I think we ought to keep it.
> > 
> > I have updated the patch to preserve the FIFO order.
> > 
> > > > +STAILQ_HEAD(x86_soft_intr_queue, x86_soft_intrhand);
> > > > +
> > > > +struct x86_soft_intr_queue softintr_queue[X86_NSOFTINTR];
> > > 
> > > Why did we switch to STAILQ?  I know we don't have very many
> > > softintr_disestablish() calls but isn't O(1) removal worth the extra
> > > pointer?
> > 
> > I used STAILQ because it avoids the hassle of updating the list nodes'
> > back pointers. softintr_disestablish() with multiple items pending in
> > the queue is very rare in comparison to the normal softintr_schedule() /
> > softintr_dispatch() cycle.
> > 
> > However, I have changed the code back to using TAILQ.
> 
> This looks good to me.  I mean, it looked good before, but it still
> looks good.
> 
> I will run with it for a few days.
> 
> Assuming I hit no issues I'll come back with an OK.
> 
> Is there an easy way to exercise this code from userspace?  There
> aren't many softintr users.
> 
> Maybe audio drivers?

audio(4) is one option with a relatively high rate of scheduling.
Serial communications drivers, such as com(4), might be useful for
testing too.

softintr_disestablish() can be exercised with uaudio(4) and ucom(4)
for example.

I am still uncertain whether the barrier in softintr_disestablish()
is fully safe. The typical detach-side users are audio_detach(),
com_detach() and usb_detach(). They should be fine because the
surrounding code may sleep. However, sbus(4) worries me because it
invokes softintr_disestablish() from PCMCIA intr_disestablish callback,
and I do not know how wild the usage contexts can be. sbus(4) is
specific to sparc64, though.



Re: new kqueue-based select(2) implementation

2021-06-23 Thread Theo de Raadt
Alexander Bluhm  wrote:

> On Wed, Jun 23, 2021 at 11:40:18AM +0200, Martin Pieuchot wrote:
> > Our previous attempt [0] to replace the current select(2) implementation
> > has been reverted due to non-acceptable latency increase on sockets [1].
> 
> I have measured the performance difference.
> 
> http://bluhm.genua.de/perform/results/2021-06-21T09%3A44%3A18Z/perform.html
> 
> Worst 20% throughput drop [...]

That is terrible.

It is really crazy that this conversion should produce worse results.
As in, if it produces any worse results it should not be done.




Re: new kqueue-based select(2) implementation

2021-06-23 Thread Alexander Bluhm
On Wed, Jun 23, 2021 at 11:40:18AM +0200, Martin Pieuchot wrote:
> Our previous attempt [0] to replace the current select(2) implementation
> has been reverted due to non-acceptable latency increase on sockets [1].

I have measured the performance difference.

http://bluhm.genua.de/perform/results/2021-06-21T09%3A44%3A18Z/perform.html

Worst 20% throughput drop is in 'iperf3 -c10.3.45.35 -u -b10G -w1m
-t10 -R' which can be seen here.

http://bluhm.genua.de/perform/results/2021-06-21T09%3A44%3A18Z/gnuplot/udp.html

Note that iperf3 calls select(2) multiple times per UDP packet.

As a new feature I have links to btrace kstack flame graphs in the
table.

bluhm



Re: uao references & uao_swap_off() cleanup

2021-06-23 Thread Jonathan Matthew
On Wed, Jun 23, 2021 at 09:37:10AM +0200, Martin Pieuchot wrote:
> On 16/06/21(Wed) 11:26, Martin Pieuchot wrote:
> > Diff below does two things:
> > 
> > - Use atomic operations for incrementing/decrementing references of
> >   anonymous objects.  This allows us to manipulate them without holding
> >   the KERNEL_LOCK().
> > 
> > - Rewrite the loop from uao_swap_off() to only keep a reference to the
> >   next item in the list.  This is imported from NetBSD and is necessary
> >   to introduce locking around uao_pagein().
> > 
> > ok?
> 
> Anyone?

uao_reference_locked() and uao_detach_locked() are prototyped in
uvm_extern.h, so they should be removed here too.

It doesn't look like uao_detach() is safe to call without the
kernel lock; it calls uao_dropswap() for each page, which calls
uao_set_swslot(), which includes a KERNEL_ASSERT_LOCKED().
Should we keep the KERNEL_ASSERT_LOCKED() in uao_detach()?

ok jmatthew@ otherwise

> 
> > 
> > Index: uvm/uvm_aobj.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_aobj.c,v
> > retrieving revision 1.98
> > diff -u -p -r1.98 uvm_aobj.c
> > --- uvm/uvm_aobj.c  16 Jun 2021 09:02:21 -  1.98
> > +++ uvm/uvm_aobj.c  16 Jun 2021 09:20:26 -
> > @@ -779,19 +779,11 @@ uao_init(void)
> >  void
> >  uao_reference(struct uvm_object *uobj)
> >  {
> > -   KERNEL_ASSERT_LOCKED();
> > -   uao_reference_locked(uobj);
> > -}
> > -
> > -void
> > -uao_reference_locked(struct uvm_object *uobj)
> > -{
> > -
> > /* Kernel object is persistent. */
> > if (UVM_OBJ_IS_KERN_OBJECT(uobj))
> > return;
> >  
> > -   uobj->uo_refs++;
> > +   atomic_inc_int(>uo_refs);
> >  }
> >  
> >  
> > @@ -801,34 +793,19 @@ uao_reference_locked(struct uvm_object *
> >  void
> >  uao_detach(struct uvm_object *uobj)
> >  {
> > -   KERNEL_ASSERT_LOCKED();
> > -   uao_detach_locked(uobj);
> > -}
> > -
> > -
> > -/*
> > - * uao_detach_locked: drop a reference to an aobj
> > - *
> > - * => aobj may freed upon return.
> > - */
> > -void
> > -uao_detach_locked(struct uvm_object *uobj)
> > -{
> > struct uvm_aobj *aobj = (struct uvm_aobj *)uobj;
> > struct vm_page *pg;
> >  
> > /*
> >  * Detaching from kernel_object is a NOP.
> >  */
> > -   if (UVM_OBJ_IS_KERN_OBJECT(uobj)) {
> > +   if (UVM_OBJ_IS_KERN_OBJECT(uobj))
> > return;
> > -   }
> >  
> > /*
> >  * Drop the reference.  If it was the last one, destroy the object.
> >  */
> > -   uobj->uo_refs--;
> > -   if (uobj->uo_refs) {
> > > +   if (atomic_dec_int_nv(&uobj->uo_refs) > 0) {
> > return;
> > }
> >  
> > @@ -1265,68 +1242,54 @@ uao_dropswap(struct uvm_object *uobj, in
> >  boolean_t
> >  uao_swap_off(int startslot, int endslot)
> >  {
> > -   struct uvm_aobj *aobj, *nextaobj, *prevaobj = NULL;
> > +   struct uvm_aobj *aobj;
> >  
> > /*
> > -* Walk the list of all anonymous UVM objects.
> > +* Walk the list of all anonymous UVM objects.  Grab the first.
> >  */
> > > mtx_enter(&uao_list_lock);
> > > +   if ((aobj = LIST_FIRST(&uao_list)) == NULL) {
> > > +   mtx_leave(&uao_list_lock);
> > > +   return FALSE;
> > > +   }
> > > +   uao_reference(&aobj->u_obj);
> >  
> > -   for (aobj = LIST_FIRST(_list);
> > -aobj != NULL;
> > -aobj = nextaobj) {
> > +   do {
> > +   struct uvm_aobj *nextaobj;
> > boolean_t rv;
> >  
> > /*
> > -* add a ref to the aobj so it doesn't disappear
> > -* while we're working.
> > -*/
> > > -   uao_reference_locked(&aobj->u_obj);
> > -
> > -   /*
> > -* now it's safe to unlock the uao list.
> > -* note that lock interleaving is alright with IPL_NONE mutexes.
> > +* Prefetch the next object and immediately hold a reference
> > +* on it, so neither the current nor the next entry could
> > +* disappear while we are iterating.
> >  */
> > > -   mtx_leave(&uao_list_lock);
> > > -
> > > -   if (prevaobj) {
> > > -   uao_detach_locked(&prevaobj->u_obj);
> > > -   prevaobj = NULL;
> > > +   if ((nextaobj = LIST_NEXT(aobj, u_list)) != NULL) {
> > > +   uao_reference(&nextaobj->u_obj);
> > > }
> > > +   mtx_leave(&uao_list_lock);
> >  
> > /*
> > -* page in any pages in the swslot range.
> > -* if there's an error, abort and return the error.
> > +* Page in all pages in the swap slot range.
> >  */
> > rv = uao_pagein(aobj, startslot, endslot);
> > +
> > +   /* Drop the reference of the current object. */
> > > +   uao_detach(&aobj->u_obj);
> > > if (rv) {
> > > -   uao_detach_locked(&aobj->u_obj);
> > > +   if (nextaobj) {
> > > +   uao_detach(&nextaobj->u_obj);
> > +   }
> > return rv;
> > }
> >  
> > -   /*
> 

base-gcc: fix -Wno-error=uninitialized

2021-06-23 Thread Jeremie Courreges-Anglas


In base-gcc -Wno-error=uninitialized is currently ineffective, in that
cc1 still errors out with the following flags combination:

  billy /tmp$ cat t.c
  #include 

  int
  f(void)
  {
  int r;
  return r;
  }
  billy /tmp$ gcc -v -c -O2 -Wall -Werror  -Wno-error=uninitialized t.c

Previous behavior:

  [...]
  cc1.old: warnings being treated as errors
  t.c: In function 'f':
  t.c:7: warning: 'r' is used uninitialized in this function
  billy /tmp$ echo $?
  1
  billy /tmp$

After the patch (expected behavior):

  billy /tmp$ gcc -c -O2 -Wall -Werror -Wno-error=uninitialized t.c
  t.c: In function 'f':
  t.c:7: warning: 'r' is used uninitialized in this function
  billy /tmp$ echo $?
  0
  billy /tmp$

I know that base-gcc is low priority on our roadmap so I'll explain the
rationale for this diff:

I had plans for using -Wall -Werror -Wno-error=uninitialized in our
kernel builds, but those plans went nowhere - so far.  One of the
problems was adding more differences between clang and gcc in kernel
Makefiles (because -Wno-error=uninitialized didn't work with base-gcc).
Another problem was (in my mind at least) base-gcc spewing lots of bogus
"may be used uninitialized" warnings.  I have another possible fix
for that.

Anyway, I'd like to see where those tiny fixes can lead us.  Even if the
answer is "nowhere", the fix below is simple and worth committing
I think.

(If you care about how it works, see diagnostic_report_diagnostic() and
diagnostic_action_after_output() in diagnostic.c)

ok?


Index: tree-ssa.c
===
RCS file: /cvs/src/gnu/gcc/gcc/tree-ssa.c,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 tree-ssa.c
--- tree-ssa.c  15 Oct 2009 17:11:28 -  1.1.1.1
+++ tree-ssa.c  21 Jun 2021 23:37:37 -
@@ -1177,7 +1177,7 @@ warn_uninit (tree t, const char *gmsgid,
   locus = (context != NULL && EXPR_HAS_LOCATION (context)
   ? EXPR_LOCUS (context)
   : &DECL_SOURCE_LOCATION (var));
-  warning (0, gmsgid, locus, var);
+  warning (OPT_Wuninitialized, gmsgid, locus, var);
   fun_locus = &DECL_SOURCE_LOCATION (cfun->decl);
   if (locus->file != fun_locus->file
   || locus->line < fun_locus->line


-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: sparc64: enable dt(4) in GENERIC

2021-06-23 Thread Mark Kettenis
> Date: Wed, 23 Jun 2021 11:43:47 +0200
> From: Martin Pieuchot 
> 
> Similar to what has been done on x86 & arm64, ok?

ok kettenis@

> Index: conf/GENERIC
> ===
> RCS file: /cvs/src/sys/arch/sparc64/conf/GENERIC,v
> retrieving revision 1.316
> diff -u -p -r1.316 GENERIC
> --- conf/GENERIC  4 Feb 2021 16:25:39 -   1.316
> +++ conf/GENERIC  23 Jun 2021 07:39:53 -
> @@ -556,4 +556,5 @@ owtemp* at onewire?   # Temperature
>  owctr*   at onewire? # Counter device
>  
>  pseudo-device  hotplug 1   # devices hot plugging
> +pseudo-device  dt
>  pseudo-devicewsmux   2   # mouse & keyboard multiplexor
> 
> 



sparc64: enable dt(4) in GENERIC

2021-06-23 Thread Martin Pieuchot
Similar to what has been done on x86 & arm64, ok?

Index: conf/GENERIC
===
RCS file: /cvs/src/sys/arch/sparc64/conf/GENERIC,v
retrieving revision 1.316
diff -u -p -r1.316 GENERIC
--- conf/GENERIC4 Feb 2021 16:25:39 -   1.316
+++ conf/GENERIC23 Jun 2021 07:39:53 -
@@ -556,4 +556,5 @@ owtemp* at onewire? # Temperature
 owctr* at onewire? # Counter device
 
 pseudo-device  hotplug 1   # devices hot plugging
+pseudo-device  dt
 pseudo-device  wsmux   2   # mouse & keyboard multiplexor



new kqueue-based select(2) implementation

2021-06-23 Thread Martin Pieuchot
Our previous attempt [0] to replace the current select(2) implementation
has been reverted due to non-acceptable latency increase on sockets [1].

This performance regression has been analysed and partially addressed
thanks to bluhm@ and visa@.  The cost of allocating/freeing 'knote'
descriptors has been mitigated by using a pool cache and by using lazy
removal of items.

The next bottleneck is due to contention on the NET_LOCK() intensified
by the fact that kqueue hooks might be checked multiple times per
syscall.  We are aware of those and we are interested in improving the
situation, however I believe we should get this diff in first.  Keeping
it out of tree is starting to be painful.

So I'm asking for tests.  Darren I'd greatly appreciate if you could
check if the ssh regression suite is working well with this.

Thanks,
Martin

[0] https://marc.info/?l=openbsd-tech=160675386103259=2
[1] https://marc.info/?l=openbsd-bugs=161003823423816=2

Index: kern/sys_generic.c
===
RCS file: /cvs/src/sys/kern/sys_generic.c,v
retrieving revision 1.135
diff -u -p -r1.135 sys_generic.c
--- kern/sys_generic.c  8 Jan 2021 09:29:04 -   1.135
+++ kern/sys_generic.c  21 Jun 2021 07:58:07 -
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef KTRACE
 #include 
 #endif
@@ -66,8 +67,21 @@
 
 #include 
 
-int selscan(struct proc *, fd_set *, fd_set *, int, int, register_t *);
-void pollscan(struct proc *, struct pollfd *, u_int, register_t *);
+/*
+ * Debug values:
+ *  1 - print implementation errors, things that should not happen.
+ *  2 - print ppoll(2) information, somewhat verbose
+ *  3 - print pselect(2) and ppoll(2) information, very verbose
+ */
+int kqpoll_debug = 0;
+#define DPRINTFN(v, x...) if (kqpoll_debug > v) {  \
+   printf("%s(%d): ", curproc->p_p->ps_comm, curproc->p_tid);  \
+   printf(x);  \
+}
+
+int pselregister(struct proc *, fd_set *[], int, int *);
+int pselcollect(struct proc *, struct kevent *, fd_set *[], int *);
+
 int pollout(struct pollfd *, struct pollfd *, u_int);
 int dopselect(struct proc *, int, fd_set *, fd_set *, fd_set *,
 struct timespec *, const sigset_t *, register_t *);
@@ -584,11 +598,10 @@ int
 dopselect(struct proc *p, int nd, fd_set *in, fd_set *ou, fd_set *ex,
 struct timespec *timeout, const sigset_t *sigmask, register_t *retval)
 {
+   struct kqueue_scan_state scan;
fd_mask bits[6];
fd_set *pibits[3], *pobits[3];
-   struct timespec elapsed, start, stop;
-   uint64_t nsecs;
-   int s, ncoll, error = 0;
+   int error, ncollected = 0, nevents = 0;
u_int ni;
 
if (nd < 0)
@@ -618,6 +631,8 @@ dopselect(struct proc *p, int nd, fd_set
pobits[2] = (fd_set *)&bits[5];
}
 
+   kqpoll_init();
+
 #definegetbits(name, x) \
if (name && (error = copyin(name, pibits[x], ni))) \
goto done;
@@ -636,43 +651,61 @@ dopselect(struct proc *p, int nd, fd_set
if (sigmask)
dosigsuspend(p, *sigmask &~ sigcantmask);
 
-retry:
-   ncoll = nselcoll;
-   atomic_setbits_int(&p->p_flag, P_SELECT);
-   error = selscan(p, pibits[0], pobits[0], nd, ni, retval);
-   if (error || *retval)
+   /* Register kqueue events */
+   error = pselregister(p, pibits, nd, &nevents);
+   if (error != 0)
goto done;
-   if (timeout == NULL || timespecisset(timeout)) {
-   if (timeout != NULL) {
-   getnanouptime(&start);
-   nsecs = MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP);
-   } else
-   nsecs = INFSLP;
-   s = splhigh();
-   if ((p->p_flag & P_SELECT) == 0 || nselcoll != ncoll) {
-   splx(s);
-   goto retry;
-   }
-   atomic_clearbits_int(&p->p_flag, P_SELECT);
-   error = tsleep_nsec(&selwait, PSOCK | PCATCH, "select", nsecs);
-   splx(s);
+
+   /*
+* The poll/select family of syscalls has been designed to
+* block when file descriptors are not available, even if
+* there's nothing to wait for.
+*/
+   if (nevents == 0) {
+   uint64_t nsecs = INFSLP;
+
if (timeout != NULL) {
-   getnanouptime(&stop);
-   timespecsub(&stop, &start, &elapsed);
-   timespecsub(timeout, &elapsed, timeout);
-   if (timeout->tv_sec < 0)
-   timespecclear(timeout);
+   if (!timespecisset(timeout))
+   goto done;
+   nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP));
}
-   if (error == 0 || error == EWOULDBLOCK)
-   goto retry;
+   error = 

Re: snmpd(8) Better traphandler flow

2021-06-23 Thread Stuart Henderson
On 2021/06/20 22:31, Martijn van Duren wrote:
> On Fri, 2021-06-11 at 16:13 +0200, Martijn van Duren wrote:
> > any takers?
> > 
> > On Fri, 2021-06-04 at 22:11 +0200, Martijn van Duren wrote:
> > > ping
> > > 
> > > On Fri, 2021-05-28 at 08:19 +0200, Martijn van Duren wrote:
> > > > As the original comment said:
> > > > /*
> > > >  * This should probably go into parsevarbinds, but that's for a
> > > >  * next refactor
> > > >  */
> > > > 
> > > > This moves the pdu parsing of traps into parsevarbinds.
> > > > Added bonus, we can now see the packets in debugging mode:
> > > > startup
> > > > snmpe: listening on udp 127.0.0.1:162
> > > > ktable_new: new ktable for rtableid 0
> > > > snmpe_parse: 127.0.0.1:162: SNMPv2 'public' pdutype SNMPv2-Trap request 
> > > > 1329591927
> > > > snmpe_parse: 127.0.0.1:162: SNMPv1 'public' pdutype Trap request 0
> > > > 
> > > > OK?
> > > > 
> > > > martijn@
> > > > 
> Somehow this one survived the tightened security mayhem with only a few
> offset warnings. Still looking for OKs.

I don't know snmpd well enough to give a considered OK for this
(and don't use snmpd's trap handler; I figure that if I'm going to
need to call net-snmp code to translate the numeric OID anyway,
I might as well use their trapd), but it does seem generally
neater to me.

> @@ -460,7 +457,14 @@ snmpe_parsevarbinds(struct snmp_message 
>   struct ber_element  *pvarbind = NULL, *end;
>   char buf[BUFSIZ];
>   struct ber_oid   o;
> - int  i;
> + int  i = 1;
> +
> + if (msg->sm_pdutype == SNMP_C_TRAP ||
> + msg->sm_pdutype == SNMP_C_TRAPV2) {
> + if (traphandler_parse(msg) == -1)
> + goto varfail;
> + return 0;
> + }
>  
>   msg->sm_errstr = "invalid varbind element";
>  
> @@ -468,7 +472,7 @@ snmpe_parsevarbinds(struct snmp_message 
>   msg->sm_varbindresp = NULL;
>   end = NULL;
>  
> - for (i = 1; varbind != NULL && i < SNMPD_MAXVARBIND;
> + for (; varbind != NULL && i < SNMPD_MAXVARBIND;
>   varbind = varbind->be_next, i++) {
>   if (ober_scanf_elements(varbind, "{oeS$}", &o, &end) == -1) {
>   stats->snmp_inasnparseerrs++;

not sure moving the initializer for i here is an improvement though,
seems better to have the loop more self-contained?



Re: uao references & uao_swap_off() cleanup

2021-06-23 Thread Martin Pieuchot
On 16/06/21(Wed) 11:26, Martin Pieuchot wrote:
> Diff below does two things:
> 
> - Use atomic operations for incrementing/decrementing references of
>   anonymous objects.  This allows us to manipulate them without holding
>   the KERNEL_LOCK().
> 
> - Rewrite the loop from uao_swap_off() to only keep a reference to the
>   next item in the list.  This is imported from NetBSD and is necessary
>   to introduce locking around uao_pagein().
> 
> ok?

Anyone?

> 
> Index: uvm/uvm_aobj.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_aobj.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 uvm_aobj.c
> --- uvm/uvm_aobj.c16 Jun 2021 09:02:21 -  1.98
> +++ uvm/uvm_aobj.c16 Jun 2021 09:20:26 -
> @@ -779,19 +779,11 @@ uao_init(void)
>  void
>  uao_reference(struct uvm_object *uobj)
>  {
> - KERNEL_ASSERT_LOCKED();
> - uao_reference_locked(uobj);
> -}
> -
> -void
> -uao_reference_locked(struct uvm_object *uobj)
> -{
> -
>   /* Kernel object is persistent. */
>   if (UVM_OBJ_IS_KERN_OBJECT(uobj))
>   return;
>  
> - uobj->uo_refs++;
> + atomic_inc_int(&uobj->uo_refs);
>  }
>  
>  
> @@ -801,34 +793,19 @@ uao_reference_locked(struct uvm_object *
>  void
>  uao_detach(struct uvm_object *uobj)
>  {
> - KERNEL_ASSERT_LOCKED();
> - uao_detach_locked(uobj);
> -}
> -
> -
> -/*
> - * uao_detach_locked: drop a reference to an aobj
> - *
> - * => aobj may freed upon return.
> - */
> -void
> -uao_detach_locked(struct uvm_object *uobj)
> -{
>   struct uvm_aobj *aobj = (struct uvm_aobj *)uobj;
>   struct vm_page *pg;
>  
>   /*
>* Detaching from kernel_object is a NOP.
>*/
> - if (UVM_OBJ_IS_KERN_OBJECT(uobj)) {
> + if (UVM_OBJ_IS_KERN_OBJECT(uobj))
>   return;
> - }
>  
>   /*
>* Drop the reference.  If it was the last one, destroy the object.
>*/
> - uobj->uo_refs--;
> - if (uobj->uo_refs) {
> + if (atomic_dec_int_nv(&uobj->uo_refs) > 0) {
>   return;
>   }
>  
> @@ -1265,68 +1242,54 @@ uao_dropswap(struct uvm_object *uobj, in
>  boolean_t
>  uao_swap_off(int startslot, int endslot)
>  {
> - struct uvm_aobj *aobj, *nextaobj, *prevaobj = NULL;
> + struct uvm_aobj *aobj;
>  
>   /*
> -  * Walk the list of all anonymous UVM objects.
> +  * Walk the list of all anonymous UVM objects.  Grab the first.
>*/
>   mtx_enter(&uao_list_lock);
> + if ((aobj = LIST_FIRST(&uao_list)) == NULL) {
> + mtx_leave(&uao_list_lock);
> + return FALSE;
> + }
> + uao_reference(&aobj->u_obj);
>  
> - for (aobj = LIST_FIRST(_list);
> -  aobj != NULL;
> -  aobj = nextaobj) {
> + do {
> + struct uvm_aobj *nextaobj;
>   boolean_t rv;
>  
>   /*
> -  * add a ref to the aobj so it doesn't disappear
> -  * while we're working.
> -  */
> - uao_reference_locked(&aobj->u_obj);
> -
> - /*
> -  * now it's safe to unlock the uao list.
> -  * note that lock interleaving is alright with IPL_NONE mutexes.
> +  * Prefetch the next object and immediately hold a reference
> +  * on it, so neither the current nor the next entry could
> +  * disappear while we are iterating.
>*/
> - mtx_leave(&uao_list_lock);
> -
> - if (prevaobj) {
> - uao_detach_locked(&prevaobj->u_obj);
> - prevaobj = NULL;
> + if ((nextaobj = LIST_NEXT(aobj, u_list)) != NULL) {
> + uao_reference(&nextaobj->u_obj);
>   }
> + mtx_leave(&uao_list_lock);
>  
>   /*
> -  * page in any pages in the swslot range.
> -  * if there's an error, abort and return the error.
> +  * Page in all pages in the swap slot range.
>*/
>   rv = uao_pagein(aobj, startslot, endslot);
> +
> + /* Drop the reference of the current object. */
> + uao_detach(&aobj->u_obj);
>   if (rv) {
> - uao_detach_locked(&aobj->u_obj);
> + if (nextaobj) {
> + uao_detach(&nextaobj->u_obj);
> + }
>   return rv;
>   }
>  
> - /*
> -  * we're done with this aobj.
> -  * relock the list and drop our ref on the aobj.
> -  */
> + aobj = nextaobj;
>   mtx_enter(&uao_list_lock);
> - nextaobj = LIST_NEXT(aobj, u_list);
> - /*
> -  * prevaobj means that we have an object that we need
> -  * to drop a reference for. We can't just drop it now with
> -  * the list locked since that could cause lock recursion in
> -  * the case where we reduce the refcount to 0. It will be
> -