Re: snmpd drop traphandler process

2020-12-05 Thread Martijn van Duren
Found some minor issues.
Please disregard for now.

On Tue, 2020-12-01 at 17:12 +0100, Martijn van Duren wrote:
> Hello tech@,
> 
> Long story short: the traphandler process in snmpd annoys me a great
> deal and is in the way of overhauling the transport mapping section
> of snmpe, which is needed for implementing new agentx master support.
> 
> The current traphandler process is also a joke: all it does is
> receive a message, do minimal parse validation, and then pass the
> entire buffer to the parent process, which forks the actual
> handlers. This means that the current pledgeset is also way too wide
> for what traphandler does.
> 
> Since snmpe already does similar packet parsing I see no reason to
> keep this initial verification in the traphandler process. The diff
> below drops the traphandler process and moves the receiving of the
> packets to snmpe, which then forwards the pdu (instead of the whole
> packet, as is done now) to the parent. I also extended the checking,
> so it should be a little more resilient to malformed packets.
> 
> While here I also didn't like the fact that listening on a trap-port
> always implies that snmp itself is also listening. This diff adds the
> trap keyword (for udp) on the listen on line, so that we can setup a
> traphandler-only setup. This gives the added bonus that we can now
> also choose the port we listen on for traps. Adding a listen trap
> statement requires a trap handle and vice versa.
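[A trap-only listener under the proposed grammar might look like the
sketch below. The exact keywords should be checked against
snmpd.conf(5) once the change lands; the "trap" suffix on the listen
line and the paired "trap handle" requirement are what this diff
introduces, and the OID and command here are made up for illustration.

```
# hypothetical snmpd.conf sketch: receive traps without serving SNMP
listen on 127.0.0.1 trap                      # defaults to the trap port
trap handle 1.3.6.1.4.1.11.2.3.9 "/usr/local/bin/trap-notify"
```

Per the diff, a listen trap statement without a trap handle (or vice
versa) is rejected at configuration parse time.]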
> 
> It's not the prettiest code, but it's a stopgap that allows me to move
> forward without creating even larger diffs and without (hopefully)
> breaking existing setups (apart from the keyword change).
> 
> OK?
> 
> martijn@
> 
> Index: parse.y
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/parse.y,v
> retrieving revision 1.62
> diff -u -p -r1.62 parse.y
> --- parse.y 30 Oct 2020 07:43:48 -  1.62
> +++ parse.y 1 Dec 2020 16:12:20 -
> @@ -94,11 +94,9 @@ char *symget(const char *);
>  struct snmpd   *conf = NULL;
>  static int  errors = 0;
>  static struct usmuser  *user = NULL;
> -static char*snmpd_port = SNMPD_PORT;
>  
>  int host(const char *, const char *, int,
>     struct sockaddr_storage *, int);
> -int listen_add(struct sockaddr_storage *, int);
>  
>  typedef struct {
> union {
> @@ -284,13 +282,16 @@ listenproto   : UDP listen_udp
>  listen_udp : STRING port   {
> struct sockaddr_storage ss[16];
> int nhosts, i;
> +   char *port = $2;
>  
> -   nhosts = host($1, $2, SOCK_DGRAM, ss, nitems(ss));
> +   if (port == NULL)
> +   port = SNMPD_PORT;
> +
> +   nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss));
> if (nhosts < 1) {
> yyerror("invalid address: %s", $1);
> free($1);
> -   if ($2 != snmpd_port)
> -   free($2);
> +   free($2);
> YYERROR;
> }
> if (nhosts > (int)nitems(ss))
> @@ -298,10 +299,38 @@ listen_udp: STRING port   {
>     $1, $2, nitems(ss));
>  
> free($1);
> -   if ($2 != snmpd_port)
> +   free($2);
> +   for (i = 0; i < nhosts; i++) {
> +   if (listen_add(&(ss[i]), SOCK_DGRAM, 0) == -1) {
> +   yyerror("calloc");
> +   YYERROR;
> +   }
> +   }
> +   }
> +   | STRING port TRAP  {
> +   struct sockaddr_storage ss[16];
> +   int nhosts, i;
> +   char *port = $2;
> +
> +   if (port == NULL)
> +   port = SNMPD_TRAPPORT;
> +
> +   nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss));
> +   if (nhosts < 1) {
> +   yyerror("invalid address: %s", $1);
> +   free($1);
> free($2);
> +   YYERROR;
> +   }
> +   if (nhosts > (int)nitems(ss))
> +   log_warn("%s:%s resolves to more than %zu hosts",
> +   $1, $2, nitems(ss));
> +
> +   free($1);
> +   free($2);
> for (i = 0; i < nhosts; 

Re: Use SMR_TAILQ for `ps_threads'

2020-12-05 Thread Jonathan Matthew
On Fri, Dec 04, 2020 at 10:03:46AM -0300, Martin Pieuchot wrote:
> On 04/12/20(Fri) 12:01, Jonathan Matthew wrote:
> > On Wed, Dec 02, 2020 at 11:41:04AM -0300, Martin Pieuchot wrote:
> > > [...] 
> > > Could you try the diff below that only call smr_barrier() for multi-
> > > threaded processes with threads still in the list.  I guess this also
> > > answers guenther@'s question.  The same could be done with smr_flush().
> > 
> > This removes the overhead, more or less.  Are we only looking at
> > unlocking access to ps_threads from within a process (not the sysctl
> > or ptrace stuff)?  Otherwise this doesn't seem safe.
> 
> I'd argue that if `ps_thread' is being iterated the CPU doing the
> iteration must already have a reference to the "struct process" so
> the serialization should be done on this reference.

Sounds reasonable to me.

> 
> Now I doubt we'll be able to answer all the questions right now.  If we
> can find a path forward that doesn't decrease performances too much and
> allow us to move signal delivery and sleep out of the KERNEL_LOCK()
> that's a huge win.

I think we're at an acceptable performance hit now, so if this lets you
progress with unlocking signal delivery, I'm happy.



Re: srp_finalize(9): tsleep(9) -> tsleep_nsec(9)

2020-12-05 Thread Jonathan Matthew
On Fri, Dec 04, 2020 at 12:17:31PM -0600, Scott Cheloha wrote:
> On Fri, Dec 04, 2020 at 09:56:02AM +0100, Claudio Jeker wrote:
> > On Thu, Dec 03, 2020 at 10:05:30PM -0600, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > srp_finalize(9) uses tsleep(9) to spin while it waits for the object's
> > > refcount to reach zero.  It blocks for up to 1 tick and then checks
> > > the refcount again and again.
> > > 
> > > We can just as easily do this with tsleep_nsec(9) and block for 1
> > > millisecond per interval.
> > > 
> > > ok?
> > > 
> > > Index: kern_srp.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/kern_srp.c,v
> > > retrieving revision 1.12
> > > diff -u -p -r1.12 kern_srp.c
> > > --- kern_srp.c8 Sep 2017 05:36:53 -   1.12
> > > +++ kern_srp.c4 Dec 2020 04:04:39 -
> > > @@ -274,7 +274,7 @@ void
> > >  srp_finalize(void *v, const char *wmesg)
> > >  {
> > >   while (srp_referenced(v))
> > > - tsleep(v, PWAIT, wmesg, 1);
> > > + tsleep_nsec(v, PWAIT, wmesg, MSEC_TO_NSEC(1));
> > >  }
> > >  
> > >  #else /* MULTIPROCESSOR */
> > > 
> > 
> > Why only 1ms instead of the original 10ms (at least on most archs)?
> 
> The underlying implementation can only process timeouts from
> hardclock(9) which runs about hz times per second.  If we tell the
> thread to "sleep for 10ms" it's almost always going to overshoot the
> next hardclock(9) and wind up sleeping ~20ms.
> 
> Some people run with HZ=1000 kernels.  I don't think many people run
> with kernels with a higher HZ than that, though.  So I figure a 1ms
> sleep is "good enough" for all practical kernels.  On HZ=100 kernels
> the thread will oversleep because it doesn't process timeouts often
> enough to honor the 1ms request.
> 
> Basically I'm trying to pick a reasonable polling interval (not too
> fast) that also won't cause the existing default kernel to block for
> longer than it already does (~10ms).  The default kernel is HZ=100, so
> a 1ms sleep will, in this case, almost always sleep ~10ms per
> iteration of this loop.

This sleep should basically be 'as short as possible', since it's waiting
out SRP references which are very short lived.  ok jmatthew@



Re: srp_finalize(9): tsleep(9) -> tsleep_nsec(9)

2020-12-05 Thread Claudio Jeker
On Fri, Dec 04, 2020 at 12:17:31PM -0600, Scott Cheloha wrote:
> On Fri, Dec 04, 2020 at 09:56:02AM +0100, Claudio Jeker wrote:
> > On Thu, Dec 03, 2020 at 10:05:30PM -0600, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > srp_finalize(9) uses tsleep(9) to spin while it waits for the object's
> > > refcount to reach zero.  It blocks for up to 1 tick and then checks
> > > the refcount again and again.
> > > 
> > > We can just as easily do this with tsleep_nsec(9) and block for 1
> > > millisecond per interval.
> > > 
> > > ok?
> > > 
> > > Index: kern_srp.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/kern_srp.c,v
> > > retrieving revision 1.12
> > > diff -u -p -r1.12 kern_srp.c
> > > --- kern_srp.c8 Sep 2017 05:36:53 -   1.12
> > > +++ kern_srp.c4 Dec 2020 04:04:39 -
> > > @@ -274,7 +274,7 @@ void
> > >  srp_finalize(void *v, const char *wmesg)
> > >  {
> > >   while (srp_referenced(v))
> > > - tsleep(v, PWAIT, wmesg, 1);
> > > + tsleep_nsec(v, PWAIT, wmesg, MSEC_TO_NSEC(1));
> > >  }
> > >  
> > >  #else /* MULTIPROCESSOR */
> > > 
> > 
> > Why only 1ms instead of the original 10ms (at least on most archs)?
> 
> The underlying implementation can only process timeouts from
> hardclock(9) which runs about hz times per second.  If we tell the
> thread to "sleep for 10ms" it's almost always going to overshoot the
> next hardclock(9) and wind up sleeping ~20ms.
> 
> Some people run with HZ=1000 kernels.  I don't think many people run
> with kernels with a higher HZ than that, though.  So I figure a 1ms
> sleep is "good enough" for all practical kernels.  On HZ=100 kernels
> the thread will oversleep because it doesn't process timeouts often
> enough to honor the 1ms request.
> 
> Basically I'm trying to pick a reasonable polling interval (not too
> fast) that also won't cause the existing default kernel to block for
> longer than it already does (~10ms).  The default kernel is HZ=100, so
> a 1ms sleep will, in this case, almost always sleep ~10ms per
> iteration of this loop.
> 
> It's a bit of a chicken-and-egg problem.
> 
> Does that make any sense?

I understand the chicken-and-egg problem and the rounding problem now.
In the end the result will be a higher poll frequencey (once the timeout
subsystem moves away from the tick based timeouts). It probably does not
matter. So OK claudio@

-- 
:wq Claudio



Re: vmm(4): cpuid leaf 0x15 fixes clock speed problem in Linux guest [PATCH]

2020-12-05 Thread Pratik Vyas

* Jozef Hatala  [2020-11-29 00:32:17 -0800]:


On Sun, Nov 29, 2020 at 12:04:04AM -0800, I wrote:

On Sun, Nov 29, 2020 at 06:36:17AM +, Mike wrote:
> And what are you going to return for the other leaf nodes now that
> you are claiming a CPUID level of 0x15, on CPUs that are less than
> that?
>
> Eg, if your host supports level 0xb (11), what happens in your diff
> if the guest VM requests CPUID leaf node 0x0c (12)?
>
> It seems you are only handing the single 0x15 case.

That is true.

We need to note though that the current situation is not correct either.
[...]
What do you suggest?


Two ideas spring to mind for how to resolve this:

(a) Figure out from Intel's documentation for each leaf (and sub-leaf)
what response is correct and does not require us to emulate too much.

(b) Add a sysctl knob that controls whether we even announce a higher
cpu level to the guest than what the host supports, so that we do not
expose people to the incorrectness by default.

What do you think?

jh



Hi Jozef,

I don't see Mike's replies but can you try Dave's linux kernel module [0]
and see if it performs well when linux chooses to use it?  disclaimer:
I haven't tried it but my attempt to attach kvm-clock itself worked [1].

This cpuid emulation bit was added during the time when using tsc was
the only way to get a precise clock and before pvclock was added [2].  This
also doesn't work on AMD machines (at least on mine).  We could get rid
of this cpuid emulation.


0: https://github.com/voutilad/vmm_clock
1: http://ix.io/2lzK
2: https://github.com/openbsd/src/commit/fa1855cdc021

--
Pratik



Re: mbg(4): tsleep(9) -> tsleep_nsec(9)

2020-12-05 Thread Claudio Jeker
On Fri, Dec 04, 2020 at 12:08:39PM -0600, Scott Cheloha wrote:
> On Fri, Dec 04, 2020 at 10:07:07AM +0100, Claudio Jeker wrote:
> > On Thu, Dec 03, 2020 at 10:42:50PM -0600, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > mbg(4) is among the few remaining drivers using tsleep(9).
> > > 
> > > In a few spots, when the kernel is not cold, the driver will spin for
> > > up to 1/10 seconds waiting for the MBG_BUSY flag to go low.
> > > 
> > > We can approximate this behavior by spinning 10 times and sleeping 10
> > > milliseconds each iteration.  10 x 10ms = 100ms = 1/10 seconds.
> > > 
> > > I can't test this but I was able to compile it on amd64.  It isn't
> > > normally built for amd64, though.  Just i386.
> > > 
> > > I have my doubts that anyone has this card and is able to actually
> > > test this diff.
> > > 
> > > Anybody ok?
> > 
> > This code needs to wait for around 70us for the card to process the
> > command (according to the comment). The cold code sleeps a max of
> > 50 * 20us (1ms). I don't see why the regular code should sleep so much
> > longer. I would suggest using a 1ms timeout and looping 10 times. This
> > is an order of magnitude more than enough and most probably only one
> > cycle will be needed.
> > 
> > IIRC someone got a mbg(4) device some time ago apart from mbalmer@
> 
> Makes sense to me.  Updated diff attached.

OK claudio@
 
> How are we going to find this person?

Best way is to commit the diff and wait :)
 
> Index: mbg.c
> ===
> RCS file: /cvs/src/sys/dev/pci/mbg.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 mbg.c
> --- mbg.c 29 Nov 2020 03:17:27 -  1.31
> +++ mbg.c 4 Dec 2020 18:07:43 -
> @@ -417,12 +417,12 @@ mbg_read_amcc_s5920(struct mbg_softc *sc
>  
>   /* wait for the BUSY flag to go low (approx 70 us on i386) */
>   timer = 0;
> - tmax = cold ? 50 : hz / 10;
> + tmax = cold ? 50 : 10;
>   do {
>   if (cold)
>   delay(20);
>   else
> - tsleep(tstamp, 0, "mbg", 1);
> + tsleep_nsec(tstamp, 0, "mbg", MSEC_TO_NSEC(1));
>   status = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
>   AMCC_IMB4 + 3);
>   } while ((status & MBG_BUSY) && timer++ < tmax);
> @@ -473,12 +473,12 @@ mbg_read_amcc_s5933(struct mbg_softc *sc
>  
>   /* wait for the BUSY flag to go low (approx 70 us on i386) */
>   timer = 0;
> - tmax = cold ? 50 : hz / 10;
> + tmax = cold ? 50 : 10;
>   do {
>   if (cold)
>   delay(20);
>   else
> - tsleep(tstamp, 0, "mbg", 1);
> + tsleep_nsec(tstamp, 0, "mbg", MSEC_TO_NSEC(1));
>   status = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
>   AMCC_IMB4 + 3);
>   } while ((status & MBG_BUSY) && timer++ < tmax);
> @@ -525,12 +525,12 @@ mbg_read_asic(struct mbg_softc *sc, int 
>  
>   /* wait for the BUSY flag to go low */
>   timer = 0;
> - tmax = cold ? 50 : hz / 10;
> + tmax = cold ? 50 : 10;
>   do {
>   if (cold)
>   delay(20);
>   else
> - tsleep(tstamp, 0, "mbg", 1);
> + tsleep_nsec(tstamp, 0, "mbg", MSEC_TO_NSEC(1));
>   status = bus_space_read_1(sc->sc_iot, sc->sc_ioh, ASIC_STATUS);
>   } while ((status & MBG_BUSY) && timer++ < tmax);
>  

-- 
:wq Claudio