Re: snmpd drop traphandler process
Found some minor issues.  Please disregard for now.

On Tue, 2020-12-01 at 17:12 +0100, Martijn van Duren wrote:
> Hello tech@,
> 
> Long story short: the traphandler process in snmpd annoys me a great
> deal and is in the way for overhauling the transport mapping section
> of snmpe, which is needed for implementing new agentx master support.
> 
> The current traphandler process is also a joke, since all it does is
> receive a message, do a minimal parsing validation and then pass the
> entire buffer to the parent process, which forks the actual handlers.
> This means that the current pledgeset is also way too wide for what
> traphandler does.
> 
> Since snmpe already does a similar packet parsing I see no reason to
> keep this initial verification in the traphandler process.  The diff
> below drops the traphandler process and moves the receiving of the
> packets to snmpe, which then forwards the pdu (instead of the current
> whole packet) to the parent.  I also extended the checking, so it
> should be a little more resilient to malformed packets.
> 
> While here I also didn't like the fact that listening on a trap port
> always implies that snmp itself is also listening.  This diff adds the
> trap keyword (for udp) on the listen on line, so that we can set up a
> traphandler-only setup.  This gives the added bonus that we can now
> also choose the port we listen on for traps.  Adding a listen trap
> statement requires a trap handle and vice versa.
> 
> It's not the prettiest code, but it's a stopgap that allows me to move
> forward without creating even larger diffs and without (hopefully)
> breaking existing setups (apart from the keyword change).
> 
> OK?
> 
> martijn@
> 
> Index: parse.y
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/parse.y,v
> retrieving revision 1.62
> diff -u -p -r1.62 parse.y
> --- parse.y	30 Oct 2020 07:43:48 -	1.62
> +++ parse.y	1 Dec 2020 16:12:20 -
> @@ -94,11 +94,9 @@ char	*symget(const char *);
> 
>  struct snmpd		*conf = NULL;
>  static int		 errors = 0;
>  static struct usmuser	*user = NULL;
> -static char		*snmpd_port = SNMPD_PORT;
> 
>  int	 host(const char *, const char *, int,
>  	    struct sockaddr_storage *, int);
> -int	 listen_add(struct sockaddr_storage *, int);
> 
>  typedef struct {
>  	union {
> @@ -284,13 +282,16 @@ listenproto	: UDP listen_udp
> 
>  listen_udp	: STRING port {
>  			struct sockaddr_storage ss[16];
>  			int nhosts, i;
> +			char *port = $2;
> 
> -			nhosts = host($1, $2, SOCK_DGRAM, ss, nitems(ss));
> +			if (port == NULL)
> +				port = SNMPD_PORT;
> +
> +			nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss));
>  			if (nhosts < 1) {
>  				yyerror("invalid address: %s", $1);
>  				free($1);
> -				if ($2 != snmpd_port)
> -					free($2);
> +				free($2);
>  				YYERROR;
>  			}
>  			if (nhosts > (int)nitems(ss))
> @@ -298,10 +299,38 @@ listen_udp	: STRING port {
>  				    $1, $2, nitems(ss));
> 
>  			free($1);
> -			if ($2 != snmpd_port)
> -				free($2);
> +			free($2);
> +			for (i = 0; i < nhosts; i++) {
> +				if (listen_add(&(ss[i]), SOCK_DGRAM, 0) == -1) {
> +					yyerror("calloc");
> +					YYERROR;
> +				}
> +			}
> +		}
> +		| STRING port TRAP {
> +			struct sockaddr_storage ss[16];
> +			int nhosts, i;
> +			char *port = $2;
> +
> +			if (port == NULL)
> +				port = SNMPD_TRAPPORT;
> +
> +			nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss));
> +			if (nhosts < 1) {
> +				yyerror("invalid address: %s", $1);
> +				free($1);
> +				free($2);
> +				YYERROR;
> +			}
> +			if (nhosts > (int)nitems(ss))
> +				log_warn("%s:%s resolves to more than %zu hosts",
> +				    $1, $2, nitems(ss));
> +
> +			free($1);
> +			free($2);
> +			for (i = 0; i < nhosts;
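For readers wondering what the change means for configuration, here is a sketch of snmpd.conf lines the new grammar should accept, inferred from the parse.y rules in the diff. The addresses, port number, OID, and handler path are made-up examples, and the exact keyword layout should be checked against snmpd.conf(5):

```
# regular listener; port defaults to SNMPD_PORT (161)
listen on udp 127.0.0.1

# trap-only listener; port defaults to SNMPD_TRAPPORT (162)
# and can now be chosen explicitly
listen on udp 10.0.0.1 trap
listen on udp 10.0.0.1 port 10162 trap

# a "listen ... trap" line requires a trap handler, and vice versa
trap handle 1.3.6.1.4.1 "/usr/local/bin/handle-trap"
```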
Re: Use SMR_TAILQ for `ps_threads'
On Fri, Dec 04, 2020 at 10:03:46AM -0300, Martin Pieuchot wrote:
> On 04/12/20(Fri) 12:01, Jonathan Matthew wrote:
> > On Wed, Dec 02, 2020 at 11:41:04AM -0300, Martin Pieuchot wrote:
> > > [...]
> > > Could you try the diff below that only call smr_barrier() for multi-
> > > threaded processes with threads still in the list.  I guess this also
> > > answers guenther@'s question.  The same could be done with smr_flush().
> > 
> > This removes the overhead, more or less.  Are we only looking at
> > unlocking access to ps_threads from within a process (not the sysctl
> > or ptrace stuff)?  Otherwise this doesn't seem safe.
> 
> I'd argue that if `ps_thread' is being iterated the CPU doing the
> iteration must already have a reference to the "struct process" so
> the serialization should be done on this reference.

Sounds reasonable to me.

> Now I doubt we'll be able to answer all the questions right now.  If we
> can find a path forward that doesn't decrease performances too much and
> allow us to move signal delivery and sleep out of the KERNEL_LOCK()
> that's a huge win.

I think we're at an acceptable performance hit now, so if this lets you
progress with unlocking signal delivery, I'm happy.
Re: srp_finalize(9): tsleep(9) -> tsleep_nsec(9)
On Fri, Dec 04, 2020 at 12:17:31PM -0600, Scott Cheloha wrote:
> On Fri, Dec 04, 2020 at 09:56:02AM +0100, Claudio Jeker wrote:
> > On Thu, Dec 03, 2020 at 10:05:30PM -0600, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > srp_finalize(9) uses tsleep(9) to spin while it waits for the
> > > object's refcount to reach zero.  It blocks for up to 1 tick and
> > > then checks the refcount again and again.
> > > 
> > > We can just as easily do this with tsleep_nsec(9) and block for 1
> > > millisecond per interval.
> > > 
> > > ok?
> > > 
> > > Index: kern_srp.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/kern/kern_srp.c,v
> > > retrieving revision 1.12
> > > diff -u -p -r1.12 kern_srp.c
> > > --- kern_srp.c	8 Sep 2017 05:36:53 -	1.12
> > > +++ kern_srp.c	4 Dec 2020 04:04:39 -
> > > @@ -274,7 +274,7 @@ void
> > >  srp_finalize(void *v, const char *wmesg)
> > >  {
> > >  	while (srp_referenced(v))
> > > -		tsleep(v, PWAIT, wmesg, 1);
> > > +		tsleep_nsec(v, PWAIT, wmesg, MSEC_TO_NSEC(1));
> > >  }
> > > 
> > >  #else /* MULTIPROCESSOR */
> > 
> > Why only 1ms instead of the original 10ms (at least on most archs)?
> 
> The underlying implementation can only process timeouts from
> hardclock(9) which runs about hz times per second.  If we tell the
> thread to "sleep for 10ms" it's almost always going to overshoot the
> next hardclock(9) and wind up sleeping ~20ms.
> 
> Some people run with HZ=1000 kernels.  I don't think many people run
> with kernels with a higher HZ than that, though.  So I figure a 1ms
> sleep is "good enough" for all practical kernels.  On HZ=100 kernels
> the thread will oversleep because it doesn't process timeouts often
> enough to honor the 1ms request.
> 
> Basically I'm trying to pick a reasonable polling interval (not too
> fast) that also won't cause the existing default kernel to block for
> longer than it already does (~10ms).  The default kernel is HZ=100, so
> a 1ms sleep will, in this case, almost always sleep ~10ms per
> iteration of this loop.
This sleep should basically be 'as short as possible', since it's
waiting out SRP references which are very short lived.

ok jmatthew@
Re: srp_finalize(9): tsleep(9) -> tsleep_nsec(9)
On Fri, Dec 04, 2020 at 12:17:31PM -0600, Scott Cheloha wrote:
> On Fri, Dec 04, 2020 at 09:56:02AM +0100, Claudio Jeker wrote:
> > On Thu, Dec 03, 2020 at 10:05:30PM -0600, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > srp_finalize(9) uses tsleep(9) to spin while it waits for the
> > > object's refcount to reach zero.  It blocks for up to 1 tick and
> > > then checks the refcount again and again.
> > > 
> > > We can just as easily do this with tsleep_nsec(9) and block for 1
> > > millisecond per interval.
> > > 
> > > ok?
> > > 
> > > Index: kern_srp.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/kern/kern_srp.c,v
> > > retrieving revision 1.12
> > > diff -u -p -r1.12 kern_srp.c
> > > --- kern_srp.c	8 Sep 2017 05:36:53 -	1.12
> > > +++ kern_srp.c	4 Dec 2020 04:04:39 -
> > > @@ -274,7 +274,7 @@ void
> > >  srp_finalize(void *v, const char *wmesg)
> > >  {
> > >  	while (srp_referenced(v))
> > > -		tsleep(v, PWAIT, wmesg, 1);
> > > +		tsleep_nsec(v, PWAIT, wmesg, MSEC_TO_NSEC(1));
> > >  }
> > > 
> > >  #else /* MULTIPROCESSOR */
> > 
> > Why only 1ms instead of the original 10ms (at least on most archs)?
> 
> The underlying implementation can only process timeouts from
> hardclock(9) which runs about hz times per second.  If we tell the
> thread to "sleep for 10ms" it's almost always going to overshoot the
> next hardclock(9) and wind up sleeping ~20ms.
> 
> Some people run with HZ=1000 kernels.  I don't think many people run
> with kernels with a higher HZ than that, though.  So I figure a 1ms
> sleep is "good enough" for all practical kernels.  On HZ=100 kernels
> the thread will oversleep because it doesn't process timeouts often
> enough to honor the 1ms request.
> 
> Basically I'm trying to pick a reasonable polling interval (not too
> fast) that also won't cause the existing default kernel to block for
> longer than it already does (~10ms).  The default kernel is HZ=100, so
> a 1ms sleep will, in this case, almost always sleep ~10ms per
> iteration of this loop.
> It's a bit of a chicken-and-egg problem.
> 
> Does that make any sense?

I understand the chicken-and-egg problem and the rounding problem now.
In the end the result will be a higher poll frequency (once the timeout
subsystem moves away from the tick based timeouts).  It probably does
not matter.

So OK claudio@
-- 
:wq Claudio
Re: vmm(4): cpuid leaf 0x15 fixes clock speed problem in Linux guest [PATCH]
* Jozef Hatala [2020-11-29 00:32:17 -0800]:
> On Sun, Nov 29, 2020 at 12:04:04AM -0800, I wrote:
> > On Sun, Nov 29, 2020 at 06:36:17AM +, Mike wrote:
> > > And what are you going to return for the other leaf nodes now that
> > > you are claiming a CPUID level of 0x15, on CPUs that are less than
> > > that?
> > > 
> > > Eg, if your host supports level 0xb (11), what happens in your diff
> > > if the guest VM requests CPUID leaf node 0x0c (12)?
> > > 
> > > It seems you are only handling the single 0x15 case.
> > 
> > That is true.  We need to note though that the current situation is
> > not correct either.  [...]  What do you suggest?
> 
> Two ideas spring to mind for how to resolve this:
> 
> (a) Figure out from Intel's documentation for each leaf (and sub-leaf)
>     what response is correct and does not require us to emulate too
>     much.
> 
> (b) Add a sysctl knob that controls whether we even announce a higher
>     cpu level to the guest than what the host supports, so that we do
>     not expose people to the incorrectness by default.
> 
> What do you think?
> 
> jh

Hi Jozef,

I don't see Mike's replies but can you try Dave's linux kernel module [0]
and see if it performs well when linux chooses to use it?  disclaimer:
I haven't tried it but my attempt to attach kvm-clock itself worked [1].

This cpuid emulation bit was added during the time when using tsc was
the only way to get a precise clock and before pvclock was added [2].
This also doesn't work on AMD machines (on at least mine).  We could
get rid of this cpuid emulation.

0: https://github.com/voutilad/vmm_clock
1: http://ix.io/2lzK
2: https://github.com/openbsd/src/commit/fa1855cdc021

-- 
Pratik
Re: mbg(4): tsleep(9) -> tsleep_nsec(9)
On Fri, Dec 04, 2020 at 12:08:39PM -0600, Scott Cheloha wrote:
> On Fri, Dec 04, 2020 at 10:07:07AM +0100, Claudio Jeker wrote:
> > On Thu, Dec 03, 2020 at 10:42:50PM -0600, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > mbg(4) is among the few remaining drivers using tsleep(9).
> > > 
> > > In a few spots, when the kernel is not cold, the driver will spin
> > > for up to 1/10 seconds waiting for the MBG_BUSY flag to go low.
> > > 
> > > We can approximate this behavior by spinning 10 times and sleeping
> > > 10 milliseconds each iteration.  10 x 10ms = 100ms = 1/10 seconds.
> > > 
> > > I can't test this but I was able to compile it on amd64.  It isn't
> > > normally built for amd64, though.  Just i386.
> > > 
> > > I have my doubts that anyone has this card and is able to actually
> > > test this diff.
> > > 
> > > Anybody ok?
> > 
> > This code needs to wait for around 70us for the card to process the
> > command (according to the comment).  The cold code sleeps a max of
> > 50 * 20us (1ms).  I don't see why the regular code should sleep so
> > much longer.  I would suggest to use a 1ms timeout and loop 10 times.
> > This is a magnitude more than enough and most probably only one cycle
> > will be needed.
> > 
> > IIRC someone got a mbg(4) device some time ago apart from mbalmer@
> 
> Makes sense to me.  Updated diff attached.

OK claudio@

> How are we going to find this person?

Best way is to commit the diff and wait :)

> Index: mbg.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/mbg.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 mbg.c
> --- mbg.c	29 Nov 2020 03:17:27 -	1.31
> +++ mbg.c	4 Dec 2020 18:07:43 -
> @@ -417,12 +417,12 @@ mbg_read_amcc_s5920(struct mbg_softc *sc
> 
>  	/* wait for the BUSY flag to go low (approx 70 us on i386) */
>  	timer = 0;
> -	tmax = cold ? 50 : hz / 10;
> +	tmax = cold ? 50 : 10;
>  	do {
>  		if (cold)
>  			delay(20);
>  		else
> -			tsleep(tstamp, 0, "mbg", 1);
> +			tsleep_nsec(tstamp, 0, "mbg", MSEC_TO_NSEC(1));
>  		status = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
>  		    AMCC_IMB4 + 3);
>  	} while ((status & MBG_BUSY) && timer++ < tmax);
> @@ -473,12 +473,12 @@ mbg_read_amcc_s5933(struct mbg_softc *sc
> 
>  	/* wait for the BUSY flag to go low (approx 70 us on i386) */
>  	timer = 0;
> -	tmax = cold ? 50 : hz / 10;
> +	tmax = cold ? 50 : 10;
>  	do {
>  		if (cold)
>  			delay(20);
>  		else
> -			tsleep(tstamp, 0, "mbg", 1);
> +			tsleep_nsec(tstamp, 0, "mbg", MSEC_TO_NSEC(1));
>  		status = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
>  		    AMCC_IMB4 + 3);
>  	} while ((status & MBG_BUSY) && timer++ < tmax);
> @@ -525,12 +525,12 @@ mbg_read_asic(struct mbg_softc *sc, int
> 
>  	/* wait for the BUSY flag to go low */
>  	timer = 0;
> -	tmax = cold ? 50 : hz / 10;
> +	tmax = cold ? 50 : 10;
>  	do {
>  		if (cold)
>  			delay(20);
>  		else
> -			tsleep(tstamp, 0, "mbg", 1);
> +			tsleep_nsec(tstamp, 0, "mbg", MSEC_TO_NSEC(1));
>  		status = bus_space_read_1(sc->sc_iot, sc->sc_ioh, ASIC_STATUS);
>  	} while ((status & MBG_BUSY) && timer++ < tmax);

-- 
:wq Claudio