Re: ldom.conf.5: clarify vcpu strides
On Wed, Nov 04, 2020 at 10:44:52PM +0100, Mark Kettenis wrote: > Yeah, that reads better. On request though. Can you pick a character > name from: > > https://www.openbsd.org/lyrics.html#38 > > as the name of the domain? Beluge is the bad guy, so this probably > should be Marlus. Sure! Final version with Marlus, a missing .Pp and another grammar fix. OK? Index: ldom.conf.5 === RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v retrieving revision 1.14 diff -u -p -r1.14 ldom.conf.5 --- ldom.conf.5 14 Sep 2020 19:42:16 - 1.14 +++ ldom.conf.5 4 Nov 2020 22:11:45 - @@ -38,8 +38,11 @@ If no configuration for the primary doma all CPU and memory resources not used by any guest domains. .It Ic vcpu Ar number Ns Op : Ns Ar stride Declare the number of virtual CPUs assigned to a domain. -Optionally a stride can be specified to allocate additional virtual CPUs -but not assign them to a domain. +Optionally a stride can be specified to allocate +.Ar stride +VCPUs at a time but assign only +.Ar number +VCPUs to the domain. This can be used to distribute virtual CPUs over the available CPU cores. .It Ic memory Ar bytes Declare the amount of memory assigned to a domain, in bytes. @@ -117,6 +120,21 @@ domain "salmah" { .Pp On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and 58GB memory to the primary domain. +.Pp +Use a +.Ar stride +step size to distribute VCPUs: +.Bd -literal -offset indent +domain "marlus" { + vcpu 2:4 + memory 4G + vdisk "/home/marlus/vdisk0" +} +.Ed +.Pp +On a machine with eight threads per physical core, this allocates two strides +of four VCPUs each for the guest domain but assigns only two VCPUs to it, i.e. +makes it occupy an entire physical core while running on two threads only. .Sh SEE ALSO .Xr eeprom 8 , .Xr ldomctl 8 ,
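The arithmetic behind "vcpu number:stride" is a simple multiplication: number * stride VCPUs are consumed on the machine, of which only number are assigned to the guest. A minimal standalone sketch, with illustrative names (ldomctl's real accounting lives in config.c):

```c
#include <assert.h>

/*
 * Sketch of the "vcpu number:stride" arithmetic: a domain declared with
 * "vcpu N:S" consumes N * S virtual CPUs on the machine but is assigned
 * only N of them; the remainder pads out physical cores.  The names
 * below are illustrative, not ldomctl's actual structures.
 */
struct vcpu_decl {
	unsigned int num;	/* VCPUs assigned to the domain */
	unsigned int stride;	/* allocation step size, default 1 */
};

/* VCPUs consumed on the machine by one domain declaration. */
static unsigned int
vcpus_allocated(const struct vcpu_decl *d)
{
	return d->num * d->stride;
}

/* VCPUs the guest actually runs on. */
static unsigned int
vcpus_assigned(const struct vcpu_decl *d)
{
	return d->num;
}
```

With "vcpu 2:4" on eight-thread cores, vcpus_allocated() is 8, i.e. one full physical core, while the guest runs on only 2 threads of it.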
Re: ldom.conf.5: clarify vcpu strides
> Date: Wed, 4 Nov 2020 22:38:13 +0100 > From: Klemens Nanni > > On Wed, Nov 04, 2020 at 09:46:39PM +0100, Mark Kettenis wrote: > > stride is not a factor, so your description makes no sense to me. > ldomctl/config.c uses it as factor: > > SIMPLEQ_FOREACH(domain, &conf.domain_list, entry) { > if (strcmp(domain->name, "primary") == 0) { > primary_num_cpus = domain->vcpu; > primary_stride = domain->vcpu_stride; > primary_memory = domain->memory; > } > num_cpus += (domain->vcpu * domain->vcpu_stride); > memory += domain->memory; > } > > > a stride of 4 means we allocate VCPUs 4-at-a-time but only assign 1 of > > those to the domain. It is a step size. > Let's reword, is that any better? Yeah, that reads better. On request though. Can you pick a character name from: https://www.openbsd.org/lyrics.html#38 as the name of the domain? Beluge is the bad guy, so this probably should be Marlus. > Index: ldom.conf.5 > === > RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v > retrieving revision 1.14 > diff -u -p -r1.14 ldom.conf.5 > --- ldom.conf.5 14 Sep 2020 19:42:16 - 1.14 > +++ ldom.conf.5 4 Nov 2020 21:37:20 - > @@ -38,8 +38,11 @@ If no configuration for the primary doma > all CPU and memory resources not used by any guest domains. > .It Ic vcpu Ar number Ns Op : Ns Ar stride > Declare the number of virtual CPUs assigned to a domain. > -Optionally a stride can be specified to allocate additional virtual CPUs > -but not assign them to a domain. > +Optionally a stride can be specified to allocate > +.Ar stride > +VCPUs at a time but assign only > +.Ar number > +VCPUs to the domain. > This can be used to distribute virtual CPUs over the available CPU cores. > .It Ic memory Ar bytes > Declare the amount of memory assigned to a domain, in bytes. > @@ -117,6 +120,20 @@ domain "salmah" { > .Pp > On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and > 58GB memory to the primary domain. 
> +.Pp > +Use a > +.Ar stride > +step size to distribute VCPUs: > +.Bd -literal -offset indent > +domain "sun" { > + vcpu 2:4 > + memory 4G > + vdisk "/home/sun/vdisk0" > +} > +.Ed > +On a machine with eight threads per physical core, this allocates two strides > +of four VCPUs each for the guest domain but assigns only two VCPUs to it, > i.e. > +make it occupy an entire physical core while running on two threads only. > .Sh SEE ALSO > .Xr eeprom 8 , > .Xr ldomctl 8 , >
Re: ldom.conf.5: clarify vcpu strides
On Wed, Nov 04, 2020 at 09:46:39PM +0100, Mark Kettenis wrote: > stride is not a factor, so your description makes no sense to me. ldomctl/config.c uses it as factor: SIMPLEQ_FOREACH(domain, &conf.domain_list, entry) { if (strcmp(domain->name, "primary") == 0) { primary_num_cpus = domain->vcpu; primary_stride = domain->vcpu_stride; primary_memory = domain->memory; } num_cpus += (domain->vcpu * domain->vcpu_stride); memory += domain->memory; } > a stride of 4 means we allocate VCPUs 4-at-a-time but only assign 1 of > those to the domain. It is a step size. Let's reword, is that any better? Index: ldom.conf.5 === RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v retrieving revision 1.14 diff -u -p -r1.14 ldom.conf.5 --- ldom.conf.5 14 Sep 2020 19:42:16 - 1.14 +++ ldom.conf.5 4 Nov 2020 21:37:20 - @@ -38,8 +38,11 @@ If no configuration for the primary doma all CPU and memory resources not used by any guest domains. .It Ic vcpu Ar number Ns Op : Ns Ar stride Declare the number of virtual CPUs assigned to a domain. -Optionally a stride can be specified to allocate additional virtual CPUs -but not assign them to a domain. +Optionally a stride can be specified to allocate +.Ar stride +VCPUs at a time but assign only +.Ar number +VCPUs to the domain. This can be used to distribute virtual CPUs over the available CPU cores. .It Ic memory Ar bytes Declare the amount of memory assigned to a domain, in bytes. @@ -117,6 +120,20 @@ domain "salmah" { .Pp On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and 58GB memory to the primary domain. +.Pp +Use a +.Ar stride +step size to distribute VCPUs: +.Bd -literal -offset indent +domain "sun" { + vcpu 2:4 + memory 4G + vdisk "/home/sun/vdisk0" +} +.Ed +On a machine with eight threads per physical core, this allocates two strides +of four VCPUs each for the guest domain but assigns only two VCPUs to it, i.e. +make it occupy an entire physical core while running on two threads only. 
.Sh SEE ALSO .Xr eeprom 8 , .Xr ldomctl 8 ,
Re: ldom.conf.5: clarify vcpu strides
> Date: Wed, 4 Nov 2020 21:36:17 +0100 > From: Klemens Nanni > > On Mon, Sep 14, 2020 at 07:52:34PM +0200, Klemens Nanni wrote: > > On Wed, Sep 02, 2020 at 04:58:39PM +0200, Stefan Sperling wrote: > > > I would like to suggest an example for the EXAMPLES section which > > > illustrates how a suitable stride factor can be determined (divide the > > > number of desired "unused" cpus by the number of desired "used" cpus): > > We can do with an example, but to me yours does not read obvious enough. > > > > Also, `vcpu' denotes *virtual* CPUs inside domains, not CPUs on the > > machine, so "CPU" (without "V") reads off in your example and conflicts > > with the otherwise consistent mentions of "virtual CPUs" in this manual. > > > > Here's my last diff incl. an example which reads a tad clearer to me and > > is placed in the EXAMPLES section instead. > > > > Feedback? OK? > Ping. Diff reattached. stride is not a factor, so your description makes no sense to me. a stride of 4 means we allocate VCPUs 4-at-a-time but only assign 1 of those to the domain. It is a step size. > Index: ldom.conf.5 > === > RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v > retrieving revision 1.13 > diff -u -p -r1.13 ldom.conf.5 > --- ldom.conf.5 21 Feb 2020 19:39:28 - 1.13 > +++ ldom.conf.5 14 Sep 2020 17:51:39 - > @@ -38,8 +38,13 @@ If no configuration for the primary doma > all CPU and memory resources not used by any guest domains. > .It Ic vcpu Ar number Ns Op : Ns Ar stride > Declare the number of virtual CPUs assigned to a domain. > -Optionally a stride can be specified to allocate additional virtual CPUs > -but not assign them to a domain. > +Optionally a stride factor can be specified to allocate > +.Ar number > +virtual CPUs > +.Ar stride > +times but not assign more than > +.Ar number > +virtual CPUs to a domain, leaving the rest unassigned. > This can be used to distribute virtual CPUs over the available CPU cores. 
> .It Ic memory Ar bytes > Declare the amount of memory assigned to a domain, in bytes. > @@ -112,6 +117,20 @@ domain "salmah" { > .Pp > On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and > 58GB memory to the primary domain. > +.Pp > +Use > +.Ar stride > +factors to distribute virtual CPUs: > +.Bd -literal -offset indent > +domain "sun" { > + vcpu 2:4 > + memory 4G > + vdisk "/home/sun/vdisk0" > +} > +.Ed > +On a machine with eight threads per physical core, this allocates four > strides > +of two virtual CPUs to the guest domain but only assigns one stride to it, > i.e. > +make it occupy an entire physical core while running on only two threads. > .Sh SEE ALSO > .Xr eeprom 8 , > .Xr ldomctl 8 , > >
Re: [PATCH] tcpdump: Fix missing argument from icmp_print call in print-skip.c
On Tue, Nov 03, 2020 at 01:15:49PM +0100, Theo Buehler wrote: > There is quite a bit more that is wrong with print-skip.c than just > that (try to add it to the Makefile and compile it). It was unhooked > from the build in 1996. > > Shouldn't it rather be sent to the attic? OK kn
Re: ldom.conf.5: clarify vcpu strides
On Mon, Sep 14, 2020 at 07:52:34PM +0200, Klemens Nanni wrote: > On Wed, Sep 02, 2020 at 04:58:39PM +0200, Stefan Sperling wrote: > > I would like to suggest an example for the EXAMPLES section which > > illustrates how a suitable stride factor can be determined (divide the > > number of desired "unused" cpus by the number of desired "used" cpus): > We can do with an example, but to me yours does not read obvious enough. > > Also, `vcpu' denotes *virtual* CPUs inside domains, not CPUs on the > machine, so "CPU" (without "V") reads off in your example and conflicts > with the otherwise consistent mentions of "virtual CPUs" in this manual. > > Here's my last diff incl. an example which reads a tad clearer to me and > is placed in the EXAMPLES section instead. > > Feedback? OK? Ping. Diff reattached. Index: ldom.conf.5 === RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v retrieving revision 1.13 diff -u -p -r1.13 ldom.conf.5 --- ldom.conf.5 21 Feb 2020 19:39:28 - 1.13 +++ ldom.conf.5 14 Sep 2020 17:51:39 - @@ -38,8 +38,13 @@ If no configuration for the primary doma all CPU and memory resources not used by any guest domains. .It Ic vcpu Ar number Ns Op : Ns Ar stride Declare the number of virtual CPUs assigned to a domain. -Optionally a stride can be specified to allocate additional virtual CPUs -but not assign them to a domain. +Optionally a stride factor can be specified to allocate +.Ar number +virtual CPUs +.Ar stride +times but not assign more than +.Ar number +virtual CPUs to a domain, leaving the rest unassigned. This can be used to distribute virtual CPUs over the available CPU cores. .It Ic memory Ar bytes Declare the amount of memory assigned to a domain, in bytes. @@ -112,6 +117,20 @@ domain "salmah" { .Pp On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and 58GB memory to the primary domain. 
+.Pp +Use +.Ar stride +factors to distribute virtual CPUs: +.Bd -literal -offset indent +domain "sun" { + vcpu 2:4 + memory 4G + vdisk "/home/sun/vdisk0" +} +.Ed +On a machine with eight threads per physical core, this allocates four strides +of two virtual CPUs to the guest domain but only assigns one stride to it, i.e. +make it occupy an entire physical core while running on only two threads. .Sh SEE ALSO .Xr eeprom 8 , .Xr ldomctl 8 ,
Re: unwind(8): query_imsg2str
On Wed, Nov 04, 2020 at 04:06:13PM +0100, Florian Obser wrote: > Introduce query_imsg2str() for the printing "qname class type". OK kn > @@ -2116,3 +2107,18 @@ resolvers_to_restart(struct uw_conf *oconf, struct > uw_conf *nconf) > } > return restart; > } > + > +const char* I'd put a space between "char" and "*". > +query_imsg2str(struct query_imsg *query_imsg) > +{
Re: [PATCH] ifconfig: keep track of allowed ips pointer correctly
On Tue, Oct 27, 2020 at 06:16:08PM +0100, Jason A. Donenfeld wrote: > Somebody on IRC mentioned that using ifconfig to set wgallowedips wasn't > working on macppc. I don't have a macppc to test this on, but it seems > like the code is assuming that the two values printed out by this test > program must always be the same: > > struct s { > int i; > }; > > struct p { > long l; > char c; > struct s a[]; > }; > > int main(int argc, char *argv[]) > { > printf("%zu %zu\n", sizeof(struct p), (size_t)&((struct p *)0)->a[0]); > return 0; > } > > But actually, on my amd64 system, that little test prints out "16 12". > This patch fixes up ifconfig.c to do the right thing, so that it > corresponds with how the kernel handles iteration. > > I don't have a macppc in order to test this, but it works on amd64. Using the wg(4) EXAMPLES section as test, this fixes wg(4) on macppc. amd64 continues to work for me as well. FWIW, both platforms produced the same ifconfig.o regardless of using a void pointer or char pointer cast as guenther suggested. I don't know if `char *' might be beneficial on other platform/compiler combinations. OK? Otherwise I'll commit this on friday unless I hear objections or further feedback. Jason's diff (with `char *' as per guenther) reattached below. 
Index: ifconfig.c === RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v retrieving revision 1.429 diff -u -p -r1.429 ifconfig.c --- ifconfig.c 7 Oct 2020 14:38:54 - 1.429 +++ ifconfig.c 4 Nov 2020 19:35:20 - @@ -5696,11 +5696,10 @@ ensurewginterface(void) err(1, "calloc"); } -void * +void growwgdata(size_t by) { ptrdiff_t peer_offset, aip_offset; - void *ret; if (wg_interface == NULL) wgdata.wgd_size = sizeof(*wg_interface); @@ -5721,16 +5720,18 @@ growwgdata(size_t by) if (wg_aip != NULL) wg_aip = (void *)wg_interface + aip_offset; - ret = (void *)wg_interface + wgdata.wgd_size - by; - bzero(ret, by); - - return ret; + bzero((char *)wg_interface + wgdata.wgd_size - by, by); } void setwgpeer(const char *peerkey_b64, int param) { - wg_peer = growwgdata(sizeof(*wg_peer)); + growwgdata(sizeof(*wg_peer)); + if (wg_aip) + wg_peer = (struct wg_peer_io *)wg_aip; + else + wg_peer = &wg_interface->i_peers[0]; + wg_aip = &wg_peer->p_aips[0]; wg_peer->p_flags |= WG_PEER_HAS_PUBLIC; WG_LOAD_KEY(wg_peer->p_public, peerkey_b64, "wgpeer"); wg_interface->i_peers_count++; @@ -5743,7 +5744,7 @@ setwgpeeraip(const char *aip, int param) if (wg_peer == NULL) errx(1, "wgaip: wgpeer not set"); - wg_aip = growwgdata(sizeof(*wg_aip)); + growwgdata(sizeof(*wg_aip)); if ((res = inet_net_pton(AF_INET, aip, &wg_aip->a_ipv4, sizeof(wg_aip->a_ipv4))) != -1) { @@ -5759,6 +5760,8 @@ setwgpeeraip(const char *aip, int param) wg_peer->p_flags |= WG_PEER_REPLACE_AIPS; wg_peer->p_aips_count++; + + wg_aip++; } void
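The layout trap Jason describes is tail padding before a flexible array member: sizeof(struct p) is rounded up to the struct's alignment, while the array really begins at offsetof(struct p, a), which can be smaller. A standalone illustration using the structs from the mail above (not the wg(4) ioctl structures):

```c
#include <assert.h>
#include <stddef.h>

/* The example structs from the mail above. */
struct s {
	int i;
};

struct p {
	long l;		/* 8 bytes on LP64 */
	char c;
	struct s a[];	/* flexible array member */
};

/*
 * The array begins at offsetof(struct p, a); sizeof(struct p) is padded
 * up to the alignment of long and may point past it (the mail reports
 * 16 vs 12 on amd64).  Computing element addresses with sizeof() would
 * therefore miss the first element's real location, which is why the
 * ifconfig diff above tracks peer/aip offsets explicitly.
 */
static size_t
first_elem_offset(void)
{
	return offsetof(struct p, a);
}
```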
unwind(8): handle large answers
Handle DNS answers that are larger than the maximum imsg size by splitting them up.

This might fix an issue Leo reported to me in private where unwind would exit with this in syslog:

unwind[13752]: fatal in frontend: expected IMSG_ANSWER but got HEADER
unwind[45337]: resolver exiting
unwind[43711]: terminating

The maximum imsg size is about 16k and it kinda seems unlikely to see answers in the wild that are this big but that's currently the only theory I have. This also improves error logging, so if it's not this we might have more luck once it triggers again.

This is on top of the query_imsg2str diff I just sent to tech.

OK?

diff --git frontend.c frontend.c index f8558d60ab4..90fdd805257 100644 --- frontend.c +++ frontend.c @@ -420,12 +420,14 @@ frontend_dispatch_main(int fd, short event, void *bula) void frontend_dispatch_resolver(int fd, short event, void *bula) { - static struct pending_query *pq; + struct pending_query*pq; struct imsgev *iev = bula; struct imsgbuf *ibuf = &iev->ibuf; struct imsg imsg; struct query_imsg *query_imsg; + struct answer_imsg *answer_imsg; int n, shut = 0, chg; + uint8_t *p; if (event & EV_READ) { if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN) @@ -448,8 +450,6 @@ frontend_dispatch_resolver(int fd, short event, void *bula) switch (imsg.hdr.type) { case IMSG_ANSWER_HEADER: - if (pq != NULL) - fatalx("expected IMSG_ANSWER but got HEADER"); if (IMSG_DATA_SIZE(imsg) != sizeof(*query_imsg)) fatalx("%s: IMSG_ANSWER_HEADER wrong length: " "%lu", __func__, IMSG_DATA_SIZE(imsg)); @@ -468,19 +468,32 @@ frontend_dispatch_resolver(int fd, short event, void *bula) pq->bogus = query_imsg->bogus; break; case IMSG_ANSWER: - if (pq == NULL) - fatalx("IMSG_ANSWER without HEADER"); - - if (pq->answer) - fatal("pq->answer"); - if ((pq->answer = malloc(IMSG_DATA_SIZE(imsg))) != + if (IMSG_DATA_SIZE(imsg) != sizeof(*answer_imsg)) + fatalx("%s: IMSG_ANSWER wrong length: " + "%lu", __func__, IMSG_DATA_SIZE(imsg)); + answer_imsg = (struct answer_imsg 
*)imsg.data; + if ((pq = find_pending_query(answer_imsg->id)) == NULL) { - pq->answer_len = IMSG_DATA_SIZE(imsg); - memcpy(pq->answer, imsg.data, pq->answer_len); - } else + log_warnx("cannot find pending query %llu", + answer_imsg->id); + break; + } + + p = realloc(pq->answer, pq->answer_len + + answer_imsg->len); + + if (p != NULL) { + pq->answer = p; + memcpy(pq->answer + pq->answer_len, + answer_imsg->answer, answer_imsg->len); + pq->answer_len += answer_imsg->len; + } else { + pq->answer_len = 0; + pq->answer = NULL; pq->rcode_override = LDNS_RCODE_SERVFAIL; - send_answer(pq); - pq = NULL; + } + if (!answer_imsg->truncated) + send_answer(pq); break; case IMSG_CTL_RESOLVER_INFO: case IMSG_CTL_AUTOCONF_RESOLVER_INFO: diff --git resolver.c resolver.c index a1e18f9d130..4aec42ac5cb 100644 --- resolver.c +++ resolver.c @@ -879,6 +879,7 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode, sldns_buffer*buf = NULL; struct regional *region = NULL; struct query_imsg *query_imsg; + struct answer_imsg answer_imsg; struct running_query*rq; struct timespec tp, elapsed; int64_t ms; @@ -886,6 +887,7 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode, int running_res, asr_pref_pos, force_acceptbogus; char*str; char rcode_buf[16]; + uint8_t *p; clock_gettime(CLOCK_MONOTONIC, &tp); @
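The splitting scheme the diff implements can be sketched generically: a payload bigger than the imsg limit goes out as a train of chunks, each marked truncated except the last, and the frontend reassembles by appending. The names and the deliberately tiny chunk size below are illustrative, not unwind's actual structs (whose chunk size is bounded by the ~16k imsg limit):

```c
#include <assert.h>
#include <string.h>

/*
 * Generic sketch of the chunked-answer scheme: payloads larger than the
 * transport limit are sent as fixed-size chunks, each flagged
 * "truncated" while more follow; the receiver appends until it sees a
 * chunk with truncated == 0.  CHUNK_MAX is tiny on purpose to force a
 * split in the test.
 */
#define CHUNK_MAX 8

struct chunk {
	size_t	len;
	int	truncated;	/* more chunks follow */
	char	data[CHUNK_MAX];
};

/* Split buf into chunks; returns the number of chunks written, 0 if out has no room. */
static size_t
split(const char *buf, size_t len, struct chunk *out, size_t max)
{
	size_t n = 0;

	do {
		size_t take = len < CHUNK_MAX ? len : CHUNK_MAX;

		if (n == max)
			return 0;
		out[n].len = take;
		memcpy(out[n].data, buf, take);
		buf += take;
		len -= take;
		out[n].truncated = len > 0;
		n++;
	} while (len > 0);
	return n;
}
```

A 10-byte payload with CHUNK_MAX 8 yields two chunks: one full chunk flagged truncated, then a 2-byte final chunk, mirroring how IMSG_ANSWER messages carry answer_imsg->truncated in the diff.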
unwind(8): query_imsg2str
Introduce query_imsg2str() for the printing "qname class type". OK? diff --git resolver.c resolver.c index dea78ca2fb3..a1e18f9d130 100644 --- resolver.c +++ resolver.c @@ -194,6 +194,7 @@ void decay_latest_histograms(int, short, void *); int running_query_cnt(void); int*resolvers_to_restart(struct uw_conf *, struct uw_conf *); +const char *query_imsg2str(struct query_imsg *); struct uw_conf *resolver_conf; struct imsgev *iev_frontend; @@ -752,8 +753,6 @@ try_next_resolver(struct running_query *rq) struct timespec tp, elapsed; struct timeval tv = {0, 0}; int64_t ms; - char qclass_buf[16]; - char qtype_buf[16]; while(rq->next_resolver < rq->res_pref.len && ((res = resolvers[rq->res_pref.types[rq->next_resolver]]) == NULL || @@ -772,13 +771,9 @@ try_next_resolver(struct running_query *rq) timespecsub(&tp, &rq->tp, &elapsed); ms = elapsed.tv_sec * 1000 + elapsed.tv_nsec / 100; - sldns_wire2str_class_buf(rq->query_imsg->c, qclass_buf, - sizeof(qclass_buf)); - sldns_wire2str_type_buf(rq->query_imsg->t, qtype_buf, - sizeof(qtype_buf)); - log_debug("%s[+%lldms]: %s[%s] %s %s %s", __func__, ms, + log_debug("%s[+%lldms]: %s[%s] %s", __func__, ms, uw_resolver_type_str[res->type], uw_resolver_state_str[res->state], - rq->query_imsg->qname, qclass_buf, qtype_buf); + query_imsg2str(rq->query_imsg)); if ((query_imsg = malloc(sizeof(*query_imsg))) == NULL) { log_warnx("%s", __func__); @@ -891,8 +886,6 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode, int running_res, asr_pref_pos, force_acceptbogus; char*str; char rcode_buf[16]; - char qclass_buf[16]; - char qtype_buf[16]; clock_gettime(CLOCK_MONOTONIC, &tp); @@ -949,11 +942,9 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode, result->answer_len = 0; sldns_wire2str_rcode_buf(result->rcode, rcode_buf, sizeof(rcode_buf)); - sldns_wire2str_class_buf(query_imsg->c, qclass_buf, sizeof(qclass_buf)); - sldns_wire2str_type_buf(query_imsg->t, qtype_buf, sizeof(qtype_buf)); - log_debug("%s[%s]: %s %s %s rcode: 
%s[%d], elapsed: %lldms, running: %d", - __func__, uw_resolver_type_str[res->type], query_imsg->qname, - qclass_buf, qtype_buf, rcode_buf, result->rcode, ms, + log_debug("%s[%s]: %s rcode: %s[%d], elapsed: %lldms, running: %d", + __func__, uw_resolver_type_str[res->type], + query_imsg2str(query_imsg), rcode_buf, result->rcode, ms, running_query_cnt()); force_acceptbogus = find_force(&resolver_conf->force, query_imsg->qname, @@ -2116,3 +2107,18 @@ resolvers_to_restart(struct uw_conf *oconf, struct uw_conf *nconf) } return restart; } + +const char* +query_imsg2str(struct query_imsg *query_imsg) +{ + static char buf[sizeof(query_imsg->qname) + 1 + 16 + 1 + 16]; + char qclass_buf[16]; + char qtype_buf[16]; + + sldns_wire2str_class_buf(query_imsg->c, qclass_buf, sizeof(qclass_buf)); + sldns_wire2str_type_buf(query_imsg->t, qtype_buf, sizeof(qtype_buf)); + + snprintf(buf, sizeof(buf), "%s %s %s", query_imsg->qname, qclass_buf, + qtype_buf); + return buf; +} -- I'm not entirely sure you are real.
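query_imsg2str() returns a pointer into function-local static storage, a common pattern for debug-log helpers. A sketch of the pattern and its caveat, with hypothetical names (the buffer sizing mirrors the diff's sizeof(qname) + 1 + 16 + 1 + 16 idea):

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the static-buffer formatter pattern used by query_imsg2str()
 * above: format into static storage and return a pointer to it.
 * Convenient for log_debug() one-liners, but not reentrant: a second
 * call clobbers the first result, so at most one call per printf-style
 * statement, and never from multiple threads.
 */
static const char *
q2str(const char *qname, const char *qclass, const char *qtype)
{
	/* room for name, two separating spaces, class and type strings */
	static char buf[64 + 1 + 16 + 1 + 16];

	snprintf(buf, sizeof(buf), "%s %s %s", qname, qclass, qtype);
	return buf;
}
```

Every call returns the same pointer, so callers must consume the string before the next call.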
Prevent race in single_thread_set()
Here's a 3rd approach to solve the TOCTOU race in single_thread_set(). The issue is that the lock serializing access to `ps_single' is not held when calling single_thread_check().

The approach below is controversial because it extends the scope of the SCHED_LOCK(). On the other hand, the two other approaches that both add a new lock to avoid this race ignore the fact that accesses to `ps_single' are currently not clearly serialized w/o KERNEL_LOCK(). So the diff below improves the situation in that regard and does not add more complexity due to the use of multiple locks.

After having looked for a way to split the SCHED_LOCK() I believe this is the simplest approach. I deliberately used a *_locked() function to avoid grabbing the lock recursively as I'm trying to get rid of the recursion, see the other thread on tech@.

That said the uses of `ps_single' in ptrace_ctrl() are not covered by this diff and I'd be glad to hear some comments about them. This is fine as long as all the code using `ps_single' runs under KERNEL_LOCK() but since we're trying to get the single_thread_* API out of it, this needs to be addressed.

Note that this diff introduces a helper for initializing ps_single* values in order to keep all the accesses of those fields in the same file.

ok?

Index: kern/kern_fork.c === RCS file: /cvs/src/sys/kern/kern_fork.c,v retrieving revision 1.226 diff -u -p -r1.226 kern_fork.c --- kern/kern_fork.c25 Oct 2020 01:55:18 - 1.226 +++ kern/kern_fork.c4 Nov 2020 12:52:54 - @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta * if somebody else wants to take us to single threaded mode, * count ourselves in. 
*/ - if (pr->ps_single) { - atomic_inc_int(&pr->ps_singlecount); - atomic_setbits_int(&p->p_flag, P_SUSPSINGLE); - } + single_thread_init(p); /* * Return tid to parent thread and copy it out to userspace Index: kern/kern_sig.c === RCS file: /cvs/src/sys/kern/kern_sig.c,v retrieving revision 1.263 diff -u -p -r1.263 kern_sig.c --- kern/kern_sig.c 16 Sep 2020 13:50:42 - 1.263 +++ kern/kern_sig.c 4 Nov 2020 12:38:35 - @@ -1932,11 +1932,27 @@ userret(struct proc *p) p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri; } +void +single_thread_init(struct proc *p) +{ + struct process *pr = p->p_p; + int s; + + SCHED_LOCK(s); + if (pr->ps_single) { + atomic_inc_int(&pr->ps_singlecount); + atomic_setbits_int(&p->p_flag, P_SUSPSINGLE); + } + SCHED_UNLOCK(s); +} + int -single_thread_check(struct proc *p, int deep) +_single_thread_check_locked(struct proc *p, int deep) { struct process *pr = p->p_p; + SCHED_ASSERT_LOCKED(); + if (pr->ps_single != NULL && pr->ps_single != p) { do { int s; @@ -1949,14 +1965,12 @@ single_thread_check(struct proc *p, int return (EINTR); } - SCHED_LOCK(s); - if (pr->ps_single == NULL) { - SCHED_UNLOCK(s); + if (pr->ps_single == NULL) continue; - } if (atomic_dec_int_nv(&pr->ps_singlecount) == 0) wakeup(&pr->ps_singlecount); + if (pr->ps_flags & PS_SINGLEEXIT) { SCHED_UNLOCK(s); KERNEL_LOCK(); @@ -1967,13 +1981,24 @@ single_thread_check(struct proc *p, int /* not exiting and don't need to unwind, so suspend */ p->p_stat = SSTOP; mi_switch(); - SCHED_UNLOCK(s); } while (pr->ps_single != NULL); } return (0); } +int +single_thread_check(struct proc *p, int deep) +{ + int s, error; + + SCHED_LOCK(s); + error = _single_thread_check_locked(p, deep); + SCHED_UNLOCK(s); + + return error; +} + /* * Stop other threads in the process. 
The mode controls how and * where the other threads should stop: @@ -1995,8 +2020,12 @@ single_thread_set(struct proc *p, enum s KERNEL_ASSERT_LOCKED(); KASSERT(curproc == p); - if ((error = single_thread_check(p, deep))) + SCHED_LOCK(s); + error = _single_thread_check_locked(p, deep); + if (error) { + SCHED_UNLOCK(s); return error; + } switch (mode) { case SINGLE_SUSPEND: @@ -2014,7 +2043,6 @@ single_thread_set(struct proc *p, enum s panic("single_thread_mode = %d", mode); #endif } - SCHED_LOCK(s);
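The fix above closes a check-then-act window by introducing _single_thread_check_locked(), so single_thread_set() can take the lock once and both check and update under it. The pattern can be sketched in userland with pthreads; the names and the counter standing in for `ps_single' are illustrative, not kernel code:

```c
#include <pthread.h>

/*
 * Sketch of the *_locked() pattern from the diff above: the public
 * entry point takes the lock and delegates to a _locked variant, so a
 * caller that must check *and then update* state atomically can take
 * the lock once and call the _locked variant directly, closing the
 * TOCTOU window.  `state' stands in for `ps_single'.
 */
static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
static int state;

static int
check_locked(void)
{
	/* caller holds state_mtx */
	return state != 0;
}

static int
check(void)
{
	int r;

	pthread_mutex_lock(&state_mtx);
	r = check_locked();
	pthread_mutex_unlock(&state_mtx);
	return r;
}

/* Check-then-set under a single lock acquisition: no race window. */
static int
set(int v)
{
	pthread_mutex_lock(&state_mtx);
	if (check_locked()) {
		pthread_mutex_unlock(&state_mtx);
		return -1;	/* somebody else got there first */
	}
	state = v;
	pthread_mutex_unlock(&state_mtx);
	return 0;
}
```

The racy version would be set() calling check() and then re-taking the lock to write: between the two acquisitions another thread could install its own value, which is exactly the single_thread_set() race being fixed.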
uvm_fault: is there an anon?
Diff below introduces a helper that looks for existing mapping. The value returned by this lookup function determines if there's an anon at the faulting address which tells us if we're dealign with a fault of type 1 or 2. This small refactoring is part of the current work to separate the code handling faults of type 1 and 2. The end goal being to move the type 1 faults handling out of the KERNEL_LOCK(). The function name is taken from NetBSD to not introduce more difference than there's already. ok? Index: uvm/uvm_fault.c === RCS file: /cvs/src/sys/uvm/uvm_fault.c,v retrieving revision 1.103 diff -u -p -r1.103 uvm_fault.c --- uvm/uvm_fault.c 21 Oct 2020 08:55:40 - 1.103 +++ uvm/uvm_fault.c 4 Nov 2020 13:57:01 - @@ -637,6 +637,84 @@ uvm_fault_check(struct uvm_faultinfo *uf return 0; } + +/* + * uvm_fault_upper_lookup: look up existing h/w mapping and amap. + * + * iterate range of interest: + * 1. check if h/w mapping exists. if yes, we don't care + * 2. check if anon exists. if not, page is lower. + * 3. if anon exists, enter h/w mapping for neighbors. + */ +boolean_t +uvm_fault_upper_lookup(struct uvm_faultinfo *ufi, +const struct uvm_faultctx *flt, struct vm_anon **anons, +struct vm_page **pages) +{ + struct vm_amap *amap = ufi->entry->aref.ar_amap; + struct vm_anon *anon; + boolean_t shadowed; + vaddr_t currva; + paddr_t pa; + int lcv; + + /* +* map in the backpages and frontpages we found in the amap in hopes +* of preventing future faults.we also init the pages[] array as +* we go. +*/ + currva = flt->startva; + shadowed = FALSE; + for (lcv = 0 ; lcv < flt->npages ; lcv++, currva += PAGE_SIZE) { + /* +* dont play with VAs that are already mapped +* except for center) +*/ + if (lcv != flt->centeridx && + pmap_extract(ufi->orig_map->pmap, currva, &pa)) { + pages[lcv] = PGO_DONTCARE; + continue; + } + + /* unmapped or center page. check if any anon at this level. 
*/ + if (amap == NULL || anons[lcv] == NULL) { + pages[lcv] = NULL; + continue; + } + + /* check for present page and map if possible. re-activate it */ + pages[lcv] = PGO_DONTCARE; + if (lcv == flt->centeridx) {/* save center for later! */ + shadowed = TRUE; + continue; + } + anon = anons[lcv]; + if (anon->an_page && + (anon->an_page->pg_flags & (PG_RELEASED|PG_BUSY)) == 0) { + uvm_lock_pageq(); + uvm_pageactivate(anon->an_page);/* reactivate */ + uvm_unlock_pageq(); + uvmexp.fltnamap++; + + /* +* Since this isn't the page that's actually faulting, +* ignore pmap_enter() failures; it's not critical +* that we enter these right now. +*/ + (void) pmap_enter(ufi->orig_map->pmap, currva, + VM_PAGE_TO_PHYS(anon->an_page) | flt->pa_flags, + (anon->an_ref > 1) ? + (flt->enter_prot & ~PROT_WRITE) : flt->enter_prot, + PMAP_CANFAIL | +(VM_MAPENT_ISWIRED(ufi->entry) ? PMAP_WIRED : 0)); + } + } + if (flt->npages > 1) + pmap_update(ufi->orig_map->pmap); + + return shadowed; +} + /* * F A U L T - m a i n e n t r y p o i n t */ @@ -663,7 +741,6 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad int result, lcv, gotpages, ret; vaddr_t currva; voff_t uoff; - paddr_t pa; struct vm_amap *amap; struct uvm_object *uobj; struct vm_anon *anons_store[UVM_MAXRANGE], **anons, *anon, *oanon; @@ -704,61 +781,9 @@ ReFault: amap = ufi.entry->aref.ar_amap; uobj = ufi.entry->object.uvm_obj; - /* -* map in the backpages and frontpages we found in the amap in hopes -* of preventing future faults.we also init the pages[] array as -* we go. -*/ - currva = flt.startva; - shadowed = FALSE; - for (lcv = 0 ; lcv < flt.npages ; lcv++, currva += PAGE_SIZE) { - /* -* dont play with VAs that are already mapped -* except for center) -*/ - if (lcv != flt.centeridx && - pmap_extract(ufi.orig_map->pmap, currva, &pa)) { - pages[lcv] = P
Turn SCHED_LOCK() into a mutex
Diff below removes the recursive attribute of the SCHED_LOCK() by turning it into an IPL_NONE mutex. I'm not intending to commit it yet as it raises multiple questions, see below.

This work was started by art@ more than a decade ago and I'm willing to finish it as I believe it's the easiest way to reduce the scope of this lock. Having a global mutex is the first step towards per-runqueue and per-sleepqueue mutexes. This is also a way to avoid lock ordering problems exposed by the recent races in single_thread_set().

About the diff:

The diff below includes an (ugly) refactoring of rw_exit() to avoid a recursion on the SCHED_LOCK(). In this case the lock is used to protect the global sleepqueue and is grabbed in sleep_setup().

The same pattern can be observed in single_thread_check(). However in this case the lock is used to protect different fields so there's no "recursive access" to the same data structure.

assertwaitok() has been moved down in mi_switch() which isn't ideal.

It becomes obvious that the per-CPU and per-thread accounting fields updated in mi_switch() won't need a separate mutex as proposed last year and that splitting this global mutex will be enough.

It's unclear to me if/how WITNESS should be modified to handle this lock change.

This has been tested on sparc64 and amd64. I'm not convinced it exposed all the recursions. So if you want to give it a go and can break it, it is more than welcome.

Comments? Questions?
Index: kern/kern_fork.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.226
diff -u -p -r1.226 kern_fork.c
--- kern/kern_fork.c	25 Oct 2020 01:55:18 -	1.226
+++ kern/kern_fork.c	2 Nov 2020 10:50:24 -
@@ -665,7 +665,7 @@ void
 proc_trampoline_mp(void)
 {
 	SCHED_ASSERT_LOCKED();
-	__mp_unlock(&sched_lock);
+	mtx_leave(&sched_lock);
 	spl0();
 	SCHED_ASSERT_UNLOCKED();
 	KERNEL_ASSERT_UNLOCKED();
Index: kern/kern_lock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.71
diff -u -p -r1.71 kern_lock.c
--- kern/kern_lock.c	5 Mar 2020 09:28:31 -	1.71
+++ kern/kern_lock.c	2 Nov 2020 10:50:24 -
@@ -97,9 +97,6 @@ ___mp_lock_init(struct __mp_lock *mpl, c
 	if (mpl == &kernel_lock)
 		mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
 		    LO_SLEEPABLE | (LO_CLASS_KERNEL_LOCK << LO_CLASSSHIFT);
-	else if (mpl == &sched_lock)
-		mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
-		    LO_RECURSABLE | (LO_CLASS_SCHED_LOCK << LO_CLASSSHIFT);
 	WITNESS_INIT(&mpl->mpl_lock_obj, type);
 #endif
 }
Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.45
diff -u -p -r1.45 kern_rwlock.c
--- kern/kern_rwlock.c	2 Mar 2020 17:07:49 -	1.45
+++ kern/kern_rwlock.c	2 Nov 2020 23:13:01 -
@@ -128,36 +128,6 @@ rw_enter_write(struct rwlock *rwl)
 	}
 }
 
-void
-rw_exit_read(struct rwlock *rwl)
-{
-	unsigned long owner;
-
-	rw_assert_rdlock(rwl);
-	WITNESS_UNLOCK(&rwl->rwl_lock_obj, 0);
-
-	membar_exit_before_atomic();
-	owner = rwl->rwl_owner;
-	if (__predict_false((owner & RWLOCK_WAIT) ||
-	    rw_cas(&rwl->rwl_owner, owner, owner - RWLOCK_READ_INCR)))
-		rw_do_exit(rwl, 0);
-}
-
-void
-rw_exit_write(struct rwlock *rwl)
-{
-	unsigned long owner;
-
-	rw_assert_wrlock(rwl);
-	WITNESS_UNLOCK(&rwl->rwl_lock_obj, LOP_EXCLUSIVE);
-
-	membar_exit_before_atomic();
-	owner = rwl->rwl_owner;
-	if (__predict_false((owner & RWLOCK_WAIT) ||
-	    rw_cas(&rwl->rwl_owner, owner, 0)))
-		rw_do_exit(rwl, RWLOCK_WRLOCK);
-}
-
 #ifdef DIAGNOSTIC
 /*
  * Put the diagnostic functions here to keep the main code free
@@ -314,9 +284,10 @@ retry:
 }
 
 void
-rw_exit(struct rwlock *rwl)
+_rw_exit(struct rwlock *rwl, int locked)
 {
 	unsigned long wrlock;
+	unsigned long owner, set;
 
 	/* Avoid deadlocks after panic or in DDB */
 	if (panicstr || db_active)
@@ -330,15 +301,6 @@ rw_exit(struct rwlock *rwl)
 	WITNESS_UNLOCK(&rwl->rwl_lock_obj, wrlock ? LOP_EXCLUSIVE : 0);
 
 	membar_exit_before_atomic();
-	rw_do_exit(rwl, wrlock);
-}
-
-/* membar_exit_before_atomic() has to precede call of this function. */
-void
-rw_do_exit(struct rwlock *rwl, unsigned long wrlock)
-{
-	unsigned long owner, set;
-
 	do {
 		owner = rwl->rwl_owner;
 		if (wrlock)
@@ -349,7 +311,13 @@ rw_do_exit(struct rwlock *rwl, unsigned
 	} while (__predict_false(rw_cas(&rwl->rwl_owner, owner, set)));
 
 	if (owner & RWLOCK_WAIT)
-		wakeup(rwl);
+		wakeup_n(rw
[PATCH] .apple kbd layout for sf and sg
Hello,

this diff against -current makes usable sf.apple and sg.apple keyboard
layouts (and the nodead variants) available on a Macbook 1.1 under
wscons.  (The alternative was an xorg.conf with the macbook79 model, ch
layout and fr variant, but without the beautiful spleen console
font...)

Index: makemap.awk
===================================================================
RCS file: /home/cvs/src/sys/dev/usb/makemap.awk,v
retrieving revision 1.15
diff -u -p -r1.15 makemap.awk
--- makemap.awk	2 Nov 2020 19:45:18 -	1.15
+++ makemap.awk	4 Nov 2020 07:50:20 -
@@ -37,6 +37,7 @@ BEGIN {
 	declk = 0
 	haskeys = 0
 	kbfr = 0
+	kbsg = 0
 	nmaps = 0
 
 	# PS/2 id -> UKBD conversion table, or "sanity lossage 101"
@@ -407,6 +408,38 @@ $1 == "#define" || $1 == "#undef" {
 		print "KC(47),\tKS_masculine,\tKS_ordfeminine,"
 		print "KC(50),\tKS_backslash,\tKS_bar,"
 		print "KC(52),\tKS_dead_tilde,\tKS_dead_circumflex"
+	} else
+	if (mapname == "ukbd_keydesc_sg[]") {
+		print $0
+		print "\nstatic const keysym_t ukbd_keydesc_sg_apple[] = {"
+		print "/* pos normal\t\tshifted\t\talt\t\tshift-alt */"
+		print "KC(10),\tKS_g,\t\tKS_G,\t\tKS_at,"
+		print "KC(17),\tKS_n,\t\tKS_N,\t\tKS_dead_tilde,"
+		print "KC(30),\tKS_1,\t\tKS_plus,\tKS_plusminus,"
+		print "KC(34),\tKS_5,\t\tKS_percent,\tKS_bracketleft,"
+		print "KC(35),\tKS_6,\t\tKS_ampersand,\tKS_bracketright,"
+		print "KC(36),\tKS_7,\t\tKS_slash,\tKS_bar,\t\tKS_backslash,"
+		print "KC(37),\tKS_8,\t\tKS_parenleft,\tKS_braceleft,"
+		print "KC(38),\tKS_9,\t\tKS_parenright,\tKS_braceright,"
+		print "KC(88),\tKS_Mode_switch,\tKS_Multi_key,"
+		print "KC(226),\tKS_Mode_switch,\tKS_Multi_key,"
+	} else
+	if (mapname == "ukbd_keydesc_sg_nodead[]") {
+		print $0
+		print "\nstatic const keysym_t ukbd_keydesc_sg_apple_nodead[] = {"
+		print "/* pos normal\t\tshifted\t\talt\t\tshift-alt */"
+		print "KC(17),\tKS_n,\t\tKS_N,\t\tKS_asciitilde,"
+		print "KC(45),\tKS_apostrophe,\tKS_question,\tKS_acute,"
+		print "KC(46),\tKS_asciicircum,\tKS_grave,"
+		print "KC(48),\tKS_diaeresis,\tKS_exclam,"
+	} else
+	if (mapname == "ukbd_keydesc_sf[]") {
+		print $0
+		print "\nstatic const keysym_t ukbd_keydesc_sf_apple[] = {"
+		print "/* pos normal\t\tshifted\t\talt\t\tshift-alt */"
+		print "KC(47),\tKS_egrave,\tKS_udiaeresis,"
+		print "KC(51),\tKS_eacute,\tKS_odiaeresis,"
+		print "KC(52),\tKS_agrave,\tKS_adiaeresis,"
 	}
 	}
 }
@@ -425,6 +458,27 @@ $1 == "#define" || $1 == "#undef" {
 /KB_PT/ {
 	print $0
 	print "\tKBD_MAP(KB_PT | KB_APPLE,\tKB_PT,\tukbd_keydesc_pt_apple),"
+	next
+}
+/KB_SG/ {
+	print $0
+	kbsg++
+	if (kbsg == 1) {
+		print "\tKBD_MAP(KB_SG | KB_APPLE,\tKB_SG,\tukbd_keydesc_sg_apple),"
+	} else if (kbsg == 2) {
+		print "\tKBD_MAP(KB_SG | KB_APPLE | KB_NODEAD,\tKB_SG | KB_APPLE,"
+		print "\t\tukbd_keydesc_sg_apple_nodead),"
+	# Add first .apple variant for sf
+	} else if (kbsg == 3) {
+		print "\tKBD_MAP(KB_SF | KB_APPLE,\tKB_SG | KB_APPLE,\tukbd_keydesc_sf_apple),"
+	}
+	next
+}
+/KB_SF/ {
+	print $0
+	# Add second .apple variant for sf
+	print "\tKBD_MAP(KB_SF | KB_APPLE | KB_NODEAD,\tKB_SF | KB_APPLE,"
+	print "\t\tukbd_keydesc_sg_apple_nodead),"
 	next
 }
 {
Index: ukbd.4
===================================================================
RCS file: /home/cvs/src/share/man/man4/ukbd.4,v
retrieving revision 1.22
diff -u -p -r1.22 ukbd.4
--- ukbd.4	11 May 2019 14:19:16 -	1.22
+++ ukbd.4	3 Nov 2020 19:36:27 -
@@ -171,10 +171,20 @@ Russian in
 .Pq sf
 Swiss French with
 .Dq dead accents .
+.It KB_SF | KB_APPLE
+.Pq sf.apple
+Swiss French with
+.Dq dead accents
+for MacBooks.
 .It KB_SG
 .Pq sg
 Swiss German with
 .Dq dead accents .
+.It KB_SG | KB_APPLE
+.Pq sg.apple
+Swiss German with
+.Dq dead accents
+for MacBooks.
 .It KB_SI
 .Pq si
 Slovenian.