Re: ldom.conf.5: clarify vcpu strides

2020-11-04 Thread Klemens Nanni
On Wed, Nov 04, 2020 at 10:44:52PM +0100, Mark Kettenis wrote:
> Yeah, that reads better.  One request though.  Can you pick a character
> name from:
> 
> https://www.openbsd.org/lyrics.html#38
> 
> as the name of the domain?  Beluge is the bad guy, so this probably
> should be Marlus.
Sure!

Final version with Marlus, a missing .Pp and another grammar fix.
OK?


Index: ldom.conf.5
===
RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v
retrieving revision 1.14
diff -u -p -r1.14 ldom.conf.5
--- ldom.conf.5 14 Sep 2020 19:42:16 -  1.14
+++ ldom.conf.5 4 Nov 2020 22:11:45 -
@@ -38,8 +38,11 @@ If no configuration for the primary doma
 all CPU and memory resources not used by any guest domains.
 .It Ic vcpu Ar number Ns Op : Ns Ar stride
 Declare the number of virtual CPUs assigned to a domain.
-Optionally a stride can be specified to allocate additional virtual CPUs
-but not assign them to a domain.
+Optionally a stride can be specified to allocate
+.Ar stride
+VCPUs at a time but assign only
+.Ar number
+VCPUs to the domain.
 This can be used to distribute virtual CPUs over the available CPU cores.
 .It Ic memory Ar bytes
 Declare the amount of memory assigned to a domain, in bytes.
@@ -117,6 +120,21 @@ domain "salmah" {
 .Pp
 On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and
 58GB memory to the primary domain.
+.Pp
+Use a
+.Ar stride
+step size to distribute VCPUs:
+.Bd -literal -offset indent
+domain "marlus" {
+   vcpu 2:4
+   memory 4G
+   vdisk "/home/marlus/vdisk0"
+}
+.Ed
+.Pp
+On a machine with eight threads per physical core, this allocates two strides
+of four VCPUs each for the guest domain but assigns only two VCPUs to it, i.e.
+makes it occupy an entire physical core while running on two threads only.
 .Sh SEE ALSO
 .Xr eeprom 8 ,
 .Xr ldomctl 8 ,
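
As an illustration (not part of the diff), the arithmetic behind
"vcpu 2:4" follows directly from the config.c accounting quoted later
in this thread: `number' VCPUs get assigned, `number * stride' get
allocated.  A stand-alone sketch with illustrative values:

#include <stdio.h>

int
main(void)
{
	int number = 2, stride = 4;	/* "vcpu 2:4" */

	printf("assigned:  %d VCPUs\n", number);		/* 2 */
	printf("allocated: %d VCPUs\n", number * stride);	/* 8 */
	/* with 8 threads per core, 8 allocated VCPUs pin one full core */
	return 0;
}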



Re: ldom.conf.5: clarify vcpu strides

2020-11-04 Thread Mark Kettenis
> Date: Wed, 4 Nov 2020 22:38:13 +0100
> From: Klemens Nanni 
> 
> On Wed, Nov 04, 2020 at 09:46:39PM +0100, Mark Kettenis wrote:
> > stride is not a factor, so your description makes no sense to me.
> ldomctl/config.c uses it as factor:
> 
>   SIMPLEQ_FOREACH(domain, &domain_list, entry) {
>   if (strcmp(domain->name, "primary") == 0) {
>   primary_num_cpus = domain->vcpu;
>   primary_stride = domain->vcpu_stride;
>   primary_memory = domain->memory;
>   }
>   num_cpus += (domain->vcpu * domain->vcpu_stride);
>   memory += domain->memory;
>   }
> 
> > a stride of 4 means we allocate VCPUs 4-at-a-time but only assign 1 of
> > those to the domain.  It is a step size.
> Let's reword, is that any better?

Yeah, that reads better.  One request though.  Can you pick a character
name from:

https://www.openbsd.org/lyrics.html#38

as the name of the domain?  Beluge is the bad guy, so this probably
should be Marlus.

> Index: ldom.conf.5
> ===
> RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v
> retrieving revision 1.14
> diff -u -p -r1.14 ldom.conf.5
> --- ldom.conf.5   14 Sep 2020 19:42:16 -  1.14
> +++ ldom.conf.5   4 Nov 2020 21:37:20 -
> @@ -38,8 +38,11 @@ If no configuration for the primary doma
>  all CPU and memory resources not used by any guest domains.
>  .It Ic vcpu Ar number Ns Op : Ns Ar stride
>  Declare the number of virtual CPUs assigned to a domain.
> -Optionally a stride can be specified to allocate additional virtual CPUs
> -but not assign them to a domain.
> +Optionally a stride can be specified to allocate
> +.Ar stride
> +VCPUs at a time but assign only
> +.Ar number
> +VCPUs to the domain.
>  This can be used to distribute virtual CPUs over the available CPU cores.
>  .It Ic memory Ar bytes
>  Declare the amount of memory assigned to a domain, in bytes.
> @@ -117,6 +120,20 @@ domain "salmah" {
>  .Pp
>  On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and
>  58GB memory to the primary domain.
> +.Pp
> +Use a
> +.Ar stride
> +step size to distribute VCPUs:
> +.Bd -literal -offset indent
> +domain "sun" {
> + vcpu 2:4
> + memory 4G
> + vdisk "/home/sun/vdisk0"
> +}
> +.Ed
> +On a machine with eight threads per physical core, this allocates two strides
> +of four VCPUs each for the guest domain but assigns only two VCPUs to it, i.e.
> +make it occupy an entire physical core while running on two threads only.
>  .Sh SEE ALSO
>  .Xr eeprom 8 ,
>  .Xr ldomctl 8 ,
> 



Re: ldom.conf.5: clarify vcpu strides

2020-11-04 Thread Klemens Nanni
On Wed, Nov 04, 2020 at 09:46:39PM +0100, Mark Kettenis wrote:
> stride is not a factor, so your description makes no sense to me.
ldomctl/config.c uses it as factor:

SIMPLEQ_FOREACH(domain, &domain_list, entry) {
if (strcmp(domain->name, "primary") == 0) {
primary_num_cpus = domain->vcpu;
primary_stride = domain->vcpu_stride;
primary_memory = domain->memory;
}
num_cpus += (domain->vcpu * domain->vcpu_stride);
memory += domain->memory;
}

> a stride of 4 means we allocate VCPUs 4-at-a-time but only assign 1 of
> those to the domain.  It is a step size.
Let's reword, is that any better?


Index: ldom.conf.5
===
RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v
retrieving revision 1.14
diff -u -p -r1.14 ldom.conf.5
--- ldom.conf.5 14 Sep 2020 19:42:16 -  1.14
+++ ldom.conf.5 4 Nov 2020 21:37:20 -
@@ -38,8 +38,11 @@ If no configuration for the primary doma
 all CPU and memory resources not used by any guest domains.
 .It Ic vcpu Ar number Ns Op : Ns Ar stride
 Declare the number of virtual CPUs assigned to a domain.
-Optionally a stride can be specified to allocate additional virtual CPUs
-but not assign them to a domain.
+Optionally a stride can be specified to allocate
+.Ar stride
+VCPUs at a time but assign only
+.Ar number
+VCPUs to the domain.
 This can be used to distribute virtual CPUs over the available CPU cores.
 .It Ic memory Ar bytes
 Declare the amount of memory assigned to a domain, in bytes.
@@ -117,6 +120,20 @@ domain "salmah" {
 .Pp
 On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and
 58GB memory to the primary domain.
+.Pp
+Use a
+.Ar stride
+step size to distribute VCPUs:
+.Bd -literal -offset indent
+domain "sun" {
+   vcpu 2:4
+   memory 4G
+   vdisk "/home/sun/vdisk0"
+}
+.Ed
+On a machine with eight threads per physical core, this allocates two strides
+of four VCPUs each for the guest domain but assigns only two VCPUs to it, i.e.
+make it occupy an entire physical core while running on two threads only.
 .Sh SEE ALSO
 .Xr eeprom 8 ,
 .Xr ldomctl 8 ,



Re: ldom.conf.5: clarify vcpu strides

2020-11-04 Thread Mark Kettenis
> Date: Wed, 4 Nov 2020 21:36:17 +0100
> From: Klemens Nanni 
> 
> On Mon, Sep 14, 2020 at 07:52:34PM +0200, Klemens Nanni wrote:
> > On Wed, Sep 02, 2020 at 04:58:39PM +0200, Stefan Sperling wrote:
> > > I would like to suggest an example for the EXAMPLES section which
> > > illustrates how a suitable stride factor can be determined (divide the
> > > number of desired "unused" cpus by the number of desired "used" cpus):
> > We can do with an example, but to me yours does not read obvious enough.
> > 
> > Also, `vcpu' denotes *virtual* CPUs inside domains, not CPUs on the
> > machine, so "CPU" (without "V") reads off in your example and conflicts
> > with the otherwise consistent mentions of "virtual CPUs" in this manual.
> > 
> > Here's my last diff incl. an example which reads a tad clearer to me and
> > is placed in the EXAMPLES section instead.
> > 
> > Feedback? OK?
> Ping.  Diff reattached.

stride is not a factor, so your description makes no sense to me.

a stride of 4 means we allocate VCPUs 4-at-a-time but only assign 1 of
those to the domain.  It is a step size.

> Index: ldom.conf.5
> ===
> RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v
> retrieving revision 1.13
> diff -u -p -r1.13 ldom.conf.5
> --- ldom.conf.5   21 Feb 2020 19:39:28 -  1.13
> +++ ldom.conf.5   14 Sep 2020 17:51:39 -
> @@ -38,8 +38,13 @@ If no configuration for the primary doma
>  all CPU and memory resources not used by any guest domains.
>  .It Ic vcpu Ar number Ns Op : Ns Ar stride
>  Declare the number of virtual CPUs assigned to a domain.
> -Optionally a stride can be specified to allocate additional virtual CPUs
> -but not assign them to a domain.
> +Optionally a stride factor can be specified to allocate
> +.Ar number
> +virtual CPUs
> +.Ar stride
> +times but not assign more than
> +.Ar number
> +virtual CPUs to a domain, leaving the rest unassigned.
>  This can be used to distribute virtual CPUs over the available CPU cores.
>  .It Ic memory Ar bytes
>  Declare the amount of memory assigned to a domain, in bytes.
> @@ -112,6 +117,20 @@ domain "salmah" {
>  .Pp
>  On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and
>  58GB memory to the primary domain.
> +.Pp
> +Use
> +.Ar stride
> +factors to distribute virtual CPUs:
> +.Bd -literal -offset indent
> +domain "sun" {
> + vcpu 2:4
> + memory 4G
> + vdisk "/home/sun/vdisk0"
> +}
> +.Ed
> +On a machine with eight threads per physical core, this allocates four strides
> +of two virtual CPUs to the guest domain but only assigns one stride to it, i.e.
> +make it occupy an entire physical core while running on only two threads.
>  .Sh SEE ALSO
>  .Xr eeprom 8 ,
>  .Xr ldomctl 8 ,
> 
> 



Re: [PATCH] tcpdump: Fix missing argument from icmp_print call in print-skip.c

2020-11-04 Thread Klemens Nanni
On Tue, Nov 03, 2020 at 01:15:49PM +0100, Theo Buehler wrote:
> There is quite a bit more that is wrong with print-skip.c than just
> that (try to add it to the Makefile and compile it). It was unhooked
> from the build in 1996.
> 
> Shouldn't it rather be sent to the attic?
OK kn



Re: ldom.conf.5: clarify vcpu strides

2020-11-04 Thread Klemens Nanni
On Mon, Sep 14, 2020 at 07:52:34PM +0200, Klemens Nanni wrote:
> On Wed, Sep 02, 2020 at 04:58:39PM +0200, Stefan Sperling wrote:
> > I would like to suggest an example for the EXAMPLES section which
> > illustrates how a suitable stride factor can be determined (divide the
> > number of desired "unused" cpus by the number of desired "used" cpus):
> We can do with an example, but to me yours does not read obvious enough.
> 
> Also, `vcpu' denotes *virtual* CPUs inside domains, not CPUs on the
> machine, so "CPU" (without "V") reads off in your example and conflicts
> with the otherwise consistent mentions of "virtual CPUs" in this manual.
> 
> Here's my last diff incl. an example which reads a tad clearer to me and
> is placed in the EXAMPLES section instead.
> 
> Feedback? OK?
Ping.  Diff reattached.


Index: ldom.conf.5
===
RCS file: /cvs/src/usr.sbin/ldomctl/ldom.conf.5,v
retrieving revision 1.13
diff -u -p -r1.13 ldom.conf.5
--- ldom.conf.5 21 Feb 2020 19:39:28 -  1.13
+++ ldom.conf.5 14 Sep 2020 17:51:39 -
@@ -38,8 +38,13 @@ If no configuration for the primary doma
 all CPU and memory resources not used by any guest domains.
 .It Ic vcpu Ar number Ns Op : Ns Ar stride
 Declare the number of virtual CPUs assigned to a domain.
-Optionally a stride can be specified to allocate additional virtual CPUs
-but not assign them to a domain.
+Optionally a stride factor can be specified to allocate
+.Ar number
+virtual CPUs
+.Ar stride
+times but not assign more than
+.Ar number
+virtual CPUs to a domain, leaving the rest unassigned.
 This can be used to distribute virtual CPUs over the available CPU cores.
 .It Ic memory Ar bytes
 Declare the amount of memory assigned to a domain, in bytes.
@@ -112,6 +117,20 @@ domain "salmah" {
 .Pp
 On a machine with 32 cores and 64GB physical memory, this leaves 12 cores and
 58GB memory to the primary domain.
+.Pp
+Use
+.Ar stride
+factors to distribute virtual CPUs:
+.Bd -literal -offset indent
+domain "sun" {
+   vcpu 2:4
+   memory 4G
+   vdisk "/home/sun/vdisk0"
+}
+.Ed
+On a machine with eight threads per physical core, this allocates four strides
+of two virtual CPUs to the guest domain but only assigns one stride to it, i.e.
+make it occupy an entire physical core while running on only two threads.
 .Sh SEE ALSO
 .Xr eeprom 8 ,
 .Xr ldomctl 8 ,



Re: unwind(8): query_imsg2str

2020-11-04 Thread Klemens Nanni
On Wed, Nov 04, 2020 at 04:06:13PM +0100, Florian Obser wrote:
> Introduce query_imsg2str() for the printing "qname class type".
OK kn

> @@ -2116,3 +2107,18 @@ resolvers_to_restart(struct uw_conf *oconf, struct uw_conf *nconf)
>   }
>   return restart;
>  }
> +
> +const char*
I'd put a space between "char" and "*".

> +query_imsg2str(struct query_imsg *query_imsg)
> +{



Re: [PATCH] ifconfig: keep track of allowed ips pointer correctly

2020-11-04 Thread Klemens Nanni
On Tue, Oct 27, 2020 at 06:16:08PM +0100, Jason A. Donenfeld wrote:
> Somebody on IRC mentioned that using ifconfig to set wgallowedips wasn't
> working on macppc. I don't have a macppc to test this on, but it seems
> like the code is assuming that the two values printed out by this test
> program must always be the same:
> 
>   struct s {
>   int i;
>   };
> 
>   struct p {
>   long l;
>   char c;
>   struct s a[];
>   };
> 
>   int main(int argc, char *argv[])
>   {
>   printf("%zu %zu\n", sizeof(struct p), (size_t)&((struct p *)0)->a[0]);
>   return 0;
>   }
> 
> But actually, on my amd64 system, that little test prints out "16 12".
> This patch fixes up ifconfig.c to do the right thing, so that it
> corresponds with how the kernel handles iteration.
> 
> I don't have a macppc in order to test this, but it works on amd64.
Using the wg(4) EXAMPLES section as test, this fixes wg(4) on macppc.
amd64 continues to work for me as well.

FWIW, both platforms produced the same ifconfig.o regardless of using a
void pointer or char pointer cast as guenther suggested.  I don't know
if `char *' might be beneficial on other platform/compiler combinations.

OK?
Otherwise I'll commit this on friday unless I hear objections or further
feedback.

Jason's diff (with `char *' as per guenther) reattached below.


Index: ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.429
diff -u -p -r1.429 ifconfig.c
--- ifconfig.c  7 Oct 2020 14:38:54 -   1.429
+++ ifconfig.c  4 Nov 2020 19:35:20 -
@@ -5696,11 +5696,10 @@ ensurewginterface(void)
err(1, "calloc");
 }
 
-void *
+void
 growwgdata(size_t by)
 {
ptrdiff_t peer_offset, aip_offset;
-   void *ret;
 
if (wg_interface == NULL)
wgdata.wgd_size = sizeof(*wg_interface);
@@ -5721,16 +5720,18 @@ growwgdata(size_t by)
if (wg_aip != NULL)
wg_aip = (void *)wg_interface + aip_offset;
 
-   ret = (void *)wg_interface + wgdata.wgd_size - by;
-   bzero(ret, by);
-
-   return ret;
+   bzero((char *)wg_interface + wgdata.wgd_size - by, by);
 }
 
 void
 setwgpeer(const char *peerkey_b64, int param)
 {
-   wg_peer = growwgdata(sizeof(*wg_peer));
+   growwgdata(sizeof(*wg_peer));
+   if (wg_aip)
+   wg_peer = (struct wg_peer_io *)wg_aip;
+   else
+   wg_peer = &wg_interface->i_peers[0];
+   wg_aip = &wg_peer->p_aips[0];
wg_peer->p_flags |= WG_PEER_HAS_PUBLIC;
WG_LOAD_KEY(wg_peer->p_public, peerkey_b64, "wgpeer");
wg_interface->i_peers_count++;
@@ -5743,7 +5744,7 @@ setwgpeeraip(const char *aip, int param)
if (wg_peer == NULL)
errx(1, "wgaip: wgpeer not set");
 
-   wg_aip = growwgdata(sizeof(*wg_aip));
+   growwgdata(sizeof(*wg_aip));
 
if ((res = inet_net_pton(AF_INET, aip, &wg_aip->a_ipv4,
sizeof(wg_aip->a_ipv4))) != -1) {
@@ -5759,6 +5760,8 @@ setwgpeeraip(const char *aip, int param)
 
wg_peer->p_flags |= WG_PEER_REPLACE_AIPS;
wg_peer->p_aips_count++;
+
+   wg_aip++;
 }
 
 void
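
For the curious, the `(size_t)&((struct p *)0)->a[0]' expression in the
test program above is exactly what offsetof() computes; a stand-alone
restatement of the same observation (illustrative, not part of the
diff):

#include <stddef.h>
#include <stdio.h>

struct s {
	int i;
};

struct p {
	long l;
	char c;
	struct s a[];	/* flexible array member */
};

int
main(void)
{
	/*
	 * On LP64 this prints "16 12": sizeof() rounds up to the
	 * alignment of `l' and so includes tail padding after `c';
	 * it must not be used as the offset of the first a[] element.
	 */
	printf("%zu %zu\n", sizeof(struct p), offsetof(struct p, a));
	return 0;
}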



unwind(8): handle large answers

2020-11-04 Thread Florian Obser
Handle DNS answers that are larger than the maximum imsg size by
splitting them up.

This might fix an issue Leo reported to me in private where unwind
would exit with this in syslog:
unwind[13752]: fatal in frontend: expected IMSG_ANSWER but got HEADER
unwind[45337]: resolver exiting
unwind[43711]: terminating 

The maximum imsg size is about 16k and it kinda seems unlikely to see
answers in the wild that are this big but that's currently the only
theory I have. This also improves error logging, so if it's not this
we might have more luck once it triggers again.

This is on top of the query_imsg2str diff I just sent to tech.

OK?

diff --git frontend.c frontend.c
index f8558d60ab4..90fdd805257 100644
--- frontend.c
+++ frontend.c
@@ -420,12 +420,14 @@ frontend_dispatch_main(int fd, short event, void *bula)
 void
 frontend_dispatch_resolver(int fd, short event, void *bula)
 {
-   static struct pending_query *pq;
+   struct pending_query*pq;
struct imsgev   *iev = bula;
struct imsgbuf  *ibuf = &iev->ibuf;
struct imsg  imsg;
struct query_imsg   *query_imsg;
+   struct answer_imsg  *answer_imsg;
int  n, shut = 0, chg;
+   uint8_t *p;
 
if (event & EV_READ) {
if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
@@ -448,8 +450,6 @@ frontend_dispatch_resolver(int fd, short event, void *bula)
 
switch (imsg.hdr.type) {
case IMSG_ANSWER_HEADER:
-   if (pq != NULL)
-   fatalx("expected IMSG_ANSWER but got HEADER");
if (IMSG_DATA_SIZE(imsg) != sizeof(*query_imsg))
fatalx("%s: IMSG_ANSWER_HEADER wrong length: "
"%lu", __func__, IMSG_DATA_SIZE(imsg));
@@ -468,19 +468,32 @@ frontend_dispatch_resolver(int fd, short event, void *bula)
pq->bogus = query_imsg->bogus;
break;
case IMSG_ANSWER:
-   if (pq == NULL)
-   fatalx("IMSG_ANSWER without HEADER");
-
-   if (pq->answer)
-   fatal("pq->answer");
-   if ((pq->answer = malloc(IMSG_DATA_SIZE(imsg))) !=
+   if (IMSG_DATA_SIZE(imsg) != sizeof(*answer_imsg))
+   fatalx("%s: IMSG_ANSWER wrong length: "
+   "%lu", __func__, IMSG_DATA_SIZE(imsg));
+   answer_imsg = (struct answer_imsg *)imsg.data;
+   if ((pq = find_pending_query(answer_imsg->id)) ==
NULL) {
-   pq->answer_len = IMSG_DATA_SIZE(imsg);
-   memcpy(pq->answer, imsg.data, pq->answer_len);
-   } else
+   log_warnx("cannot find pending query %llu",
+   answer_imsg->id);
+   break;
+   }
+
+   p = realloc(pq->answer, pq->answer_len +
+   answer_imsg->len);
+
+   if (p != NULL) {
+   pq->answer = p;
+   memcpy(pq->answer + pq->answer_len,
+   answer_imsg->answer, answer_imsg->len);
+   pq->answer_len += answer_imsg->len;
+   } else {
+   pq->answer_len = 0;
+   pq->answer = NULL;
pq->rcode_override = LDNS_RCODE_SERVFAIL;
-   send_answer(pq);
-   pq = NULL;
+   }
+   if (!answer_imsg->truncated)
+   send_answer(pq);
break;
case IMSG_CTL_RESOLVER_INFO:
case IMSG_CTL_AUTOCONF_RESOLVER_INFO:
diff --git resolver.c resolver.c
index a1e18f9d130..4aec42ac5cb 100644
--- resolver.c
+++ resolver.c
@@ -879,6 +879,7 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode,
sldns_buffer*buf = NULL;
struct regional *region = NULL;
struct query_imsg   *query_imsg;
+   struct answer_imsg   answer_imsg;
struct running_query*rq;
struct timespec  tp, elapsed;
int64_t  ms;
@@ -886,6 +887,7 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode,
int  running_res, asr_pref_pos, force_acceptbogus;
char*str;
char rcode_buf[16];
+   uint8_t *p;
 
clock_gettime(CLOCK_MONOTONIC, &tp);
 
@@ 
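
For illustration only, a sketch of what the sending side of such
chunking could look like; the struct layout is guessed from the
frontend code above and all names here are assumptions, not the
committed code:

/* Sketch only; field names and sizes are assumptions. */
struct answer_imsg {
	uint64_t	 id;		/* matches the pending query */
	int		 truncated;	/* 1 while more chunks follow */
	size_t		 len;		/* bytes used in answer[] */
	uint8_t		 answer[1024];	/* chunk payload, size illustrative */
};

void
send_answer_chunks(struct imsgev *iev, uint64_t id, const uint8_t *buf,
    size_t buf_len)
{
	struct answer_imsg aimsg;
	size_t off = 0;

	memset(&aimsg, 0, sizeof(aimsg));
	aimsg.id = id;
	do {
		/* copy at most one chunk's worth of the answer */
		aimsg.len = buf_len - off < sizeof(aimsg.answer) ?
		    buf_len - off : sizeof(aimsg.answer);
		memcpy(aimsg.answer, buf + off, aimsg.len);
		off += aimsg.len;
		aimsg.truncated = off < buf_len;
		imsg_compose_event(iev, IMSG_ANSWER, 0, 0, -1, &aimsg,
		    sizeof(aimsg));
	} while (off < buf_len);
}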

unwind(8): query_imsg2str

2020-11-04 Thread Florian Obser
Introduce query_imsg2str() for the printing "qname class type".

OK?

diff --git resolver.c resolver.c
index dea78ca2fb3..a1e18f9d130 100644
--- resolver.c
+++ resolver.c
@@ -194,6 +194,7 @@ void decay_latest_histograms(int, short, void *);
 int running_query_cnt(void);
 int*resolvers_to_restart(struct uw_conf *,
 struct uw_conf *);
+const char *query_imsg2str(struct query_imsg *);
 
 struct uw_conf *resolver_conf;
 struct imsgev  *iev_frontend;
@@ -752,8 +753,6 @@ try_next_resolver(struct running_query *rq)
struct timespec  tp, elapsed;
struct timeval   tv = {0, 0};
int64_t  ms;
-   char qclass_buf[16];
-   char qtype_buf[16];
 
while(rq->next_resolver < rq->res_pref.len &&
((res = resolvers[rq->res_pref.types[rq->next_resolver]]) == NULL ||
@@ -772,13 +771,9 @@ try_next_resolver(struct running_query *rq)
timespecsub(&tp, &rq->tp, &elapsed);
ms = elapsed.tv_sec * 1000 + elapsed.tv_nsec / 1000000;
 
-   sldns_wire2str_class_buf(rq->query_imsg->c, qclass_buf,
-   sizeof(qclass_buf));
-   sldns_wire2str_type_buf(rq->query_imsg->t, qtype_buf,
-   sizeof(qtype_buf));
-   log_debug("%s[+%lldms]: %s[%s] %s %s %s", __func__, ms,
+   log_debug("%s[+%lldms]: %s[%s] %s", __func__, ms,
uw_resolver_type_str[res->type], uw_resolver_state_str[res->state],
-   rq->query_imsg->qname, qclass_buf, qtype_buf);
+   query_imsg2str(rq->query_imsg));
 
if ((query_imsg = malloc(sizeof(*query_imsg))) == NULL) {
log_warnx("%s", __func__);
@@ -891,8 +886,6 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode,
int  running_res, asr_pref_pos, force_acceptbogus;
char*str;
char rcode_buf[16];
-   char qclass_buf[16];
-   char qtype_buf[16];
 
clock_gettime(CLOCK_MONOTONIC, &tp);
 
@@ -949,11 +942,9 @@ resolve_done(struct uw_resolver *res, void *arg, int rcode,
result->answer_len = 0;
 
sldns_wire2str_rcode_buf(result->rcode, rcode_buf, sizeof(rcode_buf));
-   sldns_wire2str_class_buf(query_imsg->c, qclass_buf, sizeof(qclass_buf));
-   sldns_wire2str_type_buf(query_imsg->t, qtype_buf, sizeof(qtype_buf));
-   log_debug("%s[%s]: %s %s %s rcode: %s[%d], elapsed: %lldms, running: %d",
-   __func__, uw_resolver_type_str[res->type], query_imsg->qname,
-   qclass_buf, qtype_buf, rcode_buf, result->rcode, ms,
+   log_debug("%s[%s]: %s rcode: %s[%d], elapsed: %lldms, running: %d",
+   __func__, uw_resolver_type_str[res->type],
+   query_imsg2str(query_imsg), rcode_buf, result->rcode, ms,
running_query_cnt());
 
force_acceptbogus = find_force(&resolver_conf->force, query_imsg->qname,
@@ -2116,3 +2107,18 @@ resolvers_to_restart(struct uw_conf *oconf, struct uw_conf *nconf)
}
return restart;
 }
+
+const char*
+query_imsg2str(struct query_imsg *query_imsg)
+{
+   static char  buf[sizeof(query_imsg->qname) + 1 + 16 + 1 + 16];
+   char qclass_buf[16];
+   char qtype_buf[16];
+
+   sldns_wire2str_class_buf(query_imsg->c, qclass_buf, sizeof(qclass_buf));
+   sldns_wire2str_type_buf(query_imsg->t, qtype_buf, sizeof(qtype_buf));
+
+   snprintf(buf, sizeof(buf), "%s %s %s", query_imsg->qname, qclass_buf,
+   qtype_buf);
+   return buf;
+}
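
Worth noting: the returned pointer refers to a static buffer, so each
call overwrites the previous result and callers must consume it before
calling again.  Illustrative only, with first_imsg/second_imsg as
placeholders:

	const char *a = query_imsg2str(first_imsg);
	const char *b = query_imsg2str(second_imsg);
	/* a and b now alias the same buffer, describing second_imsg */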


-- 
I'm not entirely sure you are real.



Prevent race in single_thread_set()

2020-11-04 Thread Martin Pieuchot
Here's a 3rd approach to solve the TOCTOU race in single_thread_set().
The issue being that the lock serializing access to `ps_single' is not
held when calling single_thread_check().

The approach below is controversial because it extends the scope of the
SCHED_LOCK().  On the other hand, the two other approaches that both
add a new lock to avoid this race ignore the fact that accesses to
`ps_single' are currently not clearly serialized w/o KERNEL_LOCK().

So the diff below improves the situation in that regard and does not add
more complexity due to the use of multiple locks.  After having looked
for a way to split the SCHED_LOCK() I believe this is the simplest
approach.

I deliberately used a *_locked() function to avoid grabbing the lock
recursively as I'm trying to get rid of the recursion, see the other
thread on tech@.

That said the uses of `ps_single' in ptrace_ctrl() are not covered by
this diff and I'd be glad to hear some comments about them.  This is
fine as long as all the code using `ps_single' runs under KERNEL_LOCK()
but since we're trying to get the single_thread_* API out of it, this
needs to be addressed.

Note that this diff introduces a helper for initializing ps_single*
values in order to keep all the accesses of those fields in the same
file.

ok?
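
To make the race concrete, an illustrative interleaving (not actual
code) of two threads entering single_thread_set() when the lock is not
held across check and set:

/*
 * Illustrative interleaving of the TOCTOU race:
 *
 *	thread A				thread B
 *	single_thread_check() -> 0
 *						single_thread_check() -> 0
 *	pr->ps_single = A's proc
 *						pr->ps_single = B's proc
 *
 * Both checks passed before either assignment, so each thread believes
 * it won.  Holding SCHED_LOCK() across the check and the assignment
 * makes the check-and-set atomic.
 */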

Index: kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.226
diff -u -p -r1.226 kern_fork.c
--- kern/kern_fork.c25 Oct 2020 01:55:18 -  1.226
+++ kern/kern_fork.c4 Nov 2020 12:52:54 -
@@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta
 * if somebody else wants to take us to single threaded mode,
 * count ourselves in.
 */
-   if (pr->ps_single) {
-   atomic_inc_int(&pr->ps_singlecount);
-   atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
-   }
+   single_thread_init(p);
 
/*
 * Return tid to parent thread and copy it out to userspace
Index: kern/kern_sig.c
===
RCS file: /cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.263
diff -u -p -r1.263 kern_sig.c
--- kern/kern_sig.c 16 Sep 2020 13:50:42 -  1.263
+++ kern/kern_sig.c 4 Nov 2020 12:38:35 -
@@ -1932,11 +1932,27 @@ userret(struct proc *p)
p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri;
 }
 
+void
+single_thread_init(struct proc *p)
+{
+   struct process *pr = p->p_p;
+   int s;
+
+   SCHED_LOCK(s);
+   if (pr->ps_single) {
+   atomic_inc_int(&pr->ps_singlecount);
+   atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
+   }
+   SCHED_UNLOCK(s);
+}
+
 int
-single_thread_check(struct proc *p, int deep)
+_single_thread_check_locked(struct proc *p, int deep)
 {
struct process *pr = p->p_p;
 
+   SCHED_ASSERT_LOCKED();
+
if (pr->ps_single != NULL && pr->ps_single != p) {
do {
int s;
@@ -1949,14 +1965,12 @@ single_thread_check(struct proc *p, int 
return (EINTR);
}
 
-   SCHED_LOCK(s);
-   if (pr->ps_single == NULL) {
-   SCHED_UNLOCK(s);
+   if (pr->ps_single == NULL)
continue;
-   }
 
if (atomic_dec_int_nv(&pr->ps_singlecount) == 0)
wakeup(&pr->ps_singlecount);
+
if (pr->ps_flags & PS_SINGLEEXIT) {
SCHED_UNLOCK(s);
KERNEL_LOCK();
@@ -1967,13 +1981,24 @@ single_thread_check(struct proc *p, int 
/* not exiting and don't need to unwind, so suspend */
p->p_stat = SSTOP;
mi_switch();
-   SCHED_UNLOCK(s);
} while (pr->ps_single != NULL);
}
 
return (0);
 }
 
+int
+single_thread_check(struct proc *p, int deep)
+{
+   int s, error;
+
+   SCHED_LOCK(s);
+   error = _single_thread_check_locked(p, deep);
+   SCHED_UNLOCK(s);
+
+   return error;
+}
+
 /*
  * Stop other threads in the process.  The mode controls how and
  * where the other threads should stop:
@@ -1995,8 +2020,12 @@ single_thread_set(struct proc *p, enum s
KERNEL_ASSERT_LOCKED();
KASSERT(curproc == p);
 
-   if ((error = single_thread_check(p, deep)))
+   SCHED_LOCK(s);
+   error = _single_thread_check_locked(p, deep);
+   if (error) {
+   SCHED_UNLOCK(s);
return error;
+   }
 
switch (mode) {
case SINGLE_SUSPEND:
@@ -2014,7 +2043,6 @@ single_thread_set(struct proc *p, enum s
panic("single_thread_mode = %d", mode);
 #endif
}
-   SCHED_LOCK(s);
pr->ps_singlecount = 

uvm_fault: is there an anon?

2020-11-04 Thread Martin Pieuchot
Diff below introduces a helper that looks for an existing mapping.  The
value returned by this lookup function determines if there's an anon
at the faulting address, which tells us if we're dealing with a fault
of type 1 or 2.

This small refactoring is part of the current work to separate the code
handling faults of type 1 and 2.  The end goal being to move the type 1
faults handling out of the KERNEL_LOCK().

The function name is taken from NetBSD to not introduce more difference
than there's already.

ok?
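
For orientation, a simplified sketch of how the caller consumes the
return value (following the diff below, not new code):

	shadowed = uvm_fault_upper_lookup(&ufi, &flt, anons, pages);
	if (shadowed == TRUE) {
		/* case 1: an anon covers the faulting address (upper) */
	} else {
		/* case 2: no anon, fault the backing object (lower) */
	}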

Index: uvm/uvm_fault.c
===
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.103
diff -u -p -r1.103 uvm_fault.c
--- uvm/uvm_fault.c 21 Oct 2020 08:55:40 -  1.103
+++ uvm/uvm_fault.c 4 Nov 2020 13:57:01 -
@@ -637,6 +637,84 @@ uvm_fault_check(struct uvm_faultinfo *uf
return 0;
 }
 
+
+/*
+ * uvm_fault_upper_lookup: look up existing h/w mapping and amap.
+ *
+ * iterate range of interest:
+ *  1. check if h/w mapping exists.  if yes, we don't care
+ *  2. check if anon exists.  if not, page is lower.
+ *  3. if anon exists, enter h/w mapping for neighbors.
+ */
+boolean_t
+uvm_fault_upper_lookup(struct uvm_faultinfo *ufi,
+const struct uvm_faultctx *flt, struct vm_anon **anons,
+struct vm_page **pages)
+{
+   struct vm_amap *amap = ufi->entry->aref.ar_amap;
+   struct vm_anon *anon;
+   boolean_t shadowed;
+   vaddr_t currva;
+   paddr_t pa;
+   int lcv;
+
+   /*
+* map in the backpages and frontpages we found in the amap in hopes
+* of preventing future faults.  we also init the pages[] array as
+* we go.
+*/
+   currva = flt->startva;
+   shadowed = FALSE;
+   for (lcv = 0 ; lcv < flt->npages ; lcv++, currva += PAGE_SIZE) {
+   /*
+* dont play with VAs that are already mapped
+* except for center)
+*/
+   if (lcv != flt->centeridx &&
+   pmap_extract(ufi->orig_map->pmap, currva, &pa)) {
+   pages[lcv] = PGO_DONTCARE;
+   continue;
+   }
+
+   /* unmapped or center page. check if any anon at this level. */
+   if (amap == NULL || anons[lcv] == NULL) {
+   pages[lcv] = NULL;
+   continue;
+   }
+
+   /* check for present page and map if possible. re-activate it */
+   pages[lcv] = PGO_DONTCARE;
+   if (lcv == flt->centeridx) {/* save center for later! */
+   shadowed = TRUE;
+   continue;
+   }
+   anon = anons[lcv];
+   if (anon->an_page &&
+   (anon->an_page->pg_flags & (PG_RELEASED|PG_BUSY)) == 0) {
+   uvm_lock_pageq();
+   uvm_pageactivate(anon->an_page);/* reactivate */
+   uvm_unlock_pageq();
+   uvmexp.fltnamap++;
+
+   /*
+* Since this isn't the page that's actually faulting,
+* ignore pmap_enter() failures; it's not critical
+* that we enter these right now.
+*/
+   (void) pmap_enter(ufi->orig_map->pmap, currva,
+   VM_PAGE_TO_PHYS(anon->an_page) | flt->pa_flags,
+   (anon->an_ref > 1) ?
+   (flt->enter_prot & ~PROT_WRITE) : flt->enter_prot,
+   PMAP_CANFAIL |
+(VM_MAPENT_ISWIRED(ufi->entry) ? PMAP_WIRED : 0));
+   }
+   }
+   if (flt->npages > 1)
+   pmap_update(ufi->orig_map->pmap);
+
+   return shadowed;
+}
+
 /*
  *   F A U L T   -   m a i n   e n t r y   p o i n t
  */
@@ -663,7 +741,6 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad
int result, lcv, gotpages, ret;
vaddr_t currva;
voff_t uoff;
-   paddr_t pa;
struct vm_amap *amap;
struct uvm_object *uobj;
struct vm_anon *anons_store[UVM_MAXRANGE], **anons, *anon, *oanon;
@@ -704,61 +781,9 @@ ReFault:
amap = ufi.entry->aref.ar_amap;
uobj = ufi.entry->object.uvm_obj;
 
-   /*
-* map in the backpages and frontpages we found in the amap in hopes
-* of preventing future faults.  we also init the pages[] array as
-* we go.
-*/
-   currva = flt.startva;
-   shadowed = FALSE;
-   for (lcv = 0 ; lcv < flt.npages ; lcv++, currva += PAGE_SIZE) {
-   /*
-* dont play with VAs that are already mapped
-* except for center)
-*/
-   if (lcv != flt.centeridx &&
-   pmap_extract(ufi.orig_map->pmap, currva, &pa)) {
-   pages[lcv] = 

Turn SCHED_LOCK() into a mutex

2020-11-04 Thread Martin Pieuchot
Diff below removes the recursive attribute of the SCHED_LOCK() by
turning it into a IPL_NONE mutex.  I'm not intending to commit it
yet as it raises multiple questions, see below. 

This work has been started by art@ more than a decade ago and I'm
willing to finish it as I believe it's the easiest way to reduce
the scope of this lock.  Having a global mutex is the first step to
have a per runqueue and per sleepqueue mutex. 

This is also a way to avoid lock ordering problems exposed by the recent
races in single_thread_set().

About the diff:

The diff below includes an (ugly) refactoring of rw_exit() to avoid a
 recursion on the SCHED_LOCK().  In this case the lock is used to protect
 the global sleepqueue and is grabbed in sleep_setup().

 The same pattern can be observed in single_thread_check().  However in
 this case the lock is used to protect different fields so there's no
 "recursive access" to the same data structure.

 assertwaitok() has been moved down in mi_switch() which isn't ideal.

 It becomes obvious that the per-CPU and per-thread accounting fields
 updated in mi_switch() won't need a separate mutex as proposed last
 year and that splitting this global mutex will be enough.

It's unclear to me if/how WITNESS should be modified to handle this lock
change.

This has been tested on sparc64 and amd64.  I'm not convinced it exposed
all the recursions.  So if you want to give it a go and can break it, it
is more than welcome.

Comments?  Questions?
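
For context, an illustrative call chain showing the recursion that a
plain (non-recursive) mutex cannot tolerate, hence the rw_exit()
refactoring below:

/*
 * Illustrative; with the old recursive lock this nesting was legal:
 *
 *	SCHED_LOCK(s);			(protects the sleepqueue,
 *	...				 grabbed in sleep_setup())
 *	rw_exit(rwl);
 *	  -> wakeup(rwl);
 *	       -> SCHED_LOCK(s);	(same lock, taken recursively)
 *
 * mtx_enter() panics when the calling CPU already holds the mutex, so
 * wakeup_n() below grows a `locked' argument and the lock is taken
 * exactly once on that path.
 */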

Index: kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.226
diff -u -p -r1.226 kern_fork.c
--- kern/kern_fork.c25 Oct 2020 01:55:18 -  1.226
+++ kern/kern_fork.c2 Nov 2020 10:50:24 -
@@ -665,7 +665,7 @@ void
 proc_trampoline_mp(void)
 {
SCHED_ASSERT_LOCKED();
-   __mp_unlock(&sched_lock);
+   mtx_leave(&sched_lock);
spl0();
SCHED_ASSERT_UNLOCKED();
KERNEL_ASSERT_UNLOCKED();
Index: kern/kern_lock.c
===
RCS file: /cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.71
diff -u -p -r1.71 kern_lock.c
--- kern/kern_lock.c5 Mar 2020 09:28:31 -   1.71
+++ kern/kern_lock.c2 Nov 2020 10:50:24 -
@@ -97,9 +97,6 @@ ___mp_lock_init(struct __mp_lock *mpl, c
if (mpl == &kernel_lock)
mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
LO_SLEEPABLE | (LO_CLASS_KERNEL_LOCK << LO_CLASSSHIFT);
-   else if (mpl == &sched_lock)
-   mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
-   LO_RECURSABLE | (LO_CLASS_SCHED_LOCK << LO_CLASSSHIFT);
WITNESS_INIT(&mpl->mpl_lock_obj, type);
 #endif
 }
Index: kern/kern_rwlock.c
===
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.45
diff -u -p -r1.45 kern_rwlock.c
--- kern/kern_rwlock.c  2 Mar 2020 17:07:49 -   1.45
+++ kern/kern_rwlock.c  2 Nov 2020 23:13:01 -
@@ -128,36 +128,6 @@ rw_enter_write(struct rwlock *rwl)
}
 }
 
-void
-rw_exit_read(struct rwlock *rwl)
-{
-   unsigned long owner;
-
-   rw_assert_rdlock(rwl);
-   WITNESS_UNLOCK(&rwl->rwl_lock_obj, 0);
-
-   membar_exit_before_atomic();
-   owner = rwl->rwl_owner;
-   if (__predict_false((owner & RWLOCK_WAIT) ||
-   rw_cas(&rwl->rwl_owner, owner, owner - RWLOCK_READ_INCR)))
-   rw_do_exit(rwl, 0);
-}
-
-void
-rw_exit_write(struct rwlock *rwl)
-{
-   unsigned long owner;
-
-   rw_assert_wrlock(rwl);
-   WITNESS_UNLOCK(&rwl->rwl_lock_obj, LOP_EXCLUSIVE);
-
-   membar_exit_before_atomic();
-   owner = rwl->rwl_owner;
-   if (__predict_false((owner & RWLOCK_WAIT) ||
-   rw_cas(&rwl->rwl_owner, owner, 0)))
-   rw_do_exit(rwl, RWLOCK_WRLOCK);
-}
-
 #ifdef DIAGNOSTIC
 /*
  * Put the diagnostic functions here to keep the main code free
@@ -314,9 +284,10 @@ retry:
 }
 
 void
-rw_exit(struct rwlock *rwl)
+_rw_exit(struct rwlock *rwl, int locked)
 {
unsigned long wrlock;
+   unsigned long owner, set;
 
/* Avoid deadlocks after panic or in DDB */
if (panicstr || db_active)
@@ -330,15 +301,6 @@ rw_exit(struct rwlock *rwl)
WITNESS_UNLOCK(&rwl->rwl_lock_obj, wrlock ? LOP_EXCLUSIVE : 0);
 
membar_exit_before_atomic();
-   rw_do_exit(rwl, wrlock);
-}
-
-/* membar_exit_before_atomic() has to precede call of this function. */
-void
-rw_do_exit(struct rwlock *rwl, unsigned long wrlock)
-{
-   unsigned long owner, set;
-
do {
owner = rwl->rwl_owner;
if (wrlock)
@@ -349,7 +311,13 @@ rw_do_exit(struct rwlock *rwl, unsigned 
} while (__predict_false(rw_cas(&rwl->rwl_owner, owner, set)));
 
if (owner & RWLOCK_WAIT)
-   wakeup(rwl);
+   wakeup_n(rwl, -1, locked);
+}
+
+void
+rw_exit(struct rwlock *rwl)
+{

[PATCH] .apple kbd layout for sf and sg

2020-11-04 Thread Mathias Schmocker

Hello,
this diff against -current allows setting a usable sf.apple and sg.apple
kbd layout (and the nodead variants) on a Macbook 1.1 (wscons)

(alternative was xorg.conf macbook79 with ch layout and fr variant,
 but without the beautiful spleen consolefont...)

Index: makemap.awk
===
RCS file: /home/cvs/src/sys/dev/usb/makemap.awk,v
retrieving revision 1.15
diff -u -p -r1.15 makemap.awk
--- makemap.awk 2 Nov 2020 19:45:18 -   1.15
+++ makemap.awk 4 Nov 2020 07:50:20 -
@@ -37,6 +37,7 @@ BEGIN {
declk = 0
haskeys = 0
kbfr = 0
+   kbsg = 0
nmaps = 0

# PS/2 id -> UKBD conversion table, or "sanity lossage 101"
@@ -407,6 +408,38 @@ $1 == "#define" || $1 == "#undef" {
print "KC(47),\tKS_masculine,\tKS_ordfeminine,"
print "KC(50),\tKS_backslash,\tKS_bar,"
print "KC(52),\tKS_dead_tilde,\tKS_dead_circumflex"
+   } else
+   if (mapname == "ukbd_keydesc_sg[]") {
+   print $0
+   print "\nstatic const keysym_t ukbd_keydesc_sg_apple[] = {"
+   print "/*  pos  normal\t\tshifted\t\talt\t\tshift-alt */"
+   print "KC(10),\tKS_g,\t\tKS_G,\t\tKS_at,"
+   print "KC(17),\tKS_n,\t\tKS_N,\t\tKS_dead_tilde,"
+   print "KC(30),\tKS_1,\t\tKS_plus,\tKS_plusminus,"
+   print "KC(34),\tKS_5,\t\tKS_percent,\tKS_bracketleft,"
+   print "KC(35),\tKS_6,\t\tKS_ampersand,\tKS_bracketright,"
+   print "KC(36),\tKS_7,\t\tKS_slash,\tKS_bar,\t\tKS_backslash,"
+   print "KC(37),\tKS_8,\t\tKS_parenleft,\tKS_braceleft,"
+   print "KC(38),\tKS_9,\t\tKS_parenright,\tKS_braceright,"
+   print "KC(88),\tKS_Mode_switch,\tKS_Multi_key,"
+   print "KC(226),\tKS_Mode_switch,\tKS_Multi_key,"
+   } else
+   if (mapname == "ukbd_keydesc_sg_nodead[]") {
+   print $0
+   print "\nstatic const keysym_t ukbd_keydesc_sg_apple_nodead[] = {"
+   print "/*  pos  normal\t\tshifted\t\talt\t\tshift-alt */"
+   print "KC(17),\tKS_n,\t\tKS_N,\t\tKS_asciitilde,"
+   print "KC(45),\tKS_apostrophe,\tKS_question,\tKS_acute,"
+   print "KC(46),\tKS_asciicircum,\tKS_grave,"
+   print "KC(48),\tKS_diaeresis,\tKS_exclam,"
+   } else
+   if (mapname == "ukbd_keydesc_sf[]") {
+   print $0
+   print "\nstatic const keysym_t ukbd_keydesc_sf_apple[] = {"
+   print "/*  pos  normal\t\tshifted\t\talt\t\tshift-alt */"
+   print "KC(47),\tKS_egrave,\tKS_udiaeresis,"
+   print "KC(51),\tKS_eacute,\tKS_odiaeresis,"
+   print "KC(52),\tKS_agrave,\tKS_adiaeresis,"
}
}
 }
@@ -425,6 +458,27 @@ $1 == "#define" || $1 == "#undef" {
 /KB_PT/ {
print $0
print "\tKBD_MAP(KB_PT | KB_APPLE,\tKB_PT,\tukbd_keydesc_pt_apple),"
+   next
+}
+/KB_SG/ {
+   print $0
+   kbsg++
+   if (kbsg == 1) {
+   print "\tKBD_MAP(KB_SG | KB_APPLE,\tKB_SG,\tukbd_keydesc_sg_apple),"
+   } else if (kbsg == 2) {
+   print "\tKBD_MAP(KB_SG | KB_APPLE | KB_NODEAD,\tKB_SG | KB_APPLE,"
+   print "\t\tukbd_keydesc_sg_apple_nodead),"
+   # Add first .apple variant for sf
+   } else if (kbsg == 3) {
+   print "\tKBD_MAP(KB_SF | KB_APPLE,\tKB_SG | KB_APPLE,\tukbd_keydesc_sf_apple),"
+   }
+   next
+}
+/KB_SF/ {
+   print $0
+   # Add second .apple variant for sf
+   print "\tKBD_MAP(KB_SF | KB_APPLE | KB_NODEAD,\tKB_SF | KB_APPLE,"
+   print "\t\tukbd_keydesc_sg_apple_nodead),"
next
 }
 {
Index: ukbd.4
===
RCS file: /home/cvs/src/share/man/man4/ukbd.4,v
retrieving revision 1.22
diff -u -p -r1.22 ukbd.4
--- ukbd.4  11 May 2019 14:19:16 -  1.22
+++ ukbd.4  3 Nov 2020 19:36:27 -
@@ -171,10 +171,20 @@ Russian in
 .Pq sf
 Swiss French with
 .Dq dead accents .
+.It KB_SF | KB_APPLE
+.Pq sf.apple
+Swiss French with
+.Dq dead accents
+for MacBooks.
 .It KB_SG
 .Pq sg
 Swiss German with
 .Dq dead accents .
+.It KB_SG | KB_APPLE
+.Pq sg.apple
+Swiss German with
+.Dq dead accents
+for MacBooks.
 .It KB_SI
 .Pq si
 Slovenian.