Add another Lenovo NVMe device id

2023-04-26 Thread Kevin Lo
Found in my x1 extreme gen 1:
nvme0 at pci4 dev 0 function 0 vendor "Lenovo", unknown product 0x0006 rev 
0x00: msix, NVMe 1.2

ok?

Index: sys/dev/pci/pcidevs
===
RCS file: /cvs/src/sys/dev/pci/pcidevs,v
retrieving revision 1.2032
diff -u -p -u -p -r1.2032 pcidevs
--- sys/dev/pci/pcidevs 25 Apr 2023 21:57:29 -  1.2032
+++ sys/dev/pci/pcidevs 27 Apr 2023 06:38:24 -
@@ -7061,6 +7061,7 @@ product LEADTEK WINFAST_XP0x6609  Leadte
 
 /* Lenovo products */
 product LENOVO NVME0x0003  NVMe
+product LENOVO NVME_2  0x0006  NVMe
 
 /* Level 1 (Intel) */
 product LEVEL1 LXT1001 0x0001  LXT1001



Re: atactl(8): 'readattr' subcommand quits silently.

2023-04-26 Thread Kevin Lo
On Thu, Apr 27, 2023 at 08:37:23AM +0900, YASUOKA Masahiko wrote:
> 
> Hello,
> 
> On Wed, 26 Apr 2023 16:32:28 +0900
> Yuichiro NAITO  wrote:
> > These 2 revisions of 'attr_val' and 'attr_thr' are different on this
> > disk.
> > The comment says that it's wrong vendor implementation but I can see
> > 'smartctl' shows the attributes as follows. NetBSD's atactl doesn't
> > see these 2 revisions are same but checks each checksum is valid.
> 
> The diff seems correct.
> 
> ok?

ok kevlo@

> > diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c
> > index 85dfced8c9a..1f77460ce3d 100644
> > --- a/sbin/atactl/atactl.c
> > +++ b/sbin/atactl/atactl.c
> > @@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[])
> > req.datalen = sizeof(attr_thr);
> > ata_command(&req);
> > 
> > -   if (attr_val.revision != attr_thr.revision) {
> > -   /*
> > -* Non standard vendor implementation.
> > -* Return, since we don't know how to use this.
> > -*/
> > -   return;
> > -   }
> > +   if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0)
> > +   errx(1, "Checksum mismatch (attr_val)");
> > +
> > +   if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0)
> > +   errx(1, "Checksum mismatch (attr_thr)");
> > 
> > attr = attr_val.attribute;
> > thr = attr_thr.threshold;
> > 
> 



Re: pass M_CANFAIL to malloc() which use M_WAITOK but are tested for failure

2023-04-26 Thread Kevin Lo
On Wed, Apr 26, 2023 at 08:19:10PM +0200, Alexander Bluhm wrote:
> 
> On Sat, Apr 15, 2023 at 10:44:15PM +0800, Kevin Lo wrote:
> > On Fri, Apr 14, 2023 at 02:01:29PM +0200, Alexander Bluhm wrote:
> > > I think you are trying to change the kernel in the wrong direction.
> > > It should not fail, but handle the requests.  Panic if there is a
> > > bug.
> > > 
> > > Why do you think M_CANFAIL is a good thing at this place?
> > 
> > Because M_WAITOK will not return NULl, I think adding M_CANFAIL will
> > keep the mallocarray call unchanged.
> 
> The goal is not to handle errors by failing, but to prevent them.
> Better keep M_WAITOK, avoid M_CANFAIL, and remove the check.
> 
> Discussion in the hackroom concluded that M_CANFAIL is for input
> from userland that can be unlimited large.  If userland requests
> too much, it can fail.  But it is not for normal operation of the
> kernel.
> 
> M_NOWAIT should be used when the kernel has a good way to deal with
> a temporary failure, e.g. drop the packet.
> 
> M_CANFAIL | M_WAITOK deals with user input that cannot be fullfilled
> permanently and should be reported as an error.
> 
> M_WAITOK is for all other cases.  If it panics, fix the underlying
> bug that requested unrealistic memory size.
> 
> Here use number of queues which should be reasonable low and limited
> by the driver code.  Keep M_WAITOK and remove the error check.

Thank you for discussing this topic at the hackathon, and I also greatly
appreciate your detailed explanation.

> By the way, M_DEVBUF is also wrong.  It should be M_TEMP as it is
> freed in the same function and not stored permanently.  But I wont
> fix that now as malloc(9) type usage is far from consistent.
> 
> ok?

ok kevlo@

> bluhm
> 
> Index: dev/pci/if_igc.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
> retrieving revision 1.12
> diff -u -p -r1.12 if_igc.c
> --- dev/pci/if_igc.c  9 Mar 2023 00:13:47 -   1.12
> +++ dev/pci/if_igc.c  21 Apr 2023 18:25:35 -
> @@ -1209,9 +1209,8 @@ igc_rxrinfo(struct igc_softc *sc, struct
>   struct rx_ring *rxr;
>   int error, i, n = 0;
>  
> - if ((ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF,
> - M_WAITOK | M_ZERO)) == NULL)
> - return ENOMEM;
> + ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF,
> + M_WAITOK | M_ZERO);
>  
>   for (i = 0; i < sc->sc_nqueues; i++) {
>   rxr = &sc->rx_rings[i];
> Index: dev/pci/if_ix.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v
> retrieving revision 1.192
> diff -u -p -r1.192 if_ix.c
> --- dev/pci/if_ix.c   6 Feb 2023 20:27:44 -   1.192
> +++ dev/pci/if_ix.c   21 Apr 2023 18:25:35 -
> @@ -640,9 +640,8 @@ ixgbe_rxrinfo(struct ix_softc *sc, struc
>   u_int n = 0;
>  
>   if (sc->num_queues > 1) {
> - if ((ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF,
> - M_WAITOK | M_ZERO)) == NULL)
> - return (ENOMEM);
> + ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF,
> + M_WAITOK | M_ZERO);
>   } else
>   ifr = &ifr1;
>  
> Index: dev/pci/if_oce.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_oce.c,v
> retrieving revision 1.106
> diff -u -p -r1.106 if_oce.c
> --- dev/pci/if_oce.c  11 Mar 2022 18:00:48 -  1.106
> +++ dev/pci/if_oce.c  21 Apr 2023 18:25:35 -
> @@ -902,9 +902,8 @@ oce_rxrinfo(struct oce_softc *sc, struct
>   u_int n = 0;
>  
>   if (sc->sc_nrq > 1) {
> - if ((ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF,
> - M_WAITOK | M_ZERO)) == NULL)
> - return (ENOMEM);
> + ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF,
> + M_WAITOK | M_ZERO);
>   } else
>   ifr = &ifr1;
>  
> 



Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread David Gwynne
On Wed, Apr 26, 2023 at 11:45:08PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> On Thu, Apr 27, 2023 at 06:51:52AM +1000, David Gwynne wrote:
> 
> > > is that kind of check in KASSET() something you have on your mind?
> > > perhaps I can trade KASSERT() to regular code:
> > > 
> > >   if (t->pft_unit != minor(dev))
> > >   return (EPERM);
> > 
> > i would pass the dev/minor/unit to pf_find_trans() and compare along
> > with the ticket value as a matter of course. returning a different
> > errno if the minor is different is unecessary.
> > 
> 
> something like this?
> 
> struct pf_trans *
> pf_find_trans(uint32_t unit, uint64_t ticket)
> {
>   struct pf_trans *t;
> 
>   rw_assert_anylock(&pfioctl_rw);
> 
>   LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
>   if (t->pft_ticket == ticket)
>   break;
>   }

t could be NULL here. just do the unit check inside the loop?

> 
>   if (t->pft_unit != unit)
>   return (NULL);
> 
>   return (t);
> }
> 
> just return NULL on unit mismatch.  updated diff is below.

ok once you fix the nit above.

> 
> thanks and
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> index 7ea22050506..ebe1b912766 100644
> --- a/sys/net/pf_ioctl.c
> +++ b/sys/net/pf_ioctl.c
> @@ -117,6 +117,11 @@ void  pf_qid_unref(u_int16_t);
>  int   pf_states_clr(struct pfioc_state_kill *);
>  int   pf_states_get(struct pfioc_states *);
>  
> +struct pf_trans  *pf_open_trans(uint32_t);
> +struct pf_trans  *pf_find_trans(uint32_t, uint64_t);
> +void  pf_free_trans(struct pf_trans *);
> +void  pf_rollback_trans(struct pf_trans *);
> +
>  struct pf_rulepf_default_rule, pf_default_rule_new;
>  
>  struct {
> @@ -168,6 +173,8 @@ intpf_rtlabel_add(struct 
> pf_addr_wrap *);
>  void  pf_rtlabel_remove(struct pf_addr_wrap *);
>  void  pf_rtlabel_copyout(struct pf_addr_wrap *);
>  
> +uint64_t trans_ticket = 1;
> +LIST_HEAD(, pf_trans)pf_ioctl_trans = 
> LIST_HEAD_INITIALIZER(pf_trans);
>  
>  void
>  pfattach(int num)
> @@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
>  int
>  pfclose(dev_t dev, int flags, int fmt, struct proc *p)
>  {
> + struct pf_trans *w, *s;
> + LIST_HEAD(, pf_trans)   tmp_list;
> + uint32_t unit = minor(dev);
> +
> + LIST_INIT(&tmp_list);
> + rw_enter_write(&pfioctl_rw);
> + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
> + if (w->pft_unit == unit) {
> + LIST_REMOVE(w, pft_entry);
> + LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
> + }
> + }
> + rw_exit_write(&pfioctl_rw);
> +
> + while ((w = LIST_FIRST(&tmp_list)) != NULL) {
> + LIST_REMOVE(w, pft_entry);
> + pf_free_trans(w);
> + }
> +
>   return (0);
>  }
>  
> @@ -522,7 +548,7 @@ pf_qid_unref(u_int16_t qid)
>  }
>  
>  int
> -pf_begin_rules(u_int32_t *version, const char *anchor)
> +pf_begin_rules(u_int32_t *ticket, const char *anchor)
>  {
>   struct pf_ruleset   *rs;
>   struct pf_rule  *rule;
> @@ -533,20 +559,20 @@ pf_begin_rules(u_int32_t *version, const char *anchor)
>   pf_rm_rule(rs->rules.inactive.ptr, rule);
>   rs->rules.inactive.rcount--;
>   }
> - *version = ++rs->rules.inactive.version;
> + *ticket = ++rs->rules.inactive.ticket;
>   rs->rules.inactive.open = 1;
>   return (0);
>  }
>  
>  void
> -pf_rollback_rules(u_int32_t version, char *anchor)
> +pf_rollback_rules(u_int32_t ticket, char *anchor)
>  {
>   struct pf_ruleset   *rs;
>   struct pf_rule  *rule;
>  
>   rs = pf_find_ruleset(anchor);
>   if (rs == NULL || !rs->rules.inactive.open ||
> - rs->rules.inactive.version != version)
> + rs->rules.inactive.ticket != ticket)
>   return;
>   while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) {
>   pf_rm_rule(rs->rules.inactive.ptr, rule);
> @@ -825,7 +851,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule)
>  }
>  
>  int
> -pf_commit_rules(u_int32_t version, char *anchor)
> +pf_commit_rules(u_int32_t ticket, char *anchor)
>  {
>   struct pf_ruleset   *rs;
>   struct pf_rule  *rule;
> @@ -834,7 +860,7 @@ pf_commit_rules(u_int32_t version, char *anchor)
>  
>   rs = pf_find_ruleset(anchor);
>   if (rs == NULL || !rs->rules.inactive.open ||
> - version != rs->rules.inactive.version)
> + ticket != rs->rules.inactive.ticket)
>   return (EBUSY);
>  
>   if (rs == &pf_main_ruleset)
> @@ -849,7 +875,7 @@ pf_commit_rules(u_int32_t version, 

Re: vmd(8): multi-process device emulation (plz test)

2023-04-26 Thread Dave Voutila


Dave Voutila  writes:

> tech@:
>
> The below diff splits out virtio device emulation for virtio block and
> network devices into separate fork+exec'd & pledge(2)'d subprocesses.
>
> In order of priority, this diff:
>
> 1. Isolates common exploit targets (e.g. emulated network devices) from
>the rest of the vm process, tightening pledge to "stdio" per device.
>
> 2. Increases responsiveness of guest i/o since we no longer have a
>single thread servicing virtio pci and device emulation.
>
> I'd like to land this diff this week, so if you use atypical vmd
> configurations like:
>
> 1. multiple vioblk disks per vm
> 2. multiple nics per vm
> 3. send/receive
> 4. qcow2 base images
>
> This diff has lots of info logging enabled by default to help me
> identify what's breaking, so please reply with log message output if
> something goes sideways.

This diff cleans things up for review and (hopefully) OK:

1. set the temporary info-level logging to debug in vionet/vioblk/vm.c
2. fix a calloc/malloc issue; vio{net,blk} are now individually malloc'd
   and free'd to remove a double-free
3. only dump memory once, not twice, on send
4. fix vioblk restore

So far no issues reported by testers, so I will be asking for OK
tomorrow. So I can iterate in the tree and get this monster diff
landed.


diff refs/heads/master refs/heads/vmd-dev-exec6
commit - e2a6bdce4ccbbc673d16acb8bd87d6b1b8fc4f36
commit + 67c27d1f05449bee19c3ae0ffb0b14171fac15bc
blob - d0e7d0c2fb1c11e39caea6896726b25ec315cd22
blob + e9387ec59530d7f441ee92687841634685991344
--- usr.sbin/vmd/Makefile
+++ usr.sbin/vmd/Makefile
@@ -7,7 +7,7 @@ SRCS+=  vm_agentx.c
 SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c
 SRCS+= ns8250.c i8253.c dhcp.c packet.c mmio.c
 SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c fw_cfg.c
-SRCS+= vm_agentx.c
+SRCS+= vm_agentx.c vioblk.c vionet.c

 CFLAGS+=   -Wall -I${.CURDIR}
 CFLAGS+=   -Wstrict-prototypes -Wmissing-prototypes
blob - 7d7d8d02a263f6e426e359f4d8939fffe1f7d365
blob + a6fdb7623e548b7a22a907302a84376ffce3
--- usr.sbin/vmd/dhcp.c
+++ usr.sbin/vmd/dhcp.c
@@ -43,8 +43,9 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t
 extern struct vmd *env;

 ssize_t
-dhcp_request(struct vionet_dev *dev, char *buf, size_t buflen, char **obuf)
+dhcp_request(struct virtio_dev *dev, char *buf, size_t buflen, char **obuf)
 {
+   struct vionet_dev   *vionet = &dev->vionet;
unsigned char   *respbuf = NULL, *op, *oe, dhcptype = 0;
unsigned char   *opts = NULL;
ssize_t  offset, optslen, respbuflen = 0;
@@ -65,10 +66,10 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t
return (-1);

if (memcmp(pc.pc_dmac, broadcast, ETHER_ADDR_LEN) != 0 &&
-   memcmp(pc.pc_dmac, dev->hostmac, ETHER_ADDR_LEN) != 0)
+   memcmp(pc.pc_dmac, vionet->hostmac, ETHER_ADDR_LEN) != 0)
return (-1);

-   if (memcmp(pc.pc_smac, dev->mac, ETHER_ADDR_LEN) != 0)
+   if (memcmp(pc.pc_smac, vionet->mac, ETHER_ADDR_LEN) != 0)
return (-1);

if ((offset = decode_udp_ip_header(buf, buflen, offset, &pc)) < 0)
@@ -87,7 +88,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t
if (req.op != BOOTREQUEST ||
req.htype != pc.pc_htype ||
req.hlen != ETHER_ADDR_LEN ||
-   memcmp(dev->mac, req.chaddr, req.hlen) != 0)
+   memcmp(vionet->mac, req.chaddr, req.hlen) != 0)
return (-1);

/* Ignore unsupported requests for now */
@@ -134,7 +135,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t
resp.hlen = req.hlen;
resp.xid = req.xid;

-   if (dev->pxeboot) {
+   if (vionet->pxeboot) {
strlcpy(resp.file, "auto_install", sizeof resp.file);
vm = vm_getbyvmid(dev->vm_vmid);
if (vm && res_hnok(vm->vm_params.vmc_params.vcp_name))
@@ -143,7 +144,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t

if ((client_addr.s_addr =
vm_priv_addr(&env->vmd_cfg,
-   dev->vm_vmid, dev->idx, 1)) == 0)
+   dev->vm_vmid, vionet->idx, 1)) == 0)
return (-1);
memcpy(&resp.yiaddr, &client_addr,
sizeof(client_addr));
@@ -152,7 +153,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t
ss2sin(&pc.pc_dst)->sin_port = htons(CLIENT_PORT);

if ((server_addr.s_addr = vm_priv_addr(&env->vmd_cfg, dev->vm_vmid,
-   dev->idx, 0)) == 0)
+   vionet->idx, 0)) == 0)
return (-1);
memcpy(&resp.siaddr, &server_addr, sizeof(server_addr));
memcpy(&ss2sin(&pc.pc_src)->sin_addr, &server_addr,
@@ -167,9 +168,9 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t
if ((respbuf = calloc(1, respbuflen)) == NULL)
goto fail;

-   memcpy(&pc.pc_dmac, dev->mac, sizeof

Re: atactl(8): 'readattr' subcommand quits silently.

2023-04-26 Thread YASUOKA Masahiko
Hello,

On Wed, 26 Apr 2023 16:32:28 +0900
Yuichiro NAITO  wrote:
> These 2 revisions of 'attr_val' and 'attr_thr' are different on this
> disk.
> The comment says that it's wrong vendor implementation but I can see
> 'smartctl' shows the attributes as follows. NetBSD's atactl doesn't
> see these 2 revisions are same but checks each checksum is valid.

The diff seems correct.

ok?

> diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c
> index 85dfced8c9a..1f77460ce3d 100644
> --- a/sbin/atactl/atactl.c
> +++ b/sbin/atactl/atactl.c
> @@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[])
>   req.datalen = sizeof(attr_thr);
>   ata_command(&req);
> 
> - if (attr_val.revision != attr_thr.revision) {
> - /*
> -  * Non standard vendor implementation.
> -  * Return, since we don't know how to use this.
> -  */
> - return;
> - }
> + if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0)
> + errx(1, "Checksum mismatch (attr_val)");
> +
> + if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0)
> + errx(1, "Checksum mismatch (attr_thr)");
> 
>   attr = attr_val.attribute;
>   thr = attr_thr.threshold;
> 



Initialize `rtentry_pool' with IPL_SOFTNET IPL

2023-04-26 Thread Vitaliy Makkoveev
Index: sys/net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.418
diff -u -p -r1.418 route.c
--- sys/net/route.c 26 Apr 2023 16:09:44 -  1.418
+++ sys/net/route.c 26 Apr 2023 23:00:02 -
@@ -176,7 +176,7 @@ route_init(void)
 {
rtcounters = counters_alloc(rts_ncounters);
 
-   pool_init(&rtentry_pool, sizeof(struct rtentry), 0, IPL_MPFLOOR, 0,
+   pool_init(&rtentry_pool, sizeof(struct rtentry), 0, IPL_SOFTNET, 0,
"rtentry", NULL);
 
while (rt_hashjitter == 0)



Re: Updated virtio.4 man page with viogpu device

2023-04-26 Thread Jason McIntyre
On Wed, Apr 26, 2023 at 07:32:07PM +, Eichert, Diana wrote:
> Based on this commit, https://marc.info/?l=openbsd-cvs&m=168201887006175&w=2 
> , add viogpu, a VirtIO GPU driver
> 
> Updated virtio.4 man page with viogpu device
> 
> diff -c virtio.4.orig virtio.4
> 
> 

fixed, thanks.
jmc

> *** virtio.4.orig   Wed Apr 26 13:07:35 2023
> --- virtio.4Wed Apr 26 13:10:31 2023
> ***
> *** 42,47 
> --- 42,49 
>   VirtIO disk
>   .It Xr viocon 4
>   VirtIO console device
> + .It Xr viogpu 4
> + VirtIO GPU device
>   .It Xr viomb 4
>   VirtIO memory ballooning driver
>   .It Xr viornd 4
> 



Re: DIOCGETRULE is slow for large rulesets (3/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

updated diff below adopts to recent changes to pf_find_trans() in diff 2/3 [1]

in case DIOCGETRULE does not find desired transaction the ioctl(2) call
to /dev/pf returns ENXIO.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-tech&m=168254555211083&w=2

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 8cbd9d77b4f..3787e97a6b1 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-   if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+   while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ebe1b912766..0f78c5b12ac 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -122,6 +122,10 @@ struct pf_trans*pf_find_trans(uint32_t, 
uint64_t);
 voidpf_free_trans(struct pf_trans *);
 voidpf_rollback_trans(struct pf_trans *);
 
+voidpf_init_tgetrule(struct pf_trans *,
+   struct pf_anchor *, uint32_t, struct pf_rule *);
+voidpf_cleanup_tgetrule(struct pf_trans *t);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -548,7 +552,7 @@ pf_qid_unref(u_int16_t qid)
 }
 
 int
-pf_begin_rules(u_int32_t *ticket, const char *anchor)
+pf_begin_rules(u_int32_t *version, const char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -559,20 +563,20 @@ pf_begin_rules(u_int32_t *ticket, const char *anchor)
pf_rm_rule(rs->rules.inactive.ptr, rule);
rs->rules.inactive.rcount--;
}
-   *ticket = ++rs->rules.inactive.ticket;
+   *version = ++rs->rules.inactive.version;
rs->rules.inactive.open = 1;
return (0);
 }
 
 void
-pf_rollback_rules(u_int32_t ticket, char *anchor)
+pf_rollback_rules(u_int32_t version, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   rs->rules.inactive.ticket != ticket)
+   rs->rules.inactive.version != version)
return;
while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) {
pf_rm_rule(rs->rules.inactive.ptr, rule);
@@ -851,7 +855,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule)
 }
 
 int
-pf_commit_rules(u_int32_t ticket, char *anchor)
+pf_commit_rules(u_int32_t version, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -860,7 +864,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   ticket != rs->rules.inactive.ticket)
+   version != rs->rules.inactive.version)
return (EBUSY);
 
if (rs == &pf_main_ruleset)
@@ -875,7 +879,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
rs->rules.inactive.ptr = old_rules;
rs->rules.inactive.rcount = old_rcount;
 
-   rs->rules.active.ticket = rs->rules.inactive.ticket;
+   rs->rules.active.version = rs->rules.inactive.version;
pf_calc_skip_steps(rs->rules.active.ptr);
 
 
@@ -1214,7 +1218,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
 
NET_LOCK();
PF_LOCK();
- 

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

On Thu, Apr 27, 2023 at 06:51:52AM +1000, David Gwynne wrote:

> > is that kind of check in KASSET() something you have on your mind?
> > perhaps I can trade KASSERT() to regular code:
> > 
> > if (t->pft_unit != minor(dev))
> > return (EPERM);
> 
> i would pass the dev/minor/unit to pf_find_trans() and compare along
> with the ticket value as a matter of course. returning a different
> errno if the minor is different is unecessary.
> 

something like this?

struct pf_trans *
pf_find_trans(uint32_t unit, uint64_t ticket)
{
struct pf_trans *t;

rw_assert_anylock(&pfioctl_rw);

LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
if (t->pft_ticket == ticket)
break;
}

if (t->pft_unit != unit)
return (NULL);

return (t);
}

just return NULL on unit mismatch.  updated diff is below.


thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 7ea22050506..ebe1b912766 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans*pf_open_trans(uint32_t);
+struct pf_trans*pf_find_trans(uint32_t, uint64_t);
+voidpf_free_trans(struct pf_trans *);
+voidpf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+   uint32_t unit = minor(dev);
+
+   LIST_INIT(&tmp_list);
+   rw_enter_write(&pfioctl_rw);
+   LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+   if (w->pft_unit == unit) {
+   LIST_REMOVE(w, pft_entry);
+   LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+   }
+   }
+   rw_exit_write(&pfioctl_rw);
+
+   while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+
return (0);
 }
 
@@ -522,7 +548,7 @@ pf_qid_unref(u_int16_t qid)
 }
 
 int
-pf_begin_rules(u_int32_t *version, const char *anchor)
+pf_begin_rules(u_int32_t *ticket, const char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -533,20 +559,20 @@ pf_begin_rules(u_int32_t *version, const char *anchor)
pf_rm_rule(rs->rules.inactive.ptr, rule);
rs->rules.inactive.rcount--;
}
-   *version = ++rs->rules.inactive.version;
+   *ticket = ++rs->rules.inactive.ticket;
rs->rules.inactive.open = 1;
return (0);
 }
 
 void
-pf_rollback_rules(u_int32_t version, char *anchor)
+pf_rollback_rules(u_int32_t ticket, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   rs->rules.inactive.version != version)
+   rs->rules.inactive.ticket != ticket)
return;
while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) {
pf_rm_rule(rs->rules.inactive.ptr, rule);
@@ -825,7 +851,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule)
 }
 
 int
-pf_commit_rules(u_int32_t version, char *anchor)
+pf_commit_rules(u_int32_t ticket, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -834,7 +860,7 @@ pf_commit_rules(u_int32_t version, char *anchor)
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   version != rs->rules.inactive.version)
+   ticket != rs->rules.inactive.ticket)
return (EBUSY);
 
if (rs == &pf_main_ruleset)
@@ -849,7 +875,7 @@ pf_commit_rules(u_int32_t version, char *anchor)
rs->rules.inactive.ptr = old_rules;
rs->rules.inactive.rcount = old_rcount;
 
-   rs->rules.active.version = rs->rules.inactive.version;
+   rs->rules.active.ticket = rs->rules.inactive.ticket;
pf_calc_skip_steps(rs->rules.active.ptr);
 
 
@@ -1142,10 +1168,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr,

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread David Gwynne
On Wed, Apr 26, 2023 at 01:47:31PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote:
> > >  fail:
> > > - if (flags & FWRITE)
> > > - rw_exit_write(&pfioctl_rw);
> > > - else
> > > - rw_exit_read(&pfioctl_rw);
> > > + rw_exit_write(&pfioctl_rw);
> > 
> > i dont think having the open mode flags affect whether you take a shared
> > or exclusive lock makes sense anyway, so im happy with these bits.
> 
> while updating my diff I've noticed pfclose() needs
> same change: dropping 'flags & FWRITE' test. This is
> fixed i new diff below.

cool.

> 
> 
> > 
> > int unit = minor(dev);
> > 
> > if (unit & ((1 << CLONE_SHIFT) - 1))
> > return (ENXIO);
> > 
> > this has some ties into the second issue, which is that we shouldn't
> > use the pid as an identifier for state across syscalls like this
> > in the kernel. the reason is that userland could be weird or buggy
> > (or hostile) and fork in between creating a transaction and ending
> > a transaction, but it will still have the fd it from from open(/dev/pf).
> > instead, we should be using the whole dev_t or the minor number,
> > as long as it includes the bits that the cloning infrastructure
> > uses, as the identifier.
> 
> in new diff I'm using a minor(dev) to identify transaction owner.
> > 
> > i would also check that the process^Wminor number that created a
> > transaction is the same when looking up a transaction in pf_find_trans.
> 
> the pf_find_trans() is a dead code in diff below as it is
> not being used there. Just to make sure I got things right
> so I can update '3/3' diff in stack accordingly. Let's assume
> snippet below comes from pfioctl() function
> 
> int
> pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) {
> ...
>   t = pf_find_trans(ticket);
>   if (t == NULL)
>   return (ENXIO);
> 
>   KASSERT(t->pft_unit == minor(dev));

userland controls the ticket value that's passed to the ioctl, so if
it deliberately passes a ticket another fd/minor owns it can trigger
this assert, so i don't think this is a good idea. imagine a program
that opens /dev/pf twice, opens a transaction on one fd, and then
uses the ticket in the other.

> is that kind of check in KASSET() something you have on your mind?
> perhaps I can trade KASSERT() to regular code:
> 
>   if (t->pft_unit != minor(dev))
>   return (EPERM);

i would pass the dev/minor/unit to pf_find_trans() and compare along
with the ticket value as a matter of course. returning a different
errno if the minor is different is unecessary.

> 
> thank you for pointers to bpf.c updated diff is below.
> 
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> index 1141069dcf6..b9904c545c5 100644
> --- a/sys/net/pf_ioctl.c
> +++ b/sys/net/pf_ioctl.c
> @@ -117,6 +117,11 @@ void  pf_qid_unref(u_int16_t);
>  int   pf_states_clr(struct pfioc_state_kill *);
>  int   pf_states_get(struct pfioc_states *);
>  
> +struct pf_trans  *pf_open_trans(uint32_t);
> +struct pf_trans  *pf_find_trans(uint64_t);
> +void  pf_free_trans(struct pf_trans *);
> +void  pf_rollback_trans(struct pf_trans *);
> +
>  struct pf_rulepf_default_rule, pf_default_rule_new;
>  
>  struct {
> @@ -168,6 +173,8 @@ intpf_rtlabel_add(struct 
> pf_addr_wrap *);
>  void  pf_rtlabel_remove(struct pf_addr_wrap *);
>  void  pf_rtlabel_copyout(struct pf_addr_wrap *);
>  
> +uint64_t trans_ticket = 1;
> +LIST_HEAD(, pf_trans)pf_ioctl_trans = 
> LIST_HEAD_INITIALIZER(pf_trans);
>  
>  void
>  pfattach(int num)
> @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
>  int
>  pfclose(dev_t dev, int flags, int fmt, struct proc *p)
>  {
> + struct pf_trans *w, *s;
> + LIST_HEAD(, pf_trans)   tmp_list;
> + uint32_t unit = minor(dev);
> +
> + if (minor(dev) >= 1)
> + return (ENXIO);

gerhard already spotted this one.

> +
> + LIST_INIT(&tmp_list);
> + rw_enter_write(&pfioctl_rw);
> + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
> + if (w->pft_unit == unit) {
> + LIST_REMOVE(w, pft_entry);
> + LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
> + }
> + }
> + rw_exit_write(&pfioctl_rw);
> +
> + while ((w = LIST_FIRST(&tmp_list)) != NULL) {
> + LIST_REMOVE(w, pft_entry);
> + pf_free_trans(w);
> + }
> +
>   return (0);
>  }
>  
> @@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
>   return (EACCES);
>   }
>  
> - if (flags & FWRITE)
> -  

Updated virtio.4 man page with viogpu device

2023-04-26 Thread Eichert, Diana
Based on this commit, https://marc.info/?l=openbsd-cvs&m=168201887006175&w=2 , 
add viogpu, a VirtIO GPU driver

Updated virtio.4 man page with viogpu device

diff -c virtio.4.orig virtio.4


*** virtio.4.orig   Wed Apr 26 13:07:35 2023
--- virtio.4Wed Apr 26 13:10:31 2023
***
*** 42,47 
--- 42,49 
  VirtIO disk
  .It Xr viocon 4
  VirtIO console device
+ .It Xr viogpu 4
+ VirtIO GPU device
  .It Xr viomb 4
  VirtIO memory ballooning driver
  .It Xr viornd 4



Remove kernel lock from rtfree(9)

2023-04-26 Thread Vitaliy Makkoveev
Route timers and route labels protected by corresponding mutexes. `ifa'
uses references counting for protection. No protection required for `rt'
passed to rt_mpls_clear() because only current thread owns it.

ok?

Index: sys/net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.418
diff -u -p -r1.418 route.c
--- sys/net/route.c 26 Apr 2023 16:09:44 -  1.418
+++ sys/net/route.c 26 Apr 2023 20:11:16 -
@@ -497,7 +497,6 @@ rtfree(struct rtentry *rt)
KASSERT(!RT_ROOT(rt));
atomic_dec_int(&rttrash);
 
-   KERNEL_LOCK();
rt_timer_remove_all(rt);
ifafree(rt->rt_ifa);
rtlabel_unref(rt->rt_labelid);
@@ -506,7 +505,6 @@ rtfree(struct rtentry *rt)
 #endif
free(rt->rt_gateway, M_RTABLE, ROUNDUP(rt->rt_gateway->sa_len));
free(rt_key(rt), M_RTABLE, rt_key(rt)->sa_len);
-   KERNEL_UNLOCK();
 
pool_put(&rtentry_pool, rt);
 }



Re: pass M_CANFAIL to malloc() which use M_WAITOK but are tested for failure

2023-04-26 Thread Alexander Bluhm
On Sat, Apr 15, 2023 at 10:44:15PM +0800, Kevin Lo wrote:
> On Fri, Apr 14, 2023 at 02:01:29PM +0200, Alexander Bluhm wrote:
> > I think you are trying to change the kernel in the wrong direction.
> > It should not fail, but handle the requests.  Panic if there is a
> > bug.
> > 
> > Why do you think M_CANFAIL is a good thing at this place?
> 
> Because M_WAITOK will not return NULl, I think adding M_CANFAIL will
> keep the mallocarray call unchanged.

The goal is not to handle errors by failing, but to prevent them.
Better keep M_WAITOK, avoid M_CANFAIL, and remove the check.

Discussion in the hackroom concluded that M_CANFAIL is for input
from userland that can be unlimited large.  If userland requests
too much, it can fail.  But it is not for normal operation of the
kernel.

M_NOWAIT should be used when the kernel has a good way to deal with
a temporary failure, e.g. drop the packet.

M_CANFAIL | M_WAITOK deals with user input that cannot be fullfilled
permanently and should be reported as an error.

M_WAITOK is for all other cases.  If it panics, fix the underlying
bug that requested unrealistic memory size.

Here use number of queues which should be reasonable low and limited
by the driver code.  Keep M_WAITOK and remove the error check.

By the way, M_DEVBUF is also wrong.  It should be M_TEMP as it is
freed in the same function and not stored permanently.  But I wont
fix that now as malloc(9) type usage is far from consistent.

ok?

bluhm

Index: dev/pci/if_igc.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
retrieving revision 1.12
diff -u -p -r1.12 if_igc.c
--- dev/pci/if_igc.c9 Mar 2023 00:13:47 -   1.12
+++ dev/pci/if_igc.c21 Apr 2023 18:25:35 -
@@ -1209,9 +1209,8 @@ igc_rxrinfo(struct igc_softc *sc, struct
struct rx_ring *rxr;
int error, i, n = 0;
 
-   if ((ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF,
-   M_WAITOK | M_ZERO)) == NULL)
-   return ENOMEM;
+   ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF,
+   M_WAITOK | M_ZERO);
 
for (i = 0; i < sc->sc_nqueues; i++) {
rxr = &sc->rx_rings[i];
Index: dev/pci/if_ix.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.192
diff -u -p -r1.192 if_ix.c
--- dev/pci/if_ix.c 6 Feb 2023 20:27:44 -   1.192
+++ dev/pci/if_ix.c 21 Apr 2023 18:25:35 -
@@ -640,9 +640,8 @@ ixgbe_rxrinfo(struct ix_softc *sc, struc
u_int n = 0;
 
if (sc->num_queues > 1) {
-   if ((ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF,
-   M_WAITOK | M_ZERO)) == NULL)
-   return (ENOMEM);
+   ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF,
+   M_WAITOK | M_ZERO);
} else
ifr = &ifr1;
 
Index: dev/pci/if_oce.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_oce.c,v
retrieving revision 1.106
diff -u -p -r1.106 if_oce.c
--- dev/pci/if_oce.c11 Mar 2022 18:00:48 -  1.106
+++ dev/pci/if_oce.c21 Apr 2023 18:25:35 -
@@ -902,9 +902,8 @@ oce_rxrinfo(struct oce_softc *sc, struct
u_int n = 0;
 
if (sc->sc_nrq > 1) {
-   if ((ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF,
-   M_WAITOK | M_ZERO)) == NULL)
-   return (ENOMEM);
+   ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF,
+   M_WAITOK | M_ZERO);
} else
ifr = &ifr1;
 



Re: Introduce `rtlabel_mtx' mutex(9) ...

2023-04-26 Thread Alexander Bluhm
On Mon, Apr 24, 2023 at 06:31:03PM +0300, Vitaliy Makkoveev wrote:
> ... and use it to protect route labels storage. This time it is not
> clean, which lock protects it because we holding kernel and net locks in
> various combinations while accessing it. I see no reason to put kernel
> lock to all the places. Also netlock could not be used, because rtfree()
> which calls rtlabel_unref() has unknown netlock state within.
> 
> This new `rtlabel_mtx' mutex(9) protects `rt_labels' list and `label'
> entry dereference. Since we don't export 'rt_label' structure, I want to
> keep this lock private to net/route.c. For this reason rtlabel_id2name()
> now copies label string to externally passed buffer instead of returning
> address of `rt_labels' list data. This is the way which rtlabel_id2sa()
> already works.
> 
> ok?

I did run this though perform test.  Full regress still running due
to unrelated tree breakage.  I don't expect fallout.

OK bluhm@

> Index: sys/net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.692
> diff -u -p -r1.692 if.c
> --- sys/net/if.c  22 Apr 2023 04:39:46 -  1.692
> +++ sys/net/if.c  24 Apr 2023 15:11:33 -
> @@ -2383,7 +2383,6 @@ ifioctl_get(u_long cmd, caddr_t data)
>   char ifrtlabelbuf[RTLABEL_LEN];
>   int error = 0;
>   size_t bytesdone;
> - const char *label;
>  
>   switch(cmd) {
>   case SIOCGIFCONF:
> @@ -2458,9 +2457,8 @@ ifioctl_get(u_long cmd, caddr_t data)
>   break;
>  
>   case SIOCGIFRTLABEL:
> - if (ifp->if_rtlabelid &&
> - (label = rtlabel_id2name(ifp->if_rtlabelid)) != NULL) {
> - strlcpy(ifrtlabelbuf, label, RTLABEL_LEN);
> + if (ifp->if_rtlabelid && rtlabel_id2name(ifp->if_rtlabelid,
> + ifrtlabelbuf, RTLABEL_LEN) != NULL) {
>   error = copyoutstr(ifrtlabelbuf, ifr->ifr_data,
>   RTLABEL_LEN, &bytesdone);
>   } else
> Index: sys/net/pf_ioctl.c
> ===
> RCS file: /cvs/src/sys/net/pf_ioctl.c,v
> retrieving revision 1.397
> diff -u -p -r1.397 pf_ioctl.c
> --- sys/net/pf_ioctl.c6 Jan 2023 17:44:34 -   1.397
> +++ sys/net/pf_ioctl.c24 Apr 2023 15:11:33 -
> @@ -491,14 +491,10 @@ pf_rtlabel_remove(struct pf_addr_wrap *a
>  void
>  pf_rtlabel_copyout(struct pf_addr_wrap *a)
>  {
> - const char  *name;
> -
>   if (a->type == PF_ADDR_RTLABEL && a->v.rtlabel) {
> - if ((name = rtlabel_id2name(a->v.rtlabel)) == NULL)
> + if (rtlabel_id2name(a->v.rtlabel, a->v.rtlabelname,
> + sizeof(a->v.rtlabelname)) == NULL)
>   strlcpy(a->v.rtlabelname, "?",
> - sizeof(a->v.rtlabelname));
> - else
> - strlcpy(a->v.rtlabelname, name,
>   sizeof(a->v.rtlabelname));
>   }
>  }
> Index: sys/net/route.c
> ===
> RCS file: /cvs/src/sys/net/route.c,v
> retrieving revision 1.416
> diff -u -p -r1.416 route.c
> --- sys/net/route.c   28 Jan 2023 10:17:16 -  1.416
> +++ sys/net/route.c   24 Apr 2023 15:11:33 -
> @@ -113,6 +113,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -137,6 +138,12 @@
>  #include 
>  #endif
>  
> +/*
> + * Locks used to protect struct members:
> + *  I   immutable after creation
> + *  L   rtlabel_mtx
> + */
> +
>  #define ROUNDUP(a) (a>0 ? (1 + (((a) - 1) | (sizeof(long) - 1))) : 
> sizeof(long))
>  
>  /* Give some jitter to hash, to avoid synchronization between routers. */
> @@ -163,13 +170,15 @@ static int rt_copysa(struct sockaddr *, 
>  #define  LABELID_MAX 5
>  
>  struct rt_label {
> - TAILQ_ENTRY(rt_label)   rtl_entry;
> - charrtl_name[RTLABEL_LEN];
> - u_int16_t   rtl_id;
> - int rtl_ref;
> + TAILQ_ENTRY(rt_label)   rtl_entry;  /* [L] */
> + charrtl_name[RTLABEL_LEN];  /* [I] */
> + u_int16_t   rtl_id; /* [I] */
> + int rtl_ref;/* [L] */
>  };
>  
> -TAILQ_HEAD(rt_labels, rt_label)  rt_labels = 
> TAILQ_HEAD_INITIALIZER(rt_labels);
> +TAILQ_HEAD(rt_labels, rt_label)  rt_labels =
> +TAILQ_HEAD_INITIALIZER(rt_labels);   /* [L] */
> +struct mutex rtlabel_mtx = MUTEX_INITIALIZER(IPL_NET);
>  
>  void
>  route_init(void)
> @@ -1603,15 +1612,17 @@ u_int16_t
>  rtlabel_name2id(char *name)
>  {
>   struct rt_label *label, *p;
> - u_int16_tnew_id = 1;
> + u_int16_tnew_id = 1, id = 0;
>  
>   if (!name[0])
>   return (0);
>  
> + mtx_enter(&rtlab

Re: arpresolve reduce kernel lock

2023-04-26 Thread Vitaliy Makkoveev
> On 26 Apr 2023, at 15:30, Hrvoje Popovski  wrote:
> 
> On 26.4.2023. 12:15, Alexander Bluhm wrote:
>> On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote:
>>> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote:
 On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Mutex arp_mtx protects the llinfo_arp la_...  fields.  So kernel
> lock is only needed for changing the route rt_flags.
> 
> Of course there is a race between checking and setting rt_flags.
> But the other checks of the RTF_REJECT flags were without kernel
> lock before.  This does not cause trouble, the worst thing that may
> happen is to wait another exprire time for ARP retry.  My diff does
> not make it worse, reading rt_flags and rt_expire is done without
> lock anyway.
> 
> The kernel lock is needed to change rt_flags.  Testing without
> KERNEL_LOCK() caused crashes.
> 
 Hi,
 
 I'm interesting is the system stable with the diff below? If so, we
 could avoid kernel lock in the arpresolve().
>>> I could not crash it.
>> I was too fast.  Just after writing this mail I restarted the test.
> 
> Hi,
> 
> my boxes are still ok with mvs@ diff even if I'm running arp -ad in loop.
> 

Thanks for testing. It seems rtfree(rt->rt_parent) was called twice, so
I’m confusing a little with this unlocked RTF_REJECT check.  


Re: acpithinkpad: do not report fans running at 65535 rpm

2023-04-26 Thread Klemens Nanni
On Mon, Apr 24, 2023 at 02:07:11PM +, Miod Vallat wrote:
> After suspending a machine with acpithinkpad(4) and resuming, the fan
> senors report a value of 65535 (i.e. 0x) for a few seconds, and
> then start reporting correct values.

I don't see these bogus values on an intel t14 gen3 when I resume into
'systat -s.1 sens', but your diff makes sense and does not break here.

OK kn

> 
> The following diff marks the sensor as invalid when such a value is
> read.
> 
> Index: acpithinkpad.c
> ===
> RCS file: /OpenBSD/src/sys/dev/acpi/acpithinkpad.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 acpithinkpad.c
> --- acpithinkpad.c9 Apr 2023 17:50:02 -   1.71
> +++ acpithinkpad.c24 Apr 2023 14:05:18 -
> @@ -287,7 +287,12 @@ thinkpad_sensor_refresh(void *arg)
>   /* Read fan RPM */
>   acpiec_read(sc->sc_ec, THINKPAD_ECOFFSET_FANLO, 1, &lo);
>   acpiec_read(sc->sc_ec, THINKPAD_ECOFFSET_FANHI, 1, &hi);
> - sc->sc_sens[THINKPAD_SENSOR_FANRPM].value = ((hi << 8L) + lo);
> + if (hi == 0xff && lo == 0xff) {
> + sc->sc_sens[THINKPAD_SENSOR_FANRPM].flags = SENSOR_FINVALID;
> + } else {
> + sc->sc_sens[THINKPAD_SENSOR_FANRPM].value = ((hi << 8L) + lo);
> + sc->sc_sens[THINKPAD_SENSOR_FANRPM].flags = 0;
> + }
>  }
>  
>  void
> 



Re: arpresolve reduce kernel lock

2023-04-26 Thread Hrvoje Popovski
On 26.4.2023. 12:15, Alexander Bluhm wrote:
> On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote:
>> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote:
>>> On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote:
 Hi,

 Mutex arp_mtx protects the llinfo_arp la_...  fields.  So kernel
 lock is only needed for changing the route rt_flags.

 Of course there is a race between checking and setting rt_flags.
 But the other checks of the RTF_REJECT flags were without kernel
 lock before.  This does not cause trouble, the worst thing that may
 happen is to wait another exprire time for ARP retry.  My diff does
 not make it worse, reading rt_flags and rt_expire is done without
 lock anyway.

 The kernel lock is needed to change rt_flags.  Testing without
 KERNEL_LOCK() caused crashes.

>>> Hi,
>>>
>>> I'm interesting is the system stable with the diff below? If so, we
>>> could avoid kernel lock in the arpresolve().
>> I could not crash it.
> I was too fast.  Just after writing this mail I restarted the test.

Hi,

my boxes are still ok with mvs@ diff even if I'm running arp -ad in loop.



Re: DIOCGETRULE is slow for large rulesets (3/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

On Wed, Apr 26, 2023 at 01:25:45AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> this is 3rd of three diffs to improve DIOCGETRULE ioctl operation.
> the summary of changes done here is as follows:
> - add new members to pf_trans structure so it can
>   hold data for DIOCGETRULE operation
> 
> - DIOCGETRULES opens transaction and returns ticket/transaction id
>   to pfctl. pfctl uses ticket in DIOCGETRULE to identify transaction
>   when it retrieves the rule
> 
> - change introduces new reference counter to pf_anchor which is used
>   by pf_trans to keep reference to pf_anchor object.
> 
> - DIOCGETRULES opens transaction and stores a reference to anchor
>   with current ruleset version and pointer to the first rule
> 
> - DIOCGETRULE looks up transaction, verifies version number is
>   same so it can safely access next rule.
> 
> - pfctl when handling 'pfctl -sr -R n' must iterate to
>   n-th rule one-by-one.
> 

updating diff to accommodate changes in diff 2/3 [1] in the stack
of changes.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-tech&m=168251129621089&w=2

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 8cbd9d77b4f..3787e97a6b1 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-   if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+   while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 8ce0af53a89..d294a8a3b3c 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -122,6 +122,10 @@ struct pf_trans*pf_find_trans(uint64_t);
 voidpf_free_trans(struct pf_trans *);
 voidpf_rollback_trans(struct pf_trans *);
 
+voidpf_init_tgetrule(struct pf_trans *,
+   struct pf_anchor *, uint32_t, struct pf_rule *);
+voidpf_cleanup_tgetrule(struct pf_trans *t);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -1470,7 +1474,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
case DIOCGETRULES: {
struct pfioc_rule   *pr = (struct pfioc_rule *)addr;
struct pf_ruleset   *ruleset;
-   struct pf_rule  *tail;
+   struct pf_rule  *rule;
+   struct pf_trans *t;
+   u_int32_truleset_version;
 
NET_LOCK();
PF_LOCK();
@@ -1482,14 +1488,21 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
NET_UNLOCK();
goto fail;
}
-   tail = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue);
-   if (tail)
-   pr->nr = tail->nr + 1;
+   rule = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue);
+   if (rule)
+   pr->nr = rule->nr + 1;
else
pr->nr = 0;
-   pr->ticket = ruleset->rules.active.version;
+   ruleset_version = ruleset->rules.active.version;
+   pf_anchor_take(ruleset->anchor);
+   rule = TAILQ_FIRST(ruleset->rules.active.ptr);
PF_UNLOCK();
NET_UNLO

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

On Wed, Apr 26, 2023 at 11:51:26AM +, Gerhard Roth wrote:
> On Wed, 2023-04-26 at 13:47 +0200, Alexandr Nedvedicky wrote:

> > @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
> > ??int
> > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p)
> > ??{
> > +??struct pf_trans *w, *s;
> > +??LIST_HEAD(, pf_trans)??tmp_list;
> > +??uint32_t unit = minor(dev);
> > +
> > +??if (minor(dev) >= 1)
> > +??return (ENXIO);
> 
> If you want to use the minor dev-id to release the transaction,
> the above two lines should go away.
> 
> 

yes, you are right. thanks for spotting that.

updated diff is below.

regards
sashan

8<---8<---8<--8<

diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 1141069dcf6..74af2b74ecb 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans*pf_open_trans(uint32_t);
+struct pf_trans*pf_find_trans(uint64_t);
+voidpf_free_trans(struct pf_trans *);
+voidpf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+   uint32_t unit = minor(dev);
+
+   LIST_INIT(&tmp_list);
+   rw_enter_write(&pfioctl_rw);
+   LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+   if (w->pft_unit == unit) {
+   LIST_REMOVE(w, pft_entry);
+   LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+   }
+   }
+   rw_exit_write(&pfioctl_rw);
+
+   while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+
return (0);
 }
 
@@ -1142,10 +1168,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   if (flags & FWRITE)
-   rw_enter_write(&pfioctl_rw);
-   else
-   rw_enter_read(&pfioctl_rw);
+   rw_enter_write(&pfioctl_rw);
 
switch (cmd) {
 
@@ -3022,10 +3045,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
 fail:
-   if (flags & FWRITE)
-   rw_exit_write(&pfioctl_rw);
-   else
-   rw_exit_read(&pfioctl_rw);
+   rw_exit_write(&pfioctl_rw);
 
return (error);
 }
@@ -3244,3 +3264,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, 
size_t newlen)
 
return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs));
 }
+
+struct pf_trans *
+pf_open_trans(uint32_t unit)
+{
+   static uint64_t ticket = 1;
+   struct pf_trans *t;
+
+   rw_assert_wrlock(&pfioctl_rw);
+
+   t = malloc(sizeof(*t), M_TEMP, M_WAITOK);
+   memset(t, 0, sizeof(struct pf_trans));
+   t->pft_unit = unit;
+   t->pft_ticket = ticket++;
+
+   LIST_INSERT_HEAD(&pf_ioctl_trans, t, pft_entry);
+
+   return (t);
+}
+
+struct pf_trans *
+pf_find_trans(uint64_t ticket)
+{
+   struct pf_trans *t;
+
+   rw_assert_anylock(&pfioctl_rw);
+
+   LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
+   if (t->pft_ticket == ticket)
+   break;
+   }
+
+   return (t);
+}
+
+void
+pf_free_trans(struct pf_trans *t)
+{
+   free(t, M_TEMP, sizeof(*t));
+}
+
+void
+pf_rollback_trans(struct pf_trans *t)
+{
+   if (t != NULL) {
+   rw_assert_wrlock(&pfioctl_rw);
+   LIST_REMOVE(t, pft_entry);
+   pf_free_trans(t);
+   }
+}
diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h
index 38fff6473aa..5af2027733a 100644
--- a/sys/net/pfvar_priv.h
+++ b/sys/net/pfvar_priv.h
@@ -322,6 +322,17 @@ enum {
 
 extern struct cpumem *pf_anchor_stack;
 
+enum pf_trans_type {
+   PF_TRANS_NONE,
+   PF_TRANS_MAX
+};
+
+struct pf_trans {
+   LIST_ENTRY(pf_trans)pft_entry;
+   uint32_tpft_unit;   /* process id */
+   uint64_tpft_ticket;
+   enum pf_trans_type  pft_type;
+};
 extern struct task pf_purge_task;
 e

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Gerhard Roth
On Wed, 2023-04-26 at 13:47 +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote:
> > >  fail:
> > > -   if (flags & FWRITE)
> > > -   rw_exit_write(&pfioctl_rw);
> > > -   else
> > > -   rw_exit_read(&pfioctl_rw);
> > > +   rw_exit_write(&pfioctl_rw);
> > 
> > i dont think having the open mode flags affect whether you take a shared
> > or exclusive lock makes sense anyway, so im happy with these bits.
> 
>     while updating my diff I've noticed pfclose() needs
>     same change: dropping 'flags & FWRITE' test. This is
>     fixed i new diff below.
> 
> 
> > 
> > int unit = minor(dev);
> > 
> > if (unit & ((1 << CLONE_SHIFT) - 1))
> > return (ENXIO);
> > 
> > this has some ties into the second issue, which is that we shouldn't
> > use the pid as an identifier for state across syscalls like this
> > in the kernel. the reason is that userland could be weird or buggy
> > (or hostile) and fork in between creating a transaction and ending
> > a transaction, but it will still have the fd it from from open(/dev/pf).
> > instead, we should be using the whole dev_t or the minor number,
> > as long as it includes the bits that the cloning infrastructure
> > uses, as the identifier.
> 
>     in new diff I'm using a minor(dev) to identify transaction owner.
> > 
> > i would also check that the process^Wminor number that created a
> > transaction is the same when looking up a transaction in pf_find_trans.
> 
>     the pf_find_trans() is a dead code in diff below as it is
>     not being used there. Just to make sure I got things right
>     so I can update '3/3' diff in stack accordingly. Let's assume
>     snippet below comes from pfioctl() function
> 
> int   
>   
>   
>     
> pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) {
> ...
> t = pf_find_trans(ticket);
> if (t == NULL)
> return (ENXIO);
> 
> KASSERT(t->pft_unit == minor(dev));
> 
>     is that kind of check in KASSET() something you have on your mind?
>     perhaps I can trade KASSERT() to regular code:
> 
> if (t->pft_unit != minor(dev))
> return (EPERM);
> 
> 
> thank you for pointers to bpf.c updated diff is below.
> 
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> index 1141069dcf6..b9904c545c5 100644
> --- a/sys/net/pf_ioctl.c
> +++ b/sys/net/pf_ioctl.c
> @@ -117,6 +117,11 @@ void    pf_qid_unref(u_int16_t);
>  int pf_states_clr(struct pfioc_state_kill *);
>  int pf_states_get(struct pfioc_states *);
>  
> +struct pf_trans*pf_open_trans(uint32_t);
> +struct pf_trans*pf_find_trans(uint64_t);
> +void    pf_free_trans(struct pf_trans *);
> +void    pf_rollback_trans(struct pf_trans *);
> +
>  struct pf_rule  pf_default_rule, pf_default_rule_new;
>  
>  struct {
> @@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap 
> *);
>  void    pf_rtlabel_remove(struct pf_addr_wrap *);
>  void    pf_rtlabel_copyout(struct pf_addr_wrap *);
>  
> +uint64_t trans_ticket = 1;
> +LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
>  
>  void
>  pfattach(int num)
> @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
>  int
>  pfclose(dev_t dev, int flags, int fmt, struct proc *p)
>  {
> +   struct pf_trans *w, *s;
> +   LIST_HEAD(, pf_trans)   tmp_list;
> +   uint32_t unit = minor(dev);
> +
> +   if (minor(dev) >= 1)
> +   return (ENXIO);

If you want to use the minor dev-id to release the transaction,
the above two lines should go away.


> +
> +   LIST_INIT(&tmp_list);
> +   rw_enter_write(&pfioctl_rw);
> +   LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
> +   if (w->pft_unit == unit) {
> +   LIST_REMOVE(w, pft_entry);
> +   LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
> +   }
> +   }
> +   rw_exit_write(&pfioctl_rw);
> +
> +   while ((w = LIST_FIRST(&tmp_list)) != NULL) {
> +   LIST_REMOVE(w, pft_entry);
> +   pf_free_trans(w);
> +   }
> +
> return (0);
>  }
>  
> @@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
> return (EACCES);
> }
>  
> -   if (flags & FWRITE)
> -   rw_enter_write(&pfioctl_rw);
> -   else
> -   rw_enter_read(&pfioctl_rw);
> 

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,


On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote:
> >  fail:
> > -   if (flags & FWRITE)
> > -   rw_exit_write(&pfioctl_rw);
> > -   else
> > -   rw_exit_read(&pfioctl_rw);
> > +   rw_exit_write(&pfioctl_rw);
> 
> i dont think having the open mode flags affect whether you take a shared
> or exclusive lock makes sense anyway, so im happy with these bits.

while updating my diff I've noticed pfclose() needs
same change: dropping 'flags & FWRITE' test. This is
fixed i new diff below.


> 
>   int unit = minor(dev);
> 
>   if (unit & ((1 << CLONE_SHIFT) - 1))
>   return (ENXIO);
> 
> this has some ties into the second issue, which is that we shouldn't
> use the pid as an identifier for state across syscalls like this
> in the kernel. the reason is that userland could be weird or buggy
> (or hostile) and fork in between creating a transaction and ending
> a transaction, but it will still have the fd it from from open(/dev/pf).
> instead, we should be using the whole dev_t or the minor number,
> as long as it includes the bits that the cloning infrastructure
> uses, as the identifier.

in new diff I'm using a minor(dev) to identify transaction owner.
> 
> i would also check that the process^Wminor number that created a
> transaction is the same when looking up a transaction in pf_find_trans.

the pf_find_trans() is a dead code in diff below as it is
not being used there. Just to make sure I got things right
so I can update '3/3' diff in stack accordingly. Let's assume
snippet below comes from pfioctl() function

int 

  
pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) {
...
t = pf_find_trans(ticket);
if (t == NULL)
return (ENXIO);

KASSERT(t->pft_unit == minor(dev));

is that kind of check in KASSET() something you have on your mind?
perhaps I can trade KASSERT() to regular code:

if (t->pft_unit != minor(dev))
return (EPERM);


thank you for pointers to bpf.c updated diff is below.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 1141069dcf6..b9904c545c5 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans*pf_open_trans(uint32_t);
+struct pf_trans*pf_find_trans(uint64_t);
+voidpf_free_trans(struct pf_trans *);
+voidpf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+   uint32_t unit = minor(dev);
+
+   if (minor(dev) >= 1)
+   return (ENXIO);
+
+   LIST_INIT(&tmp_list);
+   rw_enter_write(&pfioctl_rw);
+   LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+   if (w->pft_unit == unit) {
+   LIST_REMOVE(w, pft_entry);
+   LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+   }
+   }
+   rw_exit_write(&pfioctl_rw);
+
+   while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+
return (0);
 }
 
@@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   if (flags & FWRITE)
-   rw_enter_write(&pfioctl_rw);
-   else
-   rw_enter_read(&pfioctl_rw);
+   rw_enter_write(&pfioctl_rw);
 
switch (cmd) {
 
@@ -3022,10 +3048,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
 fail:
-   if (flags & FWRITE)
-   rw_exit_write(&pfioctl_rw);
-   else
-   rw_exit_read(&pfioctl_rw);
+   rw_exit_write(&pfioctl_rw);
 
return (error);
 }
@@ -3244,3 +3267,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *

Re: arpresolve reduce kernel lock

2023-04-26 Thread Alexander Bluhm
On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote:
> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote:
> > On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote:
> > > Hi,
> > >
> > > Mutex arp_mtx protects the llinfo_arp la_...  fields.  So kernel
> > > lock is only needed for changing the route rt_flags.
> > >
> > > Of course there is a race between checking and setting rt_flags.
> > > But the other checks of the RTF_REJECT flags were without kernel
> > > lock before.  This does not cause trouble, the worst thing that may
> > > happen is to wait another exprire time for ARP retry.  My diff does
> > > not make it worse, reading rt_flags and rt_expire is done without
> > > lock anyway.
> > >
> > > The kernel lock is needed to change rt_flags.  Testing without
> > > KERNEL_LOCK() caused crashes.
> > >
> >
> > Hi,
> >
> > I'm interesting is the system stable with the diff below? If so, we
> > could avoid kernel lock in the arpresolve().
> 
> I could not crash it.

I was too fast.  Just after writing this mail I restarted the test.

[0] 0:arp- 1:ksh*"ot31.obsd-lab.genua.d" 12:00 
26-Apr-23ESC[mESC(BESC[23;18Hpanic: pool_do_get: art_node free list modified: 
page 0xfd8747128000; item addr 0xfd8747128410; offset 
0x0=0x182f4660f2a7188a != 0x182f4660f2a71889
Stopped at  db_enter+0x14:  popq%rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
  45805  80626  0 0x14000  0x2003  reaper
 353816  99629  0 0x14000  0x2001  softnet
 487701  10647  0 0x14000  0x2002  softnet
 152789  43620  0 0x14000  0x2007  softnet
*356742  68683  0 0x14000  0x2005  softnet
db_enter() at db_enter+0x14
panic(8213f5d0) at panic+0xc3
pool_do_get(824ae060,a,8000247b71b4) at pool_do_get+0x321
pool_get(824ae060,a) at pool_get+0x9a
art_get(827ceac0,20) at art_get+0x30
rtable_insert(0,827ceac0,0,8000247b72f0,3,fd8745e4a948) at rtab
le_insert+0x1a2
rtrequest(b,8000247b73f8,3,8000247b7498,0) at rtrequest+0x613
rt_clone(8000247b7500,8000247b7558,0) at rt_clone+0x77
rtalloc_mpath(8000247b7558,fd800369aad8,0) at rtalloc_mpath+0x50
in_ouraddr(fd80a94fcd00,8077e048,8000247b75d8) at in_ouraddr+0x
88
ip_input_if(8000247b7678,8000247b7684,4,0,8077e048) at ip_input
ipv4_input(8077e048,fd80a94fcd00) at ipv4_input+0x3d
ether_input(8077e048,fd80a94fcd00) at ether_input+0x3b5
if_input_process(8077e048,8000247b7768) at if_input_process+0x6f
end trace frame: 0x8000247b77b0, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{5}> show panic
*cpu5: pool_do_get: art_node free list modified: page 0xfd8747128000; item a
ddr 0xfd8747128410; offset 0x0=0x182f4660f2a7188a != 0x182f4660f2a71889

ddb{5}> show register
rdi0
rsi 0x14
rbp   0x8000247b7060
rbx0
rdx   0xc000
rcx0x286
rax 0x9c
r8 0x101010101010101
r9 0
r10   0xcdd678d0954ec026
r11   0x92611e3f4c85263e
r12   0x80002252e990
r130
r140
r15   0x8213f5d0cy_pio_rec+0x1ea86
rip   0x81b0f124db_enter+0x14
cs   0x8
rflags 0x282
rsp   0x8000247b7060
ss  0x10
db_enter+0x14:  popq%rbp

ddb{5}> trace
db_enter() at db_enter+0x14
panic(8213f5d0) at panic+0xc3
pool_do_get(824ae060,a,8000247b71b4) at pool_do_get+0x321
pool_get(824ae060,a) at pool_get+0x9a
art_get(827ceac0,20) at art_get+0x30
rtable_insert(0,827ceac0,0,8000247b72f0,3,fd8745e4a948) at rtab
le_insert+0x1a2
rtrequest(b,8000247b73f8,3,8000247b7498,0) at rtrequest+0x613
rt_clone(8000247b7500,8000247b7558,0) at rt_clone+0x77
rtalloc_mpath(8000247b7558,fd800369aad8,0) at rtalloc_mpath+0x50
in_ouraddr(fd80a94fcd00,8077e048,8000247b75d8) at in_ouraddr+0x
88
ip_input_if(8000247b7678,8000247b7684,4,0,8077e048) at ip_input
_if+0x1f0
ipv4_input(8077e048,fd80a94fcd00) at ipv4_input+0x3d
ether_input(8077e048,fd80a94fcd00) at ether_input+0x3b5
if_input_process(8077e048,8000247b7768) at if_input_process+0x6f
ifiq_process(80782400) at ifiq_process+0x75
taskq_thread(80036000) at taskq_thread+0x100
end trace frame: 0x0, count: -16

ddb{5}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
  3208  21

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread David Gwynne
On Wed, Apr 26, 2023 at 09:49:18AM +, Gerhard Roth wrote:
> On Wed, 2023-04-26 at 19:42 +1000, David Gwynne wrote:
> > On Wed, Apr 26, 2023 at 07:48:18AM +, Gerhard Roth wrote:
> > > On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote:
> > > > Hello,
> > > > 
> > > > This is the second diff. It introduces a transaction (pf_trans).
> > > > It's more or less diff with dead code.
> > > > 
> > > > It's still worth to note those two chunks in this diff:
> > > > 
> > > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > > > flags, struct proc *p)
> > > > return (EACCES);
> > > > }
> > > > ??
> > > > -??if (flags & FWRITE)
> > > > -??rw_enter_write(&pfioctl_rw);
> > > > -??else
> > > > -??rw_enter_read(&pfioctl_rw);
> > > > +??rw_enter_write(&pfioctl_rw);
> > > > ??
> > > > switch (cmd) {
> > > > ??
> > > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > > > flags, struct proc *p)
> > > > break;
> > > > }
> > > > ??fail:
> > > > -??if (flags & FWRITE)
> > > > -??rw_exit_write(&pfioctl_rw);
> > > > -??else
> > > > -??rw_exit_read(&pfioctl_rw);
> > > > +??rw_exit_write(&pfioctl_rw);
> > > > ??
> > > > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4).
> > > > I'd like to also this lock to serialize access to table of transactions.
> > > > To keep things simple for now I'd like to make every process to perform
> > > > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll
> > > > be pushing operation on pfioctl_rw further down to individual ioctl
> > > > operations.
> > > > 
> > > > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans()
> > > > introduced in this diff are unused. They will be brought alive in
> > > > the 3rd diff.
> > > > 
> > > > thanks and
> > > > regards
> > > > sashan
> > > > 
> > > > 8<---8<---8<--8<
> > > > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> > > > index 7ea22050506..7e4c7d2a2ab 100644
> > > > --- a/sys/net/pf_ioctl.c
> > > > +++ b/sys/net/pf_ioctl.c
> > > > @@ -117,6 +117,11 @@ void?? 
> > > > pf_qid_unref(u_int16_t);
> > > > ??int pf_states_clr(struct 
> > > > pfioc_state_kill *);
> > > > ??int pf_states_get(struct 
> > > > pfioc_states *);
> > > > ??
> > > > +struct pf_trans*pf_open_trans(pid_t);
> > > > +struct 
> > > > pf_trans*pf_find_trans(uint64_t);
> > > > +void?? pf_free_trans(struct 
> > > > pf_trans *);
> > > > +void?? pf_rollback_trans(struct 
> > > > pf_trans *);
> > > > +
> > > > ??struct pf_rule?? pf_default_rule, pf_default_rule_new;
> > > > ??
> > > > ??struct {
> > > > @@ -168,6 +173,8 @@ int?? 
> > > > pf_rtlabel_add(struct pf_addr_wrap *);
> > > > ??void?? pf_rtlabel_remove(struct 
> > > > pf_addr_wrap *);
> > > > ??void?? pf_rtlabel_copyout(struct 
> > > > pf_addr_wrap *);
> > > > ??
> > > > +uint64_t trans_ticket = 1;
> > > > +LIST_HEAD(, pf_trans)pf_ioctl_trans = 
> > > > LIST_HEAD_INITIALIZER(pf_trans);
> > > > ??
> > > > ??void
> > > > ??pfattach(int num)
> > > > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc 
> > > > *p)
> > > > ??int
> > > > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p)
> > > > ??{
> > > > +??struct pf_trans *w, *s;
> > > > +??LIST_HEAD(, pf_trans)??tmp_list;
> > > > +
> > > > +??if (minor(dev) >= 1)
> > > > +??return (ENXIO);
> > > > +
> > > > +??if (flags & FWRITE) {
> > > > +??LIST_INIT(&tmp_list);
> > > > +??rw_enter_write(&pfioctl_rw);
> > > > +??LIST_FOREACH_SAFE(w, &pf_ioctl_trans, 
> > > > pft_entry, s) {
> > > > +??if (w->pft_pid == 
> > > > p->p_p->ps_pid) {
> > > > +??LIST_REMOVE(w,
> > > >  pft_entry);
> > > > +??LIST_INSERT_HEAD(&tmp_list,
> > > >  w, pft_entry);
> > > > +??}
> > > > +??}
> > > > +??rw_exit_write(&pfioctl_rw);
> > > > +
> > > > +??while ((w = LIST_FIR

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Gerhard Roth
On Wed, 2023-04-26 at 19:42 +1000, David Gwynne wrote:
> On Wed, Apr 26, 2023 at 07:48:18AM +, Gerhard Roth wrote:
> > On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote:
> > > Hello,
> > > 
> > > This is the second diff. It introduces a transaction (pf_trans).
> > > It's more or less diff with dead code.
> > > 
> > > It's still worth to note those two chunks in this diff:
> > > 
> > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > > flags, struct proc *p)
> > > return (EACCES);
> > > }
> > > ??
> > > -??if (flags & FWRITE)
> > > -??rw_enter_write(&pfioctl_rw);
> > > -??else
> > > -??rw_enter_read(&pfioctl_rw);
> > > +??rw_enter_write(&pfioctl_rw);
> > > ??
> > > switch (cmd) {
> > > ??
> > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > > flags, struct proc *p)
> > > break;
> > > }
> > > ??fail:
> > > -??if (flags & FWRITE)
> > > -??rw_exit_write(&pfioctl_rw);
> > > -??else
> > > -??rw_exit_read(&pfioctl_rw);
> > > +??rw_exit_write(&pfioctl_rw);
> > > ??
> > > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4).
> > > I'd like to also this lock to serialize access to table of transactions.
> > > To keep things simple for now I'd like to make every process to perform
> > > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll
> > > be pushing operation on pfioctl_rw further down to individual ioctl
> > > operations.
> > > 
> > > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans()
> > > introduced in this diff are unused. They will be brought alive in
> > > the 3rd diff.
> > > 
> > > thanks and
> > > regards
> > > sashan
> > > 
> > > 8<---8<---8<--8<
> > > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> > > index 7ea22050506..7e4c7d2a2ab 100644
> > > --- a/sys/net/pf_ioctl.c
> > > +++ b/sys/net/pf_ioctl.c
> > > @@ -117,6 +117,11 @@ void?? 
> > > pf_qid_unref(u_int16_t);
> > > ??int pf_states_clr(struct 
> > > pfioc_state_kill *);
> > > ??int pf_states_get(struct 
> > > pfioc_states *);
> > > ??
> > > +struct pf_trans*pf_open_trans(pid_t);
> > > +struct pf_trans*pf_find_trans(uint64_t);
> > > +void?? pf_free_trans(struct pf_trans 
> > > *);
> > > +void?? pf_rollback_trans(struct 
> > > pf_trans *);
> > > +
> > > ??struct pf_rule?? pf_default_rule, pf_default_rule_new;
> > > ??
> > > ??struct {
> > > @@ -168,6 +173,8 @@ int?? 
> > > pf_rtlabel_add(struct pf_addr_wrap *);
> > > ??void?? pf_rtlabel_remove(struct 
> > > pf_addr_wrap *);
> > > ??void?? pf_rtlabel_copyout(struct 
> > > pf_addr_wrap *);
> > > ??
> > > +uint64_t trans_ticket = 1;
> > > +LIST_HEAD(, pf_trans)pf_ioctl_trans = 
> > > LIST_HEAD_INITIALIZER(pf_trans);
> > > ??
> > > ??void
> > > ??pfattach(int num)
> > > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
> > > ??int
> > > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p)
> > > ??{
> > > +??struct pf_trans *w, *s;
> > > +??LIST_HEAD(, pf_trans)??tmp_list;
> > > +
> > > +??if (minor(dev) >= 1)
> > > +??return (ENXIO);
> > > +
> > > +??if (flags & FWRITE) {
> > > +??LIST_INIT(&tmp_list);
> > > +??rw_enter_write(&pfioctl_rw);
> > > +??LIST_FOREACH_SAFE(w, &pf_ioctl_trans, 
> > > pft_entry, s) {
> > > +??if (w->pft_pid == 
> > > p->p_p->ps_pid) {
> > > +??LIST_REMOVE(w,
> > >  pft_entry);
> > > +??LIST_INSERT_HEAD(&tmp_list,
> > >  w, pft_entry);
> > > +??}
> > > +??}
> > > +??rw_exit_write(&pfioctl_rw);
> > > +
> > > +??while ((w = LIST_FIRST(&tmp_list)) != 
> > > NULL) {
> > > +??LIST_REMOVE(w, pft_entry);
> > > +??pf_free_trans(w);
> > > +??}
> > > +??}
> > > +
> > > return (0);
> > > ??}
> 

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread David Gwynne
On Wed, Apr 26, 2023 at 07:48:18AM +, Gerhard Roth wrote:
> On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote:
> > Hello,
> > 
> > This is the second diff. It introduces a transaction (pf_trans).
> > It's more or less diff with dead code.
> > 
> > It's still worth to note those two chunks in this diff:
> > 
> > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > flags, struct proc *p)
> > return (EACCES);
> > }
> > ??
> > -??if (flags & FWRITE)
> > -??rw_enter_write(&pfioctl_rw);
> > -??else
> > -??rw_enter_read(&pfioctl_rw);
> > +??rw_enter_write(&pfioctl_rw);
> > ??
> > switch (cmd) {
> > ??
> > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> > flags, struct proc *p)
> > break;
> > }
> > ??fail:
> > -??if (flags & FWRITE)
> > -??rw_exit_write(&pfioctl_rw);
> > -??else
> > -??rw_exit_read(&pfioctl_rw);
> > +??rw_exit_write(&pfioctl_rw);
> > ??
> > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4).
> > I'd like to also this lock to serialize access to table of transactions.
> > To keep things simple for now I'd like to make every process to perform
> > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll
> > be pushing operation on pfioctl_rw further down to individual ioctl
> > operations.
> > 
> > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans()
> > introduced in this diff are unused. They will be brought alive in
> > the 3rd diff.
> > 
> > thanks and
> > regards
> > sashan
> > 
> > 8<---8<---8<--8<
> > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> > index 7ea22050506..7e4c7d2a2ab 100644
> > --- a/sys/net/pf_ioctl.c
> > +++ b/sys/net/pf_ioctl.c
> > @@ -117,6 +117,11 @@ void?? 
> > pf_qid_unref(u_int16_t);
> > ??int pf_states_clr(struct 
> > pfioc_state_kill *);
> > ??int pf_states_get(struct 
> > pfioc_states *);
> > ??
> > +struct pf_trans*pf_open_trans(pid_t);
> > +struct pf_trans*pf_find_trans(uint64_t);
> > +void?? pf_free_trans(struct pf_trans 
> > *);
> > +void?? pf_rollback_trans(struct 
> > pf_trans *);
> > +
> > ??struct pf_rule?? pf_default_rule, pf_default_rule_new;
> > ??
> > ??struct {
> > @@ -168,6 +173,8 @@ int?? 
> > pf_rtlabel_add(struct pf_addr_wrap *);
> > ??void?? pf_rtlabel_remove(struct 
> > pf_addr_wrap *);
> > ??void?? pf_rtlabel_copyout(struct 
> > pf_addr_wrap *);
> > ??
> > +uint64_t trans_ticket = 1;
> > +LIST_HEAD(, pf_trans)pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
> > ??
> > ??void
> > ??pfattach(int num)
> > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
> > ??int
> > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p)
> > ??{
> > +??struct pf_trans *w, *s;
> > +??LIST_HEAD(, pf_trans)??tmp_list;
> > +
> > +??if (minor(dev) >= 1)
> > +??return (ENXIO);
> > +
> > +??if (flags & FWRITE) {
> > +??LIST_INIT(&tmp_list);
> > +??rw_enter_write(&pfioctl_rw);
> > +??LIST_FOREACH_SAFE(w, &pf_ioctl_trans, 
> > pft_entry, s) {
> > +??if (w->pft_pid == 
> > p->p_p->ps_pid) {
> > +??LIST_REMOVE(w,
> >  pft_entry);
> > +??LIST_INSERT_HEAD(&tmp_list,
> >  w, pft_entry);
> > +??}
> > +??}
> > +??rw_exit_write(&pfioctl_rw);
> > +
> > +??while ((w = LIST_FIRST(&tmp_list)) != NULL) {
> > +??LIST_REMOVE(w, pft_entry);
> > +??pf_free_trans(w);
> > +??}
> > +??}
> > +
> > return (0);
> > ??}
> 
> Kernel close() routines don't work the same way as user-land close() ones do.
> pfclose() is called only once when the last reference to /dev/pf goes away.
> There is no way you can keep track of your transactions like that.

we made /dev/pf clonable in src/sys/sys/conf.h r1.159 so we could do
t

Re: arpresolve reduce kernel lock

2023-04-26 Thread Alexander Bluhm
On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote:
> > Hi,
> > 
> > Mutex arp_mtx protects the llinfo_arp la_...  fields.  So kernel
> > lock is only needed for changing the route rt_flags.
> > 
> > Of course there is a race between checking and setting rt_flags.
> > But the other checks of the RTF_REJECT flags were without kernel
> > lock before.  This does not cause trouble, the worst thing that may
> > happen is to wait another exprire time for ARP retry.  My diff does
> > not make it worse, reading rt_flags and rt_expire is done without
> > lock anyway.
> > 
> > The kernel lock is needed to change rt_flags.  Testing without
> > KERNEL_LOCK() caused crashes.
> > 
> 
> Hi,
> 
> I'm interesting is the system stable with the diff below? If so, we
> could avoid kernel lock in the arpresolve().

I could not crash it.  Basically I run
while :; do arp -nd 10.6.31.33; arp -nd 10.6.16.36; done
on the router while forwarding between the two addresses.  So ARP
routes are constantly created and deleted.

But I think the diff is not correct.  While an update where rt_flags
is changed with kernel lock, cannot sneak into this atomic operation,
the ARP code can still corrupt rt_flags changes holding the kernel
lock.

Of course my stress test only triggers on side of the picture, the
problem is not exploited.

So if you just wanted to know, whether this kind of atomic operation
works, I would say yes.  To make it correct all rt_flags updates
would have to be changed in that way.  I doubt that this is the way
to go.  Maybe we could split rt_flags in multiple fields each with
its own locking strategy.

bluhm

> Index: sys/netinet/if_ether.c
> ===
> RCS file: /cvs/src/sys/netinet/if_ether.c,v
> retrieving revision 1.263
> diff -u -p -r1.263 if_ether.c
> --- sys/netinet/if_ether.c25 Apr 2023 16:24:25 -  1.263
> +++ sys/netinet/if_ether.c25 Apr 2023 20:55:08 -
> @@ -462,14 +462,20 @@ arpresolve(struct ifnet *ifp, struct rte
>   mtx_leave(&arp_mtx);
>  
>   if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) {
> - KERNEL_LOCK();
> - SET(rt->rt_flags, RTF_REJECT);
> - KERNEL_UNLOCK();
> + u_int flags;
> +
> + do {
> + flags = rt->rt_flags;
> + } while (atomic_cas_uint(&rt->rt_flags, flags,
> + flags | RTF_REJECT) != flags);
>   }
>   if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) {
> - KERNEL_LOCK();
> - CLR(rt->rt_flags, RTF_REJECT);
> - KERNEL_UNLOCK();
> + u_int flags;
> +
> + do {
> + flags = rt->rt_flags;
> + } while (atomic_cas_uint(&rt->rt_flags, flags,
> + flags & ~RTF_REJECT) != flags);
>   }
>   if (refresh)
>   arprequest(ifp, &satosin(rt->rt_ifa->ifa_addr)->sin_addr.s_addr,



Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Gerhard Roth
On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> This is the second diff. It introduces a transaction (pf_trans).
> It's more or less diff with dead code.
> 
> It's still worth to note those two chunks in this diff:
> 
> @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
> return (EACCES);
> }
>  
> -   if (flags & FWRITE)
> -   rw_enter_write(&pfioctl_rw);
> -   else
> -   rw_enter_read(&pfioctl_rw);
> +   rw_enter_write(&pfioctl_rw);
>  
> switch (cmd) {
>  
> @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
> break;
> }
>  fail:
> -   if (flags & FWRITE)
> -   rw_exit_write(&pfioctl_rw);
> -   else
> -   rw_exit_read(&pfioctl_rw);
> +   rw_exit_write(&pfioctl_rw);
>  
> `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4).
> I'd like to also this lock to serialize access to table of transactions.
> To keep things simple for now I'd like to make every process to perform
> ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll
> be pushing operation on pfioctl_rw further down to individual ioctl
> operations.
> 
> the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans()
> introduced in this diff are unused. They will be brought alive in
> the 3rd diff.
> 
> thanks and
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> index 7ea22050506..7e4c7d2a2ab 100644
> --- a/sys/net/pf_ioctl.c
> +++ b/sys/net/pf_ioctl.c
> @@ -117,6 +117,11 @@ void    pf_qid_unref(u_int16_t);
>  int pf_states_clr(struct pfioc_state_kill *);
>  int pf_states_get(struct pfioc_states *);
>  
> +struct pf_trans*pf_open_trans(pid_t);
> +struct pf_trans*pf_find_trans(uint64_t);
> +void    pf_free_trans(struct pf_trans *);
> +void    pf_rollback_trans(struct pf_trans *);
> +
>  struct pf_rule  pf_default_rule, pf_default_rule_new;
>  
>  struct {
> @@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap 
> *);
>  void    pf_rtlabel_remove(struct pf_addr_wrap *);
>  void    pf_rtlabel_copyout(struct pf_addr_wrap *);
>  
> +uint64_t trans_ticket = 1;
> +LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
>  
>  void
>  pfattach(int num)
> @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
>  int
>  pfclose(dev_t dev, int flags, int fmt, struct proc *p)
>  {
> +   struct pf_trans *w, *s;
> +   LIST_HEAD(, pf_trans)   tmp_list;
> +
> +   if (minor(dev) >= 1)
> +   return (ENXIO);
> +
> +   if (flags & FWRITE) {
> +   LIST_INIT(&tmp_list);
> +   rw_enter_write(&pfioctl_rw);
> +   LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
> +   if (w->pft_pid == p->p_p->ps_pid) {
> +   LIST_REMOVE(w, pft_entry);
> +   LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
> +   }
> +   }
> +   rw_exit_write(&pfioctl_rw);
> +
> +   while ((w = LIST_FIRST(&tmp_list)) != NULL) {
> +   LIST_REMOVE(w, pft_entry);
> +   pf_free_trans(w);
> +   }
> +   }
> +
> return (0);
>  }

Kernel close() routines don't work the same way as user-land close() ones do.
pfclose() is called only once when the last reference to /dev/pf goes away.
There is no way you can keep track of your transactions like that.


>  
> @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
> return (EACCES);
> }
>  
> -   if (flags & FWRITE)
> -   rw_enter_write(&pfioctl_rw);
> -   else
> -   rw_enter_read(&pfioctl_rw);
> +   rw_enter_write(&pfioctl_rw);
>  
> switch (cmd) {
>  
> @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int 
> flags, struct proc *p)
> break;
> }
>  fail:
> -   if (flags & FWRITE)
> -   rw_exit_write(&pfioctl_rw);
> -   else
> -   rw_exit_read(&pfioctl_rw);
> +   rw_exit_write(&pfioctl_rw);
>  
> return (error);
>  }
> @@ -3244,3 +3268,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, 
> size_t newlen)
>  
> return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs));
>  }
> +
> +struct pf_trans *
> +pf_open_trans(pid_t pid)
> +{
> +   static uint64_t ticket = 1;
> +   struct pf_trans *t;
> +
> +   rw_assert_wrlock(&pfioctl_rw);
> +
> +   t = malloc(sizeo

atactl(8): 'readattr' subcommand quits silently.

2023-04-26 Thread Yuichiro NAITO

Hi, Our IIJ team is planning to run OpenBSD on HPE ProLiant DL20 Gen10 server.
Yasuoka-san and I are testing on this host and found that `atactl sd0 readattr`
doesn't show any messages, just quits silently.

'sd0' disk is shown in dmesg as follows.

```
sd0 at scsibus1 targ 0 lun 0:  naa.5002538e01735e7f
sd0: 457862MB, 512 bytes/sector, 937703088 sectors, thin
```

To see what's happening to atactl(8), we added printf code like this.

```
diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c
index 85dfced8c9a..2babdafffc7 100644
--- a/sbin/atactl/atactl.c
+++ b/sbin/atactl/atactl.c
@@ -1660,10 +1660,11 @@ device_attr(int argc, char *argv[])
if (attr_val.revision != attr_thr.revision) {
/*
 * Non standard vendor implementation.
 * Return, since we don't know how to use this.
 */
+   printf("atactl: revision mismatch SMART_READ=0x%02x 
SMART_THREASHOLD=0x%02x\n", attr_val.revision, attr_thr.revision);
return;
}

attr = attr_val.attribute;
thr = attr_thr.threshold;

```

And got following message.

```
$ doas ./atactl sd0 readattr
atactl: revision mismatch SMART_READ=0x01 SMART_THREASHOLD=0x00

```

These 2 revisions of 'attr_val' and 'attr_thr' are different on this disk.
The comment says that it's wrong vendor implementation but I can see
'smartctl' shows the attributes as follows. NetBSD's atactl doesn't
see these 2 revisions are same but checks each checksum is valid.

```
SMART Attributes Data Structure revision number: 1
  Vendor Specific SMART Attributes with Thresholds:
  ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f   200   200   002Pre-fail  Always  
 -   0
5 Reallocated_Sector_Ct   0x0033   100   100   005Pre-fail  Always  
 -   0
9 Power_On_Hours  0x0032   099   099   000Old_age   Always  
 -   3138
  173 Unknown_Attribute   0x0033   099   099   005Pre-fail  Always  
 -   8
  175 Program_Fail_Count_Chip 0x0033   100   100   001Pre-fail  Always  
 -   0
  180 Unused_Rsvd_Blk_Cnt_Tot 0x003b   200   200   097Pre-fail  Always  
 -   0
  194 Temperature_Celsius 0x0022   080   075   000Old_age   Always  
 -   20
  196 Reallocated_Event_Count 0x0033   100   100   005Pre-fail  Always  
 -   0
  202 Unknown_SSD_Attribute   0x0033   100   100   010Pre-fail  Always  
 -   0
```

So, I propose to change the method to verify checksums instead of revision 
checks.

```
diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c
index 85dfced8c9a..1f77460ce3d 100644
--- a/sbin/atactl/atactl.c
+++ b/sbin/atactl/atactl.c
@@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[])
req.datalen = sizeof(attr_thr);
ata_command(&req);

-   if (attr_val.revision != attr_thr.revision) {
-   /*
-* Non standard vendor implementation.
-* Return, since we don't know how to use this.
-*/
-   return;
-   }
+   if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0)
+   errx(1, "Checksum mismatch (attr_val)");
+
+   if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0)
+   errx(1, "Checksum mismatch (attr_thr)");

attr = attr_val.attribute;
thr = attr_thr.threshold;

```

With this patch, we got following result.

```
Attributes table revision: 1
  ID  Attribute name  Threshold   Value   Raw
1 Raw Read Error Rate2200 0x
5 Reallocated Sector Count   5100 0x
9 Power-On Hours Count   0 99 0x0c42
  173 Unknown5 99 0x0008
  175 Unknown1100 0x
  180 Unknown   97200 0x
  194 Temperature0 80 0x0014
  196 Reallocation Event Count   5100 0x
  202 Data Address Mark Errors  10100 0x
```

Is the proposed patch OK?

--
Yuichiro NAITO (naito.yuich...@gmail.com)