Add another Lenovo NVMe device id
Found in my x1 extreme gen 1: nvme0 at pci4 dev 0 function 0 vendor "Lenovo", unknown product 0x0006 rev 0x00: msix, NVMe 1.2 ok? Index: sys/dev/pci/pcidevs === RCS file: /cvs/src/sys/dev/pci/pcidevs,v retrieving revision 1.2032 diff -u -p -u -p -r1.2032 pcidevs --- sys/dev/pci/pcidevs 25 Apr 2023 21:57:29 - 1.2032 +++ sys/dev/pci/pcidevs 27 Apr 2023 06:38:24 - @@ -7061,6 +7061,7 @@ product LEADTEK WINFAST_XP0x6609 Leadte /* Lenovo products */ product LENOVO NVME0x0003 NVMe +product LENOVO NVME_2 0x0006 NVMe /* Level 1 (Intel) */ product LEVEL1 LXT1001 0x0001 LXT1001
Re: atactl(8): 'readattr' subcommand quits silently.
On Thu, Apr 27, 2023 at 08:37:23AM +0900, YASUOKA Masahiko wrote: > > Hello, > > On Wed, 26 Apr 2023 16:32:28 +0900 > Yuichiro NAITO wrote: > > These 2 revisions of 'attr_val' and 'attr_thr' are different on this > > disk. > > The comment says that it's wrong vendor implementation but I can see > > 'smartctl' shows the attributes as follows. NetBSD's atactl doesn't > > see these 2 revisions are same but checks each checksum is valid. > > The diff seems correct. > > ok? ok kevlo@ > > diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c > > index 85dfced8c9a..1f77460ce3d 100644 > > --- a/sbin/atactl/atactl.c > > +++ b/sbin/atactl/atactl.c > > @@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[]) > > req.datalen = sizeof(attr_thr); > > ata_command(&req); > > > > - if (attr_val.revision != attr_thr.revision) { > > - /* > > -* Non standard vendor implementation. > > -* Return, since we don't know how to use this. > > -*/ > > - return; > > - } > > + if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0) > > + errx(1, "Checksum mismatch (attr_val)"); > > + > > + if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0) > > + errx(1, "Checksum mismatch (attr_thr)"); > > > > attr = attr_val.attribute; > > thr = attr_thr.threshold; > > >
Re: pass M_CANFAIL to malloc() which use M_WAITOK but are tested for failure
On Wed, Apr 26, 2023 at 08:19:10PM +0200, Alexander Bluhm wrote: > > On Sat, Apr 15, 2023 at 10:44:15PM +0800, Kevin Lo wrote: > > On Fri, Apr 14, 2023 at 02:01:29PM +0200, Alexander Bluhm wrote: > > > I think you are trying to change the kernel in the wrong direction. > > > It should not fail, but handle the requests. Panic if there is a > > > bug. > > > > > > Why do you think M_CANFAIL is a good thing at this place? > > > > Because M_WAITOK will not return NULl, I think adding M_CANFAIL will > > keep the mallocarray call unchanged. > > The goal is not to handle errors by failing, but to prevent them. > Better keep M_WAITOK, avoid M_CANFAIL, and remove the check. > > Discussion in the hackroom concluded that M_CANFAIL is for input > from userland that can be unlimited large. If userland requests > too much, it can fail. But it is not for normal operation of the > kernel. > > M_NOWAIT should be used when the kernel has a good way to deal with > a temporary failure, e.g. drop the packet. > > M_CANFAIL | M_WAITOK deals with user input that cannot be fullfilled > permanently and should be reported as an error. > > M_WAITOK is for all other cases. If it panics, fix the underlying > bug that requested unrealistic memory size. > > Here use number of queues which should be reasonable low and limited > by the driver code. Keep M_WAITOK and remove the error check. Thank you for discussing this topic at the hackathon, and I also greatly appreciate your detailed explanation. > By the way, M_DEVBUF is also wrong. It should be M_TEMP as it is > freed in the same function and not stored permanently. But I wont > fix that now as malloc(9) type usage is far from consistent. > > ok? ok kevlo@ > bluhm > > Index: dev/pci/if_igc.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v > retrieving revision 1.12 > diff -u -p -r1.12 if_igc.c > --- dev/pci/if_igc.c 9 Mar 2023 00:13:47 - 1.12 > +++ dev/pci/if_igc.c 21 Apr 2023 18:25:35 - > @@ -1209,9 +1209,8 @@ igc_rxrinfo(struct igc_softc *sc, struct > struct rx_ring *rxr; > int error, i, n = 0; > > - if ((ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF, > - M_WAITOK | M_ZERO)) == NULL) > - return ENOMEM; > + ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF, > + M_WAITOK | M_ZERO); > > for (i = 0; i < sc->sc_nqueues; i++) { > rxr = &sc->rx_rings[i]; > Index: dev/pci/if_ix.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v > retrieving revision 1.192 > diff -u -p -r1.192 if_ix.c > --- dev/pci/if_ix.c 6 Feb 2023 20:27:44 - 1.192 > +++ dev/pci/if_ix.c 21 Apr 2023 18:25:35 - > @@ -640,9 +640,8 @@ ixgbe_rxrinfo(struct ix_softc *sc, struc > u_int n = 0; > > if (sc->num_queues > 1) { > - if ((ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF, > - M_WAITOK | M_ZERO)) == NULL) > - return (ENOMEM); > + ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF, > + M_WAITOK | M_ZERO); > } else > ifr = &ifr1; > > Index: dev/pci/if_oce.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_oce.c,v > retrieving revision 1.106 > diff -u -p -r1.106 if_oce.c > --- dev/pci/if_oce.c 11 Mar 2022 18:00:48 - 1.106 > +++ dev/pci/if_oce.c 21 Apr 2023 18:25:35 - > @@ -902,9 +902,8 @@ oce_rxrinfo(struct oce_softc *sc, struct > u_int n = 0; > > if (sc->sc_nrq > 1) { > - if ((ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF, > - M_WAITOK | M_ZERO)) == NULL) > - return (ENOMEM); > + ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF, > + M_WAITOK | M_ZERO); > } else > ifr = &ifr1; > >
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, Apr 26, 2023 at 11:45:08PM +0200, Alexandr Nedvedicky wrote: > Hello, > > On Thu, Apr 27, 2023 at 06:51:52AM +1000, David Gwynne wrote: > > > > is that kind of check in KASSET() something you have on your mind? > > > perhaps I can trade KASSERT() to regular code: > > > > > > if (t->pft_unit != minor(dev)) > > > return (EPERM); > > > > i would pass the dev/minor/unit to pf_find_trans() and compare along > > with the ticket value as a matter of course. returning a different > > errno if the minor is different is unecessary. > > > > something like this? > > struct pf_trans * > pf_find_trans(uint32_t unit, uint64_t ticket) > { > struct pf_trans *t; > > rw_assert_anylock(&pfioctl_rw); > > LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) { > if (t->pft_ticket == ticket) > break; > } t could be NULL here. just do the unit check inside the loop? > > if (t->pft_unit != unit) > return (NULL); > > return (t); > } > > just return NULL on unit mismatch. updated diff is below. ok once you fix the nit above. > > thanks and > regards > sashan > > 8<---8<---8<--8< > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > index 7ea22050506..ebe1b912766 100644 > --- a/sys/net/pf_ioctl.c > +++ b/sys/net/pf_ioctl.c > @@ -117,6 +117,11 @@ void pf_qid_unref(u_int16_t); > int pf_states_clr(struct pfioc_state_kill *); > int pf_states_get(struct pfioc_states *); > > +struct pf_trans *pf_open_trans(uint32_t); > +struct pf_trans *pf_find_trans(uint32_t, uint64_t); > +void pf_free_trans(struct pf_trans *); > +void pf_rollback_trans(struct pf_trans *); > + > struct pf_rulepf_default_rule, pf_default_rule_new; > > struct { > @@ -168,6 +173,8 @@ intpf_rtlabel_add(struct > pf_addr_wrap *); > void pf_rtlabel_remove(struct pf_addr_wrap *); > void pf_rtlabel_copyout(struct pf_addr_wrap *); > > +uint64_t trans_ticket = 1; > +LIST_HEAD(, pf_trans)pf_ioctl_trans = > LIST_HEAD_INITIALIZER(pf_trans); > > void > pfattach(int num) > @@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > int > pfclose(dev_t dev, int flags, int fmt, struct proc *p) > { > + struct pf_trans *w, *s; > + LIST_HEAD(, pf_trans) tmp_list; > + uint32_t unit = minor(dev); > + > + LIST_INIT(&tmp_list); > + rw_enter_write(&pfioctl_rw); > + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { > + if (w->pft_unit == unit) { > + LIST_REMOVE(w, pft_entry); > + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); > + } > + } > + rw_exit_write(&pfioctl_rw); > + > + while ((w = LIST_FIRST(&tmp_list)) != NULL) { > + LIST_REMOVE(w, pft_entry); > + pf_free_trans(w); > + } > + > return (0); > } > > @@ -522,7 +548,7 @@ pf_qid_unref(u_int16_t qid) > } > > int > -pf_begin_rules(u_int32_t *version, const char *anchor) > +pf_begin_rules(u_int32_t *ticket, const char *anchor) > { > struct pf_ruleset *rs; > struct pf_rule *rule; > @@ -533,20 +559,20 @@ pf_begin_rules(u_int32_t *version, const char *anchor) > pf_rm_rule(rs->rules.inactive.ptr, rule); > rs->rules.inactive.rcount--; > } > - *version = ++rs->rules.inactive.version; > + *ticket = ++rs->rules.inactive.ticket; > rs->rules.inactive.open = 1; > return (0); > } > > void > -pf_rollback_rules(u_int32_t version, char *anchor) > +pf_rollback_rules(u_int32_t ticket, char *anchor) > { > struct pf_ruleset *rs; > struct pf_rule *rule; > > rs = pf_find_ruleset(anchor); > if (rs == NULL || !rs->rules.inactive.open || > - rs->rules.inactive.version != version) > + rs->rules.inactive.ticket != ticket) > return; > while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) { > pf_rm_rule(rs->rules.inactive.ptr, rule); > @@ -825,7 +851,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule) > } > > int > -pf_commit_rules(u_int32_t version, char *anchor) > +pf_commit_rules(u_int32_t ticket, char *anchor) > { > struct pf_ruleset *rs; > struct pf_rule *rule; > @@ -834,7 +860,7 @@ pf_commit_rules(u_int32_t version, char *anchor) > > rs = pf_find_ruleset(anchor); > if (rs == NULL || !rs->rules.inactive.open || > - version != rs->rules.inactive.version) > + ticket != rs->rules.inactive.ticket) > return (EBUSY); > > if (rs == &pf_main_ruleset) > @@ -849,7 +875,7 @@ pf_commit_rules(u_int32_t version,
Re: vmd(8): multi-process device emulation (plz test)
Dave Voutila writes: > tech@: > > The below diff splits out virtio device emulation for virtio block and > network devices into separate fork+exec'd & pledge(2)'d subprocesses. > > In order of priority, this diff: > > 1. Isolates common exploit targets (e.g. emulated network devices) from >the rest of the vm process, tightening pledge to "stdio" per device. > > 2. Increases responsiveness of guest i/o since we no longer have a >single thread servicing virtio pci and device emulation. > > I'd like to land this diff this week, so if you use atypical vmd > configurations like: > > 1. multiple vioblk disks per vm > 2. multiple nics per vm > 3. send/receive > 4. qcow2 base images > > This diff has lots of info logging enabled by default to help me > identify what's breaking, so please reply with log message output if > something goes sideways. This diff cleans things up for review and (hopefully) OK: 1. set the temporary info-level logging to debug in vionet/vioblk/vm.c 2. fix a calloc/malloc issue; vio{net,blk} are now individually malloc'd and free'd to remove a double-free 3. only dump memory once, not twice, on send 4. fix vioblk restore So far no issues reported by testers, so I will be asking for OK tomorrow. So I can iterate in the tree and get this monster diff landed. diff refs/heads/master refs/heads/vmd-dev-exec6 commit - e2a6bdce4ccbbc673d16acb8bd87d6b1b8fc4f36 commit + 67c27d1f05449bee19c3ae0ffb0b14171fac15bc blob - d0e7d0c2fb1c11e39caea6896726b25ec315cd22 blob + e9387ec59530d7f441ee92687841634685991344 --- usr.sbin/vmd/Makefile +++ usr.sbin/vmd/Makefile @@ -7,7 +7,7 @@ SRCS+= vm_agentx.c SRCS+= vm.c loadfile_elf.c pci.c virtio.c i8259.c mc146818.c SRCS+= ns8250.c i8253.c dhcp.c packet.c mmio.c SRCS+= parse.y atomicio.c vioscsi.c vioraw.c vioqcow2.c fw_cfg.c -SRCS+= vm_agentx.c +SRCS+= vm_agentx.c vioblk.c vionet.c CFLAGS+= -Wall -I${.CURDIR} CFLAGS+= -Wstrict-prototypes -Wmissing-prototypes blob - 7d7d8d02a263f6e426e359f4d8939fffe1f7d365 blob + a6fdb7623e548b7a22a907302a84376ffce3 --- usr.sbin/vmd/dhcp.c +++ usr.sbin/vmd/dhcp.c @@ -43,8 +43,9 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t extern struct vmd *env; ssize_t -dhcp_request(struct vionet_dev *dev, char *buf, size_t buflen, char **obuf) +dhcp_request(struct virtio_dev *dev, char *buf, size_t buflen, char **obuf) { + struct vionet_dev *vionet = &dev->vionet; unsigned char *respbuf = NULL, *op, *oe, dhcptype = 0; unsigned char *opts = NULL; ssize_t offset, optslen, respbuflen = 0; @@ -65,10 +66,10 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t return (-1); if (memcmp(pc.pc_dmac, broadcast, ETHER_ADDR_LEN) != 0 && - memcmp(pc.pc_dmac, dev->hostmac, ETHER_ADDR_LEN) != 0) + memcmp(pc.pc_dmac, vionet->hostmac, ETHER_ADDR_LEN) != 0) return (-1); - if (memcmp(pc.pc_smac, dev->mac, ETHER_ADDR_LEN) != 0) + if (memcmp(pc.pc_smac, vionet->mac, ETHER_ADDR_LEN) != 0) return (-1); if ((offset = decode_udp_ip_header(buf, buflen, offset, &pc)) < 0) @@ -87,7 +88,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t if (req.op != BOOTREQUEST || req.htype != pc.pc_htype || req.hlen != ETHER_ADDR_LEN || - memcmp(dev->mac, req.chaddr, req.hlen) != 0) + memcmp(vionet->mac, req.chaddr, req.hlen) != 0) return (-1); /* Ignore unsupported requests for now */ @@ -134,7 +135,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t resp.hlen = req.hlen; resp.xid = req.xid; - if (dev->pxeboot) { + if (vionet->pxeboot) { strlcpy(resp.file, "auto_install", sizeof resp.file); vm = vm_getbyvmid(dev->vm_vmid); if (vm && res_hnok(vm->vm_params.vmc_params.vcp_name)) @@ -143,7 +144,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t if ((client_addr.s_addr = vm_priv_addr(&env->vmd_cfg, - dev->vm_vmid, dev->idx, 1)) == 0) + dev->vm_vmid, vionet->idx, 1)) == 0) return (-1); memcpy(&resp.yiaddr, &client_addr, sizeof(client_addr)); @@ -152,7 +153,7 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t ss2sin(&pc.pc_dst)->sin_port = htons(CLIENT_PORT); if ((server_addr.s_addr = vm_priv_addr(&env->vmd_cfg, dev->vm_vmid, - dev->idx, 0)) == 0) + vionet->idx, 0)) == 0) return (-1); memcpy(&resp.siaddr, &server_addr, sizeof(server_addr)); memcpy(&ss2sin(&pc.pc_src)->sin_addr, &server_addr, @@ -167,9 +168,9 @@ dhcp_request(struct vionet_dev *dev, char *buf, size_t if ((respbuf = calloc(1, respbuflen)) == NULL) goto fail; - memcpy(&pc.pc_dmac, dev->mac, sizeof
Re: atactl(8): 'readattr' subcommand quits silently.
Hello, On Wed, 26 Apr 2023 16:32:28 +0900 Yuichiro NAITO wrote: > These 2 revisions of 'attr_val' and 'attr_thr' are different on this > disk. > The comment says that it's wrong vendor implementation but I can see > 'smartctl' shows the attributes as follows. NetBSD's atactl doesn't > see these 2 revisions are same but checks each checksum is valid. The diff seems correct. ok? > diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c > index 85dfced8c9a..1f77460ce3d 100644 > --- a/sbin/atactl/atactl.c > +++ b/sbin/atactl/atactl.c > @@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[]) > req.datalen = sizeof(attr_thr); > ata_command(&req); > > - if (attr_val.revision != attr_thr.revision) { > - /* > - * Non standard vendor implementation. > - * Return, since we don't know how to use this. > - */ > - return; > - } > + if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0) > + errx(1, "Checksum mismatch (attr_val)"); > + > + if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0) > + errx(1, "Checksum mismatch (attr_thr)"); > > attr = attr_val.attribute; > thr = attr_thr.threshold; >
Initialize `rtentry_pool' with IPL_SOFTNET IPL
Index: sys/net/route.c === RCS file: /cvs/src/sys/net/route.c,v retrieving revision 1.418 diff -u -p -r1.418 route.c --- sys/net/route.c 26 Apr 2023 16:09:44 - 1.418 +++ sys/net/route.c 26 Apr 2023 23:00:02 - @@ -176,7 +176,7 @@ route_init(void) { rtcounters = counters_alloc(rts_ncounters); - pool_init(&rtentry_pool, sizeof(struct rtentry), 0, IPL_MPFLOOR, 0, + pool_init(&rtentry_pool, sizeof(struct rtentry), 0, IPL_SOFTNET, 0, "rtentry", NULL); while (rt_hashjitter == 0)
Re: Updated virtio.4 man page with viogpu device
On Wed, Apr 26, 2023 at 07:32:07PM +, Eichert, Diana wrote: > Based on this commit, https://marc.info/?l=openbsd-cvs&m=168201887006175&w=2 > , add viogpu, a VirtIO GPU driver > > Updated virtio.4 man page with viogpu device > > diff -c virtio.4.orig virtio.4 > > fixed, thanks. jmc > *** virtio.4.orig Wed Apr 26 13:07:35 2023 > --- virtio.4Wed Apr 26 13:10:31 2023 > *** > *** 42,47 > --- 42,49 > VirtIO disk > .It Xr viocon 4 > VirtIO console device > + .It Xr viogpu 4 > + VirtIO GPU device > .It Xr viomb 4 > VirtIO memory ballooning driver > .It Xr viornd 4 >
Re: DIOCGETRULE is slow for large rulesets (3/3)
Hello, updated diff below adopts to recent changes to pf_find_trans() in diff 2/3 [1] in case DIOCGETRULE does not find desired transaction the ioctl(2) call to /dev/pf returns ENXIO. thanks and regards sashan [1] https://marc.info/?l=openbsd-tech&m=168254555211083&w=2 8<---8<---8<--8< diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c index 8cbd9d77b4f..3787e97a6b1 100644 --- a/sbin/pfctl/pfctl.c +++ b/sbin/pfctl/pfctl.c @@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, char *anchorname, int depth, int wildcard, long shownr) { struct pfioc_rule pr; - u_int32_t nr, mnr, header = 0; + u_int32_t header = 0; int len = strlen(path), ret = 0; char *npath, *p; @@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, goto error; } - if (shownr < 0) { - mnr = pr.nr; - nr = 0; - } else if (shownr < pr.nr) { - nr = shownr; - mnr = shownr + 1; - } else { - warnx("rule %ld not found", shownr); - ret = -1; - goto error; - } - for (; nr < mnr; ++nr) { - pr.nr = nr; - if (ioctl(dev, DIOCGETRULE, &pr) == -1) { - warn("DIOCGETRULE"); - ret = -1; - goto error; - } + while (ioctl(dev, DIOCGETRULE, &pr) != -1) { + if ((shownr != -1) && (shownr != pr.nr)) + continue; /* anchor is the same for all rules in it */ if (pr.rule.anchor_wildcard == 0) @@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, case PFCTL_SHOW_NOTHING: break; } + errno = 0; + } + + if ((errno != 0) && (errno != ENOENT)) { + warn("DIOCGETRULE"); + ret = -1; + goto error; } /* diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c index ebe1b912766..0f78c5b12ac 100644 --- a/sys/net/pf_ioctl.c +++ b/sys/net/pf_ioctl.c @@ -122,6 +122,10 @@ struct pf_trans*pf_find_trans(uint32_t, uint64_t); voidpf_free_trans(struct pf_trans *); voidpf_rollback_trans(struct pf_trans *); +voidpf_init_tgetrule(struct pf_trans *, + struct pf_anchor *, uint32_t, struct pf_rule *); +voidpf_cleanup_tgetrule(struct pf_trans *t); + struct pf_rule pf_default_rule, pf_default_rule_new; struct { @@ -548,7 +552,7 @@ pf_qid_unref(u_int16_t qid) } int -pf_begin_rules(u_int32_t *ticket, const char *anchor) +pf_begin_rules(u_int32_t *version, const char *anchor) { struct pf_ruleset *rs; struct pf_rule *rule; @@ -559,20 +563,20 @@ pf_begin_rules(u_int32_t *ticket, const char *anchor) pf_rm_rule(rs->rules.inactive.ptr, rule); rs->rules.inactive.rcount--; } - *ticket = ++rs->rules.inactive.ticket; + *version = ++rs->rules.inactive.version; rs->rules.inactive.open = 1; return (0); } void -pf_rollback_rules(u_int32_t ticket, char *anchor) +pf_rollback_rules(u_int32_t version, char *anchor) { struct pf_ruleset *rs; struct pf_rule *rule; rs = pf_find_ruleset(anchor); if (rs == NULL || !rs->rules.inactive.open || - rs->rules.inactive.ticket != ticket) + rs->rules.inactive.version != version) return; while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) { pf_rm_rule(rs->rules.inactive.ptr, rule); @@ -851,7 +855,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule) } int -pf_commit_rules(u_int32_t ticket, char *anchor) +pf_commit_rules(u_int32_t version, char *anchor) { struct pf_ruleset *rs; struct pf_rule *rule; @@ -860,7 +864,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor) rs = pf_find_ruleset(anchor); if (rs == NULL || !rs->rules.inactive.open || - ticket != rs->rules.inactive.ticket) + version != rs->rules.inactive.version) return (EBUSY); if (rs == &pf_main_ruleset) @@ -875,7 +879,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor) rs->rules.inactive.ptr = old_rules; rs->rules.inactive.rcount = old_rcount; - rs->rules.active.ticket = rs->rules.inactive.ticket; + rs->rules.active.version = rs->rules.inactive.version; pf_calc_skip_steps(rs->rules.active.ptr); @@ -1214,7 +1218,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) NET_LOCK(); PF_LOCK(); -
Re: DIOCGETRULE is slow for large rulesets (2/3)
Hello, On Thu, Apr 27, 2023 at 06:51:52AM +1000, David Gwynne wrote: > > is that kind of check in KASSET() something you have on your mind? > > perhaps I can trade KASSERT() to regular code: > > > > if (t->pft_unit != minor(dev)) > > return (EPERM); > > i would pass the dev/minor/unit to pf_find_trans() and compare along > with the ticket value as a matter of course. returning a different > errno if the minor is different is unecessary. > something like this? struct pf_trans * pf_find_trans(uint32_t unit, uint64_t ticket) { struct pf_trans *t; rw_assert_anylock(&pfioctl_rw); LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) { if (t->pft_ticket == ticket) break; } if (t->pft_unit != unit) return (NULL); return (t); } just return NULL on unit mismatch. updated diff is below. thanks and regards sashan 8<---8<---8<--8< diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c index 7ea22050506..ebe1b912766 100644 --- a/sys/net/pf_ioctl.c +++ b/sys/net/pf_ioctl.c @@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t); int pf_states_clr(struct pfioc_state_kill *); int pf_states_get(struct pfioc_states *); +struct pf_trans*pf_open_trans(uint32_t); +struct pf_trans*pf_find_trans(uint32_t, uint64_t); +voidpf_free_trans(struct pf_trans *); +voidpf_rollback_trans(struct pf_trans *); + struct pf_rule pf_default_rule, pf_default_rule_new; struct { @@ -168,6 +173,8 @@ int pf_rtlabel_add(struct pf_addr_wrap *); voidpf_rtlabel_remove(struct pf_addr_wrap *); voidpf_rtlabel_copyout(struct pf_addr_wrap *); +uint64_t trans_ticket = 1; +LIST_HEAD(, pf_trans) pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans); void pfattach(int num) @@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) int pfclose(dev_t dev, int flags, int fmt, struct proc *p) { + struct pf_trans *w, *s; + LIST_HEAD(, pf_trans) tmp_list; + uint32_t unit = minor(dev); + + LIST_INIT(&tmp_list); + rw_enter_write(&pfioctl_rw); + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { + if (w->pft_unit == unit) { + LIST_REMOVE(w, pft_entry); + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); + } + } + rw_exit_write(&pfioctl_rw); + + while ((w = LIST_FIRST(&tmp_list)) != NULL) { + LIST_REMOVE(w, pft_entry); + pf_free_trans(w); + } + return (0); } @@ -522,7 +548,7 @@ pf_qid_unref(u_int16_t qid) } int -pf_begin_rules(u_int32_t *version, const char *anchor) +pf_begin_rules(u_int32_t *ticket, const char *anchor) { struct pf_ruleset *rs; struct pf_rule *rule; @@ -533,20 +559,20 @@ pf_begin_rules(u_int32_t *version, const char *anchor) pf_rm_rule(rs->rules.inactive.ptr, rule); rs->rules.inactive.rcount--; } - *version = ++rs->rules.inactive.version; + *ticket = ++rs->rules.inactive.ticket; rs->rules.inactive.open = 1; return (0); } void -pf_rollback_rules(u_int32_t version, char *anchor) +pf_rollback_rules(u_int32_t ticket, char *anchor) { struct pf_ruleset *rs; struct pf_rule *rule; rs = pf_find_ruleset(anchor); if (rs == NULL || !rs->rules.inactive.open || - rs->rules.inactive.version != version) + rs->rules.inactive.ticket != ticket) return; while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) { pf_rm_rule(rs->rules.inactive.ptr, rule); @@ -825,7 +851,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule) } int -pf_commit_rules(u_int32_t version, char *anchor) +pf_commit_rules(u_int32_t ticket, char *anchor) { struct pf_ruleset *rs; struct pf_rule *rule; @@ -834,7 +860,7 @@ pf_commit_rules(u_int32_t version, char *anchor) rs = pf_find_ruleset(anchor); if (rs == NULL || !rs->rules.inactive.open || - version != rs->rules.inactive.version) + ticket != rs->rules.inactive.ticket) return (EBUSY); if (rs == &pf_main_ruleset) @@ -849,7 +875,7 @@ pf_commit_rules(u_int32_t version, char *anchor) rs->rules.inactive.ptr = old_rules; rs->rules.inactive.rcount = old_rcount; - rs->rules.active.version = rs->rules.inactive.version; + rs->rules.active.ticket = rs->rules.inactive.ticket; pf_calc_skip_steps(rs->rules.active.ptr); @@ -1142,10 +1168,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr,
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, Apr 26, 2023 at 01:47:31PM +0200, Alexandr Nedvedicky wrote: > Hello, > > > On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote: > > > fail: > > > - if (flags & FWRITE) > > > - rw_exit_write(&pfioctl_rw); > > > - else > > > - rw_exit_read(&pfioctl_rw); > > > + rw_exit_write(&pfioctl_rw); > > > > i dont think having the open mode flags affect whether you take a shared > > or exclusive lock makes sense anyway, so im happy with these bits. > > while updating my diff I've noticed pfclose() needs > same change: dropping 'flags & FWRITE' test. This is > fixed i new diff below. cool. > > > > > > int unit = minor(dev); > > > > if (unit & ((1 << CLONE_SHIFT) - 1)) > > return (ENXIO); > > > > this has some ties into the second issue, which is that we shouldn't > > use the pid as an identifier for state across syscalls like this > > in the kernel. the reason is that userland could be weird or buggy > > (or hostile) and fork in between creating a transaction and ending > > a transaction, but it will still have the fd it from from open(/dev/pf). > > instead, we should be using the whole dev_t or the minor number, > > as long as it includes the bits that the cloning infrastructure > > uses, as the identifier. > > in new diff I'm using a minor(dev) to identify transaction owner. > > > > i would also check that the process^Wminor number that created a > > transaction is the same when looking up a transaction in pf_find_trans. > > the pf_find_trans() is a dead code in diff below as it is > not being used there. Just to make sure I got things right > so I can update '3/3' diff in stack accordingly. Let's assume > snippet below comes from pfioctl() function > > int > pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) { > ... > t = pf_find_trans(ticket); > if (t == NULL) > return (ENXIO); > > KASSERT(t->pft_unit == minor(dev)); userland controls the ticket value that's passed to the ioctl, so if it deliberately passes a ticket another fd/minor owns it can trigger this assert, so i don't think this is a good idea. imagine a program that opens /dev/pf twice, opens a transaction on one fd, and then uses the ticket in the other. > is that kind of check in KASSET() something you have on your mind? > perhaps I can trade KASSERT() to regular code: > > if (t->pft_unit != minor(dev)) > return (EPERM); i would pass the dev/minor/unit to pf_find_trans() and compare along with the ticket value as a matter of course. returning a different errno if the minor is different is unecessary. > > thank you for pointers to bpf.c updated diff is below. > > regards > sashan > > 8<---8<---8<--8< > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > index 1141069dcf6..b9904c545c5 100644 > --- a/sys/net/pf_ioctl.c > +++ b/sys/net/pf_ioctl.c > @@ -117,6 +117,11 @@ void pf_qid_unref(u_int16_t); > int pf_states_clr(struct pfioc_state_kill *); > int pf_states_get(struct pfioc_states *); > > +struct pf_trans *pf_open_trans(uint32_t); > +struct pf_trans *pf_find_trans(uint64_t); > +void pf_free_trans(struct pf_trans *); > +void pf_rollback_trans(struct pf_trans *); > + > struct pf_rulepf_default_rule, pf_default_rule_new; > > struct { > @@ -168,6 +173,8 @@ intpf_rtlabel_add(struct > pf_addr_wrap *); > void pf_rtlabel_remove(struct pf_addr_wrap *); > void pf_rtlabel_copyout(struct pf_addr_wrap *); > > +uint64_t trans_ticket = 1; > +LIST_HEAD(, pf_trans)pf_ioctl_trans = > LIST_HEAD_INITIALIZER(pf_trans); > > void > pfattach(int num) > @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > int > pfclose(dev_t dev, int flags, int fmt, struct proc *p) > { > + struct pf_trans *w, *s; > + LIST_HEAD(, pf_trans) tmp_list; > + uint32_t unit = minor(dev); > + > + if (minor(dev) >= 1) > + return (ENXIO); gerhard already spotted this one. > + > + LIST_INIT(&tmp_list); > + rw_enter_write(&pfioctl_rw); > + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { > + if (w->pft_unit == unit) { > + LIST_REMOVE(w, pft_entry); > + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); > + } > + } > + rw_exit_write(&pfioctl_rw); > + > + while ((w = LIST_FIRST(&tmp_list)) != NULL) { > + LIST_REMOVE(w, pft_entry); > + pf_free_trans(w); > + } > + > return (0); > } > > @@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > flags, struct proc *p) > return (EACCES); > } > > - if (flags & FWRITE) > -
Updated virtio.4 man page with viogpu device
Based on this commit, https://marc.info/?l=openbsd-cvs&m=168201887006175&w=2 , add viogpu, a VirtIO GPU driver Updated virtio.4 man page with viogpu device diff -c virtio.4.orig virtio.4 *** virtio.4.orig Wed Apr 26 13:07:35 2023 --- virtio.4Wed Apr 26 13:10:31 2023 *** *** 42,47 --- 42,49 VirtIO disk .It Xr viocon 4 VirtIO console device + .It Xr viogpu 4 + VirtIO GPU device .It Xr viomb 4 VirtIO memory ballooning driver .It Xr viornd 4
Remove kernel lock from rtfree(9)
Route timers and route labels protected by corresponding mutexes. `ifa' uses references counting for protection. No protection required for `rt' passed to rt_mpls_clear() because only current thread owns it. ok? Index: sys/net/route.c === RCS file: /cvs/src/sys/net/route.c,v retrieving revision 1.418 diff -u -p -r1.418 route.c --- sys/net/route.c 26 Apr 2023 16:09:44 - 1.418 +++ sys/net/route.c 26 Apr 2023 20:11:16 - @@ -497,7 +497,6 @@ rtfree(struct rtentry *rt) KASSERT(!RT_ROOT(rt)); atomic_dec_int(&rttrash); - KERNEL_LOCK(); rt_timer_remove_all(rt); ifafree(rt->rt_ifa); rtlabel_unref(rt->rt_labelid); @@ -506,7 +505,6 @@ rtfree(struct rtentry *rt) #endif free(rt->rt_gateway, M_RTABLE, ROUNDUP(rt->rt_gateway->sa_len)); free(rt_key(rt), M_RTABLE, rt_key(rt)->sa_len); - KERNEL_UNLOCK(); pool_put(&rtentry_pool, rt); }
Re: pass M_CANFAIL to malloc() which use M_WAITOK but are tested for failure
On Sat, Apr 15, 2023 at 10:44:15PM +0800, Kevin Lo wrote: > On Fri, Apr 14, 2023 at 02:01:29PM +0200, Alexander Bluhm wrote: > > I think you are trying to change the kernel in the wrong direction. > > It should not fail, but handle the requests. Panic if there is a > > bug. > > > > Why do you think M_CANFAIL is a good thing at this place? > > Because M_WAITOK will not return NULl, I think adding M_CANFAIL will > keep the mallocarray call unchanged. The goal is not to handle errors by failing, but to prevent them. Better keep M_WAITOK, avoid M_CANFAIL, and remove the check. Discussion in the hackroom concluded that M_CANFAIL is for input from userland that can be unlimited large. If userland requests too much, it can fail. But it is not for normal operation of the kernel. M_NOWAIT should be used when the kernel has a good way to deal with a temporary failure, e.g. drop the packet. M_CANFAIL | M_WAITOK deals with user input that cannot be fullfilled permanently and should be reported as an error. M_WAITOK is for all other cases. If it panics, fix the underlying bug that requested unrealistic memory size. Here use number of queues which should be reasonable low and limited by the driver code. Keep M_WAITOK and remove the error check. By the way, M_DEVBUF is also wrong. It should be M_TEMP as it is freed in the same function and not stored permanently. But I wont fix that now as malloc(9) type usage is far from consistent. ok? bluhm Index: dev/pci/if_igc.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v retrieving revision 1.12 diff -u -p -r1.12 if_igc.c --- dev/pci/if_igc.c9 Mar 2023 00:13:47 - 1.12 +++ dev/pci/if_igc.c21 Apr 2023 18:25:35 - @@ -1209,9 +1209,8 @@ igc_rxrinfo(struct igc_softc *sc, struct struct rx_ring *rxr; int error, i, n = 0; - if ((ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF, - M_WAITOK | M_ZERO)) == NULL) - return ENOMEM; + ifr = mallocarray(sc->sc_nqueues, sizeof(*ifr), M_DEVBUF, + M_WAITOK | M_ZERO); for (i = 0; i < sc->sc_nqueues; i++) { rxr = &sc->rx_rings[i]; Index: dev/pci/if_ix.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v retrieving revision 1.192 diff -u -p -r1.192 if_ix.c --- dev/pci/if_ix.c 6 Feb 2023 20:27:44 - 1.192 +++ dev/pci/if_ix.c 21 Apr 2023 18:25:35 - @@ -640,9 +640,8 @@ ixgbe_rxrinfo(struct ix_softc *sc, struc u_int n = 0; if (sc->num_queues > 1) { - if ((ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF, - M_WAITOK | M_ZERO)) == NULL) - return (ENOMEM); + ifr = mallocarray(sc->num_queues, sizeof(*ifr), M_DEVBUF, + M_WAITOK | M_ZERO); } else ifr = &ifr1; Index: dev/pci/if_oce.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_oce.c,v retrieving revision 1.106 diff -u -p -r1.106 if_oce.c --- dev/pci/if_oce.c11 Mar 2022 18:00:48 - 1.106 +++ dev/pci/if_oce.c21 Apr 2023 18:25:35 - @@ -902,9 +902,8 @@ oce_rxrinfo(struct oce_softc *sc, struct u_int n = 0; if (sc->sc_nrq > 1) { - if ((ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF, - M_WAITOK | M_ZERO)) == NULL) - return (ENOMEM); + ifr = mallocarray(sc->sc_nrq, sizeof(*ifr), M_DEVBUF, + M_WAITOK | M_ZERO); } else ifr = &ifr1;
Re: Introduce `rtlabel_mtx' mutex(9) ...
On Mon, Apr 24, 2023 at 06:31:03PM +0300, Vitaliy Makkoveev wrote: > ... and use it to protect route labels storage. This time it is not > clean, which lock protects it because we holding kernel and net locks in > various combinations while accessing it. I see no reason to put kernel > lock to all the places. Also netlock could not be used, because rtfree() > which calls rtlabel_unref() has unknown netlock state within. > > This new `rtlabel_mtx' mutex(9) protects `rt_labels' list and `label' > entry dereference. Since we don't export 'rt_label' structure, I want to > keep this lock private to net/route.c. For this reason rtlabel_id2name() > now copies label string to externally passed buffer instead of returning > address of `rt_labels' list data. This is the way which rtlabel_id2sa() > already works. > > ok? I did run this though perform test. Full regress still running due to unrelated tree breakage. I don't expect fallout. OK bluhm@ > Index: sys/net/if.c > === > RCS file: /cvs/src/sys/net/if.c,v > retrieving revision 1.692 > diff -u -p -r1.692 if.c > --- sys/net/if.c 22 Apr 2023 04:39:46 - 1.692 > +++ sys/net/if.c 24 Apr 2023 15:11:33 - > @@ -2383,7 +2383,6 @@ ifioctl_get(u_long cmd, caddr_t data) > char ifrtlabelbuf[RTLABEL_LEN]; > int error = 0; > size_t bytesdone; > - const char *label; > > switch(cmd) { > case SIOCGIFCONF: > @@ -2458,9 +2457,8 @@ ifioctl_get(u_long cmd, caddr_t data) > break; > > case SIOCGIFRTLABEL: > - if (ifp->if_rtlabelid && > - (label = rtlabel_id2name(ifp->if_rtlabelid)) != NULL) { > - strlcpy(ifrtlabelbuf, label, RTLABEL_LEN); > + if (ifp->if_rtlabelid && rtlabel_id2name(ifp->if_rtlabelid, > + ifrtlabelbuf, RTLABEL_LEN) != NULL) { > error = copyoutstr(ifrtlabelbuf, ifr->ifr_data, > RTLABEL_LEN, &bytesdone); > } else > Index: sys/net/pf_ioctl.c > === > RCS file: /cvs/src/sys/net/pf_ioctl.c,v > retrieving revision 1.397 > diff -u -p -r1.397 pf_ioctl.c > --- sys/net/pf_ioctl.c6 Jan 2023 17:44:34 - 1.397 > +++ sys/net/pf_ioctl.c24 Apr 2023 15:11:33 - > @@ -491,14 +491,10 @@ pf_rtlabel_remove(struct pf_addr_wrap *a > void > pf_rtlabel_copyout(struct pf_addr_wrap *a) > { > - const char *name; > - > if (a->type == PF_ADDR_RTLABEL && a->v.rtlabel) { > - if ((name = rtlabel_id2name(a->v.rtlabel)) == NULL) > + if (rtlabel_id2name(a->v.rtlabel, a->v.rtlabelname, > + sizeof(a->v.rtlabelname)) == NULL) > strlcpy(a->v.rtlabelname, "?", > - sizeof(a->v.rtlabelname)); > - else > - strlcpy(a->v.rtlabelname, name, > sizeof(a->v.rtlabelname)); > } > } > Index: sys/net/route.c > === > RCS file: /cvs/src/sys/net/route.c,v > retrieving revision 1.416 > diff -u -p -r1.416 route.c > --- sys/net/route.c 28 Jan 2023 10:17:16 - 1.416 > +++ sys/net/route.c 24 Apr 2023 15:11:33 - > @@ -113,6 +113,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -137,6 +138,12 @@ > #include > #endif > > +/* > + * Locks used to protect struct members: > + * I immutable after creation > + * L rtlabel_mtx > + */ > + > #define ROUNDUP(a) (a>0 ? (1 + (((a) - 1) | (sizeof(long) - 1))) : > sizeof(long)) > > /* Give some jitter to hash, to avoid synchronization between routers. */ > @@ -163,13 +170,15 @@ static int rt_copysa(struct sockaddr *, > #define LABELID_MAX 5 > > struct rt_label { > - TAILQ_ENTRY(rt_label) rtl_entry; > - charrtl_name[RTLABEL_LEN]; > - u_int16_t rtl_id; > - int rtl_ref; > + TAILQ_ENTRY(rt_label) rtl_entry; /* [L] */ > + charrtl_name[RTLABEL_LEN]; /* [I] */ > + u_int16_t rtl_id; /* [I] */ > + int rtl_ref;/* [L] */ > }; > > -TAILQ_HEAD(rt_labels, rt_label) rt_labels = > TAILQ_HEAD_INITIALIZER(rt_labels); > +TAILQ_HEAD(rt_labels, rt_label) rt_labels = > +TAILQ_HEAD_INITIALIZER(rt_labels); /* [L] */ > +struct mutex rtlabel_mtx = MUTEX_INITIALIZER(IPL_NET); > > void > route_init(void) > @@ -1603,15 +1612,17 @@ u_int16_t > rtlabel_name2id(char *name) > { > struct rt_label *label, *p; > - u_int16_tnew_id = 1; > + u_int16_tnew_id = 1, id = 0; > > if (!name[0]) > return (0); > > + mtx_enter(&rtlab
Re: arpresolve reduce kernel lock
> On 26 Apr 2023, at 15:30, Hrvoje Popovski wrote: > > On 26.4.2023. 12:15, Alexander Bluhm wrote: >> On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote: >>> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > Hi, > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > lock is only needed for changing the route rt_flags. > > Of course there is a race between checking and setting rt_flags. > But the other checks of the RTF_REJECT flags were without kernel > lock before. This does not cause trouble, the worst thing that may > happen is to wait another exprire time for ARP retry. My diff does > not make it worse, reading rt_flags and rt_expire is done without > lock anyway. > > The kernel lock is needed to change rt_flags. Testing without > KERNEL_LOCK() caused crashes. > Hi, I'm interesting is the system stable with the diff below? If so, we could avoid kernel lock in the arpresolve(). >>> I could not crash it. >> I was too fast. Just after writing this mail I restarted the test. > > Hi, > > my boxes are still ok with mvs@ diff even if I'm running arp -ad in loop. > Thanks for testing. It seems rtfree(rt->rt_parent) was called twice, so I’m confusing a little with this unlocked RTF_REJECT check.
Re: acpithinkpad: do not report fans running at 65535 rpm
On Mon, Apr 24, 2023 at 02:07:11PM +, Miod Vallat wrote: > After suspending a machine with acpithinkpad(4) and resuming, the fan > senors report a value of 65535 (i.e. 0x) for a few seconds, and > then start reporting correct values. I don't see these bogus values on an intel t14 gen3 when I resume into 'systat -s.1 sens', but your diff makes sense and does not break here. OK kn > > The following diff marks the sensor as invalid when such a value is > read. > > Index: acpithinkpad.c > === > RCS file: /OpenBSD/src/sys/dev/acpi/acpithinkpad.c,v > retrieving revision 1.71 > diff -u -p -r1.71 acpithinkpad.c > --- acpithinkpad.c9 Apr 2023 17:50:02 - 1.71 > +++ acpithinkpad.c24 Apr 2023 14:05:18 - > @@ -287,7 +287,12 @@ thinkpad_sensor_refresh(void *arg) > /* Read fan RPM */ > acpiec_read(sc->sc_ec, THINKPAD_ECOFFSET_FANLO, 1, &lo); > acpiec_read(sc->sc_ec, THINKPAD_ECOFFSET_FANHI, 1, &hi); > - sc->sc_sens[THINKPAD_SENSOR_FANRPM].value = ((hi << 8L) + lo); > + if (hi == 0xff && lo == 0xff) { > + sc->sc_sens[THINKPAD_SENSOR_FANRPM].flags = SENSOR_FINVALID; > + } else { > + sc->sc_sens[THINKPAD_SENSOR_FANRPM].value = ((hi << 8L) + lo); > + sc->sc_sens[THINKPAD_SENSOR_FANRPM].flags = 0; > + } > } > > void >
Re: arpresolve reduce kernel lock
On 26.4.2023. 12:15, Alexander Bluhm wrote: > On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote: >> On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: >>> On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: Hi, Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel lock is only needed for changing the route rt_flags. Of course there is a race between checking and setting rt_flags. But the other checks of the RTF_REJECT flags were without kernel lock before. This does not cause trouble, the worst thing that may happen is to wait another exprire time for ARP retry. My diff does not make it worse, reading rt_flags and rt_expire is done without lock anyway. The kernel lock is needed to change rt_flags. Testing without KERNEL_LOCK() caused crashes. >>> Hi, >>> >>> I'm interesting is the system stable with the diff below? If so, we >>> could avoid kernel lock in the arpresolve(). >> I could not crash it. > I was too fast. Just after writing this mail I restarted the test. Hi, my boxes are still ok with mvs@ diff even if I'm running arp -ad in loop.
Re: DIOCGETRULE is slow for large rulesets (3/3)
Hello, On Wed, Apr 26, 2023 at 01:25:45AM +0200, Alexandr Nedvedicky wrote: > Hello, > > this is 3rd of three diffs to improve DIOCGETRULE ioctl operation. > the summary of changes done here is as follows: > - add new members to pf_trans structure so it can > hold data for DIOCGETRULE operation > > - DIOCGETRULES opens transaction and returns ticket/transaction id > to pfctl. pfctl uses ticket in DIOCGETRULE to identify transaction > when it retrieves the rule > > - change introduces new reference counter to pf_anchor which is used > by pf_trans to keep reference to pf_anchor object. > > - DIOCGETRULES opens transaction and stores a reference to anchor > with current ruleset version and pointer to the first rule > > - DIOCGETRULE looks up transaction, verifies version number is > same so it can safely access next rule. > > - pfctl when handling 'pfctl -sr -R n' must iterate to > n-th rule one-by-one. > updating diff to accommodate changes in diff 2/3 [1] in the stack of changes. thanks and regards sashan [1] https://marc.info/?l=openbsd-tech&m=168251129621089&w=2 8<---8<---8<--8< diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c index 8cbd9d77b4f..3787e97a6b1 100644 --- a/sbin/pfctl/pfctl.c +++ b/sbin/pfctl/pfctl.c @@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, char *anchorname, int depth, int wildcard, long shownr) { struct pfioc_rule pr; - u_int32_t nr, mnr, header = 0; + u_int32_t header = 0; int len = strlen(path), ret = 0; char *npath, *p; @@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, goto error; } - if (shownr < 0) { - mnr = pr.nr; - nr = 0; - } else if (shownr < pr.nr) { - nr = shownr; - mnr = shownr + 1; - } else { - warnx("rule %ld not found", shownr); - ret = -1; - goto error; - } - for (; nr < mnr; ++nr) { - pr.nr = nr; - if (ioctl(dev, DIOCGETRULE, &pr) == -1) { - warn("DIOCGETRULE"); - ret = -1; - goto error; - } + while (ioctl(dev, DIOCGETRULE, &pr) != -1) { + if ((shownr != -1) && (shownr != pr.nr)) + continue; /* anchor is the same for all rules in it */ if (pr.rule.anchor_wildcard == 0) @@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, case PFCTL_SHOW_NOTHING: break; } + errno = 0; + } + + if ((errno != 0) && (errno != ENOENT)) { + warn("DIOCGETRULE"); + ret = -1; + goto error; } /* diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c index 8ce0af53a89..d294a8a3b3c 100644 --- a/sys/net/pf_ioctl.c +++ b/sys/net/pf_ioctl.c @@ -122,6 +122,10 @@ struct pf_trans*pf_find_trans(uint64_t); voidpf_free_trans(struct pf_trans *); voidpf_rollback_trans(struct pf_trans *); +voidpf_init_tgetrule(struct pf_trans *, + struct pf_anchor *, uint32_t, struct pf_rule *); +voidpf_cleanup_tgetrule(struct pf_trans *t); + struct pf_rule pf_default_rule, pf_default_rule_new; struct { @@ -1470,7 +1474,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) case DIOCGETRULES: { struct pfioc_rule *pr = (struct pfioc_rule *)addr; struct pf_ruleset *ruleset; - struct pf_rule *tail; + struct pf_rule *rule; + struct pf_trans *t; + u_int32_truleset_version; NET_LOCK(); PF_LOCK(); @@ -1482,14 +1488,21 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) NET_UNLOCK(); goto fail; } - tail = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue); - if (tail) - pr->nr = tail->nr + 1; + rule = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue); + if (rule) + pr->nr = rule->nr + 1; else pr->nr = 0; - pr->ticket = ruleset->rules.active.version; + ruleset_version = ruleset->rules.active.version; + pf_anchor_take(ruleset->anchor); + rule = TAILQ_FIRST(ruleset->rules.active.ptr); PF_UNLOCK(); NET_UNLO
Re: DIOCGETRULE is slow for large rulesets (2/3)
Hello, On Wed, Apr 26, 2023 at 11:51:26AM +, Gerhard Roth wrote: > On Wed, 2023-04-26 at 13:47 +0200, Alexandr Nedvedicky wrote: > > @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > > ??int > > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p) > > ??{ > > +??struct pf_trans *w, *s; > > +??LIST_HEAD(, pf_trans)??tmp_list; > > +??uint32_t unit = minor(dev); > > + > > +??if (minor(dev) >= 1) > > +??return (ENXIO); > > If you want to use the minor dev-id to release the transaction, > the above two lines should go away. > > yes, you are right. thanks for spotting that. updated diff is below. regards sashan 8<---8<---8<--8< diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c index 1141069dcf6..74af2b74ecb 100644 --- a/sys/net/pf_ioctl.c +++ b/sys/net/pf_ioctl.c @@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t); int pf_states_clr(struct pfioc_state_kill *); int pf_states_get(struct pfioc_states *); +struct pf_trans*pf_open_trans(uint32_t); +struct pf_trans*pf_find_trans(uint64_t); +voidpf_free_trans(struct pf_trans *); +voidpf_rollback_trans(struct pf_trans *); + struct pf_rule pf_default_rule, pf_default_rule_new; struct { @@ -168,6 +173,8 @@ int pf_rtlabel_add(struct pf_addr_wrap *); voidpf_rtlabel_remove(struct pf_addr_wrap *); voidpf_rtlabel_copyout(struct pf_addr_wrap *); +uint64_t trans_ticket = 1; +LIST_HEAD(, pf_trans) pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans); void pfattach(int num) @@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) int pfclose(dev_t dev, int flags, int fmt, struct proc *p) { + struct pf_trans *w, *s; + LIST_HEAD(, pf_trans) tmp_list; + uint32_t unit = minor(dev); + + LIST_INIT(&tmp_list); + rw_enter_write(&pfioctl_rw); + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { + if (w->pft_unit == unit) { + LIST_REMOVE(w, pft_entry); + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); + } + } + rw_exit_write(&pfioctl_rw); + + while ((w = LIST_FIRST(&tmp_list)) != NULL) { + LIST_REMOVE(w, pft_entry); + pf_free_trans(w); + } + return (0); } @@ -1142,10 +1168,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) return (EACCES); } - if (flags & FWRITE) - rw_enter_write(&pfioctl_rw); - else - rw_enter_read(&pfioctl_rw); + rw_enter_write(&pfioctl_rw); switch (cmd) { @@ -3022,10 +3045,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) break; } fail: - if (flags & FWRITE) - rw_exit_write(&pfioctl_rw); - else - rw_exit_read(&pfioctl_rw); + rw_exit_write(&pfioctl_rw); return (error); } @@ -3244,3 +3264,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, size_t newlen) return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs)); } + +struct pf_trans * +pf_open_trans(uint32_t unit) +{ + static uint64_t ticket = 1; + struct pf_trans *t; + + rw_assert_wrlock(&pfioctl_rw); + + t = malloc(sizeof(*t), M_TEMP, M_WAITOK); + memset(t, 0, sizeof(struct pf_trans)); + t->pft_unit = unit; + t->pft_ticket = ticket++; + + LIST_INSERT_HEAD(&pf_ioctl_trans, t, pft_entry); + + return (t); +} + +struct pf_trans * +pf_find_trans(uint64_t ticket) +{ + struct pf_trans *t; + + rw_assert_anylock(&pfioctl_rw); + + LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) { + if (t->pft_ticket == ticket) + break; + } + + return (t); +} + +void +pf_free_trans(struct pf_trans *t) +{ + free(t, M_TEMP, sizeof(*t)); +} + +void +pf_rollback_trans(struct pf_trans *t) +{ + if (t != NULL) { + rw_assert_wrlock(&pfioctl_rw); + LIST_REMOVE(t, pft_entry); + pf_free_trans(t); + } +} diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h index 38fff6473aa..5af2027733a 100644 --- a/sys/net/pfvar_priv.h +++ b/sys/net/pfvar_priv.h @@ -322,6 +322,17 @@ enum { extern struct cpumem *pf_anchor_stack; +enum pf_trans_type { + PF_TRANS_NONE, + PF_TRANS_MAX +}; + +struct pf_trans { + LIST_ENTRY(pf_trans)pft_entry; + uint32_tpft_unit; /* process id */ + uint64_tpft_ticket; + enum pf_trans_type pft_type; +}; extern struct task pf_purge_task; e
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, 2023-04-26 at 13:47 +0200, Alexandr Nedvedicky wrote: > Hello, > > > On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote: > > > fail: > > > - if (flags & FWRITE) > > > - rw_exit_write(&pfioctl_rw); > > > - else > > > - rw_exit_read(&pfioctl_rw); > > > + rw_exit_write(&pfioctl_rw); > > > > i dont think having the open mode flags affect whether you take a shared > > or exclusive lock makes sense anyway, so im happy with these bits. > > while updating my diff I've noticed pfclose() needs > same change: dropping 'flags & FWRITE' test. This is > fixed i new diff below. > > > > > > int unit = minor(dev); > > > > if (unit & ((1 << CLONE_SHIFT) - 1)) > > return (ENXIO); > > > > this has some ties into the second issue, which is that we shouldn't > > use the pid as an identifier for state across syscalls like this > > in the kernel. the reason is that userland could be weird or buggy > > (or hostile) and fork in between creating a transaction and ending > > a transaction, but it will still have the fd it from from open(/dev/pf). > > instead, we should be using the whole dev_t or the minor number, > > as long as it includes the bits that the cloning infrastructure > > uses, as the identifier. > > in new diff I'm using a minor(dev) to identify transaction owner. > > > > i would also check that the process^Wminor number that created a > > transaction is the same when looking up a transaction in pf_find_trans. > > the pf_find_trans() is a dead code in diff below as it is > not being used there. Just to make sure I got things right > so I can update '3/3' diff in stack accordingly. Let's assume > snippet below comes from pfioctl() function > > int > > > > pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) { > ... > t = pf_find_trans(ticket); > if (t == NULL) > return (ENXIO); > > KASSERT(t->pft_unit == minor(dev)); > > is that kind of check in KASSET() something you have on your mind? > perhaps I can trade KASSERT() to regular code: > > if (t->pft_unit != minor(dev)) > return (EPERM); > > > thank you for pointers to bpf.c updated diff is below. > > regards > sashan > > 8<---8<---8<--8< > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > index 1141069dcf6..b9904c545c5 100644 > --- a/sys/net/pf_ioctl.c > +++ b/sys/net/pf_ioctl.c > @@ -117,6 +117,11 @@ void pf_qid_unref(u_int16_t); > int pf_states_clr(struct pfioc_state_kill *); > int pf_states_get(struct pfioc_states *); > > +struct pf_trans*pf_open_trans(uint32_t); > +struct pf_trans*pf_find_trans(uint64_t); > +void pf_free_trans(struct pf_trans *); > +void pf_rollback_trans(struct pf_trans *); > + > struct pf_rule pf_default_rule, pf_default_rule_new; > > struct { > @@ -168,6 +173,8 @@ int pf_rtlabel_add(struct pf_addr_wrap > *); > void pf_rtlabel_remove(struct pf_addr_wrap *); > void pf_rtlabel_copyout(struct pf_addr_wrap *); > > +uint64_t trans_ticket = 1; > +LIST_HEAD(, pf_trans) pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans); > > void > pfattach(int num) > @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > int > pfclose(dev_t dev, int flags, int fmt, struct proc *p) > { > + struct pf_trans *w, *s; > + LIST_HEAD(, pf_trans) tmp_list; > + uint32_t unit = minor(dev); > + > + if (minor(dev) >= 1) > + return (ENXIO); If you want to use the minor dev-id to release the transaction, the above two lines should go away. > + > + LIST_INIT(&tmp_list); > + rw_enter_write(&pfioctl_rw); > + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { > + if (w->pft_unit == unit) { > + LIST_REMOVE(w, pft_entry); > + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); > + } > + } > + rw_exit_write(&pfioctl_rw); > + > + while ((w = LIST_FIRST(&tmp_list)) != NULL) { > + LIST_REMOVE(w, pft_entry); > + pf_free_trans(w); > + } > + > return (0); > } > > @@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > flags, struct proc *p) > return (EACCES); > } > > - if (flags & FWRITE) > - rw_enter_write(&pfioctl_rw); > - else > - rw_enter_read(&pfioctl_rw); >
Re: DIOCGETRULE is slow for large rulesets (2/3)
Hello, On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote: > > fail: > > - if (flags & FWRITE) > > - rw_exit_write(&pfioctl_rw); > > - else > > - rw_exit_read(&pfioctl_rw); > > + rw_exit_write(&pfioctl_rw); > > i dont think having the open mode flags affect whether you take a shared > or exclusive lock makes sense anyway, so im happy with these bits. while updating my diff I've noticed pfclose() needs same change: dropping 'flags & FWRITE' test. This is fixed i new diff below. > > int unit = minor(dev); > > if (unit & ((1 << CLONE_SHIFT) - 1)) > return (ENXIO); > > this has some ties into the second issue, which is that we shouldn't > use the pid as an identifier for state across syscalls like this > in the kernel. the reason is that userland could be weird or buggy > (or hostile) and fork in between creating a transaction and ending > a transaction, but it will still have the fd it from from open(/dev/pf). > instead, we should be using the whole dev_t or the minor number, > as long as it includes the bits that the cloning infrastructure > uses, as the identifier. in new diff I'm using a minor(dev) to identify transaction owner. > > i would also check that the process^Wminor number that created a > transaction is the same when looking up a transaction in pf_find_trans. the pf_find_trans() is a dead code in diff below as it is not being used there. Just to make sure I got things right so I can update '3/3' diff in stack accordingly. Let's assume snippet below comes from pfioctl() function int pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) { ... t = pf_find_trans(ticket); if (t == NULL) return (ENXIO); KASSERT(t->pft_unit == minor(dev)); is that kind of check in KASSET() something you have on your mind? perhaps I can trade KASSERT() to regular code: if (t->pft_unit != minor(dev)) return (EPERM); thank you for pointers to bpf.c updated diff is below. regards sashan 8<---8<---8<--8< diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c index 1141069dcf6..b9904c545c5 100644 --- a/sys/net/pf_ioctl.c +++ b/sys/net/pf_ioctl.c @@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t); int pf_states_clr(struct pfioc_state_kill *); int pf_states_get(struct pfioc_states *); +struct pf_trans*pf_open_trans(uint32_t); +struct pf_trans*pf_find_trans(uint64_t); +voidpf_free_trans(struct pf_trans *); +voidpf_rollback_trans(struct pf_trans *); + struct pf_rule pf_default_rule, pf_default_rule_new; struct { @@ -168,6 +173,8 @@ int pf_rtlabel_add(struct pf_addr_wrap *); voidpf_rtlabel_remove(struct pf_addr_wrap *); voidpf_rtlabel_copyout(struct pf_addr_wrap *); +uint64_t trans_ticket = 1; +LIST_HEAD(, pf_trans) pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans); void pfattach(int num) @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) int pfclose(dev_t dev, int flags, int fmt, struct proc *p) { + struct pf_trans *w, *s; + LIST_HEAD(, pf_trans) tmp_list; + uint32_t unit = minor(dev); + + if (minor(dev) >= 1) + return (ENXIO); + + LIST_INIT(&tmp_list); + rw_enter_write(&pfioctl_rw); + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { + if (w->pft_unit == unit) { + LIST_REMOVE(w, pft_entry); + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); + } + } + rw_exit_write(&pfioctl_rw); + + while ((w = LIST_FIRST(&tmp_list)) != NULL) { + LIST_REMOVE(w, pft_entry); + pf_free_trans(w); + } + return (0); } @@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) return (EACCES); } - if (flags & FWRITE) - rw_enter_write(&pfioctl_rw); - else - rw_enter_read(&pfioctl_rw); + rw_enter_write(&pfioctl_rw); switch (cmd) { @@ -3022,10 +3048,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) break; } fail: - if (flags & FWRITE) - rw_exit_write(&pfioctl_rw); - else - rw_exit_read(&pfioctl_rw); + rw_exit_write(&pfioctl_rw); return (error); } @@ -3244,3 +3267,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *
Re: arpresolve reduce kernel lock
On Wed, Apr 26, 2023 at 11:17:32AM +0200, Alexander Bluhm wrote: > On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: > > On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > > > Hi, > > > > > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > > > lock is only needed for changing the route rt_flags. > > > > > > Of course there is a race between checking and setting rt_flags. > > > But the other checks of the RTF_REJECT flags were without kernel > > > lock before. This does not cause trouble, the worst thing that may > > > happen is to wait another exprire time for ARP retry. My diff does > > > not make it worse, reading rt_flags and rt_expire is done without > > > lock anyway. > > > > > > The kernel lock is needed to change rt_flags. Testing without > > > KERNEL_LOCK() caused crashes. > > > > > > > Hi, > > > > I'm interesting is the system stable with the diff below? If so, we > > could avoid kernel lock in the arpresolve(). > > I could not crash it. I was too fast. Just after writing this mail I restarted the test. [0] 0:arp- 1:ksh*"ot31.obsd-lab.genua.d" 12:00 26-Apr-23ESC[mESC(BESC[23;18Hpanic: pool_do_get: art_node free list modified: page 0xfd8747128000; item addr 0xfd8747128410; offset 0x0=0x182f4660f2a7188a != 0x182f4660f2a71889 Stopped at db_enter+0x14: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 45805 80626 0 0x14000 0x2003 reaper 353816 99629 0 0x14000 0x2001 softnet 487701 10647 0 0x14000 0x2002 softnet 152789 43620 0 0x14000 0x2007 softnet *356742 68683 0 0x14000 0x2005 softnet db_enter() at db_enter+0x14 panic(8213f5d0) at panic+0xc3 pool_do_get(824ae060,a,8000247b71b4) at pool_do_get+0x321 pool_get(824ae060,a) at pool_get+0x9a art_get(827ceac0,20) at art_get+0x30 rtable_insert(0,827ceac0,0,8000247b72f0,3,fd8745e4a948) at rtab le_insert+0x1a2 rtrequest(b,8000247b73f8,3,8000247b7498,0) at rtrequest+0x613 rt_clone(8000247b7500,8000247b7558,0) at rt_clone+0x77 rtalloc_mpath(8000247b7558,fd800369aad8,0) at rtalloc_mpath+0x50 in_ouraddr(fd80a94fcd00,8077e048,8000247b75d8) at in_ouraddr+0x 88 ip_input_if(8000247b7678,8000247b7684,4,0,8077e048) at ip_input ipv4_input(8077e048,fd80a94fcd00) at ipv4_input+0x3d ether_input(8077e048,fd80a94fcd00) at ether_input+0x3b5 if_input_process(8077e048,8000247b7768) at if_input_process+0x6f end trace frame: 0x8000247b77b0, count: 0 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{5}> show panic *cpu5: pool_do_get: art_node free list modified: page 0xfd8747128000; item a ddr 0xfd8747128410; offset 0x0=0x182f4660f2a7188a != 0x182f4660f2a71889 ddb{5}> show register rdi0 rsi 0x14 rbp 0x8000247b7060 rbx0 rdx 0xc000 rcx0x286 rax 0x9c r8 0x101010101010101 r9 0 r10 0xcdd678d0954ec026 r11 0x92611e3f4c85263e r12 0x80002252e990 r130 r140 r15 0x8213f5d0cy_pio_rec+0x1ea86 rip 0x81b0f124db_enter+0x14 cs 0x8 rflags 0x282 rsp 0x8000247b7060 ss 0x10 db_enter+0x14: popq%rbp ddb{5}> trace db_enter() at db_enter+0x14 panic(8213f5d0) at panic+0xc3 pool_do_get(824ae060,a,8000247b71b4) at pool_do_get+0x321 pool_get(824ae060,a) at pool_get+0x9a art_get(827ceac0,20) at art_get+0x30 rtable_insert(0,827ceac0,0,8000247b72f0,3,fd8745e4a948) at rtab le_insert+0x1a2 rtrequest(b,8000247b73f8,3,8000247b7498,0) at rtrequest+0x613 rt_clone(8000247b7500,8000247b7558,0) at rt_clone+0x77 rtalloc_mpath(8000247b7558,fd800369aad8,0) at rtalloc_mpath+0x50 in_ouraddr(fd80a94fcd00,8077e048,8000247b75d8) at in_ouraddr+0x 88 ip_input_if(8000247b7678,8000247b7684,4,0,8077e048) at ip_input _if+0x1f0 ipv4_input(8077e048,fd80a94fcd00) at ipv4_input+0x3d ether_input(8077e048,fd80a94fcd00) at ether_input+0x3b5 if_input_process(8077e048,8000247b7768) at if_input_process+0x6f ifiq_process(80782400) at ifiq_process+0x75 taskq_thread(80036000) at taskq_thread+0x100 end trace frame: 0x0, count: -16 ddb{5}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 3208 21
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, Apr 26, 2023 at 09:49:18AM +, Gerhard Roth wrote: > On Wed, 2023-04-26 at 19:42 +1000, David Gwynne wrote: > > On Wed, Apr 26, 2023 at 07:48:18AM +, Gerhard Roth wrote: > > > On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote: > > > > Hello, > > > > > > > > This is the second diff. It introduces a transaction (pf_trans). > > > > It's more or less diff with dead code. > > > > > > > > It's still worth to note those two chunks in this diff: > > > > > > > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > > > > flags, struct proc *p) > > > > return (EACCES); > > > > } > > > > ?? > > > > -??if (flags & FWRITE) > > > > -??rw_enter_write(&pfioctl_rw); > > > > -??else > > > > -??rw_enter_read(&pfioctl_rw); > > > > +??rw_enter_write(&pfioctl_rw); > > > > ?? > > > > switch (cmd) { > > > > ?? > > > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > > > > flags, struct proc *p) > > > > break; > > > > } > > > > ??fail: > > > > -??if (flags & FWRITE) > > > > -??rw_exit_write(&pfioctl_rw); > > > > -??else > > > > -??rw_exit_read(&pfioctl_rw); > > > > +??rw_exit_write(&pfioctl_rw); > > > > ?? > > > > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4). > > > > I'd like to also this lock to serialize access to table of transactions. > > > > To keep things simple for now I'd like to make every process to perform > > > > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll > > > > be pushing operation on pfioctl_rw further down to individual ioctl > > > > operations. > > > > > > > > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans() > > > > introduced in this diff are unused. They will be brought alive in > > > > the 3rd diff. > > > > > > > > thanks and > > > > regards > > > > sashan > > > > > > > > 8<---8<---8<--8< > > > > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > > > > index 7ea22050506..7e4c7d2a2ab 100644 > > > > --- a/sys/net/pf_ioctl.c > > > > +++ b/sys/net/pf_ioctl.c > > > > @@ -117,6 +117,11 @@ void?? > > > > pf_qid_unref(u_int16_t); > > > > ??int pf_states_clr(struct > > > > pfioc_state_kill *); > > > > ??int pf_states_get(struct > > > > pfioc_states *); > > > > ?? > > > > +struct pf_trans*pf_open_trans(pid_t); > > > > +struct > > > > pf_trans*pf_find_trans(uint64_t); > > > > +void?? pf_free_trans(struct > > > > pf_trans *); > > > > +void?? pf_rollback_trans(struct > > > > pf_trans *); > > > > + > > > > ??struct pf_rule?? pf_default_rule, pf_default_rule_new; > > > > ?? > > > > ??struct { > > > > @@ -168,6 +173,8 @@ int?? > > > > pf_rtlabel_add(struct pf_addr_wrap *); > > > > ??void?? pf_rtlabel_remove(struct > > > > pf_addr_wrap *); > > > > ??void?? pf_rtlabel_copyout(struct > > > > pf_addr_wrap *); > > > > ?? > > > > +uint64_t trans_ticket = 1; > > > > +LIST_HEAD(, pf_trans)pf_ioctl_trans = > > > > LIST_HEAD_INITIALIZER(pf_trans); > > > > ?? > > > > ??void > > > > ??pfattach(int num) > > > > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc > > > > *p) > > > > ??int > > > > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p) > > > > ??{ > > > > +??struct pf_trans *w, *s; > > > > +??LIST_HEAD(, pf_trans)??tmp_list; > > > > + > > > > +??if (minor(dev) >= 1) > > > > +??return (ENXIO); > > > > + > > > > +??if (flags & FWRITE) { > > > > +??LIST_INIT(&tmp_list); > > > > +??rw_enter_write(&pfioctl_rw); > > > > +??LIST_FOREACH_SAFE(w, &pf_ioctl_trans, > > > > pft_entry, s) { > > > > +??if (w->pft_pid == > > > > p->p_p->ps_pid) { > > > > +??LIST_REMOVE(w, > > > > pft_entry); > > > > +??LIST_INSERT_HEAD(&tmp_list, > > > > w, pft_entry); > > > > +??} > > > > +??} > > > > +??rw_exit_write(&pfioctl_rw); > > > > + > > > > +??while ((w = LIST_FIR
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, 2023-04-26 at 19:42 +1000, David Gwynne wrote: > On Wed, Apr 26, 2023 at 07:48:18AM +, Gerhard Roth wrote: > > On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote: > > > Hello, > > > > > > This is the second diff. It introduces a transaction (pf_trans). > > > It's more or less diff with dead code. > > > > > > It's still worth to note those two chunks in this diff: > > > > > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > > > flags, struct proc *p) > > > return (EACCES); > > > } > > > ?? > > > -??if (flags & FWRITE) > > > -??rw_enter_write(&pfioctl_rw); > > > -??else > > > -??rw_enter_read(&pfioctl_rw); > > > +??rw_enter_write(&pfioctl_rw); > > > ?? > > > switch (cmd) { > > > ?? > > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > > > flags, struct proc *p) > > > break; > > > } > > > ??fail: > > > -??if (flags & FWRITE) > > > -??rw_exit_write(&pfioctl_rw); > > > -??else > > > -??rw_exit_read(&pfioctl_rw); > > > +??rw_exit_write(&pfioctl_rw); > > > ?? > > > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4). > > > I'd like to also this lock to serialize access to table of transactions. > > > To keep things simple for now I'd like to make every process to perform > > > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll > > > be pushing operation on pfioctl_rw further down to individual ioctl > > > operations. > > > > > > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans() > > > introduced in this diff are unused. They will be brought alive in > > > the 3rd diff. > > > > > > thanks and > > > regards > > > sashan > > > > > > 8<---8<---8<--8< > > > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > > > index 7ea22050506..7e4c7d2a2ab 100644 > > > --- a/sys/net/pf_ioctl.c > > > +++ b/sys/net/pf_ioctl.c > > > @@ -117,6 +117,11 @@ void?? > > > pf_qid_unref(u_int16_t); > > > ??int pf_states_clr(struct > > > pfioc_state_kill *); > > > ??int pf_states_get(struct > > > pfioc_states *); > > > ?? > > > +struct pf_trans*pf_open_trans(pid_t); > > > +struct pf_trans*pf_find_trans(uint64_t); > > > +void?? pf_free_trans(struct pf_trans > > > *); > > > +void?? pf_rollback_trans(struct > > > pf_trans *); > > > + > > > ??struct pf_rule?? pf_default_rule, pf_default_rule_new; > > > ?? > > > ??struct { > > > @@ -168,6 +173,8 @@ int?? > > > pf_rtlabel_add(struct pf_addr_wrap *); > > > ??void?? pf_rtlabel_remove(struct > > > pf_addr_wrap *); > > > ??void?? pf_rtlabel_copyout(struct > > > pf_addr_wrap *); > > > ?? > > > +uint64_t trans_ticket = 1; > > > +LIST_HEAD(, pf_trans)pf_ioctl_trans = > > > LIST_HEAD_INITIALIZER(pf_trans); > > > ?? > > > ??void > > > ??pfattach(int num) > > > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > > > ??int > > > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p) > > > ??{ > > > +??struct pf_trans *w, *s; > > > +??LIST_HEAD(, pf_trans)??tmp_list; > > > + > > > +??if (minor(dev) >= 1) > > > +??return (ENXIO); > > > + > > > +??if (flags & FWRITE) { > > > +??LIST_INIT(&tmp_list); > > > +??rw_enter_write(&pfioctl_rw); > > > +??LIST_FOREACH_SAFE(w, &pf_ioctl_trans, > > > pft_entry, s) { > > > +??if (w->pft_pid == > > > p->p_p->ps_pid) { > > > +??LIST_REMOVE(w, > > > pft_entry); > > > +??LIST_INSERT_HEAD(&tmp_list, > > > w, pft_entry); > > > +??} > > > +??} > > > +??rw_exit_write(&pfioctl_rw); > > > + > > > +??while ((w = LIST_FIRST(&tmp_list)) != > > > NULL) { > > > +??LIST_REMOVE(w, pft_entry); > > > +??pf_free_trans(w); > > > +??} > > > +??} > > > + > > > return (0); > > > ??} >
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, Apr 26, 2023 at 07:48:18AM +, Gerhard Roth wrote: > On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote: > > Hello, > > > > This is the second diff. It introduces a transaction (pf_trans). > > It's more or less diff with dead code. > > > > It's still worth to note those two chunks in this diff: > > > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > > flags, struct proc *p) > > return (EACCES); > > } > > ?? > > -??if (flags & FWRITE) > > -??rw_enter_write(&pfioctl_rw); > > -??else > > -??rw_enter_read(&pfioctl_rw); > > +??rw_enter_write(&pfioctl_rw); > > ?? > > switch (cmd) { > > ?? > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > > flags, struct proc *p) > > break; > > } > > ??fail: > > -??if (flags & FWRITE) > > -??rw_exit_write(&pfioctl_rw); > > -??else > > -??rw_exit_read(&pfioctl_rw); > > +??rw_exit_write(&pfioctl_rw); > > ?? > > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4). > > I'd like to also this lock to serialize access to table of transactions. > > To keep things simple for now I'd like to make every process to perform > > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll > > be pushing operation on pfioctl_rw further down to individual ioctl > > operations. > > > > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans() > > introduced in this diff are unused. They will be brought alive in > > the 3rd diff. > > > > thanks and > > regards > > sashan > > > > 8<---8<---8<--8< > > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > > index 7ea22050506..7e4c7d2a2ab 100644 > > --- a/sys/net/pf_ioctl.c > > +++ b/sys/net/pf_ioctl.c > > @@ -117,6 +117,11 @@ void?? > > pf_qid_unref(u_int16_t); > > ??int pf_states_clr(struct > > pfioc_state_kill *); > > ??int pf_states_get(struct > > pfioc_states *); > > ?? > > +struct pf_trans*pf_open_trans(pid_t); > > +struct pf_trans*pf_find_trans(uint64_t); > > +void?? pf_free_trans(struct pf_trans > > *); > > +void?? pf_rollback_trans(struct > > pf_trans *); > > + > > ??struct pf_rule?? pf_default_rule, pf_default_rule_new; > > ?? > > ??struct { > > @@ -168,6 +173,8 @@ int?? > > pf_rtlabel_add(struct pf_addr_wrap *); > > ??void?? pf_rtlabel_remove(struct > > pf_addr_wrap *); > > ??void?? pf_rtlabel_copyout(struct > > pf_addr_wrap *); > > ?? > > +uint64_t trans_ticket = 1; > > +LIST_HEAD(, pf_trans)pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans); > > ?? > > ??void > > ??pfattach(int num) > > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > > ??int > > ??pfclose(dev_t dev, int flags, int fmt, struct proc *p) > > ??{ > > +??struct pf_trans *w, *s; > > +??LIST_HEAD(, pf_trans)??tmp_list; > > + > > +??if (minor(dev) >= 1) > > +??return (ENXIO); > > + > > +??if (flags & FWRITE) { > > +??LIST_INIT(&tmp_list); > > +??rw_enter_write(&pfioctl_rw); > > +??LIST_FOREACH_SAFE(w, &pf_ioctl_trans, > > pft_entry, s) { > > +??if (w->pft_pid == > > p->p_p->ps_pid) { > > +??LIST_REMOVE(w, > > pft_entry); > > +??LIST_INSERT_HEAD(&tmp_list, > > w, pft_entry); > > +??} > > +??} > > +??rw_exit_write(&pfioctl_rw); > > + > > +??while ((w = LIST_FIRST(&tmp_list)) != NULL) { > > +??LIST_REMOVE(w, pft_entry); > > +??pf_free_trans(w); > > +??} > > +??} > > + > > return (0); > > ??} > > Kernel close() routines don't work the same way as user-land close() ones do. > pfclose() is called only once when the last reference to /dev/pf goes away. > There is no way you can keep track of your transactions like that. we made /dev/pf clonable in src/sys/sys/conf.h r1.159 so we could do t
Re: arpresolve reduce kernel lock
On Tue, Apr 25, 2023 at 11:57:09PM +0300, Vitaliy Makkoveev wrote: > On Tue, Apr 25, 2023 at 11:44:34AM +0200, Alexander Bluhm wrote: > > Hi, > > > > Mutex arp_mtx protects the llinfo_arp la_... fields. So kernel > > lock is only needed for changing the route rt_flags. > > > > Of course there is a race between checking and setting rt_flags. > > But the other checks of the RTF_REJECT flags were without kernel > > lock before. This does not cause trouble, the worst thing that may > > happen is to wait another exprire time for ARP retry. My diff does > > not make it worse, reading rt_flags and rt_expire is done without > > lock anyway. > > > > The kernel lock is needed to change rt_flags. Testing without > > KERNEL_LOCK() caused crashes. > > > > Hi, > > I'm interesting is the system stable with the diff below? If so, we > could avoid kernel lock in the arpresolve(). I could not crash it. Basically I run while :; do arp -nd 10.6.31.33; arp -nd 10.6.16.36; done on the router while forwarding between the two addresses. So ARP routes are constantly created and deleted. But I think the diff is not correct. While an update where rt_flags is changed with kernel lock, cannot sneak into this atomic operation, the ARP code can still corrupt rt_flags changes holding the kernel lock. Of course my stress test only triggers on side of the picture, the problem is not exploited. So if you just wanted to know, whether this kind of atomic operation works, I would say yes. To make it correct all rt_flags updates would have to be changed in that way. I doubt that this is the way to go. Maybe we could split rt_flags in multiple fields each with its own locking strategy. bluhm > Index: sys/netinet/if_ether.c > === > RCS file: /cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.263 > diff -u -p -r1.263 if_ether.c > --- sys/netinet/if_ether.c25 Apr 2023 16:24:25 - 1.263 > +++ sys/netinet/if_ether.c25 Apr 2023 20:55:08 - > @@ -462,14 +462,20 @@ arpresolve(struct ifnet *ifp, struct rte > mtx_leave(&arp_mtx); > > if (reject == RTF_REJECT && !ISSET(rt->rt_flags, RTF_REJECT)) { > - KERNEL_LOCK(); > - SET(rt->rt_flags, RTF_REJECT); > - KERNEL_UNLOCK(); > + u_int flags; > + > + do { > + flags = rt->rt_flags; > + } while (atomic_cas_uint(&rt->rt_flags, flags, > + flags | RTF_REJECT) != flags); > } > if (reject == ~RTF_REJECT && ISSET(rt->rt_flags, RTF_REJECT)) { > - KERNEL_LOCK(); > - CLR(rt->rt_flags, RTF_REJECT); > - KERNEL_UNLOCK(); > + u_int flags; > + > + do { > + flags = rt->rt_flags; > + } while (atomic_cas_uint(&rt->rt_flags, flags, > + flags & ~RTF_REJECT) != flags); > } > if (refresh) > arprequest(ifp, &satosin(rt->rt_ifa->ifa_addr)->sin_addr.s_addr,
Re: DIOCGETRULE is slow for large rulesets (2/3)
On Wed, 2023-04-26 at 00:39 +0200, Alexandr Nedvedicky wrote: > Hello, > > This is the second diff. It introduces a transaction (pf_trans). > It's more or less diff with dead code. > > It's still worth to note those two chunks in this diff: > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > flags, struct proc *p) > return (EACCES); > } > > - if (flags & FWRITE) > - rw_enter_write(&pfioctl_rw); > - else > - rw_enter_read(&pfioctl_rw); > + rw_enter_write(&pfioctl_rw); > > switch (cmd) { > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > flags, struct proc *p) > break; > } > fail: > - if (flags & FWRITE) > - rw_exit_write(&pfioctl_rw); > - else > - rw_exit_read(&pfioctl_rw); > + rw_exit_write(&pfioctl_rw); > > `pfioctl_rw` serializes processes which perform ioctl(2) to pf(4). > I'd like to also this lock to serialize access to table of transactions. > To keep things simple for now I'd like to make every process to perform > ioctl(2) on pf(4) exclusively. I plan to revisit this later when I'll > be pushing operation on pfioctl_rw further down to individual ioctl > operations. > > the operations pf_open_trans(), pf_find_trans(), pf_rollback_trans() > introduced in this diff are unused. They will be brought alive in > the 3rd diff. > > thanks and > regards > sashan > > 8<---8<---8<--8< > diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c > index 7ea22050506..7e4c7d2a2ab 100644 > --- a/sys/net/pf_ioctl.c > +++ b/sys/net/pf_ioctl.c > @@ -117,6 +117,11 @@ void pf_qid_unref(u_int16_t); > int pf_states_clr(struct pfioc_state_kill *); > int pf_states_get(struct pfioc_states *); > > +struct pf_trans*pf_open_trans(pid_t); > +struct pf_trans*pf_find_trans(uint64_t); > +void pf_free_trans(struct pf_trans *); > +void pf_rollback_trans(struct pf_trans *); > + > struct pf_rule pf_default_rule, pf_default_rule_new; > > struct { > @@ -168,6 +173,8 @@ int pf_rtlabel_add(struct pf_addr_wrap > *); > void pf_rtlabel_remove(struct pf_addr_wrap *); > void pf_rtlabel_copyout(struct pf_addr_wrap *); > > +uint64_t trans_ticket = 1; > +LIST_HEAD(, pf_trans) pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans); > > void > pfattach(int num) > @@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p) > int > pfclose(dev_t dev, int flags, int fmt, struct proc *p) > { > + struct pf_trans *w, *s; > + LIST_HEAD(, pf_trans) tmp_list; > + > + if (minor(dev) >= 1) > + return (ENXIO); > + > + if (flags & FWRITE) { > + LIST_INIT(&tmp_list); > + rw_enter_write(&pfioctl_rw); > + LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) { > + if (w->pft_pid == p->p_p->ps_pid) { > + LIST_REMOVE(w, pft_entry); > + LIST_INSERT_HEAD(&tmp_list, w, pft_entry); > + } > + } > + rw_exit_write(&pfioctl_rw); > + > + while ((w = LIST_FIRST(&tmp_list)) != NULL) { > + LIST_REMOVE(w, pft_entry); > + pf_free_trans(w); > + } > + } > + > return (0); > } Kernel close() routines don't work the same way as user-land close() ones do. pfclose() is called only once when the last reference to /dev/pf goes away. There is no way you can keep track of your transactions like that. > > @@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > flags, struct proc *p) > return (EACCES); > } > > - if (flags & FWRITE) > - rw_enter_write(&pfioctl_rw); > - else > - rw_enter_read(&pfioctl_rw); > + rw_enter_write(&pfioctl_rw); > > switch (cmd) { > > @@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int > flags, struct proc *p) > break; > } > fail: > - if (flags & FWRITE) > - rw_exit_write(&pfioctl_rw); > - else > - rw_exit_read(&pfioctl_rw); > + rw_exit_write(&pfioctl_rw); > > return (error); > } > @@ -3244,3 +3268,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, > size_t newlen) > > return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs)); > } > + > +struct pf_trans * > +pf_open_trans(pid_t pid) > +{ > + static uint64_t ticket = 1; > + struct pf_trans *t; > + > + rw_assert_wrlock(&pfioctl_rw); > + > + t = malloc(sizeo
atactl(8): 'readattr' subcommand quits silently.
Hi, Our IIJ team is planning to run OpenBSD on HPE ProLiant DL20 Gen10 server. Yasuoka-san and I are testing on this host and found that `atactl sd0 readattr` doesn't show any messages, just quits silently. 'sd0' disk is shown in dmesg as follows. ``` sd0 at scsibus1 targ 0 lun 0: naa.5002538e01735e7f sd0: 457862MB, 512 bytes/sector, 937703088 sectors, thin ``` To see what's happening to atactl(8), we added printf code like this. ``` diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c index 85dfced8c9a..2babdafffc7 100644 --- a/sbin/atactl/atactl.c +++ b/sbin/atactl/atactl.c @@ -1660,10 +1660,11 @@ device_attr(int argc, char *argv[]) if (attr_val.revision != attr_thr.revision) { /* * Non standard vendor implementation. * Return, since we don't know how to use this. */ + printf("atactl: revision mismatch SMART_READ=0x%02x SMART_THREASHOLD=0x%02x\n", attr_val.revision, attr_thr.revision); return; } attr = attr_val.attribute; thr = attr_thr.threshold; ``` And got following message. ``` $ doas ./atactl sd0 readattr atactl: revision mismatch SMART_READ=0x01 SMART_THREASHOLD=0x00 ``` These 2 revisions of 'attr_val' and 'attr_thr' are different on this disk. The comment says that it's wrong vendor implementation but I can see 'smartctl' shows the attributes as follows. NetBSD's atactl doesn't see these 2 revisions are same but checks each checksum is valid. ``` SMART Attributes Data Structure revision number: 1 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 200 200 002Pre-fail Always - 0 5 Reallocated_Sector_Ct 0x0033 100 100 005Pre-fail Always - 0 9 Power_On_Hours 0x0032 099 099 000Old_age Always - 3138 173 Unknown_Attribute 0x0033 099 099 005Pre-fail Always - 8 175 Program_Fail_Count_Chip 0x0033 100 100 001Pre-fail Always - 0 180 Unused_Rsvd_Blk_Cnt_Tot 0x003b 200 200 097Pre-fail Always - 0 194 Temperature_Celsius 0x0022 080 075 000Old_age Always - 20 196 Reallocated_Event_Count 0x0033 100 100 005Pre-fail Always - 0 202 Unknown_SSD_Attribute 0x0033 100 100 010Pre-fail Always - 0 ``` So, I propose to change the method to verify checksums instead of revision checks. ``` diff --git a/sbin/atactl/atactl.c b/sbin/atactl/atactl.c index 85dfced8c9a..1f77460ce3d 100644 --- a/sbin/atactl/atactl.c +++ b/sbin/atactl/atactl.c @@ -1657,13 +1657,11 @@ device_attr(int argc, char *argv[]) req.datalen = sizeof(attr_thr); ata_command(&req); - if (attr_val.revision != attr_thr.revision) { - /* -* Non standard vendor implementation. -* Return, since we don't know how to use this. -*/ - return; - } + if (smart_cksum((u_int8_t *)&attr_val, sizeof(attr_val)) != 0) + errx(1, "Checksum mismatch (attr_val)"); + + if (smart_cksum((u_int8_t *)&attr_thr, sizeof(attr_thr)) != 0) + errx(1, "Checksum mismatch (attr_thr)"); attr = attr_val.attribute; thr = attr_thr.threshold; ``` With this patch, we got following result. ``` Attributes table revision: 1 ID Attribute name Threshold Value Raw 1 Raw Read Error Rate2200 0x 5 Reallocated Sector Count 5100 0x 9 Power-On Hours Count 0 99 0x0c42 173 Unknown5 99 0x0008 175 Unknown1100 0x 180 Unknown 97200 0x 194 Temperature0 80 0x0014 196 Reallocation Event Count 5100 0x 202 Data Address Mark Errors 10100 0x ``` Is the proposed patch OK? -- Yuichiro NAITO (naito.yuich...@gmail.com)