Re: [www] fix broken links on loongson.html
Hi, I noticed this today and commited your change. mbuhl On Tue, Jun 02, 2020 at 12:33:36PM +, Yifei Zhan wrote: > Hi, > > Several links on loongson.html are now broken as zkml.lemote.com is no > longer being resolved. > > I think it's a good idea to redirect them to their archived version on > archive.org. > > Cheers, > Yifei Zhan > > > Index: loongson.html > === > RCS file: /cvs/www/loongson.html,v > retrieving revision 1.66 > diff -u -p -r1.66 loongson.html > --- loongson.html 19 May 2020 02:02:10 - 1.66 > +++ loongson.html 31 May 2020 06:55:56 - > @@ -73,16 +73,16 @@ Specific hardware support is then writte > At the moment, the following machines are supported: > > > - href="http://zkml.lemote.com/en/products/mini-computer/2010/0310/111.html;>Lemote > Fuloong 2F > + href="https://web.archive.org/web/20190101154956/http://zkml.lemote.com/en/products/mini-computer/2010/0310/111.html;>Lemote > Fuloong 2F > mini-PC > > All on-board devices are supported, but the framebuffer is currently limited > to the 640x400x8 video mode set up by the firmware. > - href="http://zkml.lemote.com/en/products/all-in-one/2010/0311/122.html;>Lemote > Lynloong all-in-one-PC > + href="https://web.archive.org/web/20190101150011/http://zkml.lemote.com/en/products/all-in-one/2010/0311/122.html;>Lemote > Lynloong all-in-one-PC > > All on-board devices are supported, but the framebuffer is currently limited > to the 1360x768x16 video mode set up by the firmware. > - href="http://zkml.lemote.com/en/products/Notebook/2010/0310/112.html;>Lemote > Yeeloong > + href="https://web.archive.org/web/20160703160118/http://zkml.lemote.com/en/products/Notebook/2010/0310/112.html;>Lemote > Yeeloong > netbook > > Both the 8.9" and 10.1" models are supported.
use designated initializer for ffs_vtbl
Dear tech, while reading through ffs code I was looking for calls to ffs_truncate. This would have saved me a few seconds. OK? mbuhl Index: ufs/ffs/ffs_vfsops.c === RCS file: /cvs/src/sys/ufs/ffs/ffs_vfsops.c,v retrieving revision 1.193 diff -u -p -r1.193 ffs_vfsops.c --- ufs/ffs/ffs_vfsops.c12 Aug 2022 14:30:53 - 1.193 +++ ufs/ffs/ffs_vfsops.c14 Apr 2023 11:35:16 - @@ -89,12 +89,12 @@ const struct vfsops ffs_vfsops = { }; struct inode_vtbl ffs_vtbl = { - ffs_truncate, - ffs_update, - ffs_inode_alloc, - ffs_inode_free, - ffs_balloc, - ffs_bufatoff + .iv_truncate= ffs_truncate, + .iv_update = ffs_update, + .iv_inode_alloc = ffs_inode_alloc, + .iv_inode_free = ffs_inode_free, + .iv_buf_alloc = ffs_balloc, + .iv_bufatoff= ffs_bufatoff }; int
Re: installer: rpi: make softraid install work
On Mon, Apr 03, 2023 at 05:53:49PM +, Klemens Nanni wrote: > 03.04.2023 15:03, Klemens Nanni пишет: > > This means no behaviour change for plain installs, but working boot for > > softraid installs. > > > > 'cvs diff -b -U0': > > > > @@ -39 +39 @@ md_installboot() { > > - local _disk=$1 _mdec _plat > > + local _disk=$1 _chunks _bootdisk _mdec _plat > > @@ -62 +62,3 @@ md_installboot() { > > - mount ${MOUNT_ARGS_msdos} /dev/${_disk}i /mnt/mnt > > + _chunks=$(get_softraid_chunks) > > + for _bootdisk in ${_chunks:-$_disk}; do > > + mount ${MOUNT_ARGS_msdos} /dev/${_bootdisk}i > > /mnt/mnt > > @@ -74,0 +77 @@ md_installboot() { > > + done > > > > (I have yet to dig out my rpi4, fix the console cable and test myself...) > > As expected: > - Plain install on wiped SD card boots. > - Crypto install on wiped SD card boots. > > No other boards are effected by this diff. > > OK? I tested this on a rpi zero 2 w (no UEFI Firmware): machdep.compatible=raspberrypi,model-zero-2-w I can confirm that without this diff it tries to mount the msdos partition of the SR CRYPTO disk. After applying this diff the installer uses the chunk disk instead. ok mbuhl
Re: panic: ffs_valloc: dup alloc
Below is an updated diff addressing the following concerns: On Fri, Mar 17, 2023 at 03:45:59PM +0100, Alexander Bluhm wrote: > The FreeBSD Code puts the check_nifree label before the if > (fs->fs_cs(fs, cg).cs_nifree == 0) check. Could you move this chunk > a little up to keep us in sync? On Fri, Mar 17, 2023 at 03:57:39PM +0100, Mark Kettenis wrote: > Don't hide a global variable like that! There are also some concerns on the string "indallck", to my understanding the first 7 characters show up as wchan in ps and should be easy to grep for. Any better names? "ffs-node-lock" "ffsndlock" Index: ffs_alloc.c === RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v retrieving revision 1.114 diff -u -p -r1.114 ffs_alloc.c --- ffs_alloc.c 11 Mar 2021 13:31:35 - 1.114 +++ ffs_alloc.c 17 Mar 2023 19:28:11 - @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include @@ -75,6 +76,7 @@ daddr_t ffs_mapsearch(struct fs *, stru static const struct timevalfserr_interval = { 2, 0 }; +struct rwlock ffs_node_lock = RWLOCK_INITIALIZER("indallck"); /* * Allocate a block in the file system. @@ -1108,6 +1110,9 @@ ffs_nodealloccg(struct inode *ip, u_int * and in the cylinder group itself. */ fs = ip->i_fs; +#ifdef FFS2 + check_nifree: +#endif if (fs->fs_cs(fs, cg).cs_nifree == 0) return (0); @@ -1201,9 +1206,12 @@ gotit: /* Has any inode not been used at least once? */ cgp->cg_initediblk < cgp->cg_ffs2_niblk) { + if (rw_enter(_node_lock, RW_WRITE | RW_SLEEPFAIL)) + goto check_nifree; ibp = getblk(ip->i_devvp, fsbtodb(fs, ino_to_fsba(fs, cg * fs->fs_ipg + cgp->cg_initediblk)), (int)fs->fs_bsize, 0, INFSLP); + rw_exit(_node_lock); memset(ibp->b_data, 0, fs->fs_bsize); dp2 = (struct ufs2_dinode *)(ibp->b_data);
panic: ffs_valloc: dup alloc
Dear tech, I have a machine with some kind of hardware defect. Smartctl shows that one SSD regularly has an unexpected power loss (Attribute 174): 174 Unknown_Attribute 0x0032 100 100 000Old_age Always - 284. On the console I see the following output: root on sd0a (b75e60dd651bd99e.a) swap on sd0b dump on sd0b ahci0: device didn't come ready after reset, TFD: 0x441 ahci0: unrecoverable errors (IS: 10), disabling port. ahci0: device didn't come ready after reset, TFD: 0x1d0 ahci0: stopping the port, softreset slot 31 was still active. ahci0: device didn't come ready after reset, TFD: 0x441 ahci0: device didn't come ready after reset, TFD: 0x441 ahci0: stopping the port, softreset slot 31 was still active. ahci0: device didn't come ready after reset, TFD: 0x441 ahci0: stopping the port, softreset slot 31 was still active. ahci0: stopping the port, softreset slot 31 was still active. ahci0: stopping the port, softreset slot 31 was still active. ahci0: stopping the port, softreset slot 31 was still active. ahci0: stopping the port, softreset slot 31 was still active. ahci0: stopping the port, softreset slot 31 was still active. ahci0: device didn't come ready after reset, TFD: 0x441 ahci0: device didn't come ready after reset, TFD: 0x441 mode = 0100644, inum = 7731315, fs = /var panic: ffs_valloc: dup alloc Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 175271 46390 1001 0x2 0x4000 rustc 345274 46390 1001 0x2 0x4001 rustc 224769 52321 1001 0x2 0x4005 rustc *414814 59107 1001 0x1002 0x4082K rustc 87951 59107 1001 0x1002 0x4084 rustc db_enter() at db_enter+0x10 panic(81f194f6) at panic+0xbf ffs_inode_alloc(fd87b0b8c1e0,81a4,fd8fc6cfc5d0,800020a7a948) at ffs _inode_alloc+0x42e ufs_makeinode(81a4,fd87ab540438,800020a7ac40,800020a7ac70) at ufs_m akeinode+0x79 ufs_create(800020a7a9f8) at ufs_create+0x3c VOP_CREATE(fd87ab540438,800020a7ac40,800020a7ac70,800020a7aa50) at VOP_CREATE+0x3f vn_open(800020a7ac10,10602,1a4) at vn_open+0x162 doopenat(800020879cf0,ff9c,246e5646200,10601,1b6,800020a7adf0) at d oopenat+0x1cd syscall(800020a7ae60) at syscall+0x35f Xsyscall() at Xsyscall+0x128 end of kernel ffs_inode_alloc checks the mode of the newly allocated inode and if it is not 0 it panics. I suspect that due to the ssds softreset that IO operations take longer and cause trouble due to missing locking. For example in ffs_nodealloccg it is possible for an FFS2 file system to sleep due to a getblk. During the sleep, another call to ffs_nodealloccg might happen with the same ipref. The function first checks if there is space at the prefered location, then sleeps and then reserves it. This is a TOCTOU bug. FreeBSD fixed this in 2013: https://github.com/freebsd/freebsd-src/commit/94f4ac214c403b99cbc44dd6e9cdf4db2cc9b297 Below is a diff that addresses this with a exclusive lock using RW_SLEEPFAIL. Index: ufs/ffs/ffs_alloc.c === RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v retrieving revision 1.114 diff -u -p -r1.114 ffs_alloc.c --- ufs/ffs/ffs_alloc.c 11 Mar 2021 13:31:35 - 1.114 +++ ufs/ffs/ffs_alloc.c 10 Mar 2023 15:44:52 - @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include @@ -1089,6 +1090,7 @@ gotit: } /* inode allocation routine */ +struct rwlock ffs_node_lock = RWLOCK_INITIALIZER("indallck"); daddr_t ffs_nodealloccg(struct inode *ip, u_int cg, daddr_t ipref, int mode) { @@ -1115,6 +1117,9 @@ ffs_nodealloccg(struct inode *ip, u_int return (0); cgp = (struct cg *)bp->b_data; +#ifdef FFS2 + check_nifree: +#endif if (cgp->cg_cs.cs_nifree == 0) { brelse(bp); return (0); @@ -1201,9 +1206,12 @@ gotit: /* Has any inode not been used at least once? */ cgp->cg_initediblk < cgp->cg_ffs2_niblk) { + if (rw_enter(_node_lock, RW_WRITE | RW_SLEEPFAIL)) + goto check_nifree; ibp = getblk(ip->i_devvp, fsbtodb(fs, ino_to_fsba(fs, cg * fs->fs_ipg + cgp->cg_initediblk)), (int)fs->fs_bsize, 0, INFSLP); + rw_exit(_node_lock); memset(ibp->b_data, 0, fs->fs_bsize); dp2 = (struct ufs2_dinode *)(ibp->b_data);
PPPoE with PAP and CHAP
Hi gerhard and tech, You wrote the following in 2015 and I was wondering if it was worth upstreaming it: The PPPoE kernel module gets a new option auto for the authproto in addition to pap and chap in client mode. In this case the authentication protocol sent by the server in the LCP configure request is used. by gerhard@ Index: sbin/ifconfig/ifconfig.8 === RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v retrieving revision 1.390 diff -u -p -r1.390 ifconfig.8 --- sbin/ifconfig/ifconfig.827 Feb 2023 14:53:38 - 1.390 +++ sbin/ifconfig/ifconfig.810 Mar 2023 19:56:33 - @@ -1693,6 +1693,7 @@ interface acting as a client. The protocol name can be either .Ql chap , .Ql pap , +.Ql auto , or .Ql none . In the latter case, authentication will be turned off. Index: sbin/ifconfig/ifconfig.c === RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v retrieving revision 1.461 diff -u -p -r1.461 ifconfig.c --- sbin/ifconfig/ifconfig.c18 Jan 2023 21:57:10 - 1.461 +++ sbin/ifconfig/ifconfig.c10 Mar 2023 19:56:33 - @@ -5481,6 +5481,8 @@ setsroto(const char *val, int d) spa.proto = PPP_PAP; else if (strcmp(val, "chap") == 0) spa.proto = PPP_CHAP; + else if (d == 0 && strcmp(val, "auto") == 0) + spa.proto = PPP_AUTOAUTH; else if (strcmp(val, "none") == 0) spa.proto = 0; else @@ -5588,6 +5590,9 @@ sppp_printproto(const char *name, struct break; case PPP_CHAP: printf("chap "); + break; + case PPP_AUTOAUTH: + printf("auto "); break; default: printf("0x%04x ", auth->proto); Index: sys/net/if_sppp.h === RCS file: /cvs/src/sys/net/if_sppp.h,v retrieving revision 1.30 diff -u -p -r1.30 if_sppp.h --- sys/net/if_sppp.h 17 Nov 2021 18:00:24 - 1.30 +++ sys/net/if_sppp.h 10 Mar 2023 19:56:33 - @@ -40,6 +40,8 @@ #ifndef _NET_IF_SPPP_H_ #define _NET_IF_SPPP_H_ +#define PPP_AUTOAUTH 0xffee /* automatic auth protocol selection */ + #define AUTHFLAG_NOCALLOUT 1 /* don't require authentication on callouts */ #define AUTHFLAG_NORECHALLENGE 2 /* don't re-challenge CHAP */ @@ -185,6 +187,7 @@ struct sppp { struct sipcp ipv6cp;/* IPV6CP params */ struct sauth myauth;/* auth params, i'm peer */ struct sauth hisauth; /* auth params, i'm authenticator */ + u_short peerproto; /* auth protocol requested by peer */ u_char chap_challenge[AUTHCHALEN]; /* random challenge used by CHAP */ /* Index: sys/net/if_spppsubr.c === RCS file: /cvs/src/sys/net/if_spppsubr.c,v retrieving revision 1.191 diff -u -p -r1.191 if_spppsubr.c --- sys/net/if_spppsubr.c 2 Jan 2022 22:36:03 - 1.191 +++ sys/net/if_spppsubr.c 10 Mar 2023 19:57:55 - @@ -1894,11 +1894,16 @@ sppp_lcp_RCR(struct sppp *sp, struct lcp case LCP_OPT_AUTH_PROTO: authproto = (p[2] << 8) + p[3]; - if (sp->myauth.proto != authproto) { + if (sp->myauth.proto == PPP_AUTOAUTH) { + if (debug) + addlog("[peer wants %s] ", + sppp_proto_name(authproto)); + sp->peerproto = authproto; + } else if (sp->myauth.proto != authproto) { /* not agreed, nak */ if (debug) addlog("[mine %s != his %s] ", - sppp_proto_name(sp->hisauth.proto), + sppp_proto_name(sp->myauth.proto), sppp_proto_name(authproto)); p[2] = sp->myauth.proto >> 8; p[3] = sp->myauth.proto; @@ -3409,7 +3414,8 @@ sppp_chap_input(struct sppp *sp, struct } x = splnet(); sp->pp_flags &= ~PP_NEEDAUTH; - if (sp->myauth.proto == PPP_CHAP && + if ((sp->myauth.proto == PPP_CHAP || + sp->myauth.proto == PPP_AUTOAUTH) && (sp->lcp.opts & (1 << LCP_OPT_AUTH_PROTO)) && (sp->lcp.protos & (1 << IDX_CHAP)) == 0) { /* @@ -3558,7 +3564,8 @@ sppp_chap_init(struct sppp *sp) void sppp_chap_open(struct sppp *sp) { - if (sp->myauth.proto == PPP_CHAP && + if ((sp->myauth.proto == PPP_CHAP || +
mountd: potential use of uninitialized stack data
Dear tech@, codechecker found the following problem with fsb in sbin/mountd: mntsrv(...) ... struct statfs fsb; ... if (realpath(rpcpath, dirpath) == NULL) { bad = errno; if (debug) fprintf(stderr, "realpath failed on %s\n", rpcpath); strlcpy(dirpath, rpcpath, sizeof(dirpath)); } else if (stat(dirpath, ) == -1 || (!S_ISDIR(stb.st_mode) && !S_ISREG(stb.st_mode)) || statfs(dirpath, ) == -1) { if (debug) fprintf(stderr, "stat failed on %s\n", dirpath); bad = ENOENT; /* We will send error reply later */ } /* Check in the exports list */ sigprocmask(SIG_BLOCK, _mask, NULL); ep = ex_search(_fsid); ... The tool finds a path to ex_search where fsb.f_fsid is uninitialized. ex_search compares the potentially uninitialized stack data: ex_search(fsid_t *fsid) { struct exportlist *ep; ep = exphead; while (ep) { if (ep->ex_fs.val[0] == fsid->val[0] && ... Is it sufficient to zero fsb? Is this really reachable? mbuhl
igc(4) remove unnecessary PHY ID checks
Dear tech, Looking into various reports on errors with mini routers that use igc(4), I found the following diff in FreeBSD: https://github.com/freebsd/freebsd-src/commit/29d7f1ff579579711dd5a3325480728b8ed45f8c I additionally deleted the igc_set_d0_lplu_state_i225 and igc_set_d3_lplu_state_i225 functions since they are no longer used. I am not convinced that this diff really fixes any of the issues, however I do appreciate feedback. Ok? mbuhl Index: dev/pci/igc_i225.c === RCS file: /cvs/src/sys/dev/pci/igc_i225.c,v retrieving revision 1.3 diff -u -p -r1.3 igc_i225.c --- dev/pci/igc_i225.c 11 May 2022 06:14:15 - 1.3 +++ dev/pci/igc_i225.c 10 Jan 2023 09:26:04 - @@ -163,16 +163,7 @@ igc_init_phy_params_i225(struct igc_hw * goto out; ret_val = igc_get_phy_id(hw); - /* Verify phy id and set remaining function pointers */ - switch (phy->id) { - case I225_I_PHY_ID: - default: - phy->type = igc_phy_i225; - phy->ops.set_d0_lplu_state = igc_set_d0_lplu_state_i225; - phy->ops.set_d3_lplu_state = igc_set_d3_lplu_state_i225; - /* TODO - complete with GPY PHY information */ - break; - } + phy->type = igc_phy_i225; out: return ret_val; @@ -1104,66 +1095,6 @@ igc_init_hw_i225(struct igc_hw *hw) ret_val = igc_init_hw_base(hw); return ret_val; -} - -/* - * igc_set_d0_lplu_state_i225 - Set Low-Power-Link-Up (LPLU) D0 state - * @hw: pointer to the HW structure - * @active: true to enable LPLU, false to disable - * - * Note: since I225 does not actually support LPLU, this function - * simply enables/disables 1G and 2.5G speeds in D0. - */ -int -igc_set_d0_lplu_state_i225(struct igc_hw *hw, bool active) -{ - uint32_t data; - - DEBUGFUNC("igc_set_d0_lplu_state_i225"); - - data = IGC_READ_REG(hw, IGC_I225_PHPM); - - if (active) { - data |= IGC_I225_PHPM_DIS_1000; - data |= IGC_I225_PHPM_DIS_2500; - } else { - data &= ~IGC_I225_PHPM_DIS_1000; - data &= ~IGC_I225_PHPM_DIS_2500; - } - - IGC_WRITE_REG(hw, IGC_I225_PHPM, data); - return IGC_SUCCESS; -} - -/* - * igc_set_d3_lplu_state_i225 - Set Low-Power-Link-Up (LPLU) D3 state - * @hw: pointer to the HW structure - * @active: true to enable LPLU, false to disable - * - * Note: since I225 does not actually support LPLU, this function - * simply enables/disables 100M, 1G and 2.5G speeds in D3. - */ -int -igc_set_d3_lplu_state_i225(struct igc_hw *hw, bool active) -{ - uint32_t data; - - DEBUGFUNC("igc_set_d3_lplu_state_i225"); - - data = IGC_READ_REG(hw, IGC_I225_PHPM); - - if (active) { - data |= IGC_I225_PHPM_DIS_100_D3; - data |= IGC_I225_PHPM_DIS_1000_D3; - data |= IGC_I225_PHPM_DIS_2500_D3; - } else { - data &= ~IGC_I225_PHPM_DIS_100_D3; - data &= ~IGC_I225_PHPM_DIS_1000_D3; - data &= ~IGC_I225_PHPM_DIS_2500_D3; - } - - IGC_WRITE_REG(hw, IGC_I225_PHPM, data); - return IGC_SUCCESS; } /** Index: dev/pci/igc_i225.h === RCS file: /cvs/src/sys/dev/pci/igc_i225.h,v retrieving revision 1.1 diff -u -p -r1.1 igc_i225.h --- dev/pci/igc_i225.h 31 Oct 2021 14:52:57 - 1.1 +++ dev/pci/igc_i225.h 10 Jan 2023 09:37:40 - @@ -27,8 +27,6 @@ void igc_release_swfw_sync_i225(struct i intigc_set_ltr_i225(struct igc_hw *, bool); intigc_init_hw_i225(struct igc_hw *); intigc_setup_copper_link_i225(struct igc_hw *); -intigc_set_d0_lplu_state_i225(struct igc_hw *, bool); -intigc_set_d3_lplu_state_i225(struct igc_hw *, bool); intigc_set_eee_i225(struct igc_hw *, bool, bool, bool); #define ID_LED_DEFAULT_I225\ Index: dev/pci/igc_phy.c === RCS file: /cvs/src/sys/dev/pci/igc_phy.c,v retrieving revision 1.2 diff -u -p -r1.2 igc_phy.c --- dev/pci/igc_phy.c 11 May 2022 06:14:15 - 1.2 +++ dev/pci/igc_phy.c 10 Jan 2023 09:34:11 - @@ -307,8 +307,7 @@ igc_phy_setup_autoneg(struct igc_hw *hw) return ret_val; } - if ((phy->autoneg_mask & ADVERTISE_2500_FULL) && - hw->phy.id == I225_I_PHY_ID) { + if (phy->autoneg_mask & ADVERTISE_2500_FULL) { /* Read the MULTI GBT AN Control Register - reg 7.32 */ ret_val = phy->ops.read_reg(hw, (STANDARD_AN_REG_MASK << MMD_DEVADDR_SHIFT) | ANEG_MULTIGBT_AN_CTRL, @@ -443,8 +442,7 @@ igc_phy_setup_autoneg(struct igc_hw *hw) ret_val = phy->ops.write_reg(hw, PHY_1000T_CTRL, mii_1000t_ctrl_reg); - if ((phy->autoneg_mask &
MAP_CONCEAL bypass due to stack overflow
Dear tech, currently the MAP_CONCEAL flag for mmap(2) can be bypassed with a stack overflow. FreeBSD introduced a similar flag called MAP_NOCORE. They also save the auxinfo pointer in some struct that is attached to the proc struct, similar to the diff below. Here is an example CTF challenge: /* * Setup: * make mmap * echo 'this is a well kept secret' > flag.txt * printf 'flag.txt\0' | ./mmap * strings mmap.core | grep secret */ #include #include #include #include #include char * load_file(char *path) { char *addr; int fd; fd = open(path, O_RDONLY); if ((addr = mmap((char *)0x4, 0x100, PROT_READ, MAP_FIXED | MAP_CONCEAL, fd, 0)) == MAP_FAILED) err(1, NULL); close(fd); return (addr); } int main(void) { char file[0x100]; char *buf; puts("Content-type: text/html\n"); puts("simple mmapPOST filename to /cgi-bin/mmap"); read(0, file, 0x1000); buf = load_file(file); printf("load: %p\n", buf); fflush(stdout); return (0); } With the following python program (on amd64) I can generate an input string that writes the MAP_CONCEALED content of flag.txt to the core file: import struct p64 = lambda x: struct.pack("p_p; @@ -1282,23 +1279,7 @@ coredump_notes_elf(struct proc *p, void /* Second, write an NT_OPENBSD_AUXV note. */ notesize = sizeof(nhdr) + elfround(sizeof("OpenBSD")) + elfround(ELF_AUX_WORDS * sizeof(char *)); - if (iocookie) { - iov.iov_base = - iov.iov_len = sizeof(pss); - uio.uio_iov = - uio.uio_iovcnt = 1; - uio.uio_offset = (off_t)pr->ps_strings; - uio.uio_resid = sizeof(pss); - uio.uio_segflg = UIO_SYSSPACE; - uio.uio_rw = UIO_READ; - uio.uio_procp = NULL; - - error = uvm_io(>p_vmspace->vm_map, , 0); - if (error) - return (error); - - if (pss.ps_envstr == NULL) - return (EIO); + if (iocookie && pr->ps_auxinfo) { nhdr.namesz = sizeof("OpenBSD"); nhdr.descsz = ELF_AUX_WORDS * sizeof(char *); @@ -1315,7 +1296,7 @@ coredump_notes_elf(struct proc *p, void return (error); error = coredump_write(iocookie, UIO_USERSPACE, - pss.ps_envstr + pss.ps_nenvstr + 1, nhdr.descsz); + (caddr_t)pr->ps_auxinfo, nhdr.descsz); if (error) return (error); } Index: kern/kern_exec.c === RCS file: /mount/openbsd/cvs/src/sys/kern/kern_exec.c,v retrieving revision 1.238 diff -u -p -r1.238 kern_exec.c --- kern/kern_exec.c30 Oct 2022 17:43:40 - 1.238 +++ kern/kern_exec.c6 Nov 2022 10:37:19 - @@ -492,6 +492,8 @@ sys_execve(struct proc *p, void *v, regi if (!copyargs(, , stack, argp)) goto exec_abort; + pr->ps_auxinfo = (vaddr_t)pack.ep_auxinfo; + /* copy out the process's ps_strings structure */ if (copyout(, (char *)pr->ps_strings, sizeof(arginfo))) goto exec_abort; Index: sys/proc.h === RCS file: /mount/openbsd/cvs/src/sys/sys/proc.h,v retrieving revision 1.334 diff -u -p -r1.334 proc.h --- sys/proc.h 23 Jul 2022 22:10:59 - 1.334 +++ sys/proc.h 6 Nov 2022 10:37:19 - @@ -215,6 +215,7 @@ struct process { charps_comm[_MAXCOMLEN];/* command name, incl NUL */ vaddr_t ps_strings; /* User pointers to argv/env */ + vaddr_t ps_auxinfo; /* User pointer to auxinfo */ vaddr_t ps_timekeep;/* User pointer to timekeep */ vaddr_t ps_sigcode; /* [I] User pointer to signal code */ vaddr_t ps_sigcoderet; /* [I] User ptr to sigreturn retPC */
Re: relayd: uninitialized errstr
On Thu, Nov 10, 2022 at 01:29:13AM +0100, Theo Buehler wrote: > On Thu, Nov 10, 2022 at 01:10:51AM +0100, Moritz Buhl wrote: > > errstr is never set but fail does: > > RSA_meth_free(rsae_method); > > fatalx("%s: %s", __func__, errstr); > > Found by codechecker. > > > > OK? > > Needs more braces Indeed. Is this OK? Index: ca.c === RCS file: /cvs/src/usr.sbin/relayd/ca.c,v retrieving revision 1.39 diff -u -p -r1.39 ca.c --- ca.c20 Jan 2022 17:56:35 - 1.39 +++ ca.c10 Nov 2022 00:33:35 - @@ -513,8 +513,10 @@ ca_engine_init(struct relayd *x_env) if (rsa_default != NULL) return; - if ((rsae_method = RSA_meth_new("RSA privsep engine", 0)) == NULL) + if ((rsae_method = RSA_meth_new("RSA privsep engine", 0)) == NULL) { + errstr = "RSA_meth_new"; goto fail; + } RSA_meth_set_pub_enc(rsae_method, rsae_pub_enc); RSA_meth_set_pub_dec(rsae_method, rsae_pub_dec);
relayd: uninitialized errstr
errstr is never set but fail does: RSA_meth_free(rsae_method); fatalx("%s: %s", __func__, errstr); Found by codechecker. OK? mbuhl Index: ca.c === RCS file: /cvs/src/usr.sbin/relayd/ca.c,v retrieving revision 1.39 diff -u -p -r1.39 ca.c --- ca.c20 Jan 2022 17:56:35 - 1.39 +++ ca.c10 Nov 2022 00:06:20 - @@ -514,6 +514,7 @@ ca_engine_init(struct relayd *x_env) return; if ((rsae_method = RSA_meth_new("RSA privsep engine", 0)) == NULL) + errstr = "RSA_meth_new"; goto fail; RSA_meth_set_pub_enc(rsae_method, rsae_pub_enc);
relayd: always call va_end
The same code is in httpd but there it was fixed in commit 6b535b529336a3fd1beb56c42ff5755b84ba9b03 Author: jung Date: Sun May 22 19:19:21 2016 + fix unbalanced va_start and va_end macros from Hiltjo Posthuma "do." deraadt Found by codechecker OK? mbuhl Index: usr.sbin/relayd/relayd.c === RCS file: /cvs/src/usr.sbin/relayd/relayd.c,v retrieving revision 1.189 diff -u -p -r1.189 relayd.c --- usr.sbin/relayd/relayd.c3 Sep 2022 20:07:31 - 1.189 +++ usr.sbin/relayd/relayd.c9 Nov 2022 23:51:57 - @@ -656,11 +656,13 @@ kv_set(struct kv *kv, char *fmt, ...) va_list ap; char*value = NULL; struct kv *ckv; + int ret; va_start(ap, fmt); - if (vasprintf(, fmt, ap) == -1) - return (-1); + ret = vasprintf(, fmt, ap); va_end(ap); + if (ret == -1) + return (-1); /* Remove all children */ while ((ckv = TAILQ_FIRST(>kv_children)) != NULL) { @@ -681,11 +683,13 @@ kv_setkey(struct kv *kv, char *fmt, ...) { va_list ap; char*key = NULL; + int ret; va_start(ap, fmt); - if (vasprintf(, fmt, ap) == -1) - return (-1); + ret = vasprintf(, fmt, ap); va_end(ap); + if (ret == -1) + return (-1); free(kv->kv_key); kv->kv_key = key;
acme-client memory leak in error case
Hi tech, g is not freed in this error case. Found by codechecker. OK? mbuhl Index: usr.sbin/acme-client/netproc.c === RCS file: /cvs/src/usr.sbin/acme-client/netproc.c,v retrieving revision 1.31 diff -u -p -r1.31 netproc.c --- usr.sbin/acme-client/netproc.c 24 Aug 2021 10:07:30 - 1.31 +++ usr.sbin/acme-client/netproc.c 9 Nov 2022 18:11:17 - @@ -222,6 +222,7 @@ again: if ((st = http_head_get("Location", g->head, g->headsz)) == NULL) { warnx("redirect without location header"); + http_get_free(g); return -1; }
apm garbage stack value could be accessed
Hi tech, In case send or recv fail in send_command, reply.error is stack garbage. This wouldn't be possible if reply was zeroed. But checking for ret == 0 ensures that reply was fully written. OK? mbuhl Index: usr.sbin/apm/apm.c === RCS file: /cvs/src/usr.sbin/apm/apm.c,v retrieving revision 1.42 diff -u -p -r1.42 apm.c --- usr.sbin/apm/apm.c 10 Sep 2022 10:10:09 - 1.42 +++ usr.sbin/apm/apm.c 9 Nov 2022 17:29:03 - @@ -99,6 +99,8 @@ do_zzz(int fd, enum apm_action action) char *msg; int ret; + bzero(, sizeof reply); + switch (action) { case NONE: case SUSPEND: @@ -119,7 +121,7 @@ do_zzz(int fd, enum apm_action action) printf("%s...\n", msg); ret = send_command(fd, , ); - if (reply.error) + if (ret == 0 && reply.error) errx(1, "%s: %s", apm_state(reply.newstate), strerror(reply.error)); exit(ret); }
libcrypto potential memory leak in OBJ_NAME_add error path
In case lh_OBJ_NAME_insert returns NULL due to a failed malloc, onp is leaked in OBJ_NAME_add. Found by CodeChecker. OK? mbuhl Index: lib/libcrypto/objects/o_names.c === RCS file: /cvs/src/lib/libcrypto/objects/o_names.c,v retrieving revision 1.22 diff -u -p -r1.22 o_names.c --- lib/libcrypto/objects/o_names.c 29 Jan 2017 17:49:23 - 1.22 +++ lib/libcrypto/objects/o_names.c 8 Nov 2022 12:24:47 - @@ -196,6 +196,7 @@ OBJ_NAME_add(const char *name, int type, } free(ret); } else { + free(onp); if (lh_OBJ_NAME_error(names_lh)) { /* ERROR */ return (0);
Re: mbufs growing in 7.2
Hi Greg, Hi Joe, dlg@ hinted to me that the ring might overwrite it's own starting position with the current code. Does this help? mbuhl Index: dev/pci/if_igc.c === RCS file: /cvs/src/sys/dev/pci/if_igc.c,v retrieving revision 1.9 diff -u -p -r1.9 if_igc.c --- dev/pci/if_igc.c2 Jun 2022 07:41:17 - 1.9 +++ dev/pci/if_igc.c8 Nov 2022 10:35:39 - @@ -978,7 +978,7 @@ igc_start(struct ifqueue *ifq) mask = sc->num_tx_desc - 1; for (;;) { - if (free <= IGC_MAX_SCATTER) { + if (free <= IGC_MAX_SCATTER + 1) { ifq_set_oactive(ifq); break; } @@ -1005,6 +1005,7 @@ igc_start(struct ifqueue *ifq) /* Consume the first descriptor */ prod++; prod &= mask; + free--; } for (i = 0; i < map->dm_nsegs; i++) {
potential memory leak in bgpd rde_dump_ctx_new
Hi tech, Dear claudio, ctx might leak due to a prefix/rib dump new/subtree failing in calloc and then going to nomem in rde_dump_ctx_new. I am wondering if a similar fix is missing in rde_dump_done after the nomem label. thoughts? mbuhl Found by CodeChecker. Index: usr.sbin/bgpd/rde.c === RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v retrieving revision 1.578 diff -u -p -r1.578 rde.c --- usr.sbin/bgpd/rde.c 23 Sep 2022 15:49:20 - 1.578 +++ usr.sbin/bgpd/rde.c 7 Nov 2022 19:17:16 - @@ -2739,6 +2739,7 @@ rde_dump_ctx_new(struct ctl_show_rib_req error = CTL_RES_NOMEM; imsg_compose(ibuf_se_ctl, IMSG_CTL_RESULT, 0, pid, -1, , sizeof(error)); + free(ctx); return; }
bgpd and ldpd pfkey_reply might access uninitialized stack memory
Hi tech, Dear claudio, there could be an uninitialized stack memory access in pfkey_reply. It looks like this: struct sadb_msg hdr, *msg; ... do { rv = pfkey_read(sd, ); if (rv == -1) return (-1); } while (rv); if (hdr.sadb_msg_errno != 0) { And in the errno cases of pfkey_read, it doesn't initialize hdr but then breaks out of the do-while loop. OK? mbuhl Found by CodeChecker. Index: bgpd/pfkey.c === RCS file: /cvs/src/usr.sbin/bgpd/pfkey.c,v retrieving revision 1.67 diff -u -p -r1.67 pfkey.c --- bgpd/pfkey.c17 Aug 2022 15:15:26 - 1.67 +++ bgpd/pfkey.c7 Nov 2022 18:58:37 - @@ -423,7 +423,7 @@ pfkey_read(int sd, struct sadb_msg *h) if (recv(sd, , sizeof(hdr), MSG_PEEK) != sizeof(hdr)) { if (errno == EAGAIN || errno == EINTR) - return (0); + return (1); log_warn("pfkey peek"); return (-1); } @@ -439,7 +439,7 @@ pfkey_read(int sd, struct sadb_msg *h) /* not ours, discard */ if (read(sd, , sizeof(hdr)) == -1) { if (errno == EAGAIN || errno == EINTR) - return (0); + return (1); log_warn("pfkey read"); return (-1); } Index: ldpd/pfkey.c === RCS file: /cvs/src/usr.sbin/ldpd/pfkey.c,v retrieving revision 1.12 diff -u -p -r1.12 pfkey.c --- ldpd/pfkey.c23 Jan 2019 02:02:04 - 1.12 +++ ldpd/pfkey.c7 Nov 2022 19:00:01 - @@ -253,7 +253,7 @@ pfkey_read(int sd, struct sadb_msg *h) if (recv(sd, , sizeof(hdr), MSG_PEEK) != sizeof(hdr)) { if (errno == EAGAIN || errno == EINTR) - return (0); + return (1); log_warn("pfkey peek"); return (-1); } @@ -269,7 +269,7 @@ pfkey_read(int sd, struct sadb_msg *h) /* not ours, discard */ if (read(sd, , sizeof(hdr)) == -1) { if (errno == EAGAIN || errno == EINTR) - return (0); + return (1); log_warn("pfkey read"); return (-1); }
Re: em(4) IPv4, TCP, UDP checksum offloading
Looking for OKs. On Sat, Oct 15, 2022 at 10:01:53PM +0200, Moritz Buhl wrote: > With the previous diffs I am seeing sporadic connection problems in tcpbench > via IPv6 on Intel 82546GB. > The diff was too big anyways. Here is a smaller diff that introduces > checksum offloading for the controllers that use the advanced descriptors. > > I tested this diff on i350 and 82571EB, receive, send, forwarding, > icmp4, icmp6, tcp4, tcp6, udp4, udp6, also over vlan and veb. > > I am still looking for testing. Any feedback? > > mbuhl > > Index: dev/pci/if_em.c > === > RCS file: /cvs/src/sys/dev/pci/if_em.c,v > retrieving revision 1.362 > diff -u -p -r1.362 if_em.c > --- dev/pci/if_em.c 23 Jun 2022 09:38:28 - 1.362 > +++ dev/pci/if_em.c 15 Oct 2022 19:49:12 - > @@ -37,6 +37,8 @@ POSSIBILITY OF SUCH DAMAGE. > #include > #include > > +#include > + > /* > * Driver version > */ > @@ -278,6 +280,8 @@ void em_receive_checksum(struct em_softc >struct mbuf *); > u_intem_transmit_checksum_setup(struct em_queue *, struct mbuf *, > u_int, > u_int32_t *, u_int32_t *); > +u_intem_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, > u_int32_t *, > + u_int32_t *); > void em_iff(struct em_softc *); > void em_update_link_status(struct em_softc *); > int em_get_buf(struct em_queue *, int); > @@ -1220,10 +1224,9 @@ em_encap(struct em_queue *que, struct mb > BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); > } > > - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && > - sc->hw.mac_type != em_82576 && > - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && > - sc->hw.mac_type != em_i350) { > + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { > + used += em_tx_ctx_setup(que, m, head, _upper, _lower); > + } else if (sc->hw.mac_type >= em_82543) { > used += em_transmit_checksum_setup(que, m, head, > _upper, _lower); > } else { > @@ -1278,7 +1281,8 @@ em_encap(struct em_queue *que, struct mb > > #if NVLAN > 0 > /* Find out if we are in VLAN mode */ > - if (m->m_flags & M_VLANTAG) { > + if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 || > + sc->hw.mac_type > em_i210)) { > /* Set the VLAN id */ > desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag); > > @@ -1964,17 +1968,16 @@ em_setup_interface(struct em_softc *sc) > ifp->if_capabilities = IFCAP_VLAN_MTU; > > #if NVLAN > 0 > - if (sc->hw.mac_type != em_82575 && sc->hw.mac_type != em_82580 && > - sc->hw.mac_type != em_82576 && > - sc->hw.mac_type != em_i210 && sc->hw.mac_type != em_i350) > - ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; > + ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; > #endif > > - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && > - sc->hw.mac_type != em_82576 && > - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && > - sc->hw.mac_type != em_i350) > + if (sc->hw.mac_type >= em_82543) { > ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; > + } > + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { > + ifp->if_capabilities |= IFCAP_CSUM_IPv4; > + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; > + } > > /* >* Specify the media types supported by this adapter and register > @@ -2389,6 +2392,108 @@ em_free_transmit_structures(struct em_so > 0, que->tx.sc_tx_dma.dma_map->dm_mapsize, > BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); > } > +} > + > +u_int > +em_tx_ctx_setup(struct em_queue *que, struct mbuf *mp, u_int head, > +u_int32_t *olinfo_status, u_int32_t *cmd_type_len) > +{ > + struct e1000_adv_tx_context_desc *TD; > + struct ether_header *eh = mtod(mp, struct ether_header *); > + struct mbuf *m; > + uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0; > + int off = 0,
Re: em(4) IPv4, TCP, UDP checksum offloading
With the previous diffs I am seeing sporadic connection problems in tcpbench via IPv6 on Intel 82546GB. The diff was too big anyways. Here is a smaller diff that introduces checksum offloading for the controllers that use the advanced descriptors. I tested this diff on i350 and 82571EB, receive, send, forwarding, icmp4, icmp6, tcp4, tcp6, udp4, udp6, also over vlan and veb. I am still looking for testing. Any feedback? mbuhl Index: dev/pci/if_em.c === RCS file: /cvs/src/sys/dev/pci/if_em.c,v retrieving revision 1.362 diff -u -p -r1.362 if_em.c --- dev/pci/if_em.c 23 Jun 2022 09:38:28 - 1.362 +++ dev/pci/if_em.c 15 Oct 2022 19:49:12 - @@ -37,6 +37,8 @@ POSSIBILITY OF SUCH DAMAGE. #include #include +#include + /* * Driver version */ @@ -278,6 +280,8 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); +u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, + u_int32_t *); void em_iff(struct em_softc *); void em_update_link_status(struct em_softc *); int em_get_buf(struct em_queue *, int); @@ -1220,10 +1224,9 @@ em_encap(struct em_queue *que, struct mb BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) { + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { + used += em_tx_ctx_setup(que, m, head, _upper, _lower); + } else if (sc->hw.mac_type >= em_82543) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); } else { @@ -1278,7 +1281,8 @@ em_encap(struct em_queue *que, struct mb #if NVLAN > 0 /* Find out if we are in VLAN mode */ - if (m->m_flags & M_VLANTAG) { + if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 || + sc->hw.mac_type > em_i210)) { /* Set the VLAN id */ desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag); @@ -1964,17 +1968,16 @@ em_setup_interface(struct em_softc *sc) ifp->if_capabilities = IFCAP_VLAN_MTU; #if NVLAN > 0 - if (sc->hw.mac_type != em_82575 && sc->hw.mac_type != em_82580 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_i210 && sc->hw.mac_type != em_i350) - ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; + ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; #endif - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) + if (sc->hw.mac_type >= em_82543) { ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; + } + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { + ifp->if_capabilities |= IFCAP_CSUM_IPv4; + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; + } /* * Specify the media types supported by this adapter and register @@ -2389,6 +2392,108 @@ em_free_transmit_structures(struct em_so 0, que->tx.sc_tx_dma.dma_map->dm_mapsize, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } +} + +u_int +em_tx_ctx_setup(struct em_queue *que, struct mbuf *mp, u_int head, +u_int32_t *olinfo_status, u_int32_t *cmd_type_len) +{ + struct e1000_adv_tx_context_desc *TD; + struct ether_header *eh = mtod(mp, struct ether_header *); + struct mbuf *m; + uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0; + int off = 0, hoff; + uint8_t ipproto, iphlen; + + *olinfo_status = 0; + *cmd_type_len = 0; + TD = (struct e1000_adv_tx_context_desc *)>tx.sc_tx_desc_ring[head]; + +#if NVLAN > 0 + if (ISSET(mp->m_flags, M_VLANTAG)) { + uint16_t vtag = htole16(mp->m_pkthdr.ether_vtag); + vlan_macip_lens |= vtag << E1000_ADVTXD_VLAN_SHIFT; + *cmd_type_len |= E1000_ADVTXD_DCMD_VLE; + off = 1; + } +#endif + + vlan_macip_lens |= (sizeof(*eh) << E1000_ADVTXD_MACLEN_SHIFT); + + switch (ntohs(eh->ether_type)) { + case ETHERTYPE_IP: { + struct ip *ip; + + m = m_getptr(mp, sizeof(*eh), ); + ip = (struct ip *)(mtod(m, caddr_t) + hoff); + + iphlen = ip->ip_hl <<
Re: em(4) IPv4, TCP, UDP checksum offloading
Another mistake On Wed, Oct 12, 2022 at 01:45:30PM +0200, Moritz Buhl wrote: > On Tue, Oct 11, 2022 at 04:16:15PM +0100, Stuart Henderson wrote: > > On 2022/10/11 15:03, Moritz Buhl wrote: > > > Here is a new diff for checksum offloading (ipv4, udp, tcp) for em(4). > > > > > > The previous diff didn't implement hardware vlan tagging for >em82578 > > > which should result in variable ethernet header lengths and thus > > > wrong checksums inserted at wrong places. > > > > > > The diff below addresses this. > > > I would appreciate further testing reports with different controllers. > > > > > > mbuhl > > > > I tried this on my laptop which has I219-V em (I run it in a trunk > > with iwm). It breaks tx (packets don't show up on the other side). > > rx seems ok. > > The following diff will restrict the usage of the advanced > descriptors to 82575, 82576, i350 and i210, and fix what the > last diff broke for i219. > > Index: dev/pci/if_em.c > === > RCS file: /cvs/src/sys/dev/pci/if_em.c,v > retrieving revision 1.362 > diff -u -p -r1.362 if_em.c > --- dev/pci/if_em.c 23 Jun 2022 09:38:28 - 1.362 > +++ dev/pci/if_em.c 11 Oct 2022 16:05:43 - > @@ -37,6 +37,8 @@ POSSIBILITY OF SUCH DAMAGE. > #include > #include > > +#include > + > /* > * Driver version > */ > @@ -278,6 +280,8 @@ void em_receive_checksum(struct em_softc >struct mbuf *); > u_intem_transmit_checksum_setup(struct em_queue *, struct mbuf *, > u_int, > u_int32_t *, u_int32_t *); > +u_intem_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, > u_int32_t *, > + u_int32_t *); > void em_iff(struct em_softc *); > void em_update_link_status(struct em_softc *); > int em_get_buf(struct em_queue *, int); > @@ -1220,10 +1224,9 @@ em_encap(struct em_queue *que, struct mb > BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); > } > > - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && > - sc->hw.mac_type != em_82576 && > - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && > - sc->hw.mac_type != em_i350) { > + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { > + used += em_tx_ctx_setup(que, m, head, _upper, _lower); > + } else if (sc->hw.mac_type >= em_82543) { > used += em_transmit_checksum_setup(que, m, head, > _upper, _lower); > } else { > @@ -1278,7 +1281,7 @@ em_encap(struct em_queue *que, struct mb > > #if NVLAN > 0 > /* Find out if we are in VLAN mode */ > - if (m->m_flags & M_VLANTAG) { > + if (m->m_flags & M_VLANTAG && sc->hw.mac_type < em_82575) { I missed this spot, it should be if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 || sc->hw.mac_type > em_i210)) {
Re: em(4) IPv4, TCP, UDP checksum offloading
On Tue, Oct 11, 2022 at 04:16:15PM +0100, Stuart Henderson wrote: > On 2022/10/11 15:03, Moritz Buhl wrote: > > Here is a new diff for checksum offloading (ipv4, udp, tcp) for em(4). > > > > The previous diff didn't implement hardware vlan tagging for >em82578 > > which should result in variable ethernet header lengths and thus > > wrong checksums inserted at wrong places. > > > > The diff below addresses this. > > I would appreciate further testing reports with different controllers. > > > > mbuhl > > I tried this on my laptop which has I219-V em (I run it in a trunk > with iwm). It breaks tx (packets don't show up on the other side). > rx seems ok. The following diff will restrict the usage of the advanced descriptors to 82575, 82576, i350 and i210, and fix what the last diff broke for i219. Index: dev/pci/if_em.c === RCS file: /cvs/src/sys/dev/pci/if_em.c,v retrieving revision 1.362 diff -u -p -r1.362 if_em.c --- dev/pci/if_em.c 23 Jun 2022 09:38:28 - 1.362 +++ dev/pci/if_em.c 11 Oct 2022 16:05:43 - @@ -37,6 +37,8 @@ POSSIBILITY OF SUCH DAMAGE. #include #include +#include + /* * Driver version */ @@ -278,6 +280,8 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); +u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, + u_int32_t *); void em_iff(struct em_softc *); void em_update_link_status(struct em_softc *); int em_get_buf(struct em_queue *, int); @@ -1220,10 +1224,9 @@ em_encap(struct em_queue *que, struct mb BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) { + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { + used += em_tx_ctx_setup(que, m, head, _upper, _lower); + } else if (sc->hw.mac_type >= em_82543) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); } else { @@ -1278,7 +1281,7 @@ em_encap(struct em_queue *que, struct mb #if NVLAN > 0 /* Find out if we are in VLAN mode */ - if (m->m_flags & M_VLANTAG) { + if (m->m_flags & M_VLANTAG && sc->hw.mac_type < em_82575) { /* Set the VLAN id */ desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag); @@ -1964,17 +1967,14 @@ em_setup_interface(struct em_softc *sc) ifp->if_capabilities = IFCAP_VLAN_MTU; #if NVLAN > 0 - if (sc->hw.mac_type != em_82575 && sc->hw.mac_type != em_82580 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_i210 && sc->hw.mac_type != em_i350) - ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; + ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; #endif - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) + if (sc->hw.mac_type >= em_82543) { + ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; + } /* * Specify the media types supported by this adapter and register @@ -2391,6 +2391,108 @@ em_free_transmit_structures(struct em_so } } +u_int +em_tx_ctx_setup(struct em_queue *que, struct mbuf *mp, u_int head, +u_int32_t *olinfo_status, u_int32_t *cmd_type_len) +{ + struct e1000_adv_tx_context_desc *TD; + struct ether_header *eh = mtod(mp, struct ether_header *); + struct mbuf *m; + uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0; + int off = 0, hoff; + uint8_t ipproto, iphlen; + + *olinfo_status = 0; + *cmd_type_len = 0; + TD = (struct e1000_adv_tx_context_desc *)>tx.sc_tx_desc_ring[head]; + +#if NVLAN > 0 + if (ISSET(mp->m_flags, M_VLANTAG)) { + uint16_t vtag = htole16(mp->m_pkthdr.ether_vtag); + vlan
em(4) IPv4, TCP, UDP checksum offloading
Here is a new diff for checksum offloading (ipv4, udp, tcp) for em(4). The previous diff didn't implement hardware vlan tagging for >em82578 which should result in variable ethernet header lengths and thus wrong checksums inserted at wrong places. The diff below addresses this. I would appreciate further testing reports with different controllers. mbuhl Index: dev/pci/if_em.c === RCS file: /mount/openbsd/cvs/src/sys/dev/pci/if_em.c,v retrieving revision 1.362 diff -u -p -r1.362 if_em.c --- dev/pci/if_em.c 23 Jun 2022 09:38:28 - 1.362 +++ dev/pci/if_em.c 10 Oct 2022 18:01:19 - @@ -37,6 +37,8 @@ POSSIBILITY OF SUCH DAMAGE. #include #include +#include + /* * Driver version */ @@ -278,6 +280,8 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); +u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, + u_int32_t *); void em_iff(struct em_softc *); void em_update_link_status(struct em_softc *); int em_get_buf(struct em_queue *, int); @@ -1220,10 +1224,9 @@ em_encap(struct em_queue *que, struct mb BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) { + if (sc->hw.mac_type >= em_82575) { + used += em_tx_ctx_setup(que, m, head, _upper, _lower); + } else if (sc->hw.mac_type >= em_82543) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); } else { @@ -1278,7 +1281,7 @@ em_encap(struct em_queue *que, struct mb #if NVLAN > 0 /* Find out if we are in VLAN mode */ - if (m->m_flags & M_VLANTAG) { + if (m->m_flags & M_VLANTAG && sc->hw.mac_type < em_82575) { /* Set the VLAN id */ desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag); @@ -1964,17 +1967,14 @@ em_setup_interface(struct em_softc *sc) ifp->if_capabilities = IFCAP_VLAN_MTU; #if NVLAN > 0 - if (sc->hw.mac_type != em_82575 && sc->hw.mac_type != em_82580 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_i210 && sc->hw.mac_type != em_i350) - ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; + ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; #endif - if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) + if (sc->hw.mac_type >= em_82543) { + ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; + } /* * Specify the media types supported by this adapter and register @@ -2391,6 +2391,108 @@ em_free_transmit_structures(struct em_so } } +u_int +em_tx_ctx_setup(struct em_queue *que, struct mbuf *mp, u_int head, +u_int32_t *olinfo_status, u_int32_t *cmd_type_len) +{ + struct e1000_adv_tx_context_desc *TD; + struct ether_header *eh = mtod(mp, struct ether_header *); + struct mbuf *m; + uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0; + int off = 0, hoff; + uint8_t ipproto, iphlen; + + *olinfo_status = 0; + *cmd_type_len = 0; + TD = (struct e1000_adv_tx_context_desc *)>tx.sc_tx_desc_ring[head]; + +#if NVLAN > 0 + if (ISSET(mp->m_flags, M_VLANTAG)) { + uint16_t vtag = htole16(mp->m_pkthdr.ether_vtag); + vlan_macip_lens |= vtag << E1000_ADVTXD_VLAN_SHIFT; + *cmd_type_len |= E1000_ADVTXD_DCMD_VLE; + off = 1; + } +#endif + + vlan_macip_lens |= (sizeof(*eh) << E1000_ADVTXD_MACLEN_SHIFT); + + switch (ntohs(eh->ether_type)) { + case ETHERTYPE_IP: { + struct ip *ip; + + m = m_getptr(mp, sizeof(*eh), ); + ip = (struct ip *)(mtod(m, caddr_t) + hoff); + + iphlen = ip->ip_hl << 2; + ipproto = ip->ip_p; + + type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4; + if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) { + *olinfo_status |= E1000_TXD_POPTS_IXSM << 8; + off = 1; + } + + break; + } +#ifdef INET6 +
Re: semctl, semget and semop possible panics
On Fri, Sep 16, 2022 at 11:38:49PM +0200, Moritz Buhl wrote: > Syzkaller found a panic with sysv semaphores: > https://syzkaller.appspot.com/bug?id=f7e8e2822779918d7a23d9ff9d7c0a3779c00a46 The last diff would introduced two memory leaks, seminfo.semopm is the only seminfo value that can shrink via sysctl, so we have to recheck it. semtot also can change during a sleep. These problems are now addressed below. I am still uncertain if anything relevant could change during a sleep in copyout for GETALL. The diff is growing in size, would it make sense to look at semctl separately from semop and semget? Except for the fuzzer, I am not aware of any code that makes extensive use of the related syscalls. But they are cute. mbuhl Index: kern/sysv_sem.c === RCS file: /cvs/src/sys/kern/sysv_sem.c,v retrieving revision 1.62 diff -u -p -r1.62 sysv_sem.c --- kern/sysv_sem.c 16 Sep 2022 15:57:23 - 1.62 +++ kern/sysv_sem.c 17 Sep 2022 15:12:32 - @@ -236,7 +236,7 @@ sys___semctl(struct proc *p, void *v, re union semun arg, *uarg = SCARG(uap, arg); struct semid_ds sbuf; struct semid_ds *semaptr; - unsigned short *semval = NULL; + unsigned short *semval = NULL, nsems; int i, ix, error; switch (cmd) { @@ -248,6 +248,9 @@ sys___semctl(struct proc *p, void *v, re if ((error = copyin(uarg, , sizeof(union semun return (error); } + if (cmd == IPC_SET) + if ((error = copyin(arg.buf, , sizeof(sbuf + return (error); DPRINTF(("call to semctl(%d, %d, %d, %p)\n", semid, semnum, cmd, uarg)); @@ -255,6 +258,7 @@ sys___semctl(struct proc *p, void *v, re if (ix < 0 || ix >= seminfo.semmni) return (EINVAL); +again: if ((semaptr = sema[ix]) == NULL || semaptr->sem_perm.seq != IPCID_TO_SEQ(semid)) return (EINVAL); @@ -277,8 +281,6 @@ sys___semctl(struct proc *p, void *v, re case IPC_SET: if ((error = ipcperm(cred, >sem_perm, IPC_M))) return (error); - if ((error = copyin(arg.buf, , sizeof(sbuf))) != 0) - return (error); semaptr->sem_perm.uid = sbuf.sem_perm.uid; semaptr->sem_perm.gid = sbuf.sem_perm.gid; semaptr->sem_perm.mode = (semaptr->sem_perm.mode & ~0777) | @@ -319,11 +321,22 @@ sys___semctl(struct proc *p, void *v, re break; case GETALL: + nsems = semaptr->sem_nsems; + semval = mallocarray(nsems, sizeof(arg.array[0]), + M_TEMP, M_WAITOK); + if (semaptr != sema[ix] || + semaptr->sem_perm.seq != IPCID_TO_SEQ(semid) || + semaptr->sem_nsems != nsems) { + free(semval, M_TEMP, nsems * sizeof(arg.array[0])); + goto again; + } if ((error = ipcperm(cred, >sem_perm, IPC_R))) - return (error); - for (i = 0; i < semaptr->sem_nsems; i++) { - error = copyout(>sem_base[i].semval, - [i], sizeof(arg.array[0])); + goto error; + for (i = 0; i < nsems; i++) + semval[i] = semaptr->sem_base[i].semval; + for (i = 0; i < nsems; i++) { + error = copyout([i], [i], + sizeof(arg.array[0])); if (error != 0) break; } @@ -350,11 +363,18 @@ sys___semctl(struct proc *p, void *v, re break; case SETALL: - if ((error = ipcperm(cred, >sem_perm, IPC_W))) - return (error); + nsems = semaptr->sem_nsems; semval = mallocarray(semaptr->sem_nsems, sizeof(arg.array[0]), M_TEMP, M_WAITOK); - for (i = 0; i < semaptr->sem_nsems; i++) { + if (semaptr != sema[ix] || + semaptr->sem_perm.seq != IPCID_TO_SEQ(semid) || + semaptr->sem_nsems != nsems) { + free(semval, M_TEMP, nsems * sizeof(arg.array[0])); + goto again; + } + if ((error = ipcperm(cred, >sem_perm, IPC_W))) + goto error; + for (i = 0; i < nsems; i++) { error = copyin([i], [i], sizeof(arg.array[0])); if (error != 0) @@ -364,7 +384,15 @@ sys___semctl(struct proc *p, void *v, re goto error; }
semctl, semget and semop possible panics
Syzkaller found a panic with sysv semaphores: https://syzkaller.appspot.com/bug?id=f7e8e2822779918d7a23d9ff9d7c0a3779c00a46 The problem is that the code uses a few globals (sema, semtot, seminfo) that can change during sleeping points where the kernel lock is released. Usually copyin, copyout, malloc and pool_get (with WAIT_OK). It is especially bad with sema, whihc is a list of semaphore pointers. The crash syzkaller found is likely this: Another process removes the current semaphore while the initial process is sleeping in one of the modification commands. Then the process wakes up and writes to the unassigned pool region and the next pool_get call will notice this and panic. To fix this, I move the sleeping points up and move the icpperm checks after the sleeping points. For the theoretical case, that the sleep of the modifying command takes longer than removing and reallocationg the semaphore, the permissions must be rechecked. And if the number of semaphores changed, we better start over too. To confirm that the attached diff fixes the crash, instructions on running syzkaller reproducers follow. pkg_add gmake go git git clone https://github.com/google/syzkaller/ cd syzkaller && gmake syzkaller_home=$PWD cd /tmp && ln -s $syzkaller_home/bin/openbsd_amd64/syz-* . mount -u -o ro / ./syz-execprog -repeat 1000 -procs 4 repro.syz where repro.syz is ftp -o repro.syz \ 'https://syzkaller.appspot.com/text?tag=ReproSyz=175f4ff208' kind regards, mbuhl Index: kern/sysv_sem.c === RCS file: /cvs/src/sys/kern/sysv_sem.c,v retrieving revision 1.62 diff -u -p -r1.62 sysv_sem.c --- kern/sysv_sem.c 16 Sep 2022 15:57:23 - 1.62 +++ kern/sysv_sem.c 16 Sep 2022 21:29:36 - @@ -236,7 +236,7 @@ sys___semctl(struct proc *p, void *v, re union semun arg, *uarg = SCARG(uap, arg); struct semid_ds sbuf; struct semid_ds *semaptr; - unsigned short *semval = NULL; + unsigned short *semval = NULL, nsems; int i, ix, error; switch (cmd) { @@ -248,6 +248,9 @@ sys___semctl(struct proc *p, void *v, re if ((error = copyin(uarg, , sizeof(union semun return (error); } + if (cmd == IPC_SET) + if ((error = copyin(arg.buf, , sizeof(sbuf + return (error); DPRINTF(("call to semctl(%d, %d, %d, %p)\n", semid, semnum, cmd, uarg)); @@ -255,6 +258,7 @@ sys___semctl(struct proc *p, void *v, re if (ix < 0 || ix >= seminfo.semmni) return (EINVAL); +again: if ((semaptr = sema[ix]) == NULL || semaptr->sem_perm.seq != IPCID_TO_SEQ(semid)) return (EINVAL); @@ -277,8 +281,6 @@ sys___semctl(struct proc *p, void *v, re case IPC_SET: if ((error = ipcperm(cred, >sem_perm, IPC_M))) return (error); - if ((error = copyin(arg.buf, , sizeof(sbuf))) != 0) - return (error); semaptr->sem_perm.uid = sbuf.sem_perm.uid; semaptr->sem_perm.gid = sbuf.sem_perm.gid; semaptr->sem_perm.mode = (semaptr->sem_perm.mode & ~0777) | @@ -319,11 +321,20 @@ sys___semctl(struct proc *p, void *v, re break; case GETALL: + nsems = semaptr->sem_nsems; + semval = mallocarray(nsems, sizeof(arg.array[0]), + M_TEMP, M_WAITOK); + if (semaptr != sema[ix] || semaptr->sem_nsems != nsems) { + free(semval, M_TEMP, nsems * sizeof(arg.array[0])); + goto again; + } if ((error = ipcperm(cred, >sem_perm, IPC_R))) - return (error); - for (i = 0; i < semaptr->sem_nsems; i++) { - error = copyout(>sem_base[i].semval, - [i], sizeof(arg.array[0])); + goto error; + for (i = 0; i < nsems; i++) + semval[i] = semaptr->sem_base[i].semval; + for (i = 0; i < nsems; i++) { + error = copyout([i], [i], + sizeof(arg.array[0])); if (error != 0) break; } @@ -350,11 +361,16 @@ sys___semctl(struct proc *p, void *v, re break; case SETALL: - if ((error = ipcperm(cred, >sem_perm, IPC_W))) - return (error); + nsems = semaptr->sem_nsems; semval = mallocarray(semaptr->sem_nsems, sizeof(arg.array[0]), M_TEMP, M_WAITOK); - for (i = 0; i < semaptr->sem_nsems; i++) { + if (semaptr != sema[ix] || semaptr->sem_nsems != nsems) { + free(semval, M_TEMP, nsems * sizeof(arg.array[0])); +
Re: add sendmmsg and recvmmsg systemcalls
On Wed, Aug 31, 2022 at 05:44:31PM -0900, Philip Guenther wrote: > kdump.c will need at least a SYS_recvmmsg line in the big table, and if you > do a ktrmmsghdr() bit in the kernel a matching decoder will be needed in > kdump. Here is a new diff for kdump. OK? mbuhl Index: usr.bin/kdump/kdump.c === RCS file: /cvs/src/usr.bin/kdump/kdump.c,v retrieving revision 1.149 diff -u -p -r1.149 kdump.c --- usr.bin/kdump/kdump.c 20 Jul 2022 05:56:36 - 1.149 +++ usr.bin/kdump/kdump.c 7 Sep 2022 11:14:19 - @@ -720,6 +720,8 @@ static const formatter scargs[][8] = { [SYS_ptrace] = { Ptracedecode, Ppid_t, Pptr, Pdecint }, [SYS_recvmsg] = { Pfd, Pptr, Sendrecvflagsname }, [SYS_sendmsg] = { Pfd, Pptr, Sendrecvflagsname }, +[SYS_recvmmsg] = { Pfd, Pptr, Pucount, Sendrecvflagsname, Pptr }, +[SYS_sendmmsg] = { Pfd, Pptr, Pucount, Sendrecvflagsname }, [SYS_recvfrom] = { Pfd, Pptr, Pbigsize, Sendrecvflagsname }, [SYS_accept] = { Pfd, Pptr, Pptr }, [SYS_getpeername] = { Pfd, Pptr, Pptr }, Index: usr.bin/kdump/ktrstruct.c === RCS file: /cvs/src/usr.bin/kdump/ktrstruct.c,v retrieving revision 1.29 diff -u -p -r1.29 ktrstruct.c --- usr.bin/kdump/ktrstruct.c 21 Dec 2020 07:47:37 - 1.29 +++ usr.bin/kdump/ktrstruct.c 7 Sep 2022 10:00:23 - @@ -398,6 +398,18 @@ ktrquota(const struct dqblk *quota) } static void +ktrmmsghdr(const struct mmsghdr *mmsg) +{ + printf("struct mmsghdr { msg_hdr = { name=%p, namelen=%u, " + "iov=%p, iovlen=%u, control=%p, controllen=%u, flags=", + mmsg->msg_hdr.msg_name, mmsg->msg_hdr.msg_namelen, + mmsg->msg_hdr.msg_iov, mmsg->msg_hdr.msg_iovlen, + mmsg->msg_hdr.msg_control, mmsg->msg_hdr.msg_controllen); + sendrecvflagsname(mmsg->msg_hdr.msg_flags); + printf(" }, msg_len = %u }\n", mmsg->msg_len); +} + +static void ktrmsghdr(const struct msghdr *msg) { printf("struct msghdr { name=%p, namelen=%u, iov=%p, iovlen=%u," @@ -649,6 +661,13 @@ ktrstruct(char *buf, size_t buflen) goto invalid; memcpy(, data, datalen); ktrmsghdr(); + } else if (strcmp(name, "mmsghdr") == 0) { + struct mmsghdr mmsg; + + if (datalen != sizeof(mmsg)) + goto invalid; + memcpy(, data, datalen); + ktrmmsghdr(); } else if (strcmp(name, "iovec") == 0) { if (datalen % sizeof(struct iovec)) goto invalid;
Re: add sendmmsg and recvmmsg systemcalls
On Tue, Sep 06, 2022 at 04:00:39PM +0200, Moritz Buhl wrote: > Hi, > here is the most recent diff for the libc part of send and recvmmsg. > This requires a libc minor bump and therefore should be coordinated > after snapshots are building normally again. > > To my understanding the minor bump itself should not cause problems > in ports anymore. miod reminded me to also bump librthread as stated in libc/shlib_version. Index: lib/libc/Symbols.list === RCS file: /cvs/src/lib/libc/Symbols.list,v retrieving revision 1.75 diff -u -p -r1.75 Symbols.list --- lib/libc/Symbols.list 2 Aug 2022 16:45:00 - 1.75 +++ lib/libc/Symbols.list 6 Sep 2022 09:36:40 - @@ -175,6 +175,7 @@ _thread_sys_readlinkat _thread_sys_readv _thread_sys_reboot _thread_sys_recvfrom +_thread_sys_recvmmsg _thread_sys_recvmsg _thread_sys_rename _thread_sys_renameat @@ -184,6 +185,7 @@ _thread_sys_sched_yield _thread_sys_select _thread_sys_semget _thread_sys_semop +_thread_sys_sendmmsg _thread_sys_sendmsg _thread_sys_sendsyslog _thread_sys_sendto @@ -372,6 +374,7 @@ readlinkat readv reboot recvfrom +recvmmsg recvmsg rename renameat @@ -383,6 +386,7 @@ select semctl semget semop +sendmmsg sendmsg sendsyslog sendto Index: lib/libc/shlib_version === RCS file: /cvs/src/lib/libc/shlib_version,v retrieving revision 1.210 diff -u -p -r1.210 shlib_version --- lib/libc/shlib_version 2 Jun 2021 07:29:03 - 1.210 +++ lib/libc/shlib_version 6 Sep 2022 13:42:09 - @@ -1,4 +1,4 @@ major=96 -minor=1 +minor=2 # note: If changes were made to include/thread_private.h or if system calls # were added/changed then librthread/shlib_version must also be updated. Index: lib/libc/hidden/sys/socket.h === RCS file: /cvs/src/lib/libc/hidden/sys/socket.h,v retrieving revision 1.4 diff -u -p -r1.4 socket.h --- lib/libc/hidden/sys/socket.h7 May 2016 19:05:22 - 1.4 +++ lib/libc/hidden/sys/socket.h6 Sep 2022 13:41:53 - @@ -33,9 +33,11 @@ PROTO_NORMAL(listen); PROTO_NORMAL(recv); PROTO_CANCEL(recvfrom); PROTO_CANCEL(recvmsg); +PROTO_CANCEL(recvmmsg); PROTO_NORMAL(send); -PROTO_CANCEL(sendmsg); PROTO_CANCEL(sendto); +PROTO_CANCEL(sendmsg); +PROTO_CANCEL(sendmmsg); PROTO_NORMAL(setrtable); PROTO_NORMAL(setsockopt); PROTO_NORMAL(shutdown); Index: lib/libc/sys/Makefile.inc === RCS file: /cvs/src/lib/libc/sys/Makefile.inc,v retrieving revision 1.163 diff -u -p -r1.163 Makefile.inc --- lib/libc/sys/Makefile.inc 17 Jul 2022 03:04:27 - 1.163 +++ lib/libc/sys/Makefile.inc 6 Sep 2022 13:41:53 - @@ -34,8 +34,8 @@ CANCEL= accept accept4 \ nanosleep \ open openat \ poll ppoll pread preadv pselect pwrite pwritev \ - read readv recvfrom recvmsg \ - select sendmsg sendto \ + read readv recvfrom recvmsg recvmmsg \ + select sendto sendmsg sendmmsg \ wait4 write writev SRCS+= ${CANCEL:%=w_%.c} Index: lib/libc/sys/recv.2 === RCS file: /cvs/src/lib/libc/sys/recv.2,v retrieving revision 1.48 diff -u -p -r1.48 recv.2 --- lib/libc/sys/recv.2 21 Nov 2021 23:44:55 - 1.48 +++ lib/libc/sys/recv.2 6 Sep 2022 13:42:12 - @@ -46,15 +46,35 @@ .Fn recvfrom "int s" "void *buf" "size_t len" "int flags" "struct sockaddr *from" "socklen_t *fromlen" .Ft ssize_t .Fn recvmsg "int s" "struct msghdr *msg" "int flags" +.Ft int +.Fn recvmmsg "int s" "struct mmsghdr *mmsg" "unsigned int vlen" "int flags" "struct timespec *timeout" .Sh DESCRIPTION -.Fn recvfrom +.Fn recv , +.Fn recvfrom , +.Fn recvmsg , and -.Fn recvmsg +.Fn recvmmsg are used to receive messages from a socket, -.Fa s , -and may be used to receive +.Fa s . +.Fn recv +is normally used only on a +.Em connected +socket (see +.Xr connect 2 ). +.Fn recvfrom , +.Fn recvmsg , +and +.Fn recvmmsg +may be used to receive data on a socket whether or not it is connection-oriented. .Pp +.Fn recv +is identical to +.Fn recvfrom +with a null +.Fa from +parameter. +.Pp If .Fa from is non-null and the socket is not connection-oriented, @@ -66,25 +86,6 @@ the buffer associated with and modified on return to indicate the actual size of the address stored there. .Pp -The -.Fn recv -call is normally used only on a -.Em connected -socket (see -.Xr connect 2 ) -and is identical to -.Fn recvfrom -with a null -.Fa from -parameter. -.Pp -On successful completion, all three routines return the number of -message bytes read. -If a message is too long to fit in the supplied -buffer, ex
Re: add sendmmsg and recvmmsg systemcalls
receive call waits for a message to arrive, unless the socket is nonblocking (see @@ -158,6 +159,8 @@ The .Dv MSG_CMSG_CLOEXEC requests that any file descriptors received as ancillary data with .Fn recvmsg +and +.Fn recvmmsg (see below) have their close-on-exec flag set. .Pp @@ -249,13 +252,67 @@ Indicates that the packet was received a .It Dv MSG_MCAST Indicates that the packet was received as multicast. .El +.Pp +The +.Fn recvmmsg +call uses an array of the +.Fa mmsghdr +structure of length +.Fa vlen +to group multiple +.Fa msghdr +structures into a single system call. +.Fa vlen +is capped at maximum +.Dv 1024 +messages that are received in a single call. +The +.Fa flags +field allows setting +.Dv MSG_WAITFORONE +to wait for one +.Fa msghdr , +and set +.Dv MSG_DONTWAIT +for all subsequent messages. +A provided +.Fa timeout +limits the time spent in the function but it does not limit the +time spent in lower parts of the kernel. +.Pp +The +.Fa mmsghdr +structure has the following form, as defined in +.In sys/socket.h : +.Bd -literal +struct mmsghdr { + struct msghdr msg_hdr; + unsigned int msg_len; +}; +.Ed +.Pp +Here +.Fa msg_len +indicated the number of bytes received for each +.Fa msg_hdr +member. .Sh RETURN VALUES -These calls return the number of bytes received, or \-1 if an error occurred. -.Sh ERRORS +The .Fn recv , .Fn recvfrom , and .Fn recvmsg +calls return the number of bytes received, or \-1 if an error occurred. +The +.Fn recvmmsg +call returns the number of messages received, or \-1 +if an error occurred before the first message has been received. +.Sh ERRORS +.Fn recv , +.Fn recvfrom , +.Fn recvmsg , +and +.Fn recvmmsg fail if: .Bl -tag -width "[EHOSTUNREACH]" .It Bq Er EBADF @@ -310,6 +367,8 @@ was larger than .Pp And .Fn recvmsg +and +.Fn recvmmsg may return one of the following errors: .Bl -tag -width Er .It Bq Er EINVAL @@ -364,6 +423,10 @@ The .Fn recv function call appeared in .Bx 4.1c . +The +.Fn recvmmsg +syscall first appeared in Linux 2.6.33 and was added to +.Ox 7.2 . .Sh CAVEATS Calling .Fn recvmsg Index: lib/libc/sys/send.2 === RCS file: /cvs/src/lib/libc/sys/send.2,v retrieving revision 1.34 diff -u -p -r1.34 send.2 --- lib/libc/sys/send.2 11 Jan 2019 06:10:13 - 1.34 +++ lib/libc/sys/send.2 5 Sep 2022 14:59:10 - @@ -36,7 +36,8 @@ .Sh NAME .Nm send , .Nm sendto , -.Nm sendmsg +.Nm sendmsg , +.Nm sendmmsg .Nd send a message from a socket .Sh SYNOPSIS .In sys/socket.h @@ -46,19 +47,23 @@ .Fn sendto "int s" "const void *msg" "size_t len" "int flags" "const struct sockaddr *to" "socklen_t tolen" .Ft ssize_t .Fn sendmsg "int s" "const struct msghdr *msg" "int flags" +.Ft int +.Fn sendmmsg "int s" "const struct mmsghdr *mmsg" "unsigned int vlen" "int flags" .Sh DESCRIPTION .Fn send , .Fn sendto , +.Fn sendmsg , and -.Fn sendmsg +.Fn sendmmsg are used to transmit a message to another socket. .Fn send may be used only when the socket is in a .Em connected state, while -.Fn sendto +.Fn sendto , +.Fn sendmsg , and -.Fn sendmsg +.Fn sendmmsg may be used at any time. .Pp The address of the target is given by @@ -127,10 +132,21 @@ See .Xr recv 2 for a description of the .Fa msghdr -structure. +and +.Fa mmsghdr +structures. .Sh RETURN VALUES -The call returns the number of characters sent, or \-1 +The +.Fn send , +.Fn sendto , +and +.Fn sendmsg +calls return the number of characters sent, or \-1 if an error occurred. +The +.Fn sendmmsg +call returns the number of messages sent, or \-1 +if an error occurred before the first message has been sent. .Sh ERRORS .Fn send , .Fn sendto , @@ -267,3 +283,7 @@ The .Fn send function call appeared in .Bx 4.1c . +The +.Fn sendmmsg +syscall first appeared in Linux 3.0 and was added to +.Ox 7.2 . Index: lib/libc/sys/w_recvmmsg.c ======= RCS file: lib/libc/sys/w_recvmmsg.c diff -N lib/libc/sys/w_recvmmsg.c --- /dev/null 1 Jan 1970 00:00:00 - +++ lib/libc/sys/w_recvmmsg.c 6 Sep 2022 09:42:08 - @@ -0,0 +1,32 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2022 Moritz Buhl + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF
Re: add sendmmsg and recvmmsg systemcalls
Here is an updated version of the kernel part for sendmmsg. mbuhl Index: sys/kern/syscalls.master === RCS file: /mount/openbsd/cvs/src/sys/kern/syscalls.master,v retrieving revision 1.230 diff -u -p -r1.230 syscalls.master --- sys/kern/syscalls.master2 Sep 2022 13:18:06 - 1.230 +++ sys/kern/syscalls.master2 Sep 2022 20:34:15 - @@ -247,7 +247,9 @@ 116STD NOLOCK { int sys_recvmmsg(int s, struct mmsghdr *mmsg, \ unsigned int vlen, unsigned int flags, \ struct timespec *timeout); } -117UNIMPL sendmmsg +117STD NOLOCK { int sys_sendmmsg(int s, \ + struct mmsghdr *mmsg, unsigned int vlen, \ + unsigned int flags); } 118STD { int sys_getsockopt(int s, int level, int name, \ void *val, socklen_t *avalsize); } 119STD { int sys_thrkill(pid_t tid, int signum, void *tcb); } Index: sys/kern/uipc_syscalls.c === RCS file: /mount/openbsd/cvs/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.202 diff -u -p -r1.202 uipc_syscalls.c --- sys/kern/uipc_syscalls.c2 Sep 2022 13:18:06 - 1.202 +++ sys/kern/uipc_syscalls.c2 Sep 2022 23:26:08 - @@ -606,6 +606,92 @@ done: } int +sys_sendmmsg(struct proc *p, void *v, register_t *retval) +{ + struct sys_sendmmsg_args /* { + syscallarg(int) s; + syscallarg(struct mmsghdr *)mmsg; + syscallarg(unsigned int)vlen; + syscallarg(unsigned int)flags; + } */ *uap = v; + struct mmsghdr mmsg, *mmsgp; + struct iovec aiov[UIO_SMALLIOV], *iov = aiov, *uiov; + size_t iovlen = UIO_SMALLIOV; + register_t retsnd; + unsigned int vlen, dgrams; + int error = 0; + + /* Arbitrarily capped at 1024 datagrams. */ + vlen = SCARG(uap, vlen); + if (vlen > 1024) + vlen = 1024; + + mmsgp = SCARG(uap, mmsg); + for (dgrams = 0; dgrams < vlen; dgrams++) { + error = copyin([dgrams], , sizeof(mmsg)); + if (error) + break; + +#ifdef KTRACE + if (KTRPOINT(p, KTR_STRUCT)) + ktrmmsghdr(p, ); +#endif + + if (mmsg.msg_hdr.msg_iovlen > IOV_MAX) { + error = EMSGSIZE; + break; + } + + if (mmsg.msg_hdr.msg_iovlen > iovlen) { + if (iov != aiov) + free(iov, M_IOV, iovlen * + sizeof(struct iovec)); + + iovlen = mmsg.msg_hdr.msg_iovlen; + iov = mallocarray(iovlen, sizeof(struct iovec), + M_IOV, M_WAITOK); + } + + if (mmsg.msg_hdr.msg_iovlen > 0) { + error = copyin(mmsg.msg_hdr.msg_iov, iov, + mmsg.msg_hdr.msg_iovlen * sizeof(struct iovec)); + if (error) + break; + } + +#ifdef KTRACE + if (mmsg.msg_hdr.msg_iovlen && KTRPOINT(p, KTR_STRUCT)) + ktriovec(p, iov, mmsg.msg_hdr.msg_iovlen); +#endif + + uiov = mmsg.msg_hdr.msg_iov; + mmsg.msg_hdr.msg_iov = iov; + mmsg.msg_hdr.msg_flags = 0; + + error = sendit(p, SCARG(uap, s), _hdr, + SCARG(uap, flags), ); + if (error) + break; + + mmsg.msg_hdr.msg_iov = uiov; + mmsg.msg_len = retsnd; + + error = copyout(, [dgrams], sizeof(mmsg)); + if (error) + break; + } + + if (iov != aiov) + free(iov, M_IOV, sizeof(struct iovec) * iovlen); + + *retval = dgrams; + + if (dgrams) + return 0; + return error; +} + +int sendit(struct proc *p, int s, struct msghdr *mp, int flags, register_t *retsize) { struct file *fp; Index: sys/sys/socket.h === RCS file: /mount/openbsd/cvs/src/sys/sys/socket.h,v retrieving revision 1.103 diff -u -p -r1.103 socket.h --- sys/sys/socket.h2 Sep 2022 13:18:07 - 1.103 +++ sys/sys/socket.h2 Sep 2022 22:31:09 - @@ -579,6 +579,7 @@ ssize_t send(int, const void *, size_t, ssize_tsendto(int, const void *, size_t, int, const struct sockaddr *, socklen_t); ssize_tsendmsg(int, const struct msghdr *, int); +intsendmmsg(int, struct mmsghdr *, unsigned int, unsigned int); intsetsockopt(int, int, int, const void *, socklen_t); intshutdown(int, int); int
Re: add sendmmsg and recvmmsg systemcalls
I addressed your concerns as well as these of jca, just the kernel part (and the new ktrace stuff) below. One minor thing: I didn't see any kdump output where one struct was contained in another one but I am printing it like ddb would so I guess it should be fine. Index: kern/syscalls.master === RCS file: /mount/openbsd/cvs/src/sys/kern/syscalls.master,v retrieving revision 1.229 diff -u -p -r1.229 syscalls.master --- kern/syscalls.master1 Aug 2022 14:56:59 - 1.229 +++ kern/syscalls.master1 Sep 2022 14:52:47 - @@ -244,8 +244,10 @@ const char *permissions); } 115STD { int sys___realpath(const char *pathname, \ char *resolved); } -116OBSOL t32_gettimeofday -117OBSOL t32_getrusage +116STD NOLOCK { int sys_recvmmsg(int s, struct mmsghdr *mmsg, \ + unsigned int vlen, unsigned int flags, \ + struct timespec *timeout); } +117UNIMPL sendmmsg 118STD { int sys_getsockopt(int s, int level, int name, \ void *val, socklen_t *avalsize); } 119STD { int sys_thrkill(pid_t tid, int signum, void *tcb); } Index: kern/uipc_syscalls.c === RCS file: /mount/openbsd/cvs/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.201 diff -u -p -r1.201 uipc_syscalls.c --- kern/uipc_syscalls.c14 Aug 2022 01:58:28 - 1.201 +++ kern/uipc_syscalls.c1 Sep 2022 14:37:26 - @@ -805,6 +805,140 @@ done: } int +sys_recvmmsg(struct proc *p, void *v, register_t *retval) +{ + struct sys_recvmmsg_args /* { + syscallarg(int) s; + syscallarg(struct mmsghdr *)mmsg; + syscallarg(unsigned int)vlen; + syscallarg(unsigned int)flags; + syscallarg(struct timespec *) timeout; + } */ *uap = v; + struct mmsghdr mmsg, *mmsgp; + struct timespec ts, now; + struct iovec aiov[UIO_SMALLIOV], *uiov, *iov = aiov; + struct file *fp; + struct socket *so; + struct timespec *timeout; + size_t iovlen = UIO_SMALLIOV; + register_t retrec; + unsigned int vlen, dgrams; + int error = 0, flags, s; + + s = SCARG(uap, s); + if ((error = getsock(p, s, ))) + return (error); + so = (struct socket *)fp->f_data; + + timeout = SCARG(uap, timeout); + if (timeout != NULL) { + error = copyin(timeout, , sizeof(ts)); + if (error) + return error; +#ifdef KTRACE + if (KTRPOINT(p, KTR_STRUCT)) + ktrreltimespec(p, ); +#endif + getnanotime(); + timespecadd(, , ); + } + + flags = SCARG(uap, flags); + + /* Arbitrarily capped at 1024 datagrams. */ + vlen = SCARG(uap, vlen); + if (vlen > 1024) + vlen = 1024; + + mmsgp = SCARG(uap, mmsg); + for (dgrams = 0; dgrams < vlen;) { + error = copyin([dgrams], , sizeof(mmsg)); + if (error) + break; + + if (mmsg.msg_hdr.msg_iovlen > IOV_MAX) { + error = EMSGSIZE; + break; + } + + if (mmsg.msg_hdr.msg_iovlen > iovlen) { + if (iov != aiov) + free(iov, M_IOV, iovlen * + sizeof(struct iovec)); + + iovlen = mmsg.msg_hdr.msg_iovlen; + iov = mallocarray(iovlen, sizeof(struct iovec), + M_IOV, M_WAITOK); + } + + if (mmsg.msg_hdr.msg_iovlen > 0) { + error = copyin(mmsg.msg_hdr.msg_iov, iov, + mmsg.msg_hdr.msg_iovlen * sizeof(struct iovec)); + if (error) + break; + } + + uiov = mmsg.msg_hdr.msg_iov; + mmsg.msg_hdr.msg_iov = iov; + mmsg.msg_hdr.msg_flags = flags; + + error = recvit(p, s, _hdr, NULL, ); + if (error) { + if (error == EAGAIN && dgrams > 0) + error = 0; + break; + } + + if (dgrams == 0 && flags & MSG_WAITFORONE) { + flags &= ~MSG_WAITFORONE; + flags |= MSG_DONTWAIT; + } + + mmsg.msg_hdr.msg_iov = uiov; + mmsg.msg_len = retrec; +#ifdef KTRACE + if (KTRPOINT(p, KTR_STRUCT)) { + ktrmmsghdr(p, ); + if
Re: add sendmmsg and recvmmsg systemcalls
I created a pull request on their github: https://github.com/NLnetLabs/nsd/pull/231 On Tue, Aug 30, 2022 at 04:33:10PM -0600, Todd C. Miller wrote: > On Wed, 31 Aug 2022 00:19:20 +0200, Moritz Buhl wrote: > > > On Tue, Aug 30, 2022 at 10:59:43PM +0200, Claudio Jeker wrote: > > > And nsd in base. It seems unbound does not use recvmmsg. > > > > After 'make -f Makefile.bsd-wrapper config' recvmmsg is picked up. > > The config compile test currently defines NONBLOCKING_IS_BROKEN > > because of a missing include. Then vlen for recvmmsg would always > > be 1. > > > > checking for struct mmsghdr... yes > > checking for recvmmsg... yes > > checking for sendmmsg... no > > ... > > checking if nonblocking sockets work... yes > > > > If anybody knows how to make nsd call nsd_recvmmsg this would > > make for some nice testing. > > Does this also fix the configure test? If so, we can submit it > upstream. > > - todd > > Index: acx_nlnetlabs.m4 > === > RCS file: /cvs/src/usr.sbin/nsd/acx_nlnetlabs.m4,v > retrieving revision 1.7 > diff -u -p -u -r1.7 acx_nlnetlabs.m4 > --- acx_nlnetlabs.m4 24 Oct 2021 12:14:18 - 1.7 > +++ acx_nlnetlabs.m4 30 Aug 2022 22:31:48 - > @@ -963,6 +963,9 @@ AC_LANG_SOURCE([[ > #ifdef HAVE_SYS_TYPES_H > #include > #endif > +#ifdef HAVE_SYS_SELECT_H > +#include > +#endif > #ifdef HAVE_SYS_SOCKET_H > #include > #endif > Index: configure > === > RCS file: /cvs/src/usr.sbin/nsd/configure,v > retrieving revision 1.56 > diff -u -p -u -r1.56 configure > --- configure 30 Jun 2022 10:49:39 - 1.56 > +++ configure 30 Aug 2022 22:32:08 - > @@ -6593,6 +6593,9 @@ else > #ifdef HAVE_SYS_TYPES_H > #include > #endif > +#ifdef HAVE_SYS_SELECT_H > +#include > +#endif > #ifdef HAVE_SYS_SOCKET_H > #include > #endif >
Re: add sendmmsg and recvmmsg systemcalls
On Tue, Aug 30, 2022 at 10:59:43PM +0200, Claudio Jeker wrote: > And nsd in base. It seems unbound does not use recvmmsg. After 'make -f Makefile.bsd-wrapper config' recvmmsg is picked up. The config compile test currently defines NONBLOCKING_IS_BROKEN because of a missing include. Then vlen for recvmmsg would always be 1. checking for struct mmsghdr... yes checking for recvmmsg... yes checking for sendmmsg... no ... checking if nonblocking sockets work... yes If anybody knows how to make nsd call nsd_recvmmsg this would make for some nice testing. mbuhl Index: Makefile.bsd-wrapper === RCS file: /mount/openbsd/cvs/src/usr.sbin/nsd/Makefile.bsd-wrapper,v retrieving revision 1.19 diff -u -p -r1.19 Makefile.bsd-wrapper --- Makefile.bsd-wrapper30 Jun 2021 11:50:22 - 1.19 +++ Makefile.bsd-wrapper30 Aug 2022 22:09:46 - @@ -22,6 +22,7 @@ CONFIGURE_OPTS= --prefix=/usr \ --with-xfrdfile=${CHROOTDIR}/run/xfrd.state \ --with-libevent=/usr \ --enable-ratelimit \ + --enable-recvmmsg \ --enable-root-server PROG= nsd nsd-checkconf nsd-checkzone nsd-control Index: configure === RCS file: /mount/openbsd/cvs/src/usr.sbin/nsd/configure,v retrieving revision 1.56 diff -u -p -r1.56 configure --- configure 30 Jun 2022 10:49:39 - 1.56 +++ configure 30 Aug 2022 22:04:41 - @@ -6609,6 +6609,8 @@ else #include #endif +#include + int main(void) { int port;
Re: add sendmmsg and recvmmsg systemcalls
ype of socket -the message is received from (see -.Xr socket 2 ) . -.Pp If no messages are available at the socket, the receive call waits for a message to arrive, unless the socket is nonblocking (see @@ -158,6 +159,8 @@ The .Dv MSG_CMSG_CLOEXEC requests that any file descriptors received as ancillary data with .Fn recvmsg +and +.Fn recvmmsg (see below) have their close-on-exec flag set. .Pp @@ -249,13 +252,67 @@ Indicates that the packet was received a .It Dv MSG_MCAST Indicates that the packet was received as multicast. .El +.Pp +The +.Fn recvmmsg +call uses an array of the +.Fa mmsghdr +structure of length +.Fa vlen +to group multiple +.Fa msghdr +structures into a single system call. +.Fa vlen +is capped at maximum +.Dv 1024 +messages that are received in a single call. +The +.Fa flags +field allows setting +.Dv MSG_WAITFORONE +to wait for one +.Fa msghdr , +and set +.Dv MSG_DONTWAIT +for all subsequent messages. +A provided +.Fa timeout +limits the time spent in the function but it does not limit the +time spent in lower parts of the kernel. +.Pp +The +.Fa mmsghdr +structure has the following form, as defined in +.In sys/socket.h : +.Bd -literal +struct mmsghdr { + struct msghdr msg_hdr; + unsigned int msg_len; +}; +.Ed +.Pp +Here +.Fa msg_len +indicated the number of bytes received for each +.Fa msg_hdr +member. .Sh RETURN VALUES -These calls return the number of bytes received, or \-1 if an error occurred. -.Sh ERRORS +The .Fn recv , .Fn recvfrom , and .Fn recvmsg +calls return the number of bytes received, or \-1 if an error occurred. +The +.Fn recvmmsg +call returns the number of messages received, or \-1 +if an error occurred before the first message has been received. +.Sh ERRORS +.Fn recv , +.Fn recvfrom , +.Fn recvmsg , +and +.Fn recvmmsg fail if: .Bl -tag -width "[EHOSTUNREACH]" .It Bq Er EBADF @@ -310,6 +367,8 @@ was larger than .Pp And .Fn recvmsg +and +.Fn recvmmsg may return one of the following errors: .Bl -tag -width Er .It Bq Er EINVAL @@ -364,6 +423,11 @@ The .Fn recv function call appeared in .Bx 4.1c . +The +.Fn recvmmsg +syscall first appeared in Linux 2.6.33 +and has been available since +.Ox 7.2 . .Sh CAVEATS Calling .Fn recvmsg Index: lib/libc/sys/w_recvmmsg.c === RCS file: lib/libc/sys/w_recvmmsg.c diff -N lib/libc/sys/w_recvmmsg.c --- /dev/null 1 Jan 1970 00:00:00 - +++ lib/libc/sys/w_recvmmsg.c 22 Apr 2022 16:03:30 - @@ -0,0 +1,32 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2022 Moritz Buhl + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +#include +#include "cancel.h" + +int +recvmmsg(int fd, struct mmsghdr *mmsg, unsigned int vlen, unsigned int flags, + struct timespec *ts) +{ + int ret; + + ENTER_CANCEL_POINT(1); + ret = HIDDEN(recvmmsg)(fd, mmsg, vlen, flags, ts); + LEAVE_CANCEL_POINT(ret == -1); + return (ret); +} +DEF_CANCEL(recvmmsg); Index: regress/lib/libc/sys/Makefile === RCS file: /cvs/src/regress/lib/libc/sys/Makefile,v retrieving revision 1.15 diff -u -p -r1.15 Makefile --- regress/lib/libc/sys/Makefile 6 Jan 2022 03:30:15 - 1.15 +++ regress/lib/libc/sys/Makefile 30 Aug 2022 15:44:29 - @@ -54,6 +54,7 @@ PROGS += t_pipe2 PROGS += t_poll PROGS += t_ppoll PROGS += t_ptrace +PROGS += t_recvmmsg PROGS += t_revoke PROGS += t_select PROGS += t_sendrecv Index: regress/lib/libc/sys/atf-c.h === RCS file: /cvs/src/regress/lib/libc/sys/atf-c.h,v retrieving revision 1.4 diff -u -p -r1.4 atf-c.h --- regress/lib/libc/sys/atf-c.h28 May 2022 18:39:39 - 1.4 +++ regress/lib/libc/sys/atf-c.h30 Aug 2022 15:44:29 - @@ -76,6 +76,7 @@ ATF_TC_FUNCTIONS(fn) #define ATF_CHECK ATF_REQUIRE #define ATF_CHECK_MSG ATF_REQUIRE_MSG #define ATF_CHECK_EQ ATF_REQUIRE_EQ +#define ATF_CHECK_EQ_MSG ATF_REQUIRE_EQ_MSG #define ATF_CHECK_ERRNOATF_REQUIRE_ERRNO #define ATF_CHECK_STREQATF_REQUIRE_STREQ Index: regress/lib/libc/sys/t_rec
Re: delete unused variable in ix(4) rx checksum calc
ok mbuhl On Tue, Aug 30, 2022 at 05:13:35PM +0200, Sebastian Benoit wrote: > ptype is never used. > ok? > > diff --git sys/dev/pci/if_ix.c sys/dev/pci/if_ix.c > index cb233034d23..72a221b97d9 100644 > --- sys/dev/pci/if_ix.c > +++ sys/dev/pci/if_ix.c > @@ -148,7 +148,7 @@ void ixgbe_enable_intr(struct ix_softc *); > void ixgbe_disable_intr(struct ix_softc *); > int ixgbe_txeof(struct tx_ring *); > int ixgbe_rxeof(struct rx_ring *); > -void ixgbe_rx_checksum(uint32_t, struct mbuf *, uint32_t); > +void ixgbe_rx_checksum(uint32_t, struct mbuf *); > void ixgbe_iff(struct ix_softc *); > void ixgbe_map_queue_statistics(struct ix_softc *); > void ixgbe_update_link_status(struct ix_softc *); > @@ -3103,7 +3103,7 @@ ixgbe_rxeof(struct rx_ring *rxr) > struct mbuf *mp, *sendmp; > uint8_t eop = 0; > uint16_t len, vtag; > - uint32_t staterr = 0, ptype; > + uint32_t staterr = 0; > struct ixgbe_rx_buf *rxbuf, *nxbuf; > union ixgbe_adv_rx_desc *rxdesc; > size_t dsize = sizeof(union ixgbe_adv_rx_desc); > @@ -3140,8 +3140,6 @@ ixgbe_rxeof(struct rx_ring *rxr) > > mp = rxbuf->buf; > len = letoh16(rxdesc->wb.upper.length); > - ptype = letoh32(rxdesc->wb.lower.lo_dword.data) & > - IXGBE_RXDADV_PKTTYPE_MASK; > vtag = letoh16(rxdesc->wb.upper.vlan); > eop = ((staterr & IXGBE_RXD_STAT_EOP) != 0); > hash = lemtoh32(>wb.lower.hi_dword.rss); > @@ -3213,7 +3211,7 @@ ixgbe_rxeof(struct rx_ring *rxr) > sendmp = NULL; > mp->m_next = nxbuf->buf; > } else { /* Sending this frame? */ > - ixgbe_rx_checksum(staterr, sendmp, ptype); > + ixgbe_rx_checksum(staterr, sendmp); > > if (hashtype != IXGBE_RXDADV_RSSTYPE_NONE) { > sendmp->m_pkthdr.ph_flowid = hash; > @@ -3251,7 +3249,7 @@ next_desc: > * > */ > void > -ixgbe_rx_checksum(uint32_t staterr, struct mbuf * mp, uint32_t ptype) > +ixgbe_rx_checksum(uint32_t staterr, struct mbuf * mp) > { > uint16_t status = (uint16_t) staterr; > uint8_t errors = (uint8_t) (staterr >> 24); >
Re: ipv6 fragment checksum
On Sun, Aug 07, 2022 at 12:41:29AM +0200, Alexander Bluhm wrote: > Hi, > > If interface drivers have enabled transmit offloading for the payload > checksum , IPv6 fragments contain invalid checksum. For fragments > the protocol checksum has to be calculated before fragmentation. > Hardware cannot do this as it is too late. Do it earlier in software. > > ip_fragement() has such code, but in IPv6 it is missing. Note that > in6_proto_cksum_out() has to be called before the next protocol is > set to IPPROTO_FRAGMENT. > > ok? > > bluhm > > Index: netinet6/ip6_output.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_output.c,v > retrieving revision 1.269 > diff -u -p -r1.269 ip6_output.c > --- netinet6/ip6_output.c 29 Jun 2022 22:45:24 - 1.269 > +++ netinet6/ip6_output.c 6 Aug 2022 22:21:04 - > @@ -729,6 +729,12 @@ reroute: > mtu = IPV6_MAXPACKET; > > /* > + * If we are doing fragmentation, we can't defer TCP/UDP > + * checksumming; compute the checksum and clear the flag. > + */ > +in6_proto_cksum_out(m, NULL); > + > + /* >* Change the next header field of the last header in the >* unfragmentable part. >*/ > tested this using udp on em (with and without my offloading diff), igc, and ix openbsd-linux and linux-openbsd-linux-forwarding. ok mbuhl
pfctl: possible memory leak in load_feedback_profile
Dear tech@, the scan-build(1) tool from the llvm port on sbin/pfctl reported a memory leak in the error path of load_feedback_profile. There are more places that could be investigated. Would it make sense to look into these? Independent of that I was wondering if it makes sense to warn if pfctl_optimize_ruleset fails. There cases should exist where the function with 1 without printing a warning. Any feedback? mbuhl Index: sbin/pfctl/pfctl.c === RCS file: /cvs/src/sbin/pfctl/pfctl.c,v retrieving revision 1.388 diff -u -p -r1.388 pfctl.c --- sbin/pfctl/pfctl.c 27 Jul 2022 12:28:27 - 1.388 +++ sbin/pfctl/pfctl.c 27 Jul 2022 12:46:05 - @@ -1452,7 +1452,8 @@ pfctl_load_ruleset(struct pfctl *pf, cha } if (pf->optimize) - pfctl_optimize_ruleset(pf, rs); + if (pfctl_optimize_ruleset(pf, rs)) + warnx("ruleset optimizations failed"); while ((r = TAILQ_FIRST(rs->rules.active.ptr)) != NULL) { TAILQ_REMOVE(rs->rules.active.ptr, r, entries); Index: sbin/pfctl/pfctl_optimize.c === RCS file: /cvs/src/sbin/pfctl/pfctl_optimize.c,v retrieving revision 1.49 diff -u -p -r1.49 pfctl_optimize.c --- sbin/pfctl/pfctl_optimize.c 28 Jan 2022 05:24:15 - 1.49 +++ sbin/pfctl/pfctl_optimize.c 27 Jul 2022 12:46:05 - @@ -869,13 +869,13 @@ load_feedback_profile(struct pfctl *pf, struct pf_ruleset *rs; if ((por = calloc(1, sizeof(*por))) == NULL) { warn("calloc"); - return (1); + goto free_queue; } pr.nr = nr; if (ioctl(pf->dev, DIOCGETRULE, ) == -1) { warnx("%s", pf_strerror(errno)); free(por); - return (1); + goto free_queue; } memcpy(>por_rule, , sizeof(por->por_rule)); rs = pf_find_or_create_ruleset(pr.anchor_call); @@ -884,7 +884,7 @@ load_feedback_profile(struct pfctl *pf, } if (construct_superblocks(pf, , _superblocks)) - return (1); + goto free_queue; /* @@ -920,6 +920,14 @@ load_feedback_profile(struct pfctl *pf, blockcur = block; } return (0); + +free_queue: + while ((por = TAILQ_FIRST())) { + TAILQ_REMOVE(, por, por_entry); + free(por); + } + + return (1); }
Re: pf.conf(5): document new anchors limit
On Thu, Jul 21, 2022 at 10:25:09PM +0100, Jason McIntyre wrote: ... > it looks like the "set limit" text in pf.conf(5) might need some small > adjustments: > > - as well as the "anchors" limit, it does not document "pktdelay-pkts" > > - for entries where defaults are not documented, it is not clear whether > this is an omission or they are just not limited by default (in the > same way that things like table numbers are limited). those affected > seem to be src-nodes and pktdelay-pkts All of these limits have defaults that are sometimes set in the kernel or with the first pfctl invocation during boot: /usr/src/sbin/pfctl.c: 1726 pf->limit[PF_LIMIT_SRC_NODES] = PFSNODE_HIWAT; ... 1729 pf->limit[PF_LIMIT_PKTDELAY_PKTS] = PF_PKTDELAY_MAXPKTS; /usr/include/net/pfvar.h: #define PFSNODE_HIWAT 1 /* default source node table size */ ... #define PF_PKTDELAY_MAXPKTS 1 /* max # of pkts held in delay queue */ > - the two entries for "table-entries" are confusing. it seems to be that > machines with less than a specific amount of memory have their entries > limited to the value of _SMALL. but the way it's documented makes that > unclear. i'm not sure whether the reader needs the names such as > PFSTATE_HIWAT. i think it's just confusing to list it this way. we > should probably have one item, table-entries, and say what the default > is normally, and what it is for lesser memory setups. /usr/src/sbin/pfctl.c: 1737 if (mem <= 100*1024*1024) 1738 pf->limit[PF_LIMIT_TABLE_ENTRIES] = PFR_KENTRY_HIWAT_SMALL; If the machine has less than 100MB physical memory, ..._SMALL is set for PF_LIMIT_TABLE_ENTRIES. > - i think the whole section should just be reduced to a simple list of > what can be set, and any default values. > > here's a stab at tidying up. i've inserted a couple of XXX for where i > came unstuck. > > thoughts? help? I appreciate where this is going. mbuhl > jmc > > Index: pf.conf.5 > === > RCS file: /cvs/src/share/man/man5/pf.conf.5,v > retrieving revision 1.596 > diff -u -p -r1.596 pf.conf.5 > --- pf.conf.5 27 May 2022 15:45:02 - 1.596 > +++ pf.conf.5 21 Jul 2022 21:22:08 - > @@ -1236,65 +1236,56 @@ See > .Xr pool 9 > for an explanation of memory pools. > .Pp > -For example, > -to set the maximum number of entries in the memory pool used by state table > -entries (generated by > +Limits can be set on the following: > +.Bl -tag -width pktdelay_pkts > +.It Cm states > +Set the maximum number of entries in the memory pool used by state table > +entries (those generated by > .Ic pass > rules which do not specify > -.Cm no state ) > -to 2: > -.Pp > -.Dl set limit states 2 > -.Pp > -To set the maximum number of entries in the memory pool used for fragment > -reassembly to 2000: > -.Pp > -.Dl set limit frags 2000 > -.Pp > -This maximum may not exceed, and should be well below, the maximum number > -of mbuf clusters > -.Pq sysctl kern.maxclusters > -in the system. > -.Pp > -To set the maximum number of entries in the memory pool used for tracking > +.Cm no state ) . > +The default is 10. > +.It Cm src-nodes > +Set the maximum number of entries in the memory pool used for tracking > source IP addresses (generated by the > .Cm sticky-address > and > .Cm src.track > -options) to 2000: > -.Pp > -.Dl set limit src-nodes 2000 > -.Pp > -To set limits on the memory pools used by tables: > -.Bd -literal -offset indent > -set limit tables 1000 > -set limit table-entries 10 > -.Ed > -.Pp > -The first limits the number of tables that can exist to 1000. > -The second limits the overall number of addresses that can be stored > -in tables to 10. > -.Pp > -Various limits can be combined on a single line: > -.Bd -literal -offset indent > -set limit { states 2, frags 2000, src-nodes 2000 } > -.Ed > -.Pp > -.Xr pf 4 > -has the following defaults: > -.Bl -column table-entries PFR_KENTRY_HIWAT_SMALL platform_dependent > -.It states Ta Dv PFSTATE_HIWAT Ta Pq 10 > -.It tables Ta Dv PFR_KTABLE_HIWAT Ta Pq 1000 > -.It table-entries Ta Dv PFR_KENTRY_HIWAT Ta Pq 20 > -.It table-entries Ta Dv PFR_KENTRY_HIWAT_SMALL Ta Pq 10 > -.It frags Ta Dv NMBCLUSTERS Ns /32 Ta Pq platform dependent > -.El > -.Pp > +options). > +The default is > +.\" XXX > +.It Cm frags > +Set the maximum number of entries in the memory pool used for fragment > +reassembly. > +The maximum may not exceed, and should be well below, > +the maximum number of mbuf clusters > +.Pq sysctl kern.maxclusters > +in the system. > +The default is NMBCLUSTERS/32. > .Dv NMBCLUSTERS > defines the total number of packets which can exist in-system at any one > time. > Refer to > .In machine/param.h > for the platform-specific value. > +.It Cm tables > +Set the number of tables that can exist. > +The default is 1000. > +.It Cm table-entries > +Set the number of addresses
Re: checksum offloading for em
On Mon, Jun 27, 2022 at 08:07:32AM +1000, Jonathan Gray wrote: > On Sun, Jun 26, 2022 at 04:43:59PM +0200, Moritz Buhl wrote: > > On Sun, Jun 26, 2022 at 12:23:58PM +0200, Moritz Buhl wrote: > > > Hi, > > > > > > I noticed that for some offloading-capable em controllers checksum > > > offloading is still disabled and I couldn't find a reason for that. > > There are two descriptor formats on 82575/82576/i350/i354/i210/i211. > The older one we use and the newer igb/ix style one we don't use in em. > A lot of the offloading options are in the newer descriptor format > from memory. Thanks that helped a lot. Below is a diff that implements offloading for i350 and i210 (I haven't checked it for the others you mentioned). It also does tcp and upd checksums for the older ones also ipv4. I didn't feel confident in the diff but I currently cannot cause any misbehavoir. Using VLAN with i350 and i210 should still break because IFCAP_VLAN_HWTAGGING is not fully implemented yet. I would appreciate any feedback and comments. mbuhl Index: dev/pci/if_em.c === RCS file: /cvs/src/sys/dev/pci/if_em.c,v retrieving revision 1.362 diff -u -p -r1.362 if_em.c --- dev/pci/if_em.c 23 Jun 2022 09:38:28 - 1.362 +++ dev/pci/if_em.c 22 Jul 2022 12:37:27 - @@ -37,6 +37,8 @@ POSSIBILITY OF SUCH DAMAGE. #include #include +#include + /* * Driver version */ @@ -278,6 +280,8 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); +u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, + u_int32_t *); void em_iff(struct em_softc *); void em_update_link_status(struct em_softc *); int em_get_buf(struct em_queue *, int); @@ -1221,11 +1225,12 @@ em_encap(struct em_queue *que, struct mb } if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) { + sc->hw.mac_type != em_82576 && sc->hw.mac_type != em_82580 && + sc->hw.mac_type != em_i210 && sc->hw.mac_type != em_i350) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); + } else if (sc->hw.mac_type == em_i210 || sc->hw.mac_type == em_i350) { + used += em_tx_ctx_setup(que, m, head, _upper, _lower); } else { txd_upper = txd_lower = 0; } @@ -1971,10 +1976,11 @@ em_setup_interface(struct em_softc *sc) #endif if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) + sc->hw.mac_type != em_82576 && sc->hw.mac_type != em_82580) { + ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; + } /* * Specify the media types supported by this adapter and register @@ -2391,6 +2397,96 @@ em_free_transmit_structures(struct em_so } } +u_int +em_tx_ctx_setup(struct em_queue *que, struct mbuf *mp, u_int head, +u_int32_t *olinfo_status, u_int32_t *cmd_type_len) +{ + struct e1000_adv_tx_context_desc *TD; + struct ether_header *eh = mtod(mp, struct ether_header *); + struct mbuf *m; + uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0; + uint32_t iphlen; + int off = 0, hoff; + uint8_t ipproto; + + *olinfo_status = 0; + *cmd_type_len = 0; + TD = (struct e1000_adv_tx_context_desc *)>tx.sc_tx_desc_ring[head]; + + // XXX: VLAN TAGGING + + vlan_macip_lens |= (sizeof(*eh) << E1000_ADVTXD_MACLEN_SHIFT); + + switch (ntohs(eh->ether_type)) { + case ETHERTYPE_IP: { + struct ip *ip; + + m = m_getptr(mp, sizeof(*eh), ); + ip = (struct ip *)(mtod(m, caddr_t) + hoff); + + iphlen = ip->ip_hl << 2; + ipproto = ip->ip_p; + + type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4; + if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) { + *olinfo_status |= E1000_TXD_POPTS_IXS
pf: DIOCXCOMMIT and copyin
Hi tech, for the other two DIOCX ioctls syzkaller showed that it is possible to grab netlock while doing copyin. The same problem should exist for DIOCXCOMMIT but syzkaller didn't find it yet. In case anybody can reproduce the witness lock order reversals the syzkaller can produce, the diff below might address the problem. mbuhl Index: sys/net/pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.383 diff -u -p -r1.383 pf_ioctl.c --- sys/net/pf_ioctl.c 20 Jul 2022 09:33:11 - 1.383 +++ sys/net/pf_ioctl.c 20 Jul 2022 18:17:45 - @@ -2621,13 +2621,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a } ioe = malloc(sizeof(*ioe), M_TEMP, M_WAITOK); table = malloc(sizeof(*table), M_TEMP, M_WAITOK); - NET_LOCK(); - PF_LOCK(); /* first makes sure everything will succeed */ for (i = 0; i < io->size; i++) { if (copyin(io->array+i, ioe, sizeof(*ioe))) { - PF_UNLOCK(); - NET_UNLOCK(); free(table, M_TEMP, sizeof(*table)); free(ioe, M_TEMP, sizeof(*ioe)); error = EFAULT; @@ -2635,13 +2631,13 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a } if (strnlen(ioe->anchor, sizeof(ioe->anchor)) == sizeof(ioe->anchor)) { - PF_UNLOCK(); - NET_UNLOCK(); free(table, M_TEMP, sizeof(*table)); free(ioe, M_TEMP, sizeof(*ioe)); error = ENAMETOOLONG; goto fail; } + NET_LOCK(); + PF_LOCK(); switch (ioe->type) { case PF_TRANS_TABLE: rs = pf_find_ruleset(ioe->anchor); @@ -2677,7 +2673,11 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a error = EINVAL; goto fail; } + PF_UNLOCK(); + NET_UNLOCK(); } + NET_LOCK(); + PF_LOCK(); /* * Checked already in DIOCSETLIMIT, but check again as the @@ -2696,9 +2696,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a } /* now do the commit - no errors should happen here */ for (i = 0; i < io->size; i++) { + PF_UNLOCK(); + NET_UNLOCK(); if (copyin(io->array+i, ioe, sizeof(*ioe))) { - PF_UNLOCK(); - NET_UNLOCK(); free(table, M_TEMP, sizeof(*table)); free(ioe, M_TEMP, sizeof(*ioe)); error = EFAULT; @@ -2706,13 +2706,13 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a } if (strnlen(ioe->anchor, sizeof(ioe->anchor)) == sizeof(ioe->anchor)) { - PF_UNLOCK(); - NET_UNLOCK(); free(table, M_TEMP, sizeof(*table)); free(ioe, M_TEMP, sizeof(*ioe)); error = ENAMETOOLONG; goto fail; } + NET_LOCK(); + PF_LOCK(); switch (ioe->type) { case PF_TRANS_TABLE: memset(table, 0, sizeof(*table));
Re: pf: pool for pf_anchor
On Tue, Jul 19, 2022 at 10:58:08PM +0200, Alexander Bluhm wrote: > On Tue, Jul 19, 2022 at 07:18:54PM +0200, Moritz Buhl wrote: > > A solution would be to move the allocation of the pf_anchor struct > > into a pool. One final question would be regarding the size of the > > hiwat or hardlimit. Any suggestions? > > 10 seems way to low. We want to limit resource abuse. But regular > users should never hit this. Especially existing configurations > should still work after upgrade. > > Perhaps 512 anchors. Noone shoul ever need more than 512 anchors :-) Sure. > > Any other comments? > > Please do not use PR_WAITOK in pool_init() unless you know what you > are doing. I think it should work, but who knows. The other pf > pools have 0 flags, just use that. Ok. > Comment says rs_malloc is needed to compile pfctl. Have you tested > that? The new diff addresses this. As well as documentation. > Replacing M_WAITOK|M_CANFAIL|M_ZERO with PR_NOWAIT|PR_ZERO looks > wrong. PR_WAITOK|PR_LIMITFAIL|PR_ZERO should do something equivalent. I don't know how I got this wrong. > I am not sure if pf_find_anchor() should fail if the limit is hit. > It uses only a temporary buffer. All calls to pf_find_ruleset() > should be checked whether permanent failure, after the limit has > been reached, is what we want. Or we keep the malloc there? I guess I was overzealous replacing all mallocs for pf_anchor, thanks for paying attention on the actual lifetime. Is the following diff OK? mbuhl Index: sbin/pfctl/pfctl.c === RCS file: /mount/openbsd/cvs/src/sbin/pfctl/pfctl.c,v retrieving revision 1.385 diff -u -p -r1.385 pfctl.c --- sbin/pfctl/pfctl.c 11 Nov 2021 12:49:53 - 1.385 +++ sbin/pfctl/pfctl.c 20 Jul 2022 06:55:36 - @@ -158,6 +158,7 @@ static const struct { { "tables", PF_LIMIT_TABLES }, { "table-entries", PF_LIMIT_TABLE_ENTRIES }, { "pktdelay-pkts", PF_LIMIT_PKTDELAY_PKTS }, + { "anchors",PF_LIMIT_ANCHORS }, { NULL, 0 } }; Index: share/man/man4/pf.4 === RCS file: /mount/openbsd/cvs/src/share/man/man4/pf.4,v retrieving revision 1.92 diff -u -p -r1.92 pf.4 --- share/man/man4/pf.4 26 May 2019 02:06:55 - 1.92 +++ share/man/man4/pf.4 20 Jul 2022 07:40:38 - @@ -416,7 +416,7 @@ struct pfioc_limit { enum { PF_LIMIT_STATES, PF_LIMIT_SRC_NODES, PF_LIMIT_FRAGS, PF_LIMIT_TABLES, PF_LIMIT_TABLE_ENTRIES, PF_LIMIT_PKTDELAY_PKTS, - PF_LIMIT_MAX }; + PF_LIMIT_ANCHORS, PF_LIMIT_MAX }; .Ed .It Dv DIOCGETLIMIT Fa "struct pfioc_limit *pl" Get the hard @@ -1033,6 +1033,7 @@ static const struct { { "frags", PF_LIMIT_FRAGS }, { "tables", PF_LIMIT_TABLES }, { "table-entries", PF_LIMIT_TABLE_ENTRIES }, + { "anchors",PF_LIMIT_ANCHORS }, { NULL, 0 } }; Index: sys/net/pf.c === RCS file: /mount/openbsd/cvs/src/sys/net/pf.c,v retrieving revision 1.1135 diff -u -p -r1.1135 pf.c --- sys/net/pf.c28 Jun 2022 13:48:06 - 1.1135 +++ sys/net/pf.c20 Jul 2022 06:59:37 - @@ -276,7 +276,8 @@ struct pf_pool_limit pf_pool_limits[PF_L { _frent_pl, PFFRAG_FRENT_HIWAT, PFFRAG_FRENT_HIWAT }, { _ktable_pl, PFR_KTABLE_HIWAT, PFR_KTABLE_HIWAT }, { _kentry_pl, PFR_KENTRY_HIWAT, PFR_KENTRY_HIWAT }, - { _pktdelay_pl, PF_PKTDELAY_MAXPKTS, PF_PKTDELAY_MAXPKTS } + { _pktdelay_pl, PF_PKTDELAY_MAXPKTS, PF_PKTDELAY_MAXPKTS }, + { _anchor_pl, PF_ANCHOR_HIWAT, PF_ANCHOR_HIWAT } }; #define BOUND_IFACE(r, k) \ Index: sys/net/pf_ioctl.c === RCS file: /mount/openbsd/cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.382 diff -u -p -r1.382 pf_ioctl.c --- sys/net/pf_ioctl.c 26 Jun 2022 11:37:08 - 1.382 +++ sys/net/pf_ioctl.c 20 Jul 2022 06:22:15 - @@ -191,6 +191,8 @@ pfattach(int num) IPL_SOFTNET, 0, "pftag", NULL); pool_init(_pktdelay_pl, sizeof(struct pf_pktdelay), 0, IPL_SOFTNET, 0, "pfpktdelay", NULL); + pool_init(_anchor_pl, sizeof(struct pf_anchor), 0, + IPL_SOFTNET, 0, "pfanchor", NULL); hfsc_initialize(); pfr_initialize(); @@ -200,6 +202,8 @@ pfattach(int num) pool_sethardlimit(pf_pool_limits[PF_LIMIT_STATES].pp, pf_pool_limits[PF_LIMIT_STATES].limit, NULL, 0); + pool_sethardlimit(pf_pool_limits[PF_LIMIT_ANCHORS].pp, + pf_pool_limits[PF_LIMIT_ANCHORS].limit, NULL, 0); if
pf: pool for pf_anchor
Dear tech@, I am investigating a syzkaller reproducer found in the "no output from test machine (7)" crashes: https://syzkaller.appspot.com/bug?id=d93e92fde3857c69df2cf46b4244d9814c4318a7 https://syzkaller.appspot.com/text?tag=ReproC=116ee2e008 The code calls DIOCXBEGIN, with different anchor names until the malloc bucket for size 2048 fills up. So the following reproducer does the same: #include #include #include #include #include #include #include int calls; void iterate(char *str) { char *end = str + 64; *end = '\0'; while (*str == '\255') { *str = '\1'; str++; } if (str == end) return; if (*str == '\0') *str = '\1'; *str = ++*str; } int main(void) { struct pfioc_trans trans; struct pfioc_trans_e elem; int fd, ret; char it[64] = "\0"; fd = open("/dev/pf", O_RDWR); while (1) { trans.size = 1; trans.esize = 0x408; trans.array = elem.type = 0; iterate(it); strcpy(elem.anchor, it); elem.ticket = 0; calls++; if (ioctl(fd, DIOCXBEGIN, ) == -1) err(1, "icoctl: %d calls, '%s'", calls, elem.anchor); } } A solution would be to move the allocation of the pf_anchor struct into a pool. One final question would be regarding the size of the hiwat or hardlimit. Any suggestions? Any other comments? mbuhl Index: sys/net/pf.c === RCS file: /mount/openbsd/cvs/src/sys/net/pf.c,v retrieving revision 1.1135 diff -u -p -r1.1135 pf.c --- sys/net/pf.c28 Jun 2022 13:48:06 - 1.1135 +++ sys/net/pf.c19 Jul 2022 11:34:23 - @@ -269,6 +269,7 @@ void pf_log_matches(struct pf_pdesc * extern struct pool pfr_ktable_pl; extern struct pool pfr_kentry_pl; +extern struct pool pf_anchor_pl; struct pf_pool_limit pf_pool_limits[PF_LIMIT_MAX] = { { _state_pl, PFSTATE_HIWAT, PFSTATE_HIWAT }, @@ -276,7 +277,8 @@ struct pf_pool_limit pf_pool_limits[PF_L { _frent_pl, PFFRAG_FRENT_HIWAT, PFFRAG_FRENT_HIWAT }, { _ktable_pl, PFR_KTABLE_HIWAT, PFR_KTABLE_HIWAT }, { _kentry_pl, PFR_KENTRY_HIWAT, PFR_KENTRY_HIWAT }, - { _pktdelay_pl, PF_PKTDELAY_MAXPKTS, PF_PKTDELAY_MAXPKTS } + { _pktdelay_pl, PF_PKTDELAY_MAXPKTS, PF_PKTDELAY_MAXPKTS }, + { _anchor_pl, PF_ANCHOR_HIWAT, PF_ANCHOR_HIWAT } }; #define BOUND_IFACE(r, k) \ Index: sys/net/pf_ioctl.c === RCS file: /mount/openbsd/cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.382 diff -u -p -r1.382 pf_ioctl.c --- sys/net/pf_ioctl.c 26 Jun 2022 11:37:08 - 1.382 +++ sys/net/pf_ioctl.c 19 Jul 2022 16:56:23 - @@ -191,6 +191,8 @@ pfattach(int num) IPL_SOFTNET, 0, "pftag", NULL); pool_init(_pktdelay_pl, sizeof(struct pf_pktdelay), 0, IPL_SOFTNET, 0, "pfpktdelay", NULL); + pool_init(_anchor_pl, sizeof(struct pf_anchor), 0, + IPL_SOFTNET, PR_WAITOK, "pfanchor", NULL); hfsc_initialize(); pfr_initialize(); @@ -200,6 +202,8 @@ pfattach(int num) pool_sethardlimit(pf_pool_limits[PF_LIMIT_STATES].pp, pf_pool_limits[PF_LIMIT_STATES].limit, NULL, 0); + pool_sethardlimit(pf_pool_limits[PF_LIMIT_ANCHORS].pp, + pf_pool_limits[PF_LIMIT_ANCHORS].limit, NULL, 0); if (physmem <= atop(100*1024*1024)) pf_pool_limits[PF_LIMIT_TABLE_ENTRIES].limit = Index: sys/net/pf_ruleset.c === RCS file: /mount/openbsd/cvs/src/sys/net/pf_ruleset.c,v retrieving revision 1.18 diff -u -p -r1.18 pf_ruleset.c --- sys/net/pf_ruleset.c27 Dec 2018 16:54:01 - 1.18 +++ sys/net/pf_ruleset.c19 Jul 2022 12:56:35 - @@ -40,6 +40,7 @@ #ifdef _KERNEL #include #include +#include #endif /* _KERNEL */ #include @@ -55,6 +56,7 @@ #endif /* INET6 */ +struct poolpf_anchor_pl; #ifdef _KERNEL #define rs_malloc(x) malloc(x, M_TEMP, M_WAITOK|M_CANFAIL|M_ZERO) #define rs_free(x, siz)free(x, M_TEMP, siz) @@ -107,12 +109,12 @@ pf_find_anchor(const char *path) { struct pf_anchor*key, *found; - key = rs_malloc(sizeof(*key)); + key = pool_get(_anchor_pl, PR_NOWAIT | PR_ZERO); if (key == NULL) return (NULL); strlcpy(key->path, path, sizeof(key->path)); found = RB_FIND(pf_anchor_global, _anchors, key); - rs_free(key, sizeof(*key)); + pool_put(_anchor_pl, key); return (found); } @@ -184,7 +186,7 @@ pf_create_anchor(struct pf_anchor *paren ((parent != NULL) && (strlen(parent->path) >= PF_ANCHOR_MAXPATH))) return
regress/lib/libm/rint on armv7
Hi tech@, Dear miod@, Dear kettenis@, On arm the lib/libm/rint regress test is failing. After adding some printf debugging into libm it turns out that the problem is the cast from double to long long. http://bluhm.genua.de/regress/results/2022-07-12T04%3A17%3A03Z/logs/lib/libm/rint/make.log: run-regress-rint cc -O2 -pipe -MD -MP -c /usr/src/regress/lib/libm/rint/rint.c cc -o rint rint.o -lm ./rint assertion "llrintf(4503599627370496.0F) == 4503599627370496LL" failed: file "/usr/src/regress/lib/libm/rint/rint.c", line 51, function "main" *** Signal SIGABRT in . (:36 'run-regress-rint') FAILED After discussing this with kettenis@, he suggested copying the ieee754 implementations from NetBSD. The only modification I find worth mentioning is in fixdfdi.c: if (exp >= 63) the original code checked for 62, but this fails the regression test too: assertion "llrint(0x7c00.0p0) == 0x7c00LL" failed: file "/usr/src/regress/lib/libm/rint/rint.c", line 55, function "main" Any thoughts on this? mbuhl Index: lib/libc/arch/arm/quad/fixdfdi.c === RCS file: lib/libc/arch/arm/quad/fixdfdi.c diff -N lib/libc/arch/arm/quad/fixdfdi.c --- /dev/null 1 Jan 1970 00:00:00 - +++ lib/libc/arch/arm/quad/fixdfdi.c15 Jul 2022 10:57:41 - @@ -0,0 +1,83 @@ +/* $OpenBSD$ */ +/* $NetBSD: fixdfdi_ieee754.c,v 1.1 2013/08/24 00:51:48 matt Exp $ */ + +/*- + * Copyright (c) 1992, 1993 + * The Regents of the University of California. All rights reserved. + * + * This software was developed by the Computer Systems Engineering group + * at Lawrence Berkeley Laboratory under DARPA contract BG 91-66 and + * contributed to Berkeley. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the name of the University nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +#if defined(LIBC_SCCS) && !defined(lint) +__RCSID("$NetBSD: fixdfdi_ieee754.c,v 1.1 2013/08/24 00:51:48 matt Exp $"); +#endif /* LIBC_SCCS and not lint */ + +#if defined(SOFTFLOAT) || defined(__ARM_EABI__) +#include "softfloat/softfloat-for-gcc.h" +#endif + +#include "../../../quad/quad.h" +#include +#include +#include + +/* + * Convert double to signed quad. + * Not sure what to do with negative numbers---for now, anything out + * of range becomes UQUAD_MAX. + */ +quad_t +__fixdfdi(double x) +{ + struct ieee_double ux = *(struct ieee_double *) + signed int exp = ux.dbl_exp - DBL_EXP_BIAS; + const bool neg = ux.dbl_sign; + quad_t r; + + if (exp >= 63) + return neg ? QUAD_MIN : QUAD_MAX; + + r = 1 << DBL_FRACHBITS; /* implicit bit */ + r |= ux.dbl_frach; + exp -= DBL_FRACHBITS; + if (exp < 0) { + r >>= -exp; + } else if (exp > 0) { + r <<= DBL_FRACLBITS; + r |= ux.dbl_fracl; + exp -= DBL_FRACLBITS; + if (exp < 0) { + r >>= -exp; + } else if (exp > 0) { + r <<= exp; + } + } + return neg ? -r : r; +} Index: lib/libc/arch/arm/quad/fixsfdi.c === RCS file: lib/libc/arch/arm/quad/fixsfdi.c diff -N lib/libc/arch/arm/quad/fixsfdi.c --- /dev/null 1 Jan 1970 00:00:00 - +++ lib/libc/arch/arm/quad/fixsfdi.c15 Jul 2022 10:56:06 - @@ -0,0 +1,76 @@ +/* $OpenBSD$ */ +/* $NetBSD: fixsfdi_ieee754.c,v 1.1 2013/08/24 00:51:48 matt Exp $ */ + +/*- + * Copyright (c) 1992,
Re: checksum offloading for em
On Sun, Jun 26, 2022 at 12:23:58PM +0200, Moritz Buhl wrote: > Hi, > > I noticed that for some offloading-capable em controllers checksum > offloading is still disabled and I couldn't find a reason for that. > > It was assumed during initial implementation: > https://marc.info/?l=openbsd-tech=137875739832438=2 > > According to the datasheets of the i350 controllers listed below, > they do support the usual offloading features. > https://www.intel.com/content/www/us/en/products/details/ethernet/gigabit-controllers/i350-controllers.html > > tested with > em0 at pci3 dev 0 function 0 "Intel I350" rev 0x01: msi I made a mistake during testing. The diff doesn't work for i350.
checksum offloading for em
Hi, I noticed that for some offloading-capable em controllers checksum offloading is still disabled and I couldn't find a reason for that. It was assumed during initial implementation: https://marc.info/?l=openbsd-tech=137875739832438=2 According to the datasheets of the i350 controllers listed below, they do support the usual offloading features. https://www.intel.com/content/www/us/en/products/details/ethernet/gigabit-controllers/i350-controllers.html tested with em0 at pci3 dev 0 function 0 "Intel I350" rev 0x01: msi and em0 at pci0 dev 31 function 6 "Intel I219-LM" rev 0x21: msi I would appreciate any further feedback, discussion and especially testing. mbuhl Index: sys/dev/pci/if_em.c === RCS file: /cvs/src/sys/dev/pci/if_em.c,v retrieving revision 1.361 diff -u -p -r1.361 if_em.c --- sys/dev/pci/if_em.c 11 Mar 2022 18:00:45 - 1.361 +++ sys/dev/pci/if_em.c 24 May 2022 08:47:41 - @@ -1217,9 +1217,7 @@ em_encap(struct em_queue *que, struct mb } if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) { + sc->hw.mac_type != em_82576 && sc->hw.mac_type != em_82580) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); } else { @@ -1966,9 +1964,7 @@ em_setup_interface(struct em_softc *sc) #endif if (sc->hw.mac_type >= em_82543 && sc->hw.mac_type != em_82575 && - sc->hw.mac_type != em_82576 && - sc->hw.mac_type != em_82580 && sc->hw.mac_type != em_i210 && - sc->hw.mac_type != em_i350) + sc->hw.mac_type != em_82576 && sc->hw.mac_type != em_82580) ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; /*
Re: another syzkaller problem in pf
On Tue, May 03, 2022 at 09:31:51PM +0200, Alexander Bluhm wrote: ... > The code is too complex to be sure what the reason of the syzkaller > panic is. Sleep in malloc is correct anyway and may improve the > situation. > > Functions with argument values 0 or 1 are hard to read. It would > be much nicer to pass M_WAITOK or M_NOWAIT. And the variable name > "intr" does not make sense anymore. pf does not run in interrupt > context. Call it "mflags" like in pfi_kif_alloc(). Or "wait" like > in other functions. > > Could you cleanup that also? I renamed the intr parameter for pfr_create_ktable and pfr_attach_table and passed it further up the call stack. In pf_addr_setup I noticed that pf_tbladdr_setup also used to set intr. I changed it to wait for both, pfi_dynaddr_setup and pf_tbladdr_setup. Now it should be a little easier to notice sleeping points in pf_ioctl.c. OK? mbuhl Index: sys/net/pf.c === RCS file: /cvs/src/sys/net/pf.c,v retrieving revision 1.1133 diff -u -p -r1.1133 pf.c --- sys/net/pf.c13 Jun 2022 12:48:00 - 1.1133 +++ sys/net/pf.c26 Jun 2022 09:21:21 - @@ -1585,11 +1585,11 @@ pf_purge_expired_states(u_int32_t maxche } int -pf_tbladdr_setup(struct pf_ruleset *rs, struct pf_addr_wrap *aw) +pf_tbladdr_setup(struct pf_ruleset *rs, struct pf_addr_wrap *aw, int wait) { if (aw->type != PF_ADDR_TABLE) return (0); - if ((aw->p.tbl = pfr_attach_table(rs, aw->v.tblname, 1)) == NULL) + if ((aw->p.tbl = pfr_attach_table(rs, aw->v.tblname, wait)) == NULL) return (1); return (0); } Index: sys/net/pf_if.c === RCS file: /cvs/src/sys/net/pf_if.c,v retrieving revision 1.105 diff -u -p -r1.105 pf_if.c --- sys/net/pf_if.c 16 May 2022 13:31:19 - 1.105 +++ sys/net/pf_if.c 26 Jun 2022 09:38:43 - @@ -423,7 +423,7 @@ pfi_match_addr(struct pfi_dynaddr *dyn, } int -pfi_dynaddr_setup(struct pf_addr_wrap *aw, sa_family_t af) +pfi_dynaddr_setup(struct pf_addr_wrap *aw, sa_family_t af, int wait) { struct pfi_dynaddr *dyn; char tblname[PF_TABLE_NAME_SIZE]; @@ -432,8 +432,7 @@ pfi_dynaddr_setup(struct pf_addr_wrap *a if (aw->type != PF_ADDR_DYNIFTL) return (0); - if ((dyn = pool_get(_addr_pl, PR_WAITOK | PR_LIMITFAIL | PR_ZERO)) - == NULL) + if ((dyn = pool_get(_addr_pl, wait|PR_LIMITFAIL|PR_ZERO)) == NULL) return (1); if (!strcmp(aw->v.ifname, "self")) @@ -466,7 +465,7 @@ pfi_dynaddr_setup(struct pf_addr_wrap *a goto _bad; } - if ((dyn->pfid_kt = pfr_attach_table(ruleset, tblname, 1)) == NULL) { + if ((dyn->pfid_kt = pfr_attach_table(ruleset, tblname, wait)) == NULL) { rv = 1; goto _bad; } Index: sys/net/pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.381 diff -u -p -r1.381 pf_ioctl.c --- sys/net/pf_ioctl.c 10 May 2022 23:12:25 - 1.381 +++ sys/net/pf_ioctl.c 26 Jun 2022 09:35:17 - @@ -891,8 +891,8 @@ int pf_addr_setup(struct pf_ruleset *ruleset, struct pf_addr_wrap *addr, sa_family_t af) { - if (pfi_dynaddr_setup(addr, af) || - pf_tbladdr_setup(ruleset, addr) || + if (pfi_dynaddr_setup(addr, af, PR_WAITOK) || + pf_tbladdr_setup(ruleset, addr, PR_WAITOK) || pf_rtlabel_add(addr)) return (EINVAL); @@ -1413,7 +1413,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a if (rule->overload_tblname[0]) { if ((rule->overload_tbl = pfr_attach_table(ruleset, - rule->overload_tblname, 0)) == NULL) + rule->overload_tblname, PR_WAITOK)) == NULL) error = EINVAL; else rule->overload_tbl->pfrkt_flags |= PFR_TFLAG_ACTIVE; @@ -1636,7 +1636,8 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a if (newrule->overload_tblname[0]) { newrule->overload_tbl = pfr_attach_table( - ruleset, newrule->overload_tblname, 0); + ruleset, newrule->overload_tblname, + PR_WAITOK); if (newrule->overload_tbl == NULL) error = EINVAL; else Index: sys/net/pf_table.c === RCS file: /cvs/src/sys/net/pf_table.c,v retrieving revision 1.142 diff -u -p -r1.142 pf_table.c --- sys/net/pf_table.c 16 Jun 2022 20:47:26 - 1.142 +++ sys/net/pf_table.c 26 Jun 2022
igc vlan hwtagging
Dear tech@, the following diff should implement vlan hwtagging for igc. I would appreciate feedback from further testing. OK? mbuhl Index: sys/dev/pci/if_igc.c === RCS file: /cvs/src/sys/dev/pci/if_igc.c,v retrieving revision 1.9 diff -u -p -r1.9 if_igc.c --- sys/dev/pci/if_igc.c2 Jun 2022 07:41:17 - 1.9 +++ sys/dev/pci/if_igc.c2 Jun 2022 12:18:14 - @@ -790,11 +790,9 @@ igc_setup_interface(struct igc_softc *sc ifp->if_capabilities = IFCAP_VLAN_MTU; -#ifdef notyet #if NVLAN > 0 ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; #endif -#endif ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; @@ -1007,17 +1005,19 @@ igc_start(struct ifqueue *ifq) prod &= mask; } + cmd_type_len = IGC_ADVTXD_DCMD_IFCS | IGC_ADVTXD_DTYP_DATA | + IGC_ADVTXD_DCMD_DEXT; + + if (ISSET(m->m_flags, M_VLANTAG)) + cmd_type_len |= IGC_ADVTXD_DCMD_VLE; + for (i = 0; i < map->dm_nsegs; i++) { txdesc = >tx_base[prod]; - cmd_type_len = IGC_ADVTXD_DCMD_IFCS | IGC_ADVTXD_DTYP_DATA | - IGC_ADVTXD_DCMD_DEXT | map->dm_segs[i].ds_len; - if (i == map->dm_nsegs - 1) - cmd_type_len |= IGC_ADVTXD_DCMD_EOP | - IGC_ADVTXD_DCMD_RS; - htolem64(>read.buffer_addr, map->dm_segs[i].ds_addr); - htolem32(>read.cmd_type_len, cmd_type_len); + htolem32(>read.cmd_type_len, cmd_type_len | + map->dm_segs[i].ds_len | ((i == map->dm_nsegs - 1)? + IGC_ADVTXD_DCMD_EOP | IGC_ADVTXD_DCMD_RS : 0)); htolem32(>read.olinfo_status, olinfo_status); last = prod; @@ -2006,10 +2006,10 @@ igc_tx_ctx_setup(struct tx_ring *txr, st struct mbuf *m; uint32_t type_tucmd_mlhl = 0; uint32_t vlan_macip_lens = 0; - uint32_t iphlen; + uint32_t iphlen = 0; int hoff; int off = 0; - uint8_t ipproto; + uint8_t ipproto = 0; vlan_macip_lens |= (sizeof(*eh) << IGC_ADVTXD_MACLEN_SHIFT); @@ -2018,7 +2018,6 @@ igc_tx_ctx_setup(struct tx_ring *txr, st * be placed into the context descriptor. Hence * we need to make one even if not doing offloads. */ -#ifdef notyet #if NVLAN > 0 if (ISSET(mp->m_flags, M_VLANTAG)) { uint32_t vtag = mp->m_pkthdr.ether_vtag; @@ -2026,7 +2025,6 @@ igc_tx_ctx_setup(struct tx_ring *txr, st off = 1; } #endif -#endif switch (ntohs(eh->ether_type)) { case ETHERTYPE_IP: { @@ -2062,8 +2060,6 @@ igc_tx_ctx_setup(struct tx_ring *txr, st break; } #endif - default: - return 0; } vlan_macip_lens |= iphlen;
Re: igc checksum offloading
On Wed, May 25, 2022 at 04:18:00PM +0200, Moritz Buhl wrote: > Dear tech@, > > The attached diff should handle checksum offloading for igc(4) > for IPv4, TCP, and UDP. > > I tested with "Intel I225-LM" rev 0x03, > kevlo@ tested this with "Intel I225-V" rev 0x03 > > OK? The diff doesn't look right. Try again. Index: sys/dev/pci/if_igc.c === RCS file: /cvs/src/sys/dev/pci/if_igc.c,v retrieving revision 1.8 diff -u -p -r1.8 if_igc.c --- sys/dev/pci/if_igc.c11 May 2022 06:14:15 - 1.8 +++ sys/dev/pci/if_igc.c25 May 2022 18:44:39 - @@ -48,6 +48,8 @@ #include #include +#include +#include #if NBPFILTER > 0 #include @@ -115,6 +117,7 @@ int igc_media_change(struct ifnet *); void igc_iff(struct igc_softc *); void igc_update_link_status(struct igc_softc *); intigc_get_buf(struct rx_ring *, int); +intigc_tx_ctx_setup(struct tx_ring *, struct mbuf *, int, uint32_t *); void igc_configure_queues(struct igc_softc *); void igc_set_queues(struct igc_softc *, uint32_t, uint32_t, int); @@ -791,9 +794,11 @@ igc_setup_interface(struct igc_softc *sc #if NVLAN > 0 ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; #endif +#endif + ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; -#endif + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; /* Initialize ifmedia structures. */ ifmedia_init(>media, IFM_IMASK, igc_media_change, igc_media_status); @@ -996,6 +1001,12 @@ igc_start(struct ifqueue *ifq) bus_dmamap_sync(txr->txdma.dma_tag, map, 0, map->dm_mapsize, BUS_DMASYNC_PREWRITE); + if (igc_tx_ctx_setup(txr, m, prod, _status)) { + /* Consume the first descriptor */ + prod++; + prod &= mask; + } + for (i = 0; i < map->dm_nsegs; i++) { txdesc = >tx_base[prod]; @@ -1977,6 +1988,117 @@ igc_free_transmit_buffers(struct tx_ring sc->num_tx_desc * sizeof(struct igc_tx_buf)); txr->tx_buffers = NULL; txr->txtag = NULL; +} + + +/* + * + * Advanced Context Descriptor setup for VLAN, CSUM or TSO + * + **/ + +int +igc_tx_ctx_setup(struct tx_ring *txr, struct mbuf *mp, int prod, +uint32_t *olinfo_status) +{ + struct igc_adv_tx_context_desc *txdesc; + struct ether_header *eh = mtod(mp, struct ether_header *); + struct mbuf *m; + uint32_t type_tucmd_mlhl = 0; + uint32_t vlan_macip_lens = 0; + uint32_t iphlen; + int hoff; + int off = 0; + uint8_t ipproto; + + vlan_macip_lens |= (sizeof(*eh) << IGC_ADVTXD_MACLEN_SHIFT); + + /* +* In advanced descriptors the vlan tag must +* be placed into the context descriptor. Hence +* we need to make one even if not doing offloads. +*/ +#ifdef notyet +#if NVLAN > 0 + if (ISSET(mp->m_flags, M_VLANTAG)) { + uint32_t vtag = mp->m_pkthdr.ether_vtag; + vlan_macip_lens |= (vtag << IGC_ADVTXD_VLAN_SHIFT); + off = 1; + } +#endif +#endif + + switch (ntohs(eh->ether_type)) { + case ETHERTYPE_IP: { + struct ip *ip; + + m = m_getptr(mp, sizeof(*eh), ); + KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip)); + ip = (struct ip *)(mtod(m, caddr_t) + hoff); + + iphlen = ip->ip_hl << 2; + ipproto = ip->ip_p; + + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_IPV4; + if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) { + *olinfo_status |= IGC_TXD_POPTS_IXSM << 8; + off = 1; + } + + break; + } +#ifdef INET6 + case ETHERTYPE_IPV6: { + struct ip6_hdr *ip6; + + m = m_getptr(mp, sizeof(*eh), ); + KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip6)); + ip6 = (struct ip6_hdr *)(mtod(m, caddr_t) + hoff); + + iphlen = sizeof(*ip6); + ipproto = ip6->ip6_nxt; + + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_IPV6; + break; + } +#endif + default: + return 0; + } + + vlan_macip_lens |= iphlen; + type_tucmd_mlhl |= IGC_ADVTXD_DCMD_DEXT | IGC_ADVTXD_DTYP_CTXT; + + switch (ipproto) { + case IPPROTO_TCP: + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_L4T_T
igc checksum offloading
Dear tech@, The attached diff should handle checksum offloading for igc(4) for IPv4, TCP, and UDP. I tested with "Intel I225-LM" rev 0x03, kevlo@ tested this with "Intel I225-V" rev 0x03 OK? Index: /sys/dev/pci/if_igc.c === RCS file: /mount/openbsd/cvs/src/sys/dev/pci/if_igc.c,v retrieving revision 1.8 diff -u -p -r1.8 if_igc.c --- /sys/dev/pci/if_igc.c 11 May 2022 06:14:15 - 1.8 +++ /sys/dev/pci/if_igc.c 20 May 2022 18:21:29 - @@ -48,6 +48,8 @@ #include #include +#include +#include #if NBPFILTER > 0 #include @@ -115,6 +117,7 @@ int igc_media_change(struct ifnet *); void igc_iff(struct igc_softc *); void igc_update_link_status(struct igc_softc *); intigc_get_buf(struct rx_ring *, int); +intigc_tx_ctx_setup(struct tx_ring *, struct mbuf *, int, uint32_t *); void igc_configure_queues(struct igc_softc *); void igc_set_queues(struct igc_softc *, uint32_t, uint32_t, int); @@ -791,9 +794,11 @@ igc_setup_interface(struct igc_softc *sc #if NVLAN > 0 ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING; #endif +#endif + ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4; -#endif + ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; /* Initialize ifmedia structures. */ ifmedia_init(>media, IFM_IMASK, igc_media_change, igc_media_status); @@ -996,6 +1001,12 @@ igc_start(struct ifqueue *ifq) bus_dmamap_sync(txr->txdma.dma_tag, map, 0, map->dm_mapsize, BUS_DMASYNC_PREWRITE); + if (igc_tx_ctx_setup(txr, m, prod, _status)) { + /* Consume the first descriptor */ + prod++; + prod &= mask; + } + for (i = 0; i < map->dm_nsegs; i++) { txdesc = >tx_base[prod]; @@ -1977,6 +1988,117 @@ igc_free_transmit_buffers(struct tx_ring sc->num_tx_desc * sizeof(struct igc_tx_buf)); txr->tx_buffers = NULL; txr->txtag = NULL; +} + + +/* + * + * Advanced Context Descriptor setup for VLAN, CSUM or TSO + * + **/ + +int +igc_tx_ctx_setup(struct tx_ring *txr, struct mbuf *mp, int prod, +uint32_t *olinfo_status) +{ + struct igc_adv_tx_context_desc *txdesc; + struct ether_header *eh = mtod(mp, struct ether_header *); + struct mbuf *m; + uint32_t type_tucmd_mlhl = 0; + uint32_t vlan_macip_lens = 0; + uint32_t iphlen; + int hoff; + int off = 0; + uint8_t ipproto; + + vlan_macip_lens |= (sizeof(*eh) << IGC_ADVTXD_MACLEN_SHIFT); + + /* +* In advanced descriptors the vlan tag must +* be placed into the context descriptor. Hence +* we need to make one even if not doing offloads. +*/ +#ifdef notyet +#if NVLAN > 0 + if (ISSET(mp->m_flags, M_VLANTAG)) { + uint32_t vtag = mp->m_pkthdr.ether_vtag; + vlan_macip_lens |= (vtag << IGC_ADVTXD_VLAN_SHIFT); + off = 1; + } +#endif +#endif + + switch (ntohs(eh->ether_type)) { + case ETHERTYPE_IP: { + struct ip *ip; + + m = m_getptr(mp, sizeof(*eh), ); + KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip)); + ip = (struct ip *)(mtod(m, caddr_t) + hoff); + + iphlen = ip->ip_hl << 2; + ipproto = ip->ip_p; + + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_IPV4; + if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) { + *olinfo_status |= IGC_TXD_POPTS_IXSM << 8; + off = 1; + } + + break; + } +#ifdef INET6 + case ETHERTYPE_IPV6: { + struct ip6_hdr *ip6; + + m = m_getptr(mp, sizeof(*eh), ); + KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip6)); + ip6 = (struct ip6_hdr *)(mtod(m, caddr_t) + hoff); + + iphlen = sizeof(*ip6); + ipproto = ip6->ip6_nxt; + + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_IPV6; + break; + } +#endif + default: + return 0; + } + + vlan_macip_lens |= iphlen; + type_tucmd_mlhl |= IGC_ADVTXD_DCMD_DEXT | IGC_ADVTXD_DTYP_CTXT; + + switch (ipproto) { + case IPPROTO_TCP: + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_L4T_TCP; + if (ISSET(mp->m_pkthdr.csum_flags, M_TCP_CSUM_OUT)) { + *olinfo_status |= IGC_TXD_POPTS_TXSM << 8; + off = 1; + } + break; + case IPPROTO_UDP: + type_tucmd_mlhl |= IGC_ADVTXD_TUCMD_L4T_UDP; +
another syzkaller problem in pf
Hi tech@, Syzkaller found a few crashes in pf_anchor_global_RB_REMOVE. https://syzkaller.appspot.com/bug?id=a97f712331903ce38b8c084a489818b9bb5c6fcb and also https://syzkaller.appspot.com/text?tag=CrashLog=15ace9aaf0 The call stack is something like this: pf_anchor_global_RB_REMOVE pf_remove_if_empty_ruleset pfi_dynaddr_setup pfioctl I believe this is caused by two calls to pf_remove_if_empty_ruleset in pfi_dynaddr_setup and a possibly pfr_destroy_ktable. I think it is a problem that pfr_attach_table is being called with 1 (intr) because the introducing commit says: commit 4b3977248902c22d96aaebdb5784840debc2631c Author: mikeb Date: Mon Nov 24 13:22:09 2008 + Fix splasserts seen in pr 5987 by propagating a flag that discribes whether we're called from the interrupt context to the functions performing allocations. Looked at by mpf@ and henning@, tested by mpf@ and Antti Harri, the pr originator. ok tedu And we are not in interrupt context. A detailed analysis follows. The pfioctl cd60441a is DIOCADDRULE. The call stack fails in pf_addr_setup calls from pf_ioctl.c: 1647 if (pf_addr_setup(ruleset, >src.addr, 1648 newrule->af)) similar calls below. One of these calls might do the following: 424 pfi_dynaddr_setup(struct pf_addr_wrap *aw, sa_family_t af) ... 462 if ((ruleset = pf_find_or_create_ruleset(PF_RESERVED_ANCHOR)) == NULL) { assume false. ... 467 if ((dyn->pfid_kt = pfr_attach_table(ruleset, tblname, 1)) == NULL) might evaluate to true, I will share my thoughts on it. 468 rv = 1; 469 goto _bad; ... 481 _bad: 482 if (dyn->pfid_kt != NULL) 483 pfr_detach_table(dyn->pfid_kt); 484 if (ruleset != NULL) 485 pf_remove_if_empty_ruleset(ruleset); remember this call. I will show a case where we don't want this. due to the following in pf_table.c: 2424 pfr_attach_table(struct pf_ruleset *rs, char *name, int intr) ... 2434 kt = pfr_lookup_table(); 2435 if (kt == NULL) { assume true. 2436 kt = pfr_create_ktable(, gettime(), 1, intr); 2437 if (kt == NULL) 2438 return (NULL); 2439 if (ac != NULL) { assume true. this means kt != NULL and kt->pfrkt_rs = our ruleset. 2440 bzero(tbl.pfrt_anchor, sizeof(tbl.pfrt_anchor)); 2441 rt = pfr_lookup_table(); 2442 if (rt == NULL) { assume true. pfr_ktable_compares pfrkt_name an pfrkt_anchor if previous was same. pfrkt_name should be '\0', pfrkt_anchor is PF_RESERVED_ANCHOR vs '\0' 2443 rt = pfr_create_ktable(, 0, 1, intr); let's look into this call next just to be sure. 2444 if (rt == NULL) { assume true. 2445 pfr_destroy_ktable(kt, 0); remember this call is possible. We look at this afterwards. 2446 return (NULL); ... looking into create_ktable, I want to see if we can reach pfr_destroy_ktable, still in pf_table.c: 2202 pfr_create_ktable(struct pfr_table *tbl, time_t tzero, int attachruleset ... 2208 if (intr) 2209 kt = pool_get(_ktable_pl, PR_NOWAIT|PR_ZERO|PR_LIMITFAIL); This can fail. Returning NULL afterwards. Note PR_ZERO. ... 2216 if (attachruleset) { 2217 rs = pf_find_or_create_ruleset(tbl->pfrt_anchor); 2218 if (!rs) { not reachable because pfrt_anchor comes from l.2433. But a weird indirection for our current call stack anyway. Might be necessary. 2219 pfr_destroy_ktable(kt, 0); 2220 return (NULL); 2221 } kt->pfrkt_rs = rs; 2223 rs->tables++; 2224 } 2225 2226 if (!rn_inithead((void **)>pfrkt_ip4, 2227 offsetof(struct sockaddr_in, sin_addr)) || 2228 !rn_inithead((void **)>pfrkt_ip6, 2229 offsetof(struct sockaddr_in6, sin6_addr))) { unreachable because >pfrkt_ip{4,6} are NULL. 2230 pfr_destroy_ktable(kt, 0); 2231 return (NULL); 2232 } Back to pfr_destroy_ktable, still in pf_table.c: 2253 pfr_destroy_ktable(struct pfr_ktable *kt, int flushaddr) 2254 { ... 2268 if (kt->pfrkt_rs != NULL) { assume true. 2269 kt->pfrkt_rs->tables--; 2270 pf_remove_if_empty_ruleset(kt->pfrkt_rs); our first call to pf_remove_if_empty_ruleset. The second one follows from pfi_dynaddr_setup. If this is sound, then the only reason why pfr_destroy_ktable was called is that pool_get is called with PR_NOWAIT. And then the following diff would help. mbuhl Index: pf_if.c === RCS file: /cvs/src/sys/net/pf_if.c,v retrieving revision 1.104 diff -u -p
another copyout while holding net/pf lock in pfioctl
Hi tech@, syzkaller reported pfi_get_interfaces https://syzkaller.appspot.com/bug?id=963ef41930bb84af584be3bf910a3f67d65581a0 which does a copyout while holding NET and PF lock. The following diff should address this. mbuhl Index: sys/net/pf_if.c === RCS file: /cvs/src/sys/net/pf_if.c,v retrieving revision 1.103 diff -u -p -r1.103 pf_if.c --- sys/net/pf_if.c 26 Dec 2021 01:00:32 - 1.103 +++ sys/net/pf_if.c 28 Apr 2022 16:36:49 - @@ -755,7 +755,7 @@ pfi_update_status(const char *name, stru } } -int +void pfi_get_ifaces(const char *name, struct pfi_kif *buf, int *size) { struct pfi_kif *p, *nextp; @@ -769,16 +769,12 @@ pfi_get_ifaces(const char *name, struct if (!p->pfik_tzero) p->pfik_tzero = gettime(); pfi_kif_ref(p, PFI_KIF_REF_RULE); - if (copyout(p, buf++, sizeof(*buf))) { - pfi_kif_unref(p, PFI_KIF_REF_RULE); - return (EFAULT); - } + memcpy(buf++, p, sizeof(*buf)); nextp = RB_NEXT(pfi_ifhead, _ifs, p); pfi_kif_unref(p, PFI_KIF_REF_RULE); } } *size = n; - return (0); } int Index: sys/net/pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.379 diff -u -p -r1.379 pf_ioctl.c --- sys/net/pf_ioctl.c 9 Apr 2022 13:15:44 - 1.379 +++ sys/net/pf_ioctl.c 28 Apr 2022 17:21:47 - @@ -2921,18 +2921,30 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a break; case DIOCIGETIFACES: { - struct pfioc_iface *io = (struct pfioc_iface *)addr; + struct pfioc_iface *io = (struct pfioc_iface *)addr; + struct pfi_kif *kif_buf; + int apfiio_size = io->pfiio_size; if (io->pfiio_esize != sizeof(struct pfi_kif)) { error = ENODEV; goto fail; } + + if ((kif_buf = mallocarray(sizeof(*kif_buf), apfiio_size, + M_TEMP, M_WAITOK|M_CANFAIL)) == NULL) { + error = EINVAL; + goto fail; + } + NET_LOCK(); PF_LOCK(); - error = pfi_get_ifaces(io->pfiio_name, io->pfiio_buffer, - >pfiio_size); + pfi_get_ifaces(io->pfiio_name, kif_buf, >pfiio_size); PF_UNLOCK(); NET_UNLOCK(); + if (copyout(kif_buf, io->pfiio_buffer, sizeof(*kif_buf) * + io->pfiio_size)) + error = EFAULT; + free(kif_buf, M_TEMP, sizeof(*kif_buf) * apfiio_size); break; } Index: sys/net/pfvar.h === RCS file: /cvs/src/sys/net/pfvar.h,v retrieving revision 1.506 diff -u -p -r1.506 pfvar.h --- sys/net/pfvar.h 21 Apr 2022 15:22:49 - 1.506 +++ sys/net/pfvar.h 28 Apr 2022 16:22:44 - @@ -1879,7 +1879,7 @@ intpfi_dynaddr_setup(struct pf_addr_w voidpfi_dynaddr_remove(struct pf_addr_wrap *); voidpfi_dynaddr_copyout(struct pf_addr_wrap *); voidpfi_update_status(const char *, struct pf_status *); -int pfi_get_ifaces(const char *, struct pfi_kif *, int *); +voidpfi_get_ifaces(const char *, struct pfi_kif *, int *); int pfi_set_flags(const char *, int); int pfi_clear_flags(const char *, int); voidpfi_xcommit(void);
Re: wall(1) - fix issues with line wrapping
On Tue, Apr 19, 2022 at 09:00:08PM +0200, Anton Borowka wrote: > Hi, > > the attached diff fixes 2 issues with line wrappings in wall(1). ... I can confirm that your patch fixes the described problems. But I am wondering if it makes sense to split the ch == '\n' case and the cnt == 79 case into two blocks. OK mbuhl.
add sendmmsg and recvmmsg systemcalls
l to -.Fn recvfrom -with a null -.Fa from -parameter. -.Pp -On successful completion, all three routines return the number of -message bytes read. -If a message is too long to fit in the supplied -buffer, excess bytes may be discarded depending on the type of socket -the message is received from (see -.Xr socket 2 ) . -.Pp If no messages are available at the socket, the receive call waits for a message to arrive, unless the socket is nonblocking (see @@ -158,6 +159,8 @@ The .Dv MSG_CMSG_CLOEXEC requests that any file descriptors received as ancillary data with .Fn recvmsg +and +.Fn recvmmsg (see below) have their close-on-exec flag set. .Pp @@ -249,13 +252,67 @@ Indicates that the packet was received a .It Dv MSG_MCAST Indicates that the packet was received as multicast. .El +.Pp +The +.Fn recvmmsg +call uses an array of the +.Fa mmsghdr +structure of length +.Fa vlen +to group multiple +.Fa msghdr +structures into a single system call. +.Fa vlen +is capped at maximum +.Dv 1024 +messages that are received in a single call. +The +.Fa flags +field allows setting +.Dv MSG_WAITFORONE +to wait for one +.Fa msghdr , +and set +.Dv MSG_DONTWAIT +for all subsequent messages. +A provided +.Fa timeout +limits the time spent in the function but it does not limit the +time spent in lower parts of the kernel. +.Pp +The +.Fa mmsghdr +structure has the following form, as defined in +.In sys/socket.h : +.Bd -literal +struct mmsghdr { + struct msghdr msg_hdr; + unsigned int msg_len; +}; +.Ed +.Pp +Here +.Fa msg_len +indicated the number of bytes received for each +.Fa msg_hdr +member. .Sh RETURN VALUES -These calls return the number of bytes received, or \-1 if an error occurred. +The +.Fn send , +.Fn sendto , +and +.Fn sendmsg +calls return the number of bytes received, or \-1 if an error occurred. +The +.Fn sendmmsg +call returns the number of messages received, or \-1 +if an error occurred before the first message has been received. .Sh ERRORS .Fn recv , .Fn recvfrom , +.Fn recvmsg , and -.Fn recvmsg +.Fn recvmmsg fail if: .Bl -tag -width "[EHOSTUNREACH]" .It Bq Er EBADF @@ -310,6 +367,8 @@ was larger than .Pp And .Fn recvmsg +and +.Fn recvmmsg may return one of the following errors: .Bl -tag -width Er .It Bq Er EINVAL @@ -364,6 +423,12 @@ The .Fn recv function call appeared in .Bx 4.1c . +The +.Fn sendmmsg +syscall first appeared in Linux 2.6.33, was reimplemented for +.Nx 7.0 , +and ported to +.Ox 7.1 . .Sh CAVEATS Calling .Fn recvmsg Index: lib/libc/sys/send.2 === RCS file: /cvs/src/lib/libc/sys/send.2,v retrieving revision 1.34 diff -u -p -r1.34 send.2 --- lib/libc/sys/send.2 11 Jan 2019 06:10:13 - 1.34 +++ lib/libc/sys/send.2 22 Apr 2022 16:03:30 - @@ -36,7 +36,8 @@ .Sh NAME .Nm send , .Nm sendto , -.Nm sendmsg +.Nm sendmsg , +.Nm sendmmsg .Nd send a message from a socket .Sh SYNOPSIS .In sys/socket.h @@ -46,19 +47,23 @@ .Fn sendto "int s" "const void *msg" "size_t len" "int flags" "const struct sockaddr *to" "socklen_t tolen" .Ft ssize_t .Fn sendmsg "int s" "const struct msghdr *msg" "int flags" +.Ft int +.Fn sendmmsg "int s" "const struct mmsghdr *mmsg" "unsigned int vlen" "unsigned int flags" .Sh DESCRIPTION .Fn send , .Fn sendto , +.Fn sendmsg , and -.Fn sendmsg +.Fn sendmmsg are used to transmit a message to another socket. .Fn send may be used only when the socket is in a .Em connected state, while -.Fn sendto +.Fn sendto , +.Fn sendmsg , and -.Fn sendmsg +.Fn sendmmsg may be used at any time. .Pp The address of the target is given by @@ -127,10 +132,21 @@ See .Xr recv 2 for a description of the .Fa msghdr -structure. +and +.Fa mmsghdr +structures. .Sh RETURN VALUES -The call returns the number of characters sent, or \-1 +The +.Fn send , +.Fn sendto , +and +.Fn sendmsg +calls return the number of characters sent, or \-1 if an error occurred. +The +.Fn sendmmsg +call returns the number of messages sent, or \-1 +if an error occurred before the first message has been sent. .Sh ERRORS .Fn send , .Fn sendto , @@ -267,3 +283,9 @@ The .Fn send function call appeared in .Bx 4.1c . +The +.Fn sendmmsg +syscall first appeared in Linux 3.0, was reimplemented for +.Nx 7.0 , +and ported to +.Ox 7.1 . Index: lib/libc/sys/w_recvmmsg.c === RCS file: lib/libc/sys/w_recvmmsg.c diff -N lib/libc/sys/w_recvmmsg.c --- /dev/null 1 Jan 1970 00:00:00 - +++ lib/libc/sys/w_recvmmsg.c 22 Apr 2022 16:03:30 - @@ -0,0 +1,32 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2021 Moritz Buhl + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permissio
pf_rollback_rules always returns 0
Hi, Since 2015 pf_rollback_rules always returns 0. Make it void. OK? mbuhl Index: sys/net/pf_ioctl.c === RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.376 diff -u -p -r1.376 pf_ioctl.c --- sys/net/pf_ioctl.c 4 Apr 2022 12:57:36 - 1.376 +++ sys/net/pf_ioctl.c 7 Apr 2022 13:14:28 - @@ -93,7 +93,7 @@ intpfopen(dev_t, int, int, struct pr int pfclose(dev_t, int, int, struct proc *); int pfioctl(dev_t, u_long, caddr_t, int, struct proc *); int pf_begin_rules(u_int32_t *, const char *); -int pf_rollback_rules(u_int32_t, char *); +voidpf_rollback_rules(u_int32_t, char *); voidpf_remove_queues(void); int pf_commit_queues(void); voidpf_free_queues(struct pf_queuehead *); @@ -537,7 +537,7 @@ pf_begin_rules(u_int32_t *ticket, const return (0); } -int +void pf_rollback_rules(u_int32_t ticket, char *anchor) { struct pf_ruleset *rs; @@ -546,7 +546,7 @@ pf_rollback_rules(u_int32_t ticket, char rs = pf_find_ruleset(anchor); if (rs == NULL || !rs->rules.inactive.open || rs->rules.inactive.ticket != ticket) - return (0); + return; while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) { pf_rm_rule(rs->rules.inactive.ptr, rule); rs->rules.inactive.rcount--; @@ -555,11 +555,9 @@ pf_rollback_rules(u_int32_t ticket, char /* queue defs only in the main ruleset */ if (anchor[0]) - return (0); + return; pf_free_queues(pf_queues_inactive); - - return (0); } void @@ -2597,14 +2595,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a } break; case PF_TRANS_RULESET: - if ((error = pf_rollback_rules(ioe->ticket, - ioe->anchor))) { - PF_UNLOCK(); - NET_UNLOCK(); - free(table, M_TEMP, sizeof(*table)); - free(ioe, M_TEMP, sizeof(*ioe)); - goto fail; /* really bad */ - } + pf_rollback_rules(ioe->ticket, ioe->anchor); break; default: PF_UNLOCK();
Re: possible memory leak in ipmi get_sdr
Any insights? On Mon, Jan 10, 2022 at 03:18:47PM +0100, Moritz Buhl wrote: > Hi tech@, > > The return value of add_child_sensors is returned in add_sdr_sensor, > which is called by get_sdr. get_sdr mallocs psdr and only frees it > if add_sdr_sensor returns 0. The assumption is that psdr is attached > to a list in add_child_sensors otherwise. This is not the case if > the malloc for psensor fails, or read_sensor() equals 0. In any > of these error cases psdr is leaked. > > Kind regards > mbuhl > > Index: sys/dev/ipmi.c > === > RCS file: /mount/openbsd/cvs/src/sys/dev/ipmi.c,v > retrieving revision 1.115 > diff -u -p -r1.115 ipmi.c > --- sys/dev/ipmi.c23 Jan 2021 12:10:08 - 1.115 > +++ sys/dev/ipmi.c10 Jan 2022 13:49:19 - > @@ -1374,7 +1374,7 @@ add_child_sensors(struct ipmi_softc *sc, > int sensor_num, int sensor_type, int ext_type, int sensor_base, > int entity, const char *name) > { > - int typ, idx; > + int typ, idx, rc = 0; > struct ipmi_sensor *psensor; > #ifdef IPMI_DEBUG > struct sdrtype1 *s1 = (struct sdrtype1 *)psdr; > @@ -1415,11 +1415,12 @@ add_child_sensors(struct ipmi_softc *sc, > dbg_printf(5, " reading: %lld [%s]\n", > psensor->i_sensor.value, > psensor->i_sensor.desc); > + rc = 1; > } else > free(psensor, M_DEVBUF, sizeof(*psensor)); > } > > - return (1); > + return (rc); > } > > /* Handle IPMI Timer - reread sensor values */ >
Re: possible memory leak in ipmi get_sdr
I thought I'd walk through my understanding of the code to make it easier to review. from sys/dev/ipmi.c: 1019 int 1020 get_sdr(struct ipmi_softc *sc, u_int16_t recid, u_int16_t *nxtrec) 1021 { ... 1046 psdr = malloc(sdrlen, M_DEVBUF, M_NOWAIT); 1047 if (psdr == NULL) 1048 return (1); Now psdr is allocated. In the following loop it is freed correctly on failure: 1062 free(psdr, M_DEVBUF, sdrlen); 1063 return (1); If everything works, we attach it to a list: 1067 /* Add SDR to sensor list, if not wanted, free buffer */ 1068 if (add_sdr_sensor(sc, psdr, sdrlen) == 0) 1069 free(psdr, M_DEVBUF, sdrlen); 1070 1071 return (0); So if add_sdr_sensor doesn't return 0 it has to keep a reference to psdr somewhere. 1335 int 1336 add_sdr_sensor(struct ipmi_softc *sc, u_int8_t *psdr, int sdrlen) 1337 { There are two switch cases that both call add_child_sensors and set rc with it, if no previous failures occour: 1349 rc = add_child_sensors(sc, psdr, 1, s1->sensor_num, 1350 s1->sensor_type, s1->event_code, 0, s1->entity_id, name); 1351 break; and 1358 rc = add_child_sensors(sc, psdr, s2->share1 & 0xF, 1359 s2->sensor_num, s2->sensor_type, s2->event_code, 1360 s2->share2 & 0x7F, s2->entity_id, name); 1361 break; These two calls are the only locations that can make add_sdr_sensor return something else than 0. 1370 int 1371 add_child_sensors(struct ipmi_softc *sc, u_int8_t *psdr, int count, 1372 int sensor_num, int sensor_type, int ext_type, int sensor_base, 1373 int entity, const char *name) 1374 { So as mentioned before, this function needs to return 0 or keep a reference to psdr. This is not the case in two situations, but to keep it simple I will only show one: 1388 psensor = malloc(sizeof(*psensor), M_DEVBUF, M_NOWAIT | M_ZERO); 1389 if (psensor == NULL) 1390 break; ... 1420 return (1); I hope this helps mbuhl
uninitialized stack memory possibly passed to m_freem
Hi tech@, https://github.com/openbsd/src/commit/0ea6bae06233cd25645df14602c3eda6bdff7dca.patch the patch forgot to add mrep to the info struct, nfsm_dissect could pass info.nmi_mrep to m_freem, which is currently uninitialized stack memory. Index: sys/nfs/nfs_subs.c === RCS file: /mount/openbsd/cvs/src/sys/nfs/nfs_subs.c,v retrieving revision 1.145 diff -u -p -r1.145 nfs_subs.c --- sys/nfs/nfs_subs.c 11 Jan 2022 03:13:59 - 1.145 +++ sys/nfs/nfs_subs.c 12 Jan 2022 09:31:52 - @@ -1841,6 +1841,7 @@ nfsm_srvsattr(struct mbuf **mp, struct v info.nmi_md = *mp; info.nmi_dpos = *dposp; + info.nmi_mrep = mrep; nfsm_dissect(tl, u_int32_t *, NFSX_UNSIGNED); if (*tl == nfs_true) {
possible memory leak in ipmi get_sdr
Hi tech@, The return value of add_child_sensors is returned in add_sdr_sensor, which is called by get_sdr. get_sdr mallocs psdr and only frees it if add_sdr_sensor returns 0. The assumption is that psdr is attached to a list in add_child_sensors otherwise. This is not the case if the malloc for psensor fails, or read_sensor() equals 0. In any of these error cases psdr is leaked. Kind regards mbuhl Index: sys/dev/ipmi.c === RCS file: /mount/openbsd/cvs/src/sys/dev/ipmi.c,v retrieving revision 1.115 diff -u -p -r1.115 ipmi.c --- sys/dev/ipmi.c 23 Jan 2021 12:10:08 - 1.115 +++ sys/dev/ipmi.c 10 Jan 2022 13:49:19 - @@ -1374,7 +1374,7 @@ add_child_sensors(struct ipmi_softc *sc, int sensor_num, int sensor_type, int ext_type, int sensor_base, int entity, const char *name) { - int typ, idx; + int typ, idx, rc = 0; struct ipmi_sensor *psensor; #ifdef IPMI_DEBUG struct sdrtype1 *s1 = (struct sdrtype1 *)psdr; @@ -1415,11 +1415,12 @@ add_child_sensors(struct ipmi_softc *sc, dbg_printf(5, " reading: %lld [%s]\n", psensor->i_sensor.value, psensor->i_sensor.desc); + rc = 1; } else free(psensor, M_DEVBUF, sizeof(*psensor)); } - return (1); + return (rc); } /* Handle IPMI Timer - reread sensor values */
bsd.regress.mk REGRESS_SETUP vs REGRESS_SETUP_ONCE
Dear tech@, given the following Makefile: # REGRESS_SETUP vs REGRESS_SETUP_ONCE example setup: @echo setup setup_once: @echo setup_once test1: @echo test1 test2: @echo test2 cleanup: @echo cleanup REGRESS_SETUP = setup REGRESS_SETUP_ONCE =setup_once REGRESS_TARGETS+= test1 test2 REGRESS_CLEANUP = cleanup .include I expected the following output: pbp$ make regress setup_once setup test1 setup test2 cleanup But instead I got: pbp$ make regress setup setup_once test1 setup test2 cleanup The diff below yields the expected result instead. Index: share/mk/bsd.regress.mk === RCS file: /cvs/src/share/mk/bsd.regress.mk,v retrieving revision 1.21 diff -u -p -r1.21 bsd.regress.mk --- share/mk/bsd.regress.mk 17 Jun 2019 17:20:24 - 1.21 +++ share/mk/bsd.regress.mk 24 Sep 2020 19:48:11 - @@ -67,16 +67,16 @@ REGRESS_SETUP?= REGRESS_SETUP_ONCE?= REGRESS_CLEANUP?= -.if !empty(REGRESS_SETUP) -${REGRESS_TARGETS}: ${REGRESS_SETUP} -.endif - .if !empty(REGRESS_SETUP_ONCE) CLEANFILES+=${REGRESS_SETUP_ONCE:S/^/stamp-/} ${REGRESS_TARGETS}: ${REGRESS_SETUP_ONCE:S/^/stamp-/} ${REGRESS_SETUP_ONCE:S/^/stamp-/}: .SILENT ${MAKE} -C ${.CURDIR} ${@:S/^stamp-//} date >$@ +.endif + +.if !empty(REGRESS_SETUP) +${REGRESS_TARGETS}: ${REGRESS_SETUP} .endif regress: .SILENT
Re: vfs posix compatibility change
Any comments? July 22, 2019 8:23 PM, "Moritz Buhl" wrote: > Hi, > > here is another POSIX compatibility change I found while running > NetBSD system call tests: > > The incompatibilities can be reproduced with the followding comands: > `mv / foo` > mv: rename / to foo: Is a directory > `man 2 rename` mentions EISDIR: > [EISDIR] to is a directory, but from is not a directory. > Which is mentioned similarly in POSIX. But it doesn't fit the current case. > > `mkdir /` > mkdir: /: Is a directory > EISDIR is not mentioned in `man 2 mkdir`. Also not mentioned in POSIX. > I believe EBUSY would be good here. > > `rmdir /` > rmdir: /: Is a directory > EISDIR is not mentioned in `man 2 rmdir`. Also not mentioned in POSIX. > > unlink("/"); > printf("%d\n", errno); > 21 // EISDIR > Also neither mentioned in man nor POSIX. > > Here is some POSIX references: > https://pubs.opengroup.org/onlinepubs/9699919799 > unlink (2): > [EBUSY] > The file named by the path argument cannot be unlinked because it is > being used by the system or another process and the implementation > considers this an error. > > rename (2): > [EBUSY] > The directory named by old or new is currently in use by the system or > another process, and the implementation considers this an error. > > rmdir (2): > [EBUSY] > The directory to be removed is currently in use by the system or > some process and the implementation considers this to be an error. > > mkdir (2): > [EEXIST] > The named file exists. > > Attached is a patch that will make mkdir return EEXIST, EBUSY for others. > I believe this corer case is quite essential as it is in namei because of > this (and because NetBSD and FreeBSD differ a lot in the vfs area) I hope > for some comments on this. > > The patched code hasn't changed a lot since 44BSD. duh POSIX. > There already were discussions on EISDIR in the past: (from unlink POSIX) > >> The standard developers reviewed TR 24715-2006 and noted that LSB-conforming >> implementations may >> return [EISDIR] instead of [EPERM] when unlinking a directory. A change to >> permit this behavior by >> changing the requirement for [EPERM] to [EPERM] or [EISDIR] was considered, >> but decided against >> since it would break existing strictly conforming and conforming >> applications. Applications written >> for portability to both POSIX.1-2017 and the LSB should be prepared to >> handle either error code. > > duh. > > thanks, > mbuhl > > Index: sys/kern/vfs_lookup.c > === > RCS file: /cvs/src/sys/kern/vfs_lookup.c,v > retrieving revision 1.80 > diff -u -p -r1.80 vfs_lookup.c > --- sys/kern/vfs_lookup.c 18 Jul 2019 18:06:17 - 1.80 > +++ sys/kern/vfs_lookup.c 22 Jul 2019 16:10:14 - > @@ -440,7 +440,10 @@ vfs_lookup(struct nameidata *ndp) > */ > if (cnp->cn_nameptr[0] == '\0') { > if (ndp->ni_dvp == NULL && wantparent) { > - error = EISDIR; > + if (cnp->cn_nameiop == CREATE) > + error = EEXIST; > + else > + error = EBUSY; > goto bad; > } > ndp->ni_vp = dp; > Index: lib/libc/sys/rename.2 > === > RCS file: /cvs/src/lib/libc/sys/rename.2,v > retrieving revision 1.22 > diff -u -p -r1.22 rename.2 > --- lib/libc/sys/rename.2 10 Sep 2015 17:55:21 - 1.22 > +++ lib/libc/sys/rename.2 22 Jul 2019 16:45:16 - > @@ -123,6 +123,12 @@ characters, or an entire pathname (inclu > exceeded > .Dv PATH_MAX > bytes. > +.It Bq Er EBUSY > +The directory containing > +.Fa from > +or > +.Fa to > +is currently in use by the system. > .It Bq Er ENOENT > A component of the > .Fa from
fix ssh unittest misc regression test
Hi, some ssh unittest ist failing because it links stuff together in another way. basically add 'sshbuf-misc.c' to SRCS. additional suggestion: make variable assignments similar to ports mbuhl Index: regress/usr.bin/ssh/unittests/misc/Makefile === RCS file: /mount/openbsd/cvs/src/regress/usr.bin/ssh/unittests/misc/Makefile,v retrieving revision 1.1 diff -u -p -r1.1 Makefile --- regress/usr.bin/ssh/unittests/misc/Makefile 28 Apr 2019 22:53:26 - 1.1 +++ regress/usr.bin/ssh/unittests/misc/Makefile 22 Aug 2019 20:06:46 - @@ -1,14 +1,15 @@ # $OpenBSD: Makefile,v 1.1 2019/04/28 22:53:26 dtucker Exp $ -PROG=test_misc -SRCS=tests.c +PROG = test_misc +SRCS = tests.c # From usr.bin/ssh/Makefile.inc -SRCS+=sshbuf.c sshbuf-getput-basic.c ssherr.c log.c xmalloc.c misc.c +SRCS +=sshbuf.c sshbuf-getput-basic.c sshbuf-misc.c ssherr.c log.c\ + xmalloc.c misc.c # From usr/bin/ssh/sshd/Makefile -SRCS+=atomicio.c cleanup.c fatal.c +SRCS +=atomicio.c cleanup.c fatal.c -REGRESS_TARGETS=run-regress-${PROG} +REGRESS_TARGETS = run-regress-${PROG} run-regress-${PROG}: ${PROG} env ${TEST_ENV} ./${PROG}
fix rpki client regression tests
Hi, some experts forgot to run the rpki-client regression tests after some changes. The tests are relinking parts of the source and this is not great but after all I still prefer some running tests. Patch attached. thanks, mbuhl Index: regress/usr.sbin/rpki-client/test-cert.c === RCS file: /mount/openbsd/cvs/src/regress/usr.sbin/rpki-client/test-cert.c,v retrieving revision 1.2 diff -u -p -r1.2 test-cert.c --- regress/usr.sbin/rpki-client/test-cert.c12 Aug 2019 18:03:17 - 1.2 +++ regress/usr.sbin/rpki-client/test-cert.c22 Aug 2019 19:38:14 - @@ -31,6 +31,7 @@ #include #include "extern.h" +int verbose = 0; static void cert_print(const struct cert *p) Index: regress/usr.sbin/rpki-client/test-ip.c === RCS file: /mount/openbsd/cvs/src/regress/usr.sbin/rpki-client/test-ip.c,v retrieving revision 1.3 diff -u -p -r1.3 test-ip.c --- regress/usr.sbin/rpki-client/test-ip.c 12 Aug 2019 18:03:17 - 1.3 +++ regress/usr.sbin/rpki-client/test-ip.c 22 Aug 2019 19:38:19 - @@ -30,6 +30,7 @@ #include #include "extern.h" +int verbose = 0; static void test(const char *res, uint16_t afiv, size_t sz, size_t unused, ...) Index: regress/usr.sbin/rpki-client/test-mft.c === RCS file: /mount/openbsd/cvs/src/regress/usr.sbin/rpki-client/test-mft.c,v retrieving revision 1.2 diff -u -p -r1.2 test-mft.c --- regress/usr.sbin/rpki-client/test-mft.c 12 Aug 2019 18:03:17 - 1.2 +++ regress/usr.sbin/rpki-client/test-mft.c 22 Aug 2019 19:38:22 - @@ -28,6 +28,7 @@ #include #include "extern.h" +int verbose = 0; static void mft_print(const struct mft *p) Index: regress/usr.sbin/rpki-client/test-roa.c === RCS file: /mount/openbsd/cvs/src/regress/usr.sbin/rpki-client/test-roa.c,v retrieving revision 1.2 diff -u -p -r1.2 test-roa.c --- regress/usr.sbin/rpki-client/test-roa.c 12 Aug 2019 18:03:17 - 1.2 +++ regress/usr.sbin/rpki-client/test-roa.c 22 Aug 2019 19:38:25 - @@ -28,6 +28,7 @@ #include #include "extern.h" +int verbose = 0; static void roa_print(const struct roa *p) Index: regress/usr.sbin/rpki-client/test-tal.c === RCS file: /mount/openbsd/cvs/src/regress/usr.sbin/rpki-client/test-tal.c,v retrieving revision 1.2 diff -u -p -r1.2 test-tal.c --- regress/usr.sbin/rpki-client/test-tal.c 12 Aug 2019 18:03:17 - 1.2 +++ regress/usr.sbin/rpki-client/test-tal.c 22 Aug 2019 19:38:28 - @@ -28,6 +28,7 @@ #include #include "extern.h" +int verbose = 0; static void tal_print(const struct tal *p)
unmount after realpath
Hi, the same as to unveil(2) also counts for realpath(3). https://marc.info/?l=openbsd-bugs=156469573812165=2 Similar diff attached. tests are here: https://github.com/moritzbuhl/realpath-unmount-regress thanks, mbuhl Index: sys/kern/vfs_syscalls.c === RCS file: /cvs/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.327 diff -u -p -r1.327 vfs_syscalls.c --- sys/kern/vfs_syscalls.c 25 Jul 2019 01:43:21 - 1.327 +++ sys/kern/vfs_syscalls.c 2 Aug 2019 12:45:12 - @@ -948,10 +948,10 @@ sys___realpath(struct proc *p, void *v, VOP_UNLOCK(nd.ni_vp); vrele(nd.ni_vp); } - if (nd.ni_dvp && nd.ni_dvp != nd.ni_vp){ + if (nd.ni_dvp && nd.ni_dvp != nd.ni_vp) VOP_UNLOCK(nd.ni_dvp); + if (nd.ni_dvp) vrele(nd.ni_dvp); - } error = copyoutstr(nd.ni_cnd.cn_rpbuf, SCARG(uap, resolved), MAXPATHLEN, NULL);
vfs posix compatibility change
Hi, here is another POSIX compatibility change I found while running NetBSD system call tests: The incompatibilities can be reproduced with the followding comands: `mv / foo` mv: rename / to foo: Is a directory `man 2 rename` mentions EISDIR: [EISDIR] to is a directory, but from is not a directory. Which is mentioned similarly in POSIX. But it doesn't fit the current case. `mkdir /` mkdir: /: Is a directory EISDIR is not mentioned in `man 2 mkdir`. Also not mentioned in POSIX. I believe EBUSY would be good here. `rmdir /` rmdir: /: Is a directory EISDIR is not mentioned in `man 2 rmdir`. Also not mentioned in POSIX. unlink("/"); printf("%d\n", errno); 21 // EISDIR Also neither mentioned in man nor POSIX. Here is some POSIX references: https://pubs.opengroup.org/onlinepubs/9699919799/ unlink (2): [EBUSY] The file named by the path argument cannot be unlinked because it is being used by the system or another process and the implementation considers this an error. rename (2): [EBUSY] The directory named by old or new is currently in use by the system or another process, and the implementation considers this an error. rmdir (2): [EBUSY] The directory to be removed is currently in use by the system or some process and the implementation considers this to be an error. mkdir (2): [EEXIST] The named file exists. Attached is a patch that will make mkdir return EEXIST, EBUSY for others. I believe this corer case is quite essential as it is in namei because of this (and because NetBSD and FreeBSD differ a lot in the vfs area) I hope for some comments on this. The patched code hasn't changed a lot since 44BSD. duh POSIX. There already were discussions on EISDIR in the past: (from unlink POSIX) > The standard developers reviewed TR 24715-2006 and noted that LSB-conforming > implementations may return [EISDIR] instead of [EPERM] when unlinking a > directory. A change to permit this behavior by changing the requirement for > [EPERM] to [EPERM] or [EISDIR] was considered, but decided against since it > would break existing strictly conforming and conforming applications. > Applications written for portability to both POSIX.1-2017 and the LSB should > be prepared to handle either error code. duh. thanks, mbuhl Index: sys/kern/vfs_lookup.c === RCS file: /cvs/src/sys/kern/vfs_lookup.c,v retrieving revision 1.80 diff -u -p -r1.80 vfs_lookup.c --- sys/kern/vfs_lookup.c 18 Jul 2019 18:06:17 - 1.80 +++ sys/kern/vfs_lookup.c 22 Jul 2019 16:10:14 - @@ -440,7 +440,10 @@ vfs_lookup(struct nameidata *ndp) */ if (cnp->cn_nameptr[0] == '\0') { if (ndp->ni_dvp == NULL && wantparent) { - error = EISDIR; + if (cnp->cn_nameiop == CREATE) + error = EEXIST; + else + error = EBUSY; goto bad; } ndp->ni_vp = dp; Index: lib/libc/sys/rename.2 === RCS file: /cvs/src/lib/libc/sys/rename.2,v retrieving revision 1.22 diff -u -p -r1.22 rename.2 --- lib/libc/sys/rename.2 10 Sep 2015 17:55:21 - 1.22 +++ lib/libc/sys/rename.2 22 Jul 2019 16:45:16 - @@ -123,6 +123,12 @@ characters, or an entire pathname (inclu exceeded .Dv PATH_MAX bytes. +.It Bq Er EBUSY +The directory containing +.Fa from +or +.Fa to +is currently in use by the system. .It Bq Er ENOENT A component of the .Fa from
Re: make msgsnd(2) more posix
> Moritz, can you create a man page ERRORS diff? Is the attached diff ok? mbuhl Index: lib/libc/sys/msgsnd.2 === RCS file: /cvs/src/lib/libc/sys/msgsnd.2,v retrieving revision 1.19 diff -u -p -r1.19 msgsnd.2 --- lib/libc/sys/msgsnd.2 10 Dec 2014 19:19:00 - 1.19 +++ lib/libc/sys/msgsnd.2 18 Jul 2019 11:21:48 - @@ -127,6 +127,9 @@ will fail if: .Fa msqid is not a valid message queue identifier. .Pp +.Va mtype +is less than 1. +.Pp .Fa msgsz is greater than .Va msg_qbytes .
make msgsnd(2) more posix
Hi, while running some ported tests I noticed that msgsnd(2) did not error when passing a message with mtype < 1, even tho the manual page states: > mtype is an integer greater than 0 that can be used POSIX says: https://pubs.opengroup.org/onlinepubs/9699919799/ [EINVAL] The value of msqid is not a valid message queue identifier, or the value of mtype is less than 1; or the value of msgsz is greater than the system-imposed limit. Here is a minimal test: #include struct msg { long mtype; char buf[3]; }; int main(void) { struct msg msg = { 0, { 'a', 'b', 'c' } }; int id; id = msgget(1234, IPC_CREAT | 0600); // man 2 msgsnd: mtype is an integer greater than 0 ... if (msgsnd(id, , sizeof(struct msg), IPC_NOWAIT) == -1) return 0; return 1; } and the patch follows. greetings, mbuhl Index: sys/kern/sysv_msg.c === RCS file: /cvs/src/sys/kern/sysv_msg.c,v retrieving revision 1.34 diff -u -p -r1.34 sysv_msg.c --- sys/kern/sysv_msg.c 5 Dec 2018 15:42:45 - 1.34 +++ sys/kern/sysv_msg.c 12 Jul 2019 15:03:16 - @@ -588,7 +588,7 @@ msg_copyin(struct msg *msg, const char * return (error); } - if (msg->msg_type < 0) { + if (msg->msg_type <= 0) { msg_free(msg); return (EINVAL); }
Re: problems with libm
Hi, I made the FreeBSD msun regression tests compile on OpenBSD. https://github.com/moritzbuhl/msun-regress 3 out of 19 test files pass. 14 files die after the first error case. Two files (ctrig_test.c and trig_test.c) use atf and after some hacks they report all error cases. 840 for ctrig_test and 88 for trig_test. These test files should be reviewed carefully as I know for sure that many don't work on i386 (adding some volatile keywords usually helps). I believe all these errors paint a good picture. I will be looking into fixing what I can.
listen(2) sets wrong errno
Hi, while running some NetBSD syscall tests I noticed that listen(2) returns the wrong errno when a socket is already connected. man 2 listen already mentions EINVAL. NetBSD bug: http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=46150 NetBSD patch: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/uipc_socket.c.diff?r1=1.209=1.210_with_tag=MAIN Spec: https://pubs.opengroup.org/onlinepubs/9699919799/ patch attached, mbuhl Index: sys/kern/uipc_socket.c === RCS file: /mount/openbsd/cvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.232 diff -u -p -r1.232 uipc_socket.c --- sys/kern/uipc_socket.c 4 Jul 2019 17:42:17 - 1.232 +++ sys/kern/uipc_socket.c 9 Jul 2019 16:31:50 - @@ -172,7 +172,7 @@ solisten(struct socket *so, int backlog) int s, error; if (so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING|SS_ISDISCONNECTING)) - return (EOPNOTSUPP); + return (EINVAL); #ifdef SOCKET_SPLICE if (isspliced(so) || issplicedback(so)) return (EOPNOTSUPP);
getgroups(2) with negative values
Hi, while porting some NetBSD syscall tests to OpenBSD I noticed that the getgroups test is failing. Simply put: gid_t gidset[NGROUPS_MAX]; getgroups(-1, gidset); This was fixed on NetBSD 8 years ago: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_prot.c > if (SCARG(uap, gidsetsize) < (int)*retval) > return EINVAL; While here, also remove the u_int in setgroups. POSIX does't say a lot about setgroups and therefore return EINVAL. https://pubs.opengroup.org/onlinepubs/9699919799/ Index: sys/kern/kern_prot.c === RCS file: /cvs/src/sys/kern/kern_prot.c,v retrieving revision 1.75 diff -u -p -r1.75 kern_prot.c --- sys/kern/kern_prot.c22 Jun 2018 13:33:30 - 1.75 +++ sys/kern/kern_prot.c8 Jul 2019 12:13:04 - @@ -196,7 +196,7 @@ sys_getgroups(struct proc *p, void *v, r syscallarg(gid_t *) gidset; } */ *uap = v; struct ucred *uc = p->p_ucred; - u_int ngrp; + int ngrp; int error; if ((ngrp = SCARG(uap, gidsetsize)) == 0) { @@ -870,13 +870,13 @@ sys_setgroups(struct proc *p, void *v, r struct process *pr = p->p_p; struct ucred *pruc, *newcred; gid_t groups[NGROUPS_MAX]; - u_int ngrp; + int ngrp; int error; if ((error = suser(p)) != 0) return (error); ngrp = SCARG(uap, gidsetsize); - if (ngrp > NGROUPS_MAX) + if (ngrp > NGROUPS_MAX || ngrp < 0) return (EINVAL); error = copyin(SCARG(uap, gidset), groups, ngrp * sizeof(gid_t)); if (error == 0) {
fix format strings in relayd
Hi, as promised a patch to relayd to enable formatting warnings for yyerror. Greetings, Moritz Buhl Index: usr.sbin/relayd/parse.y === RCS file: /cvs/src/usr.sbin/relayd/parse.y,v retrieving revision 1.238 diff -u -p -r1.238 parse.y --- usr.sbin/relayd/parse.y 31 May 2019 15:25:57 - 1.238 +++ usr.sbin/relayd/parse.y 4 Jul 2019 14:59:36 - @@ -75,7 +75,9 @@ intpopfile(void); int check_file_secrecy(int, const char *); int yyparse(void); int yylex(void); -int yyerror(const char *, ...); +int yyerror(const char *, ...) +__attribute__((__format__ (printf, 1, 2))) +__attribute__((__nonnull__ (1))); int kw_cmp(const void *, const void *); int lookup(char *); int igetc(void); @@ -350,7 +352,7 @@ port: PORT HTTP { } | PORT NUMBER { if ($2 <= 0 || $2 > (int)USHRT_MAX) { - yyerror("invalid port: %d", $2); + yyerror("invalid port: %lld", $2); YYERROR; } $$.val[0] = htons($2); @@ -389,7 +391,7 @@ sendbuf : NOTHING { main : INTERVAL NUMBER { if ((conf->sc_conf.interval.tv_sec = $2) < 0) { - yyerror("invalid interval: %d", $2); + yyerror("invalid interval: %lld", $2); YYERROR; } } @@ -403,7 +405,7 @@ main: INTERVAL NUMBER { | PREFORK NUMBER{ if ($2 <= 0 || $2 > PROC_MAX_INSTANCES) { yyerror("invalid number of preforked " - "relays: %d", $2); + "relays: %lld", $2); YYERROR; } conf->sc_conf.prefork_relay = $2; @@ -895,7 +897,7 @@ tablecheck : ICMP { table->conf.check } table->conf.check = CHECK_HTTP_CODE; if ((table->conf.retcode = $5) <= 0) { - yyerror("invalid HTTP code: %d", $5); + yyerror("invalid HTTP code: %lld", $5); free($2); free($3); YYERROR; @@ -1094,7 +1096,7 @@ httpflags_l : httpflags comma httpflags_ httpflags : HEADERLEN NUMBER { if ($2 < 0 || $2 > RELAY_MAXHEADERLENGTH) { - yyerror("invalid headerlen: %d", $2); + yyerror("invalid headerlen: %lld", $2); YYERROR; } proto->httpheaderlen = $2; @@ -1115,7 +1117,7 @@ tcpflags : SACK { proto->tcpflags |= T | NO SPLICE { proto->tcpflags |= TCPFLAG_NSPLICE; } | BACKLOG NUMBER{ if ($2 < 0 || $2 > RELAY_MAX_BACKLOG) { - yyerror("invalid backlog: %d", $2); + yyerror("invalid backlog: %lld", $2); YYERROR; } proto->tcpbacklog = $2; @@ -1123,13 +1125,13 @@ tcpflags: SACK { proto->tcpflags |= T | SOCKET BUFFER NUMBER { proto->tcpflags |= TCPFLAG_BUFSIZ; if ((proto->tcpbufsiz = $3) < 0) { - yyerror("invalid socket buffer size: %d", $3); + yyerror("invalid socket buffer size: %lld", $3); YYERROR; } } | IP STRING NUMBER { if ($3 < 0) { - yyerror("invalid ttl: %d", $3); + yyerror("invalid ttl: %lld", $3); free($2); YYERROR; } @@ -2072,7 +2074,7 @@ routeoptsl: ROUTE addrprefix { YYERROR; } if ($2 < 0 || $2 > RT_TABLEID_MAX) { - yyerror("invalid rtable id %d", $2); +
fix pfctl regress on armv7
Hi, due to the pfctl regression tests failing on armv7 I noticed that the yyerror function is having the usual format attributes. This lead to garbage in error messages on armv7. I added the directives and then adjusted the formatting to what the compiler suggested. The same goes for relayd. A patch will follow. Thanks for feedback and comments, Moritz Buhl Index: sbin/pfctl/parse.y === RCS file: /cvs/src/sbin/pfctl/parse.y,v retrieving revision 1.696 diff -u -p -r1.696 parse.y --- sbin/pfctl/parse.y 8 May 2019 21:31:30 - 1.696 +++ sbin/pfctl/parse.y 4 Jul 2019 14:34:29 - @@ -83,7 +83,9 @@ intpopfile(void); int check_file_secrecy(int, const char *); int yyparse(void); int yylex(void); -int yyerror(const char *, ...); +int yyerror(const char *, ...) +__attribute__((__format__ (printf, 1, 2))) +__attribute__((__nonnull__ (1))); int kw_cmp(const void *, const void *); int lookup(char *); int igetc(void); @@ -1041,7 +1043,7 @@ scrub_opt : NODF { YYERROR; } if ($2 < 0 || $2 > 255) { - yyerror("illegal min-ttl value %d", $2); + yyerror("illegal min-ttl value %lld", $2); YYERROR; } scrub_opts.marker |= FOM_MINTTL; @@ -1053,7 +1055,7 @@ scrub_opt : NODF { YYERROR; } if ($2 < 0 || $2 > 65535) { - yyerror("illegal max-mss value %d", $2); + yyerror("illegal max-mss value %lld", $2); YYERROR; } scrub_opts.marker |= FOM_MAXMSS; @@ -2218,7 +2220,7 @@ filter_set: prio { YYERROR; } if ($2 < 0 || $2 > 0x) { - yyerror("illegal delay value %d (0-%u)", $2, + yyerror("illegal delay value %lld (0-%u)", $2, 0x); YYERROR; } @@ -2289,7 +2291,7 @@ blockspec : /* empty */ { } | RETURNRST '(' TTL NUMBER ')' { if ($4 < 0 || $4 > 255) { - yyerror("illegal ttl value %d", $4); + yyerror("illegal ttl value %lld", $4); YYERROR; } $$.b2 = PFRULE_RETURNRST; @@ -2339,7 +2341,7 @@ reticmpspec : STRING{ u_int8_ticmptype; if ($1 < 0 || $1 > 255) { - yyerror("invalid icmp code %lu", $1); + yyerror("invalid icmp code %lld", $1); YYERROR; } icmptype = returnicmpdefault >> 8; @@ -2358,7 +2360,7 @@ reticmp6spec : STRING{ u_int8_ticmptype; if ($1 < 0 || $1 > 255) { - yyerror("invalid icmp code %lu", $1); + yyerror("invalid icmp code %lld", $1); YYERROR; } icmptype = returnicmp6default >> 8; @@ -2788,7 +2790,7 @@ host : STRING{ if (strlcpy($$->addr.v.rtlabelname, $2, sizeof($$->addr.v.rtlabelname)) >= sizeof($$->addr.v.rtlabelname)) { - yyerror("route label too long, max %u chars", + yyerror("route label too long, max %lu chars", sizeof($$->addr.v.rtlabelname) - 1); free($2); free($$); @@ -3014,7 +3016,7 @@ uid : STRING{ } | NUMBER{ if ($1 < 0 || $1 >= UID_MAX) { - yyerror("illegal uid value %lu", $1); + yyerror("illegal uid value %lld", $1); YYERROR;
port NetBSD libc tests
Hi, I started porting some NetBSD libc tests. They use atf which is a lot more than necessary for most tests. Attached is a passing test file for access(2). I would continue the same way with the other tests if people say it is ok to work around atf this way. I only changed two includes in t_access.c and left the file as it is. This is not possible for all other tests as NetBSD has e.g. fchroot and a differently behaving dup3. I appreciate comments and feedback. Thanks, mbuhl Index: regress/lib/libc/sys/Makefile === RCS file: regress/lib/libc/sys/Makefile diff -N regress/lib/libc/sys/Makefile --- /dev/null 1 Jan 1970 00:00:00 - +++ regress/lib/libc/sys/Makefile 1 Jul 2019 06:53:18 - @@ -0,0 +1,10 @@ +PROGS += t_access + +.for t in ${PROGS} +REGRESS_TARGETS+= run-$t +run-$t: $t + @echo "\n $@ " + ./$t +.endfor + +.include Index: regress/lib/libc/sys/atf-c.h === RCS file: regress/lib/libc/sys/atf-c.h diff -N regress/lib/libc/sys/atf-c.h --- /dev/null 1 Jan 1970 00:00:00 - +++ regress/lib/libc/sys/atf-c.h27 Jun 2019 19:07:52 - @@ -0,0 +1,66 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2019 Moritz Buhl + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +#if !defined(ATF_C_H) +#define ATF_C_H + +#include + +#include + +/* error.h */ + +/* macros.h */ +#define ATF_TC(fn) \ +void atf_##fn(void); \ +void atf_head_##fn(void); \ +void atf_body_##fn(void); \ +void \ +atf_##fn(void) \ +{ \ + atf_head_##fn();\ + atf_body_##fn();\ +} + +#define ATF_TC_WITH_CLEANUP(fn) \ +void atf_##fn(void); \ +void atf_head_##fn(void); \ +void atf_body_##fn(void); \ +void atf_cleanup_##fn(void); \ +void \ +atf_##fn(void) \ +{ \ + atf_head_##fn();\ + atf_body_##fn();\ + atf_cleanup_##fn(); \ +} + +#define ATF_TC_HEAD(fn, tc)void atf_head_##fn(void) +#define ATF_TC_BODY(fn, tc)void atf_body_##fn(void) +#define ATF_TC_CLEANUP(fn, tc) void atf_cleanup_##fn(void) + +#define ATF_TP_ADD_TCS(tp) int main(void) +#define ATF_TP_ADD_TC(tp, fn) atf_##fn() + +#define atf_tc_set_md_var(tc, attr, fmt, ...) \ + printf(attr ": " fmt "\n", ##__VA_ARGS__) + +#define ATF_REQUIRE(exp) if (!(exp)) err(1, __func__) + +#define atf_no_error() 0 + +#endif /* !defined(ATF_C_H) */ Index: regress/lib/libc/sys/macros.h === RCS file: regress/lib/libc/sys/macros.h diff -N regress/lib/libc/sys/macros.h --- /dev/null 1 Jan 1970 00:00:00 - +++ regress/lib/libc/sys/macros.h 27 Jun 2019 19:07:26 - @@ -0,0 +1,7 @@ +/* $OpenBSD$ */ +/* Public domain - Moritz Buhl */ + +#include + +#define __RCSID(str) +#define __arraycount(_a) nitems(_a) Index: regress/lib/libc/sys/t_access.c === RCS file: regress/lib/libc/sys/t_access.c diff -N regress/lib/libc/sys/t_access.c --- /dev/null 1 Jan 1970 00:00:00 - +++ regress/lib/libc/sys/t_access.c 27 Jun 2019 19:05:27
problems with libm
Hi, while testing arm hardware on OpenBSD I noticed that some floating point operations cause failures of other tests. In fact the current libm is incorrect according to the ISO C Std Annex G. I found this out after porting some FreeBSD lib msun tests. Many edge cases for complex floating point operations are not covered at all. My question on this is what I should do about this. Port the FreeBSD msun library? Ignore the problem? Patch the current implementation? I attached a test that is currently breaking. There are many more. To fix these I just copied the FreeBSD file that implements the failing function. But I am not sure if this is a good approach. mbuhl Index: regress/lib/libm/msun/Makefile === RCS file: /cvs/src/regress/lib/libm/msun/Makefile,v retrieving revision 1.1.1.1 diff -u -p -r1.1.1.1 Makefile --- regress/lib/libm/msun/Makefile 21 Feb 2019 16:14:03 - 1.1.1.1 +++ regress/lib/libm/msun/Makefile 31 May 2019 19:50:32 - @@ -1,6 +1,7 @@ # $OpenBSD: Makefile,v 1.1.1.1 2019/02/21 16:14:03 bluhm Exp $ TESTS = +TESTS += cexp_test TESTS += conj_test TESTS += fenv_test TESTS += lrint_test Index: regress/lib/libm/msun/cexp_test.c === RCS file: regress/lib/libm/msun/cexp_test.c diff -N regress/lib/libm/msun/cexp_test.c --- /dev/null 1 Jan 1970 00:00:00 - +++ regress/lib/libm/msun/cexp_test.c 1 Jun 2019 05:40:51 - @@ -0,0 +1,326 @@ +/*- + * Copyright (c) 2008-2011 David Schultz + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* + * Tests for corner cases in cexp*(). + */ + +#include + +#include + +#include +#include +#include +#include +#include +#include + +#include "test-utils.h" + +#pragma STDC FENV_ACCESS ON +#pragmaSTDC CX_LIMITED_RANGE OFF + +/* + * Test that a function returns the correct value and sets the + * exception flags correctly. The exceptmask specifies which + * exceptions we should check. We need to be lenient for several + * reasons, but mainly because on some architectures it's impossible + * to raise FE_OVERFLOW without raising FE_INEXACT. In some cases, + * whether cexp() raises an invalid exception is unspecified. + * + * These are macros instead of functions so that assert provides more + * meaningful error messages. + * + * XXX The volatile here is to avoid gcc's bogus constant folding and work + * around the lack of support for the FENV_ACCESS pragma. + */ +#definetest(func, z, result, exceptmask, excepts, checksign) do { \ + volatile long double complex _d = z;\ + assert(feclearexcept(FE_ALL_EXCEPT) == 0); \ + assert(cfpequal_cs((func)(_d), (result), (checksign))); \ + assert(((void)(func), fetestexcept(exceptmask) == (excepts))); \ +} while (0) + +#define test_c(t, func, z, result, exceptmask, excepts, checksign) do { \ +volatile t complex r = result; \ +test(func, z, r, exceptmask, excepts, checksign); \ +} while (0) + +#define test_f(func, z, result, exceptmask, excepts, checksign) do {\ +test_c(float, func, z, result, exceptmask, excepts, checksign); \ +} while (0) + +/* Test within a given tolerance. */ +#definetest_tol(func, z, result, tol) do { \ + volatile long double complex _d = z;\ + assert(cfpequal_tol((func)(_d), (result), (tol),\ + FPE_ABS_ZERO | CS_BOTH)); \ +} while
Fix regress/sys/rtable for octeon
Hi, while running regression tests on an octeon ERL, I noticed that no-macro-redefined flag was not known to the compiler. I added some #undefs and some hours ago a change to art_walk made the tests fail too. The no-macro-redefine was added 3 month ago. Thanks, mbuhl Index: regress/sys/net/rtable/Makefile.inc === RCS file: /mount/openbsd/cvs/src/regress/sys/net/rtable/Makefile.inc,v retrieving revision 1.4 diff -u -p -r1.4 Makefile.inc --- regress/sys/net/rtable/Makefile.inc 31 Mar 2019 14:03:40 - 1.4 +++ regress/sys/net/rtable/Makefile.inc 21 Jun 2019 18:26:06 - @@ -9,6 +9,6 @@ SRCS+= art.c CFLAGS+= -DART .endif -CPPFLAGS+= -I${TOPDIR} -Wall -Wno-macro-redefined +CPPFLAGS+= -I${TOPDIR} -Wall .PATH: ${TOPDIR} ${TOPDIR}/../../../../sys/net Index: regress/sys/net/rtable/srp_compat.h === RCS file: /mount/openbsd/cvs/src/regress/sys/net/rtable/srp_compat.h,v retrieving revision 1.5 diff -u -p -r1.5 srp_compat.h --- regress/sys/net/rtable/srp_compat.h 27 Jul 2017 13:34:30 - 1.5 +++ regress/sys/net/rtable/srp_compat.h 21 Jun 2019 14:38:33 - @@ -48,7 +48,9 @@ srp_swap_locked(struct srp *srp, void *n */ #define SRPL_INIT(_sl) SLIST_INIT(_sl) +#undef SRPL_HEAD #define SRPL_HEAD(name, entry) SLIST_HEAD(name, entry) +#undef SRPL_ENTRY #define SRPL_ENTRY(type) SLIST_ENTRY(type) #define SRPL_FIRST(_sr, _sl) SLIST_FIRST(_sl); Index: regress/sys/net/rtable/delete/main.c === RCS file: /mount/openbsd/cvs/src/regress/sys/net/rtable/delete/main.c,v retrieving revision 1.5 diff -u -p -r1.5 main.c --- regress/sys/net/rtable/delete/main.c15 Nov 2016 10:43:41 - 1.5 +++ regress/sys/net/rtable/delete/main.c21 Jun 2019 18:20:20 - @@ -58,9 +58,9 @@ main(int argc, char *argv[]) do_from_file(0, AF_INET6, filename, route_insert); - rtable_walk(0, AF_INET6, rtentry_delete, NULL); + rtable_walk(0, AF_INET6, NULL, rtentry_delete, NULL); - rtable_walk(0, AF_INET6, rtentry_dump, NULL); + rtable_walk(0, AF_INET6, NULL, rtentry_dump, NULL); #ifdef ART struct art_root *ar; Index: regress/sys/net/rtable/fullfeed/main.c === RCS file: /mount/openbsd/cvs/src/regress/sys/net/rtable/fullfeed/main.c,v retrieving revision 1.3 diff -u -p -r1.3 main.c --- regress/sys/net/rtable/fullfeed/main.c 15 Nov 2016 10:43:41 - 1.3 +++ regress/sys/net/rtable/fullfeed/main.c 21 Jun 2019 18:20:38 - @@ -53,7 +53,7 @@ main(int argc, char *argv[]) do_from_file(0, af, filename, route_insert); do_from_file(0, af, filename, route_lookup); - rtable_walk(0, af, rtentry_dump, NULL); + rtable_walk(0, af, NULL, rtentry_dump, NULL); do_from_file(0, af, filename, route_delete);
iked regression test with obj dir
Hi, I noticed that the iked regress test fails if I have an obj directory. The following patch adresses this. mbuhl Index: regress/sbin/iked/parser/Makefile === RCS file: /mount/openbsd/cvs//src/regress/sbin/iked/parser/Makefile,v retrieving revision 1.2 diff -u -p -r1.2 Makefile --- regress/sbin/iked/parser/Makefile 30 May 2017 15:36:13 - 1.2 +++ regress/sbin/iked/parser/Makefile 21 Jun 2019 02:06:02 - @@ -17,7 +17,7 @@ test_parser: ${IKEOBJS} ${IKEOBJS}: cd ${.CURDIR}/../../../../sbin/iked && make $@ - ln -sf ${.OBJDIR}/../../../../sbin/iked/$@ . + ln -sf ${.CURDIR}/../../../../sbin/iked/$@ . LDADD+=-L${.OBJDIR} -ltest_helper DPADD+=libtest_helper.a
user mod -u lead to segfault
Hi, when using user mod -u, my machine ran into a segfault and the passwd file was left locked. I belief this is due to the changes to getpwnam etc. Sadly I am unable to figure out, why exactly the call to getpwuid segfaults. But replacing it with a call to user_from_uid helped. I could not crash the program with other flags that use getpw{nam,uid}. Thanks, Moritz Buhl Index: user.c === RCS file: /cvs/src/usr.sbin/user/user.c,v retrieving revision 1.122 diff -u -p -r1.122 user.c --- user.c 26 Sep 2018 14:54:58 - 1.122 +++ user.c 8 Oct 2018 17:00:16 - @@ -1515,7 +1515,8 @@ moduser(char *login_name, char *newlogin } if (up->u_flags & F_UID) { /* check uid isn't already allocated */ - if (!(up->u_flags & F_DUPUID) && getpwuid((uid_t)(up->u_uid)) != NULL) { + if (!(up->u_flags & F_DUPUID) && + user_from_uid((uid_t)(up->u_uid), 1) != NULL) { close(ptmpfd); pw_abort(); errx(EXIT_FAILURE, "uid %u is already in use", up->u_uid);
top cpu stats are wrong with hyper threading
Hi, Due to the SMT stuff the output of top showed the first few cpus instead of the ones that are actually active. To reproduce the bad output: Use a machine with hyper therading, top should show half the cpus, of which every second is disabled. The following diff skips the disabled cpus and disabling/reenabling the cores with hw.smt also works. The only problem is that the lines reserved for cpu-stats does not change with reenabling. Refreshing the output with '1' or resizing the window should help. Thanks, Moritz Buhl Index: display.c === RCS file: /home/mbuhl/cvs/src/usr.bin/top/display.c,v retrieving revision 1.55 diff -u -p -r1.55 display.c --- display.c 26 Sep 2018 17:23:13 - 1.55 +++ display.c 5 Oct 2018 15:19:00 - @@ -381,7 +381,7 @@ cpustates_tag(int cpu) void i_cpustates(int64_t *ostates, int *online) { - int i, first, cpu, ncpuonline; + int i, first, cpu, ncpuonline, off; double value; int64_t *states; char **names, *thisname; @@ -434,15 +434,18 @@ i_cpustates(int64_t *ostates, int *onlin } return; } + off = 0; for (cpu = 0; cpu < ncpu; cpu++) { /* now walk thru the names and print the line */ names = cpustate_names; first = 0; states = ostates + (CPUSTATES * cpu); - if ((screen_length > 2 + cpu && 2 + cpu < y_mem) || + if (y_mem == ncpuonline + 2 && !online[cpu]) + continue; + if ((screen_length > 2 + cpu && 2 + off < y_mem) || !smart_terminal) { - move(2 + cpu, 0); + move(2 + off, 0); clrtoeol(); addstrp(cpustates_tag(cpu)); @@ -465,6 +468,7 @@ i_cpustates(int64_t *ostates, int *onlin } putn(); } + off++; } }
wdc polling lead to unclean fs and panics
Hi, when rebooting a PowerBook G4, the kernel sometimes paniced while detaching the disk. Additionallly, during boot it was noted that / is not clean. The following patch special cases polling to not return early anymore and adds polling to all previous elements in the queue. Please tell me if this is not the right way to address the problem. Thanks, mbuhl Index: dev/ic/wdc.c === RCS file: /cvs/src/sys/dev/ic/wdc.c,v retrieving revision 1.134 diff -u -p -r1.134 wdc.c --- dev/ic/wdc.c30 Dec 2017 23:08:29 - 1.134 +++ dev/ic/wdc.c2 Oct 2018 14:01:04 - @@ -888,7 +888,8 @@ wdcstart(struct channel_softc *chp) /* adjust chp, in case we have a shared queue */ chp = xfer->chp; - if ((chp->ch_flags & WDCF_ACTIVE) != 0 ) { + if ((chp->ch_flags & WDCF_ACTIVE) != 0 && + (xfer->c_flags & C_POLL) == 0) { return; /* channel already active */ } #ifdef DIAGNOSTIC @@ -1905,6 +1906,7 @@ wdccommandshort(struct channel_softc *ch void wdc_exec_xfer(struct channel_softc *chp, struct wdc_xfer *xfer) { + struct wdc_xfer *iter; WDCDEBUG_PRINT(("wdc_exec_xfer %p flags 0x%x channel %d drive %d\n", xfer, xfer->c_flags, chp->channel, xfer->drive), DEBUG_XFERS); @@ -1918,7 +1920,8 @@ wdc_exec_xfer(struct channel_softc *chp, */ if ((xfer->c_flags & C_POLL) != 0 && !TAILQ_EMPTY(>ch_queue->sc_xfer)) { - TAILQ_INIT(>ch_queue->sc_xfer); + TAILQ_FOREACH(iter, >ch_queue->sc_xfer, c_xferchain) + iter->c_flags |= C_POLL; } /* insert at the end of command list */ TAILQ_INSERT_TAIL(>ch_queue->sc_xfer,xfer , c_xferchain);
Unplugging USB network stick during tcpdump crashes system
Hello, Removing my usb network interface during `tcpdump -i rum0` caused my kernel to crash. A fix is attached. Thanks, Moritz Buhl Index: net/bpf.c === RCS file: /cvs/src/sys/net/bpf.c,v retrieving revision 1.169 diff -u -p -r1.169 bpf.c --- net/bpf.c 2 Mar 2018 16:57:41 - 1.169 +++ net/bpf.c 4 Jul 2018 19:42:35 - @@ -332,7 +332,8 @@ bpf_detachd(struct bpf_d *d) mtx_enter(>bd_mtx); bpf_put(d); - if (error && !(error == EINVAL || error == ENODEV)) + if (error && !(error == EINVAL || error == ENODEV || + error == ENXIO)) /* * Something is really wrong if we were able to put * the driver into promiscuous mode, but can't
Re: Remove vgrind in ctags
Sorry, I noticed that vgrind still is in the ports after writing the mail. I will do more research before writing a patch next time. On 11/08/16 11:49, Theo Buehler wrote: On Tue, Nov 08, 2016 at 11:42:17AM +0100, Theo Buehler wrote: On Tue, Nov 08, 2016 at 10:53:37AM +0100, Moritz Buhl wrote: Hi, since vgrind was removed in 4.9 I removed the -v option in ctags. At least the example in ctags(1) could be modified. Greetings Moritz Buhl The diff got mangled by your mail program, please resend it. The synopsis in ctags.1 would also need to be adjusted by removing v. ? ctags-vgrind.diff and is there some part of your diff missing? vflag also appears in ctags.h and print.c
Remove vgrind in ctags
Hi, since vgrind was removed in 4.9 I removed the -v option in ctags. At least the example in ctags(1) could be modified. Greetings Moritz Buhl ? ctags-vgrind.diff Index: ctags.1 === RCS file: /cvs/src/usr.bin/ctags/ctags.1,v retrieving revision 1.33 diff -u -p -r1.33 ctags.1 --- ctags.1 31 Dec 2015 14:01:26 - 1.33 +++ ctags.1 8 Nov 2016 09:37:27 - @@ -91,19 +91,6 @@ Update the specified files in the file; that is, all references to them are regenerated, keeping only the other values in the file. -.It Fl v -An index of the form expected by vgrind -is produced on the standard output. -This listing contains the object name, file name, and page number (assuming -64 line pages). -Since the output will be sorted into lexicographic order, -it may be desired to run the output through -.Xr sort 1 . -Sample use: -.Bd -literal -offset indent -$ ctags -v files | sort -f > index -$ vgrind -x index -.Ed .It Fl w Suppress warning diagnostics. .It Fl x Index: ctags.c === RCS file: /cvs/src/usr.bin/ctags/ctags.c,v retrieving revision 1.18 diff -u -p -r1.18 ctags.c --- ctags.c 9 Oct 2015 01:37:07 - 1.18 +++ ctags.c 8 Nov 2016 09:37:27 - @@ -55,7 +55,6 @@ long lineftell; /* ftell after getc( in int lineno; /* line number of current line */ int dflag; /* -d: non-macro defines */ -int vflag; /* -v: vgrind style index output */ int wflag; /* -w: suppress warnings */ int xflag; /* -x: cxref style output */ @@ -81,7 +80,7 @@ main(int argc, char *argv[]) err(1, "pledge"); aflag = uflag = NO; - while ((ch = getopt(argc, argv, "BFadf:tuwvx")) != -1) + while ((ch = getopt(argc, argv, "BFadf:tuwx")) != -1) switch(ch) { case 'B': searchar = '?'; @@ -107,8 +106,6 @@ main(int argc, char *argv[]) case 'w': wflag = 1; break; - case 'v': - vflag = 1; case 'x': xflag = 1; break; @@ -120,12 +117,12 @@ main(int argc, char *argv[]) argc -= optind; if (!argc) { usage: (void)fprintf(stderr, - "usage: ctags [-aBdFuvwx] [-f tagsfile] file ...\n"); + "usage: ctags [-aBdFuwx] [-f tagsfile] file ...\n"); exit(1); } init(); - if (uflag && !vflag && !xflag) + if (uflag && !xflag) preload_entries(outfile, argc, argv); for (exit_val = step = 0; step < argc; ++step)