Re: Race in disk_attach_callback?
On Mon, Aug 29, 2022 at 07:51:24PM +, Miod Vallat wrote:
> > What's the status on this diff?
>
> I'm waiting for review from at least one of the softraid suspects prior
> to putting this in, in case there are further cinematics to address.

Sounds like a good idea.

In the meantime, while converting remaining/forgotten architectures from
the manual disklabel/newfs dance to 'installboot -p', I also tested this
diff wrt. fresh installations on softraid and found no behaviour change.

When installing to a fresh softraid volume, the disklabel UID is all
zeroes.  On macppc, for example, the installer still does

	disklabel $_disk 2>/dev/null | grep -q "^  i:" || disklabel -w -d $_disk
	newfs -t msdos ${_disk}i

which can be replaced with

	installboot -p $_disk

as, for example, arm64 already does.

I only have access to macppc via multi-user SSH, so I can't step through
the installer myself.  However, reconstructing the installer's steps on
softraid on vnd works with the manual disklabel/newfs dance as well as
with 'installboot -p', with and without miod's kernel race fix.

With zeroed images/disks, these tests always start out with an all-zeroed
disklabel UID on the softraid volume at the time fdisk and newfs (directly
or through 'installboot -p') are run.

Again, installation works on softraid with zeroed DUIDs, with and without
miod's diff.
Re: Race in disk_attach_callback?
> What's the status on this diff?

I'm waiting for review from at least one of the softraid suspects prior
to putting this in, in case there are further cinematics to address.
Re: Race in disk_attach_callback?
On Thu, Aug 18, 2022 at 08:29:13AM +, Klemens Nanni wrote:
> On Wed, Aug 17, 2022 at 07:03:50AM +, Miod Vallat wrote:
> > > What is the result if root runs disklabel, and forces it to all zeros?
> >
> > If the root duid is all zeroes, then the only way to refer to the root
> > disk is to use its /dev/{s,w}d* device name, as zero duids are ignored.
>
> I like miod's second diff and it fixes the race for vnd(4).
> I never ran into the issue with softraid(4), but that should not happen
> anymore with it, either.

What's the status on this diff?

The problem still exists and I was reminded of it by the new installboot
regress suffering from it on an arm64 box.

> OK kn
>
> > If you set a zero duid in disklabel(8), setdisklabel() in the kernel
> > will compute a new, non-zero value.
>
> Correct; same for real sd1 and fictitious vnd0.
>
> # disklabel -E sd1
> Label editor (enter '?' for help at any prompt)
>
> sd1> i
> The disklabel UID is currently: c766517084e5e5ce
> duid: []
>
> sd1*> l
> # /dev/rsd1c:
> type: SCSI
> disk: SCSI disk
> label: Block Device
> duid:
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 16
> sectors/cylinder: 1008
> cylinders: 203
> total sectors: 204800
> boundstart: 32832
> boundend: 204767
> drivedata: 0
>
> sd1*> w
>
> sd1> l
> # /dev/rsd1c:
> type: SCSI
> disk: SCSI disk
> label: Block Device
> duid: 9ff85059e4960901
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 16
> sectors/cylinder: 1008
> cylinders: 203
> total sectors: 204800
> boundstart: 32832
> boundend: 204767
> drivedata: 0
>
> sd1>
Re: Race in disk_attach_callback?
On Tue, Aug 16, 2022 at 09:03:55PM +, Miod Vallat wrote:
> > after a prompt from stsp@ and florian@, reporting that newfs_msdos
> > fails when given an $duid.i argument, I sat down to see what could be
> > going on.  My test using a USB stick failed to reproduce the problem.
> > Then I started using a vnd, and that shows the issue (once in a
> > while).  The feeling is that any disk device created on the fly might
> > show this issue.
>
> Moments ago kn@ stumbled upon this as well.
>
> It turns out that, at least in the vnd case, there is indeed a race
> between the scheduling of the task running disk_attach_callback and the
> first access to the vnd device, which will cause the label to get read.
>
> While vnd(4) has logic to read a label on-demand, disk_attach_callback
> assumes it runs at a point where the label can safely be obtained, and
> will not retry (because it sets DKF_OPENED, which never gets unset).
>
> The following shell script will demonstrate the issue:
>
> cat << EOF > test.sh
> #! /bin/sh
> set -e
>
> dd if=/dev/zero of=img.bin bs=1m count=4 >/dev/null
> iter=1
> while [ $iter -lt 1000 ]; do
> 	vnconfig vnd0 img.bin
> 	fdisk -iy vnd0 > /dev/null
> 	disklabel -Aw vnd0 > /dev/null
> 	duid=$(sysctl hw.disknames|sed 's,.*vnd0:,,')
> 	disklabel $duid > /dev/null
> 	vnconfig -u vnd0
> 	iter=$((iter + 1))
> done
> EOF
>
> (don't forget to vnconfig -u vnd0 if it fails).

Here's the improved version of a new regress test I initially wrote to
compare real disks with vnd, and block devices with duids.

I ran into this issue when trying to test installboot(8) changes on
vnd(4) rather than on real hardware first.

The new regress spins up a vnd disk, creates a label and accesses the
disk through both its disklabel DUID and its block device... the former
is broken.

This could easily be adapted or copied into initial installboot(8)
tests, which I wanted to do anyway.
regress/sys/kern/disk seems specific and yet broad enough;  putting it
under lib/libutil/ due to opendev(3) makes little sense since that's
just the common entry point to what is actually a kernel bug, and this
is not really vnd specific either, as we're dealing with a race
condition/design flaw.

Feedback? Objections? OK?

Index: regress/sys/kern/disk/Makefile
===================================================================
RCS file: regress/sys/kern/disk/Makefile
diff -N regress/sys/kern/disk/Makefile
--- /dev/null	1 Jan 1970 00:00:00 -
+++ regress/sys/kern/disk/Makefile	18 Aug 2022 10:06:49 -
@@ -0,0 +1,56 @@
+#	$OpenBSD: $
+
+# anything calling opendev(3) with "sd0a" or "fc2aa4672193310c.a", e.g.
+# disklabel(8), newfs(8) or newfs_msdos(8)
+OPENDEV ?=	/sbin/disklabel
+# write default MBR
+FORMAT ?=	/sbin/fdisk -iy
+# write standard disklabel to get a valid DUID
+NEWLABEL ?=	/sbin/disklabel -Aw
+
+# whether to trace diskmap(4)'s DIOCMAP handling DUIDs/block devices
+DO_TRACE ?=	No
+
+IMGFILE =	disk.img
+DEVFILE =	vnd.txt
+DUIDFILE =	duid.txt
+PART =		a
+
+
+REGRESS_SETUP_ONCE =	format-disk
+
+create-new-disk:
+	dd if=/dev/zero of=${IMGFILE} bs=1m count=0 seek=32
+	${SUDO} vnconfig -- ${IMGFILE} > ${DEVFILE}
+format-disk: create-new-disk
+	${SUDO} ${FORMAT} -- "$$(<${DEVFILE})"
+	${SUDO} ${NEWLABEL} -- "$$(<${DEVFILE})"
+
+
+REGRESS_TARGETS =	opendev-use-block \
+			opendev-use-duid
+
+opendev-use-block:
+	${SUDO} ${OPENDEV} -- "$$(<${DEVFILE})${PART}"
+
+opendev-use-duid:
+	${SUDO} disklabel -- "$$(<${DEVFILE})" | \
+		awk '$$1 == "duid:" { print $$2 }' > ${DUIDFILE}
+.if ${DO_TRACE:L} == "yes"
+	@# always run kdump regardless of newfs exit code
+	sh -uc '${SUDO} ktrace -- ${OPENDEV} -- "$$(<${DUIDFILE}).${PART}" ;\
+		ret=$$? ;\
+		${SUDO} kdump -tcn | grep -FwA2 DIOCMAP ;\
+		exit $$ret'
+.else
+	${SUDO} ${OPENDEV} -- "$$(<${DUIDFILE}).${PART}"
+.endif
+
+
+CLEANFILES =	${IMGFILE} ${DEVFILE} ${DUIDFILE} ktrace.out
+REGRESS_CLEANUP =	cleanup
+
+cleanup:
+	${SUDO} vnconfig -vu -- "$$(<${DEVFILE})"
+
+.include <bsd.regress.mk>
Re: Race in disk_attach_callback?
On Wed, Aug 17, 2022 at 07:03:50AM +, Miod Vallat wrote:
> > What is the result if root runs disklabel, and forces it to all zeros?
>
> If the root duid is all zeroes, then the only way to refer to the root
> disk is to use its /dev/{s,w}d* device name, as zero duids are ignored.

I like miod's second diff and it fixes the race for vnd(4).
I never ran into the issue with softraid(4), but that should not happen
anymore with it, either.

OK kn

> If you set a zero duid in disklabel(8), setdisklabel() in the kernel
> will compute a new, non-zero value.

Correct; same for real sd1 and fictitious vnd0.

# disklabel -E sd1
Label editor (enter '?' for help at any prompt)

sd1> i
The disklabel UID is currently: c766517084e5e5ce
duid: []

sd1*> l
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Block Device
duid:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 203
total sectors: 204800
boundstart: 32832
boundend: 204767
drivedata: 0

sd1*> w

sd1> l
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Block Device
duid: 9ff85059e4960901
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 203
total sectors: 204800
boundstart: 32832
boundend: 204767
drivedata: 0

sd1>
Re: Race in disk_attach_callback?
> What is the result if root runs disklabel, and forces it to all zeros?

If the root duid is all zeroes, then the only way to refer to the root
disk is to use its /dev/{s,w}d* device name, as zero duids are ignored.

If you set a zero duid in disklabel(8), setdisklabel() in the kernel
will compute a new, non-zero value.
Re: Race in disk_attach_callback?
Miod Vallat wrote:
> Come to think further about it, I think it is better for diskmap to
> always trust disk drivers to either:
> - not have any label (dk_label == NULL, or points to zeroed memory)
> or
> - have a valid label (duid is not zeroes).

What is the result if root runs disklabel, and forces it to all zeros?
(You are already running this diff, so you can test faster.)

Zero-duid setups still exist.  I am sure I encountered one in the last
year.  So I hesitate about this idea of using it like a flag.
Re: Race in disk_attach_callback?
Come to think further about it, I think it is better for diskmap to
always trust disk drivers to either:
- not have any label (dk_label == NULL, or points to zeroed memory)
or
- have a valid label (duid is not zeroes).

The following diff thus relaxes the logic to always trust
dk_label->d_uid, unless it is zero.

This passes the vnd test I mailed yesterday, without the need for a
dev/vnd.c change.

Index: sys/dev/softraid.c
===================================================================
RCS file: /OpenBSD/src/sys/dev/softraid.c,v
retrieving revision 1.425
diff -u -p -u -p -r1.425 softraid.c
--- sys/dev/softraid.c	16 Apr 2022 19:19:58 -	1.425
+++ sys/dev/softraid.c	17 Aug 2022 05:20:51 -
@@ -3685,13 +3685,11 @@ sr_ioctl_installboot(struct sr_softc *sc
 		}
 	}
 
-	bzero(duid, sizeof(duid));
 	TAILQ_FOREACH(dk, &disklist, dk_link)
 		if (!strncmp(dk->dk_name, bb->bb_dev,
 		    sizeof(bb->bb_dev)))
 			break;
 	if (dk == NULL || dk->dk_label == NULL ||
-	    (dk->dk_flags & DKF_LABELVALID) == 0 ||
-	    bcmp(dk->dk_label->d_uid, &duid, sizeof(duid)) == 0) {
+	    duid_iszero(dk->dk_label->d_uid)) {
 		sr_error(sc, "failed to get DUID for softraid volume");
 		goto done;
 	}
Index: sys/kern/subr_disk.c
===================================================================
RCS file: /OpenBSD/src/sys/kern/subr_disk.c,v
retrieving revision 1.253
diff -u -p -u -p -r1.253 subr_disk.c
--- sys/kern/subr_disk.c	14 Aug 2022 01:58:27 -	1.253
+++ sys/kern/subr_disk.c	17 Aug 2022 05:20:51 -
@@ -1121,7 +1121,6 @@ disk_attach_callback(void *xdat)
 	/* Read disklabel. */
 	if (disk_readlabel(&dl, dk->dk_devno, errbuf, sizeof(errbuf)) == NULL) {
 		enqueue_randomness(dl.d_checksum);
-		dk->dk_flags |= DKF_LABELVALID;
 	}
 
 done:
@@ -1440,14 +1439,14 @@ setroot(struct device *bootdv, int part,
 		TAILQ_FOREACH(dk, &disklist, dk_link)
 			if (dk->dk_device == bootdv)
 				break;
-		if (dk && (dk->dk_flags & DKF_LABELVALID))
+		if (dk)
 			bcopy(dk->dk_label->d_uid, bootduid, sizeof(bootduid));
 	} else if (bootdv == NULL) {
 		/* Locate boot disk based on the provided DUID. */
 		TAILQ_FOREACH(dk, &disklist, dk_link)
 			if (duid_equal(dk->dk_label->d_uid, bootduid))
 				break;
-		if (dk && (dk->dk_flags & DKF_LABELVALID))
+		if (dk)
 			bootdv = dk->dk_device;
 	}
 	bcopy(bootduid, rootduid, sizeof(rootduid));
@@ -1561,8 +1560,7 @@ gotswap:
 	if (bootdv->dv_class == DV_DISK) {
 		if (!duid_iszero(rootduid)) {
 			TAILQ_FOREACH(dk, &disklist, dk_link)
-				if ((dk->dk_flags & DKF_LABELVALID) &&
-				    dk->dk_label && duid_equal(
+				if (dk->dk_label && duid_equal(
 				    dk->dk_label->d_uid, rootduid))
 					break;
 			if (dk == NULL)
@@ -1788,7 +1786,8 @@ disk_map(char *path, char *mappath, int
 
 	mdk = NULL;
 	TAILQ_FOREACH(dk, &disklist, dk_link) {
-		if ((dk->dk_flags & DKF_LABELVALID) && dk->dk_label &&
+		if (dk->dk_label &&
+		    !duid_iszero(dk->dk_label->d_uid) &&
 		    memcmp(dk->dk_label->d_uid, uid,
 		    sizeof(dk->dk_label->d_uid)) == 0) {
 			/* Fail if there are duplicate UIDs! */
Index: sys/sys/disk.h
===================================================================
RCS file: /OpenBSD/src/sys/sys/disk.h,v
retrieving revision 1.36
diff -u -p -u -p -r1.36 disk.h
--- sys/sys/disk.h	4 May 2017 22:47:27 -	1.36
+++ sys/sys/disk.h	17 Aug 2022 05:20:51 -
@@ -83,7 +83,6 @@ struct disk {
 #define DKF_CONSTRUCTED	0x0001
 #define DKF_OPENED	0x0002
 #define DKF_NOLABELREAD	0x0004
-#define DKF_LABELVALID	0x0008
 
 /*
  * Metrics data; note that some metrics may have no meaning
Re: Race in disk_attach_callback?
> after a prompt from stsp@ and florian@, reporting that newfs_msdos
> fails when given an $duid.i argument, I sat down to see what could be
> going on.  My test using a USB stick failed to reproduce the problem.
> Then I started using a vnd, and that shows the issue (once in a
> while).  The feeling is that any disk device created on the fly might
> show this issue.

Moments ago kn@ stumbled upon this as well.

It turns out that, at least in the vnd case, there is indeed a race
between the scheduling of the task running disk_attach_callback and the
first access to the vnd device, which will cause the label to get read.

While vnd(4) has logic to read a label on-demand, disk_attach_callback
assumes it runs at a point where the label can safely be obtained, and
will not retry (because it sets DKF_OPENED, which never gets unset).

The following shell script will demonstrate the issue:

cat << EOF > test.sh
#! /bin/sh
set -e

dd if=/dev/zero of=img.bin bs=1m count=4 >/dev/null
iter=1
while [ $iter -lt 1000 ]; do
	vnconfig vnd0 img.bin
	fdisk -iy vnd0 > /dev/null
	disklabel -Aw vnd0 > /dev/null
	duid=$(sysctl hw.disknames|sed 's,.*vnd0:,,')
	disklabel $duid > /dev/null
	vnconfig -u vnd0
	iter=$((iter + 1))
done
EOF

(don't forget to vnconfig -u vnd0 if it fails).

The following diff makes vnd attempt to read a label as soon as the
VNDIOCSET ioctl is issued.  However it will not fix softraid if it has a
similar problem, so a rework of the disk_attach operation might be a
better idea.

Index: vnd.c
===================================================================
RCS file: /OpenBSD/src/sys/dev/vnd.c,v
retrieving revision 1.177
diff -u -p -r1.177 vnd.c
--- vnd.c	23 Dec 2021 10:09:16 -	1.177
+++ vnd.c	16 Aug 2022 21:02:23 -
@@ -536,6 +536,9 @@ fail:
 		sc->sc_dk.dk_name = sc->sc_dev.dv_xname;
 		disk_attach(&sc->sc_dev, &sc->sc_dk);
 
+		sc->sc_flags |= VNF_HAVELABEL;
+		vndgetdisklabel(dev, sc, sc->sc_dk.dk_label, 0);
+
 		disk_unlock(&sc->sc_dk);
 		break;
Re: Race in disk_attach_callback?
On Wed, Jul 13, 2022 at 02:18:53PM +0200, Otto Moerbeek wrote:
> Hi,
>
> after a prompt from stsp@ and florian@, reporting that newfs_msdos
> fails when given an $duid.i argument, I sat down to see what could be
> going on.  My test using a USB stick failed to reproduce the problem.
> Then I started using a vnd, and that shows the issue (once in a
> while).  The feeling is that any disk device created on the fly might
> show this issue.

Just as an additional detail: this also affects softraid volumes, which
is how I ran into the issue.

Installing a laptop with softraid full disk encryption on top of an
EFI/GPT disk can result in a failure to format the MSDOS partition
required for installing the EFI boot loader.  When this happens, the
installer reports that it could not install a boot loader.

A manual invocation of newfs_msdos with the correct /dev/sdxi device,
and a subsequent invocation of installboot, can be used as a workaround.
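For the record, the manual workaround described above might look roughly
like this from the installer shell (a sketch only: the device names sd0/sd1,
the 'i' ESP partition letter, and the /mnt mount point are assumptions that
depend on the actual layout):

```shell
# Assumed layout: sd0 carries the EFI system partition as its MSDOS 'i'
# partition, sd1 is the softraid volume with the new root mounted at /mnt.
newfs_msdos /dev/rsd0i     # format the ESP via its device name, not the DUID
installboot -v -r /mnt sd1 # then retry installing the boot loader
```

Referring to the raw device name sidesteps the DUID lookup that the race
breaks, which is why the workaround succeeds where the installer failed.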
Re: Race in disk_attach_callback?
On Wed, Jul 13, 2022 at 02:18:53PM +0200, Otto Moerbeek wrote:
> Hi,
>
> after a prompt from stsp@ and florian@, reporting that newfs_msdos
> fails when given an $duid.i argument, I sat down to see what could be
> going on.  My test using a USB stick failed to reproduce the problem.
> Then I started using a vnd, and that shows the issue (once in a
> while).  The feeling is that any disk device created on the fly might
> show this issue.
>
> What I have found out so far: sometimes, disk_subr.c:disk_map()
> fails, because the device in question does not have the DKF_LABELVALID
> flag set, so it is skipped in the duid search.
>
> The only place where DKF_LABELVALID is set is in
> disk_attach_callback().
>
> Often the disk_readlabel() call in this function succeeds and the flag
> is set, but also often it does not succeed.
>
> I have observed it failing, setting the message to "cannot read disk
> label, 0xe32/0x2932, error 25", which seems to come from the
> DIOCGPDINFO ENOTTY case in dev/vnd.c.
>
> This is how far I got.  Looking a bit more, I see that the attachment
> is done from a task.  I suppose that if the open happens before the
> task is run, we end up with the flag not set.
>
> I am using the test script below.  In my case, instrumenting
> subr_disk.c with printf made the problem occur less often, so I
> suspect a race between the disk attachment and the label becoming
> available.
>
> The vnconfig triggers the disk attachment message (but only the first
> time a vnd is created, it seems; I am not seeing any messages after a
> vnconfig -u and re-running the script).
>
> === script ===
> set -e
> dd if=/dev/random of=/tmp/image$1 bs=1m count=100
> vnconfig vnd$1 /tmp/image$1
> fdisk -ig vnd$1
> echo "a\ni\n\n\n\nw\nl\nq\n" | disklabel -E vnd$1
> ==============
>
> Try to run newfs_msdos on the duid.i reported.  Note that you need to
> reboot to see a change in behaviour on a specific vnd.  If it starts
> out OK, it keeps on being OK; if it fails, it keeps on failing.  This
> corresponds to my observation that disk_attach_callback() is
> called only once for a specific vnd, even after unconfig and reconfig.
>
> Hope somebody who knows how this all should work can see where the
> issue is.
>
> 	-Otto
>
> Index: subr_disk.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/subr_disk.c,v
> retrieving revision 1.248
> diff -u -p -r1.248 subr_disk.c
> --- subr_disk.c	2 Jan 2022 17:26:14 -	1.248
> +++ subr_disk.c	13 Jul 2022 12:02:05 -
> @@ -1118,8 +1118,10 @@ disk_attach_callback(void *xdat)
>  	/* Read disklabel. */
>  	if (disk_readlabel(&dl, dk->dk_devno, errbuf, sizeof(errbuf)) == NULL) {
>  		enqueue_randomness(dl.d_checksum);
> +		printf("disk_attach_callback %s: setting DKF_LABELVALID\n",
> +		    dk->dk_name);
>  		dk->dk_flags |= DKF_LABELVALID;
>  	}
> +	else printf("disk_attach_callback %s: NOT setting DKF_LABELVALID (%s)\n",
> +	    dk->dk_name, errbuf);
> 
>  done:
>  	dk->dk_flags |= DKF_OPENED;
> @@ -1785,6 +1787,7 @@ disk_map(char *path, char *mappath, int
> 
>  	mdk = NULL;
>  	TAILQ_FOREACH(dk, &disklist, dk_link) {
> +		printf("YYY %p %s %x %p\n", dk, dk->dk_name, dk->dk_flags,
> +		    dk->dk_label);
>  		if ((dk->dk_flags & DKF_LABELVALID) && dk->dk_label &&
>  		    memcmp(dk->dk_label->d_uid, uid,
>  		    sizeof(dk->dk_label->d_uid)) == 0) {
> @@ -1795,8 +1798,10 @@ disk_map(char *path, char *mappath, int
>  		}
>  	}
> 
> -	if (mdk == NULL || mdk->dk_name == NULL)
> +	if (mdk == NULL || mdk->dk_name == NULL) {
> +		printf("XXX %p\n", mdk);
>  		return -1;
> +	}
> 
>  	snprintf(mappath, size, "/dev/%s%s%c",
>  	    (flags & DM_OPENBLCK) ? "" : "r", mdk->dk_name, part);