Re: Race in disk_attach_callback?
On Mon, Aug 29, 2022 at 07:51:24PM +, Miod Vallat wrote:
> > What's the status on this diff?
>
> I'm waiting for review from at least one of the softraid suspects prior
> to putting this in, in case there are further cinematics to address.

Sounds like a good idea.

In the meantime, while converting remaining/forgotten architectures from
the manual disklabel/newfs dance to 'installboot -p', I also tested this
diff wrt. fresh installations on softraid and found no behaviour change.

When installing to a fresh softraid volume, the disklabel UID is all
zeroes.  On macppc, for example, the installer still does

	disklabel $_disk 2>/dev/null | grep -q "^  i:" || disklabel -w -d $_disk
	newfs -t msdos ${_disk}i

which can be replaced with

	installboot -p $_disk

as, for example, arm64 already does.

I only have access to macppc via multi-user SSH, so I can't step through
the installer myself.  However, reconstructing the installer's steps on
softraid on vnd works with the manual disklabel/newfs dance as well as
with 'installboot -p', with and without miod's kernel race fix.

With zeroed images/disks, these tests always start out with an all-zeroed
disklabel UID on the softraid volume at the time fdisk and newfs (directly
or through 'installboot -p') are run.

Again, installation works on softraid with zeroed DUIDs, with and without
miod's diff.
Re: Race in disk_attach_callback?
> What's the status on this diff?

I'm waiting for review from at least one of the softraid suspects prior
to putting this in, in case there are further cinematics to address.
Re: Race in disk_attach_callback?
On Thu, Aug 18, 2022 at 08:29:13AM +, Klemens Nanni wrote:
> On Wed, Aug 17, 2022 at 07:03:50AM +, Miod Vallat wrote:
> > > What is the result if root runs disklabel, and forces it to all zeros?
> >
> > If the root duid is all zeroes, then the only way to refer to the root
> > disk is to use its /dev/{s,w}d* device name, as zero duids are ignored.
>
> I like miod's second diff and it fixes the race for vnd(4).
> I never ran into the issue with softraid(4), but that should not happen
> anymore with it, either.

What's the status on this diff?

The problem still exists and I was reminded of it by the new installboot
regress suffering from it on an arm64 box.

> OK kn
>
> > If you set a zero duid in disklabel(8), setdisklabel() in the kernel
> > will compute a new, non-zero value.
>
> Correct; same for real sd1 and fictitious vnd0.
>
> # disklabel -E sd1
> Label editor (enter '?' for help at any prompt)
>
> sd1> i
> The disklabel UID is currently: c766517084e5e5ce
> duid: []
>
> sd1*> l
> # /dev/rsd1c:
> type: SCSI
> disk: SCSI disk
> label: Block Device
> duid:
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 16
> sectors/cylinder: 1008
> cylinders: 203
> total sectors: 204800
> boundstart: 32832
> boundend: 204767
> drivedata: 0
>
> sd1*> w
>
> sd1> l
> # /dev/rsd1c:
> type: SCSI
> disk: SCSI disk
> label: Block Device
> duid: 9ff85059e4960901
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 16
> sectors/cylinder: 1008
> cylinders: 203
> total sectors: 204800
> boundstart: 32832
> boundend: 204767
> drivedata: 0
>
> sd1>
Re: Race in disk_attach_callback?
On Tue, Aug 16, 2022 at 09:03:55PM +, Miod Vallat wrote:
> > after a prompt from stsp@ and florian@, reporting that newfs_msdos
> > fails when given an $duid.i argument, I sat down to see what could be
> > going on.  My test using a USB stick failed to reproduce the problem.
> > Then I started using a vnd, and that shows the issue (once in a
> > while).  The feeling is that any disk device created on the fly might
> > show this issue.
>
> Moments ago kn@ stumbled upon this as well.
>
> It turns out that, at least in the vnd case, there is indeed a race
> between the scheduling of the task running disk_attach_callback and the
> first access to the vnd device, which will cause the label to get read.
>
> While vnd(4) has logic to read a label on-demand, disk_attach_callback
> assumes it runs at a point where the label can safely be obtained, and
> will not retry (because it sets DKF_OPENED, which never gets unset).
>
> The following shell script will demonstrate the issue:
>
> cat << EOF > test.sh
> #! /bin/sh
> set -e
>
> dd if=/dev/zero of=img.bin bs=1m count=4 >/dev/null
> iter=1
> while [ $iter -lt 1000 ]; do
> 	vnconfig vnd0 img.bin
> 	fdisk -iy vnd0 > /dev/null
> 	disklabel -Aw vnd0 > /dev/null
> 	duid=$(sysctl hw.disknames|sed 's,.*vnd0:,,')
> 	disklabel $duid > /dev/null
> 	vnconfig -u vnd0
> 	iter=$((iter + 1))
> done
> EOF
>
> (don't forget to vnconfig -u vnd0 if it fails).

Here's the improved version of a new regress test I initially wrote to
compare real disks with vnd, and block devices with duids.

I ran into this issue when trying to test installboot(8) changes on
vnd(4) rather than on real hardware first.

The new regress spins up a vnd disk, creates a label and accesses the
disk through both its disklabel DUID and its block device... the former
is broken.

This could easily be adapted or copied into initial installboot(8)
tests, which I wanted to do anyway.
regress/sys/kern/disk seems specific and yet broad enough;  putting it
under lib/libutil/ due to opendev(3) makes little sense since that's
just the common entry point to what is actually a kernel bug, and this
is not really vnd specific either, as we're dealing with a race
condition/design flaw.

Feedback? Objections? OK?

Index: regress/sys/kern/disk/Makefile
===================================================================
RCS file: regress/sys/kern/disk/Makefile
diff -N regress/sys/kern/disk/Makefile
--- /dev/null	1 Jan 1970 00:00:00 -
+++ regress/sys/kern/disk/Makefile	18 Aug 2022 10:06:49 -
@@ -0,0 +1,56 @@
+#	$OpenBSD: $
+
+# anything calling opendev(3) with "sd0a" or "fc2aa4672193310c.a", e.g.
+# disklabel(8), newfs(8) or newfs_msdos(8)
+OPENDEV ?=	/sbin/disklabel
+# write default MBR
+FORMAT ?=	/sbin/fdisk -iy
+# write standard disklabel to get a valid DUID
+NEWLABEL ?=	/sbin/disklabel -Aw
+
+# whether to trace diskmap(4)'s DIOCMAP handling DUIDs/block devices
+DO_TRACE ?=	No
+
+IMGFILE =	disk.img
+DEVFILE =	vnd.txt
+DUIDFILE =	duid.txt
+PART =		a
+
+
+REGRESS_SETUP_ONCE =	format-disk
+
+create-new-disk:
+	dd if=/dev/zero of=${IMGFILE} bs=1m count=0 seek=32
+	${SUDO} vnconfig -- ${IMGFILE} > ${DEVFILE}
+format-disk: create-new-disk
+	${SUDO} ${FORMAT} -- "$$(<${DEVFILE})"
+	${SUDO} ${NEWLABEL} -- "$$(<${DEVFILE})"
+
+
+REGRESS_TARGETS =	opendev-use-block \
+			opendev-use-duid
+
+opendev-use-block:
+	${SUDO} ${OPENDEV} -- "$$(<${DEVFILE})${PART}"
+
+opendev-use-duid:
+	${SUDO} disklabel -- "$$(<${DEVFILE})" | \
+		awk '$$1 == "duid:" { print $$2 }' > ${DUIDFILE}
+.if ${DO_TRACE:L} == "yes"
+	@# always run kdump regardless of newfs exit code
+	sh -uc '${SUDO} ktrace -- ${OPENDEV} -- "$$(<${DUIDFILE}).${PART}" ;\
+		ret=$$? ;\
+		${SUDO} kdump -tcn | grep -FwA2 DIOCMAP ;\
+		exit $$ret'
+.else
+	${SUDO} ${OPENDEV} -- "$$(<${DUIDFILE}).${PART}"
+.endif
+
+
+CLEANFILES =	${IMGFILE} ${DEVFILE} ${DUIDFILE} ktrace.out
+REGRESS_CLEANUP =	cleanup
+
+cleanup:
+	${SUDO} vnconfig -vu -- "$$(<${DEVFILE})"
+
+.include <bsd.regress.mk>
Re: Race in disk_attach_callback?
On Wed, Aug 17, 2022 at 07:03:50AM +, Miod Vallat wrote:
> > What is the result if root runs disklabel, and forces it to all zeros?
>
> If the root duid is all zeroes, then the only way to refer to the root
> disk is to use its /dev/{s,w}d* device name, as zero duids are ignored.

I like miod's second diff and it fixes the race for vnd(4).
I never ran into the issue with softraid(4), but that should not happen
anymore with it, either.

OK kn

> If you set a zero duid in disklabel(8), setdisklabel() in the kernel
> will compute a new, non-zero value.

Correct; same for real sd1 and fictitious vnd0.

# disklabel -E sd1
Label editor (enter '?' for help at any prompt)

sd1> i
The disklabel UID is currently: c766517084e5e5ce
duid: []

sd1*> l
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Block Device
duid:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 203
total sectors: 204800
boundstart: 32832
boundend: 204767
drivedata: 0

sd1*> w

sd1> l
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Block Device
duid: 9ff85059e4960901
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 203
total sectors: 204800
boundstart: 32832
boundend: 204767
drivedata: 0

sd1>
Re: Race in disk_attach_callback?
> What is the result if root runs disklabel, and forces it to all zeros?

If the root duid is all zeroes, then the only way to refer to the root
disk is to use its /dev/{s,w}d* device name, as zero duids are ignored.

If you set a zero duid in disklabel(8), setdisklabel() in the kernel
will compute a new, non-zero value.
Re: Race in disk_attach_callback?
Miod Vallat wrote:
> Come to think further about it, I think it is better for diskmap to
> always trust disk drivers to either:
> - not have any label (dk_label == NULL, or points to zeroed memory)
> or
> - have a valid label (duid is not zeroes).

What is the result if root runs disklabel, and forces it to all zeros?
(You are already running this diff, so you can test faster.)

Zero-duid setups still exist.  I am sure I encountered one in the last
year.  So I hesitate about this idea of using it like a flag.
Re: Race in disk_attach_callback?
Come to think further about it, I think it is better for diskmap to
always trust disk drivers to either:
- not have any label (dk_label == NULL, or points to zeroed memory)
or
- have a valid label (duid is not zeroes).

The following diff thus relaxes the logic to always trust
dk_label->d_uid, unless it is zero.

This passes the vnd test I mailed yesterday, without the need for a
dev/vnd.c change.

Index: sys/dev/softraid.c
===================================================================
RCS file: /OpenBSD/src/sys/dev/softraid.c,v
retrieving revision 1.425
diff -u -p -u -p -r1.425 softraid.c
--- sys/dev/softraid.c	16 Apr 2022 19:19:58 -	1.425
+++ sys/dev/softraid.c	17 Aug 2022 05:20:51 -
@@ -3685,13 +3685,11 @@ sr_ioctl_installboot(struct sr_softc *sc
 		}
 	}
 
-	bzero(duid, sizeof(duid));
 	TAILQ_FOREACH(dk, &disklist, dk_link)
 		if (!strncmp(dk->dk_name, bb->bb_dev,
 		    sizeof(bb->bb_dev)))
 			break;
 	if (dk == NULL || dk->dk_label == NULL ||
-	    (dk->dk_flags & DKF_LABELVALID) == 0 ||
-	    bcmp(dk->dk_label->d_uid, &duid, sizeof(duid)) == 0) {
+	    duid_iszero(dk->dk_label->d_uid)) {
 		sr_error(sc, "failed to get DUID for softraid volume");
 		goto done;
 	}
Index: sys/kern/subr_disk.c
===================================================================
RCS file: /OpenBSD/src/sys/kern/subr_disk.c,v
retrieving revision 1.253
diff -u -p -u -p -r1.253 subr_disk.c
--- sys/kern/subr_disk.c	14 Aug 2022 01:58:27 -	1.253
+++ sys/kern/subr_disk.c	17 Aug 2022 05:20:51 -
@@ -1121,7 +1121,6 @@ disk_attach_callback(void *xdat)
 	/* Read disklabel. */
 	if (disk_readlabel(&dl, dk->dk_devno, errbuf, sizeof(errbuf)) == NULL) {
 		enqueue_randomness(dl.d_checksum);
-		dk->dk_flags |= DKF_LABELVALID;
 	}
 
 done:
@@ -1440,14 +1439,14 @@ setroot(struct device *bootdv, int part,
 		TAILQ_FOREACH(dk, &disklist, dk_link)
 			if (dk->dk_device == bootdv)
 				break;
-		if (dk && (dk->dk_flags & DKF_LABELVALID))
+		if (dk)
 			bcopy(dk->dk_label->d_uid, bootduid, sizeof(bootduid));
 	} else if (bootdv == NULL) {
 		/* Locate boot disk based on the provided DUID. */
 		TAILQ_FOREACH(dk, &disklist, dk_link)
 			if (duid_equal(dk->dk_label->d_uid, bootduid))
 				break;
-		if (dk && (dk->dk_flags & DKF_LABELVALID))
+		if (dk)
 			bootdv = dk->dk_device;
 	}
 	bcopy(bootduid, rootduid, sizeof(rootduid));
@@ -1561,8 +1560,7 @@ gotswap:
 	if (bootdv->dv_class == DV_DISK) {
 		if (!duid_iszero(rootduid)) {
 			TAILQ_FOREACH(dk, &disklist, dk_link)
-				if ((dk->dk_flags & DKF_LABELVALID) &&
-				    dk->dk_label && duid_equal(
+				if (dk->dk_label && duid_equal(
 				    dk->dk_label->d_uid, rootduid))
 					break;
 			if (dk == NULL)
@@ -1788,7 +1786,8 @@ disk_map(char *path, char *mappath, int
 
 	mdk = NULL;
 	TAILQ_FOREACH(dk, &disklist, dk_link) {
-		if ((dk->dk_flags & DKF_LABELVALID) && dk->dk_label &&
+		if (dk->dk_label &&
+		    !duid_iszero(dk->dk_label->d_uid) &&
 		    memcmp(dk->dk_label->d_uid, uid,
 		    sizeof(dk->dk_label->d_uid)) == 0) {
 			/* Fail if there are duplicate UIDs! */
Index: sys/sys/disk.h
===================================================================
RCS file: /OpenBSD/src/sys/sys/disk.h,v
retrieving revision 1.36
diff -u -p -u -p -r1.36 disk.h
--- sys/sys/disk.h	4 May 2017 22:47:27 -	1.36
+++ sys/sys/disk.h	17 Aug 2022 05:20:51 -
@@ -83,7 +83,6 @@ struct disk {
 #define DKF_CONSTRUCTED	0x0001
 #define DKF_OPENED	0x0002
 #define DKF_NOLABELREAD	0x0004
-#define DKF_LABELVALID	0x0008
 
 /*
  * Metrics data; note that some metrics may have no meaning
Re: Race in disk_attach_callback?
> after a prompt from stsp@ and florian@, reporting that newfs_msdos
> fails when given an $duid.i argument, I sat down to see what could be
> going on.  My test using a USB stick failed to reproduce the problem.
> Then I started using a vnd, and that shows the issue (once in a
> while).  The feeling is that any disk device created on the fly might
> show this issue.

Moments ago kn@ stumbled upon this as well.

It turns out that, at least in the vnd case, there is indeed a race
between the scheduling of the task running disk_attach_callback and the
first access to the vnd device, which will cause the label to get read.

While vnd(4) has logic to read a label on-demand, disk_attach_callback
assumes it runs at a point where the label can safely be obtained, and
will not retry (because it sets DKF_OPENED, which never gets unset).

The following shell script will demonstrate the issue:

cat << EOF > test.sh
#! /bin/sh
set -e

dd if=/dev/zero of=img.bin bs=1m count=4 >/dev/null
iter=1
while [ $iter -lt 1000 ]; do
	vnconfig vnd0 img.bin
	fdisk -iy vnd0 > /dev/null
	disklabel -Aw vnd0 > /dev/null
	duid=$(sysctl hw.disknames|sed 's,.*vnd0:,,')
	disklabel $duid > /dev/null
	vnconfig -u vnd0
	iter=$((iter + 1))
done
EOF

(don't forget to vnconfig -u vnd0 if it fails).

The following diff makes vnd attempt to read a label as soon as the
VNDIOCSET ioctl is issued.  However it will not fix softraid if it has a
similar problem, so a rework of the disk_attach operation might be a
better idea.

Index: vnd.c
===================================================================
RCS file: /OpenBSD/src/sys/dev/vnd.c,v
retrieving revision 1.177
diff -u -p -r1.177 vnd.c
--- vnd.c	23 Dec 2021 10:09:16 -	1.177
+++ vnd.c	16 Aug 2022 21:02:23 -
@@ -536,6 +536,9 @@ fail:
 		sc->sc_dk.dk_name = sc->sc_dev.dv_xname;
 		disk_attach(&sc->sc_dev, &sc->sc_dk);
 
+		sc->sc_flags |= VNF_HAVELABEL;
+		vndgetdisklabel(dev, sc, sc->sc_dk.dk_label, 0);
+
 		disk_unlock(&sc->sc_dk);
 		break;
Re: Race in disk_attach_callback?
On Wed, Jul 13, 2022 at 02:18:53PM +0200, Otto Moerbeek wrote:
> Hi,
>
> after a prompt from stsp@ and florian@, reporting that newfs_msdos
> fails when given an $duid.i argument, I sat down to see what could be
> going on.  My test using a USB stick failed to reproduce the problem.
> Then I started using a vnd, and that shows the issue (once in a
> while).  The feeling is that any disk device created on the fly might
> show this issue.

Just as an additional detail: this also affects softraid volumes, which
is how I ran into the issue.

Installing a laptop with softraid full disk encryption on top of an
EFI/GPT disk can result in a failure to format the MSDOS partition
required for installing the EFI boot loader.  When this happens, the
installer reports that it could not install a boot loader.

A manual invocation of newfs_msdos with the correct /dev/sdxi device,
and a subsequent invocation of installboot, can be used as a workaround.
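For the record, the manual workaround described above might look roughly
like this from the installer shell (a sketch only: the device names sd0/sd1,
the 'i' ESP partition letter, and the /mnt mount point are assumptions that
depend on the actual layout):

```shell
# Assumed layout: sd0 carries the EFI system partition as its MSDOS 'i'
# partition, sd1 is the softraid volume with the new root mounted at /mnt.
newfs_msdos /dev/rsd0i     # format the ESP via its device name, not the DUID
installboot -v -r /mnt sd1 # then retry installing the boot loader
```

Referring to the raw device name sidesteps the DUID lookup that the race
breaks, which is why the workaround succeeds where the installer failed.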
Re: Race in disk_attach_callback?
On Wed, Jul 13, 2022 at 02:18:53PM +0200, Otto Moerbeek wrote:
> Hi,
>
> after a prompt from stsp@ and florian@, reporting that newfs_msdos
> fails when given an $duid.i argument, I sat down to see what could be
> going on.  My test using a USB stick failed to reproduce the problem.
> Then I started using a vnd, and that shows the issue (once in a
> while).  The feeling is that any disk device created on the fly might
> show this issue.
>
> What I have found out so far: sometimes, disk_subr.c:disk_map()
> fails, because the device in question does not have the DKF_LABELVALID
> flag set, so it is skipped in the duid search.
>
> The only place where DKF_LABELVALID is set is in
> disk_attach_callback().
>
> Often the disk_readlabel() call in this function succeeds and the flag
> is set, but also often it does not succeed.
>
> I have observed it failing, setting the message to "cannot read disk
> label, 0xe32/0x2932, error 25", which seems to come from the
> DIOCGPDINFO ENOTTY case in dev/vnd.c.
>
> This is how far I got.  Looking a bit more, I see that the attachment
> is done from a task.  I suppose that if the open happens before the
> task is run, we end up with the flag not set.
>
> I am using the test script below.  In my case, instrumenting
> subr_disk.c with printf made the problem occur less often, so I
> suspect a race between the disk attachment and the label becoming
> available.
>
> The vnconfig triggers the disk attachment message (but only the first
> time a vnd is created, it seems; I am not seeing any messages after a
> vnconfig -u and re-running the script).
>
> === script ===
> set -e
> dd if=/dev/random of=/tmp/image$1 bs=1m count=100
> vnconfig vnd$1 /tmp/image$1
> fdisk -ig vnd$1
> echo "a\ni\n\n\n\nw\nl\nq\n" | disklabel -E vnd$1
> ==============
>
> Try to run newfs_msdos on the duid.i reported.  Note that you need to
> reboot to see a change in behaviour on a specific vnd.  If it starts
> out OK, it keeps on being OK; if it fails, it keeps on failing.  This
> corresponds to my observation that disk_attach_callback() is
> called only once for a specific vnd, even after unconfig and reconfig.
>
> Hope somebody who knows how this all should work can see where the
> issue is.
>
> 	-Otto
>
> Index: subr_disk.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/subr_disk.c,v
> retrieving revision 1.248
> diff -u -p -r1.248 subr_disk.c
> --- subr_disk.c	2 Jan 2022 17:26:14 -	1.248
> +++ subr_disk.c	13 Jul 2022 12:02:05 -
> @@ -1118,8 +1118,10 @@ disk_attach_callback(void *xdat)
>  	/* Read disklabel. */
>  	if (disk_readlabel(&dl, dk->dk_devno, errbuf, sizeof(errbuf)) == NULL) {
>  		enqueue_randomness(dl.d_checksum);
> +		printf("disk_attach_callback %s: setting DKF_LABELVALID\n",
> +		    dk->dk_name);
>  		dk->dk_flags |= DKF_LABELVALID;
>  	}
> +	else printf("disk_attach_callback %s: NOT setting DKF_LABELVALID (%s)\n",
> +	    dk->dk_name, errbuf);
> 
>  done:
>  	dk->dk_flags |= DKF_OPENED;
> @@ -1785,6 +1787,7 @@ disk_map(char *path, char *mappath, int
> 
>  	mdk = NULL;
>  	TAILQ_FOREACH(dk, &disklist, dk_link) {
> +		printf("YYY %p %s %x %p\n", dk, dk->dk_name, dk->dk_flags,
> +		    dk->dk_label);
>  		if ((dk->dk_flags & DKF_LABELVALID) && dk->dk_label &&
>  		    memcmp(dk->dk_label->d_uid, uid,
>  		    sizeof(dk->dk_label->d_uid)) == 0) {
> @@ -1795,8 +1798,10 @@ disk_map(char *path, char *mappath, int
>  		}
>  	}
> 
> -	if (mdk == NULL || mdk->dk_name == NULL)
> +	if (mdk == NULL || mdk->dk_name == NULL) {
> +		printf("XXX %p\n", mdk);
>  		return -1;
> +	}
> 
>  	snprintf(mappath, size, "/dev/%s%s%c",
>  	    (flags & DM_OPENBLCK) ? "" : "r", mdk->dk_name, part);