Neil Brown wrote:

Yes, but it should not be needed, and I'd like to understand why it
is.
One of the last things do_md_run does is
   mddev->changed = 1;

When you next open /dev/md_d0, md_open is called which calls
check_disk_change().
This will call into md_fops->md_media_changed which will return the
value of mddev->changed, which will be '1'.
So check_disk_change will then call md_fops->revalidate_disk which
will set mddev->changed to 0, and will then set bd_invalidated to 1
(as bd_disk->minors > 1 (being 64)).

md_open will then return into do_open (in fs/block_dev.c) and because
bd_invalidated is true, it will call rescan_partitions and the
partitions will appear.
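
The sequence described above can be sketched as a small user-space model. This is not kernel code: the struct and the model_* functions are hypothetical stand-ins named after the fields and calls mentioned in the message, and the clearing of bd_invalidated after the rescan is an assumption about what rescan_partitions does.

```c
#include <stdbool.h>

/* Toy model of the state discussed in the thread (hypothetical type). */
struct model_dev {
    bool changed;        /* stands in for mddev->changed */
    bool bd_invalidated; /* stands in for bdev->bd_invalidated */
    int  minors;         /* stands in for bd_disk->minors */
    int  rescans;        /* counts simulated rescan_partitions calls */
};

/* Last step of do_md_run: mark the array as changed. */
static void model_run(struct model_dev *d)
{
    d->changed = true;
}

/* check_disk_change: ask media_changed, then revalidate and invalidate. */
static void model_check_disk_change(struct model_dev *d)
{
    if (!d->changed)            /* media_changed reports mddev->changed */
        return;
    d->changed = false;         /* revalidate_disk resets the flag */
    if (d->minors > 1)          /* only partitionable devices */
        d->bd_invalidated = true;
}

/* md_open followed by the tail of do_open in fs/block_dev.c. */
static void model_open(struct model_dev *d)
{
    model_check_disk_change(d);
    if (d->bd_invalidated) {
        d->bd_invalidated = false;  /* assumed: the rescan clears the flag */
        d->rescans++;               /* partitions appear here */
    }
}
```

In this model, calling model_run and then model_open twice on a device with minors = 64 rescans once, on the first open only, which matches the behaviour described: the partitions show up only once something opens the device after the array is started.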

Yuck. The md stack should populate the partition information on device creation *without* needing someone to open the resulting device. That you can tweak mdadm to open the device after creation is fine, but unless no other program is allowed to use the ioctls to start devices, and unless this is a documented part of the API, waiting until second open to populate the device info is just flat wrong. It breaks all sorts of expectations people have regarding things like mount by label, etc.

Hmmm... there is room for a race there.  If some other process opens
/dev/md_d0 before mdadm gets to close it, it will call
rescan_partitions before first calling bd_set_size to update the size
of the bdev.  So when we try to read the partition table, it will
appear to be reading past the EOF, and will not actually read
anything.
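
A rough model of that race, again as a user-space sketch under stated assumptions rather than kernel code: the race_* names are hypothetical, check_disk_change and the rescan are collapsed into one step, and array_size is assumed to be in 1 KiB blocks so that shifting left by one gives 512-byte sectors. The apply_fix flag stands for calling bd_set_size in md_open before check_disk_change.

```c
#include <stdbool.h>

/* Toy model of the device state involved in the race (hypothetical type). */
struct race_dev {
    bool changed;                     /* stands in for mddev->changed */
    unsigned long long array_size;    /* assumed unit: 1 KiB blocks */
    unsigned long long bdev_sectors;  /* size the bdev reports; 0 until set */
    bool partitions_found;
};

/* Stand-in for bd_set_size(bdev, mddev->array_size << 1). */
static void race_set_size(struct race_dev *d)
{
    d->bdev_sectors = d->array_size << 1;
}

/* Stand-in for rescan_partitions: a read beyond the bdev size finds nothing. */
static void race_rescan(struct race_dev *d)
{
    d->partitions_found = d->bdev_sectors > 0;
}

/* Simplified md_open: optional size update, then change check and rescan. */
static void race_open(struct race_dev *d, bool apply_fix)
{
    if (apply_fix && d->changed)
        race_set_size(d);
    if (d->changed) {
        d->changed = false;   /* the flag is consumed by this open */
        race_rescan(d);
    }
}

/* Another process opens the device before the size has been updated. */
static bool race_scenario(bool apply_fix)
{
    struct race_dev d = { .changed = true, .array_size = 1024 };
    race_open(&d, apply_fix);     /* the badly timed open, e.g. from udev */
    race_set_size(&d);            /* size updated later, too late to help */
    race_open(&d, apply_fix);     /* changed is already 0: no second rescan */
    return d.partitions_found;
}
```

Under these assumptions race_scenario(false) reports no partitions, because the only rescan ran while the bdev size was still zero, while race_scenario(true) finds them.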

I guess udev must be opening the block device at exactly the wrong
time.
I can simulate this by holding /dev/md_d0 open while assembling the
array.  If I do that, the partitions don't get created.
Yuck.

Maybe I could call bd_set_size in md_open before calling
check_disk_change...

Yep, this patch seems to fix it.  Could you confirm?

Thanks,

NeilBrown

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c       2007-04-17 11:42:15.000000000 +1000
+++ ./drivers/md/md.c   2007-04-24 21:29:51.000000000 +1000
@@ -4485,6 +4485,8 @@ static int md_open(struct inode *inode,
        mddev_get(mddev);
        mddev_unlock(mddev);
+       if (mddev->changed)
+               bd_set_size(inode->i_bdev, mddev->array_size << 1);
        check_disk_change(inode->i_bdev);
  out:
        return err;

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Doug Ledford <[EMAIL PROTECTED]>
http://people.redhat.com/dledford

Infiniband specific RPMs can be found at
http://people.redhat.com/dledford/Infiniband