On 2015-06-13 11:35, Anand Jain wrote:
>
> Thanks for your replies, Andrei and Goffredo. More below...
>
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
>> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>>> On Fri, 12 Jun 2015 21:16:30 +0800, Anand Jain
>>> <anand.j...@oracle.com> wrote:
>>>
>>>> BTRFS_IOC_DEVICES_READY is meant to check whether all the required
>>>> devices are known to the btrfs kernel, so that the admin or a
>>>> system application can mount the FS. The check is made against the
>>>> device given as the argument.
>>>>
>>>> However, the actual implementation does a bit more than that: it
>>>> also scans and registers the device provided in the argument (the
>>>> same as the 'btrfs device scan' subcommand or the BTRFS_IOC_SCAN_DEV
>>>> ioctl).
>>>>
>>>> So the BTRFS_IOC_DEVICES_READY ioctl isn't a read-only (view-only)
>>>> ioctl; it's a write command as well.
>>>>
>>>> Next, in the kernel we only check whether total_devices (read from
>>>> the SB) equals num_devices (counted in the list) to report the
>>>> status as 0 (ready) or 1 (not ready). But this does not work for
>>>> the other device pool states, such as missing, seeding and
>>>> replacing: in these states total_devices is not equal to
>>>> num_devices even though the device pool is ready to be mounted.
>>>> That is a bug, but it is not part of this discussion.
>>>>
>>>> Questions:
>>>>
>>>> - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
>>>> register the device provided (the same as the 'btrfs device scan'
>>>> command or the BTRFS_IOC_SCAN_DEV ioctl), OR can
>>>> BTRFS_IOC_DEVICES_READY be a read-only ioctl interface that checks
>>>> the state of the device pool?
>>>>
>>>
>>> udev uses it to incrementally assemble multi-device btrfs, so in
>>> this case I think it should.
>
> Nice. Thanks for letting me know this.
>
>> I agree, the ioctl name is confusing, but unfortunately this is an
>> API and it has to stay here forever. Udev uses it, so we know for
>> sure that it is widely used.
>
> OK.
> What goes in stays there forever. It's time to update the man page
> instead.
>
>>> Are there any other users?
>>>
>>>> - If the device in the argument is already mounted, can it return
>>>> 0 (ready) straight away? (As of now it would again independently
>>>> read the SB, determine total_devices and check it against
>>>> num_devices.)
>>>>
>>>
>>> I think yes; the obvious use case is btrfs mounted in the initrd and
>>> later coldplug. There is no point in waiting for anything, as the
>>> filesystem is obviously there.
>>>
>
> There is a subtle difference. Suppose the device is already mounted
> and there are two device paths, PA and PB, for the same device.
> Whichever path was last given to either 'btrfs dev scan'
> (BTRFS_IOC_SCAN_DEV) or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY)
> will be shown in the 'btrfs filesystem show' or /proc/self/mounts
> output. This does not mean that the btrfs kernel closes the first
> device path and reopens the second one; it just updates the device
> path it has recorded.
>
> The problem gets worse in this example: use dd to copy device A to
> device B, then mount device A. Just by passing device B to either of
> the two commands above, you make the kernel update the device path.
> All the IO (since the device is mounted) still goes to device A (not
> B), but /proc/self/mounts and 'btrfs fi show' report device B (not A).
>
> It's a bug, and very tricky to fix.
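To make the interface under discussion concrete, here is a minimal userspace sketch of issuing BTRFS_IOC_DEVICES_READY, similar in spirit to what udev's btrfs builtin does. It assumes the kernel uapi header <linux/btrfs.h> and the /dev/btrfs-control node; the helper names fill_vol_args() and btrfs_devices_ready() are hypothetical, not part of btrfs-progs or udev. It also shows the side effect Anand describes: the kernel scans and registers the path it is given.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/* Hypothetical helper: copy a device path into the ioctl argument.
 * Returns 0 on success, -1 if the path does not fit in args->name. */
static int fill_vol_args(struct btrfs_ioctl_vol_args *args, const char *path)
{
    size_t len;

    assert(args != NULL && path != NULL);
    memset(args, 0, sizeof(*args));
    len = strlen(path);
    if (len > BTRFS_PATH_NAME_MAX)
        return -1;
    memcpy(args->name, path, len + 1);  /* includes the terminating NUL */
    return 0;
}

/*
 * Hypothetical helper: ask the kernel whether every device of the
 * filesystem that `path` belongs to has been registered.
 * Side effect: the kernel also scans and registers `path` itself,
 * just like BTRFS_IOC_SCAN_DEV, so this is not a read-only query.
 *
 * Returns 0 if ready, 1 if not ready, -1 on error (errno is set).
 */
int btrfs_devices_ready(const char *path)
{
    struct btrfs_ioctl_vol_args args;
    int fd, ret, saved_errno;

    if (fill_vol_args(&args, path) < 0) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = open("/dev/btrfs-control", O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, BTRFS_IOC_DEVICES_READY, &args);
    saved_errno = errno;  /* keep close() from clobbering the ioctl's errno */
    close(fd);
    errno = saved_errno;
    if (ret < 0)
        return -1;
    return ret == 0 ? 0 : 1;
}
```

A caller would treat 0 as "all devices present", 1 as "not yet", and -1 as an error to inspect through errno. Because the same call registers the path, running it against a second path to an already-mounted device is what triggers the path update described above.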
In the past [*] I proposed a mount.btrfs helper; I tried to move the
logic outside the kernel. I think the problem is that we try to manage
all these cases from the device point of view: when a device appears,
we register it and try to mount the filesystem... This works very well
for a single-volume filesystem. For the other cases there is a mess
between the different layers:
- kernel
- udev/systemd
- initrd logic

My attempt followed a different idea: the mount helper waits for the
devices if needed or, where appropriate, mounts the filesystem in
degraded mode. All devices are passed as mount arguments
(--device=/dev/sdX) and there is no device registration at all; this
avoids all these problems.

[*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767

Back to your questions:

> - we can't return -EBUSY for subsequent (after mount) calls of the
> above two ioctls (if a mounted device is used as an argument), since
> the admin/system application might actually call them again to mount
> subvols.

I am not sure that the two things are related: mount doesn't use
BTRFS_IOC_DEVICES_READY. After BTRFS_IOC_DEVICES_READY returns OK, all
the filesystems belonging to this FSID should be mounted, but that is
the job of systemd/initramfs/sysv... A further failed
BTRFS_IOC_DEVICES_READY shouldn't cause any problem...

>
> - we can return success (without updating the device path), but we
> would be wrong when device A is copied into device B using dd, since
> we would check against the on-device SB's fsid/uuid/devid. Comparing
> the device paths with strcmp is not practical, since there can be
> different paths to the same device (let's say mapper).
>
> (Any suggestion on how to check in the kernel whether it's the same
> device?)

Check minor/major?

>
> - Also, if we don't allow updating the device path after the device
> is mounted, is there a chance that we would be stuck with the device
> path used during initrd, which does not make any sense to the user?
>
>>>> - What should be the expected return when the FS is mounted and
>>>> there is a missing device?
>>
>> I suggest not investing further energy in an ioctl API. If you want
>> this kind of information, you (we) should export it in sysfs.
>> In an ideal world:
>>
>> - a new btrfs device appears
>> - udev registers it with BTRFS_IOC_SCAN_DEV
>> - udev (or mount?) checks the status of the filesystem by reading
>> the sysfs entries (total devices, present devices, seed devices,
>> raid level...); on the basis of the local policy (allow degraded
>> mount, device timeout, how many devices are missing, filesystem
>> redundancy level...) udev (mount) may mount the filesystem with the
>> appropriate parameters (ro, degraded) or even insert a spare device
>> to replace a missing one...
>
> Yes. A sysfs interface is coming; a few framework patches were sent
> some time back, and any comments will help. On the ioctl part I am
> trying to fix the bug(s).
>
>>>
>>> This is similar to the problem mdadm had to solve. mdadm starts a
>>> timer as soon as enough raid devices are present; if the timer
>>> expires before the raid is complete, the raid is started in degraded
>>> mode. This avoids spurious rebuilds. So it would be good if btrfs
>>> could distinguish between "enough devices to mount" and "all
>>> devices".
>
>> These are two different things: how to export the filesystem
>> information (I am still convinced that it has to be exported via
>> sysfs), and what the system has to do in case of... (a missing
>> device?). The latter is a policy, and I think it should not live in
>> the kernel.
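The split argued for here (the kernel exports state, userspace applies policy), combined with the mdadm-style timer Andrei describes, might look roughly like the sketch below. The struct fields stand in for values a helper could read from the proposed sysfs entries; their names, the tolerated_missing count (e.g. 1 for raid1) and the decide_mount() function are all hypothetical.

```c
#include <assert.h>

/* Hypothetical outcomes of a userspace mount policy. */
enum mount_decision {
    DECIDE_WAIT,           /* keep waiting for more devices */
    DECIDE_MOUNT,          /* all devices present: plain mount */
    DECIDE_MOUNT_DEGRADED  /* timer expired with enough devices: -o degraded */
};

/* Hypothetical filesystem state, e.g. as read from sysfs. */
struct fs_state {
    unsigned total_devices;     /* devices the superblock expects */
    unsigned present_devices;   /* devices registered so far */
    unsigned tolerated_missing; /* devices that may be absent, from the raid profile */
};

/* Hypothetical local policy. */
struct mount_policy {
    int allow_degraded;  /* nonzero if a degraded mount is acceptable */
    unsigned timeout_s;  /* grace period once enough devices are present */
};

/*
 * secs_since_enough: seconds since enough devices for a degraded mount
 * became present (the caller starts this timer, as mdadm does).
 */
enum mount_decision decide_mount(const struct fs_state *s,
                                 const struct mount_policy *p,
                                 unsigned secs_since_enough)
{
    unsigned missing;

    assert(s->present_devices <= s->total_devices);
    missing = s->total_devices - s->present_devices;

    if (missing == 0)
        return DECIDE_MOUNT;
    if (!p->allow_degraded || missing > s->tolerated_missing)
        return DECIDE_WAIT;        /* degraded mount not possible or not allowed */
    if (secs_since_enough < p->timeout_s)
        return DECIDE_WAIT;        /* mountable, but give late devices a chance */
    return DECIDE_MOUNT_DEGRADED;  /* avoids waiting forever for a dead disk */
}
```

Keeping this decision in userspace means the timeout and the degraded-mount choice can differ between an initrd and a running system without any kernel change.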
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in the body of a message to majord...@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5