URL: <https://savannah.gnu.org/bugs/?58555>
Summary: grub-probe not identifying ZFS root filesystem due to unsupported features
Project: GNU GRUB
Submitted by: i336_
Submitted on: Sat 13 Jun 2020 08:33:14 AM UTC
Category: Filesystem
Severity: Major
Priority: 5 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Release:
Release: Git master
Discussion Lock: Any
Reproducibility: Every Time
Planned Release: None

_______________________________________________________

Details:

Hi all,

I've just set up a simple mirrored ZFS-on-root Debian configuration with /boot on a FAT32 partition. I'm temporarily using BIOS booting with grub-pc and a 1MB grub_boot partition, and plan to move the pool/disks to an EFI system later.

Attempting to configure GRUB for this scenario produced a /boot/grub/grub.cfg with entries like

    linux /vmlinuz-5.5.0-0.bpo.2-amd64 root=ZFS=/debian ro

which completely failed to boot due to the missing pool name.

Investigation revealed this was because grub-probe, called in /etc/grub.d/10_linux to identify the ZFS root pool name (http://git.savannah.gnu.org/cgit/grub.git/tree/util/grub.d/10_linux.in?id=6a34fdb76a07305b95e31659bc27b1d190101cbf#n76), bailed out with "grub-probe: error: unknown filesystem".

I had incidentally configured multiple zpools in this setup, and idle curiosity revealed that grub-probe was able to detect one of the auxiliary pools:

    # grub-probe -t fs_label -d /dev/sda3
    grub-probe: error: unknown filesystem.
    # grub-probe -t fs_label -d /dev/sda4
    pool-1

Enabling verbose output revealed that zfs.c was bailing due to unsupported pool features:

    grub-core/fs/zfs/zfs.c:2115: zap: name = org.illumos:lz4_compress, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.joyent:multi_vdev_crash_dump, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:hole_birth, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:extensible_dataset, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:embedded_data, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.open-zfs:large_blocks, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.zfsonlinux:large_dnode, value = 1, cd = 0
    grub-core/kern/fs.c:78: zfs detection failed.

versus:

    grub-core/fs/zfs/zfs.c:2115: zap: name = org.illumos:lz4_compress, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.joyent:multi_vdev_crash_dump, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:hole_birth, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:extensible_dataset, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:embedded_data, value = 1, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.open-zfs:large_blocks, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.zfsonlinux:large_dnode, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.illumos:sha512, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.illumos:skein, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = org.illumos:edonr, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.datto:bookmark_v2, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.datto:encryption, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = com.delphix:device_removal, value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = , value = 0, cd = 0
    grub-core/fs/zfs/zfs.c:2115: zap: name = , value = 0, cd = 0
    pool-1
    grub-core/kern/disk.c:295: Closing `hostdisk//dev/vda'.

I did a bit of investigation and traced that check_mos_features() calls mzap_iterate() with the check_feature() callback, which grub_strcmp()s each feature against a whitelist of known names (http://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/fs/zfs/zfs.c?id=6a34fdb76a07305b95e31659bc27b1d190101cbf#n285), and (back in mzap_iterate()) bails if any detected feature that is not known has a value of 1. Hence my conclusion above.

Incidentally, the pools were created with identical parameters:

    zpool create -o ashift=12 -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O xattr=sa -O relatime=on -O dnodesize=auto -o cachefile=none \
        tank mirror /dev/disk/by-partlabel/... /dev/disk/by-partlabel/...

    zpool create -o ashift=12 -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O xattr=sa -O relatime=on -O dnodesize=auto -o cachefile=none \
        pool-1 /dev/disk/by-partlabel/...

    zpool create -o ashift=12 -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O xattr=sa -O relatime=on -O dnodesize=auto -o cachefile=none \
        pool-2 /dev/disk/by-partlabel/...

(The first is a mirror across two disks, while the other two are non-redundant, discrete pools on each disk that will provide sha256 checksumming.)

As a newcomer to ZFS, I _think_ the root pool automatically enabled the large_dnode feature during installation because I have dnodesize=auto, but I may be incorrect. (pool-1 has not been mounted or touched yet; I'm still setting everything up.)

There are a couple of conclusions I draw. As a comparatively minor niggle, grub-probe's exit status of 1 is not caught in 10_linux. The bigger problem, as I understand it, is this: GRUB's ZFS implementation cannot handle all ZFS features, which is why the root-pool/boot-pool split exists in the first place, and yet GRUB calls grub-probe to supply Linux's root=ZFS=... parameter.
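Putting those two observations together, here is a minimal, illustrative shell model of the failure mode. This is not GRUB source: probe_pool and KNOWN are hypothetical names, the whitelist is only a plausible subset of the spa_feature_names table in grub-core/fs/zfs/zfs.c, and the feature values are the ones from the debug output above.

```shell
#!/bin/sh
# Illustrative model only -- not actual GRUB code.
# Hypothetical subset of the feature names GRUB's zfs.c knows about:
KNOWN="org.illumos:lz4_compress com.delphix:hole_birth \
com.delphix:embedded_data com.delphix:extensible_dataset"

# check_feature()/mzap_iterate() in miniature: fail on any feature that
# is enabled (value != 0) but absent from the known list; otherwise
# report the pool label (hard-coded here for illustration).
probe_pool() {
    for pair in "$@"; do
        name=${pair%%=*} value=${pair#*=}
        [ "$value" = 0 ] && continue
        case " $KNOWN " in
            *" $name "*) ;;   # supported feature, keep scanning
            *) echo "error: unknown filesystem." >&2; return 1 ;;
        esac
    done
    echo "tank"               # fs_label result on success
}

# Root pool from this report: large_dnode=1 is not in the known list,
# so detection fails...
rpool=$(probe_pool org.illumos:lz4_compress=1 \
                   com.delphix:extensible_dataset=1 \
                   org.zfsonlinux:large_dnode=1 2>/dev/null || true)

# ...and a 10_linux-style unchecked substitution then yields a root=
# parameter with a silently empty pool name:
echo "root=ZFS=${rpool}/debian"
```

Under these assumptions the script prints `root=ZFS=/debian`: the probe's failure status is discarded, so the broken kernel parameter is emitted without complaint.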
That grub-probe, in turn, uses GRUB's known-limited zfs.c implementation to examine the root pool, which is exactly the pool most likely to have features enabled that GRUB does not support.

Perhaps I'm missing something here. I hope so...

Me: _looks at every working GRUB 2 ZFS-on-root setup everywhere_
Me: _looks at the above, which says that everything should be catastrophically broken_
Me: _explodes_

Let me know what additional information I can provide. This is an effectively blank system, so you can reply with SSH pubkeys if that would be helpful (the only caveat being 250ms RTT to .com.au).

Thanks in advance,
David Lindsay

NB. The non-head refs in the links point to the current Git master, for posterity.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58555>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/