On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote: > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote: > > Hello Liu Bo, > > > > We have to fix the following check in check_super() as well, > > > > if (btrfs_super_stripesize(sb) != 4096) { > > error("invalid stripesize %u", btrfs_super_stripesize(sb)); > > goto error_out; > > } > > > > i.e. btrfs_super_stripesize(sb) must be equal to > > btrfs_super_sectorsize(sb). > > > > However in btrfs-progs (mkfs.c to be precise) since we had stripesize > > hardcoded to 4096, setting stripesize to the value of sectorsize in > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for > > devices with existing Btrfs filesystem instances, > > > > NOTE: Assume we have changed the stripesize validation in btrfs-progs' > > check_super() to, > > > > if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) { > > error("invalid stripesize %u", btrfs_super_stripesize(sb)); > > goto error_out; > > } > > > > > > main() > > for each device file passed as an argument, > > test_dev_for_mkfs() > > check_mounted > > check_mounted_where > > btrfs_scan_one_device > > btrfs_read_dev_super > > check_super() call will fail for existing filesystems which > > have stripesize set to 4k. All existing filesystem instances will fall into > > this category. > > > > This error value is pushed up the call stack and this causes the device to > > not get added to the fs_devices_mnt list in check_mounted_where(). Hence we > > would fail to correctly check the mount status of the multi-device btrfs > > filesystems. > > > > > We can end up in the following scenario, > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single > filesystem. The filesystem was created by an older version of mkfs.btrfs > which set stripesize to 4k. > - losetup -a > /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img) > /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img) > /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img) > - /etc/mtab lists only /dev/loop0 > - "losetup /dev/loop4 /root/disk-imgs/file-1.img" > The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even > though /dev/loop1 has already been mounted and has > /root/disk-imgs/file-1.img as its backing file. > > So IMHO the only solution is to have the stripesize check in check_super() to > allow both '4k' and 'sectorsize' as valid values i.e. > > if ((btrfs_super_stripesize(sb) != 4096) > && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) { > error("invalid stripesize %u", btrfs_super_stripesize(sb)); > goto error_out; > }
That's a good one. But if we go back to the original point, in the kernel side, 1. in open_ctree(), root->stripesize = btrfs_super_stripesize(); 2. in find_free_extent(), ... search_start = ALIGN(offset, root->stripesize); ... 3. in btrfs_alloc_tree_block(), ... ret = btrfs_reserve_extent(..., &ins, ...); ... buf = btrfs_init_new_buffer(trans, root, ins.objectid, level); 4. in btrfs_init_new_buffer(), ... buf = btrfs_find_create_tree_block(root, bytenr); ... Because 'num_bytes' we pass to find_free_extent() is aligned to sectorsize, the free space we can find is aligned to sectorsize, which means 'offset' in '1. find_free_extent()' is aligned to sectorsize. If our stripesize is larger than sectorsize, say 4 * sectorsize, for data allocation it's fine while for metadata block allocations it's not. It is possible that when we allocate a new metadata block, we can end up with an existing eb returned by '4. in btrfs_init_new_buffer()'. PS: There is something wrong around '2. in find_free_extent()', we only do alignment on offset, but for num_bytes, we don't do that, so we may end up with a overlap with other data extents or metadata blocks. So I think we can just replace this ALING with a warning since the offset returned by searching free space tree is aligned to block_group->full_stripe_len, which is either sectorsize or BTRFS_STRIPE_LEN * nr_stripes (for raid56), then we can just drop the check for stripesize everywhere. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html