On Thu, Apr 08, 2021 at 05:25:28PM +0900, Naohiro Aota wrote: > This commit moves the location of the superblock logging zones. The new > locations of the logging zones are now determined based on fixed block > addresses instead of on fixed zone numbers. > > The old placement method based on fixed zone numbers causes problems when > one needs to inspect a file system image without access to the drive zone > information. In such case, the super block locations cannot be reliably > determined as the zone size is unknown. By locating the superblock logging > zones using fixed addresses, we can scan a dumped file system image without > the zone information since a super block copy will always be present at or > after the fixed location. > > This commit introduces the following three pairs of zones containing fixed > offset locations, regardless of the device zone size. > > - Primary superblock: zone starting at offset 0 and the following zone > - First copy: zone containing offset 64GB and the following zone > - Second copy: zone containing offset 256GB and the following zone > > If a logging zone is outside of the disk capacity, we do not record the > superblock copy. > > The first copy position is much larger than for a regular btrfs volume > (64M). This increase is to avoid overlapping with the log zones for the > primary superblock. This higher location is arbitrary but allows supporting > devices with very large zone sizes, up to 32GB. Such large zone size is > unrealistic and very unlikely to ever be seen in real devices. Currently, > SMR disks have a zone size of 256MB, and we are expecting ZNS drives to be > in the 1-4GB range, so this 32GB limit gives us room to breathe. For now, > we only allow zone sizes up to 8GB, below this hard limit of 32GB. > > The fixed location addresses are somewhat arbitrary, but with the intent of > maintaining superblock reliability even for smaller devices. For this > reason, the superblock fixed locations do not exceed 1TB.
Yeah, so how much should we adjust the offsets in favor of smaller devices instead of larger ones? I think this is going the wrong direction, the capacity will grow, the zones are supposed to be larger. > The superblock logging zones are reserved for superblock logging and never > used for data or metadata blocks. Note that we only reserve the two zones > per primary/copy actually used for superblock logging. We do not reserve > the ranges of zones possibly containing superblocks with the largest > supported zone size (0-16GB, 64G-80GB, 256G-272GB). > > The zones containing the fixed location offsets used to store superblocks > in a regular btrfs volume (no zoned case) are also reserved to avoid > confusion. > > Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> > --- > fs/btrfs/zoned.c | 43 +++++++++++++++++++++++++++++++++++-------- > 1 file changed, 35 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c > index 1f972b75a9ab..a4b195fe08a0 100644 > --- a/fs/btrfs/zoned.c > +++ b/fs/btrfs/zoned.c > @@ -21,9 +21,28 @@ > /* Pseudo write pointer value for conventional zone */ > #define WP_CONVENTIONAL ((u64)-2) > > +/* > + * Location of the first zone of superblock logging zone pairs. > + * - Primary superblock: the zone containing offset 0 (zone 0) > + * - First superblock copy: the zone containing offset 64G > + * - Second superblock copy: the zone containing offset 256G > + */ > +#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL > +#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G) > +#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G) > +#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE) > +#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE) This should be named like BTRFS_SB_LOG_* ie. "namespaces" or common prefixes for the same class of valuesd. > + > /* Number of superblock log zones */ > #define BTRFS_NR_SB_LOG_ZONES 2 > > +/* > + * Maximum size of zones. Currently, SMR disks have a zone size of 256MB, > + * and we are expecting ZNS drives to be in the 1-4GB range. We do not > + * expect the zone size to become larger than 8GB in the near future. > + */ > +#define BTRFS_MAX_ZONE_SIZE SZ_8G > + > static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void > *data) > { > struct blk_zone *zones = data; > @@ -111,11 +130,8 @@ static int sb_write_pointer(struct block_device *bdev, > struct blk_zone *zones, > } > > /* > - * The following zones are reserved as the circular buffer on ZONED btrfs. > - * - The primary superblock: zones 0 and 1 > - * - The first copy: zones 16 and 17 > - * - The second copy: zones 1024 or zone at 256GB which is minimum, and > - * the following one > + * Get the zone number of the first zone of a pair of contiguous zones used > + * for superblock logging. > */ > static inline u32 sb_zone_number(int shift, int mirror) > { > @@ -123,8 +139,8 @@ static inline u32 sb_zone_number(int shift, int mirror) > > switch (mirror) { > case 0: return 0; > - case 1: return 16; > - case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024); > + case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift); > + case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift); > } > > return 0; > @@ -300,10 +316,21 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device) > zone_sectors = bdev_zone_sectors(bdev); > } > > - nr_sectors = bdev_nr_sectors(bdev); > /* Check if it's power of 2 (see is_power_of_2) */ > ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0); > zone_info->zone_size = zone_sectors << SECTOR_SHIFT; > + > + /* We reject devices with a zone size larger than 8GB. */ > + if (zone_info->zone_size > BTRFS_MAX_ZONE_SIZE) { > + btrfs_err_in_rcu(fs_info, > + "zoned: %s: zone size %llu is too large", > + rcu_str_deref(device->name), > + zone_info->zone_size); > + ret = -EINVAL; > + goto out; > + } > + > + nr_sectors = bdev_nr_sectors(bdev); > zone_info->zone_size_shift = ilog2(zone_info->zone_size); > zone_info->max_zone_append_size = > (u64)queue_max_zone_append_sectors(queue) << SECTOR_SHIFT; > -- > 2.31.1