On 2018年03月07日 19:27, Nikolay Borisov wrote: > > > On 7.03.2018 13:18, Qu Wenruo wrote: >> >> >> On 2018年03月07日 19:01, robbieko wrote: >>> Qu Wenruo 於 2018-03-07 18:42 寫到: >>>> On 2018年03月07日 18:33, Qu Wenruo wrote: >>>>> >>>>> >>>>> On 2018年03月07日 16:20, robbieko wrote: >>>>>> From: Robbie Ko <robbi...@synology.com> >>>>>> >>>>>> [BUG] >>>>>> Range clone can cause fiemap to return error result. >>>>>> Like: >>>>>> # mount /dev/vdb5 /mnt/btrfs >>>>>> # dd if=ev/zero bsK count=2 oflag=dsync of=/mnt/btrfs/file >>>>>> # xfs_io -c "fiemap -v" /mnt/btrfs/file >>>>>> /mnt/btrfs/file: >>>>>> EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS >>>>>> 0: [0..63]: 4241424..4241487 64 0x1 >>>>>> >>>>>> # cloner -s $((16*1024)) /mnt/btrfs/file /mnt/btrfs/file_clone >>>>>> # xfs_io -c "fiemap -v" /mnt/btrfs/file >>>>>> /mnt/btrfs/file: >>>>>> EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS >>>>>> 0: [0..63]: 4241424..4241487 64 0x1 >>>>>> If we clone second file extent, we will get error FLAGS, >>>>>> SHARED bit is not set. >>>>>> >>>>>> [REASON] >>>>>> Btrfs only checks if the first extent in extent map is shared, >>>>>> but extent may merge. >>>>>> >>>>>> [FIX] >>>>>> Here we will check each extent with extent map range, >>>>>> if one of them is shared, extent map is shared. >>>>>> >>>>>> [PATCH RESULT] >>>>>> # xfs_io -c "fiemap -v" /mnt/btrfs/file >>>>>> /mnt/btrfs/file: >>>>>> EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS >>>>>> 0: [0..63]: 4241424..4241487 64 0x2001 >>>>>> >>>>>> Signed-off-by: Robbie Ko <robbi...@synology.com> >>>>>> --- >>>>>> fs/btrfs/extent_io.c | 146 >>>>>> +++++++++++++++++++++++++++++++++++++++++++++------ >>>>>> 1 file changed, 131 insertions(+), 15 deletions(-) >>>>>> >>>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >>>>>> index 066b6df..5c6dca9 100644 >>>>>> --- a/fs/btrfs/extent_io.c >>>>>> +++ b/fs/btrfs/extent_io.c >>>>>> @@ -4394,8 +4394,8 @@ static int emit_fiemap_extent(struct >>>>>> fiemap_extent_info *fieinfo, >>>>>> */ >>>>>> if (cache->offset + cache->len =offset && >>>>>> cache->phys + cache->len =phys && >>>>>> - (cache->flags & ~FIEMAP_EXTENT_LAST) =>>>>> - (flags >>>>>> & ~FIEMAP_EXTENT_LAST)) { >>>>>> + (cache->flags & ~(FIEMAP_EXTENT_LAST|FIEMAP_EXTENT_SHARED)) >>>>>> =>>>>> + (flags & >>>>>> ~(FIEMAP_EXTENT_LAST|FIEMAP_EXTENT_SHARED))) { >>>>>> cache->len +=en; >>>>>> cache->flags |=lags; >>>>>> goto try_submit_last; >>>>>> @@ -4450,6 +4450,134 @@ static int emit_last_fiemap_cache(struct >>>>>> btrfs_fs_info *fs_info, >>>>>> return ret; >>>>>> } >>>>>> >>>>>> +/* >>>>>> + * Helper to check the file range is shared. >>>>>> + * >>>>>> + * Fiemap extent will be combined with many extents, so we need to >>>>>> examine >>>>>> + * each extent, and if shared, the results are shared. >>>>>> + * >>>>>> + * Return: 0 if file range is not shared, 1 if it is shared, < 0 on >>>>>> error. >>>>>> + */ >>>>>> +static int extent_map_check_shared(struct inode *inode, u64 start, >>>>>> u64 end) >>>>>> +{ >>>>>> + struct btrfs_fs_info *fs_info =trfs_sb(inode->i_sb); >>>>>> + struct btrfs_root *root =TRFS_I(inode)->root; >>>>>> + int ret =; >>>>>> + struct extent_buffer *leaf; >>>>>> + struct btrfs_path *path; >>>>>> + struct btrfs_file_extent_item *fi; >>>>>> + struct btrfs_key found_key; >>>>>> + int check_prev =; >>>>>> + int extent_type; >>>>>> + int shared =; >>>>>> + u64 cur_offset; >>>>>> + u64 extent_end; >>>>>> + u64 ino =trfs_ino(BTRFS_I(inode)); >>>>>> + u64 disk_bytenr; >>>>>> + >>>>>> + path =trfs_alloc_path(); >>>>>> + if (!path) { >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> + >>>>>> + cur_offset =tart; >>>>>> + while (1) { >>>>>> + ret =trfs_lookup_file_extent(NULL, root, path, ino, >>>>>> + cur_offset, 0); >>>>>> + if (ret < 0) >>>>>> + goto error; >>>>>> + if (ret > 0 && path->slots[0] > 0 && check_prev) { >>>>>> + leaf =ath->nodes[0]; >>>>>> + btrfs_item_key_to_cpu(leaf, &found_key, >>>>>> + path->slots[0] - 1); >>>>>> + if (found_key.objectid =ino && >>>>>> + found_key.type =BTRFS_EXTENT_DATA_KEY) >>>>>> + path->slots[0]--; >>>>>> + } >>>>>> + check_prev =; >>>>>> +next_slot: >>>>>> + leaf =ath->nodes[0]; >>>>>> + if (path->slots[0] >=trfs_header_nritems(leaf)) { >>>>>> + ret =trfs_next_leaf(root, path); >>>>>> + if (ret < 0) >>>>>> + goto error; >>>>>> + if (ret > 0) >>>>>> + break; >>>>>> + leaf =ath->nodes[0]; >>>>>> + } >>>>>> + >>>>>> + disk_bytenr =; >>>>>> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); >>>>>> + >>>>>> + if (found_key.objectid > ino) >>>>>> + break; >>>>>> + if (WARN_ON_ONCE(found_key.objectid < ino) || >>>>>> + found_key.type < BTRFS_EXTENT_DATA_KEY) { >>>>>> + path->slots[0]++; >>>>>> + goto next_slot; >>>>>> + } >>>>>> + if (found_key.type > BTRFS_EXTENT_DATA_KEY || >>>>>> + found_key.offset > end) >>>>>> + break; >>>>>> + >>>>>> + fi =trfs_item_ptr(leaf, path->slots[0], >>>>>> + struct btrfs_file_extent_item); >>>>>> + extent_type =trfs_file_extent_type(leaf, fi); >>>>>> + >>>>>> + if (extent_type =BTRFS_FILE_EXTENT_REG || >>>>>> + extent_type =BTRFS_FILE_EXTENT_PREALLOC) { >>>>>> + disk_bytenr =trfs_file_extent_disk_bytenr(leaf, fi); >>>>>> + extent_end =ound_key.offset + >>>>>> + btrfs_file_extent_num_bytes(leaf, fi); >>>>>> + if (extent_end <=tart) { >>>>>> + path->slots[0]++; >>>>>> + goto next_slot; >>>>>> + } >>>>>> + if (disk_bytenr =0) { >>>>>> + path->slots[0]++; >>>>>> + goto next_slot; >>>>>> + } >>>>>> + >>>>>> + btrfs_release_path(path); >>>>>> + >>>>>> + /* >>>>>> + * As btrfs supports shared space, this information >>>>>> + * can be exported to userspace tools via >>>>>> + * flag FIEMAP_EXTENT_SHARED. If fi_extents_max =0 >>>>>> + * then we're just getting a count and we can skip the >>>>>> + * lookup stuff. >>>>>> + */ >>>>>> + ret =trfs_check_shared(root, >>>>>> + ino, disk_bytenr); >>>>>> + if (ret < 0) >>>>>> + goto error; >>>>>> + if (ret) >>>>>> + shared =; >>>>>> + ret =; >>>>>> + if (shared) { >>>>>> + break; >>>>>> + } >>>>>> + } else if (extent_type =BTRFS_FILE_EXTENT_INLINE) { >>>>>> + extent_end =ound_key.offset + >>>>>> + btrfs_file_extent_inline_len(leaf, >>>>>> + path->slots[0], fi); >>>>>> + extent_end =LIGN(extent_end, fs_info->sectorsize); >>>>>> + path->slots[0]++; >>>>>> + goto next_slot; >>>>>> + } else { >>>>>> + BUG_ON(1); >>>>>> + } >>>>>> + cur_offset =xtent_end; >>>>>> + if (cur_offset > end) >>>>>> + break; >>>>>> + } >>>>>> + >>>>>> + ret =; >>>>>> +error: >>>>>> + btrfs_free_path(path); >>>>>> + return !ret ? shared : ret; >>>>>> +} >>>>>> + >>>>>> int extent_fiemap(struct inode *inode, struct fiemap_extent_info >>>>>> *fieinfo, >>>>>> __u64 start, __u64 len, get_extent_t *get_extent) >>>>>> { >>>>>> @@ -4587,19 +4715,7 @@ int extent_fiemap(struct inode *inode, struct >>>>>> fiemap_extent_info *fieinfo, >>>>>> flags |=FIEMAP_EXTENT_DELALLOC | >>>>>> FIEMAP_EXTENT_UNKNOWN); >>>>>> } else if (fieinfo->fi_extents_max) { >>>>>> - u64 bytenr =m->block_start - >>>>>> - (em->start - em->orig_start); >>>>>> - >>>>>> - /* >>>>>> - * As btrfs supports shared space, this information >>>>>> - * can be exported to userspace tools via >>>>>> - * flag FIEMAP_EXTENT_SHARED. If fi_extents_max =0 >>>>>> - * then we're just getting a count and we can skip the >>>>>> - * lookup stuff. >>>>>> - */ >>>>>> - ret =trfs_check_shared(root, >>>>>> - btrfs_ino(BTRFS_I(inode)), >>>>>> - bytenr); >>>>> >>>>> Since we're going to use the whole extent to determine the SHARED flags, >>>>> what about just passing the extent bytenr into btrfs_check_shared? >>>>> >>>>> In that case I think we could get correct shared flag without using >>>>> another helper function. >>>>> (IIRC it's em->block_start) >>>> >>>> Well, it's not em->block_start, but a little more complex. >>>> (For compressed one it's em->block_start, but for plaintext one, it's >>>> more complex) >>>> >>>> em->block_start =ile_extent_disk_bytenr + >>>> file_extent_file_extent_offset >>>> >>>> We need extra calculation to determine the real extent bytenr >>>> (file_extent_disk_bytenr). >>>> >>>> IIRC the correct calculation would be: >>>> file_extent_disk_bytenr =m->block_start - em->start + em->orig_start. >>>> >>>> (Who thought out such anti-human calculation for extent map?!) >>>> >>>> Thanks, >>>> Qu >>> >>> dd if=ev/zero bsK count=4 oflag=dsync of=file >>> >>> btrfs-debugfs -f file >>> (276 0): ram 16384 disk 2171609088 disk_size 16384 >>> (276 16384): ram 16384 disk 2171625472 disk_size 16384 >>> (276 32768): ram 16384 disk 2171641856 disk_size 16384 >>> (276 49152): ram 16384 disk 2171658240 disk_size 16384 >>> file: file extents 4 disk size 65536 logical size 65536 ratio 1.00 >>> >>> xfs_io -c "fiemap -v" file >>> file: >>> EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS >>> 0: [0..127]: 4241424..4241551 128 0x1 >>> >>> My point is that we have only one extent_map, >>> but in fact it can correspond to many extents >>> and each one has the potential to be cloned >>> so we need to examine each one individually. >> >> Right, I missed this point. >> >> SHARED flag is determined after extent map merge, so here we can't rely >> on em here. > > Shouldn't extent maps correspond to 1:1 disk-state. I.e. they are just > the memory cache of the extent state. So if we merge them, shouldn't we > also merge the on-disk extents as well ?
Not 1:1. In memory one is merged maybe to save memory. But on-disk file extents has size limit. For compressed one it's 128K and 128M for uncompressed one. And for on-disk file extent size limit, it could be reduce extent booking space waste and other reasons. Thanks, Qu > >> >> So the current helper function looks valid, unless we have method to >> access the original extent map. >> >> Thanks, >> Qu >> >>> >>> >>>>> >>>>> Thanks, >>>>> Qu >>>>> >>>>>> + ret =xtent_map_check_shared(inode, em->start, >>>>>> extent_map_end(em) - 1); >>>>>> if (ret < 0) >>>>>> goto out_free; >>>>>> if (ret) >>>>>> -- >>>>>> 1.9.1 >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe >>>>>> linux-btrfs" in >>>>>> the body of a message to majord...@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>> >>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
signature.asc
Description: OpenPGP digital signature