On Mon, Oct 27, 2014 at 12:11 PM, Filipe David Manana <fdman...@gmail.com> wrote: > On Mon, Oct 27, 2014 at 11:08 AM, Miao Xie <mi...@cn.fujitsu.com> wrote: >> On Mon, 27 Oct 2014 09:19:52 +0000, Filipe Manana wrote: >>> We have a race that can lead us to miss skinny extent items in the function >>> btrfs_lookup_extent_info() when the skinny metadata feature is enabled. >>> So basically the sequence of steps is: >>> >>> 1) We search in the extent tree for the skinny extent, which returns > 0 >>> (not found); >>> >>> 2) We check the previous item in the returned leaf for a non-skinny extent, >>> and we don't find it; >>> >>> 3) Because we didn't find the non-skinny extent in step 2), we release our >>> path to search the extent tree again, but this time for a non-skinny >>> extent key; >>> >>> 4) Right after we released our path in step 3), a skinny extent was inserted >>> in the extent tree (delayed refs were run) - our second extent tree >>> search >>> will miss it, because it's not looking for a skinny extent; >>> >>> 5) After the second search returned (with ret > 0), we look for any delayed >>> ref for our extent's bytenr (and we do it while holding a read lock on >>> the >>> leaf), but we won't find any, as such delayed ref had just run and >>> completed >>> after we released out path in step 3) before doing the second search. >>> >>> Fix this by removing completely the path release and re-search logic. This >>> is >>> safe, because if we seach for a metadata item and we don't find it, we have >>> the >>> guarantee that the returned leaf is the one where the item would be >>> inserted, >>> and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the >>> non-skinny extent item is if it exists. The only case where path->slots[0] >>> is >> >> I think this analysis is wrong if there are some independent shared ref >> metadata for >> a tree block, just like: >> +------------------------+-------------+-------------+ >> | tree block extent item | shared ref1 | shared ref2 | >> +------------------------+-------------+-------------+
Trying to guess what's in your mind. Is the concern that if after a non-skinny extent item we have non-inlined references, the assumption that path->slots[0] - 1 points to the extent item would be wrong when searching for a skinny extent? That wouldn't be the case because BTRFS_EXTENT_ITEM_KEY == 168 and BTRFS_METADATA_ITEM_KEY == 169, with BTRFS_SHARED_BLOCK_REF_KEY == 182. So in the presence of such non-inlined shared tree block reference items, searching for a skinny extent item leaves us at a slot that points to the first non-inlined ref (regardless of its type, since they're all > 169), and therefore path->slots[0] - 1 is the non-skinny extent item. thanks. > > Why does that matters? Can you elaborate why it's not correct? > > We're looking for the extent item only in btrfs_lookup_extent_info(), > and running a delayed ref, independently of being inlined/shared, it > implies inserting a new extent item or updating an existing extent > item (updating ref count). > > thanks > >> >> Thanks >> Miao >> >>> zero is when there are no smaller keys in the tree (i.e. no left siblings >>> for >>> our leaf), in which case the re-search logic isn't needed as well. >>> >>> This race has been present since the introduction of skinny metadata (change >>> 3173a18f70554fe7880bb2d85c7da566e364eb3c). >>> >>> Signed-off-by: Filipe Manana <fdman...@suse.com> >>> --- >>> fs/btrfs/extent-tree.c | 8 -------- >>> 1 file changed, 8 deletions(-) >>> >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >>> index 9141b2b..2cedd06 100644 >>> --- a/fs/btrfs/extent-tree.c >>> +++ b/fs/btrfs/extent-tree.c >>> @@ -780,7 +780,6 @@ search_again: >>> else >>> key.type = BTRFS_EXTENT_ITEM_KEY; >>> >>> -again: >>> ret = btrfs_search_slot(trans, root->fs_info->extent_root, >>> &key, path, 0, 0); >>> if (ret < 0) >>> @@ -796,13 +795,6 @@ again: >>> key.offset == root->nodesize) >>> ret = 0; >>> } >>> - if (ret) { >>> - key.objectid = bytenr; >>> - key.type = BTRFS_EXTENT_ITEM_KEY; >>> - key.offset = root->nodesize; >>> - btrfs_release_path(path); >>> - goto again; >>> - } >>> } >>> >>> if (ret == 0) { >>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html