On Fri, Jan 05, 2018 at 12:51:09PM -0700, Liu Bo wrote: > This fixes a corner case that is caused by a race of dio write vs dio > read/write. > > Here is how the race could happen. > > Suppose that no extent map has been loaded into memory yet. > There is a file extent [0, 32K), two jobs are running concurrently > against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio > read from [0, 4K) or [4K, 8K). > > t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K). > > ------------------------------------------------------ > t1 t2 > btrfs_get_blocks_direct() btrfs_get_blocks_direct() > -> btrfs_get_extent() -> btrfs_get_extent() > -> lookup_extent_mapping() > -> add_extent_mapping() -> lookup_extent_mapping() > # load [0, 32K) > -> btrfs_new_extent_direct() > -> btrfs_drop_extent_cache() > # split [0, 32K) and > # drop [8K, 32K) > -> add_extent_mapping() > # add [8K, 32K) > -> add_extent_mapping() > # handle -EEXIST when adding > # [0, 32K) > ------------------------------------------------------ > About how t2(dio read/write) runs into -EEXIST: > > a) add_extent_mapping() gets -EEXIST for adding em [0, 32k), > > b) search_extent_mapping() then returns [0, 8k) as the existing em, > even though start == existing->start, em is [0, 32k) so that > extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k, > > c) then it goes thru merge_extent_mapping() which tries to add a [8k, 8k) > (with a length 0) and returns -EEXIST as [8k, 32k) is already in tree, > > d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write, > which is confusing applications. > > Here I conclude all the possible situations, > 1) start < existing->start > > +-----------+em+-----------+ > +--prev---+ | +-------------+ | > | | | | | | > +---------+ + +---+existing++ ++ > + > | > + > start > > 2) start == existing->start > > +------------em------------+ > | +-------------+ | > | | | | > + +----existing-+ + > | > | > + > start > > 3) start > existing->start && start < (existing->start + existing->len) > > +------------em------------+ > | +-------------+ | > | | | | > + +----existing-+ + > | > | > + > start > > 4) start >= (existing->start + existing->len) > > +-----------+em+-----------+ > | +-------------+ | +--next---+ > | | | | | | > + +---+existing++ + +---------+ > + > | > + > start > > As we can see, it turns out that if start is within existing em (front > inclusive), then the existing em should be returned as is, otherwise, > we try our best to merge candidate em with sibling ems to form a > larger em (in order to reduce the total number of em). > > Reported-by: David Vallender <david.vallen...@landmark.co.uk> > Signed-off-by: Liu Bo <bo.li....@oracle.com>
Reviewed-by: Josef Bacik <jba...@fb.com> Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html