On Fri, Dec 22, 2017 at 02:10:45PM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > This fixes a corner case that is caused by a race of dio write vs dio
> > read/write.
> > 
> > dio write:
> > [0, 32k) -> [0, 8k) + [8k, 32k)
> > 
> > dio read/write:
> > 
> > While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
> > though start == existing->start, em is [0, 32k),
> > extent_map_end(em) > extent_map_end(existing),
> > then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
> > (with a length 0), and get_extent ends up returning -EEXIST, and dio
> > read/write will get -EEXIST which is confusing applications.
> 
> I think this paragraph should be expanded a bit since you've crammed a
> lot of information in few words.
> 

OK, it's not easy to explain the problem, either.

Probably I should also attach the following as a extra explanation,

+ * Suppose that no extent map has been loaded into memory yet.
+ * There is a file extent [0, 32K), two jobs are running concurrently
+ * against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
+ * read from [0, 4K) or [4K, 8K).
+ *
+ * t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K).
+ *
+ *         t1                                t2
+ *  btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
+ *   -> btrfs_get_extent()              -> btrfs_get_extent()
+ *       -> lookup_extent_mapping()
+ *       -> add_extent_mapping()            -> lookup_extent_mapping()
+ *          # load [0, 32K)
+ *   -> btrfs_new_extent_direct()
+ *       -> btrfs_drop_extent_cache()
+ *          # split [0, 32K)
+ *       -> add_extent_mapping()
+ *          # add [8K, 32K)
+ *                                          -> add_extent_mapping()
+ *                                             # handle -EEXIST when adding
+ *                                             # [0, 32K)


> In the below graphs em should be extent we'd like to insert, no?
> So given that it always encompasses the existing (0, 8k given the
> above description) then em should be 0, 32k ?

Yes and yes.

thanks,
-liubo

> > 
> > Here I concluded all the possible situations,
> > 1) start < existing->start
> > 
> >             +-----------+em+-----------+
> > +--prev---+ |     +-------------+      |
> > |         | |     |             |      |
> > +---------+ +     +---+existing++      ++
> >                 +
> >                 |
> >                 +
> >              start
> > 
> > 2) start == existing->start
> > 
> >       +------------em------------+
> >       |     +-------------+      |
> >       |     |             |      |
> >       +     +----existing-+      +
> >             |
> >             |
> >             +
> >          start
> > 
> > 3) start > existing->start && start < (existing->start + existing->len)
> > 
> >       +------------em------------+
> >       |     +-------------+      |
> >       |     |             |      |
> >       +     +----existing-+      +
> >                |
> >                |
> >                +
> >              start
> > 
> > 4) start >= (existing->start + existing->len)
> > 
> > +-----------+em+-----------+
> > |     +-------------+      | +--next---+
> > |     |             |      | |         |
> > +     +---+existing++      + +---------+
> >                       +
> >                       |
> >                       +
> >                    start
> > 
> > After going thru the above case by case, it turns out that if start is
> > within existing em (front inclusive), then the existing em should be
> > returned, otherwise, we try our best to merge candidate em with
> > sibling ems to form a larger em.
> > 
> > Reported-by: David Vallender <david.vallen...@landmark.co.uk>
> > Signed-off-by: Liu Bo <bo.li....@oracle.com>
> > ---
> >  fs/btrfs/extent_map.c | 25 ++++++++++---------------
> >  1 file changed, 10 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> > index 6653b08..d386cfb 100644
> > --- a/fs/btrfs/extent_map.c
> > +++ b/fs/btrfs/extent_map.c
> > @@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct 
> > extent_map *em)
> >  static int merge_extent_mapping(struct extent_map_tree *em_tree,
> >                             struct extent_map *existing,
> >                             struct extent_map *em,
> > -                           u64 map_start)
> > +                           u64 map_start, u64 map_len)
> >  {
> >     struct extent_map *prev;
> >     struct extent_map *next;
> > @@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree 
> > *em_tree,
> >     if (existing->start > map_start) {
> >             next = existing;
> >             prev = prev_extent_map(next);
> > +           if (prev)
> > +                   ASSERT(extent_map_end(prev) <= map_start);
> >     } else {
> >             prev = existing;
> >             next = next_extent_map(prev);
> > +           if (next)
> > +                   ASSERT(map_start + map_len <= next->start);
> >     }
> >  
> >     start = prev ? extent_map_end(prev) : em->start;
> > @@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree 
> > *em_tree,
> >              * existing will always be non-NULL, since there must be
> >              * extent causing the -EEXIST.
> >              */
> > -           if (existing->start == em->start &&
> > -               extent_map_end(existing) >= extent_map_end(em) &&
> > -               em->block_start == existing->block_start) {
> > -                   /*
> > -                    * The existing extent map already encompasses the
> > -                    * entire extent map we tried to add.
> > -                    */
> > +           if (start >= existing->start &&
> > +               start < extent_map_end(existing)) {
> >                     free_extent_map(em);
> >                     *em_in = existing;
> >                     ret = 0;
> > -           } else if (start >= extent_map_end(existing) ||
> > -               start <= existing->start) {
> > +           } else {
> >                     /*
> >                      * The existing extent map is the one nearest to
> >                      * the [start, start + len) range which overlaps
> >                      */
> >                     ret = merge_extent_mapping(em_tree, existing,
> > -                                              em, start);
> > +                                              em, start, len);
> >                     free_extent_map(existing);
> >                     if (ret) {
> >                             free_extent_map(em);
> >                             *em_in = NULL;
> >                     }
> > -           } else {
> > -                   free_extent_map(em);
> > -                   *em_in = existing;
> > -                   ret = 0;
> >             }
> >     }
> > +
> >     ASSERT(ret == 0 || ret == -EEXIST);
> >     return ret;
> >  }
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to