On Thu, Oct 12, 2017 at 03:33:23PM +0200, Jan Kara wrote:
> >             return;
> >  
> > -   if (dax_mapping(mapping)) {
> > -           dax_delete_mapping_entry(mapping, index);
> > -           return;
> > +   dax = dax_mapping(mapping);
> > +   if (!dax)
> > +           spin_lock_irq(&mapping->tree_lock);
> > +
> > +   for (i = ei, j = ei; i < pagevec_count(pvec); i++) {
> > +           struct page *page = pvec->pages[i];
> > +           pgoff_t index = indices[i];
> > +
> > +           if (!radix_tree_exceptional_entry(page)) {
> > +                   pvec->pages[j++] = page;
> > +                   continue;
> > +           }
> > +
> > +           if (unlikely(dax)) {
> > +                   dax_delete_mapping_entry(mapping, index);
> > +                   continue;
> > +           }
> > +
> > +           __clear_shadow_entry(mapping, index, page);
> >     }
> > -   clear_shadow_entry(mapping, index, entry);
> > +
> > +   if (!dax)
> > +           spin_unlock_irq(&mapping->tree_lock);
> > +   pvec->nr = j;
> >  }
> 
> When I look at this I think could make things cleaner. I have the following
> observations:
> 
> 1) All truncate_inode_pages(), invalidate_mapping_pages(),
> invalidate_inode_pages2_range() essentially do very similar thing and would
> benefit from a similar kind of batching.
> 

While this is true, the benefit is much more marginal that I didn't feel
the level of churn was justified. Primarily it would help fadvise() and
invalidating when buffered and direct IO is mixed. I didn't think it would
be that much cleaner as a result so I left it.

> 2) As you observed and measured, batching of radix tree operations makes
> sense both when removing pages and shadow entries, I'm very confident it
> would make sense for DAX exceptional entries as well.
> 

True, but I didn't have a suitable setup for testing DAX so I wasn't
comfortable with making the change. dax_delete_mapping_entry can sleep but it
should be as simple as not taking the spinlock in dax_delete_mapping_entry
and always locking in truncate_exceptional_pvec_entries. dax is already
releasing the mapping->tree_lock if it needs to sleep and I didn't spot
any other gotcha but I'd prefer that change was done by someone that can
verify it works properly.

> 3) In all cases (i.e., those three functions and for all entry types) the
> workflow seems to be:
>   * lockless lookup of entries
>   * prepare entry for reclaim (or determine it is not elligible)
>   * lock mapping->tree_lock
>   * verify entry is still elligible for reclaim (otherwise bail)
>   * clear radix tree entry
>   * unlock mapping->tree_lock
>   * final cleanup of the entry
> 
> So I'm wondering whether we cannot somehow refactor stuff so that batching
> of radix tree operations could be shared and we wouldn't have to duplicate
> it in all those cases.
> 
> But it would be rather large overhaul of the code so it may be a bit out of
> scope for these improvements...
> 

I think it would be out of scope for this improvement but I can look into
it if the series is accepted. I think it would be a lot of churn for fairly
marginal benefit though.

> > @@ -409,8 +445,8 @@ void truncate_inode_pages_range(struct address_space 
> > *mapping,
> >                     }
> >  
> >                     if (radix_tree_exceptional_entry(page)) {
> > -                           truncate_exceptional_entry(mapping, index,
> > -                                                      page);
> > +                           if (ei != PAGEVEC_SIZE)
> > +                                   ei = i;
> 
> This should be ei == PAGEVEC_SIZE I think.
> 
> Otherwise the patch looks good to me so feel free to add:
> 

Fixed.

> Reviewed-by: Jan Kara <j...@suse.cz>

Thanks

-- 
Mel Gorman
SUSE Labs

Reply via email to