Re: [PATCHv3 14/41] filemap: allocate huge page in page_cache_read(), if allowed

2016-10-11 Thread Kirill A. Shutemov
On Tue, Oct 11, 2016 at 06:15:45PM +0200, Jan Kara wrote:
> On Thu 15-09-16 14:54:56, Kirill A. Shutemov wrote:
> > This patch adds basic functionality to put huge page into page cache.
> > 
> > At the moment we only put huge pages into radix-tree if the range covered
> > by the huge page is empty.
> > 
> > We ignore shadow entries for now, just remove them from the tree before
> > inserting a huge page.
> > 
> > Later we can add logic to accumulate information from shadow entries to
> > return to the caller (average eviction time?).
> > 
> > Signed-off-by: Kirill A. Shutemov 
> > ---
> >  include/linux/fs.h      |   5 ++
> >  include/linux/pagemap.h |  21 ++-
> >  mm/filemap.c            | 148 +++-
> >  3 files changed, 157 insertions(+), 17 deletions(-)
> > 
> ...
> > @@ -663,16 +663,55 @@ static int __add_to_page_cache_locked(struct page *page,
> > page->index = offset;
> >  
> > spin_lock_irq(&mapping->tree_lock);
> > -   error = page_cache_tree_insert(mapping, page, shadowp);
> > +   if (PageTransHuge(page)) {
> > +   struct radix_tree_iter iter;
> > +   void **slot;
> > +   void *p;
> > +
> > +   error = 0;
> > +
> > +   /* Wipe shadow entries */
> > +   radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, offset) {
> > +   if (iter.index >= offset + HPAGE_PMD_NR)
> > +   break;
> > +
> > +   p = radix_tree_deref_slot_protected(slot,
> > +   &mapping->tree_lock);
> > +   if (!p)
> > +   continue;
> > +
> > +   if (!radix_tree_exception(p)) {
> > +   error = -EEXIST;
> > +   break;
> > +   }
> > +
> > +   mapping->nrexceptional--;
> > +   rcu_assign_pointer(*slot, NULL);
> 
> I think you also need something like workingset_node_shadows_dec(node)
> here. It would be even better if you used something like
> clear_exceptional_entry() to have the logic in one place (you obviously
> need to factor out only part of clear_exceptional_entry() first).

Good point. Will do.
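
Roughly along these lines, I think -- an untested sketch, with the helper
name invented here; the caller would hold mapping->tree_lock and look up the
radix tree node for the slot first (e.g. via __radix_tree_lookup()):

/*
 * Shadow entry teardown factored out of clear_exceptional_entry(), so the
 * huge page insertion path can share the workingset accounting.
 */
static void clear_shadow_entry_locked(struct address_space *mapping,
				      struct radix_tree_node *node,
				      void **slot)
{
	radix_tree_replace_slot(slot, NULL);
	mapping->nrexceptional--;
	if (!node)
		return;
	workingset_node_shadows_dec(node);
	/*
	 * Once the node carries no more shadow entries, stop tracking it
	 * on the shadow LRU and let the radix tree free it if it is now
	 * completely empty.
	 */
	if (!workingset_node_shadows(node) &&
	    !list_empty(&node->private_list))
		list_lru_del(&workingset_shadow_nodes, &node->private_list);
	__radix_tree_delete_node(&mapping->page_tree, node);
}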

> > +   }
> > +
> > +   if (!error)
> > +   error = __radix_tree_insert(&mapping->page_tree, offset,
> > +   compound_order(page), page);
> > +
> > +   if (!error) {
> > +   count_vm_event(THP_FILE_ALLOC);
> > +   mapping->nrpages += HPAGE_PMD_NR;
> > +   *shadowp = NULL;
> > +   __inc_node_page_state(page, NR_FILE_THPS);
> > +   }
> > +   } else {
> > +   error = page_cache_tree_insert(mapping, page, shadowp);
> > +   }
> 
> And I'd prefer to have this logic moved to page_cache_tree_insert() because
> logically it IMHO belongs there - it is simply another case of handling
> the radix tree used for the page cache.

Okay.
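
Something with roughly this shape, I suppose (untested sketch;
page_cache_tree_insert_small() below is just a stand-in name for the current
order-0 body of page_cache_tree_insert()):

static int page_cache_tree_insert(struct address_space *mapping,
				  struct page *page, void **shadowp)
{
	struct radix_tree_iter iter;
	void **slot;
	void *p;
	int error;

	if (!PageTransHuge(page))
		return page_cache_tree_insert_small(mapping, page, shadowp);

	/* Wipe shadow entries in the range the huge page will cover */
	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, page->index) {
		if (iter.index >= page->index + HPAGE_PMD_NR)
			break;
		p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
		if (!p)
			continue;
		/* Only shadow (exceptional) entries may be overwritten */
		if (!radix_tree_exception(p))
			return -EEXIST;
		mapping->nrexceptional--;
		rcu_assign_pointer(*slot, NULL);
	}

	/* Insert the compound page as a single multi-order entry */
	error = __radix_tree_insert(&mapping->page_tree, page->index,
				    compound_order(page), page);
	if (error)
		return error;

	count_vm_event(THP_FILE_ALLOC);
	mapping->nrpages += HPAGE_PMD_NR;
	*shadowp = NULL;
	__inc_node_page_state(page, NR_FILE_THPS);
	return 0;
}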

-- 
 Kirill A. Shutemov


Re: [PATCHv3 14/41] filemap: allocate huge page in page_cache_read(), if allowed

2016-10-11 Thread Jan Kara
On Thu 15-09-16 14:54:56, Kirill A. Shutemov wrote:
> This patch adds basic functionality to put huge page into page cache.
> 
> At the moment we only put huge pages into radix-tree if the range covered
> by the huge page is empty.
> 
> We ignore shadow entries for now, just remove them from the tree before
> inserting a huge page.
> 
> Later we can add logic to accumulate information from shadow entries to
> return to the caller (average eviction time?).
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  include/linux/fs.h      |   5 ++
>  include/linux/pagemap.h |  21 ++-
>  mm/filemap.c            | 148 +++-
>  3 files changed, 157 insertions(+), 17 deletions(-)
> 
...
> @@ -663,16 +663,55 @@ static int __add_to_page_cache_locked(struct page *page,
>   page->index = offset;
>  
>   spin_lock_irq(&mapping->tree_lock);
> - error = page_cache_tree_insert(mapping, page, shadowp);
> + if (PageTransHuge(page)) {
> + struct radix_tree_iter iter;
> + void **slot;
> + void *p;
> +
> + error = 0;
> +
> + /* Wipe shadow entries */
> + radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, offset) {
> + if (iter.index >= offset + HPAGE_PMD_NR)
> + break;
> +
> + p = radix_tree_deref_slot_protected(slot,
> + &mapping->tree_lock);
> + if (!p)
> + continue;
> +
> + if (!radix_tree_exception(p)) {
> + error = -EEXIST;
> + break;
> + }
> +
> + mapping->nrexceptional--;
> + rcu_assign_pointer(*slot, NULL);

I think you also need something like workingset_node_shadows_dec(node)
here. It would be even better if you used something like
clear_exceptional_entry() to have the logic in one place (you obviously
need to factor out only part of clear_exceptional_entry() first).

> + }
> +
> + if (!error)
> + error = __radix_tree_insert(&mapping->page_tree, offset,
> + compound_order(page), page);
> +
> + if (!error) {
> + count_vm_event(THP_FILE_ALLOC);
> + mapping->nrpages += HPAGE_PMD_NR;
> + *shadowp = NULL;
> + __inc_node_page_state(page, NR_FILE_THPS);
> + }
> + } else {
> + error = page_cache_tree_insert(mapping, page, shadowp);
> + }

And I'd prefer to have this logic moved to page_cache_tree_insert() because
logically it IMHO belongs there - it is simply another case of handling
the radix tree used for the page cache.

Honza
-- 
Jan Kara 
SUSE Labs, CR


[PATCHv3 14/41] filemap: allocate huge page in page_cache_read(), if allowed

2016-09-15 Thread Kirill A. Shutemov
This patch adds basic functionality to put huge page into page cache.

At the moment we only put huge pages into radix-tree if the range covered
by the huge page is empty.

We ignore shadow entries for now, just remove them from the tree before
inserting a huge page.

Later we can add logic to accumulate information from shadow entries to
return to the caller (average eviction time?).

Signed-off-by: Kirill A. Shutemov 
---
 include/linux/fs.h      |   5 ++
 include/linux/pagemap.h |  21 ++-
 mm/filemap.c            | 148 +++-
 3 files changed, 157 insertions(+), 17 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 901e25d495cc..122024ccc739 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1829,6 +1829,11 @@ struct super_operations {
 #else
 #define S_DAX  0   /* Make all the DAX code disappear */
 #endif
+#define S_HUGE_MODE        0xc000
+#define S_HUGE_NEVER       0x0000
+#define S_HUGE_ALWAYS      0x4000
+#define S_HUGE_WITHIN_SIZE 0x8000
+#define S_HUGE_ADVISE      0xc000
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260b33de..a84f11a672f0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -191,14 +191,20 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline struct page *__page_cache_alloc_order(gfp_t gfp,
+   unsigned int order)
 {
-   return alloc_pages(gfp, 0);
+   return alloc_pages(gfp, order);
 }
 #endif
 
+static inline struct page *__page_cache_alloc(gfp_t gfp)
+{
+   return __page_cache_alloc_order(gfp, 0);
+}
+
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
return __page_cache_alloc(mapping_gfp_mask(x));
@@ -215,6 +221,15 @@ static inline gfp_t readahead_gfp_mask(struct address_space *x)
  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN;
 }
 
+extern bool __page_cache_allow_huge(struct address_space *x, pgoff_t offset);
+static inline bool page_cache_allow_huge(struct address_space *x,
+   pgoff_t offset)
+{
+   if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+   return false;
+   return __page_cache_allow_huge(x, offset);
+}
+
 typedef int filler_t(void *, struct page *);
 
 pgoff_t page_cache_next_hole(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 6f7f45f47d68..50afe17230e7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -637,14 +637,14 @@ static int __add_to_page_cache_locked(struct page *page,
  pgoff_t offset, gfp_t gfp_mask,
  void **shadowp)
 {
-   int huge = PageHuge(page);
+   int hugetlb = PageHuge(page);
struct mem_cgroup *memcg;
int error;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapBacked(page), page);
 
-   if (!huge) {
+   if (!hugetlb) {
error = mem_cgroup_try_charge(page, current->mm,
  gfp_mask, &memcg, false);
if (error)
@@ -653,7 +653,7 @@ static int __add_to_page_cache_locked(struct page *page,
 
error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
if (error) {
-   if (!huge)
+   if (!hugetlb)
mem_cgroup_cancel_charge(page, memcg, false);
return error;
}
@@ -663,16 +663,55 @@ static int __add_to_page_cache_locked(struct page *page,
page->index = offset;
 
spin_lock_irq(&mapping->tree_lock);
-   error = page_cache_tree_insert(mapping, page, shadowp);
+   if (PageTransHuge(page)) {
+   struct radix_tree_iter iter;
+   void **slot;
+   void *p;
+
+   error = 0;
+
+   /* Wipe shadow entries */
+   radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, offset) {
+   if (iter.index >= offset + HPAGE_PMD_NR)
+   break;
+
+   p = radix_tree_deref_slot_protected(slot,
+   &mapping->tree_lock);
+   if (!p)
+   continue;
+
+   if (!radix_tree_exception(p)) {
+   error = -EEXIST;
+   break;
+   }
+
+   mapping->nrexceptional--;
+   rcu_assign_pointer(*slot, NULL);
+   }
+
+   if (!error)
+   error = __radix_tree_insert(&mapping->page_tree, offset,
+   compound_order(page), page);