Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
On 03/22/2013 03:34 AM, Kirill A. Shutemov wrote:
> Dave Hansen wrote:
>> On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
>>> +			error = radix_tree_insert(&mapping->page_tree,
>>> +					offset + i, page + i);
>>> +			if (error) {
>>> +				page_cache_release(page + i);
>>> +				break;
>>> +			}
>>> 		}
>>
>> Throughout all this new code, I'd really challenge you to try as much as
>> possible to minimize the code stuck under "if (PageTransHuge(page))".
>
> I put thp-related code under the 'if' intentionally to be able to optimize
> it out if !CONFIG_TRANSPARENT_HUGEPAGE. The config option is disabled by
> default.

You've gotta give the compiler some credit! :) In this function's case,
the compiler should be able to realize that nr=1 is constant if
!TRANSPARENT_HUGEPAGE. It'll realize that all your for loops are:

	for (i = 0; i < 1; i++) {

and will end up generating _very_ similar code to what you get with the
explicit #ifdefs. You already _have_ #ifdefs, but they're just up in the
headers around PageTransHuge()'s definition.

The advantages for you are *huge* since it means that folks will have a
harder time inadvertently breaking your CONFIG_TRANSPARENT_HUGEPAGE code.

>> Does the cgroup code know how to handle these large pages internally
>> somehow? It looks like the charge/uncharge is only being done for the
>> head page.
>
> It can. We only need to remove PageCompound() check there. Patch is in
> git.

OK, cool. This _might_ deserve a comment or something here. Again, it
looks asymmetric and fishy to someone just reading the code.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
Dave Hansen wrote:
> On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
> >
> > For huge page we add to radix tree HPAGE_CACHE_NR pages at once: head
> > page for the specified index and HPAGE_CACHE_NR-1 tail pages for
> > following indexes.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
> > ---
> >  mm/filemap.c | 76 --
> >  1 file changed, 53 insertions(+), 23 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 2d99191..6bac9e2 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> >  		pgoff_t offset, gfp_t gfp_mask)
> >  {
> >  	int error;
> > +	int nr = 1;
> >
> >  	VM_BUG_ON(!PageLocked(page));
> >  	VM_BUG_ON(PageSwapBacked(page));
> > @@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> >  	error = mem_cgroup_cache_charge(page, current->mm,
> >  					gfp_mask & GFP_RECLAIM_MASK);
> >  	if (error)
> > -		goto out;
> > +		return error;
> >
> > -	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> > -	if (error == 0) {
> > -		page_cache_get(page);
> > -		page->mapping = mapping;
> > -		page->index = offset;
> > +	if (PageTransHuge(page)) {
> > +		BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
> > +		nr = HPAGE_CACHE_NR;
> > +	}
>
> That seems like a slightly odd place to put a BUILD_BUG_ON(). I guess
> it doesn't matter to some degree, but does putting it inside the if()
> imply anything?

It actually matters. HPAGE_CACHE_NR is BUILD_BUG() if
!CONFIG_TRANSPARENT_HUGEPAGE, so we need to hide it inside
'if (PageTransHuge(page))'. PageTransHuge(page) is 0 in compile time if
!CONFIG_TRANSPARENT_HUGEPAGE, so compiler can be smart and optimize out
the check.
> > +	error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
> > +	if (error) {
> > +		mem_cgroup_uncharge_cache_page(page);
> > +		return error;
> > +	}
> >
> > -	spin_lock_irq(&mapping->tree_lock);
> > -	error = radix_tree_insert(&mapping->page_tree, offset, page);
> > -	if (likely(!error)) {
> > -		mapping->nrpages++;
> > -		__inc_zone_page_state(page, NR_FILE_PAGES);
> > -		spin_unlock_irq(&mapping->tree_lock);
> > -		trace_mm_filemap_add_to_page_cache(page);
> > -	} else {
> > -		page->mapping = NULL;
> > -		/* Leave page->index set: truncation relies upon it */
> > -		spin_unlock_irq(&mapping->tree_lock);
> > -		mem_cgroup_uncharge_cache_page(page);
> > -		page_cache_release(page);
>
> I do really like how this rewrite de-indents this code. :)

:)

> > +	page_cache_get(page);
> > +	spin_lock_irq(&mapping->tree_lock);
> > +	page->mapping = mapping;
> > +	page->index = offset;
> > +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> > +	if (unlikely(error))
> > +		goto err;
> > +	if (PageTransHuge(page)) {
> > +		int i;
> > +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
> > +			page_cache_get(page + i);
> > +			page[i].index = offset + i;
>
> Is it OK to leave page->mapping unset for these?

Good catch, thanks. Seems nobody really use it, since I haven't got any
oops, but we need to set it anyway.

> > +			error = radix_tree_insert(&mapping->page_tree,
> > +					offset + i, page + i);
> > +			if (error) {
> > +				page_cache_release(page + i);
> > +				break;
> > +			}
> > 		}
>
> Throughout all this new code, I'd really challenge you to try as much as
> possible to minimize the code stuck under "if (PageTransHuge(page))".

I put thp-related code under the 'if' intentionally to be able to optimize
it out if !CONFIG_TRANSPARENT_HUGEPAGE. The config option is disabled by
default.
> For instance, could you change the for() loop a bit and have it shared
> between both cases, like:
>
> > +	for (i = 0; i < nr; i++) {
> > +		page_cache_get(page + i);
> > +		page[i].index = offset + i;
> > +		error = radix_tree_insert(&mapping->page_tree,
> > +				offset + i, page + i);
> > +		if (error) {
> > +			page_cache_release(page + i);
> > +			break;
> > +		}
> > 	}
> > -		radix_tree_preload_end();
> > -	} else
> > -		mem_cgroup_uncharge_cache_page(page);
> > -out:
> > +	if (error) {
> > +		error = ENOSPC; /* no space for a huge page */
> > +		for (i--; i > 0; i--) {
> > +			radix_tree_delete(&mapping->page_tree,
> > +					offset + i);
> > +
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
>
> For huge page we add to radix tree HPAGE_CACHE_NR pages at once: head
> page for the specified index and HPAGE_CACHE_NR-1 tail pages for
> following indexes.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
> ---
>  mm/filemap.c | 76 --
>  1 file changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2d99191..6bac9e2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		pgoff_t offset, gfp_t gfp_mask)
>  {
>  	int error;
> +	int nr = 1;
>
>  	VM_BUG_ON(!PageLocked(page));
>  	VM_BUG_ON(PageSwapBacked(page));
> @@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  	error = mem_cgroup_cache_charge(page, current->mm,
>  					gfp_mask & GFP_RECLAIM_MASK);
>  	if (error)
> -		goto out;
> +		return error;
>
> -	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> -	if (error == 0) {
> -		page_cache_get(page);
> -		page->mapping = mapping;
> -		page->index = offset;
> +	if (PageTransHuge(page)) {
> +		BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
> +		nr = HPAGE_CACHE_NR;
> +	}

That seems like a slightly odd place to put a BUILD_BUG_ON(). I guess
it doesn't matter to some degree, but does putting it inside the if()
imply anything?
> +	error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
> +	if (error) {
> +		mem_cgroup_uncharge_cache_page(page);
> +		return error;
> +	}
>
> -	spin_lock_irq(&mapping->tree_lock);
> -	error = radix_tree_insert(&mapping->page_tree, offset, page);
> -	if (likely(!error)) {
> -		mapping->nrpages++;
> -		__inc_zone_page_state(page, NR_FILE_PAGES);
> -		spin_unlock_irq(&mapping->tree_lock);
> -		trace_mm_filemap_add_to_page_cache(page);
> -	} else {
> -		page->mapping = NULL;
> -		/* Leave page->index set: truncation relies upon it */
> -		spin_unlock_irq(&mapping->tree_lock);
> -		mem_cgroup_uncharge_cache_page(page);
> -		page_cache_release(page);

I do really like how this rewrite de-indents this code. :)

> +	page_cache_get(page);
> +	spin_lock_irq(&mapping->tree_lock);
> +	page->mapping = mapping;
> +	page->index = offset;
> +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> +	if (unlikely(error))
> +		goto err;
> +	if (PageTransHuge(page)) {
> +		int i;
> +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
> +			page_cache_get(page + i);
> +			page[i].index = offset + i;

Is it OK to leave page->mapping unset for these?

> +			error = radix_tree_insert(&mapping->page_tree,
> +					offset + i, page + i);
> +			if (error) {
> +				page_cache_release(page + i);
> +				break;
> +			}
> 		}

Throughout all this new code, I'd really challenge you to try as much as
possible to minimize the code stuck under "if (PageTransHuge(page))".
For instance, could you change the for() loop a bit and have it shared
between both cases, like:

> +	for (i = 0; i < nr; i++) {
> +		page_cache_get(page + i);
> +		page[i].index = offset + i;
> +		error = radix_tree_insert(&mapping->page_tree,
> +				offset + i, page + i);
> +		if (error) {
> +			page_cache_release(page + i);
> +			break;
> +		}
> 	}
> -		radix_tree_preload_end();
> -	} else
> -		mem_cgroup_uncharge_cache_page(page);
> -out:
> +	if (error) {
> +		error = ENOSPC; /* no space for a huge page */
> +		for (i--; i > 0; i--) {
> +			radix_tree_delete(&mapping->page_tree,
> +					offset + i);
> +			page_cache_release(page + i);
> +		}
> +		radix_tree_delete(&mapping->page_tree, offset);

I wonder if this would look any nicer if you just did all the
page_cache_get()s for the entire huge page along with the head page, and
then released them all in one place. I think it might shrink the error
handling paths here.

> +		goto err;
> +	}
> +	}
> +	__mod_zone_page_state(page_zone(page), NR_FILE_PAGES, nr);
> +	mapping->nrpages += nr;
> +
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
Hillf Danton wrote:
> On Fri, Mar 15, 2013 at 9:50 PM, Kirill A. Shutemov
> wrote:
> > Hillf Danton wrote:
> >> On Fri, Mar 15, 2013 at 9:23 PM, Kirill A. Shutemov
> >> wrote:
> >> > Hillf Danton wrote:
> >> >> On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
> >> >> wrote:
> >> >> > +	page_cache_get(page);
> >> >> > +	spin_lock_irq(&mapping->tree_lock);
> >> >> > +	page->mapping = mapping;
> >> >> > +	page->index = offset;
> >> >> > +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> >> >> > +	if (unlikely(error))
> >> >> > +		goto err;
> >> >> > +	if (PageTransHuge(page)) {
> >> >> > +		int i;
> >> >> > +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
> >> >> struct page *tail = page + i; to easy reader
> >> >>
> >> >> > +			page_cache_get(page + i);
> >> >> s/page_cache_get/get_page_foll/ ?
> >> >
> >> > Why?
> >> >
> >> see follow_trans_huge_pmd() please.
> >
> > Sorry, I fail to see how follow_trans_huge_pmd() is relevant here.
> > Could you elaborate?
> >
> Lets see the code
>
>	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
>	//page is tail now
>	VM_BUG_ON(!PageCompound(page));
>	if (flags & FOLL_GET)
>		get_page_foll(page);
>	//raise page count with the foll function
>
> thus I raised question.

get_page_foll() is designed to be part of follow_page*() call chain.
get_page() can handle compound pages properly.

--
 Kirill A. Shutemov
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
On Fri, Mar 15, 2013 at 9:50 PM, Kirill A. Shutemov wrote:
> Hillf Danton wrote:
>> On Fri, Mar 15, 2013 at 9:23 PM, Kirill A. Shutemov
>> wrote:
>> > Hillf Danton wrote:
>> >> On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
>> >> wrote:
>> >> > +	page_cache_get(page);
>> >> > +	spin_lock_irq(&mapping->tree_lock);
>> >> > +	page->mapping = mapping;
>> >> > +	page->index = offset;
>> >> > +	error = radix_tree_insert(&mapping->page_tree, offset, page);
>> >> > +	if (unlikely(error))
>> >> > +		goto err;
>> >> > +	if (PageTransHuge(page)) {
>> >> > +		int i;
>> >> > +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
>> >> struct page *tail = page + i; to easy reader
>> >>
>> >> > +			page_cache_get(page + i);
>> >> s/page_cache_get/get_page_foll/ ?
>> >
>> > Why?
>> >
>> see follow_trans_huge_pmd() please.
>
> Sorry, I fail to see how follow_trans_huge_pmd() is relevant here.
> Could you elaborate?
>
Lets see the code

	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
	//page is tail now
	VM_BUG_ON(!PageCompound(page));
	if (flags & FOLL_GET)
		get_page_foll(page);
	//raise page count with the foll function

thus I raised question.
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
Hillf Danton wrote:
> On Fri, Mar 15, 2013 at 9:23 PM, Kirill A. Shutemov
> wrote:
> > Hillf Danton wrote:
> >> On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
> >> wrote:
> >> > +	page_cache_get(page);
> >> > +	spin_lock_irq(&mapping->tree_lock);
> >> > +	page->mapping = mapping;
> >> > +	page->index = offset;
> >> > +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> >> > +	if (unlikely(error))
> >> > +		goto err;
> >> > +	if (PageTransHuge(page)) {
> >> > +		int i;
> >> > +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
> >> struct page *tail = page + i; to easy reader
> >>
> >> > +			page_cache_get(page + i);
> >> s/page_cache_get/get_page_foll/ ?
> >
> > Why?
> >
> see follow_trans_huge_pmd() please.

Sorry, I fail to see how follow_trans_huge_pmd() is relevant here.
Could you elaborate?

--
 Kirill A. Shutemov
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
On Fri, Mar 15, 2013 at 9:23 PM, Kirill A. Shutemov wrote:
> Hillf Danton wrote:
>> On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
>> wrote:
>> > +	page_cache_get(page);
>> > +	spin_lock_irq(&mapping->tree_lock);
>> > +	page->mapping = mapping;
>> > +	page->index = offset;
>> > +	error = radix_tree_insert(&mapping->page_tree, offset, page);
>> > +	if (unlikely(error))
>> > +		goto err;
>> > +	if (PageTransHuge(page)) {
>> > +		int i;
>> > +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
>> struct page *tail = page + i; to easy reader
>>
>> > +			page_cache_get(page + i);
>> s/page_cache_get/get_page_foll/ ?
>
> Why?
>
see follow_trans_huge_pmd() please.
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
Hillf Danton wrote:
> On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov
> wrote:
> > +	page_cache_get(page);
> > +	spin_lock_irq(&mapping->tree_lock);
> > +	page->mapping = mapping;
> > +	page->index = offset;
> > +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> > +	if (unlikely(error))
> > +		goto err;
> > +	if (PageTransHuge(page)) {
> > +		int i;
> > +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
> struct page *tail = page + i; to easy reader
>
> > +			page_cache_get(page + i);
> s/page_cache_get/get_page_foll/ ?

Why?

> > +			page[i].index = offset + i;
> > +			error = radix_tree_insert(&mapping->page_tree,
> > +					offset + i, page + i);
> > +			if (error) {
> > +				page_cache_release(page + i);
> > +				break;
> > +			}
> > 		}
> > -		radix_tree_preload_end();
> > -	} else
> > -		mem_cgroup_uncharge_cache_page(page);
> > -out:
> > +	if (error) {
> > +		error = ENOSPC; /* no space for a huge page */
> s/E/-E/

Good catch! Thanks.

> > +		for (i--; i > 0; i--) {
> > +			radix_tree_delete(&mapping->page_tree,
> > +					offset + i);
> > +			page_cache_release(page + i);
> > +		}
> > +		radix_tree_delete(&mapping->page_tree, offset);
> > +		goto err;
> > +	}
> > +	}

--
 Kirill A. Shutemov
Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
On Fri, Mar 15, 2013 at 1:50 AM, Kirill A. Shutemov wrote:
> +	page_cache_get(page);
> +	spin_lock_irq(&mapping->tree_lock);
> +	page->mapping = mapping;
> +	page->index = offset;
> +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> +	if (unlikely(error))
> +		goto err;
> +	if (PageTransHuge(page)) {
> +		int i;
> +		for (i = 1; i < HPAGE_CACHE_NR; i++) {

		struct page *tail = page + i;	to easy reader

> +			page_cache_get(page + i);

s/page_cache_get/get_page_foll/ ?

> +			page[i].index = offset + i;
> +			error = radix_tree_insert(&mapping->page_tree,
> +					offset + i, page + i);
> +			if (error) {
> +				page_cache_release(page + i);
> +				break;
> +			}
> 		}
> -		radix_tree_preload_end();
> -	} else
> -		mem_cgroup_uncharge_cache_page(page);
> -out:
> +		if (error) {
> +			error = ENOSPC; /* no space for a huge page */

s/E/-E/

> +			for (i--; i > 0; i--) {
> +				radix_tree_delete(&mapping->page_tree,
> +						offset + i);
> +				page_cache_release(page + i);
> +			}
> +			radix_tree_delete(&mapping->page_tree, offset);
> +			goto err;
> +		}
> +	}
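[Editorial aside] Both review points — the `s/E/-E/` sign fix and the unwind loop — amount to a general all-or-nothing insertion pattern: insert a head entry plus N-1 tail entries, and on partial failure delete everything already inserted and return a *negative* errno, per kernel convention. A minimal user-space sketch of that pattern, with invented names (`slots`, `slot_insert`, `insert_huge`) standing in for the radix-tree calls:

```c
#include <assert.h>
#include <errno.h>

#define NR_SLOTS 8
#define NR_PAGES 4	/* stand-in for HPAGE_CACHE_NR */

static int slots[NR_SLOTS];	/* 0 = empty; stand-in for the radix tree */

static int slot_insert(int idx, int val)
{
	if (idx >= NR_SLOTS)
		return -ENOSPC;
	if (slots[idx])
		return -EEXIST;
	slots[idx] = val;
	return 0;
}

/*
 * Insert NR_PAGES entries at offset..offset+NR_PAGES-1, or none at all.
 * Mirrors the patch's error path: on partial failure, unwind the tail
 * entries already inserted, delete the head entry, and return a
 * negative errno -- the review's s/E/-E/ point.
 */
static int insert_huge(int offset, int val)
{
	int i, error;

	error = slot_insert(offset, val);	/* head entry */
	if (error)
		return error;
	for (i = 1; i < NR_PAGES; i++) {	/* tail entries */
		error = slot_insert(offset + i, val + i);
		if (error)
			break;
	}
	if (error) {
		for (i--; i > 0; i--)		/* same unwind shape as the patch */
			slots[offset + i] = 0;	/* radix_tree_delete() analogue */
		slots[offset] = 0;		/* finally drop the head */
		return -ENOSPC;			/* no space for a huge page */
	}
	return 0;
}
```

Either all NR_PAGES slots end up populated or none do; a caller seeing -ENOSPC can fall back to inserting single entries.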
[PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages
From: "Kirill A. Shutemov"

For huge page we add to radix tree HPAGE_CACHE_NR pages at once: head
page for the specified index and HPAGE_CACHE_NR-1 tail pages for
following indexes.

Signed-off-by: Kirill A. Shutemov
---
 mm/filemap.c | 76 +++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 53 insertions(+), 23 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 2d99191..6bac9e2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		pgoff_t offset, gfp_t gfp_mask)
 {
 	int error;
+	int nr = 1;
 
 	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageSwapBacked(page));
@@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 	error = mem_cgroup_cache_charge(page, current->mm,
 					gfp_mask & GFP_RECLAIM_MASK);
 	if (error)
-		goto out;
+		return error;
 
-	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
-	if (error == 0) {
-		page_cache_get(page);
-		page->mapping = mapping;
-		page->index = offset;
+	if (PageTransHuge(page)) {
+		BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
+		nr = HPAGE_CACHE_NR;
+	}
+	error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
+	if (error) {
+		mem_cgroup_uncharge_cache_page(page);
+		return error;
+	}
 
-		spin_lock_irq(&mapping->tree_lock);
-		error = radix_tree_insert(&mapping->page_tree, offset, page);
-		if (likely(!error)) {
-			mapping->nrpages++;
-			__inc_zone_page_state(page, NR_FILE_PAGES);
-			spin_unlock_irq(&mapping->tree_lock);
-			trace_mm_filemap_add_to_page_cache(page);
-		} else {
-			page->mapping = NULL;
-			/* Leave page->index set: truncation relies upon it */
-			spin_unlock_irq(&mapping->tree_lock);
-			mem_cgroup_uncharge_cache_page(page);
-			page_cache_release(page);
+	page_cache_get(page);
+	spin_lock_irq(&mapping->tree_lock);
+	page->mapping = mapping;
+	page->index = offset;
+	error = radix_tree_insert(&mapping->page_tree, offset, page);
+	if (unlikely(error))
+		goto err;
+	if (PageTransHuge(page)) {
+		int i;
+		for (i = 1; i < HPAGE_CACHE_NR; i++) {
+			page_cache_get(page + i);
+			page[i].index = offset + i;
+			error = radix_tree_insert(&mapping->page_tree,
+					offset + i, page + i);
+			if (error) {
+				page_cache_release(page + i);
+				break;
+			}
 		}
-		radix_tree_preload_end();
-	} else
-		mem_cgroup_uncharge_cache_page(page);
-out:
+		if (error) {
+			error = ENOSPC; /* no space for a huge page */
+			for (i--; i > 0; i--) {
+				radix_tree_delete(&mapping->page_tree,
+						offset + i);
+				page_cache_release(page + i);
+			}
+			radix_tree_delete(&mapping->page_tree, offset);
+			goto err;
+		}
+	}
+	__mod_zone_page_state(page_zone(page), NR_FILE_PAGES, nr);
+	mapping->nrpages += nr;
+	spin_unlock_irq(&mapping->tree_lock);
+	trace_mm_filemap_add_to_page_cache(page);
+	radix_tree_preload_end();
+	return 0;
+err:
+	page->mapping = NULL;
+	/* Leave page->index set: truncation relies upon it */
+	spin_unlock_irq(&mapping->tree_lock);
+	radix_tree_preload_end();
+	mem_cgroup_uncharge_cache_page(page);
+	page_cache_release(page);
 	return error;
 }
 EXPORT_SYMBOL(add_to_page_cache_locked);
-- 
1.7.10.4