Re: [PATCH 3/5] vrange: Add page purging logic & SIGBUS trap

2014-04-10 Thread John Stultz
Hey Kosaki-san,
  Just a few follow ups on your comments here in preparation for v13.

On 03/23/2014 04:44 PM, KOSAKI Motohiro wrote:
> On Fri, Mar 21, 2014 at 2:17 PM, John Stultz  wrote:
> @@ -683,6 +684,7 @@ enum page_references {
> PAGEREF_RECLAIM,
> PAGEREF_RECLAIM_CLEAN,
> PAGEREF_KEEP,
> +   PAGEREF_DISCARD,
> "discard" is alread used in various place for another meanings.
> another name is better.

Any suggestions here? Is PAGEREF_PURGE better?


>
>> PAGEREF_ACTIVATE,
>>  };
>>
>> @@ -703,6 +705,13 @@ static enum page_references page_check_references(struct page *page,
>> if (vm_flags & VM_LOCKED)
>> return PAGEREF_RECLAIM;
>>
>> +   /*
>> +* If a volatile page reaches the LRU's tail, we discard the
>> +* page without considering recycling it.
>> +*/
>> +   if (vm_flags & VM_VOLATILE)
>> +   return PAGEREF_DISCARD;
>> +
>> if (referenced_ptes) {
>> if (PageSwapBacked(page))
>> return PAGEREF_ACTIVATE;
>> @@ -930,6 +939,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>> switch (references) {
>> case PAGEREF_ACTIVATE:
>> goto activate_locked;
>> +   case PAGEREF_DISCARD:
>> +   if (may_enter_fs && !discard_vpage(page))
> Why is may_enter_fs needed? discard_vpage never enters the FS.

I think this is a holdover from the earlier file-based/shared volatility
approach. Thanks for pointing it out; I've dropped the may_enter_fs check.
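
For v13 the case then reduces to something like this (just a sketch of
the revised shrink_page_list() hunk; treat the break as illustrative):

	switch (references) {
	case PAGEREF_ACTIVATE:
		goto activate_locked;
	case PAGEREF_DISCARD:
		/* discard_vpage() never enters the fs, so no may_enter_fs check */
		if (!discard_vpage(page))
			goto free_it;
		break;
	...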


>> +   /*
>> +* While iterating this loop, some processes could see a page as
>> +* purged while others see it as not purged, because we have no
>> +* global lock between parent and child protecting the vrange
>> +* system call during the loop. But that's not a problem, because
>> +* the page is not a *SHARED* page but a *COW* page, so parent and
>> +* child can see different data at any time. The worst case of this
>> +* race is that a page was purged but couldn't be discarded, causing
>> +* an unnecessary page fault, which wouldn't be severe.
>> +*/
>> +   anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
>> +   struct vm_area_struct *vma = avc->vma;
>> +
>> +   if (!(vma->vm_flags & VM_VOLATILE))
>> +   continue;
> When you find a !VM_VOLATILE vma, we have no reason to continue pte
> zapping, do we?

Sounds reasonable. If Minchan doesn't object, I'll switch to breaking out
there and returning an error, roughly along the lines sketched below.
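
Something like this, maybe (a sketch of the revised loop in
try_to_discard_vpage(), assuming we fail the whole discard as soon as
one mapping is no longer volatile):

	int ret = 0;

	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
		struct vm_area_struct *vma = avc->vma;

		/*
		 * One non-volatile mapping is enough to keep the page,
		 * so stop zapping ptes and report failure.
		 */
		if (!(vma->vm_flags & VM_VOLATILE)) {
			ret = -1;
			break;
		}
		try_to_discard_one(page, vma);
	}
	page_unlock_anon_vma_read(anon_vma);
	return ret;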


>
>> +   try_to_discard_one(page, vma);
>> +   }
>> +   page_unlock_anon_vma_read(anon_vma);
>> +   return 0;
>> +}
>> +
>> +
>> +/**
>> + * discard_vpage - If possible, discard the specified volatile page
>> + *
>> + * Attempts to discard a volatile page, and if needed frees the swap page
>> + *
>> + * Returns 0 on success, -1 on error.
>> + */
>> +int discard_vpage(struct page *page)
>> +{
>> +   VM_BUG_ON(!PageLocked(page));
>> +   VM_BUG_ON(PageLRU(page));
>> +
>> +   /* XXX - for now we only support anonymous volatile pages */
>> +   if (!PageAnon(page))
>> +   return -1;
>> +
>> +   if (!try_to_discard_vpage(page)) {
>> +   if (PageSwapCache(page))
>> +   try_to_free_swap(page);
> This looks strange. try_to_free_swap can't handle the vpurge pseudo entry.

So I may be missing some of the subtleties of the swap code, but the
vpurge pseudo swp entry is on the pte, whereas here we're just trying to
make sure that, before we drop the page, we disconnect any swap backing
the page may have (if it was swapped out previously, before being marked
volatile). Let me know if I'm just not understanding the code or your point.
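
To illustrate the distinction I mean (a sketch, not the exact patch
code; it assumes the make_vpurged_entry() helper that would pair with
the is_vpurged_entry() check in the memory.c hunk):

	/* try_to_discard_one(): the pseudo entry is installed in the pte */
	flush_cache_page(vma, addr, pte_pfn(*pte));
	ptep_clear_flush(vma, addr, pte);
	set_pte_at(mm, addr, pte, swp_entry_to_pte(make_vpurged_entry()));

	/*
	 * discard_vpage(): any swap backing left from an earlier swapout
	 * hangs off the page itself, and that's what we disconnect here
	 * before dropping the page.
	 */
	if (PageSwapCache(page))
		try_to_free_swap(page);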


>> +
>> +   if (page_freeze_refs(page, 1)) {
> Where is the page_unfreeze_refs() that pairs with this?

Since we're about to free the page, I don't think we need an
unfreeze_refs pair, do we? Or am I just misunderstanding the rules here?

thanks
-john


Re: [PATCH 3/5] vrange: Add page purging logic & SIGBUS trap

2014-03-23 Thread KOSAKI Motohiro
On Fri, Mar 21, 2014 at 2:17 PM, John Stultz  wrote:
> This patch adds the hooks in the vmscan logic to discard volatile pages
> and mark their ptes as purged. With this, volatile pages will be purged
> under pressure, and their pte swap entries marked. If the purged pages
> are accessed before being marked non-volatile, we catch this and send a
> SIGBUS.
>
> This is a simplified implementation that uses logic from Minchan's earlier
> efforts, so credit to Minchan for his work.
>
> Cc: Andrew Morton 
> Cc: Android Kernel Team 
> Cc: Johannes Weiner 
> Cc: Robert Love 
> Cc: Mel Gorman 
> Cc: Hugh Dickins 
> Cc: Dave Hansen 
> Cc: Rik van Riel 
> Cc: Dmitry Adamushko 
> Cc: Neil Brown 
> Cc: Andrea Arcangeli 
> Cc: Mike Hommey 
> Cc: Taras Glek 
> Cc: Jan Kara 
> Cc: KOSAKI Motohiro 
> Cc: Michel Lespinasse 
> Cc: Minchan Kim 
> Cc: linux...@kvack.org 
> Signed-off-by: John Stultz 
> ---
>  include/linux/vrange.h |   2 +
>  mm/internal.h          |   2 -
>  mm/memory.c            |  21 +
>  mm/rmap.c              |   5 +++
>  mm/vmscan.c            |  12 ++
>  mm/vrange.c            | 114 +
>  6 files changed, 154 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/vrange.h b/include/linux/vrange.h
> index 986fa85..d93ad21 100644
> --- a/include/linux/vrange.h
> +++ b/include/linux/vrange.h
> @@ -8,4 +8,6 @@
>  #define VRANGE_VOLATILE 1
>  #define VRANGE_VALID_FLAGS (0) /* Don't yet support any flags */
>
> +extern int discard_vpage(struct page *page);
> +
>  #endif /* _LINUX_VRANGE_H */
> diff --git a/mm/internal.h b/mm/internal.h
> index 29e1e76..ea66bf9 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -225,10 +225,8 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
>
>  extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  extern unsigned long vma_address(struct page *page,
>  struct vm_area_struct *vma);
> -#endif
>  #else /* !CONFIG_MMU */
> static inline int mlocked_vma_newpage(struct vm_area_struct *v, struct page *p)
>  {
> diff --git a/mm/memory.c b/mm/memory.c
> index 22dfa61..db5f4da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -60,6 +60,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/vrange.h>
>
>  #include 
>  #include 
> @@ -3643,6 +3644,8 @@ static int handle_pte_fault(struct mm_struct *mm,
>
> entry = *pte;
> if (!pte_present(entry)) {
> +   swp_entry_t vrange_entry;
> +retry:
> if (pte_none(entry)) {
> if (vma->vm_ops) {
> if (likely(vma->vm_ops->fault))
> @@ -3652,6 +3655,24 @@ static int handle_pte_fault(struct mm_struct *mm,
> return do_anonymous_page(mm, vma, address,
>  pte, pmd, flags);
> }
> +
> +   vrange_entry = pte_to_swp_entry(entry);
> +   if (unlikely(is_vpurged_entry(vrange_entry))) {
> +   if (vma->vm_flags & VM_VOLATILE)
> +   return VM_FAULT_SIGBUS;
> +
> +   /* zap pte */
> +   ptl = pte_lockptr(mm, pmd);
> +   spin_lock(ptl);
> +   if (unlikely(!pte_same(*pte, entry)))
> +   goto unlock;
> +   flush_cache_page(vma, address, pte_pfn(*pte));
> +   ptep_clear_flush(vma, address, pte);
> +   pte_unmap_unlock(pte, ptl);
> +   goto retry;

This looks strange. Why do we need to zap the pte here?

> +   }
> +
> +
> if (pte_file(entry))
> return do_nonlinear_fault(mm, vma, address,
> pte, pmd, flags, entry);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index d9d4231..2b6f079 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -728,6 +728,11 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> referenced++;
> }
> pte_unmap_unlock(pte, ptl);
> +   if (vma->vm_flags & VM_VOLATILE) {
> +   pra->mapcount = 0;
> +   pra->vm_flags |= VM_VOLATILE;
> +   return SWAP_FAIL;
> +   }
> }
>
> if (referenced) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a9c74b4..34f159a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -43,6 +43,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/vrange.h>
>
>  #include 
>  #include 
> @@ -683,6 +684,7 @@ enum page_references {
> PAGEREF_RECLAIM,
> PAGEREF_RECLAIM_CLEAN,
> PAGEREF_KEEP,
> +   PAGEREF_DISCARD,

"discard" is alread used in various place for another meanings.
another name is better.

> PAGEREF_ACTIVATE,

[PATCH 3/5] vrange: Add page purging logic & SIGBUS trap

2014-03-21 Thread John Stultz
This patch adds the hooks in the vmscan logic to discard volatile pages
and mark their ptes as purged. With this, volatile pages will be purged
under pressure, and their pte swap entries marked. If the purged pages
are accessed before being marked non-volatile, we catch this and send a
SIGBUS.

This is a simplified implementation that uses logic from Minchan's earlier
efforts, so credit to Minchan for his work.
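
For anyone wanting to poke at the purging behavior from userspace,
here's a minimal sketch (hypothetical: it assumes the
vrange(start, len, mode, &purged) syscall from the earlier patches in
this series, that VRANGE_NONVOLATILE is 0, and that __NR_vrange is a
made-up number you'd replace with whatever your arch assigns):

	#include <signal.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#define __NR_vrange		314	/* hypothetical syscall number */
	#define VRANGE_NONVOLATILE	0
	#define VRANGE_VOLATILE		1

	static void sigbus_handler(int sig)
	{
		/* Touching a purged page while it is still volatile lands here */
		static const char msg[] = "SIGBUS: touched purged volatile page\n";
		(void)sig;
		write(2, msg, sizeof(msg) - 1);
		_exit(1);
	}

	int main(void)
	{
		size_t len = 4096;
		int purged = 0;
		char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;
		signal(SIGBUS, sigbus_handler);
		memset(buf, 0xaa, len);		/* some cached contents */

		/* Mark volatile: the kernel may now purge the range under pressure */
		syscall(__NR_vrange, buf, len, VRANGE_VOLATILE, &purged);

		/* ... memory pressure may purge the pages here ... */

		/* Safe pattern: mark non-volatile first, then check 'purged' */
		syscall(__NR_vrange, buf, len, VRANGE_NONVOLATILE, &purged);
		if (purged)
			memset(buf, 0xaa, len);	/* regenerate the contents */

		/* Touching the range while still volatile would risk the SIGBUS */
		return buf[0] == (char)0xaa ? 0 : 1;
	}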

Cc: Andrew Morton 
Cc: Android Kernel Team 
Cc: Johannes Weiner 
Cc: Robert Love 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Rik van Riel 
Cc: Dmitry Adamushko 
Cc: Neil Brown 
Cc: Andrea Arcangeli 
Cc: Mike Hommey 
Cc: Taras Glek 
Cc: Jan Kara 
Cc: KOSAKI Motohiro 
Cc: Michel Lespinasse 
Cc: Minchan Kim 
Cc: linux...@kvack.org 
Signed-off-by: John Stultz 
---
 include/linux/vrange.h |   2 +
 mm/internal.h          |   2 -
 mm/memory.c            |  21 +
 mm/rmap.c              |   5 +++
 mm/vmscan.c            |  12 ++
 mm/vrange.c            | 114 +
 6 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/include/linux/vrange.h b/include/linux/vrange.h
index 986fa85..d93ad21 100644
--- a/include/linux/vrange.h
+++ b/include/linux/vrange.h
@@ -8,4 +8,6 @@
 #define VRANGE_VOLATILE 1
 #define VRANGE_VALID_FLAGS (0) /* Don't yet support any flags */
 
+extern int discard_vpage(struct page *page);
+
 #endif /* _LINUX_VRANGE_H */
diff --git a/mm/internal.h b/mm/internal.h
index 29e1e76..ea66bf9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -225,10 +225,8 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
 
 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern unsigned long vma_address(struct page *page,
 struct vm_area_struct *vma);
-#endif
 #else /* !CONFIG_MMU */
 static inline int mlocked_vma_newpage(struct vm_area_struct *v, struct page *p)
 {
diff --git a/mm/memory.c b/mm/memory.c
index 22dfa61..db5f4da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -60,6 +60,7 @@
 #include 
 #include 
 #include 
+#include <linux/vrange.h>
 
 #include 
 #include 
@@ -3643,6 +3644,8 @@ static int handle_pte_fault(struct mm_struct *mm,
 
entry = *pte;
if (!pte_present(entry)) {
+   swp_entry_t vrange_entry;
+retry:
if (pte_none(entry)) {
if (vma->vm_ops) {
if (likely(vma->vm_ops->fault))
@@ -3652,6 +3655,24 @@ static int handle_pte_fault(struct mm_struct *mm,
return do_anonymous_page(mm, vma, address,
 pte, pmd, flags);
}
+
+   vrange_entry = pte_to_swp_entry(entry);
+   if (unlikely(is_vpurged_entry(vrange_entry))) {
+   if (vma->vm_flags & VM_VOLATILE)
+   return VM_FAULT_SIGBUS;
+
+   /* zap pte */
+   ptl = pte_lockptr(mm, pmd);
+   spin_lock(ptl);
+   if (unlikely(!pte_same(*pte, entry)))
+   goto unlock;
+   flush_cache_page(vma, address, pte_pfn(*pte));
+   ptep_clear_flush(vma, address, pte);
+   pte_unmap_unlock(pte, ptl);
+   goto retry;
+   }
+
+
if (pte_file(entry))
return do_nonlinear_fault(mm, vma, address,
pte, pmd, flags, entry);
diff --git a/mm/rmap.c b/mm/rmap.c
index d9d4231..2b6f079 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -728,6 +728,11 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
referenced++;
}
pte_unmap_unlock(pte, ptl);
+   if (vma->vm_flags & VM_VOLATILE) {
+   pra->mapcount = 0;
+   pra->vm_flags |= VM_VOLATILE;
+   return SWAP_FAIL;
+   }
}
 
if (referenced) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a9c74b4..34f159a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include <linux/vrange.h>
 
 #include 
 #include 
@@ -683,6 +684,7 @@ enum page_references {
PAGEREF_RECLAIM,
PAGEREF_RECLAIM_CLEAN,
PAGEREF_KEEP,
+   PAGEREF_DISCARD,
PAGEREF_ACTIVATE,
 };
 
@@ -703,6 +705,13 @@ static enum page_references page_check_references(struct page *page,
if (vm_flags & VM_LOCKED)
return PAGEREF_RECLAIM;
 
+   /*
+* If a volatile page reaches the LRU's tail, we discard the
+* page without considering recycling it.
+*/
+   if (vm_flags & VM_VOLATILE)
+   return PAGEREF_DISCARD;
+
if (referenced_ptes) {
if (PageSwapBacked(page)