Re: [RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-08-08 Thread Laurent Dufour
On 08/08/2017 11:45, Anshuman Khandual wrote:
> On 06/16/2017 11:22 PM, Laurent Dufour wrote:
>> From: Peter Zijlstra 
>>
>> One of the side effects of speculating on faults (without holding
>> mmap_sem) is that we can race with free_pgtables() and therefore we
>> cannot assume the page-tables will stick around.
>>
>> Remove the relyance on the pte pointer.
> 
> Looking into other parts of the series, it seemed like now we have
> sequence lock both at MM and VMA level but then after that we still
> need to take page table lock before handling page faults (in turn
> manipulating PTE which includes swap in paths as well). Is not that
> true ?

Page table locking is still required as several VMAs can reference the same
page table.



Re: [RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-08-08 Thread Anshuman Khandual
On 07/10/2017 11:18 PM, Laurent Dufour wrote:
> On 07/07/2017 09:07, Balbir Singh wrote:
>> On Fri, 2017-06-16 at 19:52 +0200, Laurent Dufour wrote:
>>> From: Peter Zijlstra 
>>>
>>> One of the side effects of speculating on faults (without holding
>>> mmap_sem) is that we can race with free_pgtables() and therefore we
>>> cannot assume the page-tables will stick around.
>>>
>>> Remove the relyance on the pte pointer.
>>  ^^ reliance
>>
>> Looking at the changelog and the code the impact is not clear.
>> It looks like after this patch we always assume the pte is not
>> the same. What is the impact of this patch?
> 
> Hi Balbir,
> 
> In most of the case pte_unmap_same() was returning 1, which meaning that
> do_swap_page() should do its processing.
> 
> So in most of the case there will be no impact.
> 
> Now regarding the case where pte_unmap_safe() was returning 0, and thus
> do_swap_page return 0 too, this happens when the page has already been
> swapped back. This may happen before do_swap_page() get called or while in
> the call to do_swap_page(). In that later case, the check done when
> swapin_readahead() returns will detect that case.
> 
> The worst case would be that a page fault is occuring on 2 threads at the
> same time on the same swapped out page. In that case one thread will take
> much time looping in __read_swap_cache_async(). But in the regular page
> fault path, this is even worse since the thread would wait for semaphore to
> be released before starting anything.

Can we move the detection of swap in of the same struct page back into
the page table bit earlier, ideally where pte_unmap_same() present to
speed up detection for the bail out case ?



Re: [RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-08-08 Thread Anshuman Khandual
On 06/16/2017 11:22 PM, Laurent Dufour wrote:
> From: Peter Zijlstra 
> 
> One of the side effects of speculating on faults (without holding
> mmap_sem) is that we can race with free_pgtables() and therefore we
> cannot assume the page-tables will stick around.
> 
> Remove the relyance on the pte pointer.

Looking into other parts of the series, it seemed like now we have
sequence lock both at MM and VMA level but then after that we still
need to take page table lock before handling page faults (in turn
manipulating PTE which includes swap in paths as well). Is not that
true ?



Re: [RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-07-10 Thread Balbir Singh
On Mon, 10 Jul 2017 19:48:43 +0200
Laurent Dufour  wrote:

> On 07/07/2017 09:07, Balbir Singh wrote:
> > On Fri, 2017-06-16 at 19:52 +0200, Laurent Dufour wrote:  
> >> From: Peter Zijlstra 
> >>
> >> One of the side effects of speculating on faults (without holding
> >> mmap_sem) is that we can race with free_pgtables() and therefore we
> >> cannot assume the page-tables will stick around.
> >>
> >> Remove the relyance on the pte pointer.  
> >  ^^ reliance
> > 
> > Looking at the changelog and the code the impact is not clear.
> > It looks like after this patch we always assume the pte is not
> > the same. What is the impact of this patch?  
> 
> Hi Balbir,
> 
> In most of the case pte_unmap_same() was returning 1, which meaning that
> do_swap_page() should do its processing.
> 
> So in most of the case there will be no impact.
> 
> Now regarding the case where pte_unmap_safe() was returning 0, and thus
> do_swap_page return 0 too, this happens when the page has already been
> swapped back. This may happen before do_swap_page() get called or while in
> the call to do_swap_page(). In that later case, the check done when
> swapin_readahead() returns will detect that case.
> 
> The worst case would be that a page fault is occuring on 2 threads at the
> same time on the same swapped out page. In that case one thread will take
> much time looping in __read_swap_cache_async(). But in the regular page
> fault path, this is even worse since the thread would wait for semaphore to
> be released before starting anything.
> 
>

Sounds good!

Thanks,
Balbir Singh 



Re: [RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-07-10 Thread Laurent Dufour
On 07/07/2017 09:07, Balbir Singh wrote:
> On Fri, 2017-06-16 at 19:52 +0200, Laurent Dufour wrote:
>> From: Peter Zijlstra 
>>
>> One of the side effects of speculating on faults (without holding
>> mmap_sem) is that we can race with free_pgtables() and therefore we
>> cannot assume the page-tables will stick around.
>>
>> Remove the relyance on the pte pointer.
>  ^^ reliance
> 
> Looking at the changelog and the code the impact is not clear.
> It looks like after this patch we always assume the pte is not
> the same. What is the impact of this patch?

Hi Balbir,

In most of the case pte_unmap_same() was returning 1, which meaning that
do_swap_page() should do its processing.

So in most of the case there will be no impact.

Now regarding the case where pte_unmap_safe() was returning 0, and thus
do_swap_page return 0 too, this happens when the page has already been
swapped back. This may happen before do_swap_page() get called or while in
the call to do_swap_page(). In that later case, the check done when
swapin_readahead() returns will detect that case.

The worst case would be that a page fault is occuring on 2 threads at the
same time on the same swapped out page. In that case one thread will take
much time looping in __read_swap_cache_async(). But in the regular page
fault path, this is even worse since the thread would wait for semaphore to
be released before starting anything.

Cheers,
Laurent.



Re: [RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-07-07 Thread Balbir Singh
On Fri, 2017-06-16 at 19:52 +0200, Laurent Dufour wrote:
> From: Peter Zijlstra 
> 
> One of the side effects of speculating on faults (without holding
> mmap_sem) is that we can race with free_pgtables() and therefore we
> cannot assume the page-tables will stick around.
> 
> Remove the relyance on the pte pointer.
 ^^ reliance

Looking at the changelog and the code the impact is not clear.
It looks like after this patch we always assume the pte is not
the same. What is the impact of this patch?

Balbir Singh.



[RFC v5 01/11] mm: Dont assume page-table invariance during faults

2017-06-16 Thread Laurent Dufour
From: Peter Zijlstra 

One of the side effects of speculating on faults (without holding
mmap_sem) is that we can race with free_pgtables() and therefore we
cannot assume the page-tables will stick around.

Remove the relyance on the pte pointer.

Signed-off-by: Peter Zijlstra (Intel) 
---
 mm/memory.c | 27 ---
 1 file changed, 27 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2e65df1831d9..fd952f05e016 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2102,30 +2102,6 @@ int apply_to_page_range(struct mm_struct *mm, unsigned 
long addr,
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);
 
-/*
- * handle_pte_fault chooses page fault handler according to an entry which was
- * read non-atomically.  Before making any commitment, on those architectures
- * or configurations (e.g. i386 with PAE) which might give a mix of unmatched
- * parts, do_swap_page must check under lock before unmapping the pte and
- * proceeding (but do_wp_page is only called after already making such a check;
- * and do_anonymous_page can safely check later on).
- */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-   pte_t *page_table, pte_t orig_pte)
-{
-   int same = 1;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-   if (sizeof(pte_t) > sizeof(unsigned long)) {
-   spinlock_t *ptl = pte_lockptr(mm, pmd);
-   spin_lock(ptl);
-   same = pte_same(*page_table, orig_pte);
-   spin_unlock(ptl);
-   }
-#endif
-   pte_unmap(page_table);
-   return same;
-}
-
 static inline void cow_user_page(struct page *dst, struct page *src, unsigned 
long va, struct vm_area_struct *vma)
 {
debug_dma_assert_idle(src);
@@ -2682,9 +2658,6 @@ int do_swap_page(struct vm_fault *vmf)
int exclusive = 0;
int ret = 0;
 
-   if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
-   goto out;
-
entry = pte_to_swp_entry(vmf->orig_pte);
if (unlikely(non_swap_entry(entry))) {
if (is_migration_entry(entry)) {
-- 
2.7.4