Re: [RFC 1/2] page-flags: Make page lock operation atomic

2019-02-12 Thread Peter Zijlstra
On Tue, Feb 12, 2019 at 08:45:35AM +0100, Jan Kara wrote:
> On Mon 11-02-19 09:56:53, Matthew Wilcox wrote:
> > On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> > > On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > > > 
> > > > >> Signed-off-by: Chintan Pandya 
> > > > > 
> > > > > NAK.
> > > > > 
> > > > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > > > ops is tricky, but many are in places where we 'know' there can't be
> > > > > concurrency.
> > > > > 
> > > > > If you can show any single one is wrong, we can fix that one, but we're
> > > > > not going to blanket remove all this just because.
> > > > 
> > > > Not quite familiar with the below stack, but from the crash dump I
> > > > found that this was another stack running on some other CPU at the
> > > > same time, which also updates the page cache LRU and manipulates locks.
> > > > 
> > > > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > > > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > > > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > > > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > > > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > > > [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> > > > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > > > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > > > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > > > 
> > > > Not entirely sure if it's racing with the crashing stack or if it simply
> > > > overrides the bit set by case 2 (mentioned in 0/2).
> > > 
> > > So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> > > seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
> > > getting reordered into __add_to_page_cache_locked() after the page is
> > > actually added to the xarray. So that one particular instance might
> > > benefit from an atomic SetPageLocked or a barrier somewhere between
> > > __SetPageLocked() and the actual addition of the entry into the xarray.
> > 
> > There's a write barrier when you add something to the XArray, by virtue
> > of the call to rcu_assign_pointer().
> 
> OK, I've missed rcu_assign_pointer(). Thanks for the correction... but...
> rcu_assign_pointer() is __smp_store_release(&p, v) and that on x86 seems to
> be:
> 
> barrier();  \
> WRITE_ONCE(*p, v);  \
> 
> which seems to provide a compiler barrier but not an SMP barrier? So is x86
> store ordering strong enough to make writes appear in the right order? So far
> I didn't think so... What am I missing?

X86 is TSO, and that is strong enough for this. The only visible
reordering on TSO is due to the store-buffer, that is: writes may happen
after later reads.
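
Concretely, the classic store-buffer litmus test shows the one reordering
TSO permits (a hypothetical two-CPU sketch for illustration, not kernel
code):

	int x = 0, y = 0;
	int r0, r1;

	/* CPU 0 */			/* CPU 1 */
	WRITE_ONCE(x, 1);		WRITE_ONCE(y, 1);
	r0 = READ_ONCE(y);		r1 = READ_ONCE(x);

	/* On x86, r0 == 0 && r1 == 0 is a permitted outcome: each CPU's
	 * store can sit in its store buffer while the later load runs.
	 * Store-store reordering, which the xarray publish relies on
	 * avoiding, cannot happen under TSO, so the compiler barrier in
	 * __smp_store_release() is sufficient there. */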



Re: [RFC 1/2] page-flags: Make page lock operation atomic

2019-02-11 Thread Jan Kara
On Mon 11-02-19 09:56:53, Matthew Wilcox wrote:
> On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> > On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > > 
> > > >> Signed-off-by: Chintan Pandya 
> > > > 
> > > > NAK.
> > > > 
> > > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > > ops is tricky, but many are in places where we 'know' there can't be
> > > > concurrency.
> > > > 
> > > > If you can show any single one is wrong, we can fix that one, but we're
> > > > not going to blanket remove all this just because.
> > > 
> > > Not quite familiar with the below stack, but from the crash dump I found
> > > that this was another stack running on some other CPU at the same time,
> > > which also updates the page cache LRU and manipulates locks.
> > > 
> > > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > > [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> > > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > > 
> > > Not entirely sure if it's racing with the crashing stack or if it simply
> > > overrides the bit set by case 2 (mentioned in 0/2).
> > 
> > So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> > seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
> > getting reordered into __add_to_page_cache_locked() after the page is
> > actually added to the xarray. So that one particular instance might
> > benefit from an atomic SetPageLocked or a barrier somewhere between
> > __SetPageLocked() and the actual addition of the entry into the xarray.
> 
> There's a write barrier when you add something to the XArray, by virtue
> of the call to rcu_assign_pointer().

OK, I've missed rcu_assign_pointer(). Thanks for the correction... but...
rcu_assign_pointer() is __smp_store_release(&p, v) and that on x86 seems to
be:

barrier();  \
WRITE_ONCE(*p, v);  \

which seems to provide a compiler barrier but not an SMP barrier? So is x86
store ordering strong enough to make writes appear in the right order? So far
I didn't think so... What am I missing?

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [RFC 1/2] page-flags: Make page lock operation atomic

2019-02-11 Thread Matthew Wilcox
On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > 
> > >> Signed-off-by: Chintan Pandya 
> > > 
> > > NAK.
> > > 
> > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > ops is tricky, but many are in places where we 'know' there can't be
> > > concurrency.
> > > 
> > > If you can show any single one is wrong, we can fix that one, but we're
> > > not going to blanket remove all this just because.
> > 
> > Not quite familiar with the below stack, but from the crash dump I found
> > that this was another stack running on some other CPU at the same time,
> > which also updates the page cache LRU and manipulates locks.
> > 
> > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > 
> > Not entirely sure if it's racing with the crashing stack or if it simply
> > overrides the bit set by case 2 (mentioned in 0/2).
> 
> So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
> getting reordered into __add_to_page_cache_locked() after the page is
> actually added to the xarray. So that one particular instance might
> benefit from an atomic SetPageLocked or a barrier somewhere between
> __SetPageLocked() and the actual addition of the entry into the xarray.

There's a write barrier when you add something to the XArray, by virtue
of the call to rcu_assign_pointer().
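
A minimal sketch of the publish step, assuming it bottoms out in
rcu_assign_pointer() as described (the real path goes through xas_store()
and the XArray node internals):

	/* Simplified illustration, not the literal XArray code. */
	rcu_assign_pointer(*slot, page);	/* release store */

Because rcu_assign_pointer() has release semantics, every store before it
in program order -- including the __SetPageLocked() store to page->flags --
becomes visible to lockless lookups no later than the entry itself.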


Re: [RFC 1/2] page-flags: Make page lock operation atomic

2019-02-11 Thread Jan Kara
On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > 
> >> Signed-off-by: Chintan Pandya 
> > 
> > NAK.
> > 
> > This is bound to regress some stuff. Now agreed that using non-atomic
> > ops is tricky, but many are in places where we 'know' there can't be
> > concurrency.
> > 
> > If you can show any single one is wrong, we can fix that one, but we're
> > not going to blanket remove all this just because.
> 
> Not quite familiar with the below stack, but from the crash dump I found
> that this was another stack running on some other CPU at the same time,
> which also updates the page cache LRU and manipulates locks.
> 
> [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> [84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
> [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> 
> Not entirely sure if it's racing with the crashing stack or if it simply
> overrides the bit set by case 2 (mentioned in 0/2).

So this is interesting. Looking at __add_to_page_cache_locked(), nothing
seems to prevent __SetPageLocked(page) in add_to_page_cache_lru() from
getting reordered into __add_to_page_cache_locked() after the page is
actually added to the xarray. So that one particular instance might
benefit from an atomic SetPageLocked or a barrier somewhere between
__SetPageLocked() and the actual addition of the entry into the xarray.
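
To make the suspected window concrete, the sequence in question (abridged
from add_to_page_cache_lru() in mm/filemap.c, as touched by the patch
below):

	__SetPageLocked(page);		/* plain, non-atomic store to page->flags */
	ret = __add_to_page_cache_locked(page, mapping, offset,
					 gfp_mask, &shadow);
	/* If the PG_locked store were reordered after the store publishing
	 * the page in the xarray, a lockless lookup could find the page in
	 * the cache while still observing it unlocked. */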

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [RFC 1/2] page-flags: Make page lock operation atomic

2019-02-11 Thread Linux Upstream


On 11/02/19 7:16 PM, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 12:53:53PM +, Chintan Pandya wrote:
>> Currently, the page lock operation is non-atomic, which opens
>> some scope for a race condition. For example, if 2 threads access
>> the same page flags, our desired thread's page lock bit
>> (PG_locked) might get overwritten by the other thread, leaving
>> the page unlocked. This can cause issues later when some code
>> expects the page to be locked but it is not.
>>
>> Make the page lock/unlock operations use the atomic version of
>> the set_bit API. There are other flag set operations which still
>> use the non-atomic version of the set_bit API. But that might be
>> a change for the future.
>>
>> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
> 
> That doesn't belong in patches.

Sure. That's a miss. Will fix this.

> 
>> Signed-off-by: Chintan Pandya 
> 
> NAK.
> 
> This is bound to regress some stuff. Now agreed that using non-atomic
> ops is tricky, but many are in places where we 'know' there can't be
> concurrency.
> 
> If you can show any single one is wrong, we can fix that one, but we're
> not going to blanket remove all this just because.

Not quite familiar with the below stack, but from the crash dump I found
that this was another stack running on some other CPU at the same time,
which also updates the page cache LRU and manipulates locks.

[84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
[84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
[84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
[84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
[84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
[84415.344636] [20190123_21:27:50.786324]@1 __do_page_cache_readahead+0x16c/0x280
[84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
[84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
[84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88

Not entirely sure if it's racing with the crashing stack or if it simply
overrides the bit set by case 2 (mentioned in 0/2).


Re: [RFC 1/2] page-flags: Make page lock operation atomic

2019-02-11 Thread Peter Zijlstra
On Mon, Feb 11, 2019 at 12:53:53PM +, Chintan Pandya wrote:
> Currently, the page lock operation is non-atomic, which opens
> some scope for a race condition. For example, if 2 threads access
> the same page flags, our desired thread's page lock bit
> (PG_locked) might get overwritten by the other thread, leaving
> the page unlocked. This can cause issues later when some code
> expects the page to be locked but it is not.
> 
> Make the page lock/unlock operations use the atomic version of
> the set_bit API. There are other flag set operations which still
> use the non-atomic version of the set_bit API. But that might be
> a change for the future.
> 
> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e

That doesn't belong in patches.

> Signed-off-by: Chintan Pandya 

NAK.

This is bound to regress some stuff. Now agreed that using non-atomic
ops is tricky, but many are in places where we 'know' there can't be
concurrency.

If you can show any single one is wrong, we can fix that one, but we're
not going to blanket remove all this just because.


[RFC 1/2] page-flags: Make page lock operation atomic

2019-02-11 Thread Chintan Pandya
Currently, the page lock operation is non-atomic, which opens
some scope for a race condition. For example, if 2 threads access
the same page flags, our desired thread's page lock bit
(PG_locked) might get overwritten by the other thread, leaving
the page unlocked. This can cause issues later when some code
expects the page to be locked but it is not.

Make the page lock/unlock operations use the atomic version of
the set_bit API. There are other flag set operations which still
use the non-atomic version of the set_bit API. But that might be
a change for the future.
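
For reference, a simplified sketch of what the two flavors boil down to
(assumption: the real helpers are generated by the __PAGEFLAG/PAGEFLAG
macros in include/linux/page-flags.h; this is not the literal expansion):

	static inline void __SetPageLocked(struct page *page)
	{
		__set_bit(PG_locked, &page->flags);	/* plain read-modify-write */
	}

	static inline void SetPageLocked(struct page *page)
	{
		set_bit(PG_locked, &page->flags);	/* atomic RMW, e.g. LOCK BTS on x86 */
	}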

Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
Signed-off-by: Chintan Pandya 
---
 fs/cifs/file.c | 8 
 fs/pipe.c  | 2 +-
 include/linux/page-flags.h | 2 +-
 include/linux/pagemap.h| 6 +++---
 mm/filemap.c   | 4 ++--
 mm/khugepaged.c| 2 +-
 mm/ksm.c   | 2 +-
 mm/memory-failure.c| 2 +-
 mm/memory.c| 2 +-
 mm/migrate.c   | 2 +-
 mm/shmem.c | 6 +++---
 mm/swap_state.c| 4 ++--
 mm/vmscan.c| 2 +-
 13 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 7d6539a04fac..23bcdee37239 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3661,13 +3661,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 * should have access to this page, we're safe to simply set
 * PG_locked without checking it first.
 */
-   __SetPageLocked(page);
+   SetPageLocked(page);
rc = add_to_page_cache_locked(page, mapping,
  page->index, gfp);
 
/* give up if we can't stick it in the cache */
if (rc) {
-   __ClearPageLocked(page);
+   ClearPageLocked(page);
return rc;
}
 
@@ -3688,9 +3688,9 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
if (*bytes + PAGE_SIZE > rsize)
break;
 
-   __SetPageLocked(page);
+   SetPageLocked(page);
if (add_to_page_cache_locked(page, mapping, page->index, gfp)) {
-   __ClearPageLocked(page);
+   ClearPageLocked(page);
break;
}
list_move_tail(&page->lru, tmplist);
diff --git a/fs/pipe.c b/fs/pipe.c
index 8ef7d7bef775..1bab40a2ca44 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -147,7 +147,7 @@ static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
if (page_count(page) == 1) {
if (memcg_kmem_enabled())
memcg_kmem_uncharge(page, 0);
-   __SetPageLocked(page);
+   SetPageLocked(page);
return 0;
}
return 1;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5af67406b9c9..a56a9bd4bc6b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -268,7 +268,7 @@ static inline int TestClearPage##uname(struct page *page) { return 0; }
 #define TESTSCFLAG_FALSE(uname)					\
	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 51a9a0af3281..87a0447cfbe0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -619,17 +619,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
- * the page is new, so we can just run __SetPageLocked() against it.
+ * the page is new, so we can just run SetPageLocked() against it.
  */
 static inline int add_to_page_cache(struct page *page,
struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
 {
int error;
 
-   __SetPageLocked(page);
+   SetPageLocked(page);
error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
if (unlikely(error))
-   __ClearPageLocked(page);
+   ClearPageLocked(page);
return error;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e09304af1ec..14284726cf3a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -807,11 +807,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
void *shadow = NULL;
int ret;
 
-   __SetPageLocked(page);
+   SetPageLocked(page);
ret = __add_to_page_cache_locked(page, mapping, offset,
				 gfp_mask, &shadow);
if (unlikely(ret))
-   __ClearPageLocked(page);
+   ClearPageLocked(page);