Private bug reported:

Description:
Even with the ext4_break_layouts() support added by "ext4: handle layout
changes to pinned DAX mappings", we are still seeing occasional cases in
the unit test where we truncate a page that has an elevated reference
count. Investigate.

—

The root cause of this issue is that while ei->i_mmap_sem provides
synchronization between ext4_break_layouts() and page faults, it doesn't
synchronize us with the direct I/O path. This exact same issue exists
in XFS AFAICT, with the synchronization tool there being the XFS_MMAPLOCK.

This allows the direct I/O path to do I/O and raise & lower page->_refcount
while we're executing a truncate/hole punch. This leads to us trying to free
a page with an elevated refcount.

Here's one instance of the race:

CPU 0                                   CPU 1
-----                                   -----
ext4_punch_hole()
  ext4_break_layouts() # all pages have refcount=1

                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page() # elevates refcount

  truncate_pagecache_range()
    ... a few layers ...
    dax_disassociate_entry() # sees elevated refcount, WARN_ON_ONCE()
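
The interleaving above is a check-then-act window: the refcount is checked
in one step and acted on in a later one, with nothing stopping direct I/O
from pinning the page in between. A minimal userspace sketch of that window
follows. Every toy_* name is hypothetical and only models the refcount
checks described in this report; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; only the reference count is modeled. */
struct toy_page {
	int refcount;
};

/* Models the check made by ext4_break_layouts(): nobody has the page pinned. */
static bool toy_layout_is_idle(const struct toy_page *page)
{
	return page->refcount == 1;
}

/* Models get_page() being reached from the direct I/O path. */
static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

/* Models dax_disassociate_entry(): true if its WARN_ON_ONCE() would fire. */
static bool toy_truncate_warns(const struct toy_page *page)
{
	return page->refcount > 1;
}

/*
 * Replays the interleaving step by step on a single thread.
 * Returns true if the truncate step would warn.
 */
bool toy_replay_race(bool dio_runs_in_window)
{
	struct toy_page page = { .refcount = 1 };

	/* CPU 0: ext4_break_layouts() finds nothing pinned and returns. */
	assert(toy_layout_is_idle(&page));

	/* CPU 1: direct I/O pins the page inside the unprotected window. */
	if (dio_runs_in_window)
		toy_get_page(&page);

	/* CPU 0: truncate_pagecache_range() reaches the refcount check. */
	return toy_truncate_warns(&page);
}
```

In this model toy_replay_race(true) reports the warning and
toy_replay_race(false) does not, so the only difference between the two
outcomes is whether the pin lands between the check and the truncate.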

A similar race occurs when the refcount is being dropped while we're running
ext4_break_layouts(), and this is the one that my test was actually hitting:

CPU 0                                   CPU 1
-----                                   -----
                                        ext4_direct_IO()
                                          ... lots of layers ...
                                          follow_page_pte()
                                            get_page()
                                            # elevates refcount of page X
ext4_punch_hole()
  ext4_break_layouts() # two pages, X & Y, have refcount == 2
    __wait_var_event() # called for page X

                                        __put_devmap_managed_page()
                                        # drops refcount of X to 1

    # __wait_var_event() checks X's refcount in "if (condition)", and breaks.
    # We never actually called ext4_wait_dax_page(), so 'retry' in
    # ext4_break_layouts() is still false. Exit do/while loop in
    # ext4_break_layouts(), never attempting to wait on page Y which still has
    # an elevated refcount of 2.

  truncate_pagecache_range()
    ... a few layers ...
    dax_disassociate_entry() # sees elevated refcount for Y, WARN_ON_ONCE()
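
The second race hinges on the 'retry' flag being set only inside the wait
callback. The sketch below is a single-threaded userspace model of that
control flow, written under the assumption that the loop has the shape
described above: find a busy page, wait on it, repeat only while 'retry' is
set. All toy_* names are hypothetical stand-ins, and the model assumes the
pin on a page is released while we sleep in the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 2

/* Toy stand-in for struct page; only the reference count is modeled. */
struct toy_page {
	int refcount;
};

static struct toy_page toy_pages[TOY_NPAGES];	/* [0] is X, [1] is Y */
static bool toy_drop_x_early;	/* CPU 1 unpins X just before the check */

/* Stand-in for dax_layout_busy_page(): first page with refcount > 1. */
static struct toy_page *toy_busy_page(void)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (toy_pages[i].refcount > 1)
			return &toy_pages[i];
	return NULL;
}

/* Stand-in for ext4_wait_dax_page(): the only place 'retry' is set. */
static void toy_wait_dax_page(struct toy_page *page, bool *retry)
{
	assert(page->refcount > 1);	/* only reached while still pinned */
	*retry = true;
	page->refcount = 1;	/* assumption: I/O finishes while we sleep */
}

/* Stand-in for the wait macro: the condition is tested before the callback. */
static void toy_wait_var_event(struct toy_page *page, bool *retry)
{
	if (toy_drop_x_early && page == &toy_pages[0])
		page->refcount = 1;	/* CPU 1 drops its reference on X */

	if (page->refcount == 1)
		return;		/* fast path: callback skipped, retry untouched */

	toy_wait_dax_page(page, retry);
}

/* Models the do/while loop: rescan only if the callback asked for a retry. */
static void toy_break_layouts(void)
{
	struct toy_page *page;
	bool retry;

	do {
		retry = false;
		page = toy_busy_page();
		if (!page)
			return;
		toy_wait_var_event(page, &retry);
	} while (retry);
}

/* Returns how many pages are still pinned when the loop gives up. */
int toy_run_scenario(bool drop_x_early)
{
	int pinned = 0;

	toy_pages[0].refcount = 2;
	toy_pages[1].refcount = 2;
	toy_drop_x_early = drop_x_early;

	toy_break_layouts();

	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (toy_pages[i].refcount > 1)
			pinned++;
	return pinned;
}
```

With the early drop, the loop exits after looking only at X and leaves Y
pinned, so toy_run_scenario(true) returns 1. Without it, the callback runs
for X, sets 'retry', and the rescan picks up Y, so toy_run_scenario(false)
returns 0.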

The solution will most likely involve adding synchronization between the
direct I/O path and truncate/hole punch type operations. It will need to
happen for both ext4 and XFS, so the filesystem folks need to be involved.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: intel-kernel-18.10

** Information type changed from Public to Private

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1787089

Title:
  [AEP-bug] ext4: more rare direct I/O vs unmap failures

Status in linux package in Ubuntu:
  New

