Cosmic request: https://lists.ubuntu.com/archives/kernel-team/2018-September/095337.html
https://bugs.launchpad.net/bugs/1787089

Title:
  [AEP-bug] ext4: more rare direct I/O vs unmap failures

Status in intel:
  Triaged
Status in linux package in Ubuntu:
  In Progress

Bug description:
  Description:
  Even with the ext4_break_layouts() support added by "ext4: handle
  layout changes to pinned DAX mappings", we are still seeing occasional
  cases with the unit test where we are truncating a page that has an
  elevated reference count. Investigate.

  The root cause of this issue is that while ei->i_mmap_sem provides
  synchronization between ext4_break_layouts() and page faults, it
  doesn't synchronize us with the direct I/O path. This exact same issue
  exists in XFS AFAICT, with the synchronization tool there being the
  XFS_MMAPLOCK.

  This allows the direct I/O path to do I/O and raise & lower
  page->_refcount while we're executing a truncate/hole punch. This
  leads to us trying to free a page with an elevated refcount.

  Here's one instance of the race:

  CPU 0                                   CPU 1
  -----                                   -----
  ext4_punch_hole()
    ext4_break_layouts() # all pages have refcount=1

                                          ext4_direct_IO()
                                            ... lots of layers ...
                                            follow_page_pte()
                                              get_page() # elevates refcount

    truncate_pagecache_range()
     ... a few layers ...
     dax_disassociate_entry() # sees elevated refcount, WARN_ON_ONCE()

  A similar race occurs when the refcount is being dropped while we're
  running ext4_break_layouts(), and this is the one that my test was
  actually hitting:

  CPU 0                                   CPU 1
  -----                                   -----
                                          ext4_direct_IO()
                                            ... lots of layers ...
                                            follow_page_pte()
                                              get_page()
                                              # elevates refcount of page X
  ext4_punch_hole()
    ext4_break_layouts() # two pages, X & Y, have refcount == 2
      __wait_var_event() # called for page X

                                            __put_devmap_managed_page()
                                            # drops refcount of X to 1

     # __wait_var_event() checks X's refcount in "if (condition)", and
     # breaks. We never actually called ext4_wait_dax_page(), so 'retry'
     # in ext4_break_layouts() is still false. Exit do/while loop in
     # ext4_break_layouts(), never attempting to wait on page Y, which
     # still has an elevated refcount of 2.

    truncate_pagecache_range()
     ... a few layers ...
     dax_disassociate_entry() # sees elevated refcount for Y, WARN_ON_ONCE()

  Essentially, the solution will most likely involve adding
  synchronization between the direct I/O path and truncate/hole punch
  type operations, and it'll need to happen for both ext4 and XFS, so
  the filesystem folks need to be involved.
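
  The second race can be reproduced in a small, single-threaded
  user-space model, which may make the control flow easier to follow.
  This is only a sketch under stated assumptions: it is not kernel code,
  every name in it (model_page, model_find_busy_page,
  model_break_layouts, model_run, RACE_DEMO_MAIN) is hypothetical, and
  the concurrent reference drop on CPU 1 is simulated by a plain
  decrement at the point where the race window would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a DAX page: only the reference count matters here. */
struct model_page {
    const char *name;
    int refcount;   /* 1 == idle, >1 == pinned (e.g. by direct I/O) */
};

/* Hypothetical helper: return the first page that is still pinned. */
static struct model_page *model_find_busy_page(struct model_page *pages,
                                               size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].refcount > 1)
            return &pages[i];
    return NULL;
}

/*
 * Model of the loop shape described in the report: 'retry' is only set
 * on the wait path, and the wait path is only taken if the condition is
 * still false at the moment it is checked.
 */
static void model_break_layouts(struct model_page *pages, size_t n,
                                bool racing_put)
{
    bool retry;

    assert(pages != NULL);
    do {
        retry = false;
        struct model_page *page = model_find_busy_page(pages, n);
        if (!page)
            return;

        /* Simulated race window: the other CPU drops its reference
         * after the scan but before the condition check. */
        if (racing_put)
            page->refcount--;

        if (page->refcount != 1) {
            /* Wait path: assume the I/O completes while we sleep. */
            page->refcount = 1;
            retry = true;
        }
        /* Otherwise the condition was already true: no wait happened,
         * so 'retry' stays false and the loop exits without rescanning. */
    } while (retry);
}

/* Runs the two-page scenario (X and Y both pinned) and returns the
 * final refcount of page Y. */
int model_run(bool racing_put)
{
    struct model_page pages[] = { { "X", 2 }, { "Y", 2 } };

    model_break_layouts(pages, 2, racing_put);
    return pages[1].refcount;
}

#ifdef RACE_DEMO_MAIN
int main(void)
{
    printf("no race:   Y refcount = %d\n", model_run(false));
    printf("with race: Y refcount = %d\n", model_run(true));
    return 0;
}
#endif
```

  Building with -DRACE_DEMO_MAIN gives a small demo program. Without the
  simulated drop, the model waits on X, rescans, waits on Y, and both
  pages end up idle. With the drop, the condition for X is already true
  when it is checked, the wait path is skipped, and the loop exits while
  Y is still pinned, which corresponds to the state in which
  dax_disassociate_entry() would warn in the trace above.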