Public bug reported:

SRU Justification
-----------------

[Impact]
Certain sequences of file system operations on a cephfs volume backed by 
fscache with an ext4 store can cause a kernel BUG:


[ 5818.932770] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 5818.934354] IP: jbd2__journal_start+0x33/0x1e0
...
[ 5818.962490] Call Trace:
[ 5818.963055] ? ext4_writepages+0x5d5/0xf40
[ 5818.963884] __ext4_journal_start_sb+0x6d/0x120
[ 5818.964994] ext4_writepages+0x5d5/0xf40
[ 5818.965991] ? __enqueue_entity+0x5c/0x60
[ 5818.966791] ? check_preempt_wakeup+0x130/0x240
[ 5818.967679] do_writepages+0x4b/0xe0
[ 5818.968625] ? ext4_mark_inode_dirty+0x1d0/0x1d0
[ 5818.969526] ? do_writepages+0x4b/0xe0
[ 5818.970493] ? ext4_statfs+0x114/0x260
[ 5818.971267] __filemap_fdatawrite_range+0xc1/0x100
[ 5818.972425] ? __filemap_fdatawrite_range+0xc1/0x100
[ 5818.973385] filemap_write_and_wait+0x31/0x90
[ 5818.974461] ext4_bmap+0x8c/0xe0
[ 5818.975150] cachefiles_read_or_alloc_pages+0x1bf/0xd90 [cachefiles]
[ 5818.976718] ? _cond_resched+0x19/0x40
[ 5818.977482] ? wake_up_bit+0x42/0x50
[ 5818.978227] ? fscache_run_op.isra.8+0x4c/0x80 [fscache]
[ 5818.979249] __fscache_read_or_alloc_pages+0x1d3/0x2e0 [fscache]
[ 5818.980397] ceph_readpages_from_fscache+0x6c/0xe0 [ceph]
[ 5818.981630] ceph_readpages+0x49/0x100 [ceph]
[ 5818.982691] __do_page_cache_readahead+0x1c9/0x2c0
[ 5818.983628] ? __cap_is_valid+0x21/0xb0 [ceph]
[ 5818.984526] ondemand_readahead+0x11a/0x2a0
[ 5818.985374] ? ondemand_readahead+0x11a/0x2a0
[ 5818.986825] page_cache_async_readahead+0x71/0x80
[ 5818.987751] generic_file_read_iter+0x784/0xbf0
[ 5818.988663] ? ceph_put_cap_refs+0x1c4/0x330 [ceph]
[ 5818.989620] ? page_cache_tree_insert+0xe0/0xe0
[ 5818.990519] ceph_read_iter+0x106/0x820 [ceph]
[ 5818.991818] new_sync_read+0xe4/0x130
[ 5818.992588] __vfs_read+0x29/0x40
[ 5818.993504] vfs_read+0x8e/0x130
[ 5818.994192] SyS_read+0x55/0xc0
[ 5818.994870] do_syscall_64+0x73/0x130
[ 5818.995632] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

[Fix]
Cherry-pick 5d988308283ecf062fa88f20ae05c52cce0bcdca from upstream.

This patch stops cephfs from reusing current->journal_info for its own
internal use, so the field is still valid when ext4 reads it via fscache
(the jbd2__journal_start frame in the trace above).

[Testcase]
A user has been using the following test case:
( cat /proc/fs/fscache/stats > ~/test.log; i=0; while true; do
    touch small; echo 3 > /proc/sys/vm/drop_caches & md5sum small; let "i++";
    if ! (( $i % 1000 )); then
        echo "Test iteration $i done" >> ~/test.log;
        cat /proc/fs/fscache/stats >> ~/test.log;
    fi;
done ) > ~/nohup.out 2>&1

(It boils down to "touch file; drop caches; read file")
Without the patch, this triggers the BUG very quickly: usually on the first
iteration, and always within a few. With the patch, the user ran this loop for
over 60 hours without incident.
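
To make the pass/fail criterion of the loop above checkable, the kernel log
can be scanned for the crash signature quoted under [Impact]. The following is
only a sketch: the function name oops_in_jbd2 is hypothetical, and matching on
these two lines assumes the log format shown in this report.

```shell
# oops_in_jbd2 (hypothetical helper): read kernel log text on stdin and
# succeed only if a NULL-pointer-dereference BUG line is followed, at some
# later point, by an IP line inside jbd2__journal_start.
oops_in_jbd2() {
    awk '
        /BUG: unable to handle kernel NULL pointer dereference/ { bug = 1 }
        bug && /IP: jbd2__journal_start/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, "dmesg | oops_in_jbd2 && echo reproduced" could be run once the
loop stops making progress.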

[Regression potential]
The change is not trivial, but is limited to cephfs, and has been in mainline 
since v4.16. So the risk of regression is well contained.
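
As a rough aid when deciding whether a given kernel might need this
cherry-pick, the release string can be compared against v4.16. This is a
sketch under assumptions: the function names are hypothetical, GNU "sort -V"
is assumed to be available, and the check only looks at the mainline base
version. A distribution kernel with a lower version may already carry the
backport, so a negative result means "check the changelog", not "affected".

```shell
# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# mainline_has_fix RELEASE: succeed if RELEASE (as printed by `uname -r`)
# is based on mainline v4.16 or later. The "-29-generic" style suffix is
# stripped before comparing.
mainline_has_fix() {
    version_ge "${1%%-*}" "4.16"
}
```

Typical use would be: mainline_has_fix "$(uname -r)" || echo "base < v4.16".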

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Daniel Axtens (daxtens)
         Status: Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1783246

Title:
  Cephfs + fscache: unable to handle kernel NULL pointer dereference at
  0000000000000000 IP: jbd2__journal_start+0x22/0x1f0

Status in linux package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1783246/+subscriptions
