Hi Colin,

It seems to happen on 2 different controllers (both LSI). I'm using the
9201-16e at the moment and it performs much faster overall, but doing
something like a zfs scrub on a pool still causes the resets, and thus
the zfs locks that come with them. Basically, it seems to happen under
heavy IO load, although this controller can handle much heavier IO than
just a single scrub. It also seems much more likely to happen on my 3.5"
2TB and 3TB HDDs than on my 2.5" 1TB SSDs. The SSDs scream through a
scrub in about an hour, whereas the HDDs take a day or more.

The only thing I can think of is to increase direct cooling on the
controller in case it's overheating. But this is a Dell R710 server
chassis with plenty of high-volume airflow.

It's really hard to pin the problem down to the controller, the driver,
or the filesystem.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1889110

Title:
  zfs pool locks and see "INFO: task txg_sync:4307 blocked for more than
  120 seconds. "

Status in zfs-linux package in Ubuntu:
  Incomplete

Bug description:
  The ZFS filesystem becomes unresponsive, and the NFS shares
  subsequently become unresponsive too. ESXi sees all paths down.

  This error appears 3 times in a row.

  
  [184383.479511] INFO: task txg_sync:4307 blocked for more than 120 seconds.
  [184383.479565]       Tainted: P          IO      5.4.0-42-generic #46-Ubuntu
  [184383.479607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [184383.479655] txg_sync        D    0  4307      2 0x80004000
  [184383.479658] Call Trace:
  [184383.479670]  __schedule+0x2e3/0x740
  [184383.479673]  schedule+0x42/0xb0
  [184383.479676]  schedule_timeout+0x152/0x2f0
  [184383.479683]  ? __next_timer_interrupt+0xe0/0xe0
  [184383.479685]  io_schedule_timeout+0x1e/0x50
  [184383.479697]  __cv_timedwait_common+0x15e/0x1c0 [spl]
  [184383.479702]  ? wait_woken+0x80/0x80
  [184383.479710]  __cv_timedwait_io+0x19/0x20 [spl]
  [184383.479816]  zio_wait+0x11b/0x230 [zfs]
  [184383.479905]  ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184383.479983]  dsl_pool_sync+0xbc/0x410 [zfs]
  [184383.480069]  spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
  [184383.480156]  spa_sync+0x312/0x5b0 [zfs]
  [184383.480245]  txg_sync_thread+0x27a/0x310 [zfs]
  [184383.480334]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
  [184383.480344]  thread_generic_wrapper+0x83/0xa0 [spl]
  [184383.480347]  kthread+0x104/0x140
  [184383.480356]  ? clear_bit+0x20/0x20 [spl]
  [184383.480358]  ? kthread_park+0x90/0x90
  [184383.480361]  ret_from_fork+0x35/0x40


  Then nfsd hangs as well.

  
  [184866.787445] INFO: task nfsd:6585 blocked for more than 120 seconds.
  [184866.787485]       Tainted: P          IO      5.4.0-42-generic #46-Ubuntu
  [184866.787526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [184866.787573] nfsd            D    0  6585      2 0x80004000
  [184866.787575] Call Trace:
  [184866.787578]  __schedule+0x2e3/0x740
  [184866.787675]  ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184866.787678]  schedule+0x42/0xb0
  [184866.787685]  cv_wait_common+0x133/0x180 [spl]
  [184866.787688]  ? wait_woken+0x80/0x80
  [184866.787695]  __cv_wait+0x15/0x20 [spl]
  [184866.787764]  dmu_tx_wait+0x1ee/0x210 [zfs]
  [184866.787834]  dmu_tx_assign+0x49/0x70 [zfs]
  [184866.787929]  zfs_write+0x461/0xd40 [zfs]
  [184866.788025]  ? atomic_sub_return.constprop.0+0xd/0x20 [zfs]
  [184866.788033]  ? atomic_dec+0xd/0x20 [spl]
  [184866.788116]  ? __raw_spin_unlock+0x9/0x10 [zfs]
  [184866.788122]  ? __d_obtain_alias+0x36/0x90
  [184866.788217]  zpl_write_common_iovec+0xad/0x120 [zfs]
  [184866.788313]  zpl_iter_write_common+0x8e/0xb0 [zfs]
  [184866.788409]  zpl_iter_write+0x56/0x90 [zfs]
  [184866.788413]  do_iter_readv_writev+0x14f/0x1d0
  [184866.788416]  do_iter_write+0x84/0x1a0
  [184866.788418]  vfs_iter_write+0x19/0x30
  [184866.788442]  nfsd_vfs_write+0xe0/0x480 [nfsd]
  [184866.788454]  nfsd_write+0x7a/0x160 [nfsd]
  [184866.788458]  ? kmem_cache_alloc+0x16d/0x230
  [184866.788472]  nfsd3_proc_write+0xc3/0x170 [nfsd]
  [184866.788483]  nfsd_dispatch+0xd6/0x220 [nfsd]
  [184866.788508]  svc_process_common+0x3af/0x700 [sunrpc]
  [184866.788527]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
  [184866.788538]  ? nfsd_svc+0x2d0/0x2d0 [nfsd]
  [184866.788557]  svc_process+0xd9/0x110 [sunrpc]
  [184866.788568]  nfsd+0xe8/0x150 [nfsd]
  [184866.788570]  kthread+0x104/0x140
  [184866.788581]  ? nfsd_destroy+0x60/0x60 [nfsd]
  [184866.788583]  ? kthread_park+0x90/0x90
  [184866.788585]  ret_from_fork+0x35/0x40


  Linux zfs-01 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC
  2020 x86_64 x86_64 x86_64 GNU/Linux

  root@zfs-01:/# lsb_release -rd
  Description:    Ubuntu 20.04 LTS
  Release:        20.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1889110/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
