** Changed in: ubuntu-power-systems
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757497

Title:
  Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
  namespaces (Bolt / NVMe)

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  ---Problem Description---
  We are seeing a similar I/O hang on some namespaces when running HTX against 16 namespaces on Ubuntu 18.04.
   
  ---uname output---
  Linux ltciofvtr-spoon4 4.15.0-10-generic #11-Ubuntu SMP Tue Feb 13 18:21:52 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  (Bolt / NVMe) 0003:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa [144d:a822] (rev 01)
    
  Machine Type = AC922 
   
  ---Steps to Reproduce---
  1> Install Ubuntu 18.04 and upgrade to the 4.15.0-10 kernel
  2> Install htxubuntu-472.deb
  3> Create the namespaces with the following script:
  #!/bin/bash

  device=/dev/nvme0
  echo $device

  nvme format $device

  nvme set-feature $device -f 0x0b --value=0x0100

  nvme delete-ns $device -n 0xFFFFFFFF
  sleep 5
  nvme list

  nvme get-log $device -l 200 -i 4

  max=$(nvme id-ctrl $device | grep ^nn | awk '{print $NF}')

  for i in $(seq 1 "$max")
  do
      echo $i
      nvme create-ns $device --nsze=7000000 --ncap=7000000 --flbas=0 --dps=0
      nvme attach-ns $device --namespace-id=$i --controllers=$(nvme list-ctrl $device | awk -F: '{print $2}')
      sleep 2
      nvme get-log $device -l 200 -i 4
      sleep 2
  done
  nvme list

  4> Run mdt.hd on those namespaces
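
  Before starting mdt.hd it may help to confirm that the namespace script
  produced the expected number of block devices. The following is only a
  sketch: the helper name count_namespaces and its dev_dir parameter are
  illustrative and not part of the original report, and it assumes the
  namespaces appear as /dev/nvme0n1, /dev/nvme0n2, and so on.

```shell
#!/bin/bash
# Count namespace block devices for one controller (e.g. nvme0n1 .. nvme0n16).
# dev_dir defaults to /dev; it is a parameter only so the helper can be
# tried against a scratch directory.
count_namespaces() {
    local ctrl=$1 dev_dir=${2:-/dev}
    local re="^${ctrl}n[0-9]+\$"   # matches nvme0n4, not partitions like nvme0n4p1
    local count=0 path name
    for path in "$dev_dir/$ctrl"n*; do
        name=${path##*/}
        if [[ -e $path && $name =~ $re ]]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

  For the setup in the title, `count_namespaces nvme0` would be expected
  to print 16 once all namespaces are attached.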
   
  Contact Information = naveed...@in.ibm.com 
   
  Stack trace output:
   ---------------------------------------------------------------------

  ---------------------------------------------------------------------        
  Device id:/dev/nvme0n8      
  Timestamp:Feb 20 16:57:30 2018        
  err=ffffffff
  sev=1
  Exerciser Name:hxestorage            
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available        
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

          
  ---------------------------------------------------------------------

  ---------------------------------------------------------------------         
              
  Device id:/dev/nvme0n10     
  Timestamp:Feb 20 16:57:36 2018                       
  err=ffffffff
  sev=1
  Exerciser Name:hxestorage                           
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available                       
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1,  Detected 1 I/O(s) hung.
  Current time: 1519163856; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x8161
         1st lba       Blocks       Kernel    Hang   Duration
          (Hex)        (Hex)        Thread    Cnt    (Secs)
  ** Threshold of 1800 secs on one or more I/Os exceeded!
          0x5ae08b         8     7e0457eaf180      4    4800 

                         
  ---------------------------------------------------------------------

  ---------------------------------------------------------------------        
  Device id:/dev/nvme0n10     
  Timestamp:Feb 20 16:57:36 2018        
  err=ffffffff
  sev=1
  Exerciser Name:hxestorage            
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available        
  Device:Not Available
  Error Text:Hardware Exerciser stopped on error

          
  ---------------------------------------------------------------------

  ---------------------------------------------------------------------         
              
  Device id:/dev/nvme0n4      
  Timestamp:Feb 20 17:14:19 2018                       
  err=ffffffff
  sev=4
  Exerciser Name:hxestorage                           
  Serial No:Not Available
  Part No:Not Available
  Location:Not Available
  FRU Number:Not Available                       
  Device:Not Available
  Error Text:Hung I/O alert! Segment table-1,  Detected 1 I/O(s) hung.
  Current time: 1519164859; hang criteria: 600 secs, Hard hang threshold: 3
  Process ID: 0x815b
         1st lba       Blocks       Kernel    Hang   Duration
          (Hex)        (Hex)        Thread    Cnt    (Secs)
          0x398a7e         2     71d5affff180      3    3000 

                         
  ---------------------------------------------------------------------
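
  The "Current time" values in the hung I/O alerts above appear to be Unix
  epoch seconds. A small sketch for converting them so they can be lined
  up with the "Timestamp:" fields follows; the helper name epoch_to_utc is
  illustrative, and the conversion relies on GNU date.

```shell
#!/bin/bash
# Convert a Unix epoch value to a UTC timestamp laid out like the HTX
# "Timestamp:" field. Requires GNU date for the -d @SECONDS syntax.
epoch_to_utc() {
    LC_ALL=C date -u -d "@$1" '+%b %d %H:%M:%S %Y'
}

epoch_to_utc 1519163856   # "Current time" from the /dev/nvme0n10 alert
epoch_to_utc 1519164859   # "Current time" from the /dev/nvme0n4 alert
```

  These print Feb 20 21:57:36 2018 and Feb 20 22:14:19 2018 in UTC, five
  hours ahead of the logged timestamps, which would be consistent with
  the test system logging in a UTC-5 local time zone.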

  [17643.202114] INFO: task hxestorage:39744 blocked for more than 120 seconds.
  [17643.202180]       Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.202342] hxestorage      D    0 39744   3424 0x00040000
  [17643.202346] Call Trace:
  [17643.202352] [c00020382bc4b660] [c00020382bc4b6b0] 0xc00020382bc4b6b0 (unreliable)
  [17643.202360] [c00020382bc4b830] [c00000000001c080] __switch_to+0x2a0/0x4d0
  [17643.202364] [c00020382bc4b890] [c000000000cfce84] __schedule+0x2a4/0xaf0
  [17643.202366] [c00020382bc4b960] [c000000000cfd710] schedule+0x40/0xc0
  [17643.202370] [c00020382bc4b980] [c00000000014dffc] io_schedule+0x2c/0x50
  [17643.202376] [c00020382bc4b9b0] [c00000000042bf94] __blkdev_direct_IO_simple+0x1d4/0x3e0
  [17643.202379] [c00020382bc4bae0] [c00000000042c500] blkdev_direct_IO+0x360/0x540
  [17643.202384] [c00020382bc4bbb0] [c0000000002dc1f8] generic_file_direct_write+0xc8/0x240
  [17643.202387] [c00020382bc4bc20] [c0000000002dc47c] __generic_file_write_iter+0x10c/0x2a0
  [17643.202391] [c00020382bc4bc80] [c00000000042da3c] blkdev_write_iter+0xac/0x160
  [17643.202394] [c00020382bc4bcf0] [c0000000003cc3f4] new_sync_write+0x104/0x160
  [17643.202397] [c00020382bc4bd80] [c0000000003cfb38] vfs_write+0xd8/0x220
  [17643.202401] [c00020382bc4bdd0] [c0000000003d00b4] SyS_pwrite64+0xc4/0xf0
  [17643.202405] [c00020382bc4be30] [c00000000000b184] system_call+0x58/0x6c
  [17643.202408] INFO: task hxestorage:39748 blocked for more than 120 seconds.
  [17643.202519]       Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.202692] hxestorage      D    0 39748   3424 0x00040000
  [17643.202695] Call Trace:
  [17643.202697] [c00020382bc6f660] [c00020382bc6f6b0] 0xc00020382bc6f6b0 (unreliable)
  [17643.202701] [c00020382bc6f830] [c00000000001c080] __switch_to+0x2a0/0x4d0
  [17643.202703] [c00020382bc6f890] [c000000000cfce84] __schedule+0x2a4/0xaf0
  [17643.202705] [c00020382bc6f960] [c000000000cfd710] schedule+0x40/0xc0
  [17643.202708] [c00020382bc6f980] [c00000000014dffc] io_schedule+0x2c/0x50
  [17643.202711] [c00020382bc6f9b0] [c00000000042bf94] __blkdev_direct_IO_simple+0x1d4/0x3e0
  [17643.202714] [c00020382bc6fae0] [c00000000042c500] blkdev_direct_IO+0x360/0x540
  [17643.202717] [c00020382bc6fbb0] [c0000000002dc1f8] generic_file_direct_write+0xc8/0x240
  [17643.202720] [c00020382bc6fc20] [c0000000002dc47c] __generic_file_write_iter+0x10c/0x2a0
  [17643.202723] [c00020382bc6fc80] [c00000000042da3c] blkdev_write_iter+0xac/0x160
  [17643.202726] [c00020382bc6fcf0] [c0000000003cc3f4] new_sync_write+0x104/0x160
  [17643.202729] [c00020382bc6fd80] [c0000000003cfb38] vfs_write+0xd8/0x220
  [17643.202732] [c00020382bc6fdd0] [c0000000003d00b4] SyS_pwrite64+0xc4/0xf0
  [17643.202735] [c00020382bc6fe30] [c00000000000b184] system_call+0x58/0x6c
  [17643.202740] INFO: task hxestorage:39917 blocked for more than 120 seconds.
  [17643.202809]       Not tainted 4.15.0-10-generic #11-Ubuntu
  [17643.202882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [17643.203013] hxestorage      D    0 39917   3424 0x00040000
  [17643.203015] Call Trace:
  [17643.203017] [c00020382bcd3720] [0000003c00000000] 0x3c00000000 (unreliable)
  [17643.203021] [c00020382bcd38f0] [c00000000001c080] __switch_to+0x2a0/0x4d0
  [17643.203023] [c00020382bcd3950] [c000000000cfce84] __schedule+0x2a4/0xaf0
  [17643.203025] [c00020382bcd3a20] [c000000000cfd710] schedule+0x40/0xc0
  [17643.203027] [c00020382bcd3a40] [c00000000014dffc] io_schedule+0x2c/0x50
  [17643.203030] [c00020382bcd3a70] [c00000000042bf94] __blkdev_direct_IO_simple+0x1d4/0x3e0
  [17643.203033] [c00020382bcd3ba0] [c00000000042c500] blkdev_direct_IO+0x360/0x540
  [17643.203036] [c00020382bcd3c70] [c0000000002dbfdc] generic_file_read_iter+0xbc/0x210
  [17643.203040] [c00020382bcd3cd0] [c00000000042d1e0] blkdev_read_iter+0x50/0x80
  [17643.203043] [c00020382bcd3cf0] [c0000000003cc290] new_sync_read+0x100/0x160
  [17643.203046] [c00020382bcd3d80] [c0000000003cf74c] vfs_read+0xbc/0x1b0
  [17643.203049] [c00020382bcd3dd0] [c0000000003cffc4] SyS_pread64+0xc4/0xf0
  [17643.203052] [c00020382bcd3e30] [c00000000000b184] system_call+0x58/0x6c
  [17643.203056] INFO: task hxestorage:40049 blocked for more than 120 seconds.
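
  When many hung-task reports like the ones above accumulate, a short
  filter can reduce them to the affected task names and PIDs. This is a
  sketch only: the helper name list_hung_tasks is illustrative, and it
  assumes the "INFO: task NAME:PID blocked for more than" wording seen in
  this log.

```shell
#!/bin/bash
# Read kernel log text on stdin and print one "name pid" pair for each
# hung-task report, with duplicates removed.
list_hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p' | sort -u
}
```

  Typical use would be `dmesg | list_hung_tasks`; on the excerpt above it
  would list hxestorage with PIDs 39744, 39748, 39917 and 40049.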

  Possible patch being reviewed for this issue:

  http://linuxppc.10917.n7.nabble.com/PATCH-powerpc-64s-Fix-lost-pending-interrupt-due-to-race-causing-lost-update-to-irq-happened-td135119.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1757497/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
