[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-12-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #27 from Gaetan Trellu (gaetan.tre...@incloudus.com) ---
Compiling the hpsa kernel module from SourceForge on Ubuntu 16.04 with
kernel 4.4 solved the issue for us.

Steps:
# apt-get install dkms build-essential
# tar xjvf hpsa-3.4.20-141.tar.bz2
# cd hpsa-3.4.20/drivers/
# sudo cp -a scsi /usr/src/hpsa-3.4.20.141
# dkms add -m hpsa -v 3.4.20.141
# dkms build -m hpsa -v 3.4.20.141
# dkms install -m hpsa -v 3.4.20.141
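After the DKMS install, it can help to confirm the module is actually registered before rebooting. A minimal sketch: the sample string below stands in for a live `dkms status hpsa` call, so the parsing can be shown without the tool installed.

```shell
# Check a dkms status line for the "installed" state. The sample string is a
# stand-in for: status=$(dkms status hpsa); the parsing is the same either way.
status='hpsa, 3.4.20.141, 4.4.0-96-generic, x86_64: installed'
case "$status" in
  *": installed"*) result=installed ;;
  *) result=missing ;;
esac
echo "dkms check: $result"
```

On a live system, replace the sample string with the real `dkms status hpsa` output; once the module is loaded, `modinfo -F version hpsa` should report the matching version.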

Link: https://sourceforge.net/projects/cciss/

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-10-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #26 from Gaetan Trellu (gaetan.tre...@incloudus.com) ---
Moved from Ubuntu 16.04.5 to CentOS 7.5 with the hpsa kernel module
(kmod-hpsa-3.4.20-141.rhel7u5.x86_64.rpm) from the HPE website.

It has been running without a kernel panic for more than a week.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-09-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #25 from Gaetan Trellu (gaetan.tre...@incloudus.com) ---
More logs.
[5.272077] HP HPSA Driver (v 3.4.14-0)
[5.340589] hpsa :03:00.0: can't disable ASPM; OS doesn't have ASPM
control
[5.352372] hpsa :03:00.0: MSI-X capable controller
[5.358775] hpsa :03:00.0: Logical aborts not supported
[5.410577] scsi host6: hpsa
[5.620173] hpsa :03:00.0: scsi 6:3:0:0: added RAID  HP 
 P440ar   controller SSDSmartPathCap- En- Exp=1
[5.633345] hpsa :03:00.0: scsi 6:0:0:0: masked Direct-Access ATA   
  TK0120GDJXT  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.651921] hpsa :03:00.0: scsi 6:0:1:0: masked Direct-Access ATA   
  TK0120GDJXT  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.682879] ata6.00: ATA-9: VR0120GEJXL, 4IWTHPG0, max UDMA/100
[5.682891] ata5.00: ATA-9: VR0120GEJXL, 4IWTHPG0, max UDMA/100
[5.800257] hpsa :03:00.0: scsi 6:0:2:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.813417] hpsa :03:00.0: scsi 6:0:3:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.826488] hpsa :03:00.0: scsi 6:0:4:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.839558] hpsa :03:00.0: scsi 6:0:5:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.852628] hpsa :03:00.0: scsi 6:0:6:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.865698] hpsa :03:00.0: scsi 6:0:7:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.878769] hpsa :03:00.0: scsi 6:0:8:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.891839] hpsa :03:00.0: scsi 6:0:9:0: masked Direct-Access ATA   
  MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.904910] hpsa :03:00.0: scsi 6:0:10:0: masked Direct-Access ATA  
   MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.918076] hpsa :03:00.0: scsi 6:0:11:0: masked Direct-Access ATA  
   MB3000GCWDB  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.931242] hpsa :03:00.0: scsi 6:0:12:0: masked Direct-Access ATA  
   TK0120GDJXT  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.92] hpsa :03:00.0: scsi 6:0:13:0: masked Direct-Access ATA  
   TK0120GDJXT  PHYS DRV SSDSmartPathCap- En- Exp=0
[5.957609] hpsa :03:00.0: scsi 6:0:14:0: masked Enclosure HPE  
   12G SAS Exp Card enclosure SSDSmartPathCap- En- Exp=0
[5.970871] hpsa :03:00.0: scsi 6:1:0:0: added Direct-Access HP 
 LOGICAL VOLUME   RAID-1(+0) SSDSmartPathCap+ En+ Exp=1
[5.984038] hpsa :03:00.0: scsi 6:1:0:1: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
[5.996822] hpsa :03:00.0: scsi 6:1:0:2: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
[6.009606] hpsa :03:00.0: scsi 6:1:0:3: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.022391] hpsa :03:00.0: scsi 6:1:0:4: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.035176] hpsa :03:00.0: scsi 6:1:0:5: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.047960] hpsa :03:00.0: scsi 6:1:0:6: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.060759] hpsa :03:00.0: scsi 6:1:0:7: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.073545] hpsa :03:00.0: scsi 6:1:0:8: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.086329] hpsa :03:00.0: scsi 6:1:0:9: added Direct-Access HP 
 LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.099113] hpsa :03:00.0: scsi 6:1:0:10: added Direct-Access HP
  LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.111991] hpsa :03:00.0: scsi 6:1:0:11: added Direct-Access HP
  LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.124869] hpsa :03:00.0: scsi 6:1:0:12: added Direct-Access HP
  LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[6.138251] scsi 6:0:0:0: RAID  HP   P440ar   6.60
PQ: 0 ANSI: 5
[6.147610] scsi 6:1:0:0: Direct-Access HP   LOGICAL VOLUME   6.60
PQ: 0 ANSI: 5
[6.156967] scsi 6:1:0:1: Direct-Access HP   LOGICAL VOLUME   6.60
PQ: 0 ANSI: 5
[6.171837] scsi 6:1:0:2: Direct-Access HP   LOGICAL VOLUME   6.60
PQ: 0 ANSI: 5
[6.181197] scsi 6:1:0:3: Direct-Access HP   LOGICAL VOLUME   6.60
PQ: 0 ANSI: 5
[6.190653] scsi 6:1:0:4: Direct-Access HP   LOGICAL VOLUME   6.60
PQ: 0 ANSI: 5

[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-08-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

goldyfruit (gaetan.tre...@incloudus.com) changed:

   What|Removed |Added

 CC||gaetan.tre...@incloudus.com

--- Comment #24 from goldyfruit (gaetan.tre...@incloudus.com) ---
Same behavior here with controllers P440ar and P420i on DL480 G8 and DL480p G8.

Firmware:
  - P440ar: 6.60
  - P420i: 8.32

[128958.979859] hpsa :03:00.0: scsi 0:1:0:9: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[129170.663840] INFO: task scsi_eh_0:446 blocked for more than 120 seconds.
[129170.671251]   Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.678176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[129170.686930] scsi_eh_0   D0   446  2 0x8000
[129170.686934] Call Trace:
[129170.686945]  __schedule+0x3d6/0x8b0
[129170.686947]  schedule+0x36/0x80
[129170.686950]  schedule_timeout+0x1db/0x370
[129170.686954]  ? __dev_printk+0x3c/0x80
[129170.686956]  ? dev_printk+0x56/0x80
[129170.686959]  io_schedule_timeout+0x1e/0x50
[129170.686961]  wait_for_completion_io+0xb4/0x140
[129170.686965]  ? wake_up_q+0x70/0x70
[129170.686972]  hpsa_scsi_do_simple_cmd.isra.56+0xc7/0xf0 [hpsa]
[129170.686975]  hpsa_eh_device_reset_handler+0x3bb/0x790 [hpsa]
[129170.686978]  ? sched_clock_cpu+0x11/0xb0
[129170.686983]  ? scsi_device_put+0x2b/0x30
[129170.686987]  scsi_eh_ready_devs+0x368/0xc10
[129170.686993]  ? __pm_runtime_resume+0x5b/0x80
[129170.686995]  scsi_error_handler+0x4c3/0x5c0
[129170.687000]  kthread+0x105/0x140
[129170.687003]  ? scsi_eh_get_sense+0x240/0x240
[129170.687005]  ? kthread_destroy_worker+0x50/0x50
[129170.687012]  ? do_syscall_64+0x73/0x130
[129170.687015]  ? SyS_exit_group+0x14/0x20
[129170.687017]  ret_from_fork+0x35/0x40
[129170.687021] INFO: task jbd2/sda1-8:636 blocked for more than 120 seconds.
[129170.694649]   Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.701598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[129170.710343] jbd2/sda1-8 D0   636  2 0x8000
[129170.710346] Call Trace:
[129170.710349]  __schedule+0x3d6/0x8b0
[129170.710351]  ? bit_wait+0x60/0x60
[129170.710352]  schedule+0x36/0x80
[129170.710354]  io_schedule+0x16/0x40
[129170.710359]  bit_wait_io+0x11/0x60
[129170.710362]  __wait_on_bit+0x63/0x90
[129170.710367]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.710373]  ? bit_waitqueue+0x40/0x40
[129170.710377]  __wait_on_buffer+0x32/0x40
[129170.710381]  jbd2_journal_commit_transaction+0xdf6/0x1760
[129170.710387]  kjournald2+0xc8/0x250
[129170.710392]  ? kjournald2+0xc8/0x250
[129170.710395]  ? wait_woken+0x80/0x80
[129170.710398]  kthread+0x105/0x140
[129170.710399]  ? commit_timeout+0x20/0x20
[129170.710402]  ? kthread_destroy_worker+0x50/0x50
[129170.710404]  ? do_syscall_64+0x73/0x130
[129170.710407]  ? SyS_exit_group+0x14/0x20
[129170.710412]  ret_from_fork+0x35/0x40
[129170.710423] INFO: task rs:main Q:Reg:2907 blocked for more than 120
seconds.
[129170.718358]   Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.725305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[129170.734076] rs:main Q:Reg   D0  2907  1 0x
[129170.734079] Call Trace:
[129170.734082]  __schedule+0x3d6/0x8b0
[129170.734086]  ? bit_waitqueue+0x40/0x40
[129170.734087]  ? bit_wait+0x60/0x60
[129170.734089]  schedule+0x36/0x80
[129170.734091]  io_schedule+0x16/0x40
[129170.734092]  bit_wait_io+0x11/0x60
[129170.734094]  __wait_on_bit+0x63/0x90
[129170.734096]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.734098]  ? bit_waitqueue+0x40/0x40
[129170.734100]  do_get_write_access+0x202/0x410
[129170.734102]  jbd2_journal_get_write_access+0x51/0x70
[129170.734107]  __ext4_journal_get_write_access+0x3b/0x80
[129170.734111]  ext4_reserve_inode_write+0x95/0xc0
[129170.734115]  ? ext4_dirty_inode+0x48/0x70
[129170.734117]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.734119]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.734121]  ext4_dirty_inode+0x48/0x70
[129170.734125]  __mark_inode_dirty+0x184/0x3b0
[129170.734129]  generic_update_time+0x7b/0xd0
[129170.734132]  ? current_time+0x32/0x70
[129170.734134]  file_update_time+0xbe/0x110
[129170.734140]  __generic_file_write_iter+0x9d/0x1f0
[129170.734142]  ext4_file_write_iter+0xc4/0x3f0
[129170.734147]  ? futex_wake+0x90/0x170
[129170.734151]  new_sync_write+0xe5/0x140
[129170.734155]  __vfs_write+0x29/0x40
[129170.734156]  vfs_write+0xb8/0x1b0
[129170.734158]  SyS_write+0x55/0xc0
[129170.734160]  do_syscall_64+0x73/0x130
[129170.734163]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.734165] RIP: 0033:0x7feefa9394bd
[129170.734166] RSP: 002b:7feef7ce8600 EFLAGS: 0293 ORIG_RAX:
0001
[129170.734168] RAX: ffda RBX: 7feeec00d120 RCX:
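The hung-task warnings above come from the kernel's hung-task watchdog; the log itself suggests `echo 0 > /proc/sys/kernel/hung_task_timeout_secs` to silence them. A persistent variant while debugging long resets (illustrative sysctl fragment only — this hides the symptom, it is not a fix for the reset hang):

```
# /etc/sysctl.d/99-hung-task.conf (illustrative)
# Raise the watchdog from the default 120 s so multi-minute resets do not
# flood the log; setting it to 0 disables the warning entirely.
kernel.hung_task_timeout_secs = 600
```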

[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-05-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #23 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
Oh, I forgot to mention that before hpsa took any action, I had several
errors on the disk where badblocks was running:
...
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Sense Key : Medium Error
[current] 
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Add. Sense: Unrecovered
read error
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 CDB: Read(16) 88 00 00 00
00 01 37 0c c5 b0 00 00 00 08 00 00
[Mon Apr 30 22:21:18 2018] print_req_error: critical medium error, dev sdt,
sector 5218551216
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] Unaligned partial completion
(resid=242, sector_sz=512)
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Sense Key : Medium Error
[current] 
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Add. Sense: Unrecovered
read error
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 CDB: Read(16) 88 00 00 00
00 01 37 0c c5 b8 00 00 00 08 00 00
[Mon Apr 30 22:21:18 2018] print_req_error: critical medium error, dev sdt,
sector 5218551224
[Tue May  1 06:27:37 2018] hpsa :08:00.0: aborted: LUN:00c03901
CDB:12003100
[Tue May  1 06:27:37 2018] hpsa :08:00.0: hpsa_update_device_info: inquiry
failed, device will be skipped.
[Tue May  1 06:27:37 2018] hpsa :08:00.0: scsi 0:0:50:0: removed
Direct-Access ATA  MB4000GCWDC  PHYS DRV SSDSmartP
athCap- En- Exp=0
[Tue May  1 06:28:24 2018] hpsa :08:00.0: aborted: LUN:00c03901
CDB:12003100
[Tue May  1 06:28:24 2018] hpsa :08:00.0: hpsa_update_device_info: inquiry
failed, device will be skipped.
...



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-05-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #22 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
Created attachment 275723
  --> https://bugzilla.kernel.org/attachment.cgi?id=275723&action=edit
Load on server during reset problem



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-05-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #21 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
I have reproduced the problem.
Here are the conditions I used:

Kernel: 4.16.3-041603-generic
hpsa: 3.4.20-125, with the patch to use a local work-queue instead of the
system work-queue.

I needed to run badblocks in a read-only test on a disk that had failed
before:

~# while :; do badblocks -v -b 4096 -s /dev/sdt; done
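A slightly instrumented variant of that loop makes it easier to line passes up against dmesg later. This is only a sketch: `READ_CMD` is a made-up knob (not part of the original command) that defaults to the badblocks invocation above, and the loop stops at the first failing pass instead of spinning forever.

```shell
# Timestamp each badblocks pass and stop on the first failure, so the failing
# pass can be correlated with the "resetting logical" messages in dmesg.
READ_CMD=${READ_CMD:-"badblocks -v -b 4096 -s /dev/sdt"}
pass=0
while :; do
  pass=$((pass + 1))
  printf '%s pass %d starting\n' "$(date -u +%FT%TZ)" "$pass"
  $READ_CMD || break   # non-zero exit (I/O error, missing tool) ends the loop
done
echo "stopped after pass $pass"
```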

Several days later, the bug appeared.
You'll find a graph of the load in an attachment.
Before the reset, I got an "hpsa_update_device_info: inquiry failed" message
and a stack trace from badblocks (the latter seems logical).

Load: 850

[Tue May  1 06:27:37 2018] hpsa :08:00.0: aborted: LUN:00c03901
CDB:12003100
[Tue May  1 06:27:37 2018] hpsa :08:00.0: hpsa_update_device_info: inquiry
failed, device will be skipped.
[Tue May  1 06:27:37 2018] hpsa :08:00.0: scsi 0:0:50:0: removed
Direct-Access ATA  MB4000GCWDC  PHYS DRV SSDSmartPathCap- En- Exp=0
[Tue May  1 06:28:24 2018] hpsa :08:00.0: aborted: LUN:00c03901
CDB:12003100
[Tue May  1 06:28:24 2018] hpsa :08:00.0: hpsa_update_device_info: inquiry
failed, device will be skipped.
[Tue May  1 06:29:51 2018] INFO: task badblocks:46824 blocked for more than 120
seconds.
[Tue May  1 06:29:51 2018]   Tainted: G   OE   
4.16.3-041603-generic #201804190730
[Tue May  1 06:29:51 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[Tue May  1 06:29:51 2018] badblocks   D0 46824  48728 0x0004
[Tue May  1 06:29:51 2018] Call Trace:
[Tue May  1 06:29:51 2018]  __schedule+0x297/0x880
[Tue May  1 06:29:51 2018]  ? iov_iter_get_pages+0xc0/0x2c0
[Tue May  1 06:29:51 2018]  schedule+0x2c/0x80
[Tue May  1 06:29:51 2018]  io_schedule+0x16/0x40
[Tue May  1 06:29:51 2018]  __blkdev_direct_IO_simple+0x1ff/0x360
[Tue May  1 06:29:51 2018]  ? bdget+0x120/0x120
[Tue May  1 06:29:51 2018]  blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:29:51 2018]  ? blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:29:51 2018]  ? current_time+0x32/0x70
[Tue May  1 06:29:51 2018]  ? __atime_needs_update+0x7f/0x190
[Tue May  1 06:29:51 2018]  generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:29:51 2018]  ? __blkdev_direct_IO_simple+0x360/0x360
[Tue May  1 06:29:51 2018]  ? generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:29:51 2018]  ? __wake_up+0x13/0x20
[Tue May  1 06:29:51 2018]  ? tty_ldisc_deref+0x16/0x20
[Tue May  1 06:29:51 2018]  ? tty_write+0x1fb/0x320
[Tue May  1 06:29:51 2018]  blkdev_read_iter+0x35/0x40
[Tue May  1 06:29:51 2018]  __vfs_read+0xfb/0x170
[Tue May  1 06:29:51 2018]  vfs_read+0x8e/0x130
[Tue May  1 06:29:51 2018]  SyS_read+0x55/0xc0
[Tue May  1 06:29:51 2018]  do_syscall_64+0x73/0x130
[Tue May  1 06:29:51 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Tue May  1 06:29:51 2018] RIP: 0033:0x7fe31b97c330
[Tue May  1 06:29:51 2018] RSP: 002b:7fffcea10258 EFLAGS: 0246
ORIG_RAX: 
[Tue May  1 06:29:51 2018] RAX: ffda RBX: 026e1980 RCX:
7fe31b97c330
[Tue May  1 06:29:51 2018] RDX: 0004 RSI: 7fe31c26e000 RDI:
0003
[Tue May  1 06:29:51 2018] RBP: 1000 R08: 26e19800 R09:
7fffcea10008
[Tue May  1 06:29:51 2018] R10: 7fffcea10020 R11: 0246 R12:
0003
[Tue May  1 06:29:51 2018] R13: 7fe31c26e000 R14: 0040 R15:
0004
[Tue May  1 06:31:52 2018] INFO: task badblocks:46824 blocked for more than 120
seconds.
[Tue May  1 06:31:52 2018]   Tainted: G   OE   
4.16.3-041603-generic #201804190730
[Tue May  1 06:31:52 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[Tue May  1 06:31:52 2018] badblocks   D0 46824  48728 0x0004
[Tue May  1 06:31:52 2018] Call Trace:
[Tue May  1 06:31:52 2018]  __schedule+0x297/0x880
[Tue May  1 06:31:52 2018]  ? iov_iter_get_pages+0xc0/0x2c0
[Tue May  1 06:31:52 2018]  schedule+0x2c/0x80
[Tue May  1 06:31:52 2018]  io_schedule+0x16/0x40
[Tue May  1 06:31:52 2018]  __blkdev_direct_IO_simple+0x1ff/0x360
[Tue May  1 06:31:52 2018]  ? bdget+0x120/0x120
[Tue May  1 06:31:52 2018]  blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:31:52 2018]  ? blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:31:52 2018]  ? current_time+0x32/0x70
[Tue May  1 06:31:52 2018]  ? __atime_needs_update+0x7f/0x190
[Tue May  1 06:31:52 2018]  generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:31:52 2018]  ? __blkdev_direct_IO_simple+0x360/0x360
[Tue May  1 06:31:52 2018]  ? generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:31:52 2018]  ? __wake_up+0x13/0x20
[Tue May  1 06:31:52 2018]  ? tty_ldisc_deref+0x16/0x20
[Tue May  1 06:31:52 2018]  ? tty_write+0x1fb/0x320
[Tue May  1 06:31:52 2018]  blkdev_read_iter+0x35/0x40
[Tue May  1 06:31:52 2018]  __vfs_read+0xfb/0x170
[Tue May  1 06:31:52 2018]  vfs_read+0x8e/0x130
[Tue May  1 06:31:52 2018]  

[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-26 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #20 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
So here are all my tests.
With the agent enabled, using the HP disk check commands (hpacucli/ssacli and
hpssacli) and launching an sg_reset, the reset completes without problems on
the problematic disk:

Apr 26 14:31:20 kernel: hpsa :08:00.0: scsi 0:1:0:0: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 26 14:31:21 kernel: hpsa :08:00.0: device is ready.
Apr 26 14:31:21 kernel: hpsa :08:00.0: scsi 0:1:0:0: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1

The reset only took 1 second.

The "bug" seems to appear only when the disk returns errors concerning
Unrecovered read error (when using badblocks read-only test by example).

I try to reproduce it.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #19 from lober...@redhat.com ---
I was concerned about the agents, but Anthony disabled them and still saw
this. I have seen this timeout sometimes when the agents probe via
passthrough.

I did just bump into this reset on a RHEL 7.5 kernel with no agents, but it
recovered almost immediately.
I need to chase that down.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #18 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
Unfortunately, I don't know what process was consuming the CPU cycles at the
time. I'll try to reproduce the problem to gather that information.

I'm not using sg_reset to test the LV reset; I am actually launching a
badblocks command on a problematic disk, and the reset is invoked when it
begins to fail.

I'll use sg_reset to reproduce the problem and test with and without the
agent. I invoke the agent every 5 minutes to check the controller and disk
states.

I'll keep you informed about my tests.

By the way, thank you for your help.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #17 from Don (don.br...@microsemi.com) ---
(In reply to Anthony Hausman from comment #16)
> Don,
> 
> So I'm actually running the kernel 4.16.3 (build 18-04-19) with the hpsa
> modules patch to use local work-queue insead of system work-queue.
> 
> I have a reproduce a reset with no stack trace (which is a good news).
> The only thing is between the resetting logical and the completation, 2
> hours passed and caused an heavy load on the server during this time:
> 
> Apr 25 01:31:09 kernel: hpsa :08:00.0: scsi 0:1:0:0: resetting logical 
> Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
> Apr 25 03:31:00 kernel: hpsa :08:00.0: device is ready.
> Apr 25 03:31:00 kernel: hpsa :08:00.0: scsi 0:1:0:0: reset logical 
> completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
> SSDSmartPathCap- En- Exp=1
> 
> The good thing after the reset has completed, this one is removed:
> 
> Apr 25 03:31:45 kernel: hpsa :08:00.0: scsi 0:1:0:0: removed
> Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1

The driver was notified by the P420i that the volume went offline, so the
driver removed it from the SML (SCSI midlayer).

> Apr 25 03:31:48 kernel: scsi 0:1:0:0: rejecting I/O to dead device

There were I/O requests for the device, but the SML detected that it had
been deleted.

> 
> So the question is if it's normal than the reset logical take such a long
> time (and causing trouble on the server)?

It is not normal.

For a Logical Volume reset, the P420i flushes out any outstanding I/O
requests and then returns. The SML should block any new requests from coming
down while the reset is in progress.

Do you know what process was consuming the CPU cycles?
ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort
-nk1 | head -20
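To answer that question after the fact, the ps snapshot can be taken repeatedly while the reset is in flight. A small sketch built around the invocation above; the sample count, interval, and column subset are arbitrary choices, not part of the original suggestion.

```shell
# Append a timestamped top-5 CPU snapshot to a temp log a few times; run this
# in the background during a reset window and inspect the log afterwards.
log=$(mktemp)
for i in 1 2 3; do
  { date -u +%FT%TZ
    ps -deo pid,pcpu,comm --sort=-pcpu | head -5
  } >>"$log"
  sleep 1
done
echo "wrote $(wc -l <"$log") lines to $log"
```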

Are you using sg_reset to test LV resets? Or does the device have some
intermittent issue that is causing the SML to issue the reset operation?

If you turn off the agents, do the resets complete more quickly?

I am wondering if the agents are frequently probing the P420i for changes when
the reset is active and the agents are consuming the CPU cycles.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #16 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
Don,

So I'm actually running kernel 4.16.3 (build 18-04-19) with the hpsa module
patched to use a local work-queue instead of the system work-queue.

I have reproduced a reset with no stack trace (which is good news).
The only issue is that two hours passed between the start of the logical
reset and its completion, causing a heavy load on the server during that
time:

Apr 25 01:31:09 kernel: hpsa :08:00.0: scsi 0:1:0:0: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 25 03:31:00 kernel: hpsa :08:00.0: device is ready.
Apr 25 03:31:00 kernel: hpsa :08:00.0: scsi 0:1:0:0: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1

The good thing is that after the reset completed, the device was removed:

Apr 25 03:31:45 kernel: hpsa :08:00.0: scsi 0:1:0:0: removed Direct-Access 
   HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 25 03:31:48 kernel: scsi 0:1:0:0: rejecting I/O to dead device

So the question is whether it is normal for the logical reset to take such a
long time (and cause trouble on the server)?
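For reference, the two-hour figure can be checked straight from the syslog timestamps quoted above (a sketch assuming GNU date; both stamps fall in the same year, which syslog's format leaves implicit).

```shell
# Elapsed time between "resetting logical" (01:31:09) and "device is ready"
# (03:31:00) from the log excerpt above.
start=$(date -d "Apr 25 01:31:09" +%s)
end=$(date -d "Apr 25 03:31:00" +%s)
elapsed=$((end - start))
echo "reset window: $((elapsed / 60)) min $((elapsed % 60)) s"
```

This should print 119 min 51 s, i.e. just under the two hours described.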



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #15 from Don (don.br...@microsemi.com) ---
(In reply to Anthony Hausman from comment #11)
> The only patch that I'm sure that I have is the "scsi: hpsa: fix selection
> of reply queue" one.
> For the I'm using an out of the box 4.11 kernel. So I'm really not sure that
> the other patches are present.
> 
> 
> Unfortunately, the module does not compile using 4.11.0-14-generic headers.
> 
> # make -C /lib/modules/4.11.0-14-generic/build M=$(pwd)
> --makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt"
> make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic'
> make -C /lib/modules/4.4.0-96-generic/build
> M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules
> make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic'
> make[2]: *** No rule to make target 'kernel/bounds.c', needed by
> 'kernel/bounds.s'.  Stop.
> Makefile:1423: recipe for target
> '_module_/usr/src/linux-headers-4.11.0-14-generic' failed
> make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2
> make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic'
> /root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for
> target 'default' failed
> make: *** [default] Error 2
> make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic'
> 
> But if you tell me the principal problem is using the 4.11 kernel, I can
> upgrade it to use the 4.16.3 kernel.
> 
> If I use it, must I use the out of box 3.4.20-136 hpsa driver or use your
> precedent patch on the last 3.4.20-125?



The 4.16.3 driver should be OK to use.

Were you unable to untar the sources I gave you in /tmp and build with make -f
Makefile.alt?

If you copy the source code into the kernel tree, you should be able to do:
make modules SUBDIRS=drivers/scsi hpsa.ko



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #14 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
Indeed I have a charged battery (capacitor) and the writeback-cache enabled.
I run the hp-health component too, I have already try to disable it on the 4.11
kernel and have reproduced the problem of load without it.

The cma related call trace up after the logical drive reset is called.

Right now, I test on a server the kernel 4.16.3-041603-generic with the hpsa
module with the patch to use local work-queue insead of system work-queue.

Right now I didn't reproduce the problem.
I had a disk with bad blocks (before launching a read-only test badblocks
returned a lot of block error) but since I have upgraded the kernel with the
patch hpsa module I have no more error.

I'm still trying to reproduce the problem by launching a badblocks read-only
test on the "ex-faulty" disk.

I'll tell you the result of the test.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #13 from lober...@redhat.com ---
Apr 18 01:29:16 kernel: cmaidad D0  3442  1 0x
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel:  __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel:  schedule+0x36/0x80
Apr 18 01:29:16 kernel:  scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel:  sg_open+0x14a/0x5c0

 * Likely a pass-through from the cma* management daemons.

Can you try to reproduce with all the HP Health daemons disabled?



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

lober...@redhat.com changed:

   What|Removed |Added

 CC||lober...@redhat.com

--- Comment #12 from lober...@redhat.com ---
We had a bunch of issues with hpsa, as already mentioned above.
The specific commit we had to revert was
8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef.

I assume your array has a charged battery (capacitor) and the writeback cache
is enabled on the P420i.

Are you only seeing this when you have cmaeventd running? It can use
pass-through commands and has been known to cause issues.
I am not running any of the HPE ProLiant SPP daemons on my system.

I have not seen this load-related issue (without those daemons running) on my
DL380 G7 or DL380 G8 here, so I will work on trying to reproduce and assist.

Thanks,
Laurence



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #11 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
The only patch that I'm sure I have is the "scsi: hpsa: fix selection of
reply queue" one.
Otherwise, I'm using an out-of-the-box 4.11 kernel, so I'm really not sure
the other patches are present.


Unfortunately, the module does not compile using 4.11.0-14-generic headers.

# make -C /lib/modules/4.11.0-14-generic/build M=$(pwd)
--makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt"
make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic'
make -C /lib/modules/4.4.0-96-generic/build
M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules
make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic'
make[2]: *** No rule to make target 'kernel/bounds.c', needed by
'kernel/bounds.s'.  Stop.
Makefile:1423: recipe for target
'_module_/usr/src/linux-headers-4.11.0-14-generic' failed
make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic'
/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for
target 'default' failed
make: *** [default] Error 2
make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic'
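The log shows the root of the failure: make recursed into
/lib/modules/4.4.0-96-generic/build (the running kernel's tree, picked up via
uname -r) while M= points at the 4.11 headers. A small sketch for spotting that mismatch before building; the target string is copied from the log above, and nothing here is part of Makefile.alt itself.

```shell
# Warn when the running kernel (what the makefile defaults to) differs from
# the kernel the headers are for; a mismatch reproduces the error above.
running=$(uname -r)
target="4.11.0-14-generic"
if [ "$running" != "$target" ]; then
  echo "mismatch: building for $target on $running; point make at the $target build dir explicitly"
fi
```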

But if you tell me the principal problem is using the 4.11 kernel, I can
upgrade to the 4.16.3 kernel.

If I do, should I use the out-of-box 3.4.20-136 hpsa driver, or your previous
patch on top of 3.4.20-125?



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-21 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #10 from Don (don.br...@microsemi.com) ---
Created attachment 275473
  --> https://bugzilla.kernel.org/attachment.cgi?id=275473&action=edit
Latest out of box hpsa driver.

This tar file contains our latest out-of-box driver.

1. tar xf hpsa-3.4.20-136.tar.bz2
2. cd hpsa-3.4.20/drivers/scsi
3. make -f Makefile.alt

If you are booted from hpsa, you will need to update your initrd and reboot.

If you are using hpsa for non-boot drives, you can
1. rmmod hpsa
2. insmod ./hpsa.ko



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-21 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #9 from Don (don.br...@microsemi.com) ---
When you applied the 4.16 hpsa driver patches, was this patch also applied?

commit 84676c1f21e8ff54befe985f4f14dc1edc10046b
Author: Christoph Hellwig 
Date:   Fri Jan 12 10:53:05 2018 +0800

genirq/affinity: assign vectors to all possible CPUs

Currently we assign managed interrupt vectors to all present CPUs.  This
works fine for systems where we only online/offline CPUs.  But in case of
systems that support physical CPU hotplug (or the virtualized version of
it) this means the additional CPUs covered for in the ACPI tables or on
the command line are not catered for.  To fix this we'd either need to
introduce new hotplug CPU states just for this case, or we can start
assigning vectors to possible but not present CPUs.

Reported-by: Christian Borntraeger 
Tested-by: Christian Borntraeger 
Tested-by: Stefan Haberland 
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
Cc: linux-ker...@vger.kernel.org
Cc: Thomas Gleixner 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 


The above patch is why the hpsa-fix-selection-of-reply-queue patch was needed.
If it is not applied, I would revert that patch, because it may be causing
your issues.
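The interaction between those two patches can be illustrated with a small,
purely hypothetical simulation (plain shell, no kernel interfaces; the CPU
counts and the block-affinity layout are invented for the example): once
vectors are spread over all *possible* CPUs, a vector can end up owning only
offline CPUs, and completions routed to it are never serviced.

```shell
# Hypothetical illustration (not kernel code): 8 possible CPUs, of which
# only CPUs 0-3 are online; 4 MSI-X vectors, each assigned a block of two
# possible CPUs. Vectors 2 and 3 own only offline CPUs, so any I/O whose
# completion is routed to them hangs forever.
online=4
for v in 0 1 2 3; do
  c0=$((v * 2)); c1=$((v * 2 + 1))
  live=no
  [ "$c0" -lt "$online" ] || [ "$c1" -lt "$online" ] && live=yes
  echo "vector $v -> CPUs $c0 $c1 (has online CPU: $live)"
done
```

Run as-is, the last two lines report "has online CPU: no" — the situation the
reply-queue fix below avoids by never picking such a vector.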


There was another patch required for the hpsa-fix-selection-of-reply-queue
patch:
scsi-introduce-force-blk-mq.


The errors shown in your logs indicate issues with DMA transfers of your
data. Unaligned partial completion errors usually point to problems with the
scatter/gather buffers that describe your data buffers. I would like to rule
out running the 4.16 hpsa driver in a 4.11 kernel.

Can you try our out-of-box driver?

I'll attach this to the BZ. You compile it with make -f Makefile.alt
The file name is hpsa-3.4.20-136.tar.bz2

commit 8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef
Author: Ming Lei 
Date:   Tue Mar 13 17:42:39 2018 +0800

scsi: hpsa: fix selection of reply queue

Since commit 84676c1f21e8 ("genirq/affinity: assign vectors to all
possible CPUs") we could end up with an MSI-X vector that did not have
any online CPUs mapped. This would lead to I/O hangs since there was no
CPU to receive the completion.

Retrieve IRQ affinity information using pci_irq_get_affinity() and use
this mapping to choose a reply queue.

[mkp: tweaked commit desc]

Cc: Hannes Reinecke 
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Laurence Oberman 
Cc: Meelis Roos 
Cc: Artem Bityutskiy 
Cc: Mike Snitzer 
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible
CPUs")
Signed-off-by: Ming Lei 
Tested-by: Laurence Oberman 
Tested-by: Don Brace 
Tested-by: Artem Bityutskiy 
Acked-by: Don Brace 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Martin K. Petersen 
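A rough userspace sketch of the selection logic this commit describes
(hypothetical, shell only; in the driver the mapping is built from
pci_irq_get_affinity(), and the affinity layout here is the invented one from
the example above): each online CPU is mapped back to the queue whose
affinity covers it, so the submitting CPU always lands on a live reply queue.

```shell
# Hypothetical sketch: with the block affinity "vector v <-> CPUs 2v, 2v+1",
# a per-CPU lookup guarantees the chosen reply queue covers the submitting
# (hence online) CPU, instead of hashing onto a queue whose CPUs may all
# be offline.
for cpu in 0 1 2 3; do          # only online CPUs ever submit I/O
  queue=$((cpu / 2))            # inverse of the affinity mapping
  echo "CPU $cpu -> reply queue $queue"
done
```

With only CPUs 0-3 online, only queues 0 and 1 are ever chosen, and both have
an online CPU to receive the completion.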



I believe this patch is also required.

commit cf2a0ce8d1c25c8cc4509874d270be8fc6026cc3
Author: Ming Lei 
Date:   Tue Mar 13 17:42:41 2018 +0800

scsi: introduce force_blk_mq

From scsi driver view, it is a bit troublesome to support both blk-mq
and non-blk-mq at the same time, especially when drivers need to support
multi hw-queue.

This patch introduces 'force_blk_mq' to scsi_host_template so that drivers
can provide blk-mq only support, so driver code can avoid the trouble
for supporting both.

Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Mike Snitzer 
Cc: Laurence Oberman 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-21 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #8 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
So I have reproduced the problem with the patched driver.
At the beginning, one disk returned a lot of blk_update_request: critical
medium error / Unrecovered read error messages, and afterwards the driver
triggered a logical reset on all disks.

On the first trigger, all the resets completed successfully, but on the third
reset of the problematic disk the system hung and the reset never completed.

The load on the server was lower at that time, but applications still seem to
have their I/O stuck.
And the faulty disk is still considered healthy by the HP utilities (ssacli).

Here is the stack trace:

[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] Unaligned partial completion
(resid=32, sector_sz=512)
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 Sense Key : Medium Error
[current] 
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 Add. Sense: Unrecovered
read error
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 CDB: Read(16) 88 00 00 00
00 02 36 46 b5 a8 00 00 04 00 00 00
[Fri Apr 20 20:56:58 2018] blk_update_request: critical medium error, dev sdp,
sector 9500538280
[Fri Apr 20 20:57:30 2018] hpsa :08:00.0: scsi 0:1:0:15: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 20:59:06 2018] hpsa :08:00.0: device is ready.
[Fri Apr 20 20:59:06 2018] hpsa :08:00.0: scsi 0:1:0:15: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] Unaligned partial completion
(resid=198, sector_sz=512)
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 Sense Key : Medium Error
[current] 
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 Add. Sense: Unrecovered
read error
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 CDB: Read(16) 88 00 00 00
00 02 36 46 b9 a8 00 00 04 00 00 00
[Fri Apr 20 21:00:05 2018] blk_update_request: critical medium error, dev sdp,
sector 9500539304
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] Unaligned partial completion
(resid=48, sector_sz=512)
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 Sense Key : Medium Error
[current] 
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 Add. Sense: Unrecovered
read error
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 CDB: Read(16) 88 00 00 00
00 02 36 46 a9 a8 00 00 04 00 00 00
[Fri Apr 20 21:00:56 2018] blk_update_request: critical medium error, dev sdp,
sector 9500535208
[Fri Apr 20 21:09:59 2018] hpsa :08:00.0: scsi 0:1:0:15: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 21:48:43 2018] hpsa :08:00.0: device is ready.
[Fri Apr 20 21:48:43 2018] hpsa :08:00.0: scsi 0:1:0:15: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1
[Fri Apr 20 21:51:44 2018] hpsa :08:00.0: scsi 0:1:0:0: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:05 2018] hpsa :08:00.0: device is ready.
[Fri Apr 20 22:14:05 2018] hpsa :08:00.0: scsi 0:1:0:0: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:05 2018] hpsa :08:00.0: scsi 0:1:0:1: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:06 2018] hpsa :08:00.0: device is ready.
[Fri Apr 20 22:14:06 2018] hpsa :08:00.0: scsi 0:1:0:1: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:06 2018] hpsa :08:00.0: scsi 0:1:0:2: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:07 2018] hpsa :08:00.0: device is ready.
[Fri Apr 20 22:14:07 2018] hpsa :08:00.0: scsi 0:1:0:2: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:07 2018] hpsa :08:00.0: scsi 0:1:0:3: resetting logical 
Direct-Access HP   LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:08 2018] hpsa :08:00.0: device is ready.
[Fri Apr 20 22:14:08 2018] hpsa :08:00.0: scsi 0:1:0:3: reset logical 
completed successfully Direct-Access HP   LOGICAL VOLUME   RAID-0
SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:08 2018] hpsa :08:00.0: scsi 0:1:0:4: resetting logical 
Direct-Access HP   LOGICAL VOLUME   
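The Read(16) CDBs in the log above carry the failing LBA in bytes 2-9
(big-endian) and the transfer length in bytes 10-13; decoding the first one
with plain shell arithmetic matches the sector reported by
blk_update_request:

```shell
# Decode the Read(16) CDB from the first medium error above.
cdb="88 00 00 00 00 02 36 46 b5 a8 00 00 04 00 00 00"
set -- $cdb
lba=$((0x$3$4$5$6$7$8$9${10}))      # bytes 2-9: logical block address
len=$((0x${11}${12}${13}${14}))     # bytes 10-13: transfer length in blocks
echo "LBA $lba, $len blocks"        # LBA 9500538280, 1024 blocks
```

That LBA is exactly the "sector 9500538280" in the log, i.e. the medium
errors on sdp identify the bad region directly; the resets are the driver's
reaction, not the root cause.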

[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #7 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
I had a similar stack trace:

Apr 20 14:57:18 kernel: INFO: task jbd2/sdt-8:10890 blocked for more than 120
seconds.
Apr 20 14:57:18 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Apr 20 14:57:18 kernel: jbd2/sdt-8  D0 10890  2 0x
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  jbd2_journal_commit_transaction+0x241/0x1830
Apr 20 14:57:18 kernel:  ? update_load_avg+0x84/0x560
Apr 20 14:57:18 kernel:  ? update_load_avg+0x84/0x560
Apr 20 14:57:18 kernel:  ? dequeue_entity+0xed/0x4c0
Apr 20 14:57:18 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 20 14:57:18 kernel:  ? lock_timer_base+0x7d/0xa0
Apr 20 14:57:18 kernel:  kjournald2+0xca/0x250
Apr 20 14:57:18 kernel:  ? kjournald2+0xca/0x250
Apr 20 14:57:18 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 20 14:57:18 kernel:  kthread+0x109/0x140
Apr 20 14:57:18 kernel:  ? commit_timeout+0x10/0x10
Apr 20 14:57:18 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 20 14:57:18 kernel:  ret_from_fork+0x25/0x30
Apr 20 14:57:18 kernel: INFO: task task:13497 blocked for more than 120
seconds.
Apr 20 14:57:18 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Apr 20 14:57:18 kernel: taskD0 13497  13196 0x
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  rwsem_down_write_failed+0x237/0x3b0
Apr 20 14:57:18 kernel:  ? copy_page_to_iter_iovec+0x97/0x170
Apr 20 14:57:18 kernel:  call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  ? call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  down_write+0x2d/0x40
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x70/0x3c0
Apr 20 14:57:18 kernel:  ? futex_wake+0x90/0x170
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:7fa0801acc90 EFLAGS: 0293 ORIG_RAX:
0012
Apr 20 14:57:18 kernel: RAX: ffda RBX: 7fa0480009d0 RCX:
7fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0200 RSI: 7fa004000b30 RDI:
000f
Apr 20 14:57:18 kernel: RBP: 7fa0801ad060 R08: 7fa0801acd2c R09:
0001
Apr 20 14:57:18 kernel: R10: 0001f86be000 R11: 0293 R12:
7fa0040014c0
Apr 20 14:57:18 kernel: R13: 7fa004000d80 R14: 002e R15:
7fa0480009d0
Apr 20 14:57:18 kernel: INFO: task task:13499 blocked for more than 120
seconds.
Apr 20 14:57:18 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Apr 20 14:57:18 kernel: taskD0 13499  13196 0x
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  rwsem_down_write_failed+0x237/0x3b0
Apr 20 14:57:18 kernel:  ? copy_page_to_iter_iovec+0x97/0x170
Apr 20 14:57:18 kernel:  call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  ? call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  down_write+0x2d/0x40
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x70/0x3c0
Apr 20 14:57:18 kernel:  ? futex_wake+0x90/0x170
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:7fa07f9abc90 EFLAGS: 0293 ORIG_RAX:
0012
Apr 20 14:57:18 kernel: RAX: ffda RBX: 7f9fac008d00 RCX:
7fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0200 RSI: 7fa0080013b0 RDI:
000f
Apr 20 14:57:18 kernel: RBP: 7fa07f9ac060 R08: 7fa07f9abd2c R09:
0001
Apr 20 14:57:18 kernel: R10: 000219541000 R11: 0293 R12:
7fa008001140
Apr 20 14:57:18 kernel: R13: 7fa0080008c0 R14: 002e R15:
7f9fac008d00
Apr 20 14:57:18 kernel: INFO: task task:13510 blocked for more than 120
seconds.
Apr 20 14:57:18 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > 

[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #6 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
I have a stack trace involving the workqueue:

Apr 19 11:22:52 kernel: INFO: task kworker/u129:28:428 blocked for more than
120 seconds.
Apr 19 11:22:52 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Apr 19 11:22:52 kernel: kworker/u129:28 D0   428  2 0x
Apr 19 11:22:52 kernel: Workqueue: writeback wb_workfn (flush-67:80)
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? find_get_pages_tag+0x19f/0x2b0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_writepages+0x4e6/0xe20
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_writepages+0x4e6/0xe20
Apr 19 11:22:52 kernel:  ? generic_writepages+0x67/0x90
Apr 19 11:22:52 kernel:  ? sd_init_command+0x30/0xb0
Apr 19 11:22:52 kernel:  do_writepages+0x1e/0x30
Apr 19 11:22:52 kernel:  ? do_writepages+0x1e/0x30
Apr 19 11:22:52 kernel:  __writeback_single_inode+0x45/0x330
Apr 19 11:22:52 kernel:  writeback_sb_inodes+0x26a/0x5f0
Apr 19 11:22:52 kernel:  __writeback_inodes_wb+0x92/0xc0
Apr 19 11:22:52 kernel:  wb_writeback+0x26e/0x320
Apr 19 11:22:52 kernel:  wb_workfn+0x2cf/0x3a0
Apr 19 11:22:52 kernel:  ? wb_workfn+0x2cf/0x3a0
Apr 19 11:22:52 kernel:  process_one_work+0x16b/0x4a0
Apr 19 11:22:52 kernel:  worker_thread+0x4b/0x500
Apr 19 11:22:52 kernel:  kthread+0x109/0x140
Apr 19 11:22:52 kernel:  ? process_one_work+0x4a0/0x4a0
Apr 19 11:22:52 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 19 11:22:52 kernel:  ret_from_fork+0x25/0x30
Apr 19 11:22:52 kernel: INFO: task jbd2/sdbb-8:10556 blocked for more than 120
seconds.
Apr 19 11:22:52 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Apr 19 11:22:52 kernel: jbd2/sdbb-8 D0 10556  2 0x
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  ? update_cfs_rq_load_avg.constprop.91+0x227/0x4e0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  jbd2_journal_commit_transaction+0x241/0x1830
Apr 19 11:22:52 kernel:  ? update_load_avg+0x84/0x560
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  ? lock_timer_base+0x7d/0xa0
Apr 19 11:22:52 kernel:  kjournald2+0xca/0x250
Apr 19 11:22:52 kernel:  ? kjournald2+0xca/0x250
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  kthread+0x109/0x140
Apr 19 11:22:52 kernel:  ? commit_timeout+0x10/0x10
Apr 19 11:22:52 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 19 11:22:52 kernel:  ret_from_fork+0x25/0x30
Apr 19 11:22:52 kernel: INFO: task task:14138 blocked for more than 120
seconds.
Apr 19 11:22:52 kernel:   Tainted: G   OE   4.11.0-14-generic
#20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Apr 19 11:22:52 kernel: taskD0 14138  14058 0x
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? autoremove_wake_function+0x40/0x40
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x9e/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 

[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-19 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #5 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
Don,

I have applied the patch; it is running now, and I am trying to reproduce the
problem. I'll keep you informed of the diagnosis.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #4 from Don (don.br...@microsemi.com) ---
Your stack trace does not show any hpsa driver components, but I do see the
reset being issued and never completing.

I'm hoping that the attached patch helps diagnose the issue a little better.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #3 from Don (don.br...@microsemi.com) ---
Created attachment 275437
  --> https://bugzilla.kernel.org/attachment.cgi?id=275437&action=edit
Patch to use a local work-queue instead of the system work-queue

If the driver initiates a re-scan from a system work-queue, the kernel can
hang.

This patch has not been submitted to linux-scsi, I will be sending this patch
out soon.



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #2 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
I don't have any "Controller lockup detected" message in the Syslog
unfortunately.
In the iLO IML log, the last message was about the cache module:
CAUTION: POST Messages - POST Error: 1792-Slot X Drive Array - Valid Data Found
in Cache Module. Data will automatically be written to drive array..

I have nothing about lockup entries.

Indeed, we use the driver from the latest kernel, compiled for 4.11.
I am ready to test the patch you are proposing.
Where can I retrieve it?



[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete

2018-04-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199435

Don (don.br...@microsemi.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |don.br...@microsemi.com

--- Comment #1 from Don (don.br...@microsemi.com) ---
Do you see any lockup messages in the console logs?
"Controller lockup detected"...


The driver you used is from the 4.16 kernel, running on a 4.11 kernel? I have
not tested this configuration.

I notice that the driver is still using the kernel work-queue for monitoring.
I will be sending up a patch soon to change this to local work-queues.
Perhaps you can test this patch? It may help us discover more information
about what is happening.

Also, after you rebooted, were there any lockup entries in the iLO IML log?
