--- Comment From cls...@us.ibm.com 2015-03-11 19:15 EDT---
This looks fixed with 3.19.0-8-generic #8-Ubuntu.
The adapter was able to recover from EEH:
[ 2694.622586] EEH: Notify device drivers to shutdown
[ 2694.622587] mlx4_core 0004:01:00.0: device was reset successfully
[ 2694.622589] mlx4_core 0004:01:00.0: mlx4_pci_err_detected was called
[ 2694.622594] mlx4_en 0004:01:00.0: Internal error detected, restarting device
[ 2694.622786] mlx4_en: eth14: Close port called
[ 2694.846830] mlx4_en 0004:01:00.0: removed PHC
[ 2694.874036] EEH: Collect temporary log
[ 2694.879101] EEH: of node=/pciex@3fffe4200/pci@0/ethernet@0
[ 2694.879465] EEH: PCI device/vendor: 100715b3
[ 2694.879478] EEH: PCI cmd/status register: 00100142
[ 2694.879479] EEH: PCI-E capabilities and status follow:
[ 2694.879544] EEH: PCI-E 00: 00020010 10008e02 0020204e 0843f483
[ 2694.879597] EEH: PCI-E 10: 10830040
[ 2694.879598] EEH: PCI-E 20:
[ 2694.879599] EEH: PCI-E AER capability register set follows:
[ 2694.879666] EEH: PCI-E AER 00: 18c20001 00062010
[ 2694.879719] EEH: PCI-E AER 10: 2000 01e0
[ 2694.879772] EEH: PCI-E AER 20:
[ 2694.879785] EEH: PCI-E AER 30:
[ 2694.879787] PHB3 PHB#4 Diag-data (Version: 1)
[ 2694.879789] brdgCtl: 0002
[ 2694.879790] UtlSts: 0020
[ 2694.879791] RootSts: 0040 0040 f0830048 00100147
[ 2694.879792] PhbSts: 001c 001c
[ 2694.879793] Lem: 0010 42498e327f502eae
[ 2694.879795] InAErr: 8000 8000 04020080
[ 2694.879796] PE[ 1] A/B: 8480002b 8000
[ 2694.879797] PE[ 2] A/B: 8000 8000
[ 2694.879798] PE[ 3] A/B: 8000 8000
[ 2694.879799] PE[ 4] A/B: 8000 8000
[ 2694.879800] PE[ 5] A/B: 8000 8000
[ 2694.879801] EEH: Reset without hotplug activity
[ 2698.898176] EEH: Notify device drivers the completion of reset
[ 2698.898181] mlx4_core 0004:01:00.0: mlx4_pci_slot_reset was called
[ 2698.898218] mlx4_core 0004:01:00.0: enabling device (0140 -> 0142)
[ 2705.396286] mlx4_core 0004:01:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 2705.396288] mlx4_core 0004:01:00.0: PCIe link width is x8, device supports x8
[ 2706.143789] mlx4_en 0004:01:00.0: registered PHC clock
[ 2706.143864] mlx4_en 0004:01:00.0: Activating port:1
[ 2706.159496] mlx4_en: eth11: Using 256 TX rings
[ 2706.159504] mlx4_en: eth11: Using 8 RX rings
[ 2706.159506] mlx4_en: eth11: frag:0 - size:1518 prefix:0 stride:1536
[ 2706.159722] mlx4_en: eth11: Initializing port
[ 2706.160022] mlx4_en 0004:01:00.0: Activating port:2
[ 2706.165214] mlx4_core 0004:01:00.0 eth14: renamed from eth11
[ 2706.188419] mlx4_en: eth11: Using 256 TX rings
[ 2706.188427] mlx4_en: eth11: Using 8 RX rings
[ 2706.188430] mlx4_en: eth11: frag:0 - size:1518 prefix:0 stride:1536
[ 2706.188660] mlx4_en: eth11: Initializing port
[ 2706.197316] EEH: Notify device driver to resume
[ 2706.525987] mlx4_core 0004:01:00.0 eth16: renamed from eth11
[ 2707.487156] mlx4_en: eth14: Link Up
[ 2707.542052] mlx4_en: eth16: Link Up
Thanks.
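
To check a captured dmesg for the same outcome, the recovery can be timed from the first EEH notification to the resume notification. The following is a minimal sketch, not part of the original report: the function name `eeh_recovery_seconds` is hypothetical, and the marker strings are copied from the log above, so other kernel versions may word them differently.

```python
import re

# Matches dmesg lines such as "[ 2694.622586] EEH: Notify device drivers to shutdown".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Marker strings taken from the log in this comment (assumption: they are
# stable enough to match on; adjust them for other kernels).
START = "EEH: Notify device drivers to shutdown"
RESUME = "EEH: Notify device driver to resume"


def eeh_recovery_seconds(log_text):
    """Return seconds between the shutdown and resume notifications, or None."""
    start = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip wrapped or non-dmesg lines
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith(START):
            start = ts
        elif msg.startswith(RESUME) and start is not None:
            return ts - start
    return None  # recovery never reached the resume step


sample = """\
[ 2694.622586] EEH: Notify device drivers to shutdown
[ 2698.898176] EEH: Notify device drivers the completion of reset
[ 2706.197316] EEH: Notify device driver to resume
"""
print(eeh_recovery_seconds(sample))
```

For the log above this comes to roughly 11.6 seconds. A result of `None` means the resume notification never appeared, which is what the failing 3.18 log in the bug description would give.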
--
https://bugs.launchpad.net/bugs/1422481
Title:
mlx4 not recovering from EEH in Ubuntu 15.04 (Mellanox)
Status in linux package in Ubuntu:
Fix Released
Bug description:
---Problem Description---
EEH recovery is not working with the mlx4 driver. After the driver
recovers, it hits another EEH.
---uname output---
Linux ubuntu 3.18.0-12-generic #13 SMP Mon Feb 9 16:31:42 CST 2015 ppc64le ppc64le ppc64le GNU/Linux
---Additional Hardware Info---
Requires a Mellanox adapter such as a ConnectX-3.
Machine Type = P8
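
The adapter can also be identified from the EEH dump itself: the `EEH: PCI device/vendor: 100715b3` lines appear to print the first 32-bit word of PCI config space, which holds the vendor ID in the low 16 bits and the device ID in the high 16 bits. A small sketch of that split follows; the helper name `split_device_vendor` is hypothetical, and the reading of the log field is an assumption.

```python
def split_device_vendor(reg_hex):
    """Split the 32-bit value at PCI config offset 0 into (vendor_id, device_id)."""
    value = int(reg_hex, 16)
    vendor_id = value & 0xFFFF          # low 16 bits: vendor ID
    device_id = (value >> 16) & 0xFFFF  # high 16 bits: device ID
    return vendor_id, device_id


vendor, device = split_device_vendor("100715b3")
print(f"{vendor:04x}:{device:04x}")  # vendor:device, the form `lspci -nn` shows
```

This prints `15b3:1007`; 0x15b3 is the Mellanox PCI vendor ID.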
---Steps to Reproduce---
Inject an EEH error into the mlx4 device.
Stack trace output:
The device goes through EEH recovery, then hits this:
[ 188.747571] EEH: Collect temporary log
[ 188.748330] EEH: of node=/pci@8002007/ethernet@3
[ 188.748339] EEH: PCI device/vendor: 100715b3
[ 188.748361] EEH: PCI cmd/status register: 00100146
[ 188.748362] EEH: PCI-E capabilities and status follow:
[ 188.748459] EEH: PCI-E 00: 00020010 10008e02 0001200e 0843f483
[ 188.748537] EEH: PCI-E 10: 1083
[ 188.748539] EEH: PCI-E 20:
[ 188.748540] EEH: PCI-E AER capability register set follows:
[ 188.748625] EEH: PCI-E AER 00: 00020001 00062010
[ 188.748704] EEH: PCI-E AER 10: 2000 2000 01e0
[ 188.748783] EEH: PCI-E AER 20:
[ 188.748805] EEH: PCI-E AER 30:
[ 188.748813] EEH: Reset without hotplug activity
[ 193.833245] EEH