** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Yakkety)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Xenial)
       Status: New => Fix Committed

** Changed in: linux (Ubuntu Yakkety)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1652018

Title:
  PowerNV: PCI Slot is invalid after fencedPHB Error injection

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Yakkety:
  Fix Committed

Bug description:
  == Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2016-12-21 01:16:41 ==
  ---Problem Description---
  PCI Slot is in invalid state after fencedPHB Error injection Test.

  Contact Information = ppaid...@in.ibm.com

  ---uname output---
  Linux brigstrat1p1 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:46:13 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = PowerNV CSE-829U

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  1. Boot the system to runtime.
  2. Inject fencedPHB Error.
     echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0002/err_injct_outbound

  dmesg:
  [42725.641368] EEH: PHB#2 failure detected, location: N/A
  [42725.641450] CPU: 8 PID: 898 Comm: kworker/u320:1 Not tainted 4.4.0-57-generic #78-Ubuntu
  [42725.641461] Workqueue: i40e i40e_service_task [i40e]
  [42725.641464] Call Trace:
  [42725.641469] [c00000000407f9e0] [c000000000b13b4c] dump_stack+0xb0/0xf0 (unreliable)
  [42725.641474] [c00000000407fa20] [c0000000000376e0] eeh_dev_check_failure+0x200/0x580
  [42725.641477] [c00000000407fac0] [c000000000037ae4] eeh_check_failure+0x84/0xd0
  [42725.641485] [c00000000407fb00] [d000000035845710] i40e_service_task+0x17b0/0x1a30 [i40e]
  [42725.641489] [c00000000407fc50] [c0000000000dde10] process_one_work+0x1e0/0x5a0
  [42725.641492] [c00000000407fce0] [c0000000000de364] worker_thread+0x194/0x680
  [42725.641496] [c00000000407fd80] [c0000000000e6e60] kthread+0x110/0x130
  [42725.641499] [c00000000407fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  [42725.641509] EEH: Detected error on PHB#2
  [42725.641514] EEH: This PCI device has failed 1 times in the last hour
  [42725.641516] EEH: Notify device drivers to shutdown
  [42725.641523] i40e 0002:01:00.0: i40e_pci_error_detected: error 2
  [42725.641907] i40e 0002:01:00.0: VSI seid 396 Tx ring 0 disable timeout
  [42725.642144] i40e 0002:01:00.0: VSI seid 396 Rx ring 0 disable timeout
  [42725.666205] i40e 0002:01:00.1: i40e_pci_error_detected: error 2
  [42725.666499] i40e 0002:01:00.2: i40e_pci_error_detected: error 2
  [42725.666533] i40e 0002:01:00.0: ARQ event error -32
  [42725.666601] i40e 0002:01:00.3: i40e_pci_error_detected: error 2
  [42725.666700] EEH: Collect temporary log
  [42725.666702] PHB3 PHB#2 Diag-data (Version: 1)
  [42725.666703] brdgCtl: 0000ffff
  [42725.666704] UtlSts: 00100000 00000000 00000000
  [42725.666706] RootSts: ffffffff ffffffff ffffffff ffffffff 0000ffff
  [42725.666707] RootErrSts: ffffffff ffffffff ffffffff
  [42725.666708] RootErrLog: ffffffff ffffffff ffffffff ffffffff
  [42725.666709] RootErrLog1: ffffffff 0000000000000000 0000000000000000
  [42725.666711] nFir: 0000808000000000 0030006e00000000 0000800000000000
  [42725.666712] PhbSts: 0000001800000000 0000001800000000
  [42725.666713] Lem: 8000020000800000 42498e367f502eae 8000000000000000
  [42725.666715] OutErr: 8000002000000000 8000000000000000 120800600003fffe 402002a800000000
  [42725.666716] InBErr: 0000000040000000 0000000040000000 0000080000000000 000c10c010010000
  [42725.666718] EEH: Reset without hotplug activity
  [42730.052455] EEH: Notify device drivers the completion of reset
  [42730.053334] EEH: Notify device driver to resume
  [42730.184457] i40e 0002:01:00.0 enP2p1s0f0: NIC Link is Down
  [42731.568230] i40e 0002:01:00.0 enP2p1s0f0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

  OPAL LOG:
  [42990.475630456,7] PHB#0002: CRESET: Starts
  [42990.482717333,7] PHB#0002: CRESET: No pending transactions
  [42991.023963215,7] PHB#0002: CRESET: Reinitialization
  [42991.023964143,7] PHB#0002: Initializing PHB...
  [42991.075167078,7] PHB#0002: Core revision 0xa30005
  [42991.075171529,7] PHB#0002: Default system config: 0x421100fc30000000
  [42991.075172655,7] PHB#0002: New system config : 0x421000fc30000000
  [42991.075174000,7] PHB#0002: PHB_RESET is 0x2000000000000000
  [42991.075410938,7] PHB#0002: Waiting for DLP PG reset to complete...
  [42991.083713914,7] PHB#0002: Initialization complete
  [42991.136599535,7] PHB#0002: FRESET: Starts
  [42991.136600954,7] PHB#0002: FRESET: Prepare for link down
  [42991.136602933,7] PHB#0002: FRESET: Assert
  [42992.138625290,7] PHB#0002: FRESET: Deassert
  [42993.140657592,7] PHB#0002: LINK: Start polling
  [42993.193893558,7] PHB#0002: LINK: Electrical link detected
  [42993.247138072,7] PHB#0002: LINK: Link is up
  [42993.247174237,3] PCI-SLOT-0000000000000002 Invalid state 00000000

  == Comment: #2 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-12-22 04:57:28 ==
  $ git log fbce44d0ed42e465317 -1
  commit fbce44d0ed42e4653172376f4dfeaa5710f06a27
  Author: Gavin Shan <gws...@linux.vnet.ibm.com>
  Date:   Fri Jun 24 16:44:19 2016 +1000

      powerpc/powernv: Call opal_pci_poll() if needed

      When issuing PHB reset, OPAL API opal_pci_poll() is called to drive
      the state machine in OPAL forward. However, we needn't always call
      the function under some circumstances like reset deassert.

      This avoids calling opal_pci_poll() when OPAL_SUCCESS is returned
      from opal_pci_reset(). Except the overhead introduced by additional
      one unnecessary OPAL call, I didn't run into real issue because of
      this.

      Reported-by: Pridhiviraj Paidipeddi <ppaidd...@in.ibm.com>
      Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

  $ git tag --contains fbce44d0e
  v4.9
  v4.9-rc1
  v4.9-rc2
  v4.9-rc3
  v4.9-rc4
  v4.9-rc5
  v4.9-rc6
  v4.9-rc7
  v4.9-rc8
  $

  This issue is fixed by commit # fbce44d0ed4, available in kernel
  version 4.9.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652018/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
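
The behaviour change described in the message of commit fbce44d0ed42 can be illustrated with a toy model. This is only a sketch in Python, not the kernel code: `FakePhb`, `reset_always_poll` and `reset_poll_if_needed` are hypothetical names invented here, and the model assumes that a positive return code means "request still in progress, poll again" while OPAL_SUCCESS (0) means the request has already completed.

```python
OPAL_SUCCESS = 0  # success return code, as named in the commit message


class FakePhb:
    """Hypothetical stand-in for the firmware-side PHB reset state machine."""

    def __init__(self, pending_steps):
        self.pending_steps = pending_steps  # steps left before reset is done
        self.poll_calls = 0                 # how many times poll() was called

    def _status(self):
        # Assumption: positive means "still busy", OPAL_SUCCESS means "done".
        return 1 if self.pending_steps > 0 else OPAL_SUCCESS

    def reset(self):
        """Models opal_pci_reset(): start the reset and report its status."""
        return self._status()

    def poll(self):
        """Models opal_pci_poll(): advance the state machine by one step."""
        self.poll_calls += 1
        self.pending_steps = max(0, self.pending_steps - 1)
        return self._status()


def reset_always_poll(phb):
    """Old behaviour: poll at least once, even if reset already succeeded."""
    rc = phb.reset()
    if rc < 0:
        return rc
    rc = phb.poll()
    while rc > 0:
        rc = phb.poll()
    return rc


def reset_poll_if_needed(phb):
    """New behaviour: poll only while the reset reports it is still busy."""
    rc = phb.reset()
    while rc > 0:
        rc = phb.poll()
    return rc


if __name__ == "__main__":
    for label, fn in (("always poll", reset_always_poll),
                      ("poll if needed", reset_poll_if_needed)):
        phb = FakePhb(pending_steps=0)  # reset completes immediately
        rc = fn(phb)
        print(f"{label}: rc={rc}, poll calls={phb.poll_calls}")
```

In this model the two variants differ only when the reset completes immediately: the first still issues one poll call, the second issues none. When steps are still pending, both poll the same number of times.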