Hello,

I have a system using DPDK 1.8 with 82599ES ixgbe NICs. These are
provided to a virtual guest via PCI passthrough. Our DPDK application
on the guest takes control of the NICs using igb_uio.

On certain systems, under conditions we have not yet figured out,
sending traffic causes the host to kernel panic. It looks like a PCI
device is reporting a fatal error.

From the error, the issue looks to be either the bridge connected to
the ixgbe, or the ixgbe itself; I cannot decipher the message beyond
that.

This has happened on three different machines, so I do not think it is
bad hardware.

I was wondering if anybody has run into this before, and if they have
any solutions. I tried searching the mailing list, but couldn't find
anything related.


[3108395.524535] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[3108395.533959] {1}[Hardware Error]: APEI generic hardware error status
[3108395.541149] {1}[Hardware Error]: severity: 1, fatal
[3108395.546785] {1}[Hardware Error]: section: 0, severity: 1, fatal
[3108395.553586] {1}[Hardware Error]: flags: 0x01
[3108395.558543] {1}[Hardware Error]: primary
[3108395.563113] {1}[Hardware Error]: section_type: PCIe error
[3108395.569332] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.576715] {1}[Hardware Error]: version: 1.16
[3108395.581866] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.588763] {1}[Hardware Error]: device_id: 0000:05:01.0
[3108395.594886] {1}[Hardware Error]: slot: 0
[3108395.599455] {1}[Hardware Error]: secondary_bus: 0x06
[3108395.605189] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.612572] {1}[Hardware Error]: class_code: 000406
[3108395.618208] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
[3108395.626853] {1}[Hardware Error]: section: 1, severity: 1, fatal
[3108395.633653] {1}[Hardware Error]: flags: 0x01
[3108395.638611] {1}[Hardware Error]: primary
[3108395.643179] {1}[Hardware Error]: section_type: PCIe error
[3108395.649396] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.656778] {1}[Hardware Error]: version: 1.16
[3108395.661930] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.668829] {1}[Hardware Error]: device_id: 0000:05:09.0
[3108395.674951] {1}[Hardware Error]: slot: 0
[3108395.679521] {1}[Hardware Error]: secondary_bus: 0x09
[3108395.685254] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.692636] {1}[Hardware Error]: class_code: 000406
[3108395.698272] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
[3108395.706915] Kernel panic - not syncing: Fatal hardware error!

0000:05:01.0 is a PLX PCI bridge with two ixgbe NICs connected to
it. The same is true of 0000:05:09.0.
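
As an aid to reading the register values in the log, the command, status,
and bridge control fields can be decoded against the standard PCI
configuration-space bit layout. The following is a minimal Python sketch,
not a diagnostic tool: the function name decode_bits and the table names
are illustrative only, and the tables list just the commonly documented
bits of each 16-bit register.

```python
# Bit names follow the standard PCI configuration-space layout.
# The table and function names here are illustrative, not from any tool.
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}

STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}

BRIDGE_CONTROL_BITS = {
    0: "Parity Error Response Enable",
    1: "SERR# Enable",
    2: "ISA Enable",
    3: "VGA Enable",
    5: "Master Abort Mode",
    6: "Secondary Bus Reset",
}


def decode_bits(value, bit_names):
    """Return the names of all known bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(bit_names.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the log above (identical for both bridge ports).
    print("command 0x0407:", decode_bits(0x0407, COMMAND_BITS))
    print("status  0x0010:", decode_bits(0x0010, STATUS_BITS))
    print("control 0x0003:", decode_bits(0x0003, BRIDGE_CONTROL_BITS))
```

Decoded this way, the status value 0x0010 has only the Capabilities List
bit set, so none of the error bits in that register are latched, and the
bridge control value 0x0003 has parity error response and SERR# enabled.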

Here is the boot cmdline on the host (we're using the IOMMU):

BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64
root=UUID=57d79ff0-1152-46fb-a619-b2a102de3d5f ro
console=ttyS0,115200n8 vconsole.font=latarcyrheb-sun16
crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0
vconsole.keymap=us LANG=en_US.UTF-8 intel_iommu=on

Any help would be greatly appreciated.

Thanks,

Kyle
