Hello,

I have a system using DPDK 1.8 with 82599ES ixgbe NICs. These are provided to a virtual guest via PCI passthrough. Our DPDK application on the guest takes control of the NICs using igb_uio.

On certain systems, under conditions we have not yet figured out, sending traffic causes the host to kernel panic. It looks like a PCI device is reporting a fatal error. From the error, the issue looks to be either the bridge connected to the ixgbe, or the ixgbe itself; I cannot decipher the message beyond that. This has happened on three different machines, so I do not think it is bad hardware.

I was wondering if anybody has run into this before, and if they have any solutions. I tried searching the mailing list, but couldn't find anything related.

[3108395.524535] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[3108395.533959] {1}[Hardware Error]: APEI generic hardware error status
[3108395.541149] {1}[Hardware Error]: severity: 1, fatal
[3108395.546785] {1}[Hardware Error]: section: 0, severity: 1, fatal
[3108395.553586] {1}[Hardware Error]: flags: 0x01
[3108395.558543] {1}[Hardware Error]: primary
[3108395.563113] {1}[Hardware Error]: section_type: PCIe error
[3108395.569332] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.576715] {1}[Hardware Error]: version: 1.16
[3108395.581866] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.588763] {1}[Hardware Error]: device_id: 0000:05:01.0
[3108395.594886] {1}[Hardware Error]: slot: 0
[3108395.599455] {1}[Hardware Error]: secondary_bus: 0x06
[3108395.605189] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.612572] {1}[Hardware Error]: class_code: 000406
[3108395.618208] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
[3108395.626853] {1}[Hardware Error]: section: 1, severity: 1, fatal
[3108395.633653] {1}[Hardware Error]: flags: 0x01
[3108395.638611] {1}[Hardware Error]: primary
[3108395.643179] {1}[Hardware Error]: section_type: PCIe error
[3108395.649396] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.656778] {1}[Hardware Error]: version: 1.16
[3108395.661930] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.668829] {1}[Hardware Error]: device_id: 0000:05:09.0
[3108395.674951] {1}[Hardware Error]: slot: 0
[3108395.679521] {1}[Hardware Error]: secondary_bus: 0x09
[3108395.685254] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.692636] {1}[Hardware Error]: class_code: 000406
[3108395.698272] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
[3108395.706915] Kernel panic - not syncing: Fatal hardware error!

0000:05:01.0 is a PLX PCI bridge. It has two ixgbe NICs connected to it. Likewise with 0000:05:09.0.

Here is the boot cmdline on the host (we're using iommu):

BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=57d79ff0-1152-46fb-a619-b2a102de3d5f ro console=ttyS0,115200n8 vconsole.font=latarcyrheb-sun16 crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0 vconsole.keymap=us LANG=en_US.UTF-8 intel_iommu=on

Any help would be greatly appreciated.

Thanks,
Kyle
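
P.S. For anyone reading the log above, here is a minimal sketch of decoding the command, status, and bridge control values it prints. It assumes those fields use the standard PCI configuration-space bit layouts; the table and function names are hypothetical, chosen only for illustration, and only the commonly used bits are named.

```python
# Bit names assume the standard PCI configuration-space register layouts.
# Unlisted bits are reported by number rather than guessed at.
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}
STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}
BRIDGE_CONTROL_BITS = {
    0: "Parity Error Response Enable",
    1: "SERR# Enable",
}


def decode(value, names):
    """Return the names of all bits set in a 16-bit register value."""
    return [names.get(bit, f"bit {bit}") for bit in range(16) if value & (1 << bit)]


# Values copied from the log: command 0x0407, status 0x0010, control 0x0003.
print("command:", decode(0x0407, COMMAND_BITS))
print("status: ", decode(0x0010, STATUS_BITS))
print("control:", decode(0x0003, BRIDGE_CONTROL_BITS))
```

Under that assumption, the status value has only the capabilities-list bit set and secondary_status is 0x0000, so neither register records an abort or parity error. These fields alone therefore do not say whether the bridge or the ixgbe behind it raised the error.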