--- Begin Message ---
Still happening with kernel 6.5.13-5-pve.

Stefan
> On Apr 2, 2024, at 13:09, Stefan Radman <[email protected]> wrote:
>
> Workaround: No more kernel panics on reboot when pinning kernel 6.2.16-20-pve.
>
> Affected kernels:
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> The original issue [1] was solved long ago [2] but apparently re-introduced
> recently [3].
>
> The regression [4] is being discussed on kernel.org.
>
> Looks like a back and forth in the tg3 driver.
>
> Note that the kernel panic is only triggered by “reboot” and not by
> “shutdown”.
>
> Stefan
>
> root@per740:~# proxmox-boot-tool kernel list
> Manually selected kernels:
> None.
>
> Automatically selected kernels:
> 6.2.16-20-pve
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> Pinned kernel:
> 6.2.16-20-pve
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.2.16-20-pve)
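
The pinning workaround quoted above can be sketched as a small script. This is only an illustration, assuming `proxmox-boot-tool kernel pin` is available on the host (as it is on the PVE 8.1 system shown) and that 6.2.16-20-pve is still installed. The `DRY_RUN` variable and the `pin_cmd` helper are hypothetical names introduced here, not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch: pin a known-good kernel so it is booted by default.
# KERNEL is the version reported as working in this thread.
KERNEL="6.2.16-20-pve"

# Hypothetical helper: print the pin command for a given kernel version,
# so it can be reviewed before anything is changed.
pin_cmd() {
    printf 'proxmox-boot-tool kernel pin %s\n' "$1"
}

# DRY_RUN (an assumption of this sketch) defaults to 1: only print the command.
if [ "${DRY_RUN:-1}" = "1" ]; then
    pin_cmd "$KERNEL"
else
    # Pin the kernel, then show the resulting selection.
    proxmox-boot-tool kernel pin "$KERNEL"
    proxmox-boot-tool kernel list
fi
```

Running it with `DRY_RUN=0` as root would apply the pin. `proxmox-boot-tool kernel unpin` should remove it again once a fixed kernel is available.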
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] tg3: Disable tg3 device on system reboot to avoid triggering AER
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>
> [3] tg3: power down device only on SYSTEM_POWER_OFF
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9fc3bc7643341dc5be7d269f3d3dbe441d8d7ac3
>
> [4] * [PATCH] tg3: add new module param to force device power down on reboot
> https://lore.kernel.org/lkml/[email protected]/T/
>
>
>> On Apr 2, 2024, at 09:37, Gilberto Ferreira <[email protected]>
>> wrote:
>>
>> Perhaps you should try another kernel besides 6.5, like 6.2 for instance.
>>
>> On Tue, Apr 2, 2024, 02:43, Stefan Radman via pve-user
>> <[email protected] <mailto:[email protected]>> wrote:
>>>
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Stefan Radman <[email protected] <mailto:[email protected]>>
>>> To: Proxmox VE user list <[email protected]
>>> <mailto:[email protected]>>
>>> Cc: PVE User List <[email protected]
>>> <mailto:[email protected]>>
>>> Bcc:
>>> Date: Tue, 2 Apr 2024 07:42:32 +0200
>>> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>>> Yesterday I had the same thing happen when shutting down a Dell PowerEdge
>>> R740.
>>>
>>> Again, the kernel panic was triggered by a BCM5720 running Broadcom
>>> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve.
>>>
>>> R740 BIOS 2.21.2 (but also happened with 2.20.1)
>>>
>>> Stefan
>>>
>>> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5
>>> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic
>>> Hardware Error Source: 5
>>> [1325589.991223] {1}[Hardware Error]: event severity: fatal
>>> [1325589.991225] {1}[Hardware Error]: Error 0, type: fatal
>>> [1325589.991227] {1}[Hardware Error]: section_type: PCIe error
>>> [1325589.991228] {1}[Hardware Error]: port_type: 0, PCIe end point
>>> [1325589.991231] {1}[Hardware Error]: version: 3.0
>>> [1325589.991233] {1}[Hardware Error]: command: 0x0002, status: 0x0010
>>> [1325589.991235] {1}[Hardware Error]: device_id: 0000:01:00.1
>>> [1325589.991237] {1}[Hardware Error]: slot: 0
>>> [1325589.991239] {1}[Hardware Error]: secondary_bus: 0x00
>>> [1325589.991240] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
>>> [1325589.991242] {1}[Hardware Error]: class_code: 020000
>>> [1325589.991244] {1}[Hardware Error]: aer_uncor_status: 0x00100000,
>>> aer_uncor_mask: 0x00010000
>>> [1325589.991246] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
>>> [1325589.991248] {1}[Hardware Error]: TLP Header: 40000001 0000010f
>>> 90028090 00000000
>>> [1325589.991252] Kernel panic - not syncing: Fatal hardware error!
>>> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O
>>> 6.5.13-1-pve #1
>>> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS
>>> 2.20.1 09/13/2023
>>> [1325589.991259] Call Trace:
>>> [1325589.991261] <NMI>
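
To make the `aer_uncor_status` and `aer_uncor_mask` values in the log above easier to read, here is a minimal decoding sketch. It assumes the bit layout of the PCIe AER Uncorrectable Error Status register as named by the `PCI_ERR_UNC_*` constants in the Linux kernel's `pci_regs.h`. The function name `decode_aer_uncor` is made up for this example.

```shell
#!/bin/sh
# Sketch: decode a PCIe AER uncorrectable-error register value into bit names.
# Bit positions are assumed to match the PCI_ERR_UNC_* definitions in pci_regs.h.
decode_aer_uncor() {
    value=$(( $1 ))   # accepts hex such as 0x00100000
    while read -r bit name; do
        # Print every bit from the table that is set in the value.
        if [ $(( value & (1 << bit) )) -ne 0 ]; then
            printf 'bit %s: %s\n' "$bit" "$name"
        fi
    done <<'EOF'
4 Data Link Protocol Error
5 Surprise Down
12 Poisoned TLP
13 Flow Control Protocol Error
14 Completion Timeout
15 Completer Abort
16 Unexpected Completion
17 Receiver Overflow
18 Malformed TLP
19 ECRC Error
20 Unsupported Request
21 ACS Violation
22 Uncorrectable Internal Error
EOF
}

decode_aer_uncor 0x00100000   # aer_uncor_status from the log
decode_aer_uncor 0x00010000   # aer_uncor_mask from the log
```

Under that assumption, the status value decodes to an Unsupported Request error and the mask value to Unexpected Completion. The same function can be applied to the `aer_uncor_severity` value.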
>>>
>>> root@per740:~# pveversion
>>> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)
>>>
>>> root@per740:~# ethtool -i eno4
>>> driver: tg3
>>> version: 6.5.13-3-pve
>>> firmware-version: FFV22.71.3 bc 5720-v1.39
>>> expansion-rom-version:
>>> bus-info: 0000:01:00.1
>>> supports-statistics: yes
>>> supports-test: yes
>>> supports-eeprom-access: yes
>>> supports-register-dump: yes
>>> supports-priv-flags: no
>>>
>>>
>>> > On Mar 28, 2024, at 15:50, Stefan Radman via pve-user
>>> > <[email protected] <mailto:[email protected]>> wrote:
>>> >
>>> >
>>> > From: Stefan Radman <[email protected] <mailto:[email protected]>>
>>> > Subject: 6.5.13-3-pve kernel panic on shutdown
>>> > Date: March 28, 2024 at 15:50:02 GMT+1
>>> > To: PVE User List <[email protected]
>>> > <mailto:[email protected]>>
>>> >
>>> >
>>> > I recently noticed that a Dell PowerEdge R540 currently running Proxmox
>>> > VE 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown.
>>> >
>>> > The kernel panic is triggered 3-4 seconds after the last network
>>> > interface goes down (onboard BCM5720 LOM), while the system enters S5
>>> > (sleep) state.
>>> >
>>> > [84459.970212] bond0: (slave eno1): link status definitely down,
>>> > disabling slave
>>> > [84459.982170] bond0: (slave eno2): link status definitely down,
>>> > disabling slave
>>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>>> > [84460.001615] bond0: now running without any active interface!
>>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic
>>> > Hardware Error Source: 5
>>> >
>>> > This is reproducible on every reboot.
>>> >
>>> > R540 and BCM5720 are running the latest firmware available from the Dell
>>> > support website.
>>> >
>>> > Link [2] below seems to suggest that my problem is related to a
>>> > combination of ACPI S5, the tg3 driver and the BCM5720 on-board NIC.
>>> >
>>> > Has anyone else seen this lately (or ever) with Proxmox VE?
>>> >
>>> > Thank you
>>> >
>>> > Stefan
>>> >
>>> > [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
>>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>>> >
>>> > [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" which
>>> > causes Bus Fatal Error when rebooting system with BCM5720 NIC
>>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471
>>> >
>>> > [3] tg3: Disable tg3 device on system reboot to avoid triggering AER
>>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074
>>> >
>>> > [4] * [PATCH] tg3: Disable tg3 device on system reboot to avoid
>>> > triggering AER
>>> > https://lore.kernel.org/netdev/caad53p7pmep+vwlz+fgddntgq2kqgl54fo86bpy7oy9tkzx...@mail.gmail.com/T/
>>> >
>>> > [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot
>>> > https://patches.linaro.org/project/linux-acpi/patch/[email protected]/
>>> >
>>> > [6] * [PATCH] tg3: add new module param to force device power down on
>>> > reboot
>>> > https://lore.kernel.org/lkml/[email protected]/T/
>>> >
>>> >
>>> > [84458.600189] systemd-shutdown[1]: Syncing filesystems and block devices.
>>> > [84458.607141] systemd-shutdown[1]: Rebooting.
>>> > [84458.612283] spi-nor spi0.0: Software reset failed: -524
>>> > [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is
>>> > called outbound_intr_mask:0x40000009
>>> > [84459.970212] bond0: (slave eno1): link status definitely down,
>>> > disabling slave
>>> > [84459.982170] bond0: (slave eno2): link status definitely down,
>>> > disabling slave
>>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>>> > [84460.001615] bond0: now running without any active interface!
>>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic
>>> > Hardware Error Source: 5
>>> > [84463.685116] {1}[Hardware Error]: event severity: fatal
>>> > [84463.685117] {1}[Hardware Error]: Error 0, type: fatal
>>> > [84463.685119] {1}[Hardware Error]: section_type: PCIe error
>>> > [84463.685120] {1}[Hardware Error]: port_type: 0, PCIe end point
>>> > [84463.685121] {1}[Hardware Error]: version: 3.0
>>> > [84463.685122] {1}[Hardware Error]: command: 0x0002, status: 0x0010
>>> > [84463.685123] {1}[Hardware Error]: device_id: 0000:04:00.1
>>> > [84463.685125] {1}[Hardware Error]: slot: 0
>>> > [84463.685126] {1}[Hardware Error]: secondary_bus: 0x00
>>> > [84463.685127] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
>>> > [84463.685128] {1}[Hardware Error]: class_code: 020000
>>> > [84463.685129] {1}[Hardware Error]: aer_uncor_status: 0x00100000,
>>> > aer_uncor_mask: 0x00010000
>>> > [84463.685130] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
>>> > [84463.685131] {1}[Hardware Error]: TLP Header: 40000001 0000010f
>>> > 90028090 00000000
>>> > [84463.685134] Kernel panic - not syncing: Fatal hardware error!
>>> > [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P O
>>> > 6.5.13-3-pve #1
>>> > [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS
>>> > 2.21.1 03/07/2024
>>> > [84463.685140] Call Trace:
>>> > [84463.685142] <NMI>
>>> > …
>>> >
>>> > root@pve:~# pveversion
>>> > pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
>>> > root@pve:~# ethtool -i eno2
>>> > driver: tg3
>>> > version: 6.5.13-3-pve
>>> > firmware-version: FFV22.71.3 bc 5720-v1.39
>>> > expansion-rom-version:
>>> > bus-info: 0000:04:00.1
>>> > supports-statistics: yes
>>> > supports-test: yes
>>> > supports-eeprom-access: yes
>>> > supports-register-dump: yes
>>> > supports-priv-flags: no
>>> > root@pve:~# lspci | fgrep 04:00.1
>>> > 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
>>> > BCM5720 Gigabit Ethernet PCIe
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > pve-user mailing list
>>> > [email protected] <mailto:[email protected]>
>>> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
--- End Message ---