--- Begin Message ---
Hi Richard,

On 22/9/20 at 10:19, richard lucassen wrote:
> On Mon, 3 Aug 2020 13:54:54 +0200
> Eneko Lacunza via pve-user <[email protected]> wrote:
>
>> As reported 10 days ago, we have found an e1000e driver hang recently,
>> after upgrading from PVE 5.4 to 6.2, in an otherwise stable server.
>>
>> It could be a driver issue and not a virtio network issue, but we
>> haven't seen another hang since the one reported.
> [note] I just moved the images to a new proxmox 6.2.11 environment and
> the problem remains. An RTL8169 NIC works well

We had a new fence on 7th Sept on that cluster. I can't confirm whether it was an e1000e hang, but it is likely.

The cluster has 3 nodes; all 3 have integrated e1000e interfaces, and we're seeing random down/up events on the Intel physical interfaces:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)

I also found a trace (not linked to a fence) in the logs of the first and third nodes; it may not be related:

Sep  8 08:57:14 proxmox1 kernel: [35054.564849] ------------[ cut here ]------------
Sep  8 08:57:14 proxmox1 kernel: [35054.564856] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
Sep  8 08:57:14 proxmox1 kernel: [35054.564867] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x264/0x270
Sep  8 08:57:14 proxmox1 kernel: [35054.564868] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfsv3 nfs_acl nfs lockd grace fscache ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp ip_set_hash_net ip_set sctp iptable_filter bpfilter xfs softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul snd_hda_codec_hdmi crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio aesni_intel crypto_simd cryptd glue_helper mei_hdcp i915 drm_kms_helper snd_hda_intel snd_intel_dspcfg intel_cstate snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer mei_me snd mei soundcore drm i2c_algo_bit intel_pch_thermal intel_rapl_perf fb_sys_fops syscopyarea sysfillrect sysimgblt ie31200_edac dell_wmi
Sep  8 08:57:14 proxmox1 kernel: [35054.564888]  dell_smbios serio_raw dcdbas pcspkr sparse_keymap intel_wmi_thunderbolt wmi_bmof dell_wmi_descriptor mac_hid acpi_pad zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c e1000e xhci_pci psmouse i2c_i801 xhci_hcd ahci tg3 libahci wmi video
Sep  8 08:57:14 proxmox1 kernel: [35054.564915] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O      5.4.44-2-pve #1
Sep  8 08:57:14 proxmox1 kernel: [35054.564915] Hardware name: Dell Inc. PowerEdge T30/07T4MC, BIOS 1.0.7 07/30/2017
Sep  8 08:57:14 proxmox1 kernel: [35054.564917] RIP: 0010:dev_watchdog+0x264/0x270
Sep  8 08:57:14 proxmox1 kernel: [35054.564918] Code: 48 85 c0 75 e6 eb a0 4c 89 ef c6 05 81 1a eb 00 01 e8 80 b1 fa ff 89 d9 4c 89 ee 48 c7 c7 70 2f 63 bb 48 89 c2 e8 cd 7a 74 ff <0f> 0b eb 82 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
Sep  8 08:57:14 proxmox1 kernel: [35054.564918] RSP: 0018:ffffb352c003ce58 EFLAGS: 00010282
Sep  8 08:57:14 proxmox1 kernel: [35054.564919] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Sep  8 08:57:14 proxmox1 kernel: [35054.564920] RDX: ffff9b79bdaa7740 RSI: 00000000000000f6 RDI: 0000000000000300
Sep  8 08:57:14 proxmox1 kernel: [35054.564920] RBP: ffffb352c003ce88 R08: 00000000000003d9 R09: 0000000000000004
Sep  8 08:57:14 proxmox1 kernel: [35054.564921] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Sep  8 08:57:14 proxmox1 kernel: [35054.564921] R13: ffff9b79ac3f0000 R14: ffff9b79ac3f0480 R15: ffff9b79accd0080
Sep  8 08:57:14 proxmox1 kernel: [35054.564922] FS: 0000000000000000(0000) GS:ffff9b79bda80000(0000) knlGS:0000000000000000
Sep  8 08:57:14 proxmox1 kernel: [35054.564922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  8 08:57:14 proxmox1 kernel: [35054.564923] CR2: 00007fff22cdde9c CR3: 0000000709c0a004 CR4: 00000000003626e0
Sep  8 08:57:14 proxmox1 kernel: [35054.564923] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  8 08:57:14 proxmox1 kernel: [35054.564924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  8 08:57:14 proxmox1 kernel: [35054.564924] Call Trace:
Sep  8 08:57:14 proxmox1 kernel: [35054.564925]  <IRQ>
Sep  8 08:57:14 proxmox1 kernel: [35054.564928]  ? pfifo_fast_enqueue+0x160/0x160
Sep  8 08:57:14 proxmox1 kernel: [35054.564930] call_timer_fn+0x32/0x130
Sep  8 08:57:14 proxmox1 kernel: [35054.564931] run_timer_softirq+0x1a5/0x430
Sep  8 08:57:14 proxmox1 kernel: [35054.564933]  ? ktime_get+0x3c/0xa0
Sep  8 08:57:14 proxmox1 kernel: [35054.564935]  ? lapic_next_deadline+0x26/0x30
Sep  8 08:57:14 proxmox1 kernel: [35054.564936]  ? clockevents_program_event+0x93/0xf0
Sep  8 08:57:14 proxmox1 kernel: [35054.564938] __do_softirq+0xdc/0x2d4
Sep  8 08:57:14 proxmox1 kernel: [35054.564940]  irq_exit+0xa9/0xb0
Sep  8 08:57:14 proxmox1 kernel: [35054.564941] smp_apic_timer_interrupt+0x79/0x130
Sep  8 08:57:14 proxmox1 kernel: [35054.564942] apic_timer_interrupt+0xf/0x20
Sep  8 08:57:14 proxmox1 kernel: [35054.564943]  </IRQ>
Sep  8 08:57:14 proxmox1 kernel: [35054.564945] RIP: 0010:cpuidle_enter_state+0xbd/0x450
Sep  8 08:57:14 proxmox1 kernel: [35054.564946] Code: ff e8 a7 b4 84 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 ca 22 8b ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
Sep  8 08:57:14 proxmox1 kernel: [35054.564946] RSP: 0018:ffffb352c00c3e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Sep  8 08:57:14 proxmox1 kernel: [35054.564947] RAX: ffff9b79bdaaad40 RBX: ffffffffbb957a00 RCX: 000000000000001f
Sep  8 08:57:14 proxmox1 kernel: [35054.564948] RDX: 00001fe1c6e2a49d RSI: 0000000026a5b845 RDI: 0000000000000000
Sep  8 08:57:14 proxmox1 kernel: [35054.564948] RBP: ffffb352c00c3e88 R08: 0000000000000002 R09: 000000000002a5c0
Sep  8 08:57:14 proxmox1 kernel: [35054.564949] R10: 000069f45edea306 R11: ffff9b79bdaa99e0 R12: ffff9b79bdab6600
Sep  8 08:57:14 proxmox1 kernel: [35054.564949] R13: 0000000000000006 R14: ffffffffbb957c58 R15: ffffffffbb957c40
Sep  8 08:57:14 proxmox1 kernel: [35054.564951]  ? cpuidle_enter_state+0x99/0x450
Sep  8 08:57:14 proxmox1 kernel: [35054.564952] cpuidle_enter+0x2e/0x40
Sep  8 08:57:14 proxmox1 kernel: [35054.564954] call_cpuidle+0x23/0x40
Sep  8 08:57:14 proxmox1 kernel: [35054.564955]  do_idle+0x22c/0x270
Sep  8 08:57:14 proxmox1 kernel: [35054.564957] cpu_startup_entry+0x1d/0x20
Sep  8 08:57:14 proxmox1 kernel: [35054.564958] start_secondary+0x166/0x1c0
Sep  8 08:57:14 proxmox1 kernel: [35054.564960] secondary_startup_64+0xa4/0xb0
Sep  8 08:57:14 proxmox1 kernel: [35054.564961] ---[ end trace 3a481687c9259238 ]---
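
In case it helps anyone check their own nodes for the same watchdog message, here is a minimal Python sketch that scans syslog-style lines for it. The function name find_watchdog_timeouts and the regular expression are only my illustration, written against the line format shown in the trace above; they are not part of any PVE or kernel tooling.

```python
import re

# Matches the kernel transmit-timeout warning as it appears in the trace above, e.g.
# "Sep  8 08:57:14 proxmox1 kernel: [...] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out"
# Assumption: classic syslog timestamp/host prefix; other log formats need a different prefix pattern.
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:.*?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return one dict (stamp, host, iface, driver, queue) per watchdog line in `lines`."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hit = match.groupdict()
            hit["queue"] = int(hit["queue"])  # queue index as a number
            hits.append(hit)
    return hits
```

It could be fed an open log file, for example `find_watchdog_timeouts(open("/var/log/syslog", errors="replace"))`; that path is an assumption and depends on how logging is set up on the node.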


Cheers



--
Eneko Lacunza                   | Tel.  943 569 206
                                | Email [email protected]
Director Técnico                | Site. https://www.binovo.es
BINOVO IT HUMAN PROJECT S.L     | Dir.  Astigarragako Bidea, 2 - 2º izda. 
Oficina 10-11, 20180 Oiartzun



--- End Message ---
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
