Very cool that this is fixed!

Mark Schouten

> On 2 Jul 2021, at 22:58, Thomas Lamprecht <[email protected]> wrote:
> 
> On 29.06.21 10:05, Mark Schouten wrote:
>> Hi,
>> 
>> On 24-06-2021 at 15:16, Martin Maurer wrote:
>>> We are pleased to announce the first beta release of Proxmox Virtual 
>>> Environment 7.0! The 7.x family is based on the great Debian 11 "Bullseye" 
>>> and comes with a 5.11 kernel, QEMU 6.0, LXC 4.0, OpenZFS 2.0.4.
>> 
>> I just upgraded a node in our demo cluster and all seemed fine, except for 
>> a non-working cluster network. I was unable to ping the node through the 
>> cluster interface, pvecm saw no other nodes and ceph was broken.
>> 
>> However, if I ran tcpdump, ping started working, but not the rest.
>> 
>> Interesting situation, which I 'fixed' by disabling vlan-aware-bridge for 
>> that interface. After the reboot, everything works (AFAICS).
>> 
>> If Proxmox wants to debug this, feel free to reach out to me, I can grant 
>> you access to this node so you can check it out.
>> 
> 
> FYI, there was some more investigation regarding this, mostly spearheaded
> by Wolfgang, and we found and fixed[0] an actual, rather old (the commit it
> fixes is from 2014!) bridge bug in the kernel.
> 
> The first few lines of the fix's commit message[0] explain the basics:
> 
>> [..] bridges with `vlan_filtering 1` and only 1 auto-port don't
>> set IFF_PROMISC for unicast-filtering-capable ports.
> 
> Further, we saw all that weird behavior because:
> * While this is independent of any specific network driver, drivers vary
>   wildly in how they do things, and some thus worked (by luck) while
>   others did not.
>
> * It can really only happen in the vlan-aware case, as otherwise all ports
>   are set promisc no matter what; but depending on the order in which
>   things are done, the result may still differ even with vlan-aware on.
>
> * It did not matter before (i.e., before systemd started to also apply its
>   MACAddressPolicy by default to virtual devices like bridges) because
>   then the bridge basically always had a MAC from one of its ports, so the
>   fdb always contained the bridge's MAC implicitly and the bug was
>   concealed.
> 
> So it's quite likely that this rather confusing mix of behaviors would have
> popped up in more places where bridges are used in the upcoming months, as
> that systemd change slowly rolled into stable distros. So it's actually
> really nice to find and fix (*knocks wood*) this during beta!
> 
> Anyhow, a newer kernel build is now also available in the bullseye-based
> pvetest repository, if you want to test and confirm the fix:
> 
> pve-kernel-5.11.22-1-pve version 5.11.22-2
> 
> cheers,
> Thomas
> 
> 
> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a019abd80220
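
For anyone who wants to check whether a node is in the state described in the quoted message (a bridge with vlan_filtering enabled whose ports are not promiscuous), here is a minimal Python sketch. It is not part of the fix. It assumes the usual Linux sysfs layout (`<bridge>/bridge/vlan_filtering`, `<bridge>/brif/<port>`, `<port>/flags`) and that the sysfs `flags` value includes the kernel-set IFF_PROMISC bit (0x100); please verify both on your kernel. The function names and the `sysfs_root` parameter are made up for illustration. It would also fit the tcpdump observation, since tcpdump normally puts the interface into promiscuous mode while it runs.

```python
from pathlib import Path

# IFF_PROMISC as defined in <linux/if.h>
IFF_PROMISC = 0x100


def is_promisc(flags_text: str) -> bool:
    """Return True if a sysfs 'flags' value (hex string) has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def check_bridge(bridge: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Report vlan_filtering and per-port promiscuous state for one bridge.

    sysfs_root is a hypothetical parameter so the function can be pointed
    at a fake directory tree for testing.
    """
    root = Path(sysfs_root)
    br = root / bridge
    # Assumed layout: <bridge>/bridge/vlan_filtering contains "0" or "1"
    vlan_filtering = (br / "bridge" / "vlan_filtering").read_text().strip() == "1"
    ports = {}
    # Assumed layout: <bridge>/brif/ has one entry per bridge port
    for port in sorted((br / "brif").iterdir()):
        flags = (root / port.name / "flags").read_text()
        ports[port.name] = is_promisc(flags)
    return {"vlan_filtering": vlan_filtering, "ports": ports}
```

On a real node you would call something like `check_bridge("vmbr0")`, where `vmbr0` is just an example bridge name. A result with `vlan_filtering` true and a port showing `False` would match the situation described above, but treat it as a hint rather than a diagnosis.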