[
https://issues.apache.org/jira/browse/MESOS-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866074#comment-17866074
]
Jason Zhou commented on MESOS-10243:
------------------------------------
Update:
We have discovered that on systems with systemd versions above 242, there is a
potential data race: if systemd's MACAddressPolicy is set to 'persistent', udev
will try to update the MAC address of the device at the same time as we do. To
prevent udev from setting the veth device's MAC address itself, we must set the
MAC address when the device is created, so that addr_assign_type is set to
NET_ADDR_SET. udev does not attempt to change the MAC address of a device in
that state.
See:
[https://github.com/torvalds/linux/commit/2afb9b533423a9b97f84181e773cf9361d98fed6]
[https://lore.kernel.org/netdev/cahxsexy8lkzocbdbzss_vjopc_tqmyzm87kc192hpmuhmcq...@mail.gmail.com/T/]
Patch to avoid the race condition: [https://reviews.apache.org/r/75086/]
TODO: also avoid the race condition for the created peer link:
[https://reviews.apache.org/r/75087/]
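
To check whether a given veth device is still exposed to this race, the
addr_assign_type value discussed above can be read from sysfs. The following is
a minimal Python sketch, not part of either patch. The helper names are
hypothetical, and the rule encoded in udev_may_rewrite_mac is an assumption
drawn from the description above (only a kernel-generated random address is
treated as replaceable); it has not been checked against the systemd source.

```python
from pathlib import Path

# Values of addr_assign_type, as defined in the kernel's
# include/uapi/linux/netdevice.h.
NET_ADDR_PERM = 0    # permanent address (the kernel default)
NET_ADDR_RANDOM = 1  # address was generated randomly
NET_ADDR_STOLEN = 2  # address was taken from another device
NET_ADDR_SET = 3     # address was set explicitly


def read_addr_assign_type(ifname, sysfs_root="/sys/class/net"):
    """Return addr_assign_type of `ifname` as an int (hypothetical helper)."""
    text = (Path(sysfs_root) / ifname / "addr_assign_type").read_text()
    return int(text.strip())


def udev_may_rewrite_mac(addr_assign_type):
    """Assumption: only a randomly generated address is a candidate for
    replacement under the 'persistent' policy."""
    return addr_assign_type == NET_ADDR_RANDOM
```

For a veth device created with an explicit MAC address, read_addr_assign_type
would be expected to return NET_ADDR_SET (3), in which case
udev_may_rewrite_mac returns False.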
> MAC Address changes from link::setMAC may not stick
> ---------------------------------------------------
>
> Key: MESOS-10243
> URL: https://issues.apache.org/jira/browse/MESOS-10243
> Project: Mesos
> Issue Type: Bug
> Reporter: Jason Zhou
> Priority: Major
>
> It seems that there are scenarios where Mesos containers cannot communicate
> with agents because their MAC addresses are set incorrectly, leading to
> dropped packets. A workaround for this behavior is to check that the MAC
> address is set correctly after the ioctl call, and to retry setting the
> address if necessary.
> In our tests, this workaround appears to reduce the frequency of this issue,
> but does not seem to prevent all such failures.
> Reviewboard ticket for the workaround: [https://reviews.apache.org/r/75057/]
> Observed scenarios with incorrectly assigned MAC addresses:
> 1. ioctl returns the correct MAC address, but net::mac does not
> 2. Both net::mac and ioctl return the same MAC address, but it is wrong
> 3. We have not observed any case where ioctl/net::mac come back with the
> same MAC address as before setting, i.e. the set is never a no-op.
> 4. The ioctl/net::mac results can disagree with each other even before we
> attempt to set our desired MAC address. As such, we check that the
> results agree before we set, and log a warning if we find a mismatch.
> 5. The MAC address we set can end up overwritten by a garbage value after
> setMAC has already completed and checked that the MAC address was set
> correctly. Since this happens after the function has returned, we can
> neither log nor detect it in setMAC, so our workaround cannot deal with
> this scenario.
> Notes:
> 1. So far we have observed this behavior only on CentOS 9 systems. We have
> tried kernels 5.15.147, 5.15.160, and 5.15.161, all of which have this
> issue.
> CentOS 7 systems do not seem to have this issue with setMAC.
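
The check-and-retry workaround described in the quoted issue can be sketched as
follows. This is a simplified Python illustration, not the code from the
Reviewboard patch: set_mac_with_retry, its parameters, and the set_mac/get_mac
callables are hypothetical stand-ins for the ioctl-based setter and the
read-back. As scenario 5 notes, a loop like this cannot detect an overwrite
that happens after it has returned.

```python
import time


def set_mac_with_retry(set_mac, get_mac, desired, attempts=3, delay=0.0):
    """Set `desired`, read the address back, and retry on mismatch.

    Returns True as soon as a read-back matches `desired`
    (case-insensitively), or False once all attempts are used up.
    """
    want = desired.lower()
    for _ in range(attempts):
        set_mac(desired)
        if get_mac().lower() == want:
            return True
        time.sleep(delay)  # optional pause before the next attempt
    return False
```

A caller would pass in functions that wrap the real set and get operations and
treat a False result as a failure to be logged.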