Gaëtan Rivet <gr...@u256.net> writes:

> On Tue, Jan 25, 2022, at 17:10, Mike Pattrick wrote:
>> On Mon, Dec 27, 2021 at 8:27 AM Paolo Valerio <pvale...@redhat.com> wrote:
>>>
>>> Some sporadic false positive may be visible for the following tests:
>>>
>>> - conntrack - IPv6 HTTP
>>> - conntrack - FTP over IPv6
>>>
>>> The failures show up randomly.
>>> The reason appears to be source address used when performing the
>>> request using wget:
>>> -tcp,orig=(src=fc00::1,dst=fc00::2,sport=<cleared>,dport=<cleared>),reply=(src=fc00::2,dst=fc00::1,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
>>> +tcp,orig=(src=fe80::f0eb:f8ff:fef0:138f,dst=fc00::2,sport=<cleared>,dport=<cleared>),reply=(src=fc00::2,dst=fe80::f0eb:f8ff:fef0:138f,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
>>>
>>> It seems that the problem can be addressed in multiple ways, but using
>>> "nodad" seems to be safe enough to fix the issue that now, after
>>> hundreds of attempts, is no longer present.
>>>
>>> Signed-off-by: Paolo Valerio <pvale...@redhat.com>
>>> ---
>>
>> I wasn't able to reproduce the sporadic failures after 100 iterations
>> of "conntrack - FTP over IPv6", but I still think this fix makes
>> sense, and shouldn't break anything.
>>
>> Acked-by: Mike Pattrick <m...@redhat.com>
>>
>
> Hi Paolo,
>
> I'm sorry, I don't understand the root cause of the failure.
>
> Seeing the src IP switched from fc00:: to fe80::, and how the nodad flag
> solves it, I am guessing that assigning fc00::1 on the veth failed due
> to this address already being in use in the LAN?
>
> How does it happen if all involved links are veths? Could it be that either
> other tests are running in parallel with veths having the same IP? Or maybe
> something else unrelated to OVS?
>
> Nodad might make the link creation work, but I'm worried that it could make
> the result unpredictable, i.e. making it less likely, and less easy to
> understand and debug in the rare remaining times it triggers.
>

I see your concern, and it's a fair point.
I don't expect DAD to fail, as there are no duplicates.
I probed the error path that is supposed to handle a DAD failure, and
the event never gets generated; that is, DAD didn't fail.

What seems to happen is:

- DAD starts (the ifaddr tentative flag is set)
- The process connect()s (skipping any ifaddr with the tentative flag set)
- DAD successfully terminates (the tentative flag gets cleared)

Probing during the failure and dumping the first byte of the address
(for convenience) and the ifaddr flags seems to confirm that:

ping6 651231 [010] 15603.586147: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfc
ping6 651231 [010] 15603.586148: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfe
ping6 651233 [009] 15603.692852: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfc
ping6 651233 [009] 15603.692856: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfe
ping6 651266 [005] 15604.807881: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfc
ping6 651266 [005] 15604.807885: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0x80 u6_addr8=0xfe
ping6 651266 [005] 15604.808005: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfc
ping6 651266 [005] 15604.808007: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0x80 u6_addr8=0xfe
 wget 651293 [009] 15605.041008: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0xc0 u6_addr8=0xfc
 wget 651293 [009] 15605.041010: probe:__ipv6_dev_get_saddr_L9: (ffffffffb5b5919a) flags=0x80 u6_addr8=0xfe

#define IFA_F_TENTATIVE         0x40
#define IFA_F_PERMANENT         0x80
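
To make the flags values in the trace easier to read, here is a minimal
sketch that decodes them using only the two bits quoted above. The helper
name decode_ifa_flags is hypothetical and only for illustration; other
ifaddr flag bits exist in the kernel but are ignored here.

```python
# Flag values as quoted above from the kernel headers.
IFA_F_TENTATIVE = 0x40
IFA_F_PERMANENT = 0x80

FLAG_NAMES = {
    IFA_F_TENTATIVE: "IFA_F_TENTATIVE",
    IFA_F_PERMANENT: "IFA_F_PERMANENT",
}


def decode_ifa_flags(flags):
    """Return the names of the known flags set in `flags` (hypothetical helper)."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]


# The two values that appear in the trace.
for value in (0xc0, 0x80):
    print(hex(value), decode_ifa_flags(value))
```

So 0xc0 is a permanent address that is still tentative (DAD in progress),
while 0x80 is a permanent address for which DAD has completed.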

The ping fails to get a source address, but it retries until it succeeds
(IFA_F_TENTATIVE cleared for "fe80::").
The subsequent connect() call performed by wget then selects the "fe80::"
address, as it is the only one without the tentative flag set.
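
The behaviour described above can be illustrated with a toy model. This is
only a sketch, not the kernel's algorithm: the real source address selection
applies many more rules, and the longest-common-prefix tie-break below is a
simplification chosen for illustration. The function names are hypothetical,
and the IFA_F_OPTIMISTIC value is an assumption recalled from
linux/if_addr.h rather than something shown in this thread.

```python
import ipaddress

IFA_F_OPTIMISTIC = 0x04  # assumption: value recalled from linux/if_addr.h
IFA_F_TENTATIVE = 0x40   # quoted in this thread
IFA_F_PERMANENT = 0x80   # quoted in this thread


def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv6 addresses."""
    diff = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - diff.bit_length()


def pick_source(candidates, dst):
    """Toy selection: skip tentative (non-optimistic) addresses, then
    prefer the candidate sharing the longest prefix with dst."""
    usable = [
        addr for addr, flags in candidates
        if not (flags & IFA_F_TENTATIVE) or (flags & IFA_F_OPTIMISTIC)
    ]
    if not usable:
        return None  # nothing usable yet, as in the first ping attempts
    return max(usable, key=lambda addr: common_prefix_len(addr, dst))


LL = "fe80::f0eb:f8ff:fef0:138f"

# Both addresses tentative: no source address is available.
print(pick_source([("fc00::1", 0xc0), (LL, 0xc0)], "fc00::2"))
# DAD finished only for the link-local address: it is the one picked.
print(pick_source([("fc00::1", 0xc0), (LL, 0x80)], "fc00::2"))
# DAD finished (or skipped with nodad) for both: fc00::1 is picked.
print(pick_source([("fc00::1", 0x80), (LL, 0x80)], "fc00::2"))
```

In this model the wrong source address only shows up in the window where
"fe80::" has finished DAD and "fc00::1" has not, which matches the sporadic
nature of the failure.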

I didn't test it, but I suppose that "optimistic" would also solve the
problem. AFAICS, though, disabling DAD in a test involving namespaces is
fine (furthermore, we know there are no duplicates). This is something we
already do in "datapath - ping over vxlan6 tunnel", and it seems OK
to me.

> Best,
> -- 
> Gaetan Rivet

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
