On 6/19/24 16:07, Xin Long wrote:
> On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> On 6/18/24 17:50, Ilya Maximets wrote:
>>> On 6/18/24 16:58, Xin Long wrote:
>>>> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>
>>>>> On 6/17/24 22:10, Ilya Maximets wrote:
>>>>>> On 7/16/23 23:09, Xin Long wrote:
>>>>>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be 
>>>>>>> removed
>>>>>>> from the hashtable when lookup, we can simplify the exp processing code 
>>>>>>> a
>>>>>>> lot in openvswitch conntrack.
>>>>>>>
>>>>>>> Signed-off-by: Xin Long <lucien....@gmail.com>
>>>>>>> ---
>>>>>>>  net/openvswitch/conntrack.c | 78 +++++--------------------------------
>>>>>>>  1 file changed, 10 insertions(+), 68 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>>>>>>> index 331730fd3580..fa955e892210 100644
>>>>>>> --- a/net/openvswitch/conntrack.c
>>>>>>> +++ b/net/openvswitch/conntrack.c
>>>>>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net 
>>>>>>> *net, struct sw_flow_key *key,
>>>>>>>      return 0;
>>>>>>>  }
>>>>>>>
>>>>>>> -static struct nf_conntrack_expect *
>>>>>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone 
>>>>>>> *zone,
>>>>>>> -               u16 proto, const struct sk_buff *skb)
>>>>>>> -{
>>>>>>> -    struct nf_conntrack_tuple tuple;
>>>>>>> -    struct nf_conntrack_expect *exp;
>>>>>>> -
>>>>>>> -    if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
>>>>>>> &tuple))
>>>>>>> -            return NULL;
>>>>>>> -
>>>>>>> -    exp = __nf_ct_expect_find(net, zone, &tuple);
>>>>>>> -    if (exp) {
>>>>>>> -            struct nf_conntrack_tuple_hash *h;
>>>>>>> -
>>>>>>> -            /* Delete existing conntrack entry, if it clashes with the
>>>>>>> -             * expectation.  This can happen since conntrack ALGs do 
>>>>>>> not
>>>>>>> -             * check for clashes between (new) expectations and 
>>>>>>> existing
>>>>>>> -             * conntrack entries.  nf_conntrack_in() will check the
>>>>>>> -             * expectations only if a conntrack entry can not be found,
>>>>>>> -             * which can lead to OVS finding the expectation (here) in 
>>>>>>> the
>>>>>>> -             * init direction, but which will not be removed by the
>>>>>>> -             * nf_conntrack_in() call, if a matching conntrack entry is
>>>>>>> -             * found instead.  In this case all init direction packets
>>>>>>> -             * would be reported as new related packets, while reply
>>>>>>> -             * direction packets would be reported as un-related
>>>>>>> -             * established packets.
>>>>>>> -             */
>>>>>>> -            h = nf_conntrack_find_get(net, zone, &tuple);
>>>>>>> -            if (h) {
>>>>>>> -                    struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>>>>>>> -
>>>>>>> -                    nf_ct_delete(ct, 0, 0);
>>>>>>> -                    nf_ct_put(ct);
>>>>>>> -            }
>>>>>>> -    }
>>>>>>> -
>>>>>>> -    return exp;
>>>>>>> -}
>>>>>>> -
>>>>>>>  /* This replicates logic from nf_conntrack_core.c that is not 
>>>>>>> exported. */
>>>>>>>  static enum ip_conntrack_info
>>>>>>>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
>>>>>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
>>>>>>> sw_flow_key *key,
>>>>>>>                       const struct ovs_conntrack_info *info,
>>>>>>>                       struct sk_buff *skb)
>>>>>>>  {
>>>>>>> -    struct nf_conntrack_expect *exp;
>>>>>>> -
>>>>>>> -    /* If we pass an expected packet through nf_conntrack_in() the
>>>>>>> -     * expectation is typically removed, but the packet could still be
>>>>>>> -     * lost in upcall processing.  To prevent this from happening we
>>>>>>> -     * perform an explicit expectation lookup.  Expected connections 
>>>>>>> are
>>>>>>> -     * always new, and will be passed through conntrack only when they 
>>>>>>> are
>>>>>>> -     * committed, as it is OK to remove the expectation at that time.
>>>>>>> -     */
>>>>>>> -    exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
>>>>>>> -    if (exp) {
>>>>>>> -            u8 state;
>>>>>>> -
>>>>>>> -            /* NOTE: New connections are NATted and Helped only when
>>>>>>> -             * committed, so we are not calling into NAT here.
>>>>>>> -             */
>>>>>>> -            state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
>>>>>>> -            __ovs_ct_update_key(key, state, &info->zone, exp->master);
>>>>>>
>>>>>> Hi, Xin, others.
>>>>>>
>>>>>> Unfortunately, it seems like removal of this code broke the expected 
>>>>>> behavior.
>>>>>> OVS in userspace expects that SYN packet of a new related FTP connection 
>>>>>> will
>>>>>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk 
>>>>>> and not
>>>>>> new.  This is a problem because we need to commit this connection with 
>>>>>> the label
>>>>>> and we do that for +new packets.  If we can't get +new packet we'll have 
>>>>>> to commit
>>>>>> every single +rel+trk packet, which doesn't make a lot of sense.  And 
>>>>>> it's a
>>>>>> significant behavior change regardless.
>>>>>
>>>>> Interestingly enough I see +new+rel+trk packets in cases without SNAT,
>>>>> but we can only get +rel+trk in cases with SNAT.  So, this may be just
>>>>> a generic conntrack bug somewhere.  At least the behavior seems fairly
>>>>> inconsistent.
>>>>>
>>>> In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same
>>>> time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_NEW
>>>> are set at the same time by ovs_ct_update_key() when this related ct
>>>> is not confirmed.
>>>>
>>>> The check-kernel test of "FTP SNAT orig tuple" skiped on my env somehow:
>>>>
>>>> # make check-kernel
>>>> 144: conntrack - FTP SNAT orig tuple   skipped (system-traffic.at:7295)
>>>>
>>>> Any idea why? or do you know any other testcase that expects +new+rel+trk
>>>> but returns +rel+trk only?
>>>
>>> You need to install lftp and pyftpdlib.  The pyftpdlib may only be available
>>> via pip on some systems.
>>>
>>>>
>>>> Thanks.
>>>>>>
>>>>>> Could you, please, take a look?
>>>>>>
>>>>>> The issue can be reproduced by running check-kernel tests in OVS repo.
>>>>>> 'FTP SNAT orig tuple' tests fail 100% of the time.
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>
>>>
>>
>> Hmm.  After further investigation, it seems that the issue is not about ct 
>> state,
>> but the ct label.  Before this commit we had information about both the 
>> original
>> tuple of the parent connection and the mark/label of the parent connection:
>>
> Make senses. Now I can see the difference after this commit.
> We will need a fix in __ovs_ct_update_key() to copy mark & label from
> ct->master for exp ct.
> 
> @@ -196,8 +196,8 @@ static void __ovs_ct_update_key(struct sw_flow_key
> *key, u8 state,
>  {
>         key->ct_state = state;
>         key->ct_zone = zone->id;
> -       key->ct.mark = ovs_ct_get_mark(ct);
> -       ovs_ct_get_labels(ct, &key->ct.labels);
> +       key->ct.mark = 0;
> +       memset(&key->ct.labels, 0, OVS_CT_LABELS_LEN);
> 
>         if (ct) {
>                 const struct nf_conntrack_tuple *orig;
> @@ -205,6 +205,8 @@ static void __ovs_ct_update_key(struct sw_flow_key
> *key, u8 state,
>                 /* Use the master if we have one. */
>                 if (ct->master)
>                         ct = ct->master;
> +               key->ct.mark = ovs_ct_get_mark(ct);
> +               ovs_ct_get_labels(ct, &key->ct.labels);
>                 orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> 
>                 /* IP version must match with the master connection. */
> 
> We may need to run some regression tests for such a change.

Thank, Xin!  This seems like a change in the right direction and it fixes
this particular test.  But, I guess, we should get mark/labels from the
master connection only if it is not yet confirmed.  Users may commit different
labels for the related connection.  This should be more in line with the
previous behavior.

What do you think?

Best regards, Ilya Maximets.

> 
> Thanks.
> 
>> system@ovs-system: miss upcall:
>> recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25),
>> ct_zone(0x1),ct_mark(0),ct_label(0x4d2000000000000000000000001),
>> ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=50648,tp_dst=21),
>> eth(src=de:d9:f3:c8:5a:3a,dst=80:88:88:88:88:88),eth_type(0x0800),
>> ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no),
>> tcp(src=57027,dst=38153),tcp_flags(syn)
>>
>> But after this change, we still have the original tuple of the parent 
>> connection,
>> but the label is no longer in the flow key:
>>
>> system@ovs-system: miss upcall:
>> recirc_id(0x2),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),ct_state(0x25),
>> ct_zone(0x1),ct_mark(0),ct_label(0),
>> ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=6,tp_src=34668,tp_dst=21),
>> eth(src=66:eb:74:c6:79:24,dst=80:88:88:88:88:88),eth_type(0x0800),
>> ipv4(src=10.1.1.2,dst=10.1.1.9,proto=6,tos=0,ttl=64,frag=no),
>> tcp(src=49529,dst=35459),tcp_flags(syn)
>>
>> ct_state(0x25) == +new+rel+trk
>>
>> Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to