Hi, Paolo,

IIRC, you have a revision version on these patches?
I guess it should be closer to the upstream than mine?


Aaron Conole <acon...@redhat.com> 于2023年5月17日周三 21:54写道:

> Paolo Valerio <pvale...@redhat.com> writes:
>
> > Ilya Maximets <i.maxim...@ovn.org> writes:
> >
> >> On 5/4/23 19:21, Paolo Valerio wrote:
> >>> Ilya Maximets <i.maxim...@ovn.org> writes:
> >>>
> >>>> On 4/19/23 20:40, Paolo Valerio wrote:
> >>>>> During the creation of a new connection, there's a chance both key
> and
> >>>>> rev_key end up having the same hash. This is more common in the case
> >>>>> of all-zero snat with no collisions. In that case, once the
> >>>>> connection is expired, but not cleaned up, if a new packet with the
> >>>>> same 5-tuple is received, an assertion failure gets triggered in
> >>>>> conn_update_state() because of a previous failure of retrieving a
> >>>>> CT_CONN_TYPE_DEFAULT connection.
> >>>>>
> >>>>> Fix it by releasing the nat_conn during the connection creation in
> the
> >>>>> case of same hash for both key and rev_key.
> >>>>
> >>>> This sounds a bit odd.  Shouldn't we treat hash collision as a normal
> case?
> >>>>
> >>>> Looking at the code, I'm assuming that the issue comes from the
> following
> >>>> part in process_one():
> >>>>
> >>>>     if (OVS_LIKELY(conn)) {
> >>>>         if (conn->conn_type == CT_CONN_TYPE_UN_NAT) {
> >>>>             ...
> >>>>             conn_key_lookup(ct, &ctx->key, hash, now, &conn,
> &ctx->reply);
> >>>>
> >>>> And here we get the same connection again, because the default one is
> already
> >>>> expired.  Is that correct?
> >>>>
> >>>> If so, maybe we should add an extra condition to conn_key_lookup() to
> >>>> only look for DEFAULT connections instead, just for this case?  Since
> >>>> we really don't want to get the UN_NAT one here.
> >>>>
> >>>
> >>> Hello Ilya,
> >>>
> >>> It's a fair point.
> >>> I initially thought about the approach you're suggesting, but I had
> some
> >>> concerns about it that I'll try to summarize below.
> >>>
> >>> For sure it would fix the issue (it could require the first patch to be
> >>> applied as well for the branches with rcu exp lists).
> >>>
> >>> Based on the current logic, new packets matching that expired
> connection
> >>> but not evicted will be marked as +inv and further packets will be
> >>> marked so for the whole sweep interval unless an exception like this
> get
> >>> added:
> >>>
> >>> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis);
> >>> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */
> >>> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true);
> >>> /* special case where there's hash collision */
> >>> if (!conn && ctx->hash != hash) {
> >>>     pkt->md.ct_state |= CS_INVALID;
> >>>     write_ct_md(pkt, zone, NULL, NULL, NULL);
> >>>     ...
> >>>     return;
> >>> }
> >>>
> >>> This would further require that subsequent lookup in the
> create_new_conn
> >>> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.:
> >>>
> >>> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis);
> >>> /* Only check for CT_CONN_TYPE_DEFAULT */
> >>> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) {
> >>>     conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
> >>>                           helper, alg_exp, ct_alg_ctl, tp_id);
> >>> }
> >>>
> >>> otherwise we could incur in a false positive which prevent to create a
> >>> new connection.
> >>
> >> I'm not really sure if what described above is more correct way of doing
> >> things or not...  Aaron, do you have opinion on this?
> >>
> >> Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the
> >> moment DEFAULT counterpart of it expires?  Or that will that be against
> >> some logic / not possible to do?
> >>
> >
> > As far as I can tell, this could not be straightforward as simply
> > marking it as expired should not be reliable (e.g. doing it from the
> > sweeper), and I guess that managing the expiration time field for the
> > nat_conn as well would require updating the nat_conn every time the
> > default one gets updated, probably making it a bit unpractical.
> >
> > Another approach would be removing the nat_conn [1] altogether.
> > The problem in this case is backporting. Some adjustments that would add
> > to the patch might be needed for older branches.
> >
> > [1]
> >
> https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/
>
> I think that work was interesting, and maybe the best way to go
> forward.  Backports would become difficult, though - agreed.
>
> >>
> >>>
> >>>> Best regards, Ilya Maximets.
> >>>>
> >>>>>
> >>>>> Reported-by: Michael Plato <michael.pl...@tu-berlin.de>
> >>>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP
> address.")
> >>>>> Signed-off-by: Paolo Valerio <pvale...@redhat.com>
> >>>>> ---
> >>>>> In this thread [0] there are some more details. A similar
> >>>>> approach here could be to avoid to add the nat_conn to the cmap and
> >>>>> letting the sweeper release the memory for nat_conn once the whole
> >>>>> connection gets freed.
> >>>>> That approach could still be ok, but the drawback is that it could
> >>>>> require a different patch for older branches that don't include
> >>>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with
> >>>>> rculists."). It still worth to be considered.
> >>>>>
> >>>>> [0]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html
> >>>>> ---
> >>>>>  lib/conntrack.c |   21 +++++++++++++--------
> >>>>>  1 file changed, 13 insertions(+), 8 deletions(-)
> >>>>>
> >>>>> diff --git a/lib/conntrack.c b/lib/conntrack.c
> >>>>> index 7e1fc4b1f..d2ee127d9 100644
> >>>>> --- a/lib/conntrack.c
> >>>>> +++ b/lib/conntrack.c
> >>>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct
> dp_packet *pkt,
> >>>>>              }
> >>>>>
> >>>>>              nat_packet(pkt, nc, false, ctx->icmp_related);
> >>>>> -            memcpy(&nat_conn->key, &nc->rev_key, sizeof
> nat_conn->key);
> >>>>> -            memcpy(&nat_conn->rev_key, &nc->key, sizeof
> nat_conn->rev_key);
> >>>>> -            nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
> >>>>> -            nat_conn->nat_action = 0;
> >>>>> -            nat_conn->alg = NULL;
> >>>>> -            nat_conn->nat_conn = NULL;
> >>>>> -            uint32_t nat_hash = conn_key_hash(&nat_conn->key,
> ct->hash_basis);
> >>>>> -            cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
> >>>>> +            uint32_t nat_hash = conn_key_hash(&nc->rev_key,
> ct->hash_basis);
> >>>>> +            if (nat_hash != ctx->hash) {
> >>>>> +                memcpy(&nat_conn->key, &nc->rev_key, sizeof
> nat_conn->key);
> >>>>> +                memcpy(&nat_conn->rev_key, &nc->key, sizeof
> nat_conn->rev_key);
> >>>>> +                nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
> >>>>> +                nat_conn->nat_action = 0;
> >>>>> +                nat_conn->alg = NULL;
> >>>>> +                nat_conn->nat_conn = NULL;
> >>>>> +                cmap_insert(&ct->conns, &nat_conn->cm_node,
> nat_hash);
> >>>>> +            } else {
> >>>>> +                free(nat_conn);
> >>>>> +                nat_conn = NULL;
> >>>>> +            }
> >>>>>          }
> >>>>>
> >>>>>          nc->nat_conn = nat_conn;
> >>>>>
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


-- 
hepeng
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to