Hi, Paolo, IIRC, you have a revision version on these patches? I guess it should be closer to the upstream than mine?
Aaron Conole <acon...@redhat.com> 于2023年5月17日周三 21:54写道: > Paolo Valerio <pvale...@redhat.com> writes: > > > Ilya Maximets <i.maxim...@ovn.org> writes: > > > >> On 5/4/23 19:21, Paolo Valerio wrote: > >>> Ilya Maximets <i.maxim...@ovn.org> writes: > >>> > >>>> On 4/19/23 20:40, Paolo Valerio wrote: > >>>>> During the creation of a new connection, there's a chance both key > and > >>>>> rev_key end up having the same hash. This is more common in the case > >>>>> of all-zero snat with no collisions. In that case, once the > >>>>> connection is expired, but not cleaned up, if a new packet with the > >>>>> same 5-tuple is received, an assertion failure gets triggered in > >>>>> conn_update_state() because of a previous failure of retrieving a > >>>>> CT_CONN_TYPE_DEFAULT connection. > >>>>> > >>>>> Fix it by releasing the nat_conn during the connection creation in > the > >>>>> case of same hash for both key and rev_key. > >>>> > >>>> This sounds a bit odd. Shouldn't we treat hash collision as a normal > case? > >>>> > >>>> Looking at the code, I'm assuming that the issue comes from the > following > >>>> part in process_one(): > >>>> > >>>> if (OVS_LIKELY(conn)) { > >>>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { > >>>> ... > >>>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, > &ctx->reply); > >>>> > >>>> And here we get the same connection again, because the default one is > already > >>>> expired. Is that correct? > >>>> > >>>> If so, maybe we should add an extra condition to conn_key_lookup() to > >>>> only look for DEFAULT connections instead, just for this case? Since > >>>> we really don't want to get the UN_NAT one here. > >>>> > >>> > >>> Hello Ilya, > >>> > >>> It's a fair point. > >>> I initially thought about the approach you're suggesting, but I had > some > >>> concerns about it that I'll try to summarize below. > >>> > >>> For sure it would fix the issue (it could require the first patch to be > >>> applied as well for the branches with rcu exp lists). > >>> > >>> Based on the current logic, new packets matching that expired > connection > >>> but not evicted will be marked as +inv and further packets will be > >>> marked so for the whole sweep interval unless an exception like this > get > >>> added: > >>> > >>> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); > >>> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ > >>> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); > >>> /* special case where there's hash collision */ > >>> if (!conn && ctx->hash != hash) { > >>> pkt->md.ct_state |= CS_INVALID; > >>> write_ct_md(pkt, zone, NULL, NULL, NULL); > >>> ... > >>> return; > >>> } > >>> > >>> This would further require that subsequent lookup in the > create_new_conn > >>> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: > >>> > >>> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); > >>> /* Only check for CT_CONN_TYPE_DEFAULT */ > >>> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { > >>> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, > >>> helper, alg_exp, ct_alg_ctl, tp_id); > >>> } > >>> > >>> otherwise we could incur in a false positive which prevent to create a > >>> new connection. > >> > >> I'm not really sure if what described above is more correct way of doing > >> things or not... Aaron, do you have opinion on this? > >> > >> Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the > >> moment DEFAULT counterpart of it expires? Or that will that be against > >> some logic / not possible to do? > >> > > > > As far as I can tell, this could not be straightforward as simply > > marking it as expired should not be reliable (e.g. doing it from the > > sweeper), and I guess that managing the expiration time field for the > > nat_conn as well would require updating the nat_conn every time the > > default one gets updated, probably making it a bit unpractical. > > > > Another approach would be removing the nat_conn [1] altogether. > > The problem in this case is backporting. Some adjustments that would add > > to the patch might be needed for older branches. > > > > [1] > > > https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/ > > I think that work was interesting, and maybe the best way to go > forward. Backports would become difficult, though - agreed. > > >> > >>> > >>>> Best regards, Ilya Maximets. > >>>> > >>>>> > >>>>> Reported-by: Michael Plato <michael.pl...@tu-berlin.de> > >>>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP > address.") > >>>>> Signed-off-by: Paolo Valerio <pvale...@redhat.com> > >>>>> --- > >>>>> In this thread [0] there are some more details. A similar > >>>>> approach here could be to avoid to add the nat_conn to the cmap and > >>>>> letting the sweeper release the memory for nat_conn once the whole > >>>>> connection gets freed. > >>>>> That approach could still be ok, but the drawback is that it could > >>>>> require a different patch for older branches that don't include > >>>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with > >>>>> rculists."). It still worth to be considered. > >>>>> > >>>>> [0] > https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html > >>>>> --- > >>>>> lib/conntrack.c | 21 +++++++++++++-------- > >>>>> 1 file changed, 13 insertions(+), 8 deletions(-) > >>>>> > >>>>> diff --git a/lib/conntrack.c b/lib/conntrack.c > >>>>> index 7e1fc4b1f..d2ee127d9 100644 > >>>>> --- a/lib/conntrack.c > >>>>> +++ b/lib/conntrack.c > >>>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct > dp_packet *pkt, > >>>>> } > >>>>> > >>>>> nat_packet(pkt, nc, false, ctx->icmp_related); > >>>>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof > nat_conn->key); > >>>>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof > nat_conn->rev_key); > >>>>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > >>>>> - nat_conn->nat_action = 0; > >>>>> - nat_conn->alg = NULL; > >>>>> - nat_conn->nat_conn = NULL; > >>>>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, > ct->hash_basis); > >>>>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > >>>>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, > ct->hash_basis); > >>>>> + if (nat_hash != ctx->hash) { > >>>>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof > nat_conn->key); > >>>>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof > nat_conn->rev_key); > >>>>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > >>>>> + nat_conn->nat_action = 0; > >>>>> + nat_conn->alg = NULL; > >>>>> + nat_conn->nat_conn = NULL; > >>>>> + cmap_insert(&ct->conns, &nat_conn->cm_node, > nat_hash); > >>>>> + } else { > >>>>> + free(nat_conn); > >>>>> + nat_conn = NULL; > >>>>> + } > >>>>> } > >>>>> > >>>>> nc->nat_conn = nat_conn; > >>>>> > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> d...@openvswitch.org > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>> > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > -- hepeng _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev