On 3/4/20 8:44 PM, Dumitru Ceara wrote: > On 3/4/20 7:45 PM, Ilya Maximets wrote: >> On 3/4/20 2:01 PM, Dumitru Ceara wrote: >>> On 1/30/20 3:16 PM, Dumitru Ceara wrote: >>>> When a new conntrack zone is entered, the ct_state field is zeroed in >>>> order to avoid using state information from different zones. >>>> >>>> One such scenario is when a packet is double NATed. Assuming two zones >>>> and 3 flows performing the following actions in order on the packet: >>>> 1. ct(zone=5,nat), recirc >>>> 2. ct(zone=1), recirc >>>> 3. ct(zone=1,nat) >>>> >>>> If at step #1 the packet matches an existing NAT entry, it will get >>>> translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT. >>>> At step #2 the new tuple might match an existing connection and >>>> pkt->md.ct_zone is set to 1. >>>> If at step #3 the packet matches an existing NAT entry in zone 1, >>>> handle_nat() will be called to perform the translation but it will >>>> return early because the packet's zone matches the conntrack zone and >>>> the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the >>>> translations in zone 5. >>>> >>>> In order to reliably detect when a packet enters a new conntrack zone >>>> we also need to zero out the pkt->md.ct_zone field when initializing >>>> metadata in pkt_metadata_init(). >>>> >>>> CC: Darrell Ball <dlu...@gmail.com> >>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com> >>>> --- >>>> lib/conntrack.c | 5 +++++ >>>> lib/packets.h | 5 +++++ >>>> 2 files changed, 10 insertions(+) >>>> >>>> diff --git a/lib/conntrack.c b/lib/conntrack.c >>>> index ff5a894..e4d934a 100644 >>>> --- a/lib/conntrack.c >>>> +++ b/lib/conntrack.c >>>> @@ -1277,6 +1277,11 @@ process_one(struct conntrack *ct, struct dp_packet >>>> *pkt, >>>> const struct nat_action_info_t *nat_action_info, >>>> ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper) >>>> { >>>> + /* Reset ct_state whenever entering a new zone. */ >>>> + if (pkt->md.ct_zone != zone) { >>>> + pkt->md.ct_state = 0; >>>> + } >>>> + >>>> bool create_new_conn = false; >>>> conn_key_lookup(ct, &ctx->key, ctx->hash, now, &ctx->conn, >>>> &ctx->reply); >>>> struct conn *conn = ctx->conn; >>>> diff --git a/lib/packets.h b/lib/packets.h >>>> index 5d7f82c..fae64bb 100644 >>>> --- a/lib/packets.h >>>> +++ b/lib/packets.h >>>> @@ -161,6 +161,11 @@ pkt_metadata_init(struct pkt_metadata *md, odp_port_t >>>> port) >>>> */ >>>> memset(md, 0, offsetof(struct pkt_metadata, ct_orig_tuple_ipv6)); >>>> >>>> + /* Explicitly zero out ct_zone in order to be able to properly >>>> determine >>>> + * when a packet enters a new conntrack zone. >>>> + */ >>>> + md->ct_zone = 0; >> > > Hi Ilya, > > Thanks for reviewing this! > >> I'm not an expert in conntrack, but I'm bothered about this change in >> init function. This function works for every single packet and any >> modification might significantly affect performance. > > I agree, I'm not too happy with this either but it seemed the safest > from a correctness perspective. > >> >> There is an assumption that conntrack fields of packet metadata are >> never used if ct_state is zero. So, every user of these fields must >> be sure that ct_state is correctly initialized. > > Yes, but as far as I understand, this doesn't imply that if ct_state is > non-zero the pkt->md.ct_zone field is initialized and safe to use. > >> Will it work if we'll change your 'if' statement in 'process_one' >> function to something like: >> >> if (pkt->md.ct_state && pkt->md.ct_zone != zone) { >> pkt->md.ct_state = 0; >> } >> >> and will not change metadata initialization code? >> > > I think one example of reading unitialized pkt->md.ct_zone is if > process_one() was called for a packet and returned early in case of > CT_CONN_TYPE_UN_NAT: > > https://github.com/openvswitch/ovs/blob/master/lib/conntrack.c#L1303 > > Here we set pkt->md.ct_state to "CS_TRACKED | CS_INVALID" but we don't > touch pkt->md.ct_zone. I think there are cases when this packet could > reach process_one() again further down the pipeline and we'd be reading > the uninitialized pkt->md.ct_zone. > > Right now I don't see any other places where we set pkt->md.ct_state > while leaving pkt->md.ct_zone uninitialized. If that's true I can make > sure ct_zone is properly initialized here too and then the check you > suggested would be ok in all cases and we could get rid of the change in > pkt_metadata_init(). > > I'll give it a try and test it out to see if all scenarios are covered. > > Thanks, > Dumitru >
Hi Ilya, Darrell, I sent out a v2 with the changes mentioned above: https://patchwork.ozlabs.org/patch/1249513/ Thanks, Dumitru >>>> + >>>> /* It can be expensive to zero out all of the tunnel metadata. >>>> However, >>>> * we can just zero out ip_dst and the rest of the data will never be >>>> * looked at. */ >>>> >>> >>> Hi, >>> >>> Just a reminder about this patch. >>> >>> OVN LB harpinning system tests on OVN master and OVN-20.03 with OVS >>> userspace datapath fail because of the issue this patch addresses. >>> >>> Thanks, >>> Dumitru >>> >>> _______________________________________________ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev