On Fri, Nov 20, 2020 at 9:08 PM Dumitru Ceara <dce...@redhat.com> wrote: > > When a port binding of type "l3gateway" is claimed its remote peer > port_binding is also stored in local_datapath.peer_ports[].remote. > > If the remote peer port_binding is deleted first (i.e., before the local > "l3gateway" one) then we need to remove the complete > local_datapath.peer_ports[] entry in order to avoid ending up using > dangling pointers to already freed port bindings. > > Also, properly reset local_datapath->has_local_l3gateway in > remove_pb_from_local_datapath(). > > Ilya reported this issue found by AddressSanitizer during his testing: > > ==1816017==ERROR: AddressSanitizer: heap-use-after-free on address > 0x6140000cb170 at pc 0x0000005ab574 bp 0x7fff68925a30 sp 0x7fff68925a28 > READ of size 8 at 0x6140000cb170 thread T0 > #0 0x5ab573 in put_replace_chassis_mac_flows > git/ovn/controller/physical.c:550:9 > #1 0x5a65eb in consider_port_binding git/ovn/controller/physical.c:1168:13 > #2 0x5a8764 in physical_run git/ovn/controller/physical.c:1607:9 > #3 0x5a0064 in flow_output_physical_flow_changes_handler > git/ovn/controller/ovn-controller.c:2127:9 > #4 0x5db423 in engine_compute git/ovn/lib/inc-proc-eng.c:306:18 > #5 0x5dae1f in engine_run_node git/ovn/lib/inc-proc-eng.c:352:14 > #6 0x5dac74 in engine_run git/ovn/lib/inc-proc-eng.c:377:9 > #7 0x59ad64 in main git/ovn/controller/ovn-controller.c > #8 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2) > #9 0x480b2d in _start (git/ovn/controller/ovn-controller+0x480b2d) > > 0x6140000cb170 is located 304 bytes inside of 408-byte region > [0x6140000cb040,0x6140000cb1d8) > freed by thread T0 here: > #0 0x520d07 in free (git/ovn/controller/ovn-controller+0x520d07) > #1 0x712de7 in ovsdb_idl_db_track_clear git/ovs/lib/ovsdb-idl.c:1984:21 > #2 0x59b5cd in main git/ovn/controller/ovn-controller.c:2762:9 > #3 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2) > > Reported-by: Ilya Maximets <i.maxim...@ovn.org> > Fixes: 354bdba51abf ("ovn-controller: I-P for SB port binding and OVS > interface in runtime_data.") > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
Hi Dumitru, Please see below for few comments. > --- > controller/binding.c | 47 ++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 42 insertions(+), 5 deletions(-) > > diff --git a/controller/binding.c b/controller/binding.c > index eb92679..cb60c5d 100644 > --- a/controller/binding.c > +++ b/controller/binding.c > @@ -1523,21 +1523,41 @@ binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, > } > > static const struct sbrec_port_binding * > -get_peer_lport(const struct sbrec_port_binding *pb, > - struct binding_ctx_in *b_ctx_in) > +get_peer_lport__(const struct sbrec_port_binding *pb, > + struct binding_ctx_in *b_ctx_in) > { > const char *peer_name = smap_get(&pb->options, "peer"); > - if (strcmp(pb->type, "patch") || !peer_name) { > + > + if (!peer_name) { > return NULL; > } > > const struct sbrec_port_binding *peer; > peer = lport_lookup_by_name(b_ctx_in->sbrec_port_binding_by_name, > peer_name); > - > return (peer && peer->datapath) ? peer : NULL; > } > > +static const struct sbrec_port_binding * > +get_l3gw_peer_lport(const struct sbrec_port_binding *pb, > + struct binding_ctx_in *b_ctx_in) > +{ > + if (strcmp(pb->type, "l3gateway")) { > + return NULL; > + } > + return get_peer_lport__(pb, b_ctx_in); > +} > + > +static const struct sbrec_port_binding * > +get_peer_lport(const struct sbrec_port_binding *pb, > + struct binding_ctx_in *b_ctx_in) > +{ > + if (strcmp(pb->type, "patch")) { > + return NULL; > + } > + return get_peer_lport__(pb, b_ctx_in); > +} > + > /* This function adds the local datapath of the 'peer' of > * lport 'pb' to the local datapaths if it is not yet added. > */ > @@ -1654,7 +1674,9 @@ remove_pb_from_local_datapath(const struct > sbrec_port_binding *pb, > pb->logical_port)) { > ld->localnet_port = NULL; > } > - } else if (!strcmp(pb->type, "l3gateway")) { > + } This makes sense. Thanks for spotting this. > + > + if (!strcmp(pb->type, "l3gateway")) { > const char *chassis_id = smap_get(&pb->options, > "l3gateway-chassis"); > if (chassis_id && !strcmp(chassis_id, chassis_rec->name)) { > @@ -1956,12 +1978,27 @@ handle_deleted_lport(const struct sbrec_port_binding > *pb, > struct binding_ctx_in *b_ctx_in, > struct binding_ctx_out *b_ctx_out) > { > + /* If the binding is local, remove it. */ > struct local_datapath *ld = > get_local_datapath(b_ctx_out->local_datapaths, > pb->datapath->tunnel_key); > if (ld) { > remove_pb_from_local_datapath(pb, b_ctx_in->chassis_rec, > b_ctx_out, ld); > + return; > + } > + > + /* If the binding is not local, if 'pb' is a L3 gateway port, we should > + * remove its peer, if that one is local. > + */ Hi Dumitru, Is this issue seen wiith this Ilya's patch series applied ? https://patchwork.ozlabs.org/project/ovn/list/?series=214426 I have not looked into that series yet. But I fail to understand how can we encounter this issue with the present master ? Because - If a logical router is a gateway router, then all the logical router ports of that router and the peer logical switch ports are claimed by the same chassis where the options:chassis=<ch> is set. - So when a logical router port of l3gateway is deleted or its peer (of type l3gatewayport) is deleted, then if that port_binding is not in the local datapath, then its peer cannot be in the local datapath too. Do you see a scenarion where this could happen with the present master ? Thanks Numan > + pb = get_l3gw_peer_lport(pb, b_ctx_in); > + if (pb) { > + ld = get_local_datapath(b_ctx_out->local_datapaths, > + pb->datapath->tunnel_key); > + if (ld) { > + remove_pb_from_local_datapath(pb, b_ctx_in->chassis_rec, > b_ctx_out, > + ld); > + } > } > } > > -- > 1.8.3.1 > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev