On 11/23/20 6:30 PM, Numan Siddique wrote: > On Fri, Nov 20, 2020 at 9:08 PM Dumitru Ceara <dce...@redhat.com> wrote: >> >> When a port binding of type "l3gateway" is claimed its remote peer >> port_binding is also stored in local_datapath.peer_ports[].remote. >> >> If the remote peer port_binding is deleted first (i.e., before the local >> "l3gateway" one) then we need to remove the complete >> local_datapath.peer_ports[] entry in order to avoid ending up using >> dangling pointers to already freed port bindings. >> >> Also, properly reset local_datapath->has_local_l3gateway in >> remove_pb_from_local_datapath(). >> >> Ilya reported this issue found by AddressSanitizer during his testing: >> >> ==1816017==ERROR: AddressSanitizer: heap-use-after-free on address >> 0x6140000cb170 at pc 0x0000005ab574 bp 0x7fff68925a30 sp 0x7fff68925a28 >> READ of size 8 at 0x6140000cb170 thread T0 >> #0 0x5ab573 in put_replace_chassis_mac_flows >> git/ovn/controller/physical.c:550:9 >> #1 0x5a65eb in consider_port_binding >> git/ovn/controller/physical.c:1168:13 >> #2 0x5a8764 in physical_run git/ovn/controller/physical.c:1607:9 >> #3 0x5a0064 in flow_output_physical_flow_changes_handler >> git/ovn/controller/ovn-controller.c:2127:9 >> #4 0x5db423 in engine_compute git/ovn/lib/inc-proc-eng.c:306:18 >> #5 0x5dae1f in engine_run_node git/ovn/lib/inc-proc-eng.c:352:14 >> #6 0x5dac74 in engine_run git/ovn/lib/inc-proc-eng.c:377:9 >> #7 0x59ad64 in main git/ovn/controller/ovn-controller.c >> #8 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2) >> #9 0x480b2d in _start (git/ovn/controller/ovn-controller+0x480b2d) >> >> 0x6140000cb170 is located 304 bytes inside of 408-byte region >> [0x6140000cb040,0x6140000cb1d8) >> freed by thread T0 here: >> #0 0x520d07 in free (git/ovn/controller/ovn-controller+0x520d07) >> #1 0x712de7 in ovsdb_idl_db_track_clear git/ovs/lib/ovsdb-idl.c:1984:21 >> #2 0x59b5cd in main git/ovn/controller/ovn-controller.c:2762:9 >> #3 0x7f39fa6491a2 in __libc_start_main (/lib64/libc.so.6+0x271a2) >> >> Reported-by: Ilya Maximets <i.maxim...@ovn.org> >> Fixes: 354bdba51abf ("ovn-controller: I-P for SB port binding and OVS >> interface in runtime_data.") >> Signed-off-by: Dumitru Ceara <dce...@redhat.com> > > Hi Dumitru, > > Please see below for few comments. > >> --- >> controller/binding.c | 47 ++++++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 42 insertions(+), 5 deletions(-) >> >> diff --git a/controller/binding.c b/controller/binding.c >> index eb92679..cb60c5d 100644 >> --- a/controller/binding.c >> +++ b/controller/binding.c >> @@ -1523,21 +1523,41 @@ binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn, >> } >> >> static const struct sbrec_port_binding * >> -get_peer_lport(const struct sbrec_port_binding *pb, >> - struct binding_ctx_in *b_ctx_in) >> +get_peer_lport__(const struct sbrec_port_binding *pb, >> + struct binding_ctx_in *b_ctx_in) >> { >> const char *peer_name = smap_get(&pb->options, "peer"); >> - if (strcmp(pb->type, "patch") || !peer_name) { >> + >> + if (!peer_name) { >> return NULL; >> } >> >> const struct sbrec_port_binding *peer; >> peer = lport_lookup_by_name(b_ctx_in->sbrec_port_binding_by_name, >> peer_name); >> - >> return (peer && peer->datapath) ? peer : NULL; >> } >> >> +static const struct sbrec_port_binding * >> +get_l3gw_peer_lport(const struct sbrec_port_binding *pb, >> + struct binding_ctx_in *b_ctx_in) >> +{ >> + if (strcmp(pb->type, "l3gateway")) { >> + return NULL; >> + } >> + return get_peer_lport__(pb, b_ctx_in); >> +} >> + >> +static const struct sbrec_port_binding * >> +get_peer_lport(const struct sbrec_port_binding *pb, >> + struct binding_ctx_in *b_ctx_in) >> +{ >> + if (strcmp(pb->type, "patch")) { >> + return NULL; >> + } >> + return get_peer_lport__(pb, b_ctx_in); >> +} >> + >> /* This function adds the local datapath of the 'peer' of >> * lport 'pb' to the local datapaths if it is not yet added. >> */ >> @@ -1654,7 +1674,9 @@ remove_pb_from_local_datapath(const struct >> sbrec_port_binding *pb, >> pb->logical_port)) { >> ld->localnet_port = NULL; >> } >> - } else if (!strcmp(pb->type, "l3gateway")) { >> + } > > This makes sense. Thanks for spotting this. > >> + >> + if (!strcmp(pb->type, "l3gateway")) { >> const char *chassis_id = smap_get(&pb->options, >> "l3gateway-chassis"); >> if (chassis_id && !strcmp(chassis_id, chassis_rec->name)) { >> @@ -1956,12 +1978,27 @@ handle_deleted_lport(const struct sbrec_port_binding >> *pb, >> struct binding_ctx_in *b_ctx_in, >> struct binding_ctx_out *b_ctx_out) >> { >> + /* If the binding is local, remove it. */ >> struct local_datapath *ld = >> get_local_datapath(b_ctx_out->local_datapaths, >> pb->datapath->tunnel_key); >> if (ld) { >> remove_pb_from_local_datapath(pb, b_ctx_in->chassis_rec, >> b_ctx_out, ld); >> + return; >> + } >> + >> + /* If the binding is not local, if 'pb' is a L3 gateway port, we should >> + * remove its peer, if that one is local. >> + */ > > Hi Dumitru, >
Hi Numan, > Is this issue seen wiith this Ilya's patch series applied ? > https://patchwork.ozlabs.org/project/ovn/list/?series=214426 > Yes, that's the easiest way to see the issue but the bug is also present on master branch. > I have not looked into that series yet. But I fail to understand how > can we encounter this issue with the present master ? > > Because > - If a logical router is a gateway router, then all the logical > router ports of that router and the peer logical switch ports > are claimed by the same chassis where the options:chassis=<ch> is set. > - So when a logical router port of l3gateway is deleted or its peer > (of type l3gatewayport) is deleted, then > if that port_binding is not in the local datapath, then its peer > cannot be in the local datapath too. > > Do you see a scenarion where this could happen with the present master ? > As far as I understand there's no guarantee that SB updates that happen in the same ovn-northd transaction are sent to ovn-controller in the same jsonrpc update message and I think that's what's causing this. I managed to replicate the issue on the master branch by simulating a bit of delay in ovn-controller and making the failing test run faster by skipping the packet checks: https://github.com/dceara/ovn/commit/801327f0dd593f2593de19a34f2e6d0101f7aa1c With AddressSanitizer enabled: $ ./boot.sh && ./configure --with-ovs-source=~/git-repos/ovs CFLAGS="-g -O2 -fsanitize=address -fno-omit-frame-pointer -fno-common" && make And running test 159 in a loop until it fails, I occasionally hit the use after free: $ while true; do make check TESTSUITEFLAGS="-d 159" || break; done $ head -10 tests/testsuite.dir/159/asan* ================================================================= ==1964550==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000033b70 at pc 0x000000480fd6 bp 0x7fff7920c640 sp 0x7fff7920c630 READ of size 8 at 0x614000033b70 thread T0 #0 0x480fd5 in put_replace_chassis_mac_flows controller/physical.c:550 #1 0x480fd5 in consider_port_binding controller/physical.c:1168 #2 0x482f58 in physical_run controller/physical.c:1605 #3 0x474432 in flow_output_physical_flow_changes_handler controller/ovn-controller.c:2170 #4 0x4bf417 in engine_compute lib/inc-proc-eng.c:306 #5 0x4bf417 in engine_run_node lib/inc-proc-eng.c:352 #6 0x4bf417 in engine_run lib/inc-proc-eng.c:377 Thanks, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev