On graceful-restart of ovn-controller, the chassis row is inserted in the Chassis table. During this transaction, there is a window of time where an idl row-read may not return the newly created row - even though the row should exist, but the transaction is in an incomplete state. As a result, get_chassis() in binding_run() returns a null chassis record binding_run exits early, and does not create local_datapaths, and patch_run deletes localnet patch ports. In a later run, the localnet patch ports are recreated.
This is reproducable consistently but not on every restart. The fix is to handle the case that the chassis record may be null in binding_run, and yet create local_datapaths. RFC because, I am not sure if the fix is correct - ie, can there be local_datapaths with no chassis record ? And if this fix is not correct, then what is the correct fix ? Would it be to not run binding_run() and patch_run() when ovnsb_idl_txn is null ? Restart logs follow with commentary: 2016-04-28T18:35:42.448Z|00001|vlog|INFO|opened log file /home/ovs/ovs/tests/testsuite.dir/2035/hv/ovn-controller.log 2016-04-28T18:35:42.449Z|00002|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connecting... 2016-04-28T18:35:42.449Z|00003|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connected 2016-04-28T18:35:42.452Z|00004|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connecting... 2016-04-28T18:35:42.452Z|00005|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connected 2016-04-28T18:35:42.454Z|00006|ovsdb_idl|INFO|ovsdb_idl_txn_insert: Chassis row inserted into transaction above 2016-04-28T18:35:42.454Z|00007|binding|INFO|Claiming lport localvif2 for this chassis. 2016-04-28T18:35:42.454Z|00008|binding|INFO|Claiming lport localvif3 for this chassis. 2016-04-28T18:35:42.454Z|00009|binding|INFO|Claiming lport localcif4 for this chassis. 2016-04-28T18:35:42.454Z|00010|binding|INFO|Claiming lport localcif5 for this chassis. 2016-04-28T18:35:42.454Z|00011|binding|INFO|Claiming lport localcif1 for this chassis. 2016-04-28T18:35:42.454Z|00012|binding|INFO|Claiming lport localvif1 for this chassis. 2016-04-28T18:35:42.454Z|00013|binding|INFO|Claiming lport localvif201 for this chassis. 2016-04-28T18:35:42.454Z|00014|binding|INFO|Claiming lport localcif3 for this chassis. 2016-04-28T18:35:42.454Z|00015|binding|INFO|Claiming lport localcif2 for this chassis. Binding run found the chassis record and has claimed the vifs 2016-04-28T18:35:42.455Z|00016|ofctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch 2016-04-28T18:35:42.455Z|00017|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting... 2016-04-28T18:35:42.455Z|00018|pinctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch 2016-04-28T18:35:42.456Z|00019|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting... 2016-04-28T18:35:42.457Z|00020|ovsdb_idl|INFO|ovsdb_idl_row_clear_new: At this point read of Chassis table returns no rows, and the transaction status is still incomplete. 2016-04-28T18:35:42.457Z|00021|binding|INFO|no chassis rec! Binding run exits early because chassis_rec was null 2016-04-28T18:35:42.459Z|00022|patch|INFO|removing port patch-br-int-to-localnet201 2016-04-28T18:35:42.459Z|00023|patch|INFO|removing port patch-br-int-to-localnet1 2016-04-28T18:35:42.459Z|00024|patch|INFO|removing port patch-localnet1-to-br-int 2016-04-28T18:35:42.459Z|00025|patch|INFO|removing port patch-localnet201-to-br-int Localnet ports are removed above, because local_datapaths dont exist 2016-04-28T18:35:42.459Z|00026|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected 2016-04-28T18:35:42.460Z|00027|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected 2016-04-28T18:35:42.460Z|00028|ovsdb_idl|INFO|ovsdb_idl_row_create: Now, the transaction is complete Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/binding.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index 32fcb85..9f50cd5 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -155,9 +155,6 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, const struct sbrec_port_binding *binding_rec; chassis_rec = get_chassis(ctx->ovnsb_idl, chassis_id); - if (!chassis_rec) { - return; - } struct shash lports = SHASH_INITIALIZER(&lports); if (br_int) { @@ -190,10 +187,10 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, if (iface_rec && ctx->ovs_idl_txn) { update_qos(iface_rec, binding_rec); } - if (binding_rec->chassis == chassis_rec) { + if (chassis_rec && binding_rec->chassis == chassis_rec) { continue; } - if (ctx->ovnsb_idl_txn) { + if (ctx->ovnsb_idl_txn && chassis_rec) { if (binding_rec->chassis) { VLOG_INFO("Changing chassis for lport %s from %s to %s.", binding_rec->logical_port, @@ -205,7 +202,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, } sbrec_port_binding_set_chassis(binding_rec, chassis_rec); } - } else if (binding_rec->chassis == chassis_rec) { + } else if (chassis_rec && binding_rec->chassis == chassis_rec) { if (ctx->ovnsb_idl_txn) { VLOG_INFO("Releasing lport %s from this chassis.", binding_rec->logical_port); -- 2.3.9 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev