On Thu, Apr 28, 2016 at 08:23:59PM -0400, Ramu Ramamurthy wrote: > On graceful-restart of ovn-controller, the chassis row is > inserted in the Chassis table. During this transaction, > there is a window of time where an idl row-read may not > return the newly created row - even though the row should exist, > but the transaction is in an incomplete state. > As a result, get_chassis() in binding_run() returns a null chassis record > binding_run exits early, and does not > create local_datapaths, and patch_run deletes > localnet patch ports. In a later run, the localnet patch ports are recreated. > > This is reproducable consistently but not on every restart. > The fix is to handle the case that the chassis record > may be null in binding_run, and yet create local_datapaths. > > RFC because, I am not sure if the fix is correct - ie, > can there be local_datapaths with no chassis record ? And if this > fix is not correct, then what is the correct fix ? Would it be > to not run binding_run() and patch_run() when ovnsb_idl_txn is null ?
I think that this is correct, or at any rate reasonable. I applied this to master, folding in the following logic simplification: diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index 9f50cd5..a0d8b96 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -187,10 +187,8 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, if (iface_rec && ctx->ovs_idl_txn) { update_qos(iface_rec, binding_rec); } - if (chassis_rec && binding_rec->chassis == chassis_rec) { - continue; - } - if (ctx->ovnsb_idl_txn && chassis_rec) { + if (ctx->ovnsb_idl_txn && chassis_rec + && binding_rec->chassis != chassis_rec) { if (binding_rec->chassis) { VLOG_INFO("Changing chassis for lport %s from %s to %s.", binding_rec->logical_port, _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev