On graceful-restart of ovn-controller, the chassis row is
inserted in the Chassis table. During this transaction,
there is a window of time where an idl row-read may not
return the newly created row - even though the row should exist,
but the transaction is in an incomplete state.
As a result, get_chassis() in binding_run() returns a null chassis record
binding_run exits early, and does not
create local_datapaths, and patch_run deletes
localnet patch ports. In a later run, the localnet patch ports are recreated.

This is reproducable consistently but not on every restart.
The fix is to handle the case that the chassis record
may be null in binding_run, and yet create local_datapaths.

RFC because, I am not sure if the fix is correct - ie,
can there be local_datapaths with no chassis record ? And if this
fix is not correct, then what is the correct fix ? Would it be
to not run binding_run() and patch_run() when ovnsb_idl_txn is null ?

Restart logs follow with commentary:

2016-04-28T18:35:42.448Z|00001|vlog|INFO|opened log file 
/home/ovs/ovs/tests/testsuite.dir/2035/hv/ovn-controller.log
2016-04-28T18:35:42.449Z|00002|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock:
 connecting...
2016-04-28T18:35:42.449Z|00003|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock:
 connected
2016-04-28T18:35:42.452Z|00004|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock:
 connecting...
2016-04-28T18:35:42.452Z|00005|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock:
 connected
2016-04-28T18:35:42.454Z|00006|ovsdb_idl|INFO|ovsdb_idl_txn_insert:
                Chassis row inserted into transaction above
2016-04-28T18:35:42.454Z|00007|binding|INFO|Claiming lport localvif2 for this 
chassis.
2016-04-28T18:35:42.454Z|00008|binding|INFO|Claiming lport localvif3 for this 
chassis.
2016-04-28T18:35:42.454Z|00009|binding|INFO|Claiming lport localcif4 for this 
chassis.
2016-04-28T18:35:42.454Z|00010|binding|INFO|Claiming lport localcif5 for this 
chassis.
2016-04-28T18:35:42.454Z|00011|binding|INFO|Claiming lport localcif1 for this 
chassis.
2016-04-28T18:35:42.454Z|00012|binding|INFO|Claiming lport localvif1 for this 
chassis.
2016-04-28T18:35:42.454Z|00013|binding|INFO|Claiming lport localvif201 for this 
chassis.
2016-04-28T18:35:42.454Z|00014|binding|INFO|Claiming lport localcif3 for this 
chassis.
2016-04-28T18:35:42.454Z|00015|binding|INFO|Claiming lport localcif2 for this 
chassis.
               Binding run found the chassis record and has claimed the vifs
2016-04-28T18:35:42.455Z|00016|ofctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt:
 connecting to switch
2016-04-28T18:35:42.455Z|00017|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt:
 connecting...
2016-04-28T18:35:42.455Z|00018|pinctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt:
 connecting to switch
2016-04-28T18:35:42.456Z|00019|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt:
 connecting...
2016-04-28T18:35:42.457Z|00020|ovsdb_idl|INFO|ovsdb_idl_row_clear_new:
                At this point read of Chassis table returns no rows, and
                the transaction status is still incomplete.
2016-04-28T18:35:42.457Z|00021|binding|INFO|no chassis rec!
                Binding run exits early because chassis_rec was null
2016-04-28T18:35:42.459Z|00022|patch|INFO|removing port 
patch-br-int-to-localnet201
2016-04-28T18:35:42.459Z|00023|patch|INFO|removing port 
patch-br-int-to-localnet1
2016-04-28T18:35:42.459Z|00024|patch|INFO|removing port 
patch-localnet1-to-br-int
2016-04-28T18:35:42.459Z|00025|patch|INFO|removing port 
patch-localnet201-to-br-int
               Localnet ports are removed above, because local_datapaths dont 
exist
2016-04-28T18:35:42.459Z|00026|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt:
 connected
2016-04-28T18:35:42.460Z|00027|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt:
 connected
2016-04-28T18:35:42.460Z|00028|ovsdb_idl|INFO|ovsdb_idl_row_create:
               Now, the transaction is complete

Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com>
---
 ovn/controller/binding.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index 32fcb85..9f50cd5 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -155,9 +155,6 @@ binding_run(struct controller_ctx *ctx, const struct 
ovsrec_bridge *br_int,
     const struct sbrec_port_binding *binding_rec;
 
     chassis_rec = get_chassis(ctx->ovnsb_idl, chassis_id);
-    if (!chassis_rec) {
-        return;
-    }
 
     struct shash lports = SHASH_INITIALIZER(&lports);
     if (br_int) {
@@ -190,10 +187,10 @@ binding_run(struct controller_ctx *ctx, const struct 
ovsrec_bridge *br_int,
             if (iface_rec && ctx->ovs_idl_txn) {
                 update_qos(iface_rec, binding_rec);
             }
-            if (binding_rec->chassis == chassis_rec) {
+            if (chassis_rec && binding_rec->chassis == chassis_rec) {
                 continue;
             }
-            if (ctx->ovnsb_idl_txn) {
+            if (ctx->ovnsb_idl_txn && chassis_rec) {
                 if (binding_rec->chassis) {
                     VLOG_INFO("Changing chassis for lport %s from %s to %s.",
                               binding_rec->logical_port,
@@ -205,7 +202,7 @@ binding_run(struct controller_ctx *ctx, const struct 
ovsrec_bridge *br_int,
                 }
                 sbrec_port_binding_set_chassis(binding_rec, chassis_rec);
             }
-        } else if (binding_rec->chassis == chassis_rec) {
+        } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
             if (ctx->ovnsb_idl_txn) {
                 VLOG_INFO("Releasing lport %s from this chassis.",
                           binding_rec->logical_port);
-- 
2.3.9

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to