On 23/06/17 10:37, Eelco Chaudron wrote:
On 06/23/2017 02:12 AM, Ben Pfaff wrote:
On Thu, Jun 22, 2017 at 04:04:44PM -0300, Flavio Leitner wrote:
On Thu, Jun 22, 2017 at 01:04:59AM +0800, Huanle Han wrote:
Hi,all

I get this problem with latest(dbd8112) branch-2.7 code on my Ubuntu.
root@ubuntu:/var/log/# ovs-vsctl show
adf2ea99-0c53-4180-914f-7dadaa71302b
     Bridge test
         Port test
             Interface test
                 type: internal
     Bridge "manage"
         Port "manage"
             Interface "manage"
                 type: internal
error: "could not open network device manage (File exists)"
         Port "veth0"
             Interface "veth0"
         Port "eth0"
             Interface "eth0"
     ovs_version: "2.7.0"

How to reproduce:
1. add bridge "manage", up and add ip on it
2. restart ovs-vswitchd
3. "ovs-vsctl show" displays error message.

The reason:
In following "netdev_open" call on ovs-vswitchd start, input "type" is NULL
and "manage" is opened as a "system" netdev_class iface incorrectly.

#0  netdev_open (name=0x7fffffffe2bc "manage", type=0x0,
netdevp=0x7fffffffc3b0) at ../lib/netdev.c:396
#1  0x000000000052c492 in get_src_addr (ip6_dst=0x7fffffffe2ac,
output_bridge=0x7fffffffe2bc "manage", psrc=0x8f3490) at
../lib/ovs-router.c:141
#2  0x000000000052c85d in ovs_router_insert__ (priority=104 'h',
ip6_dst=0x7fffffffe29c, plen=104 'h', output_bridge=0x7fffffffe2bc
"manage", gw=0x7fffffffe2ac) at ../lib/ovs-router.c:202
#3  0x000000000052c980 in ovs_router_insert (ip_dst=0x7fffffffe29c,
plen=104 'h', output_bridge=0x7fffffffe2bc "manage", gw=0x7fffffffe2ac) at
../lib/ovs-router.c:228
#4 0x000000000058f63a in route_table_handle_msg (change=0x7fffffffe290) at
../lib/route-table.c:295
#5 0x000000000058f1da in route_table_reset () at ../lib/route-table.c:174 #6 0x000000000058f034 in route_table_init () at ../lib/route-table.c:110
#7  0x0000000000495838 in dp_initialize () at ../lib/dpif.c:126
#8  0x0000000000495b40 in dp_enumerate_types (types=0x7fffffffe3a0) at
../lib/dpif.c:244
#9  0x000000000042eb1c in enumerate_types (types=0x7fffffffe3a0) at
../ofproto/ofproto-dpif.c:267
#10 0x000000000041b81c in ofproto_enumerate_types (types=0x7fffffffe3a0) at
../ofproto/ofproto.c:432
#11 0x000000000040df1e in bridge_run__ () at ../vswitchd/bridge.c:3020
#12 0x000000000040e196 in bridge_run () at ../vswitchd/bridge.c:3082
#13 0x00000000004138ef in main (argc=1, argv=0x7fffffffe578) at
../vswitchd/ovs-vswitchd.c:119

After then, ovs fails to netdev_open "manage" with type == "internal".
"File exists" error is reported.
I think commit d3b8f50(netdev: Fix netdev_open() to adhere to class type if
given) introduces this problem. It need be improved.
Your analysis is correct.  One solution is to wait vswitchd to
configure first and only then enable ovs-route.  The problem is that
more modules might use netdev_open() and it doesn't sound like a good
idea to have a control per module.

Another option is to map the device's info to a class instead of using
"system", so we could use internal classes for vports, for instance.
However, that doesn't guarantee it will match with what is configured
in the DB.
This sounds like the kind of problem that I expected the following
commit might cause.

     commit d3b8f5052292b3ba9084ffed097e90b87f2950f5
     Author: Eelco Chaudron <echau...@redhat.com>
     Date:   Thu Jun 1 14:38:09 2017 +0200

         netdev: Fix netdev_open() to adhere to class type if given

If we can't fix it somehow, we might need to revert.

Reverting the above patch does not solve the real issue here. If we revert we donot get the error, but the wrong class is used, hence the wrong callbacks
get called.

The main issue is with netdev_open() being called with type = NULL before
the interface is actually configured in the system. We could track these
"auto" generated interfaces, and once netdev_open() gets called with a
valid type reconfigure (re-create) it. netdev_remove() could work here
but not sure if its safe due to reference counting.

Some background info; I see a lot of netdev_open() with a NULL type, but they just grep some data and closed it again. I found that the tunnel code is keeping a reference to the netdev (for the IP assigned) and hence it is not going away before the "real" interface opens it.

The change below is based on tracking the "classless" opened devices, and once they get opened with a "real" class, re-create them. I did some basic testing and it seem to work fine, also went over related code and could not find any corner case.

So please take a peek, and if it all makes sense I can send out an official patch.

Thanks,

Eelco


diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 3c3c181..b3c57d5 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -45,6 +45,10 @@ struct netdev {
     const struct netdev_class *netdev_class; /* Functions to control
                                                 this device. */

+    /* If this is 'true' the user did not specify a netdev_class when
+ * opening this device, and therefore got assigned to the "system" class */
+    bool auto_classified;
+
     /* A sequence number which indicates changes in one of 'netdev''s
      * properties.   It must be nonzero so that users have a value which
      * they may use as a reset when tracking 'netdev'.
diff --git a/lib/netdev.c b/lib/netdev.c
index a7840a8..89afa71 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -361,7 +361,7 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
     OVS_EXCLUDED(netdev_mutex)
 {
     struct netdev *netdev;
-    int error;
+    int error = 0;

     if (!name[0]) {
/* Reject empty names. This saves the providers having to do this. At @@ -375,6 +375,29 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)

     ovs_mutex_lock(&netdev_mutex);
     netdev = shash_find_data(&netdev_shash, name);
+
+    if (netdev &&
+        type && type[0] && strcmp(type, netdev->netdev_class->type)) {
+
+        if (netdev->auto_classified) {
+ /* If this device was first created without a classification type, + * for example due to routing or tunneling code, and they keep a + * reference, a "classified" call to open will fail. In this case + * we remove the classless device, and re-add it below. We remove + * the netdev from the shash, and change the sequence, so owners of
+             * the old classless device can release/cleanup. */
+            if (netdev->node) {
+                shash_delete(&netdev_shash, netdev->node);
+                netdev->node = NULL;
+                netdev_change_seq_changed(netdev);
+            }
+
+            netdev = NULL;
+        } else {
+            error = EEXIST;
+        }
+    }
+
     if (!netdev) {
         struct netdev_registered_class *rc;

@@ -384,6 +407,7 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
             if (netdev) {
                 memset(netdev, 0, sizeof *netdev);
                 netdev->netdev_class = rc->class;
+                netdev->auto_classified = type && type[0] ? false : true;
                 netdev->name = xstrdup(name);
                 netdev->change_seq = 1;
                 netdev->reconfigure_seq = seq_create();
@@ -416,10 +440,6 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
                       name, type);
             error = EAFNOSUPPORT;
         }
- } else if (type && type[0] && strcmp(type, netdev->netdev_class->type)) {
-        error = EEXIST;
-    } else {
-        error = 0;
     }

     if (!error) {










_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to