I joined this list recently, and encountered something very similar to this
user:

On 8 January 2016 at 04:52, Benoît <benoitne at gmail.com> wrote:
> I have an issue where ovs-vswitchd is starting too early.
> I got a persistent name for an interface (pnic_wwan) but it is happening
> after ovs-vswitchd starts so it makes an error as it doesn't find the
> interface name!
>
>     Bridge vswitch_wwan
>         Port pnic_wwan
>             Interface pnic_wwan
>                 error: "could not open network device pnic_wwan (No such device)"


I am testing with Fedora 23. It seems that with openvswitch.service
enabled, openvswitch-nonetwork.service starts too early, before any of the
physical network interfaces have been detected.

During a "clean" shutdown, if the OVS bridge is configured using
/etc/sysconfig/network-scripts/ifcfg-* with TYPE=OVSBridge, the bridge is
normally removed. That leaves the system in an acceptable state: when
openvswitch-nonetwork.service starts early on the next boot, no bridge
exists yet, so there is no problem.

However, if shutdown is unclean for any reason (that is, if ifdown-ovs was
not executed properly), then the system comes up with the physical network
interface ports already associated with the bridge. Because the bridge is
started before networking exists, this leads to "could not open network
device ens2f0 (No such device)" (in my case, the persistent naming is the
default selected by the udev configuration).

This error persists, in that the physical ports are unusable in this state.
Now, in some cases, the ifup-ovs will delete and re-add the port, so other
than errors during startup, the bridge becomes healthy when the port is
re-added. In the failure cases, "ovs-vsctl show" will show the physical
interfaces with the "No such device" error, even though the interfaces
clearly do exist by this point.
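
To spot the affected interfaces quickly, the "ovs-vsctl show" output can be filtered for "error:" lines. This is only a minimal sketch: the function name ovs_failed_ifaces is hypothetical, and it assumes the indented "Interface NAME" / "error: ..." layout shown in the quoted output above.

```shell
# Print the name of every interface that carries an "error:" line in
# `ovs-vsctl show` output. The output is read from stdin, so the filter
# can also be tried on saved text.
ovs_failed_ifaces() {
    awk '
        $1 == "Interface" { name = $2; gsub(/"/, "", name) }  # remember the last interface seen
        $1 == "error:"    { print name }                      # the error belongs to that interface
    '
}
```

On a live system it would be used as "ovs-vsctl show | ovs_failed_ifaces".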

In my case, I am trying to use TYPE=OVSBond. I have dual 10 GbE and I
wanted to use an OVS bridge instead of a Linux bridge for my host
networking, with several VLANs configured as TYPE=OVSIntPort on the bridge.
If I configure the physical interfaces as TYPE=OVSPort and have the
TYPE=OVSBond list them in BOND_IFACES, then I get a different problem at
startup: the TYPE=OVSPort initialization tries to re-add the port with:

ovs-vsctl -t 10 -- --if-exists del-port br-ext ens2f0 -- add-port br-ext ens2f0

But this fails with "cannot create a port named ens2f0 because an interface
named ens2f0 already exists on bridge br-ext". In this case, the port is
part of the bond, not directly part of the bridge, and the re-add code
isn't able to work around this problem.
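
The situation can be detected with two ovs-vsctl lookups: "iface-to-br" succeeds for any interface OVS knows about, while "port-to-br" succeeds only if a port with that exact name exists. A rough sketch follows; the function name and the OVS_VSCTL override variable are hypothetical, added only so the check can be tried against a stub.

```shell
# OVS_VSCTL is a hypothetical override hook; by default the real
# ovs-vsctl is used.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

# Returns 0 when NAME is known to OVS as an interface but no port of the
# same name exists (for example, a member of a bond port). In that case
# "--if-exists del-port" removes nothing and the following add-port fails.
iface_lacks_own_port() {
    name=$1
    if ! $OVS_VSCTL iface-to-br "$name" >/dev/null 2>&1; then
        return 1    # not an OVS interface at all
    fi
    if $OVS_VSCTL port-to-br "$name" >/dev/null 2>&1; then
        return 1    # a port with this name exists, so del-port can remove it
    fi
    return 0
}
```

With the bond setup described above, "iface_lacks_own_port ens2f0" would be expected to succeed, since ens2f0 sits under the bond port rather than under a port of its own.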

During further investigation, I found that after the system is up (and
particularly after network.service has been run), I could "systemctl
restart openvswitch" and "ovs-vsctl show" would no longer list "No such
device" for the physical interface ports.

After trying to understand and disentangle all the cause and effect, I
finally realized that ifup-ovs will start OVS on demand, after the physical
interfaces have been detected and assigned names (including possible
renames such as eth0 => ens2f0), and that I could avoid starting OVS too
early simply by *not* enabling openvswitch.service.
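
For anyone wanting to check which mode a host is in, here is a small sketch built on "systemctl is-enabled". The function name is hypothetical, and the messages reflect only my observations on Fedora 23.

```shell
# Report whether openvswitch.service is enabled at boot. Returns 0 when it
# is NOT enabled, i.e. when OVS is left to be started on demand by ifup-ovs.
ovs_started_on_demand() {
    state=$(systemctl is-enabled openvswitch.service 2>/dev/null || true)
    case "$state" in
        enabled)
            echo "openvswitch.service is enabled: OVS may start before the NICs exist"
            return 1
            ;;
        *)
            echo "openvswitch.service is not enabled (${state:-unknown}): ifup-ovs starts OVS on demand"
            return 0
            ;;
    esac
}
```

Applying the workaround itself is a single command, run as root: "systemctl disable openvswitch.service".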

This is now working. By *not* enabling openvswitch.service, and letting
ifup-ovs start openvswitch on demand, the system comes up reliably after
either a clean shutdown or a forced reset (I want the server to be
crash-safe, so I explicitly test this case). However, I am now concerned
about the direction of Fedora and openvswitch-nonetwork.service. Does my
work-around of not enabling openvswitch.service make sense? Is it part of
the design of ifup-ovs that will be supported going forward, or is it just
luck that it works, and could it break with a future openvswitch update or
a future version of Fedora?

I think the openvswitch-nonetwork.service starting early, and presuming
that physical interfaces can actually be used that early, is a defect in
openvswitch. I think the intent is to make OVS bridges and internal ports
available for use with the rest of the networking support, but this only
currently works properly for virtual bridges that are not connected to
physical interfaces. By "works properly", I mean that it comes up clean
whether shutdown was "clean" or "dirty", and doesn't have errors about "No
such device", and does not need the port to be re-added to clear this error
state.

Without any real understanding of the complexity here, I am thinking that
when Open vSwitch starts early, before the physical network interfaces exist
according to the kernel, Open vSwitch should delay initialization of those
ports or bonds until the physical network interfaces actually do exist. The
"No such device" issue should automatically clear as soon as the device
actually does come into existence. In my case, I would like the "bond0"
(TYPE=OVSBond) to be re-initialized as soon as one or both of "ens2f0"
(TYPE=OVSPort) or "ens2f1" (TYPE=OVSPort) become real, similar to what
would happen when the link state for the real interfaces goes up or down. I
think this should also apply to regular ports on the bridge. There should
be no need for ifup-ovs to re-create the port if it already exists; it
just needs to be properly initialized *after* the physical interface comes
into existence in the kernel. Is this something that is already understood,
or already being worked on? I found very little information on this with
Google searching, which is how I stumbled upon this original thread...
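
Until something like that exists inside OVS, the idea can be approximated from the outside by waiting for the kernel to expose the interface before touching OVS. The sketch below is not anything OVS or ifup-ovs does today: the function name, the 30-second default, and the optional sysfs directory argument are all assumptions made for illustration.

```shell
# Wait until a network interface shows up in sysfs, or give up after a
# timeout. Usage: wait_for_iface NAME [TIMEOUT_SECONDS] [SYSFS_DIR]
# (plain POSIX sh, so the variables are not function-local)
wait_for_iface() {
    iface=$1
    timeout=${2:-30}
    sysfs=${3:-/sys/class/net}
    waited=0
    while [ ! -e "$sysfs/$iface" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # interface never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Combined with the restart observation above, this might be used as "wait_for_iface ens2f0 && wait_for_iface ens2f1 && systemctl restart openvswitch", although I have only tested the restart by hand after boot.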

Other work-arounds that I tried that may be of interest to people to
understand exactly how it fails, and how it behaves:

1) I tried to use regular TYPE=Ethernet (instead of TYPE=OVSPort) network
interfaces, and "ifup" the physical interfaces as a "Pre" command to the
openvswitch-nonetwork.service. This gave a warning about "Delaying
initialization" from "ifup". I believe it *did* fix the problem, but only
because the "ifup" failed, so the openvswitch-nonetwork.service startup was
aborted early, and it happened later due to ifup-ovs. As even "/bin/false"
would have had the same effect here, I considered this an invalid
work-around and this helped lead me to the conclusion of disabling
openvswitch.service altogether as the more sensible work-around.

2) I tried to "modprobe ixgbe" (the network driver for the Intel cards I
have) as a "Pre" command to the openvswitch-nonetwork.service. This had
similar behaviour to the "ifup" above. Also not a very good solution.

-- 
Mark Mielke <mark.mie...@gmail.com>