On Thu, Oct 27, 2016 at 3:32 AM, Bertrand Low <[email protected]> wrote:
> Hi there, > > > > Has anyone noticed a “connection flapping” behaviour for the br-int bridge > when openvswitch is restarted? I’ve provided some details below. > > Also, in general, is it expected for NetVirt to be able to use an existing > br-int bridge or does the br-int creation have to be initiated by NetVirt > (i.e. can an operator run “add-br br-int” before “set-manager” on the > switch)? > > > > This scenario is related to bug 6944 – the pipeline flows for an existing > br-int bridge are not consistently installed on the switch when openvswitch > is restarted. > > > > Scenario: > > - *ODL 3-node cluster running distribution-karaf-0.5.0-Boron > with the odl-ovsdb-openstack feature installed* > > - *A standalone OVS node running OVS version 2.4.0* > > > > - Step 1) set-manager of OVS node to all 3-nodes in the cluster > > - Step 2) confirm that br-int is created and all pipeline flows > are pushed to the OVS node > > - Step 3) on the OVS node, run systemctl stop openvswitch > > - Step 4) on the OVS node, run systemctl start openvswitch > > > > - In some runs of the above sequence of steps, the pipeline > flows for the br-int bridge are pushed to the OVS node; in other instances, > no flows are pushed. > > > > Debugging this, it looks like the following things are occurring at step > 4) (excerpts from karaf.log): > > > > 1) 2016-10-26 15:16:25,220 | INFO | entLoopGroup-9-2 | > DeviceManagerImpl | 297 - org.opendaylight.openflowplugin.impl > - 0.3.0.Boron | ConnectionEvent: Device connected to controller, Device:/ > 192.168.254.36:37625, NodeId:Uri [_value=openflow:207293242576193] > > 2) 2016-10-26 15:16:25,574 | INFO | ntDispatcherImpl | > OF13Provider | 315 - > org.opendaylight.netvirt.openstack.net-virt-providers > - 1.3.0.Boron | initializeOFFlowRules: bridgeName: br-int > > 3) 2016-10-26 15:16:26,601 | INFO | entLoopGroup-9-2 | > SystemNotificationsListenerImpl | 297 - org.opendaylight.openflowplugin.impl > - 0.3.0.Boron | ConnectionEvent: Connection closed by device, Device:/ > 192.168.254.36:37625, NodeId:openflow:207293242576193 > > 4) 2016-10-26 15:16:27,247 | INFO | entLoopGroup-9-1 | > DeviceManagerImpl | 297 - org.opendaylight.openflowplugin.impl > - 0.3.0.Boron | ConnectionEvent: Device connected to controller, Device:/ > 192.168.254.36:37631, NodeId:Uri [_value=openflow:202927631275083] > > > > At 1), br-int connects to ODL with ID openflow:X (in this instance, > 207293242576193) > > At 2), NetVirt writes the pipeline flows into the config datastore. From > my testing, this is always successful and the flows are always installed in > the config datastore. > > At 3), br-int (openflow:207293242576193) disconnects from ODL – not sure > why this occurs. Could this be a bug on Openvswitch? > > At 4), br-int reconnects to ODL, this time with a different ID openflow:Y > (in this case, 202927631275083) > > > > In the success case where the pipeline flows are pushed down, it would > appear that SalFlowServiceImpl was able to push the flows down to the > switch in-between 2) and 3). i.e. ODL was able to push the pipeline flows > for openflow:X before the switch disconnected. In the failure case, the > above sequence of events is exactly the same, but no flows were pushed down. > > In some success instances, SalFlowServiceImpl would log that the “Flow > add…finished without error”, and in some success cases SalFlowServiceImpl > would log that “Flow add failed for flow…errors=Device disconnected” even > though the flow was actually installed on the switch. > > > > The ovs-vswitchd.log has the following logs corresponding to the above > connect-disconnect-connect sequence: > > > > 2016-10-26T19:16:24.910Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: > connecting... > > 2016-10-26T19:16:24.910Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: > connected > > 2016-10-26T19:16:24.912Z|00006|ofproto_dpif|INFO|system@ovs-system: > Datapath supports recirculation > > 2016-10-26T19:16:24.912Z|00007|ofproto_dpif|INFO|system@ovs-system: MPLS > label stack length probed as 0 > > 2016-10-26T19:16:24.912Z|00008|ofproto_dpif|INFO|system@ovs-system: > datapath does not support masked set action feature. > > 2016-10-26T19:16:24.912Z|00009|ofproto_dpif|INFO|system@ovs-system: > Datapath does not support unique flow ids > > 2016-10-26T19:16:24.916Z|00010|bridge|INFO|bridge br-int: added interface > br-int on port 65534 > > 2016-10-26T19:16:24.916Z|00011|bridge|INFO|bridge br-int: using datapath > ID 0000bc8838168d41 > > 2016-10-26T19:16:24.916Z|00012|connmgr|INFO|br-int: added service > controller "punix:/var/run/openvswitch/br-int.mgmt" > > 2016-10-26T19:16:24.916Z|00013|connmgr|INFO|br-int: added primary > controller "tcp:192.168.254.35:6653" > > 2016-10-26T19:16:24.917Z|00014|rconn|INFO|br-int<->tcp:192.168.254.35:6653: > connecting... > > 2016-10-26T19:16:24.917Z|00015|connmgr|INFO|br-int: added primary > controller "tcp:192.168.254.33:6653" > > 2016-10-26T19:16:24.917Z|00016|rconn|INFO|br-int<->tcp:192.168.254.33:6653: > connecting... > > 2016-10-26T19:16:24.917Z|00017|connmgr|INFO|br-int: added primary > controller "tcp:192.168.254.34:6653" > > 2016-10-26T19:16:24.917Z|00018|rconn|INFO|br-int<->tcp:192.168.254.34:6653: > connecting... > > 2016-10-26T19:16:24.924Z|00019|rconn|INFO|br-int<->tcp:192.168.254.33:6653: > connected > > 2016-10-26T19:16:24.924Z|00020|rconn|INFO|br-int<->tcp:192.168.254.34:6653: > connected > > 2016-10-26T19:16:24.925Z|00021|bridge|INFO|ovs-vswitchd (Open vSwitch) > 2.4.0 > > 2016-10-26T19:16:24.927Z|00022|rconn|INFO|br-int<->tcp:192.168.254.35:6653: > connected > > 2016-10-26T19:16:24.929Z|00001|ofproto_dpif_upcall(handler5)|INFO|received > packet on unassociated datapath port 0 > > 2016-10-26T19:16:25.430Z|00001|dpif(revalidator4)|WARN|system@ovs-system: > failed to flow_get (Invalid argument) > ufid:b1d72bef-1acb-4755-850a-2aeb4fb75ad2 > <empty>, packets:0, bytes:0, used:never > > 2016-10-26T19:16:25.430Z|00002|ofproto_dpif_upcall(revalidator4)|WARN|Failed > to acquire udpif_key corresponding to unexpected flow (Invalid argument): > ufid:b1d72bef-1acb-4755-850a-2aeb4fb75ad2 > > 2016-10-26T19:16:26.312Z|00023|bridge|INFO|bridge br-int: using datapath > ID 0000b88fc560944b > maybe something here is the issue. Looks like odl pushed flows when the switch reconnected, of which some had issues. But here, did the dpid stay the same or did it change? If it changed, maybe ovs is disconnecting and reconnecting to push the dpid change to the controller. I thought the dpid would stay the same as long as the network devices stayed the same. Maybe we should add the ovsdb config to keep the dpid the same across restarts? Also maybe enable for ovsdb and vswitchd logging. > 2016-10-26T19:16:26.313Z|00024|rconn|INFO|br-int<->tcp:192.168.254.35:6653: > disconnecting > > 2016-10-26T19:16:26.313Z|00025|rconn|INFO|br-int<->tcp:192.168.254.33:6653: > disconnecting > > 2016-10-26T19:16:26.313Z|00026|rconn|INFO|br-int<->tcp:192.168.254.34:6653: > disconnecting > > 2016-10-26T19:16:26.952Z|00028|rconn|INFO|br-int<->tcp:192.168.254.35:6653: > connecting... > > 2016-10-26T19:16:26.952Z|00029|rconn|INFO|br-int<->tcp:192.168.254.33:6653: > connecting... > > 2016-10-26T19:16:26.952Z|00030|rconn|INFO|br-int<->tcp:192.168.254.34:6653: > connecting... > > 2016-10-26T19:16:26.958Z|00031|rconn|INFO|br-int<->tcp:192.168.254.35:6653: > connected > > 2016-10-26T19:16:26.958Z|00032|rconn|INFO|br-int<->tcp:192.168.254.33:6653: > connected > > 2016-10-26T19:16:26.958Z|00033|rconn|INFO|br-int<->tcp:192.168.254.34:6653: > connected > > 2016-10-26T19:16:34.926Z|00034|memory|INFO|35400 kB peak resident set > size after 10.0 seconds > > > > If the flows were successfully pushed, you would also see the line > > > > |connmgr|INFO|br-int<->tcp:192.168.254.33:6653: 12 flow_mods in the last > 0 s (12 adds) > > > > Before the last “connecting….connected” attempt. > > > > Any thoughts on the observed connection flapping (bug on OVS?) and on > supporting the openvswitch restart scenario? > > > > Thanks, > > Bertrand > > > > ::DISCLAIMER:: > ------------------------------------------------------------ > ------------------------------------------------------------ > ---------------------------- > > The contents of this e-mail and any attachment(s) are confidential and > intended for the named recipient(s) only. > E-mail transmission is not guaranteed to be secure or error-free as > information could be intercepted, corrupted, > lost, destroyed, arrive late or incomplete, or may contain viruses in > transmission. The e mail and its contents > (with or without referred errors) shall therefore not attach any liability > on the originator or HCL or its affiliates. > Views or opinions, if any, presented in this email are solely those of the > author and may not necessarily reflect the > views or opinions of HCL or its affiliates. Any form of reproduction, > dissemination, copying, disclosure, modification, > distribution and / or publication of this message without the prior > written consent of authorized representative of > HCL is strictly prohibited. If you have received this email in error > please delete it and notify the sender immediately. > Before opening any email and/or attachments, please check them for viruses > and other defects. > > ------------------------------------------------------------ > ------------------------------------------------------------ > ---------------------------- > > _______________________________________________ > netvirt-dev mailing list > [email protected] > https://lists.opendaylight.org/mailman/listinfo/netvirt-dev > >
_______________________________________________ openflowplugin-dev mailing list [email protected] https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
