Adding openflow to thread. Anil, could someone take a look at this for carbon? We are seeing a connection flapping and end up missing port status updates. This leads to stale models and flows.
This is blocking the carbon sr3. On Jan 24, 2018 12:58 AM, "D Arunprakash" <[email protected]> wrote: > Ignore my previous email. > > > > The tunnel port got deleted around 18:49:29.373 and added back on > 18:52:46.26 > > > > 2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent > (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): > addr:3e:0c:ed:2e:a9:ba > > > > 2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent > (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): > addr:8a:2f:9f:c6:fe:d9 > > > > Immediately after tunnel delete, I’m seeing so multiple switch flaps for > quite sometime, > > > > 2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix: entering ACTIVE > > 2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent (Success): OFPT_HELLO > (OF1.3) (xid=0x75): > > version bitmap: 0x04 > > 2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received: OFPT_HELLO > (OF1.3) (xid=0x1): > > > > 2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix: connection closed > by peer > > 2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix: entering > DISCONNECTED > > 2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix: entering ACTIVE > > > > Also, I’m seeing error in karaf log > > > > 2018-01-23 18:49:29,378 | WARN | entLoopGroup-7-3 | > DeviceContextImpl | 280 - org.opendaylight.openflowplugin.impl > - 0.4.3.SNAPSHOT | writePortStatusMessage > > 2018-01-23 18:49:29,379 | WARN | entLoopGroup-7-3 | > DeviceContextImpl | 280 - org.opendaylight.openflowplugin.impl > - 0.4.3.SNAPSHOT | submit transaction for write port status message > > 2018-01-23 18:49:29,379 | WARN | rd-dispatcher-23 | > ShardDataTree | 184 - > org.opendaylight.controller.sal-distributed-datastore > - 1.5.3.SNAPSHOT | member-1-shard-inventory-operational: Store Tx > member-1-datastore-operational-fe-0-chn-8-txn-11-0: Data validation > failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/ > node/node[{(urn:opendaylight:inventory?revision=2013-08-19) > id=openflow:246869078989547}]/AugmentationIdentifier{ > childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, > (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, > (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, > (urn:opendaylight:flow:inventory?revision=2013-08-19)table, > (urn:opendaylight:flow:inventory?revision=2013-08-19)group, > (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, > (urn:opendaylight:flow:inventory?revision=2013-08-19)software, > (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, > (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, > (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, > (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, > (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, > (urn:opendaylight:flow:inventory?revision=2013-08-19)description, > (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, > (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, > (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, > (urn:opendaylight:flow:inventory?revision=2013-08-19) > meter]}/(urn:opendaylight:flow:inventory?revision=2013- > 08-19)table/table[{(urn:opendaylight:flow:inventory? > revision=2013-08-19)id=50}]/flow. > > org.opendaylight.yangtools.yang.data.api.schema.tree. > ModifiedNodeDoesNotExistException: Node /(urn:opendaylight:inventory? > revision=2013-08-19)nodes/node/node[{(urn:opendaylight: > inventory?revision=2013-08-19)id=openflow:246869078989547}]/ > AugmentationIdentifier{childNames=[(urn:opendaylight: > flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow: > inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow: > inventory?revision=2013-08-19)supported-match-types, > (urn:opendaylight:flow:inventory?revision=2013-08-19)table, > (urn:opendaylight:flow:inventory?revision=2013-08-19)group, > (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, > (urn:opendaylight:flow:inventory?revision=2013-08-19)software, > (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, > (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, > (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features, > (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, > (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, > (urn:opendaylight:flow:inventory?revision=2013-08-19)description, > (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, > (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, > (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, > (urn:opendaylight:flow:inventory?revision=2013-08-19) > meter]}/(urn:opendaylight:flow:inventory?revision=2013- > 08-19)table/table[{(urn:opendaylight:flow:inventory? > revision=2013-08-19)id=50}]/flow does not exist. Cannot apply > modification to its children. > > > > We need to check why there is multiple switch disconnect and reconnect and > how ofp handles the same. > > > > Regards, > > Arun > > > > *From:* Vishal Thapar > *Sent:* Wednesday, January 24, 2018 9:52 AM > *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>; > Josh Hershberg <[email protected]>; D Arunprakash < > [email protected]> > *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]> > *Subject:* RE: is dhcp issue fixed on carbon? > > > > Missed adding most important detail and added Arun. > > > > Inventory operational is still showing old port and new port for some > reason. I guess that is what caused problems. > > > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/ > 263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56 > > > > {"id":"openflow:246869078989547:4","flow-node- > inventory:supported":"","flow-node-inventory:peer-features": > "","flow-node-inventory:port-number":4,"flow-node- > inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow-node- > inventory:current-feature":"","flow-node-inventory:maximum- > speed":0,"flow-node-inventory:reason":"add","flow-node- > inventory:configuration":"","flow-node-inventory: > advertised-features":"","flow-node-inventory:current-speed": > 0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node- > inventory:state":{"link-down":false,"blocked":false,"live":false}} > > > > {"id":"openflow:246869078989547:9","flow-node- > inventory:supported":"","flow-node-inventory:peer-features": > "","flow-node-inventory:port-number":9,"flow-node- > inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow-node- > inventory:current-feature":"","flow-node-inventory:maximum- > speed":0,"flow-node-inventory:reason":"add","flow-node- > inventory:configuration":"","flow-node-inventory: > advertised-features":"","flow-node-inventory:current-speed": > 0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node- > inventory:state":{"link-down":false,"blocked":false,"live":false}} > > > > OVS output from same set of logs: > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/ > 263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4 > > > > 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9 > > config: 0 > > state: 0 > > speed: 0 Mbps now, 0 Mbps max > > > > So for now I’d peg it as OFPlugin issue. It didn’t detect or inform us of > old port delete and that is why we didn’t delete old flows. Though > wondering if something else in IFM code could’ve handled it, but don’t > think we handle OfPort number changes, expect a delete+add in such > scenarios. Faseela can pitch in why we have service binding entry with new > port number but flow is still using old one. > > > > Regards, > > Vishal. > > > > *From:* Vishal Thapar > *Sent:* 24 January 2018 09:26 > *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>; > Josh Hershberg <[email protected]> > *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]> > *Subject:* RE: is dhcp issue fixed on carbon? > > > > Quick analysis: > > > > Not related to policy stuff. Service binding has entry for the new port > number but table 220 flow is still using old port number. > > > > { "bound-services": [ { "flow-cookie": 134217735, "flow-priority": 9, > "instruction": [ { "apply-actions": { "action": [ { "order": 0, > "output-action": { "max-length": 0, "output-node-connector": "*9*" } } ] > }, "order": 0 } ], "service-name": "default.tun55fb50d0a2b", > "service-priority": 9, "service-type": > "interface-service-bindings:service-type-flow-based" > } ], "interface-name": "tun55fb50d0a2b", "service-mode": > "interface-service-bindings:service-mode-egress" } > > > > {"id":"246869078989547.220.tun55fb50d0a2b.0","priority": > 9,"table_id":220,"installHw":true,"hard-timeout":0,"match": > {"openflowplugin-extension-general:extension-list":[{"extension-key":" > openflowplugin-extension-nicira-match:nxm-nx-reg6-key", > "extension":{"openflowplugin-extension-nicira-match:nxm-nx- > reg":{"value":4096,"reg":"nicira-match:nxm-nx-reg6"}}}]} > ,"cookie":134217735,"flow-name":"default.tun55fb50d0a2b" > ,"strict":true,"instructions":{"instruction":[{"order":0," > apply-actions":{"action":[{"order":0,"output-action":{" > max-length":0,"output-node-connector":"*4*"}}]}}]}," > barrier":false,"idle-timeout":0} > > > > cookie=0x8000007, duration=403.965s, table=220, n_packets=0, n_bytes=0, > priority=9,reg6=0x1000 actions=output:*4* > > > > In OVS logs you can see this tunnel port getting deleted and then coming > back in with a different OfPort. > > > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/ > 263/compute_2/ovs-vswitchd.log.gz > > > > It goes from 4 to 9. This happens due to clean up in previous suite which > doesn’t actually clean up everything and leaves entry for that old service > binding. Can confirm it from interfaces-state entry for same port in first > and second suites. So we have stale flows and stale service bindings for > old tunnel port. Could probably check with OFPlugin how they handle update > of a flow, may probably not work. > > > > We need to check if cleanup has been done completely before moving to next > suite. This is where the work we been doing on tools comes in. > > > > Regards, > > Vishal. > > > > *From:* Faseela K > *Sent:* 24 January 2018 08:10 > *To:* Sam Hague <[email protected]>; Josh Hershberg <[email protected]> > *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen < > [email protected]>; Manu B <[email protected]> > *Subject:* RE: is dhcp issue fixed on carbon? > > > > Looks more or less similar issue, tunnel flow is programmed in table 220 > with older tunnel’s port number, which was deleted in l2 suite. However > policy code has not kicked in. I will take a detailed look on what is > causing this issue now. > > > > Thanks, > > Faseela > > > > *From:* Faseela K > *Sent:* Wednesday, January 24, 2018 7:48 AM > *To:* 'Sam Hague' <[email protected]>; Josh Hershberg <[email protected] > > > *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen < > [email protected]>; Manu B <[email protected]> > *Subject:* RE: is dhcp issue fixed on carbon? > > > > Thanks Sam for initial triaging. > > I will take a look at this. > > > > *From:* Sam Hague [mailto:[email protected] <[email protected]>] > *Sent:* Wednesday, January 24, 2018 6:54 AM > *To:* Faseela K <[email protected]>; Josh Hershberg < > [email protected]> > *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen < > [email protected]>; Manu B <[email protected]> > *Subject:* Re: is dhcp issue fixed on carbon? > > > > OK, seems pretty consistent that table 220 flows are not showing up. > Vishal, Faseela, can you see if it is like the policymgr one where the > bind/unbind was wrong? That seems the closest culprit as those were the > last patches merged. > > > > Here is another case where the table 220 flow is missing in suite [5] of > job [6]. This time the port missing is a tunnel port. "9(tun55fb50d0a2b): > addr:8a:2f:9f:c6:fe:d9" is missing from table 220. And then in suite [7] > of the same job this port has the same issue where the tunnel port is > missing: "16(tap28760838-a7): addr:fe:16:3e:26:0a:e3" > > > > [5] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/ > 263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4 > > > > [6] https://logs.opendaylight.org/releng/vex-yul-odl- > jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful- > carbon/263/log_02_l3.html.gz > > > > [7] https://logs.opendaylight.org/releng/vex-yul-odl- > jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful- > carbon/263/log_04_security_group.html.gz > > > > On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague <[email protected]> wrote: > > further details for Josh since the original email doesn't have many... > > > > - so the "l3.Check Vm Instances Have Ip Address" test fails with the net1 > not being able to get all the vm ips for it's three vms. > > - '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None' - this means the > first vm of the three did not get a ip > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-upstream-stateful- > carbon/298/log_02_l3.html.gz#s1-t11-k8 > > > > - looks at the neutron ports to find which port goes with vm1 > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-upstream-stateful- > carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2 > > get the missing ip as 31.0.0.6, then look at next log to get the port > > - look at the 31.0.0.x addresses, we know 31,0.0 > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-upstream-stateful- > carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2 > > 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | > ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | > ACTIVE | > > > > - I know the first vm (and second) are both on the compute_1 so look at > the ovs logs on compute_1 > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-upstream-stateful- > carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4 > > > > - compute_1, in the ofctl show br-int, we see port 7 > > 7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f > > > > - then check flows to see if there is a table 220 flow for port 7 > > https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/ > netvirt-csit-1node-openstack-ocata-upstream-stateful- > carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4 > > And the table 220 flow for port 7 is not there, so the vm can't get an IP. > > > > [3] is the patch vishal pushed to fix a similar issue the first time we > saw this. What we found is that the elan tag was being reused, because a > port was deleted and then a new one created and the elan tag reused. So you > ended up with a tunnel port stomping on a vm port. > > [3] https://git.opendaylight.org/gerrit/#/c/67009/ > > > > On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague <[email protected]> wrote: > > Adding Josh to thread. > > > > On Tue, Jan 23, 2018 at 2:25 PM, Faseela K <[email protected]> wrote: > > Manu, > > Could you please take a look at the DHCP failure in the below run? > > I am caught up with something else, will help you out in initial > triaging. > > Thanks, > > Faseela > > > > *From:* Sam Hague [mailto:[email protected]] > *Sent:* Monday, January 22, 2018 10:57 PM > *To:* Vishal Thapar <[email protected]>; Faseela K < > [email protected]>; Jamo Luhrsen <[email protected]> > *Subject:* is dhcp issue fixed on carbon? > > > > Vishal, Faseela, > > > > can you look at this job run to see if the issue you fixed with the > policymgr binding is fixed? in this build the whole poligymgr bundle has > been removed. This is carbon so I just removed the whole bundle as we would > never use it. Could that have uncovered something that the code was doing? > If so, then even master and nitrogen should have the issue since there we > disabled building policymgr - so should be the same as removing it. > > > > Other thing, merged in carbon is the bind/unbind patches for elan and > dhcp. Could those have an impact? > > > > Thanks, Sam > > > > I don't see "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" pop up in the > table 220 flows which was the problem before. > > > > 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f | > ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' | > ACTIVE | > > > > Thanks, Sam > > > > [1] https://logs.opendaylight.org/releng/vex-yul-odl- > jenkins-1/netvirt-csit-1node-openstack-ocata-upstream- > stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4 > > > > > > >
_______________________________________________ openflowplugin-dev mailing list [email protected] https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
