Adding openflow to thread.

Anil, could someone take a look at this for Carbon? We are seeing
connection flapping and end up missing port status updates. This leads to
stale models and flows.

This is blocking Carbon SR3.

On Jan 24, 2018 12:58 AM, "D Arunprakash" <[email protected]>
wrote:

> Ignore my previous email.
>
>
>
> The tunnel port got deleted around 18:49:29.373 and was added back at
> 18:52:46.261.
>
>
>
> 2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent
> (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b):
> addr:3e:0c:ed:2e:a9:ba
>
>
>
> 2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent
> (Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b):
> addr:8a:2f:9f:c6:fe:d9
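
A minimal sketch (Python, not part of the original thread) of how the DEL/ADD pair above could be pulled out of ovs-vswitchd.log to spot a port that came back under a different OfPort. The regex is written against the two lines quoted above only; the function names are hypothetical and other log layouts are not covered.

```python
import re

# Matches OFPT_PORT_STATUS debug lines in the shape quoted in this thread.
# Assumption: other OVS versions may format these lines differently.
PORT_STATUS = re.compile(
    r"^(?P<ts>[^|]+)\|\d+\|vconn\|DBG\|.*OFPT_PORT_STATUS.*?"
    r"(?P<reason>ADD|DEL|MOD): (?P<ofport>\d+)\((?P<name>[^)]+)\): "
    r"addr:(?P<mac>[0-9a-f:]+)"
)


def port_events(log_lines):
    """Yield (timestamp, reason, ofport, name) for each port status line."""
    for line in log_lines:
        m = PORT_STATUS.match(line)
        if m:
            yield m["ts"], m["reason"], int(m["ofport"]), m["name"]


def ofport_changes(events):
    """Return {name: (old_ofport, new_ofport)} for ports deleted and then
    re-added under a different OpenFlow port number."""
    last_deleted = {}
    changes = {}
    for _ts, reason, ofport, name in events:
        if reason == "DEL":
            last_deleted[name] = ofport
        elif reason == "ADD" and last_deleted.get(name, ofport) != ofport:
            changes[name] = (last_deleted[name], ofport)
    return changes


# The two log lines from this mail, unwrapped onto single lines.
SAMPLE = [
    "2018-01-23T18:49:29.373Z|01979|vconn|DBG|tcp:10.30.170.63:6653: sent "
    "(Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): DEL: 4(tun55fb50d0a2b): "
    "addr:3e:0c:ed:2e:a9:ba",
    "2018-01-23T18:52:46.261Z|03083|vconn|DBG|tcp:10.30.170.63:6653: sent "
    "(Success): OFPT_PORT_STATUS (OF1.3) (xid=0x0): ADD: 9(tun55fb50d0a2b): "
    "addr:8a:2f:9f:c6:fe:d9",
]

CHANGES = ofport_changes(port_events(SAMPLE))
print(CHANGES)
```

Run over the full log, this would list every port whose OfPort changed between a delete and the following add, which is the set of ports worth checking for stale flows.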
>
>
>
> Immediately after the tunnel delete, I’m seeing multiple switch flaps
> for quite some time:
>
>
>
> 2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix: entering ACTIVE
>
> 2018-01-23T18:49:35.155Z|02109|vconn|DBG|unix: sent (Success): OFPT_HELLO
> (OF1.3) (xid=0x75):
>
> version bitmap: 0x04
>
> 2018-01-23T18:49:35.155Z|02110|vconn|DBG|unix: received: OFPT_HELLO
> (OF1.3) (xid=0x1):
>
>
>
> 2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix: connection closed
> by peer
>
> 2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix: entering
> DISCONNECTED
>
> 2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix: entering ACTIVE
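
To put a number on the flapping, the rconn state transitions can be counted per connection. This is only a sketch in Python against the lines quoted above; the pattern and the helper name are assumptions, not anything OVS ships.

```python
import re
from collections import Counter

# Matches rconn "entering <STATE>" debug lines as quoted in this thread.
RCONN_STATE = re.compile(
    r"\|rconn\|DBG\|(?P<conn>\S+): entering (?P<state>[A-Z_]+)"
)


def count_transitions(log_lines):
    """Count how often each connection enters each rconn state."""
    counts = Counter()
    for line in log_lines:
        m = RCONN_STATE.search(line)
        if m:
            counts[(m["conn"], m["state"])] += 1
    return counts


# rconn lines from this mail, unwrapped onto single lines.
SAMPLE = [
    "2018-01-23T18:49:35.155Z|02108|rconn|DBG|br-int<->unix: entering ACTIVE",
    "2018-01-23T18:49:35.307Z|02144|rconn|DBG|br-int<->unix: connection "
    "closed by peer",
    "2018-01-23T18:49:35.307Z|02145|rconn|DBG|br-int<->unix: entering "
    "DISCONNECTED",
    "2018-01-23T18:49:35.324Z|02146|rconn|DBG|br-int<->unix: entering ACTIVE",
]

COUNTS = count_transitions(SAMPLE)
for (conn, state), n in sorted(COUNTS.items()):
    print(conn, state, n)
```

Grouping by connection name keeps the br-int<->unix transitions separate from any other connection that shows up in the same log.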
>
>
>
> Also, I’m seeing an error in the karaf log:
>
>
>
> 2018-01-23 18:49:29,378 | WARN  | entLoopGroup-7-3 |
> DeviceContextImpl                | 280 - org.opendaylight.openflowplugin.impl
> - 0.4.3.SNAPSHOT | writePortStatusMessage
>
> 2018-01-23 18:49:29,379 | WARN  | entLoopGroup-7-3 |
> DeviceContextImpl                | 280 - org.opendaylight.openflowplugin.impl
> - 0.4.3.SNAPSHOT | submit transaction for write port status message
>
> 2018-01-23 18:49:29,379 | WARN  | rd-dispatcher-23 |
> ShardDataTree                    | 184 - 
> org.opendaylight.controller.sal-distributed-datastore
> - 1.5.3.SNAPSHOT | member-1-shard-inventory-operational: Store Tx
> member-1-datastore-operational-fe-0-chn-8-txn-11-0: Data validation
> failed for path /(urn:opendaylight:inventory?revision=2013-08-19)nodes/
> node/node[{(urn:opendaylight:inventory?revision=2013-08-19)
> id=openflow:246869078989547}]/AugmentationIdentifier{
> childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)port-number,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)table,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)group,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)software,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)description,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)
> meter]}/(urn:opendaylight:flow:inventory?revision=2013-
> 08-19)table/table[{(urn:opendaylight:flow:inventory?
> revision=2013-08-19)id=50}]/flow.
>
> org.opendaylight.yangtools.yang.data.api.schema.tree.
> ModifiedNodeDoesNotExistException: Node /(urn:opendaylight:inventory?
> revision=2013-08-19)nodes/node/node[{(urn:opendaylight:
> inventory?revision=2013-08-19)id=openflow:246869078989547}]/
> AugmentationIdentifier{childNames=[(urn:opendaylight:
> flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:
> inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:
> inventory?revision=2013-08-19)supported-match-types,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)table,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)group,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)software,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)description,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter,
> (urn:opendaylight:flow:inventory?revision=2013-08-19)
> meter]}/(urn:opendaylight:flow:inventory?revision=2013-
> 08-19)table/table[{(urn:opendaylight:flow:inventory?
> revision=2013-08-19)id=50}]/flow does not exist. Cannot apply
> modification to its children.
>
>
>
> We need to check why there are multiple switch disconnects and
> reconnects, and how OFP handles them.
>
>
>
> Regards,
>
> Arun
>
>
>
> *From:* Vishal Thapar
> *Sent:* Wednesday, January 24, 2018 9:52 AM
> *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>;
> Josh Hershberg <[email protected]>; D Arunprakash <
> [email protected]>
> *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
> *Subject:* RE: is dhcp issue fixed on carbon?
>
>
>
> Missed adding the most important detail, and added Arun.
>
>
>
> Inventory operational is still showing both the old port and the new
> port for some reason. I guess that is what caused the problems.
>
>
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/
> 263/log_02_l3.html.gz#s1-t25-k4-k2-k1-k2-k56
>
>
>
> {"id":"openflow:246869078989547:4","flow-node-
> inventory:supported":"","flow-node-inventory:peer-features":
> "","flow-node-inventory:port-number":4,"flow-node-
> inventory:hardware-address":"3e:0c:ed:2e:a9:ba","flow-node-
> inventory:current-feature":"","flow-node-inventory:maximum-
> speed":0,"flow-node-inventory:reason":"add","flow-node-
> inventory:configuration":"","flow-node-inventory:
> advertised-features":"","flow-node-inventory:current-speed":
> 0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-
> inventory:state":{"link-down":false,"blocked":false,"live":false}}
>
>
>
> {"id":"openflow:246869078989547:9","flow-node-
> inventory:supported":"","flow-node-inventory:peer-features":
> "","flow-node-inventory:port-number":9,"flow-node-
> inventory:hardware-address":"8a:2f:9f:c6:fe:d9","flow-node-
> inventory:current-feature":"","flow-node-inventory:maximum-
> speed":0,"flow-node-inventory:reason":"add","flow-node-
> inventory:configuration":"","flow-node-inventory:
> advertised-features":"","flow-node-inventory:current-speed":
> 0,"flow-node-inventory:name":"tun55fb50d0a2b","flow-node-
> inventory:state":{"link-down":false,"blocked":false,"live":false}}
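
A quick way to spot this condition in an inventory dump is to group node-connectors by name and flag any name that maps to more than one port number. The sketch below is Python with the two entries above trimmed to the relevant keys; how the node-connector list is fetched from operational is left out, and the function name is hypothetical.

```python
from collections import defaultdict

NAME = "flow-node-inventory:name"
PORT = "flow-node-inventory:port-number"


def duplicate_port_names(node_connectors):
    """Return {port name: [port numbers]} for names that appear on more
    than one node-connector, which suggests a stale entry."""
    by_name = defaultdict(list)
    for nc in node_connectors:
        by_name[nc[NAME]].append(nc[PORT])
    return {
        name: sorted(ports)
        for name, ports in by_name.items()
        if len(ports) > 1
    }


# The two node-connectors from this mail, reduced to the keys used here.
NODE_CONNECTORS = [
    {"id": "openflow:246869078989547:4", PORT: 4, NAME: "tun55fb50d0a2b"},
    {"id": "openflow:246869078989547:9", PORT: 9, NAME: "tun55fb50d0a2b"},
]

DUPLICATES = duplicate_port_names(NODE_CONNECTORS)
print(DUPLICATES)
```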
>
>
>
> OVS output from same set of logs:
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/
> 263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k11-k4
>
>
>
> 9(tun55fb50d0a2b): addr:8a:2f:9f:c6:fe:d9
>
>     config: 0
>
>     state: 0
>
>     speed: 0 Mbps now, 0 Mbps max
>
>
>
> So for now I’d peg it as an OFPlugin issue. It didn’t detect or inform us
> of the old port delete, and that is why we didn’t delete the old flows. I
> wonder whether something else in the IFM code could’ve handled it, but I
> don’t think we handle OfPort number changes; we expect a delete+add in
> such scenarios. Faseela can pitch in on why we have a service binding
> entry with the new port number while the flow is still using the old one.
>
>
>
> Regards,
>
> Vishal.
>
>
>
> *From:* Vishal Thapar
> *Sent:* 24 January 2018 09:26
> *To:* Faseela K <[email protected]>; Sam Hague <[email protected]>;
> Josh Hershberg <[email protected]>
> *Cc:* Jamo Luhrsen <[email protected]>; Manu B <[email protected]>
> *Subject:* RE: is dhcp issue fixed on carbon?
>
>
>
> Quick analysis:
>
>
>
> Not related to the policy stuff. The service binding has an entry for the
> new port number, but the table 220 flow is still using the old port number.
>
>
>
> { "bound-services": [ { "flow-cookie": 134217735, "flow-priority": 9,
> "instruction": [ { "apply-actions": { "action": [ { "order": 0,
> "output-action": { "max-length": 0, "output-node-connector": "*9*" } } ]
> }, "order": 0 } ], "service-name": "default.tun55fb50d0a2b",
> "service-priority": 9, "service-type": 
> "interface-service-bindings:service-type-flow-based"
> } ], "interface-name": "tun55fb50d0a2b", "service-mode":
> "interface-service-bindings:service-mode-egress" }
>
>
>
> {"id":"246869078989547.220.tun55fb50d0a2b.0","priority":
> 9,"table_id":220,"installHw":true,"hard-timeout":0,"match":
> {"openflowplugin-extension-general:extension-list":[{"extension-key":"
> openflowplugin-extension-nicira-match:nxm-nx-reg6-key",
> "extension":{"openflowplugin-extension-nicira-match:nxm-nx-
> reg":{"value":4096,"reg":"nicira-match:nxm-nx-reg6"}}}]}
> ,"cookie":134217735,"flow-name":"default.tun55fb50d0a2b"
> ,"strict":true,"instructions":{"instruction":[{"order":0,"
> apply-actions":{"action":[{"order":0,"output-action":{"
> max-length":0,"output-node-connector":"*4*"}}]}}]},"
> barrier":false,"idle-timeout":0}
>
>
>
> cookie=0x8000007, duration=403.965s, table=220, n_packets=0, n_bytes=0,
> priority=9,reg6=0x1000 actions=output:*4*
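
The mismatch can be checked mechanically by extracting output-node-connector from both the bound service and the config flow and comparing them. A minimal Python sketch, assuming both objects have already been fetched and parsed as JSON; the dicts below are trimmed copies of the ones above (without the `*` emphasis markers), and the helper name is hypothetical.

```python
def output_ports(instructions):
    """Collect output-node-connector values (as strings) from a list of
    instruction dicts shaped like the ones quoted in this thread."""
    ports = []
    for instruction in instructions:
        actions = instruction.get("apply-actions", {}).get("action", [])
        for action in actions:
            out = action.get("output-action")
            if out is not None:
                ports.append(str(out["output-node-connector"]))
    return ports


# Trimmed copy of the bound-services entry for the tunnel interface.
BOUND_SERVICE = {
    "service-name": "default.tun55fb50d0a2b",
    "instruction": [
        {"order": 0, "apply-actions": {"action": [
            {"order": 0, "output-action": {
                "max-length": 0, "output-node-connector": "9"}},
        ]}},
    ],
}

# Trimmed copy of the table 220 flow from the config datastore.
FLOW = {
    "id": "246869078989547.220.tun55fb50d0a2b.0",
    "table_id": 220,
    "instructions": {"instruction": [
        {"order": 0, "apply-actions": {"action": [
            {"order": 0, "output-action": {
                "max-length": 0, "output-node-connector": "4"}},
        ]}},
    ]},
}

BINDING_PORTS = output_ports(BOUND_SERVICE["instruction"])
FLOW_PORTS = output_ports(FLOW["instructions"]["instruction"])
MISMATCH = BINDING_PORTS != FLOW_PORTS
print(BINDING_PORTS, FLOW_PORTS, MISMATCH)
```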
>
>
>
> In OVS logs you can see this tunnel port getting deleted and then coming
> back in with a different OfPort.
>
>
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/
> 263/compute_2/ovs-vswitchd.log.gz
>
>
>
> It goes from 4 to 9. This happens due to cleanup in the previous suite,
> which doesn’t actually clean up everything and leaves the entry for that
> old service binding. This can be confirmed from the interfaces-state
> entry for the same port in the first and second suites. So we have stale
> flows and stale service bindings for the old tunnel port. We could check
> with OFPlugin how they handle an update of a flow; it may not work.
>
>
>
> We need to check that cleanup has completed before moving to the next
> suite. This is where the work we have been doing on tools comes in.
>
>
>
> Regards,
>
> Vishal.
>
>
>
> *From:* Faseela K
> *Sent:* 24 January 2018 08:10
> *To:* Sam Hague <[email protected]>; Josh Hershberg <[email protected]>
> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <
> [email protected]>; Manu B <[email protected]>
> *Subject:* RE: is dhcp issue fixed on carbon?
>
>
>
> Looks like more or less the same issue: the tunnel flow is programmed in
> table 220 with the older tunnel’s port number, which was deleted in the
> l2 suite. However, the policy code has not kicked in. I will take a
> detailed look at what is causing this issue now.
>
>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Faseela K
> *Sent:* Wednesday, January 24, 2018 7:48 AM
> *To:* 'Sam Hague' <[email protected]>; Josh Hershberg <[email protected]
> >
> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <
> [email protected]>; Manu B <[email protected]>
> *Subject:* RE: is dhcp issue fixed on carbon?
>
>
>
> Thanks Sam for initial triaging.
>
> I will take a look at this.
>
>
>
> *From:* Sam Hague [mailto:[email protected] <[email protected]>]
> *Sent:* Wednesday, January 24, 2018 6:54 AM
> *To:* Faseela K <[email protected]>; Josh Hershberg <
> [email protected]>
> *Cc:* Vishal Thapar <[email protected]>; Jamo Luhrsen <
> [email protected]>; Manu B <[email protected]>
> *Subject:* Re: is dhcp issue fixed on carbon?
>
>
>
> OK, seems pretty consistent that table 220 flows are not showing up.
> Vishal, Faseela, can you see if it is like the policymgr one where the
> bind/unbind was wrong? That seems the closest culprit as those were the
> last patches merged.
>
>
>
> Here is another case where the table 220 flow is missing in suite [5] of
> job [6]. This time the port missing is a tunnel port. "9(tun55fb50d0a2b):
> addr:8a:2f:9f:c6:fe:d9" is missing from table 220. And then in suite [7]
> of the same job this port has the same issue where the tunnel port is
> missing: "16(tap28760838-a7): addr:fe:16:3e:26:0a:e3"
>
>
>
> [5] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-gate-stateful-carbon/
> 263/log_02_l3.html.gz#s1-t25-k4-k1-k3-k1-k12-k4
>
>
>
> [6] https://logs.opendaylight.org/releng/vex-yul-odl-
> jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-
> carbon/263/log_02_l3.html.gz
>
>
>
> [7] https://logs.opendaylight.org/releng/vex-yul-odl-
> jenkins-1/netvirt-csit-1node-openstack-ocata-gate-stateful-
> carbon/263/log_04_security_group.html.gz
>
>
>
> On Tue, Jan 23, 2018 at 3:33 PM, Sam Hague <[email protected]> wrote:
>
> further details for Josh since the original email doesn't have many...
>
>
>
> - so the "l3.Check Vm Instances Have Ip Address" test fails with net1
> not being able to get all the vm ips for its three vms.
>
> - '[u'None', u'31.0.0.9', u'31.0.0.10']' contains 'None' - this means the
> first vm of the three did not get an ip
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-upstream-stateful-
> carbon/298/log_02_l3.html.gz#s1-t11-k8
>
>
>
> - look at the neutron ports to find which port goes with vm1
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-upstream-stateful-
> carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k4-k1-k2
>
> get the missing ip as 31.0.0.6, then look at next log to get the port
>
> - look at the 31.0.0.x addresses, we know 31,0.0
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-upstream-stateful-
> carbon/298/log_02_l3.html.gz#s1-t11-k9-k1-k8-k2
>
> 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f |
> ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' |
> ACTIVE |
>
>
>
> - I know the first vm (and second) are both on the compute_1 so look at
> the ovs logs on compute_1
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-upstream-stateful-
> carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
>
>
>
> - compute_1, in the ofctl show br-int, we see port 7
>
> 7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f
>
>
>
> - then check flows to see if there is a table 220 flow for port 7
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
> netvirt-csit-1node-openstack-ocata-upstream-stateful-
> carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k12-k4
>
> And the table 220 flow for port 7 is not there, so the vm can't get an IP.
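
The last step of the triage above (is there a table 220 flow that outputs to the port?) can be scripted against saved `ovs-ofctl dump-flows` output. The Python below is only a sketch: the function name is made up, and the fixture is the single table 220 flow quoted elsewhere in this thread (from a different job run), not the flows of the failing run.

```python
import re


def has_table_flow_for_port(dump_flows_text, table, ofport):
    """Return True if any flow in the given table outputs to ofport."""
    pattern = re.compile(
        rf"\btable={table},.*actions=.*\boutput:{ofport}\b"
    )
    return any(pattern.search(line) for line in dump_flows_text.splitlines())


# Fixture: the one table 220 flow quoted in this thread, with the email
# emphasis markers around the port number removed.
DUMP_FLOWS = (
    "cookie=0x8000007, duration=403.965s, table=220, n_packets=0, "
    "n_bytes=0, priority=9,reg6=0x1000 actions=output:4\n"
)

for port in (4, 7, 9):
    print(port, has_table_flow_for_port(DUMP_FLOWS, 220, port))
```

The word boundary after the port number keeps port 4 from matching a flow that outputs to port 40 or 41.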
>
>
>
> [3] is the patch vishal pushed to fix a similar issue the first time we
> saw this. What we found is that the elan tag was being reused, because a
> port was deleted and then a new one created and the elan tag reused. So you
> ended up with a tunnel port stomping on a vm port.
>
> [3]  https://git.opendaylight.org/gerrit/#/c/67009/
>
>
>
> On Tue, Jan 23, 2018 at 3:07 PM, Sam Hague <[email protected]> wrote:
>
> Adding Josh to thread.
>
>
>
> On Tue, Jan 23, 2018 at 2:25 PM, Faseela K <[email protected]> wrote:
>
> Manu,
>
>    Could you please take a look at the DHCP failure in the below run?
>
>     I am caught up with something else, will help you out in initial
> triaging.
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Sam Hague [mailto:[email protected]]
> *Sent:* Monday, January 22, 2018 10:57 PM
> *To:* Vishal Thapar <[email protected]>; Faseela K <
> [email protected]>; Jamo Luhrsen <[email protected]>
> *Subject:* is dhcp issue fixed on carbon?
>
>
>
> Vishal, Faseela,
>
>
>
> can you look at this job run to see if the issue you fixed with the
> policymgr binding is fixed? In this build the whole policymgr bundle has
> been removed. This is carbon so I just removed the whole bundle as we would
> never use it. Could that have uncovered something that the code was doing?
> If so, then even master and nitrogen should have the issue since there we
> disabled building policymgr - so should be the same as removing it.
>
>
>
> The other thing merged in carbon is the bind/unbind patches for elan and
> dhcp. Could those have an impact?
>
>
>
> Thanks, Sam
>
>
>
> I don't see "7(tap3862fa17-4e): addr:fe:16:3e:96:06:3f" pop up in the
> table 220 flows which was the problem before.
>
>
>
> 3862fa17-4e7d-4d41-9237-c372fca11c03 | | fa:16:3e:96:06:3f |
> ip_address='31.0.0.6', subnet_id='697e1b34-1adb-4299-b50f-6527b15260fd' |
> ACTIVE |
>
>
>
> Thanks, Sam
>
>
>
> [1] https://logs.opendaylight.org/releng/vex-yul-odl-
> jenkins-1/netvirt-csit-1node-openstack-ocata-upstream-
> stateful-carbon/298/log_02_l3.html.gz#s1-t11-k9-k2-k1-k2-k1-k11-k4
>
>
>
>
>
>
>
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
