> On Jan 27, 2017, at 1:01 PM, Thomas Morin <thomas.mo...@orange.com> wrote:
> 
> Jarno,
> 
> 2017-01-27, Jarno Rajahalme:
>> Cleanest solution would be to use the new clone action for the datapath, 
>> which, if important, we could also backport to OVS 2.5.x.
> 
> Yes, clone is indeed much more elegant.
> 
>> 
>> Datapath CLONE is yet to be upstreamed, though, may take a little time.
> 
> Even before net-dev upstreaming, there doesn't seem to even be a clone 
> implementation in ovs/datapath yet, right ?
> 

Right. The process for new features is to first upstream to net-next, then 
backport to ovs/datapath.

  Jarno

> -Thomas
> 
> 
> 
> 
>>> 2017-01-06, Jarno Rajahalme:
>>>> I sent a patch to do this fix on OVS master. IMO we should also make the 
>>>> datapath more flexible so that it would be able to POP back to IP after 
>>>> PUSH actions on an IP packet in the same action list. But that will not 
>>>> make it to OVS 2.7.
>>>> 
>>>> I would appreciate if Thomas could apply the and test!
>>>> 
>>>>   Jarno
>>>> 
>>>>> On Dec 14, 2016, at 1:52 PM, Jarno Rajahalme <ja...@ovn.org 
>>>>> <mailto:ja...@ovn.org>> wrote:
>>>>> 
>>>>>> 
>>>>>> On Dec 13, 2016, at 8:44 PM, Takashi YAMAMOTO <yamam...@ovn.org 
>>>>>> <mailto:yamam...@ovn.org>> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Dec 14, 2016 at 9:02 AM, Jarno Rajahalme <ja...@ovn.org 
>>>>>> <mailto:ja...@ovn.org>> wrote:
>>>>>> 
>>>>>>> On Dec 13, 2016, at 3:32 PM, Takashi YAMAMOTO <yamam...@ovn.org 
>>>>>>> <mailto:yamam...@ovn.org>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Dec 13, 2016 at 6:49 PM, Thomas Morin <thomas.mo...@orange.com 
>>>>>>> <mailto:thomas.mo...@orange.com>> wrote:
>>>>>>> Hi Jarno,
>>>>>>> 
>>>>>>> 2016-12-10, Jarno Rajahalme:
>>>>>>>> On Dec 9, 2016, at 7:04 AM, Thomas Morin <thomas.mo...@orange.com 
>>>>>>>> <mailto:thomas.mo...@orange.com>> wrote:
>>>>>>>>> 
>>>>>>>>> 2016-12-09, Thomas Morin:
>>>>>>>>>> In the same setup as the one on which the bug was observed, [...]
>>>>>>>>> 
>>>>>>>>> I was confused, I in fact tested the patch (branch-2.5) on a 
>>>>>>>>> /different/ setup as the one on which we hit the bug, using MPLS over 
>>>>>>>>> a GRE tunnel port, rather than plain MPLS over an eth port.
>>>>>>>>> Sorry if any confusion arised... I can test on the first setup if 
>>>>>>>>> relevant.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Maybe the kernel datapath does not support MPLS over a GRE tunnel 
>>>>>>>> port. Having ‘dmesg’ output for the test run might reveal why the 
>>>>>>>> actions validation fails.
>>>>>>> 
>>>>>>> The dmesg output was the following: 
>>>>>>> 
>>>>>>> [171295.258939] openvswitch: netlink: Flow actions may not be safe on 
>>>>>>> all matching packets.
>>>>>>> 
>>>>>>> I've tested the patch on the platform on which the bug was initially 
>>>>>>> hit (*not* using MPLS/GRE), and I have the following a few times in the 
>>>>>>> logs right after I do an "ovs-appctl fdb/flush":
>>>>>>> 
>>>>>>> 2016-12-13T09:44:08.449Z|00001|dpif(handler68)|WARN|Dropped 3 log 
>>>>>>> messages in last 1 seconds (most recently, 1 seconds ago) due to 
>>>>>>> excessive rate
>>>>>>> 2016-12-13T09:44:08.449Z|00002|dpif(handler68)|WARN|system@ovs-system: 
>>>>>>> failed to put[create] (Invalid argument) 
>>>>>>> ufid:f046c4c4-b97f-436d-bd7c-91ed307275ac 
>>>>>>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(9),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:61:c0:b5,dst=00:00:5e:00:43:64),eth_type(0x0800),ipv4(src=10.0.1.29,dst=10.0.0.3,proto=6,tos=0/0xfc,ttl=64,frag=no),tcp(src=54253,dst=8080),tcp_flags(0/0),
>>>>>>>  
>>>>>>> actions:set(ipv4(src=10.0.1.29,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=433680,tc=0,ttl=63,bos=1,eth_type=0x8847),7,set(eth(src=fa:16:3e:61:c0:b5,dst=00:00:5e:00:43:64)),pop_mpls(eth_type=0x800),set(ipv4(src=10.0.1.29,dst=10.0.0.3,tos=0/0xfc,ttl=64)),push_vlan(vid=1,pcp=0),3,8,pop_vlan,13
>>>>>>> 
>>>>>>> And dmesg:
>>>>>>> [926833.612372] openvswitch: netlink: Flow actions may not be safe on 
>>>>>>> all matching packets.
>>>>>>> 
>>>>>>> it's complaining about set ipv4 after pop_mpls because it doesn't know 
>>>>>>> about the "restored" l3.
>>>>>>> while we can improve the kernel, i guess we need to stick with recirc 
>>>>>>> for now.
>>>>>>>  
>>>>>> 
>>>>>> This should do it? Does not break any existing tests on branch-2.5, but 
>>>>>> I did not create a test for this yet.
>>>>>> 
>>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>>>>> index fb25f63..6ee2a72 100644
>>>>>> --- a/ofproto/ofproto-dpif-xlate.c
>>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>>>>> @@ -2899,6 +2899,15 @@ xlate_commit_actions(struct xlate_ctx *ctx)
>>>>>>  {
>>>>>>      bool use_masked = ctx->xbridge->support.masked_set_action;
>>>>>>  
>>>>>> +    /* An MPLS packet can be implicitly popped back to a non-MPLS 
>>>>>> packet, if a
>>>>>> +     * patch port peer or a group bucket pushed MPLS.  Set the 
>>>>>> 'was_mpls' flag
>>>>>> +     * also in that case. */
>>>>>> +    if (eth_type_mpls(ctx->base_flow.dl_type)
>>>>>> +        && !eth_type_mpls(ctx->xin->flow.dl_type)
>>>>>> +        && ctx->xbridge->support.odp.recirc) {
>>>>>> +        ctx->was_mpls = true;
>>>>>> +    }
>>>>>> +
>>>>>> 
>>>>>> i guess this check needs to be somewhere around "ctx->was_mpls = 
>>>>>> old_was_mpls" in
>>>>>> affected functions. 
>>>>>> 
>>>>> 
>>>>> Right, as that is where the implicit MPLS POP action happens, when the 
>>>>> ‘ctx->xin->flow’ is restored.
>>>>> 
>>>>>   Jarno
>>>>> 
>>>>>>      ctx->xout->slow |= commit_odp_actions(&ctx->xin->flow, 
>>>>>> &ctx->base_flow,
>>>>>>                                            ctx->odp_actions, ctx->wc,
>>>>>>                                            use_masked);
>>>>>> 
>>>>>>   Jarno
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -Thomas
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> On Dec 1, 2016, at 5:57 PM, Jarno Rajahalme <ja...@ovn.org 
>>>>>>>>>>>> <mailto:ja...@ovn.org>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 30, 2016, at 8:50 PM, Ben Pfaff <b...@ovn.org 
>>>>>>>>>>>>> <mailto:b...@ovn.org>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Nov 30, 2016 at 06:58:57PM -0800, Jarno Rajahalme wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 30, 2016, at 8:41 AM, Ben Pfaff <b...@ovn.org 
>>>>>>>>>>>>>>> <mailto:b...@ovn.org>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Nov 30, 2016 at 12:05:46PM +0100, Thomas Morin wrote:
>>>>>>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2016-11-30, Ben Pfaff:
>>>>>>>>>>>>>>>>> Do you have any idea what in your OpenFlow pipeline might do 
>>>>>>>>>>>>>>>>> that,
>>>>>>>>>>>>>>>>> i.e. is there anything especially tricky in the OpenFlow 
>>>>>>>>>>>>>>>>> flows?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Are you willing to show us your OpenFlow flow table?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The setup involves three OVS bridges connected with 
>>>>>>>>>>>>>>>> patch-ports: br-int --
>>>>>>>>>>>>>>>> br-tun -- br-mpls, with the traffic that triggers the assert 
>>>>>>>>>>>>>>>> being processed
>>>>>>>>>>>>>>>> by br-int with a NORMAL action (ie. MAC learning).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The flows in this setup aren't particularly tricky, I think, 
>>>>>>>>>>>>>>>> although I'm
>>>>>>>>>>>>>>>> not sure what qualifies as tricky or non-tricky :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Anyway, since yesterday I managed to identify the event that 
>>>>>>>>>>>>>>>> trigger the
>>>>>>>>>>>>>>>> assert, by adding more logging before the assert and 
>>>>>>>>>>>>>>>> displaying the actions
>>>>>>>>>>>>>>>> taken:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00001|odp_util(revalidator45)|WARN|commit_set_ipv4_action
>>>>>>>>>>>>>>>> assert would fail....
>>>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00002|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>>>> base_flow: 
>>>>>>>>>>>>>>>> ip,in_port=5,dl_vlan=3,dl_vlan_pcp=0,dl_src=fa:16:3e:33:f7:fe,dl_dst=00:00:5e:00:43:64,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
>>>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00003|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>>>> flow: 
>>>>>>>>>>>>>>>> tcp,in_port=5,dl_vlan=3,dl_vlan_pcp=0,dl_src=fa:16:3e:33:f7:fe,dl_dst=00:00:5e:00:43:64,nw_src=10.0.1.22,nw_dst=10.0.0.3,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53295,tp_dst=8080,tcp_flags=psh|ack
>>>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00004|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>>>> masks: 
>>>>>>>>>>>>>>>> recirc_id=0xffffffff,reg0=0xffffffff,in_port=4294967295,dl_vlan=4095,dl_vlan_pcp=7,dl_src=ff:ff:ff:ff:ff:ff,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0xffff
>>>>>>>>>>>>>>>> 2016-11-29T14:44:40.126Z|00005|odp_util(revalidator45)|WARN|  
>>>>>>>>>>>>>>>> actions: 
>>>>>>>>>>>>>>>> set(ipv4(src=10.0.1.22,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=410384,tc=0,ttl=63,bos=1,eth_type=0x8847),9,set(eth(src=fa:16:3e:33:f7:fe,dst=00:00:5e:00:43:64)),pop_mpls(eth_type=0x800),push_vlan(vid=3,pcp=0),1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> push_mpls clears L3/L4, while pop_mpls re-populates them, and 
>>>>>>>>>>>>>> then processing the output to port 1 hits the assert?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's what I'm thinking too.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jarno, is this something you have time to look into?  It'd be 
>>>>>>>>>>>>> great, if
>>>>>>>>>>>>> you do.  I'm way behind.
>>>>>>>>>>>> 
>>>>>>>>>>>> I’m looking at this.
>>>>>>>>>>>> 
>>>>>>>>>>>> Based on the trace given it seems that:
>>>>>>>>>>>> 1. Packet is received on br-int port 32, which outputs it via 
>>>>>>>>>>>> NORMAL action over a patch port to another bridge. The only 
>>>>>>>>>>>> patch-port on br-int is 2 (patch-tun). The NORMAL action adds 
>>>>>>>>>>>> dl_vlan=1.
>>>>>>>>>>>> 2. br-tun receives the packet on in_port 1 (patch-int), and 
>>>>>>>>>>>> outputs it on it’s port 2 (patch-to-mpls)
>>>>>>>>>>>> 3. br-mpls receives the packet on it’s in_port 2 (patch-to-tun), 
>>>>>>>>>>>> does pop_vlan, and outputs to it’s port 21 (ipvpn-pp-out), which 
>>>>>>>>>>>> is also an patch port.
>>>>>>>>>>>> 4. br-mpls (?) receives the packet on it’s in_port 20 
>>>>>>>>>>>> (ipvpn-pp-in), does 
>>>>>>>>>>>> dec_ttl,push_mpls:0x8847,load:0x644c0->OXM_OF_MPLS_LABEL[],set_field:b8:2a:72:de:1b:e3->eth_src,set_field:00:17:cb:79:2c:01->eth_dst,output:1
>>>>>>>>>>>> 
>>>>>>>>>>>> All this generates a megaflow: 
>>>>>>>>>>>> set(ipv4(src=10.0.1.23,dst=10.0.0.3,ttl=63)),set(eth(src=b8:2a:72:de:1b:e3,dst=00:17:cb:79:2c:01)),push_mpls(label=410816,tc=0,ttl=63,bos=1,eth_type=0x8847),9
>>>>>>>>>>>> 
>>>>>>>>>>>> This is only the beginning part of the trouble-some megaflow, in 
>>>>>>>>>>>> which br-int sends the packet also to another port (vlan 3), and 
>>>>>>>>>>>> as part of that pops the MPLS and restores the original ethernet 
>>>>>>>>>>>> addresses. Maybe this would happen with the trace too, if you 
>>>>>>>>>>>> flushed MACs before the trace?
>>>>>>>>>>>> 
>>>>>>>>>>>> The patch ports 21 and 20 appear to be in the same bridge and 
>>>>>>>>>>>> patched to each other. Is this the case?
>>>>>>>>>>>> 
>>>>>>>>>>>> The crashing megaflow has in_port=5,dl_vlan=3. Is this also on 
>>>>>>>>>>>> br-int?
>>>>>>>>>>>> 
>>>>>>>>>>>> Also, OVS 2.6 is a little bit less aggressive about avoiding 
>>>>>>>>>>>> recirculation after mpls operations, and I’d be interested to know 
>>>>>>>>>>>> if your case fails the same way with OVS 2.6?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>>   Jarno
>>>> 
>>> 
>> 
> 
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to