Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 11:20:04AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:

[...]

> > 3. With regards to the mirroring part of your question, I need to check
> >on that and possibly its broken. But my thinking is that a mirroring
> >vport would regarded in the same way as any other vport in this respect
> >and the presence of the "layer3" flag would control whether an Ethernet
> >header is present or not.
> > 
> >It may be the case that its not possible to use a tunnel vport as a
> >mirroring vport. And as all other vports currently do not support
> >"layer3" then currently an Ethernet header would always be present
> >on output to a mirror.
> 
> We can just require a mirror port to be always L2 and output the L3
> packets with zero Ethernet addresses. For mirroring, this should be
> okay(?)

Yes, I expect that is a good approach.

I'll look over the code to see about making that so if its not already the
case.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Jiri Benc
On Fri, 20 May 2016 18:12:05 +0900, Simon Horman wrote:
> 1. push_eth adds an Ethernet header with all-zero addresses and
>the Ethernet type as determined from skb->protocol which is in
>turn determined by the tunnel header (we have discussed that
>bit before).
> 
>In principle it is pushed when needed. And this happens automatically
>as controlled by user-space.
> 
>It is possible to modify the Ethernet addresses using a custom rule.
>(I need to exercise that more often.)
> 
> 2. For the GRE part of the scenario above it is important to know that with
>the accompanying user-space patch set OvS user-space the user-space
>representation of a vport (from now on simply vport) may be "layer3" or
>not.
> 
>This allows OvS user-space to determine if an Ethernet header should be
>present or not on receive. And if it needs to be present or not on
>transmit. This allows it to automatically use pop_eth and push_eth to
>control the presence of an Ethernet header so its there when it needs to
>be and not when it doesn't.
> 
>So if a GRE vport is "layer3" then no Ethernet header should be
>present on transmit, regardless of where the packet came from. And
>conversely if the GRE vport is not "layer3" then an Ethernet header
>should be present.

I think this works for me. Thanks a lot for answering my questions!

> 3. With regards to the mirroring part of your connection, I need to check
>on that and possibly its broken. But my thinking is that a mirroring
>vport would regarded in the same way as any other vport in this respect
>and the presence of the "layer3" flag would control whether an Ethernet
>header is present or not.
> 
>It may be the case that its not possible to use a tunnel vport as a
>mirroring vport. And as all other vports currently do not support
>"layer3" then currently an Ethernet header would always be present
>on output to a mirror.

We can just require a mirror port to be always L2 and output the L3
packets with zero Ethernet addresses. For mirroring, this should be
okay(?)

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 10:39:39AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> > My understanding is that currently OvS handles access ports using a
> > push_vlan action.
> 
> When needed (i.e. when the packet goes to a non-access port), yes.
> 
> > And that this patch set in conjunction with its
> > user-space counterpart should ensure that a push_eth action occurs first.
> > This is the context of my remarks above.
> 
> Okay, works for me principally. If it's later found insufficient,
> relaxing push_vlan and pop_vlan to work also for L3 frames is still
> easily possible without breaking compatibility.

Right. I'm all for allowing extension later if the need arises.

> Out of curiosity (and without looking at the user space patchset), what
> will the pushed Ethernet header contain? E.g., consider the following
> scenario: two GRE ports, both set as access ports with the same tag,
> and a mirror port mirroring everything. Now an IP packet without inner
> Ethernet header is received on one of the GRE interfaces.
> 
> Will the packet be output to the second GRE port? Will it be sent out
> without the inner Ethernet header? Are custom rules necessary, or will
> NORMAL take care of this? What will be sent to the mirror port?

Let me take a stab at answering that without running any tests.

1. push_eth adds an Ethernet header with all-zero addresses and
   the Ethernet type as determined from skb->protocol which is in
   turn determined by the tunnel header (we have discussed that
   bit before).

   In principle it is pushed when needed. And this happens automatically
   as controlled by user-space.

   It is possible to modify the Ethernet addresses using a custom rule.
   (I need to exercise that more often.)

2. For the GRE part of the scenario above it is important to know that with
   the accompanying user-space patch set OvS user-space the user-space
   representation of a vport (from now on simply vport) may be "layer3" or
   not.

   This allows OvS user-space to determine if an Ethernet header should be
   present or not on receive. And if it needs to be present or not on
   transmit. This allows it to automatically use pop_eth and push_eth to
   control the presence of an Ethernet header so its there when it needs to
   be and not when it doesn't.

   So if a GRE vport is "layer3" then no Ethernet header should be
   present on transmit, regardless of where the packet came from. And
   conversely if the GRE vport is not "layer3" then an Ethernet header
   should be present.

3. With regards to the mirroring part of your connection, I need to check
   on that and possibly its broken. But my thinking is that a mirroring
   vport would regarded in the same way as any other vport in this respect
   and the presence of the "layer3" flag would control whether an Ethernet
   header is present or not.

   It may be the case that its not possible to use a tunnel vport as a
   mirroring vport. And as all other vports currently do not support
   "layer3" then currently an Ethernet header would always be present
   on output to a mirror.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Jiri Benc
On Fri, 20 May 2016 17:16:13 +0900, Simon Horman wrote:
> My understanding is that currently OvS handles access ports using a
> push_vlan action.

When needed (i.e. when the packet goes to a non-access port), yes.

> And that this patch set in conjunction with its
> user-space counterpart should ensure that a push_eth action occurs first.
> This is the context of my remarks above.

Okay, works for me principally. If it's later found insufficient,
relaxing push_vlan and pop_vlan to work also for L3 frames is still
easily possible without breaking compatibility.

Out of curiosity (and without looking at the user space patchset), what
will the pushed Ethernet header contain? E.g., consider the following
scenario: two GRE ports, both set as access ports with the same tag,
and a mirror port mirroring everything. Now an IP packet without inner
Ethernet header is received on one of the GRE interfaces.

Will the packet be output to the second GRE port? Will it be sent out
without the inner Ethernet header? Are custom rules necessary, or will
NORMAL take care of this? What will be sent to the mirror port?

Thanks!

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 05:11:23PM +0900, Simon Horman wrote:
> On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> > On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > > The second option does seem rather tempting although I'm not sure
> > > that it actually plays out in the access-port scenario at this time.
> > 
> > We support gre ports to be access ports currently. With conversion to
> > ipgre, this needs to continue working. It's no problem for frames with
> > the Ethernet header but now we have a situation where a port is tagged,
> > thus the user expects that packets received on that port will behave
> > accordingly. I don't think we can make some packets honor this and some
> > ignore this; and we can't disallow gre to be an access port.
> > 
> > How do you plan to solve this? By user space always pushing an ethernet
> > header before push_vlan?
> 
> Yes. That is my understanding of how OvS currently handles access ports but
> I have a feeling that either I am mistaken or that you are referring to a
> slightly different scenario.

Hi again.

I apologise for having sent my previous email a little too quickly.

My understanding is that currently OvS handles access ports using a
push_vlan action. And that this patch set in conjunction with its
user-space counterpart should ensure that a push_eth action occurs first.
This is the context of my remarks above.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Simon Horman
On Fri, May 20, 2016 at 10:00:28AM +0200, Jiri Benc wrote:
> On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> > The second option does seem rather tempting although I'm not sure
> > that it actually plays out in the access-port scenario at this time.
> 
> We support gre ports to be access ports currently. With conversion to
> ipgre, this needs to continue working. It's no problem for frames with
> the Ethernet header but now we have a situation where a port is tagged,
> thus the user expects that packets received on that port will behave
> accordingly. I don't think we can make some packets honor this and some
> ignore this; and we can't disallow gre to be an access port.
> 
> How do you plan to solve this? By user space always pushing an ethernet
> header before push_vlan?

Yes. That is my understanding of how OvS currently handles access ports but
I have a feeling that either I am mistaken or that you are referring to a
slightly different scenario.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-20 Thread Jiri Benc
On Fri, 20 May 2016 14:29:01 +0900, Simon Horman wrote:
> The second option does seem rather tempting although I'm not sure
> that it actually plays out in the access-port scenario at this time.

We support gre ports to be access ports currently. With conversion to
ipgre, this needs to continue working. It's no problem for frames with
the Ethernet header but now we have a situation where a port is tagged,
thus the user expects that packets received on that port will behave
accordingly. I don't think we can make some packets honor this and some
ignore this; and we can't disallow gre to be an access port.

How do you plan to solve this? By user space always pushing an ethernet
header before push_vlan?

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-19 Thread Simon Horman
Hi Jiri,

On Tue, May 17, 2016 at 04:32:50PM +0200, Jiri Benc wrote:
> Looking through the patchset again, this time more deeply. Sorry for
> the delay.

No need to be sorry, good things take time.

> On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> > +struct ovs_action_push_eth {
> > +   struct ovs_key_ethernet addresses;
> > +   __be16   eth_type;
> 
> Extra spaces.

Sorry about that.

As per some earlier discussion I plan to remove the eth_type field entirely.

> 
> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +   skb_pull_rcsum(skb, ETH_HLEN);
> > +   skb_reset_mac_header(skb);
> > +   skb->mac_len -= ETH_HLEN;
> > +
> > +   invalidate_flow_key(key);
> > +   return 0;
> > +}
> 
> There's a fundamental question here: how should pop_eth behave when
> vlan tag is present?
> 
> There are two options: either vlan is considered part of the Ethernet
> header and pop_eth means implicitly resetting vlan tag, or packet can
> have vlan tag even if it's not Ethernet.
> 
> This patch seems to implement the first option; however, skb->vlan_tci
> should be reset and pop_eth should check whether the vlan tag is
> present in the frame (deaccelerated) and remove it if it is. Otherwise,
> the behavior of pop_eth would be inconsistent.
> 
> However, I'm not sure whether the second option does not make more
> sense. It may, in fact, be needed - ARPHRD_NONE tunnel port could not
> be set as an access port otherwise (unless I'm missing something).
> 
> In that case, pop_eth will need to put the vlan tag to skb->vlan_tci if
> it's in the frame itself. Also, push_vlan and pop_vlan would need to be
> modified to work with is_layer3 packets.

Good point.

The second option does seem rather tempting although I'm not sure
that it actually plays out in the access-port scenario at this time.

> > +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> > +   const struct ovs_action_push_eth *ethh)
> > +{
> > +   int err;
> > +
> > +   /* De-accelerate any hardware accelerated VLAN tag added to a previous
> > +* Ethernet header */
> > +   err = skb_vlan_deaccel(skb);
> 
> Why? Just keep it in skb->vlan_tci.

Agreed, this seems unnecessary.

> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct 
> > sw_flow_key *key)
> >  
> > skb_reset_mac_header(skb);
> >  
> > -   /* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> > -* header in the linear data area.
> > -*/
> > -   eth = eth_hdr(skb);
> > -   ether_addr_copy(key->eth.src, eth->h_source);
> > -   ether_addr_copy(key->eth.dst, eth->h_dest);
> > +   /* Link layer. */
> > +   if (key->phy.is_layer3) {
> > +   key->eth.tci = 0;
> 
> Could make sense to use skb->vlan_tci, see above.

The incremental patch below is what I have so far.
The patch to add skb_vlan_deaccel() should also be dropped.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c413c588a24f..6853ab008861 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2994,6 +2994,7 @@ int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
 gfp_t gfp);
+int skb_vlan_accel(struct sk_buff *skb);
 
 static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7a1d48983f81..a36c7491f714 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4482,12 +4482,28 @@ pull:
return err;
 }
 
-int skb_vlan_pop(struct sk_buff *skb)
+/* If a vlan tag is present move it to hw accel tag */
+int skb_vlan_accel(struct sk_buff *skb)
 {
u16 vlan_tci;
__be16 vlan_proto;
int err;
 
+   vlan_proto = skb->protocol;
+   err = __skb_vlan_pop(skb, _tci);
+   if (unlikely(err))
+   return err;
+
+   __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
+   return 0;
+}
+EXPORT_SYMBOL(skb_vlan_accel);
+
+int skb_vlan_pop(struct sk_buff *skb)
+{
+   u16 vlan_tci;
+   int err;
+
if (likely(skb_vlan_tag_present(skb))) {
skb->vlan_tci = 0;
} else {
@@ -4500,19 +4516,13 @@ int skb_vlan_pop(struct sk_buff *skb)
if (err)
return err;
}
-   /* move next vlan tag to hw accel tag */
+
if (likely((skb->protocol != htons(ETH_P_8021Q) &&
skb->protocol != htons(ETH_P_8021AD)) ||
   skb->len < VLAN_ETH_HLEN))
return 0;
 
-   vlan_proto = skb->protocol;
-   err = __skb_vlan_pop(skb, _tci);
-   if (unlikely(err))
-   return err;
-
-   __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
-   return 0;
+   return skb_vlan_accel(skb);
 }
 EXPORT_SYMBOL(skb_vlan_pop);
 
diff 

Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-17 Thread Simon Horman
On Tue, May 17, 2016 at 04:43:20PM +0200, Jiri Benc wrote:
> On Thu, 12 May 2016 07:46:52 +0900, Simon Horman wrote:
> > If we can live with a bogus skb->mac_len value that is sufficient for
> > ovs_flow_key_extract.() and set correctly by key_extract() (which happens
> > anyway) we could do something like this:
> > 
> > } else if (vport->dev->type == ARPHRD_NONE) {
> > if (skb->protocol == htons(ETH_P_TEB))
> > /* Ignores presence of VLAN but is sufficient for
> >  * ovs_flow_key_extract() which then calls key_extract()
> >  * which calculates skb->mac_len correctly. */
> > skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
> > else
> > skb->mac_len = 0;
> > }
> > 
> > 
> > But perhaps I have missed the point somehow.
> 
> You did not, I was more thinking aloud. But you're right, it doesn't
> make much sense.
> 
> So, wouldn't it be actually more correct and in line with patch 2 to
> call eth_type_trans() here?

Yes, that seems reasonable.

> Possibly even followed by skb_vlan_untag
> (not needed. But it might make things easier to have the first vlan tag
> always reside inside skb->vlan_tci and treat that as an invariant in
> the whole ovs kernel code. It'll need to be done in more spots than
> just this one, though, and is probably matter of a separate patchset).

That also seems reasonable. But as it seems more invasive and is not
strictly necessary I'd rather handle that update separately.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-17 Thread Jiri Benc
On Thu, 12 May 2016 07:46:52 +0900, Simon Horman wrote:
> If we can live with a bogus skb->mac_len value that is sufficient for
> ovs_flow_key_extract.() and set correctly by key_extract() (which happens
> anyway) we could do something like this:
> 
>   } else if (vport->dev->type == ARPHRD_NONE) {
>   if (skb->protocol == htons(ETH_P_TEB))
>   /* Ignores presence of VLAN but is sufficient for
>* ovs_flow_key_extract() which then calls key_extract()
>* which calculates skb->mac_len correctly. */
>   skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
>   else
>   skb->mac_len = 0;
>   }
> 
> 
> But perhaps I have missed the point somehow.

You did not, I was more thinking aloud. But you're right, it doesn't
make much sense.

So, wouldn't it be actually more correct and in line with patch 2 to
call eth_type_trans() here? Possibly even followed by skb_vlan_untag
(not needed. But it might make things easier to have the first vlan tag
always reside inside skb->vlan_tci and treat that as an invariant in
the whole ovs kernel code. It'll need to be done in more spots than
just this one, though, and is probably matter of a separate patchset).

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-17 Thread Jiri Benc
Looking through the patchset again, this time more deeply. Sorry for
the delay.

On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> +struct ovs_action_push_eth {
> + struct ovs_key_ethernet addresses;
> + __be16   eth_type;

Extra spaces.

> +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> + skb_pull_rcsum(skb, ETH_HLEN);
> + skb_reset_mac_header(skb);
> + skb->mac_len -= ETH_HLEN;
> +
> + invalidate_flow_key(key);
> + return 0;
> +}

There's a fundamental question here: how should pop_eth behave when
vlan tag is present?

There are two options: either vlan is considered part of the Ethernet
header and pop_eth means implicitly resetting vlan tag, or packet can
have vlan tag even if it's not Ethernet.

This patch seems to implement the first option; however, skb->vlan_tci
should be reset and pop_eth should check whether the vlan tag is
present in the frame (deaccelerated) and remove it if it is. Otherwise,
the behavior of pop_eth would be inconsistent.

However, I'm not sure whether the second option does not make more
sense. It may, in fact, be needed - ARPHRD_NONE tunnel port could not
be set as an access port otherwise (unless I'm missing something).

In that case, pop_eth will need to put the vlan tag to skb->vlan_tci if
it's in the frame itself. Also, push_vlan and pop_vlan would need to be
modified to work with is_layer3 packets.

> +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> + const struct ovs_action_push_eth *ethh)
> +{
> + int err;
> +
> + /* De-accelerate any hardware accelerated VLAN tag added to a previous
> +  * Ethernet header */
> + err = skb_vlan_deaccel(skb);

Why? Just keep it in skb->vlan_tci.

> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct 
> sw_flow_key *key)
>  
>   skb_reset_mac_header(skb);
>  
> - /* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
> -  * header in the linear data area.
> -  */
> - eth = eth_hdr(skb);
> - ether_addr_copy(key->eth.src, eth->h_source);
> - ether_addr_copy(key->eth.dst, eth->h_dest);
> + /* Link layer. */
> + if (key->phy.is_layer3) {
> + key->eth.tci = 0;

Could make sense to use skb->vlan_tci, see above.

Thanks,

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-11 Thread Simon Horman
On Wed, May 11, 2016 at 04:09:28PM +0200, Jiri Benc wrote:
> On Wed, 11 May 2016 12:06:35 +0900, Simon Horman wrote:
> > Is this close to what you had in mind?
> 
> Yes but see below.
> 
> > @@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
> > *tun_info,
> > key->phy.skb_mark = skb->mark;
> > ovs_ct_fill_key(skb, key);
> > key->ovs_flow_hash = 0;
> > -   key->phy.is_layer3 = is_layer3;
> > +   key->phy.is_layer3 = (tun_info && skb->mac_len == 0);
> 
> Do we have to depend on tun_info? It would be nice to support all
> ARPHRD_NONE interfaces, not just tunnels. The tun interface (from
> the tuntap driver) comes to mind, for example.

Yes, I think that should work. I was just being cautious.

Do you think it is safe to detect TEB based on skb->protocol regardless
of the presence of tun_info?

> > +++ b/net/openvswitch/vport-netdev.c
> > @@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
> > if (vport->dev->type == ARPHRD_ETHER) {
> > skb_push(skb, ETH_HLEN);
> > skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> > +   } else if (vport->dev->type == ARPHRD_NONE) {
> > +   if (skb->protocol == htons(ETH_P_TEB)) {
> > +   struct ethhdr *eth = eth_hdr(skb);
> > +
> > +   if (unlikely(skb->len < ETH_HLEN))
> > +   goto error;
> > +
> > +   skb->mac_len = ETH_HLEN;
> > +   if (eth->h_proto == htons(ETH_P_8021Q))
> > +   skb->mac_len += VLAN_HLEN;
> > +   } else {
> > +   skb->mac_len = 0;
> > +   }
> 
> Without putting much thought into this, could this perhaps be left for
> parse_ethertype (called from key_extract) to do?

I think I am confused.

I believe that key_extract() does already do all of the above (and more).

The purpose of the above change was to do this work here rather than
leaving it to parse_ethertype. This is because I was under the impression
that is what you were after. Specifically as a mechanism to avoid relying
on vport->dev->type in ovs_flow_key_extract.

If we can live with a bogus skb->mac_len value that is sufficient for
ovs_flow_key_extract.() and set correctly by key_extract() (which happens
anyway) we could do something like this:

} else if (vport->dev->type == ARPHRD_NONE) {
if (skb->protocol == htons(ETH_P_TEB))
/* Ignores presence of VLAN but is sufficient for
 * ovs_flow_key_extract() which then calls key_extract()
 * which calculates skb->mac_len correctly. */
skb->mac_len = ETH_HLEN; /* Ignores presence of VLAN */
else
skb->mac_len = 0;
}


But perhaps I have missed the point somehow.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-11 Thread Jiri Benc
On Wed, 11 May 2016 12:28:14 +0900, Simon Horman wrote:
> I think that at this stage I would prefer to prohibit push_eth() acting
> on a packet which already has an ethernet header. Indeed that is what
> my patch-set already does in its modifications of __ovs_nla_copy_actions().
> 
> The reason that I lean towards prohibiting this is that I do not
> have an easy way to exercise this case within the current patch-set.
> And thus this extra complexity seems well suited to being handled handled
> incrementally as further work.

Works for me. I don't see any real usage for multiple Ethernet headers.

Thanks!

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-11 Thread Jiri Benc
On Wed, 11 May 2016 12:06:35 +0900, Simon Horman wrote:
> Is this close to what you had in mind?

Yes but see below.

> @@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
> *tun_info,
>   key->phy.skb_mark = skb->mark;
>   ovs_ct_fill_key(skb, key);
>   key->ovs_flow_hash = 0;
> - key->phy.is_layer3 = is_layer3;
> + key->phy.is_layer3 = (tun_info && skb->mac_len == 0);

Do we have to depend on tun_info? It would be nice to support all
ARPHRD_NONE interfaces, not just tunnels. The tun interface (from
the tuntap driver) comes to mind, for example.

> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
>   if (vport->dev->type == ARPHRD_ETHER) {
>   skb_push(skb, ETH_HLEN);
>   skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
> + } else if (vport->dev->type == ARPHRD_NONE) {
> + if (skb->protocol == htons(ETH_P_TEB)) {
> + struct ethhdr *eth = eth_hdr(skb);
> +
> + if (unlikely(skb->len < ETH_HLEN))
> + goto error;
> +
> + skb->mac_len = ETH_HLEN;
> + if (eth->h_proto == htons(ETH_P_8021Q))
> + skb->mac_len += VLAN_HLEN;
> + } else {
> + skb->mac_len = 0;
> + }

Without putting much thought into this, could this perhaps be left for
parse_ethertype (called from key_extract) to do?

Thanks,

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-11 Thread Jiri Benc
On Wed, 11 May 2016 10:50:12 +0900, Simon Horman wrote:
> On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote:
> > We have two options here:
> > 
> > 1. As for metadata tunnels all the info is in metadata_dst and we
> >don't need the IP/GRE header for anything, we can make the ipgre
> >interface ARPHRD_NONE in metadata based mode.
> > 
> > 2. We can fix this up in ovs after receiving the packet from
> >ARPHRD_IPGRE interface.
> > 
> > I think the first option is the correct one. We already don't assign
> > dev->header_ops in metadata mode. I'll prepare a patch.
> 
> I agree that 1. seems to be the better approach.

I just sent a patch that fixes this. And we have the same bug in
VXLAN-GPE, I sent a patch, too.

> Sure, if that is your preference I think it should be simple enough to
> implement. I agree that netdev_port_receive() looks like a good place for
> this.

Great, thanks!

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-10 Thread Simon Horman
Hi Jiri,

On Tue, May 10, 2016 at 02:06:18PM +0200, Jiri Benc wrote:
> On Mon, 9 May 2016 17:18:20 +0900, Simon Horman wrote:
> > On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> > > In addition, we should check whether mac_len > 0 and in such case,
> > > change skb->protocol to ETH_P_TEB first (and store that value in the
> > > pushed eth header).
> > > 
> > > Similarly on pop_eth, we need to check skb->protocol and if it is
> > > ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> > > skb->protocol correctly. It's probably not that simple, as we'd need a
> > > version of eth_type_trans that doesn't need a net device.
> > 
> > I'm not sure I understand the interaction with ETH_P_TEB here.
> > 
> > In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive
> > processing to find the inner protocol from the packet and at that point
> > skb->protocol is set to that value. And that for further packet processing
> > the fact the packet was received as TEB is transparent.
> 
> Yes but think about the case when you have two Ethernet headers pushed.
> 
> We can either disallow it, or do what I described.
> 
> Specifically, let's assume we have an IP packet with an Ethernet
> header present. skb->protocol is ETH_P_IP. Now, when there's skb_push,
> the correct operation would be setting skb->protocol to ETH_P_TEB,
> pushing a new Ethernet header and filing ETH_P_TEB to the ethertype
> field in the new header. The packet is not ETH_P_IP anymore, as the L2
> header is followed by another Ethernet header now.

Thanks for the clarification, I had not considered the case of two
ethernet headers when I wrote my previous email.

I think that at this stage I would prefer to prohibit push_eth() acting
on a packet which already has an ethernet header. Indeed that is what
my patch-set already does in its modifications of __ovs_nla_copy_actions().

The reason that I lean towards prohibiting this is that I do not
have an easy way to exercise this case within the current patch-set.
And thus this extra complexity seems well suited to being handled handled
incrementally as further work.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-10 Thread Simon Horman
Hi Jiri,

On Wed, May 11, 2016 at 10:50:09AM +0900, Simon Horman wrote:

[...]

> > > Its possible that I've overlooked something but as things stand I think
> > > things look like this:
> > > 
> > > * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> > > * ovs_flow_key_extract() calls key_extract() which amongst other things
> > >   sets up the skb->mac_header and skb->mac_len of the skb.
> > > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
> > >   in the case of TEB
> > > * Actions update the above mentioned skb fields as appropriate.
> > 
> > Okay, that actually eases things somewhat.
> > 
> > > So it seems to me that it should be safe to rely on skb->protocol
> > > in the receive path. Or more specifically, in netdev_port_receive().
> > > 
> > > If mac_len is also able to be used then I think fine. But it seems to me
> > > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> > > could be done early on, say in netdev_port_receive(). But it seems that
> > > would involve duplicating some of what is already occurring in
> > > key_extract().
> > 
> > I'd actually prefer doing this earlier, netdev_port_receive looks like
> > the right place. Just set mac_len there (or whatever) and let
> > key_extract do the rest of the work, not depending on dev->type in
> > there.
> > 
> > My point about recirculation was not actually valid, as I missed you're
> > doing this in ovs_flow_key_extract and not in key_extract. Still,
> > I think the special handling of particular interface types belongs to
> > the tx processing on those interfaces, not to the common code.
> 
> Sure, if that is your preference I think it should be simple enough to
> implement. I agree that netdev_port_receive() looks like a good place for
> this.

So far I have the following, which seems to work
changes to make dev->type ARPHRD_NONE for compat GRE vports.

Is this close to what you had in mind?

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index d320c2657627..4d2698596033 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -700,8 +700,6 @@ int ovs_flow_key_update(struct sk_buff *skb, struct 
sw_flow_key *key)
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 struct sk_buff *skb, struct sw_flow_key *key)
 {
-   bool is_layer3 = false;
-   bool is_teb = false;
int err;
 
/* Extract metadata from packet. */
@@ -709,14 +707,6 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
*tun_info,
key->tun_proto = ip_tunnel_info_af(tun_info);
memcpy(>tun_key, _info->key, sizeof(key->tun_key));
 
-   if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
-   if (skb->protocol == htons(ETH_P_TEB))
-   is_teb = true;
-   else
-   is_layer3 = true;
-   }
-
-
if (tun_info->options_len) {
BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
   8)) - 1
@@ -739,17 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
*tun_info,
key->phy.skb_mark = skb->mark;
ovs_ct_fill_key(skb, key);
key->ovs_flow_hash = 0;
-   key->phy.is_layer3 = is_layer3;
+   key->phy.is_layer3 = (tun_info && skb->mac_len == 0);
key->recirc_id = 0;
 
err = key_extract(skb, key);
if (err < 0)
return err;
 
-   if (is_teb)
-   skb->protocol = key->eth.type;
-   else if (is_layer3)
+   if (key->phy.is_layer3)
key->eth.type = skb->protocol;
+   else if (tun_info && skb->protocol == htons(ETH_P_TEB))
+   skb->protocol = key->eth.type;
 
return err;
 }
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 7d54414b35eb..ac8178ac2c81 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -60,7 +60,21 @@ static void netdev_port_receive(struct sk_buff *skb)
if (vport->dev->type == ARPHRD_ETHER) {
skb_push(skb, ETH_HLEN);
skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+   } else if (vport->dev->type == ARPHRD_NONE) {
+   if (skb->protocol == htons(ETH_P_TEB)) {
+   struct ethhdr *eth = eth_hdr(skb);
+
+   if (unlikely(skb->len < ETH_HLEN))
+   goto error;
+
+   skb->mac_len = ETH_HLEN;
+   if (eth->h_proto == htons(ETH_P_8021Q))
+   skb->mac_len += VLAN_HLEN;
+   } else {
+   skb->mac_len = 0;
+   }
}
+
ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
return;
 error:


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-10 Thread Simon Horman
Hi Jiri,

On Tue, May 10, 2016 at 02:01:06PM +0200, Jiri Benc wrote:
> On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote:
> > It seems to be caused by the following:
> > 
> > 1. __ipgre_rcv() calls skb_pop_mac_header() which
> >sets skb->mac_header to the skb->network_header.
> > 
> > 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
> >skb_reset_network_header(). This updates skb->network_header to
> >just after the end of the GRE header.
> > 
> >This is 28 bytes after the old skb->network_header
> >as there is a 20 byte IPv4 header followed by an
> >8 byte GRE header.
> > 
> > 3. Later, dev_gro_receive() calls skb_reset_mac_len().
> >This calculates skb->mac_len based on skb->network_header and
> >skb->mac_header. I.e. 28 bytes.
> 
> Right. Thanks for tracking this down!
> 
> > I think this may be possible to address by calling
> > skb_reset_network_header() instead of skb_pop_mac_header()
> > in __ipgre_rcv().
> 
> We can't do that. The interface type is ARPHRD_IPGRE and not
> ARPHRD_NONE, so the current behavior makes pretty good sense. See
> e.g. commit 0e3da5bb8da45.
> 
> We have two options here:
> 
> 1. As for metadata tunnels all the info is in metadata_dst and we
>don't need the IP/GRE header for anything, we can make the ipgre
>interface ARPHRD_NONE in metadata based mode.
> 
> 2. We can fix this up in ovs after receiving the packet from
>ARPHRD_IPGRE interface.
> 
> I think the first option is the correct one. We already don't assign
> dev->header_ops in metadata mode. I'll prepare a patch.

I agree that 1. seems to be the better approach.

> > Its possible that I've overlooked something but as things stand I think
> > things look like this:
> > 
> > * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> > * ovs_flow_key_extract() calls key_extract() which amongst other things
> >   sets up the skb->mac_header and skb->mac_len of the skb.
> > * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
> >   in the case of TEB
> > * Actions update the above mentioned skb fields as appropriate.
> 
> Okay, that actually eases things somewhat.
> 
> > So it seems to me that it should be safe to rely on skb->protocol
> > in the receive path. Or more specifically, in netdev_port_receive().
> > 
> > If mac_len is also able to be used then I think fine. But it seems to me
> > that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> > could be done early on, say in netdev_port_receive(). But it seems that
> > would involve duplicating some of what is already occurring in
> > key_extract().
> 
> I'd actually prefer doing this earlier, netdev_port_receive looks like
> the right place. Just set mac_len there (or whatever) and let
> key_extract do the rest of the work, not depending on dev->type in
> there.
> 
> My point about recirculation was not actually valid, as I missed you're
> doing this in ovs_flow_key_extract and not in key_extract. Still,
> I think the special handling of particular interface types belongs to
> the tx processing on those interfaces, not to the common code.

Sure, if that is your preference I think it should be simple enough to
implement. I agree that netdev_port_receive() looks like a good place for
this.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-10 Thread Jiri Benc
On Mon, 9 May 2016 17:18:20 +0900, Simon Horman wrote:
> On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> > In addition, we should check whether mac_len > 0 and in such case,
> > change skb->protocol to ETH_P_TEB first (and store that value in the
> > pushed eth header).
> > 
> > Similarly on pop_eth, we need to check skb->protocol and if it is
> > ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> > skb->protocol correctly. It's probably not that simple, as we'd need a
> > version of eth_type_trans that doesn't need a net device.
> 
> I'm not sure I understand the interaction with ETH_P_TEB here.
> 
> In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive
> processing to find the inner protocol from the packet and at that point
> skb->protocol is set to that value. And that for further packet processing
> the fact the packet was received as TEB is transparent.

Yes but think about the case when you have two Ethernet headers pushed.

We can either disallow it, or do what I described.

Specifically, let's assume we have an IP packet with an Ethernet
header present. skb->protocol is ETH_P_IP. Now, when there's skb_push,
the correct operation would be setting skb->protocol to ETH_P_TEB,
pushing a new Ethernet header and filing ETH_P_TEB to the ethertype
field in the new header. The packet is not ETH_P_IP anymore, as the L2
header is followed by another Ethernet header now.

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-10 Thread Jiri Benc
On Mon, 9 May 2016 17:04:22 +0900, Simon Horman wrote:
> It seems to be caused by the following:
> 
> 1. __ipgre_rcv() calls skb_pop_mac_header() which
>sets skb->mac_header to the skb->network_header.
> 
> 2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
>skb_reset_network_header(). This updates skb->network_header to
>just after the end of the GRE header.
> 
>This is 28 bytes after the old skb->network_header
>as there is a 20 byte IPv4 header followed by an
>8 byte GRE header.
> 
> 3. Later, dev_gro_receive() calls skb_reset_mac_len().
>This calculates skb->mac_len based on skb->network_header and
>skb->mac_header. I.e. 28 bytes.

Right. Thanks for tracking this down!

> I think this may be possible to address by calling
> skb_reset_network_header() instead of skb_pop_mac_header()
> in __ipgre_rcv().

We can't do that. The interface type is ARPHRD_IPGRE and not
ARPHRD_NONE, so the current behavior makes pretty good sense. See
e.g. commit 0e3da5bb8da45.

We have two options here:

1. As for metadata tunnels all the info is in metadata_dst and we
   don't need the IP/GRE header for anything, we can make the ipgre
   interface ARPHRD_NONE in metadata based mode.

2. We can fix this up in ovs after receiving the packet from
   ARPHRD_IPGRE interface.

I think the first option is the correct one. We already don't assign
dev->header_ops in metadata mode. I'll prepare a patch.

> Its possible that I've overlooked something but as things stand I think
> things look like this:
> 
> * ovs_flow_key_extract() keys off dev->type and skb->protocol.
> * ovs_flow_key_extract() calls key_extract() which amongst other things
>   sets up the skb->mac_header and skb->mac_len of the skb.
> * ovs_flow_key_extract() sets skb->protocol to that of the inner packet
>   in the case of TEB
> * Actions update the above mentioned skb fields as appropriate.

Okay, that actually eases things somewhat.

> So it seems to me that it should be safe to rely on skb->protocol
> in the receive path. Or more specifically, in netdev_port_receive().
> 
> If mac_len is also able to be used then I think fine. But it seems to me
> that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
> could be done early on, say in netdev_port_receive(). But it seems that
> would involve duplicating some of what is already occurring in
> key_extract().

I'd actually prefer doing this earlier, netdev_port_receive looks like
the right place. Just set mac_len there (or whatever) and let
key_extract do the rest of the work, not depending on dev->type in
there.

My point about recirculation was not actually valid, as I missed you're
doing this in ovs_flow_key_extract and not in key_extract. Still,
I think the special handling of particular interface types belongs to
the tx processing on those interfaces, not to the common code.

Thanks!

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-09 Thread Simon Horman
On Fri, May 06, 2016 at 11:35:04AM +0200, Jiri Benc wrote:
> On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> > +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> > +   const struct ovs_action_push_eth *ethh)
> > +{
> > +   int err;
> > +
> > +   /* De-accelerate any hardware accelerated VLAN tag added to a previous
> > +* Ethernet header */
> > +   err = skb_vlan_deaccel(skb);
> > +   if (unlikely(err))
> > +   return err;
> > +
> > +   /* Add the new Ethernet header */
> > +   if (skb_cow_head(skb, ETH_HLEN) < 0)
> > +   return -ENOMEM;
> > +
> > +   skb_push(skb, ETH_HLEN);
> > +   skb_reset_mac_header(skb);
> > +   skb_reset_mac_len(skb);
> > +
> > +   ether_addr_copy(eth_hdr(skb)->h_source, ethh->addresses.eth_src);
> > +   ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
> > +   eth_hdr(skb)->h_proto = ethh->eth_type;
> 
> This doesn't seem right. We know the packet type, it's skb->protocol.
> We should fill in that.

I think that makes sense. Possibly the eth_type field
can be removed from ovs_action_push_eth.

> In addition, we should check whether mac_len > 0 and in such case,
> change skb->protocol to ETH_P_TEB first (and store that value in the
> pushed eth header).
> 
> Similarly on pop_eth, we need to check skb->protocol and if it is
> ETH_P_TEB, call eth_type_trans on the modified frame to set the new
> skb->protocol correctly. It's probably not that simple, as we'd need a
> version of eth_type_trans that doesn't need a net device.

I'm not sure I understand the interaction with ETH_P_TEB here.

In my mind skb->protocol == ETH_P_TEB may be used early on in OvS's receive
processing to find the inner protocol from the packet and at that point
skb->protocol is set to that value. And that for further packet processing
the fact the packet was received as TEB is transparent.

Conversely, skb->protocol may be set as necessary on transmit as a packet
exits OvS.


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-09 Thread Simon Horman
On Fri, May 06, 2016 at 11:25:14AM +0200, Jiri Benc wrote:
> On Fri, 6 May 2016 14:57:07 +0900, Simon Horman wrote:
> > On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote:
> > > On transmit side you are using mac_len to detect l3 packet, why not do
> > > same while extracting the key?
> 
> I agree. The skb should be self-contained, i.e. it should be obvious
> whether it has the MAC header set or not just from the skb itself at
> any point in the packet processing. Otherwise, I'd expect things like
> recirculation to break after push/pop of eth header.
> 
> > Unfortunately mac_len can't be relied on here, emprically it has the same
> > value (28 in my tests) for both the TEB and layer3 case above.
> 
> That's strange, it looks like there's something setting the mac header
> unconditionally in ovs code. We should find that place and change it.

It seems to be caused by the following:

1. __ipgre_rcv() calls skb_pop_mac_header() which
   sets skb->mac_header to the skb->network_header.

2. __ipgre_rcv() then calls ip_tunnel_rcv() which calls
   skb_reset_network_header(). This updates skb->network_header to
   just after the end of the GRE header.

   This is 28 bytes after the old skb->network_header
   as there is a 20 byte IPv4 header followed by an
   8 byte GRE header.

3. Later, dev_gro_receive() calls skb_reset_mac_len().
   This calculates skb->mac_len based on skb->network_header and
   skb->mac_header. I.e. 28 bytes.


I think this may be possible to address by calling
skb_reset_network_header() instead of skb_pop_mac_header()
in __ipgre_rcv().

I think that would leave skb->mac_header and skb->network_header both
set to just after the end of the GRE header and result in mac_len of 0.
It looks like this owuld be for for both TEB and non-TEB GRE packets
and that OVS would need to check against mac_len, protocol and most
likely dev->type early on.

> The ARPHRD_NONE interfaces don't even set mac_header and mac_len, this
> will need to be set by ovs upon getting frame from such interface. 

Noted.

> > Perhaps that could be changed by futher enhancements in the tunneling code
> > but I think things are symetric as they stand:
> > 
> > * On recieve skb->protocol can be read to distinguish TEB and layer3 packets
> > * On transmit skb->protocol should be set to distinguish TEB and layer3 
> > packets
> 
> Yes, but you need to act upon this directly after receiving the
> frame/just before sending the frame and set up an internal flag that
> will be used throughout the code. That way, the packet can be handed
> over to different parts of the code, recirculated, etc. without
> worries. skb->mac_len is probably a good candidate for such flag.

Its possible that I've overlooked something but as things stand I think
things look like this:

* ovs_flow_key_extract() keys off dev->type and skb->protocol.
* ovs_flow_key_extract() calls key_extract() which amongst other things
  sets up the skb->mac_header and skb->mac_len of the skb.
* ovs_flow_key_extract() sets skb->protocol to that of the inner packet
  in the case of TEB
* Actions update the above mentioned skb fields as appropriate.

So it seems to me that it should be safe to rely on skb->protocol
in the receive path. Or more specifically, in netdev_port_receive().

If mac_len is also able to be used then I think fine. But it seems to me
that it needs to be set up by OvS at least for the ARPHRD_NONE case. This
could be done early on, say in netdev_port_receive(). But it seems that
would involve duplicating some of what is already occurring in
key_extract().


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-06 Thread Jiri Benc
On Wed,  4 May 2016 16:36:30 +0900, Simon Horman wrote:
> +static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
> + const struct ovs_action_push_eth *ethh)
> +{
> + int err;
> +
> + /* De-accelerate any hardware accelerated VLAN tag added to a previous
> +  * Ethernet header */
> + err = skb_vlan_deaccel(skb);
> + if (unlikely(err))
> + return err;
> +
> + /* Add the new Ethernet header */
> + if (skb_cow_head(skb, ETH_HLEN) < 0)
> + return -ENOMEM;
> +
> + skb_push(skb, ETH_HLEN);
> + skb_reset_mac_header(skb);
> + skb_reset_mac_len(skb);
> +
> + ether_addr_copy(eth_hdr(skb)->h_source, ethh->addresses.eth_src);
> + ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
> + eth_hdr(skb)->h_proto = ethh->eth_type;

This doesn't seem right. We know the packet type, it's skb->protocol.
We should fill in that.

In addition, we should check whether mac_len > 0 and in such case,
change skb->protocol to ETH_P_TEB first (and store that value in the
pushed eth header).

Similarly on pop_eth, we need to check skb->protocol and if it is
ETH_P_TEB, call eth_type_trans on the modified frame to set the new
skb->protocol correctly. It's probably not that simple, as we'd need a
version of eth_type_trans that doesn't need a net device.

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-06 Thread Jiri Benc
On Fri, 6 May 2016 14:57:07 +0900, Simon Horman wrote:
> On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote:
> > On transmit side you are using mac_len to detect l3 packet, why not do
> > same while extracting the key?

I agree. The skb should be self-contained, i.e. it should be obvious
whether it has the MAC header set or not just from the skb itself at
any point in the packet processing. Otherwise, I'd expect things like
recirculation to break after push/pop of eth header.

> Unfortunately mac_len can't be relied on here, emprically it has the same
> value (28 in my tests) for both the TEB and layer3 case above.

That's strange, it looks like there's something setting the mac header
unconditionally in ovs code. We should find that place and change it.

The ARPHRD_NONE interfaces don't even set mac_header and mac_len, this
will need to be set by ovs upon getting frame from such interface. 

> Perhaps that could be changed by futher enhancements in the tunneling code
> but I think things are symetric as they stand:
> 
> * On recieve skb->protocol can be read to distinguish TEB and layer3 packets
> * On transmit skb->protocol should be set to distinguish TEB and layer3 
> packets

Yes, but you need to act upon this directly after receiving the
frame/just before sending the frame and set up an internal flag that
will be used throughout the code. That way, the packet can be handed
over to different parts of the code, recirculated, etc. without
worries. skb->mac_len is probably a good candidate for such flag.

 Jiri


Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-05 Thread Simon Horman
[CC Jiri Benc]

On Thu, May 05, 2016 at 10:37:08AM -0700, pravin shelar wrote:
> On Wed, May 4, 2016 at 12:36 AM, Simon Horman
>  wrote:
> > From: Lorand Jakab 
> >
> > Implementation of the pop_eth and push_eth actions in the kernel, and
> > layer 3 flow support.
> >
> > This doesn't actually do anything yet as no layer 2 tunnel ports are
> > supported yet. The original patch by Lorand was against the Open vSwtich
> > tree which has L2 LISP tunnels but that is not supported in mainline Linux.
> > I (Simon) plan to follow up with support for non-TEB GRE ports based on
> > work by Thomas Morin.
> >
> > Cc: Thomas Morin 
> > Signed-off-by: Lorand Jakab 
> > Signed-off-by: Simon Horman 
> >
> > ---
> 
> ...
> 
> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index 0ea128eeeab2..6e174ea5f2bb 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct 
> > sw_flow_key *key)
> >
> > skb_reset_mac_header(skb);
> >
> > -   /* Link layer.  We are guaranteed to have at least the 14 byte 
> > Ethernet
> > -* header in the linear data area.
> > -*/
> > -   eth = eth_hdr(skb);
> > -   ether_addr_copy(key->eth.src, eth->h_source);
> > -   ether_addr_copy(key->eth.dst, eth->h_dest);
> > +   /* Link layer. */
> > +   if (key->phy.is_layer3) {
> > +   key->eth.tci = 0;
> > +   key->eth.type = skb->protocol;
> > +   } else {
> > +   eth = eth_hdr(skb);
> > +   ether_addr_copy(key->eth.src, eth->h_source);
> > +   ether_addr_copy(key->eth.dst, eth->h_dest);
> >
> > -   __skb_pull(skb, 2 * ETH_ALEN);
> > -   /* We are going to push all headers that we pull, so no need to
> > -* update skb->csum here.
> > -*/
> > +   __skb_pull(skb, 2 * ETH_ALEN);
> > +   /* We are going to push all headers that we pull, so no 
> > need to
> > +* update skb->csum here.
> > +*/
> >
> > -   key->eth.tci = 0;
> > -   if (skb_vlan_tag_present(skb))
> > -   key->eth.tci = htons(skb->vlan_tci);
> > -   else if (eth->h_proto == htons(ETH_P_8021Q))
> > -   if (unlikely(parse_vlan(skb, key)))
> > -   return -ENOMEM;
> > +   key->eth.tci = 0;
> > +   if (skb_vlan_tag_present(skb))
> > +   key->eth.tci = htons(skb->vlan_tci);
> > +   else if (eth->h_proto == htons(ETH_P_8021Q))
> > +   if (unlikely(parse_vlan(skb, key)))
> > +   return -ENOMEM;
> >
> > -   key->eth.type = parse_ethertype(skb);
> > -   if (unlikely(key->eth.type == htons(0)))
> > -   return -ENOMEM;
> > +   key->eth.type = parse_ethertype(skb);
> > +   if (unlikely(key->eth.type == htons(0)))
> > +   return -ENOMEM;
> > +   }
> >
> > skb_reset_network_header(skb);
> > skb_reset_mac_len(skb);
> > @@ -696,11 +699,23 @@ int ovs_flow_key_update(struct sk_buff *skb, struct 
> > sw_flow_key *key)
> >  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >  struct sk_buff *skb, struct sw_flow_key *key)
> >  {
> > +   bool is_layer3 = false;
> > +   bool is_teb = false;
> is_layer3 and is_teb are mutually exclusive, so can't we use single
> boolean here?

Sure, I can do something like the following if you prefer.
To my mind it makes things a bit less readable. But I don't feel
strongly about this.

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index d320c2657627..fc92cf542101 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -701,7 +701,6 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
*tun_info,
 struct sk_buff *skb, struct sw_flow_key *key)
 {
bool is_layer3 = false;
-   bool is_teb = false;
int err;
 
/* Extract metadata from packet. */
@@ -709,13 +708,9 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
*tun_info,
key->tun_proto = ip_tunnel_info_af(tun_info);
memcpy(>tun_key, _info->key, sizeof(key->tun_key));
 
-   if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
-   if (skb->protocol == htons(ETH_P_TEB))
-   is_teb = true;
-   else
-   is_layer3 = true;
-   }
-
+   if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER &&
+   skb->protocol != htons(ETH_P_TEB))
+   is_layer3 = true;
 
if (tun_info->options_len) {
BUILD_BUG_ON((1 << 

Re: [PATCH v9 net-next 4/7] openvswitch: add layer 3 flow/port support

2016-05-05 Thread pravin shelar
On Wed, May 4, 2016 at 12:36 AM, Simon Horman
 wrote:
> From: Lorand Jakab 
>
> Implementation of the pop_eth and push_eth actions in the kernel, and
> layer 3 flow support.
>
> This doesn't actually do anything yet as no layer 2 tunnel ports are
> supported yet. The original patch by Lorand was against the Open vSwtich
> tree which has L2 LISP tunnels but that is not supported in mainline Linux.
> I (Simon) plan to follow up with support for non-TEB GRE ports based on
> work by Thomas Morin.
>
> Cc: Thomas Morin 
> Signed-off-by: Lorand Jakab 
> Signed-off-by: Simon Horman 
>
> ---

...

> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 0ea128eeeab2..6e174ea5f2bb 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct 
> sw_flow_key *key)
>
> skb_reset_mac_header(skb);
>
> -   /* Link layer.  We are guaranteed to have at least the 14 byte 
> Ethernet
> -* header in the linear data area.
> -*/
> -   eth = eth_hdr(skb);
> -   ether_addr_copy(key->eth.src, eth->h_source);
> -   ether_addr_copy(key->eth.dst, eth->h_dest);
> +   /* Link layer. */
> +   if (key->phy.is_layer3) {
> +   key->eth.tci = 0;
> +   key->eth.type = skb->protocol;
> +   } else {
> +   eth = eth_hdr(skb);
> +   ether_addr_copy(key->eth.src, eth->h_source);
> +   ether_addr_copy(key->eth.dst, eth->h_dest);
>
> -   __skb_pull(skb, 2 * ETH_ALEN);
> -   /* We are going to push all headers that we pull, so no need to
> -* update skb->csum here.
> -*/
> +   __skb_pull(skb, 2 * ETH_ALEN);
> +   /* We are going to push all headers that we pull, so no need 
> to
> +* update skb->csum here.
> +*/
>
> -   key->eth.tci = 0;
> -   if (skb_vlan_tag_present(skb))
> -   key->eth.tci = htons(skb->vlan_tci);
> -   else if (eth->h_proto == htons(ETH_P_8021Q))
> -   if (unlikely(parse_vlan(skb, key)))
> -   return -ENOMEM;
> +   key->eth.tci = 0;
> +   if (skb_vlan_tag_present(skb))
> +   key->eth.tci = htons(skb->vlan_tci);
> +   else if (eth->h_proto == htons(ETH_P_8021Q))
> +   if (unlikely(parse_vlan(skb, key)))
> +   return -ENOMEM;
>
> -   key->eth.type = parse_ethertype(skb);
> -   if (unlikely(key->eth.type == htons(0)))
> -   return -ENOMEM;
> +   key->eth.type = parse_ethertype(skb);
> +   if (unlikely(key->eth.type == htons(0)))
> +   return -ENOMEM;
> +   }
>
> skb_reset_network_header(skb);
> skb_reset_mac_len(skb);
> @@ -696,11 +699,23 @@ int ovs_flow_key_update(struct sk_buff *skb, struct 
> sw_flow_key *key)
>  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  struct sk_buff *skb, struct sw_flow_key *key)
>  {
> +   bool is_layer3 = false;
> +   bool is_teb = false;
is_layer3 and is_teb are mutually exclusive, so can't we use single
boolean here?


> +   int err;
> +
> /* Extract metadata from packet. */
> if (tun_info) {
> key->tun_proto = ip_tunnel_info_af(tun_info);
> memcpy(>tun_key, _info->key, sizeof(key->tun_key));
>
> +   if (OVS_CB(skb)->input_vport->dev->type != ARPHRD_ETHER) {
> +   if (skb->protocol == htons(ETH_P_TEB))
> +   is_teb = true;
> +   else
> +   is_layer3 = true;
> +   }
> +
On transmit side you are using mac_len to detect l3 packet, why not do
same while extracting the key?