Re: [PATCH v2 net] net: skbuff: skb_vlan_push: Fix wrong unwinding of skb->data after __vlan_insert_tag call

2016-09-28 Thread Shmulik Ladkani
On Wed, 28 Sep 2016 16:43:38 +0200 Daniel Borkmann wrote:
> Couldn't we end up with 1) for the act_vlan case when we'd have the
> offset-adjusted skb_vlan_push() fix from here, where we'd then redirect
> to ingress where skb_vlan_pop() would be called? If I'm not missing
> something, skb_vlan_push() would then point to the data location of 1)
> and with your other proposed direct netif_receive_skb() patch, no
> further skb->data adjustments would be done, right?

Right. Then skb_vlan_pop() should expect either (1) or (2).

> Another potential issue (but unrelated to this fix here) I just noticed
> is, whether act_vlan might have the same problem as we fixed in 8065694e6519
> ("bpf: fix checksum for vlan push/pop helper"). So potentially, we could
> end up fixing CHECKSUM_COMPLETE wrongly on ingress, since these 14 bytes
> are already pulled out of the sum at that point.
> 
> > Should we adjust "offset" back, only if resulting offset is >=14 ?  
> 
> If also the checksum one might end up as an issue, maybe it's just best
> to go through the pain and do the push/pull for data plus csum, so both
> skb_vlan_*() functions see the frame starting from mac header temporarily?

Although not related to this specific fix, I see two ways of addressing
the rcsum problem:

1. Per your suggestion, have skb_vlan_*() expect 'data' at mac_header.
   That would simplify things, for this suggested 'data unwind' fix as well.

2. Within skb_vlan_*(), deduce (according to the initial offset) whether
   we're already "pulled out" of the rcsum, and if so, not invoke the
   skb_postpull/push_rcsum update.

Will meditate some more.

Thanks
Shmulik
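
To make the rcsum concern above concrete, here is a small userspace sketch
(not kernel code). It assumes, as on ingress, that a CHECKSUM_COMPLETE-style
sum covers only the bytes after the 14-byte Ethernet header. The helpers
toy_csum(), toy_csum_add() and toy_insert_tag() are hypothetical names made up
for this illustration. Inserting a tag at frame offset 12 leaves the covered
bytes untouched, so folding the 4 tag bytes into the sum would produce a
wrong value in this situation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN  6
#define ETH_HLEN  14
#define VLAN_HLEN 4

/* 16-bit ones' complement sum (not inverted) over len bytes. */
static uint16_t toy_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement addition of two partial sums. */
static uint16_t toy_csum_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/*
 * Toy tag insertion: place a 4-byte tag right after DA,SA.
 * Simplification: the payload is shifted right, so 'frame' must have
 * VLAN_HLEN spare bytes after 'len'. The kernel instead moves DA,SA
 * into the headroom.
 */
static void toy_insert_tag(uint8_t *frame, size_t len,
			   uint16_t proto, uint16_t tci)
{
	assert(len >= ETH_HLEN);
	memmove(frame + 2 * ETH_ALEN + VLAN_HLEN, frame + 2 * ETH_ALEN,
		len - 2 * ETH_ALEN);
	frame[12] = (uint8_t)(proto >> 8);
	frame[13] = (uint8_t)(proto & 0xff);
	frame[14] = (uint8_t)(tci >> 8);
	frame[15] = (uint8_t)(tci & 0xff);
}
```

Summing the bytes after the Ethernet header before the insertion, and the
bytes after the (now 18-byte) header afterwards, gives the same value; adding
the sum of the tag bytes on top of that does not.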


Re: [PATCH v2 net] net: skbuff: skb_vlan_push: Fix wrong unwinding of skb->data after __vlan_insert_tag call

2016-09-28 Thread Shmulik Ladkani
On Wed, 28 Sep 2016 16:43:38 +0200 Daniel Borkmann wrote:
> > (1) suppose upon entry we have
> >
> >  DA,SA,0x8100,TCI,0x0800,
> >  ^^
> >  mac_hdr  data
> >
> > initial offset is 18, and after current unwinding code we'll get  
> 
> You mean data points after the 0x0800, right?

Sorry. Yes, exactly as you say. Initially 18 bytes ahead:

DA,SA,0x8100,TCI,0x0800,
^                       ^
mac_hdr                 data


Re: [PATCH v2 net] net: skbuff: skb_vlan_push: Fix wrong unwinding of skb->data after __vlan_insert_tag call

2016-09-28 Thread Daniel Borkmann

On 09/28/2016 01:56 PM, Shmulik Ladkani wrote:
> On Wed, 28 Sep 2016 12:30:56 +0200, dan...@iogearbox.net wrote:
> > > @@ -4608,6 +4608,8 @@ int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
> > >
> > >  		skb->protocol = skb->vlan_proto;
> > >  		skb->mac_len += VLAN_HLEN;
> > > +		if (offset)
> > > +			offset += VLAN_HLEN;
> > >
> > >  		skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
> > >  		__skb_pull(skb, offset);
> >
> > This looks much better indeed than your v1 of this patch.
>
> Yep, after some meditation and history digging I happened to notice I
> was barking at the wrong tree.
>
> > So the issue might only be visible to act_vlan as the other remaining user of
> > skb_vlan_push().
>
> Yes, this is correct. I'll amend the log message to express that.
> The bug occurs for callers of skb_vlan_push() whose data is not
> pointing at mac_header.
>
> > My only question would be:
> > what about __skb_vlan_pop(), wouldn't that then need the same adjustment
> > a la offset -= VLAN_HLEN?
>
> Well, theoretically, yes; but caller may expect 2 different things:
>
> (assuming tags are in-payload)
>
> (1) suppose upon entry we have
>
>  DA,SA,0x8100,TCI,0x0800,
>  ^^
>  mac_hdr  data
>
> initial offset is 18, and after current unwinding code we'll get

You mean data points after the 0x0800, right?

>  DA,SA,0x0800,4_bytes,
>  ^                    ^
>  mac_hdr              data
>
> which is probably incorrect, adjustment 'offset -= VLAN_HLEN' is needed.
>
> (2) suppose upon entry we have
>
>  DA,SA,0x8100,TCI,0x0800
>  ^            ^
>  mac_hdr      data
>
> initial offset is 14, and after current unwinding code we'll get
>
>  DA,SA,0x0800,
>  ^            ^
>  mac_hdr      data
>
> which is probably what user has intended.
> (had we adjusted offset to be 10, 'data' would point into SA)
>
> From test I've made using act_vlan upon ingress on QinQ tags, existing call
> provides data as in (2).
>
> Thoughts?

Yeah, so we likely end up at 2) because of things like eth_type_trans()
that would only pull ETH_HLEN.

Couldn't we end up with 1) for the act_vlan case when we'd have the
offset-adjusted skb_vlan_push() fix from here, where we'd then redirect
to ingress where skb_vlan_pop() would be called? If I'm not missing
something, skb_vlan_push() would then point to the data location of 1)
and with your other proposed direct netif_receive_skb() patch, no
further skb->data adjustments would be done, right?

Another potential issue (but unrelated to this fix here) I just noticed
is, whether act_vlan might have the same problem as we fixed in 8065694e6519
("bpf: fix checksum for vlan push/pop helper"). So potentially, we could
end up fixing CHECKSUM_COMPLETE wrongly on ingress, since these 14 bytes
are already pulled out of the sum at that point.

> Should we adjust "offset" back, only if resulting offset is >=14 ?

If also the checksum one might end up as an issue, maybe it's just best
to go through the pain and do the push/pull for data plus csum, so both
skb_vlan_*() functions see the frame starting from mac header temporarily?
Jiri, any thoughts?


Re: [PATCH v2 net] net: skbuff: skb_vlan_push: Fix wrong unwinding of skb->data after __vlan_insert_tag call

2016-09-28 Thread Shmulik Ladkani
Hi,

On Wed, 28 Sep 2016 12:30:56 +0200, dan...@iogearbox.net wrote:
> > @@ -4608,6 +4608,8 @@ int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
> >
> >  		skb->protocol = skb->vlan_proto;
> >  		skb->mac_len += VLAN_HLEN;
> > +		if (offset)
> > +			offset += VLAN_HLEN;
> >
> >  		skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
> >  		__skb_pull(skb, offset);
> 
> This looks much better indeed than your v1 of this patch.

Yep, after some meditation and history digging I happened to notice I
was barking at the wrong tree.

> So the issue might only be visible to act_vlan as the other remaining user of
> skb_vlan_push(). 

Yes, this is correct. I'll amend the log message to express that.
The bug occurs for callers of skb_vlan_push() whose data is not
pointing at mac_header.

> My only question would be:
> what about __skb_vlan_pop(), wouldn't that then need the same adjustment
> a la offset -= VLAN_HLEN?

Well, theoretically, yes; but caller may expect 2 different things:

(assuming tags are in-payload)

(1) suppose upon entry we have

DA,SA,0x8100,TCI,0x0800,
^^
mac_hdr  data

initial offset is 18, and after current unwinding code we'll get

DA,SA,0x0800,4_bytes,
^                    ^
mac_hdr              data

which is probably incorrect, adjustment 'offset -= VLAN_HLEN' is needed.

(2) suppose upon entry we have

DA,SA,0x8100,TCI,0x0800
^            ^
mac_hdr      data

initial offset is 14, and after current unwinding code we'll get

DA,SA,0x0800,
^            ^
mac_hdr      data

which is probably what user has intended.
(had we adjusted offset to be 10, 'data' would point into SA)

From test I've made using act_vlan upon ingress on QinQ tags, existing call
provides data as in (2).

Thoughts?
Should we adjust "offset" back, only if resulting offset is >=14 ?

Thanks,
Shmulik
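
The two pop cases above, and the ">= 14" guard floated at the end of this
message, come down to simple offset arithmetic. The sketch below is only an
illustration of that arithmetic in userspace C; the toy_* function names are
made up here, and none of this is code from the patch or from
__skb_vlan_pop().

```c
#include <assert.h>

#define ETH_ALEN  6
#define ETH_HLEN  14
#define VLAN_HLEN 4

/* Name the part of the untagged frame that an offset from mac_header hits. */
static const char *toy_region(unsigned int offset)
{
	if (offset < ETH_ALEN)
		return "DA";
	if (offset < 2 * ETH_ALEN)
		return "SA";
	if (offset < ETH_HLEN)
		return "ethertype";
	return "payload";
}

/* Current behaviour: 'data' is unwound by the offset remembered on entry. */
static unsigned int toy_pop_offset_current(unsigned int offset)
{
	return offset;
}

/* Mirror of the push fix: always take the 4 popped bytes into account. */
static unsigned int toy_pop_offset_always(unsigned int offset)
{
	/* Guard against unsigned underflow in this toy model. */
	assert(offset == 0 || offset >= VLAN_HLEN);
	if (offset)
		offset -= VLAN_HLEN;
	return offset;
}

/* Proposed guard: adjust back only if the resulting offset is >= 14. */
static unsigned int toy_pop_offset_guarded(unsigned int offset)
{
	if (offset >= ETH_HLEN + VLAN_HLEN)
		offset -= VLAN_HLEN;
	return offset;
}
```

With an entry offset of 18 (case 1) the guarded variant yields 14, and with
an entry offset of 14 (case 2) it leaves 14 alone, whereas the "always"
variant would give 10, which toy_region() reports as being inside SA.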


Re: [PATCH v2 net] net: skbuff: skb_vlan_push: Fix wrong unwinding of skb->data after __vlan_insert_tag call

2016-09-28 Thread Daniel Borkmann

On 09/28/2016 11:08 AM, Shmulik Ladkani wrote:

> From: Shmulik Ladkani
>
> In case 'skb_vlan_push' is called on an skb with a hw-accel vlan tag
> present, the existing hw-accel tag is inserted into the payload, and
> the new given tag is placed as new hw-accel tag.
>
> In order to insert the existing hw-accel tag, 'skb_vlan_push' adjusts
> the 'data' pointer at the mac_header (if needed), invokes __vlan_insert_tag,
> and finally re-adjusts 'data' back to its original position (according
> to the remembered "adjustment offset").
>
> However, successful '__vlan_insert_tag' pushes 4 more bytes at start of
> frame.
> Alas, the remembered "adjustment offset" is NOT fixed to account for
> these additional 4 bytes, so the subsequent '__skb_pull(skb, offset)'
> fails to unwind 'data' back to its original location.
>
> Since 'skb->mac_len' IS fixed to account for the additional 4 bytes
> (incremented to a total of 18 bytes), any access to
> 'skb->data - skb->mac_len' points to bytes PRIOR to the start of frame.
>
> This is problematic, as many constructs in the stack are issuing
> 'skb_push(skb, skb->mac_len)' prior to xmit to a device (e.g. tcf_mirred,
> tcf_bpf, nf_dup_netdev_egress), resulting in bogus frames being
> xmitted (having random 4 bytes at start of frame).
>
> For example:
>
>   # ip l add dev d0 type dummy
>   # tc filter add dev eth0 parent : pref 1 basic \
>       action vlan push protocol 802.1ad id 5 pipe \
>       action mirred egress redirect dev d0
>
>   Any 802.1q (hw-accel) tagged frames arriving on eth0 are xmitted as
>   bogus frames on d0; whereas the expected behavior is having QinQ frames.
>
> Fix by properly accounting for the additionally pushed 4 bytes (in the
> case where an adjustment to point at mac_header was done).
>
> Fixes: 93515d53b1 ("net: move vlan pop/push functions into common code")
> Signed-off-by: Shmulik Ladkani
> Cc: Pravin Shelar
> Cc: Jiri Pirko
> ---
>   v2: Instead of reducing mac_len by 4 bytes, which was found incorrect,
>   fix the problem of wrong unwinding of 'skb->data'
>
>   David, if patch ok, suggest this goes to -stable
>
>   net/core/skbuff.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d36c754..3926b79 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4608,6 +4608,8 @@ int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
>
>  		skb->protocol = skb->vlan_proto;
>  		skb->mac_len += VLAN_HLEN;
> +		if (offset)
> +			offset += VLAN_HLEN;
>
>  		skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
>  		__skb_pull(skb, offset);


This looks much better indeed than your v1 of this patch.

v1 would have definitely changed existing behavior for the cls_bpf/act_bpf
case. Both start at the beginning of mac header from ingress and egress side.
So when frame comes in on ingress, skb->data points to start of net header,
we then do __skb_push(skb, skb->mac_len) before running BPF prog, and after
return from BPF prog again __skb_pull(skb, skb->mac_len) to return to original
location. Thus in skb_vlan_push() from BPF helper call offset is always 0;
perhaps similar in ovs case.

With the removed skb->mac_len adjustment in skb_vlan_push() from your v1, we
would then have pointed into vlan header on return to stack instead of net
header location as we do currently.

So the issue might only be visible to act_vlan as the other remaining user of
skb_vlan_push(). Above fix looks better to me. So if we don't start at the
mac header yet, we need to do the __skb_push()/__skb_pull() adjustment from
there, and since we expand mac header, we need to take these 4 bytes into
account as well for returning to original location. My only question would
be: what about __skb_vlan_pop(), wouldn't that then need the same adjustment
a la offset -= VLAN_HLEN?

Thanks,
Daniel
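
For readers who want to follow the arithmetic in the patch description, here
is a userspace toy model of the unwinding. It is a sketch under simplifying
assumptions, not the kernel implementation: struct toy_skb and the toy_*
helpers are hypothetical, buffer indices stand in for pointers, and the tag
to insert is passed in directly rather than taken from the hw-accel fields.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN  6
#define ETH_HLEN  14
#define VLAN_HLEN 4
#define HEADROOM  32

/* Hypothetical stand-in for the few sk_buff fields discussed here. */
struct toy_skb {
	unsigned char buf[128];
	unsigned int mac_header;	/* index of the first DA byte */
	unsigned int data;		/* index 'skb->data' would point at */
	unsigned int mac_len;
};

/* Build DA,SA,0x0800 at HEADROOM, with 'data' offset bytes past mac_header. */
static void toy_init(struct toy_skb *skb, unsigned int offset)
{
	memset(skb, 0, sizeof(*skb));
	skb->mac_header = HEADROOM;
	skb->mac_len = ETH_HLEN;
	memset(skb->buf + HEADROOM, 0xda, ETH_ALEN);
	memset(skb->buf + HEADROOM + ETH_ALEN, 0x5a, ETH_ALEN);
	skb->buf[HEADROOM + 12] = 0x08;
	skb->buf[HEADROOM + 13] = 0x00;
	skb->data = skb->mac_header + offset;
}

/* Models the effect of __vlan_insert_tag(): the frame grows by 4 bytes
 * into the headroom and mac_header moves with it. */
static void toy_insert_tag(struct toy_skb *skb, unsigned int proto,
			   unsigned int tci)
{
	assert(skb->data == skb->mac_header);
	skb->data -= VLAN_HLEN;
	memmove(skb->buf + skb->data, skb->buf + skb->data + VLAN_HLEN,
		2 * ETH_ALEN);
	skb->mac_header -= VLAN_HLEN;
	skb->buf[skb->data + 12] = (proto >> 8) & 0xff;
	skb->buf[skb->data + 13] = proto & 0xff;
	skb->buf[skb->data + 14] = (tci >> 8) & 0xff;
	skb->buf[skb->data + 15] = tci & 0xff;
}

/* Models the push/unwind sequence; 'fixed' selects the v2 behaviour. */
static void toy_vlan_push(struct toy_skb *skb, unsigned int proto,
			  unsigned int tci, int fixed)
{
	unsigned int offset = skb->data - skb->mac_header;

	skb->data -= offset;		/* like __skb_push(skb, offset) */
	toy_insert_tag(skb, proto, tci);
	skb->mac_len += VLAN_HLEN;
	if (fixed && offset)
		offset += VLAN_HLEN;	/* the two added lines of the patch */
	skb->data += offset;		/* like __skb_pull(skb, offset) */
}
```

Starting from toy_init(&skb, ETH_HLEN), the unfixed path leaves
data - mac_len four bytes before mac_header, which is the bogus region a
later skb_push(skb, skb->mac_len) would expose. With fixed set,
data - mac_len lands exactly on mac_header and data returns to its original
position.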