Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-04-07 Thread Ben Greear

On 04/07/2016 08:11 AM, Vijay Pandurangan wrote:

On Fri, Mar 25, 2016 at 7:46 PM, Ben Greear  wrote:

A real NIC can either do hardware checksums, or it cannot.  If it
cannot, then the host must do it on the CPU for both transmit and
receive.

Veth is not a real NIC, and it cannot do hardware checksum offloading.

So, we either lie and pretend it does, or we eat massive amounts
of CPU usage to calculate and check checksums when sending across
a veth pair.



That's a good point. Does anyone know what the overhead actually is these days?


You could try setting up a system with ixgbe or similar, and then manually
disable csum offload using ethtool, and see how that performs in comparison
to hardware offload?


But, if I am purposely corrupting a frame destined for veth, then the only
reason
I would want the stack to check the checksums is if I were testing my own
stack's checksum logic, and that seems to be a pretty limited use.



In the common case you're 100% right.  OTOH, there's something
disconcerting about an abstraction layer lying and behaving
unexpectedly.  Most traffic that originates on a machine can have its
checksums safely ignored.  Whatever the reason is (maybe, as you say
you're testing checksums – on the other hand maybe there's a bug in
your code somewhere), I really feel like we should try to figure out a
way to ensure that this optimization is at the very least opt-in…


I'm fine with allowing a user to force software-csum on veth devices
if someone wants to code that up, but forcing sw-csum for local frames
on veth devices should be disabled by default.
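
A minimal sketch of what such an opt-in could look like inside veth_xmit(), reusing the check the regressing commit removed; "force_sw_csum" is a hypothetical per-device knob, not an existing veth attribute:

	/* Hypothetical sketch: keep the fast path by default, but let a user
	 * opt in to software checksum verification.  "force_sw_csum" is an
	 * invented field on struct veth_priv, not an existing attribute.
	 */
	if (!priv->force_sw_csum &&
	    skb->ip_summed == CHECKSUM_NONE &&
	    (rcv->features & NETIF_F_RXCSUM))
		skb->ip_summed = CHECKSUM_UNNECESSARY;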

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-04-07 Thread Vijay Pandurangan
On Fri, Mar 25, 2016 at 7:46 PM, Ben Greear  wrote:
> A real NIC can either do hardware checksums, or it cannot.  If it
> cannot, then the host must do it on the CPU for both transmit and
> receive.
>
> Veth is not a real NIC, and it cannot do hardware checksum offloading.
>
> So, we either lie and pretend it does, or we eat massive amounts
> of CPU usage to calculate and check checksums when sending across
> a veth pair.
>

That's a good point. Does anyone know what the overhead actually is these days?

>>> Any frame sent from a socket can be considered to be a local packet in my
>>> opinion.
>>
>>
>> I'm not sure that's totally right. Your bridge is adding a delay to
>> your packets; it could just as easily be simulating corruption by
>> corrupting 5% of packets going through it. If this change allows
>> corrupt packets to be delivered to an application when they could not
>> be delivered if the packets were routed via physical eths, I think
>> that is a bug.
>
>
> I actually do support corrupting the frame, but what I normally do is
> corrupt the contents
> of the packet, and then recalculate the IP checksum (and TCP if it applies)
> and send it on its way.  The receiving NIC and stack will pass the frame up
> to
> the application since the checksums match, and it would be up the
> application
> to deal with it.  So, I can easily cause an application to receive corrupted
> frames over physical eths.
>
> I can also corrupt without updating the checksums in case you want to
> test another systems NIC and/or stack.
>
> But, if I am purposely corrupting a frame destined for veth, then the only
> reason
> I would want the stack to check the checksums is if I were testing my own
> stack's checksum logic, and that seems to be a pretty limited use.


In the common case you're 100% right.  OTOH, there's something
disconcerting about an abstraction layer lying and behaving
unexpectedly.  Most traffic that originates on a machine can have its
checksums safely ignored.  Whatever the reason is (maybe, as you say
you're testing checksums – on the other hand maybe there's a bug in
your code somewhere), I really feel like we should try to figure out a
way to ensure that this optimization is at the very least opt-in…

>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com
>


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Ben Greear

On 03/25/2016 04:03 PM, Vijay Pandurangan wrote:

On Fri, Mar 25, 2016 at 6:23 PM, Ben Greear  wrote:

On 03/25/2016 02:59 PM, Vijay Pandurangan wrote:


consider two scenarios, where process a sends raw ethernet frames
containing UDP packets to b

I) process a --> veth --> process b

II) process a -> eth -> wire -> eth -> process b

I believe (I) is the simplest setup we can create that will replicate this
bug.

If process a sends frames that contain UDP packets to process b, what
is the behaviour we want if the UDP packet *has an incorrect
checksum*?

It seems to me that I and II should have identical behaviour, and I
would think that (II) would not deliver the packets to the
application.

In (I) with Cong's patch would we be delivering corrupt UDP packets to
process b despite an incorrect checksum in (I)?

If so, I would argue that this patch isn't right.



Checksums are normally used to deal with flaky transport mechanisms,
and once a machine receives the frame, we do not keep re-calculating
checksums
as we move it through various drivers and subsystems.

In particular, checksums are NOT a security mechanism and can be easily
faked.

Since packets sent on one veth never actually hit any unreliable transport
before they are received on the peer veth, then there should be no need to
checksum packets whose origin is known to be on the local machine.


That's a good argument.  I'm trying to figure out how to reconcile
your thoughts with the argument that virtual ethernet devices are an
abstraction that should behave identically to perfectly-functional
physical ethernet devices when connected with a wire.

In my view, the invariant must be identical functionality, and if I
were writing a regression test for this system, that's what I would
test. I think optimizations for eliding checksums should be
implemented only if they don't alter this functionality.

There must be a way to structure / write this code so that we can
optimize veths without causing different behaviour ...


A real NIC can either do hardware checksums, or it cannot.  If it
cannot, then the host must do it on the CPU for both transmit and
receive.

Veth is not a real NIC, and it cannot do hardware checksum offloading.

So, we either lie and pretend it does, or we eat massive amounts
of CPU usage to calculate and check checksums when sending across
a veth pair.


Any frame sent from a socket can be considered to be a local packet in my
opinion.


I'm not sure that's totally right. Your bridge is adding a delay to
your packets; it could just as easily be simulating corruption by
corrupting 5% of packets going through it. If this change allows
corrupt packets to be delivered to an application when they could not
be delivered if the packets were routed via physical eths, I think
that is a bug.


I actually do support corrupting the frame, but what I normally do is corrupt
the contents of the packet, and then recalculate the IP checksum (and TCP if it
applies) and send it on its way.  The receiving NIC and stack will pass the
frame up to the application since the checksums match, and it would be up to
the application to deal with it.  So, I can easily cause an application to
receive corrupted frames over physical eths.

I can also corrupt without updating the checksums in case you want to
test another system's NIC and/or stack.
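
The "recalculate the IP checksum" step above is the standard Internet checksum from RFC 1071, i.e., a 16-bit one's-complement sum over the header with the checksum field zeroed first; a minimal user-space sketch (not Ben's actual tool):

	#include <stddef.h>
	#include <stdint.h>

	/* Standard Internet checksum (RFC 1071).  The caller zeroes the IP
	 * header's checksum field first, runs this over the header, and stores
	 * the result back.  Fixing up the TCP/UDP checksum as well would also
	 * involve the pseudo-header, which is not shown here.
	 */
	static uint16_t ip_header_checksum(const void *hdr, size_t len)
	{
		const uint16_t *p = hdr;
		uint32_t sum = 0;

		while (len > 1) {               /* sum 16-bit words */
			sum += *p++;
			len -= 2;
		}
		if (len)                        /* odd trailing byte */
			sum += *(const uint8_t *)p;
		while (sum >> 16)               /* fold carries back in */
			sum = (sum & 0xffff) + (sum >> 16);

		return (uint16_t)~sum;
	}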

But, if I am purposely corrupting a frame destined for veth, then the only
reason I would want the stack to check the checksums is if I were testing my
own stack's checksum logic, and that seems to be a pretty limited use.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Vijay Pandurangan
On Fri, Mar 25, 2016 at 6:23 PM, Ben Greear  wrote:
> On 03/25/2016 02:59 PM, Vijay Pandurangan wrote:
>>
>> consider two scenarios, where process a sends raw ethernet frames
>> containing UDP packets to b
>>
>> I) process a --> veth --> process b
>>
>> II) process a -> eth -> wire -> eth -> process b
>>
>> I believe (I) is the simplest setup we can create that will replicate this
>> bug.
>>
>> If process a sends frames that contain UDP packets to process b, what
>> is the behaviour we want if the UDP packet *has an incorrect
>> checksum*?
>>
>> It seems to me that I and II should have identical behaviour, and I
>> would think that (II) would not deliver the packets to the
>> application.
>>
>> In (I) with Cong's patch would we be delivering corrupt UDP packets to
>> process b despite an incorrect checksum in (I)?
>>
>> If so, I would argue that this patch isn't right.
>
>
> Checksums are normally used to deal with flaky transport mechanisms,
> and once a machine receives the frame, we do not keep re-calculating
> checksums
> as we move it through various drivers and subsystems.
>
> In particular, checksums are NOT a security mechanism and can be easily
> faked.
>
> Since packets sent on one veth never actually hit any unreliable transport
> before they are received on the peer veth, then there should be no need to
> checksum packets whose origin is known to be on the local machine.

That's a good argument.  I'm trying to figure out how to reconcile
your thoughts with the argument that virtual ethernet devices are an
abstraction that should behave identically to perfectly-functional
physical ethernet devices when connected with a wire.

In my view, the invariant must be identical functionality, and if I
were writing a regression test for this system, that's what I would
test. I think optimizations for eliding checksums should be
implemented only if they don't alter this functionality.

There must be a way to structure / write this code so that we can
optimize veths without causing different behaviour ...
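
A sketch of the injection half of such a regression check, assuming the 1.1.1.1/1.1.1.2 addressing and the policy routing from Ben's setup so the packet really crosses the veth (or physical) path; none of this is an existing test:

	#include <arpa/inet.h>
	#include <netinet/ip.h>
	#include <netinet/udp.h>
	#include <stdint.h>
	#include <string.h>
	#include <sys/socket.h>

	/* Send a UDP datagram whose UDP checksum is deliberately wrong from
	 * "a" (1.1.1.1) toward "d" (1.1.1.2).  Port and addresses are assumed.
	 */
	static int send_bad_udp(const char *src, const char *dst, uint16_t port)
	{
		unsigned char pkt[sizeof(struct iphdr) + sizeof(struct udphdr) + 4];
		struct iphdr *ip = (struct iphdr *)pkt;
		struct udphdr *udp = (struct udphdr *)(pkt + sizeof(*ip));
		struct sockaddr_in to = { .sin_family = AF_INET };
		int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); /* IP_HDRINCL implied */

		if (fd < 0)
			return -1;

		memset(pkt, 0, sizeof(pkt));
		ip->version  = 4;
		ip->ihl      = 5;
		ip->ttl      = 64;
		ip->protocol = IPPROTO_UDP;
		ip->tot_len  = htons(sizeof(pkt));
		ip->saddr    = inet_addr(src);
		ip->daddr    = inet_addr(dst);   /* ip->check left 0: kernel fills it in */

		udp->source = htons(12345);
		udp->dest   = htons(port);
		udp->len    = htons(sizeof(*udp) + 4);
		udp->check  = htons(0xdead);     /* wrong on purpose; must be nonzero,
						  * since 0 means "no checksum" for UDP/IPv4 */
		memcpy(pkt + sizeof(*ip) + sizeof(*udp), "bad!", 4);

		to.sin_addr.s_addr = ip->daddr;
		return sendto(fd, pkt, sizeof(pkt), 0,
			      (struct sockaddr *)&to, sizeof(to));
	}

The receiving half would be an ordinary UDP socket on d bound to the same port with a receive timeout: the test passes only if the datagram is not delivered, and it should pass identically whether a and d are joined by a veth pair or by two physical NICs and a wire.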


>
> Any frame sent from a socket can be considered to be a local packet in my
> opinion.

I'm not sure that's totally right. Your bridge is adding a delay to
your packets; it could just as easily be simulating corruption by
corrupting 5% of packets going through it. If this change allows
corrupt packets to be delivered to an application when they could not
be delivered if the packets were routed via physical eths, I think
that is a bug.

>
> That is what Cong's patch does as far as I can tell.
>
>
> Thanks,
> Ben
>
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com
>


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Ben Greear

On 03/25/2016 02:59 PM, Vijay Pandurangan wrote:

consider two scenarios, where process a sends raw ethernet frames
containing UDP packets to b

I) process a --> veth --> process b

II) process a -> eth -> wire -> eth -> process b

I believe (I) is the simplest setup we can create that will replicate this bug.

If process a sends frames that contain UDP packets to process b, what
is the behaviour we want if the UDP packet *has an incorrect
checksum*?

It seems to me that I and II should have identical behaviour, and I
would think that (II) would not deliver the packets to the
application.

In (I) with Cong's patch would we be delivering corrupt UDP packets to
process b despite an incorrect checksum in (I)?

If so, I would argue that this patch isn't right.


Checksums are normally used to deal with flaky transport mechanisms,
and once a machine receives the frame, we do not keep re-calculating checksums
as we move it through various drivers and subsystems.

In particular, checksums are NOT a security mechanism and can be easily faked.

Since packets sent on one veth never actually hit any unreliable transport
before they are received on the peer veth, then there should be no need to
checksum packets whose origin is known to be on the local machine.

Any frame sent from a socket can be considered to be a local packet in my
opinion.

That is what Cong's patch does as far as I can tell.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Cong Wang
On Fri, Mar 25, 2016 at 2:59 PM, Vijay Pandurangan  wrote:
> consider two scenarios, where process a sends raw ethernet frames
> containing UDP packets to b
>
> I) process a --> veth --> process b
>
> II) process a -> eth -> wire -> eth -> process b
>
> I believe (I) is the simplest setup we can create that will replicate this 
> bug.
>
> If process a sends frames that contain UDP packets to process b, what
> is the behaviour we want if the UDP packet *has an incorrect
> checksum*?
>
> It seems to me that I and II should have identical behaviour, and I
> would think that (II) would not deliver the packets to the
> application.
>
> In (I) with Cong's patch would we be delivering corrupt UDP packets to
> process b despite an incorrect checksum in (I)?
>

Right, I thought the packet socket does the checksum by itself, so the
problem is: if user-space does the checksum, as with a packet socket, its
checksum could be wrong, and therefore we cannot trust it on the RX path
once it loops back.

Let me think about it again.


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Cong Wang
On Fri, Mar 25, 2016 at 1:56 PM, Ben Greear  wrote:
> On 03/24/2016 10:33 PM, Cong Wang wrote:
>
>> Here we go:
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 1ecfa71..ab66080 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1925,6 +1925,7 @@ static int packet_sendmsg_spkt(struct socket
>> *sock, struct msghdr *msg,
>>  goto out_unlock;
>>  }
>>
>> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
>>  skb->protocol = proto;
>>  skb->dev = dev;
>>  skb->priority = sk->sk_priority;
>> @@ -2496,6 +2497,7 @@ static int tpacket_fill_skb(struct packet_sock
>> *po, struct sk_buff *skb,
>>
>>  ph.raw = frame;
>>
>> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
>>  skb->protocol = proto;
>>  skb->dev = dev;
>>  skb->priority = po->sk.sk_priority;
>> @@ -2805,6 +2807,7 @@ static struct sk_buff *packet_alloc_skb(struct
>> sock *sk, size_t prepad,
>>  skb_put(skb, linear);
>>  skb->data_len = len - linear;
>>  skb->len += len - linear;
>> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
>>
>>  return skb;
>>   }
>
>
> I have tested UDP, TCP, TCPv6 and custom Ethernet frames across a veth pair.
>
> And, UDP, TCP, and pktgen across a pair of veth pairs
> bridged by my user-space packet filter.
>
> All of these tests work fine with your patch as far as I can tell.
>
> So, you can add:
>
> Tested-by: Ben Greear 

Thanks for testing! I will send it out formally after I audit more drivers,
also let's give Tom some time to respond.

>
> That said, it could easily break some drivers and/or other scenarios that I
> have not tested, so at the least it should cook a while upstream before
> going into the
> stable tree

Yeah.

Thanks.


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Vijay Pandurangan
consider two scenarios, where process a sends raw ethernet frames
containing UDP packets to b

I) process a --> veth --> process b

II) process a -> eth -> wire -> eth -> process b

I believe (I) is the simplest setup we can create that will replicate this bug.

If process a sends frames that contain UDP packets to process b, what
is the behaviour we want if the UDP packet *has an incorrect
checksum*?

It seems to me that I and II should have identical behaviour, and I
would think that (II) would not deliver the packets to the
application.

In (I) with Cong's patch would we be delivering corrupt UDP packets to
process b despite an incorrect checksum in (I)?

If so, I would argue that this patch isn't right.


On Fri, Mar 25, 2016 at 4:56 PM, Ben Greear  wrote:
> On 03/24/2016 10:33 PM, Cong Wang wrote:
>
>> Here we go:
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 1ecfa71..ab66080 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -1925,6 +1925,7 @@ static int packet_sendmsg_spkt(struct socket
>> *sock, struct msghdr *msg,
>>  goto out_unlock;
>>  }
>>
>> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
>>  skb->protocol = proto;
>>  skb->dev = dev;
>>  skb->priority = sk->sk_priority;
>> @@ -2496,6 +2497,7 @@ static int tpacket_fill_skb(struct packet_sock
>> *po, struct sk_buff *skb,
>>
>>  ph.raw = frame;
>>
>> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
>>  skb->protocol = proto;
>>  skb->dev = dev;
>>  skb->priority = po->sk.sk_priority;
>> @@ -2805,6 +2807,7 @@ static struct sk_buff *packet_alloc_skb(struct
>> sock *sk, size_t prepad,
>>  skb_put(skb, linear);
>>  skb->data_len = len - linear;
>>  skb->len += len - linear;
>> +   skb->ip_summed = CHECKSUM_UNNECESSARY;
>>
>>  return skb;
>>   }
>
>
> I have tested UDP, TCP, TCPv6 and custom Ethernet frames across a veth pair.
>
> And, UDP, TCP, and pktgen across a pair of veth pairs
> bridged by my user-space packet filter.
>
> All of these tests work fine with your patch as far as I can tell.
>
> So, you can add:
>
> Tested-by: Ben Greear 
>
> That said, it could easily break some drivers and/or other scenarios that I
> have not tested, so at the least it should cook a while upstream before
> going into the
> stable tree
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com
>


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Vijay Pandurangan
On Fri, Mar 25, 2016 at 10:35 AM, Ben Greear  wrote:
>
>
> On 03/24/2016 10:24 PM, Vijay Pandurangan wrote:
>>
>> On Fri, Mar 25, 2016 at 1:07 AM, Ben Greear 
>> wrote:
>>>
>>> On 03/24/2016 09:45 PM, Vijay Pandurangan wrote:


 Actually, maybe they should be set to CHECKSUM_PARTIAL if we want veth
 to drop the packets if they have bad checksums before they hit the
 application level.
>>>
>>>
>>>
>>> VETH is pretty special in that when you transmit a frame on one
>>> device, it's pair receives it, and unless there is RAM corruption
>>> or bugs in the kernel, then it cannot be corrupted.
>>
>>
>> Yeah, you're right that that's an optimization. However, I think that
>> we should first ensure that
>>
>> a->veth->b
>>
>> operates exactly like:
>>
>> a->physical eth 1 -> physical eth 2->b
>>
>> in all cases.  Once we have that working everywhere we could think
>> about optimizations.
>>
>>
>> If we're willing to refactor, we could implement the optimization by
>> allowing veth devices to know whether their immediate peer is. If a
>> veth knows it's talking to another veth, it could under some
>> circumstances elide checksum calculation and verification.  I'm not
>> sure what abstractions that would break, though. What do you guys
>> think?
>
>
> veth ALWAYS transmits to another VETH.  The problem is that when veth is
> given a packet to transmit, it is difficult to know where that packet
> came from.

Yeah you're totally right – I guess what I was trying to express (but
failed at) was that we might need to be able to track the original
source of the packet for optimizations.
>
> And, adding software checksumming to veth for every frame would be a huge
> performance hit.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Ben Greear

On 03/24/2016 10:33 PM, Cong Wang wrote:


Here we go:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1ecfa71..ab66080 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1925,6 +1925,7 @@ static int packet_sendmsg_spkt(struct socket
*sock, struct msghdr *msg,
 goto out_unlock;
 }

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
 skb->protocol = proto;
 skb->dev = dev;
 skb->priority = sk->sk_priority;
@@ -2496,6 +2497,7 @@ static int tpacket_fill_skb(struct packet_sock
*po, struct sk_buff *skb,

 ph.raw = frame;

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
 skb->protocol = proto;
 skb->dev = dev;
 skb->priority = po->sk.sk_priority;
@@ -2805,6 +2807,7 @@ static struct sk_buff *packet_alloc_skb(struct
sock *sk, size_t prepad,
 skb_put(skb, linear);
 skb->data_len = len - linear;
 skb->len += len - linear;
+   skb->ip_summed = CHECKSUM_UNNECESSARY;

 return skb;
  }


I have tested UDP, TCP, TCPv6 and custom Ethernet frames across a veth pair.

And, UDP, TCP, and pktgen across a pair of veth pairs
bridged by my user-space packet filter.

All of these tests work fine with your patch as far as I can tell.

So, you can add:

Tested-by: Ben Greear 

That said, it could easily break some drivers and/or other scenarios that I
have not tested, so at the least it should cook a while upstream before going
into the stable tree.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread David Miller
From: Ben Greear 
Date: Fri, 25 Mar 2016 10:14:46 -0700

> Anyway, you know the stack and drivers better than me, so if you
> think Cong's patch is valid, then I'll test it and make sure it
> works in my setups at least.

It probably is, I'm just waiting to see if Tom Herbert will give
some feedback or not as this is an area he understands well.


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Ben Greear



On 03/25/2016 09:44 AM, David Miller wrote:

From: Ben Greear 
Date: Fri, 25 Mar 2016 09:10:58 -0700


I am suspicious that this will break at least some drivers.  I
grepped around for ip_summed, and found this, for instance:

davicom/dm9000.c

 /* The DM9000 is not smart enough to leave fragmented packets


An really old (circa 1997), not-oft-used, driver such as this is not
the place to be looking for correct usage of skb->ip_summed semantics.

I would never use whatever a driver like this does influence whether I
apply a bug fix or not.



Point is, it took me 5 minutes to find that, and I did not look hard at many
other drivers.  e1000e and igb appear to be fine, and maybe the rest of them
are too.  Lord knows what other strange setups might be affected by the
ip_summed change.

Anyway, you know the stack and drivers better than me, so if you think Cong's
patch is valid, then I'll test it and make sure it works in my setups at least.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread David Miller
From: Cong Wang 
Date: Fri, 25 Mar 2016 09:32:23 -0700

> So I believe dm9000 needs to fix.

+1


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread David Miller
From: Ben Greear 
Date: Fri, 25 Mar 2016 09:10:58 -0700

> I am suspicious that this will break at least some drivers.  I
> grepped around for ip_summed, and found this, for instance:
> 
> davicom/dm9000.c
> 
> /* The DM9000 is not smart enough to leave fragmented packets

A really old (circa 1997), not-oft-used, driver such as this is not
the place to be looking for correct usage of skb->ip_summed semantics.

I would never let whatever a driver like this does influence whether I
apply a bug fix or not.


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Cong Wang
On Fri, Mar 25, 2016 at 9:10 AM, Ben Greear  wrote:
> I am suspicious that this will break at least some drivers.  I grepped
> around
> for ip_summed, and found this, for instance:
>
> davicom/dm9000.c
>
> /* The DM9000 is not smart enough to leave fragmented packets alone.
> */
> if (dm->ip_summed != ip_summed) {
> if (ip_summed == CHECKSUM_NONE)
> iow(dm, DM9000_TCCR, 0);
> else
> iow(dm, DM9000_TCCR, TCCR_IP | TCCR_UDP | TCCR_TCP);
> dm->ip_summed = ip_summed;
> }
>
>
> It is taking action based on ip_summed == CHECKSUM_NONE, and your change
> will probably break that.
>
> I would suggest that we try to make any fix specific only to veth,
> at least for now.  A tree-wide audit of drivers is probably required
> to safely make the kind of change you propose above.
>
> So, unless you can explain why your change is safe, then I do not plan
> to test it.

I just blindly trust the comments there:

 * CHECKSUM_UNNECESSARY:
 *
 *   This has the same meaning on as CHECKSUM_NONE for checksum offload on
 *   output.

Let's Cc Tom who wrote this comment.

On the other hand, hyperv got this correctly:

	if ((skb->ip_summed == CHECKSUM_NONE) ||
	    (skb->ip_summed == CHECKSUM_UNNECESSARY))
		goto do_send;

So I believe dm9000 needs to be fixed.
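
A minimal, untested sketch of the kind of dm9000 change implied here, following the hyperv pattern above: only enable hardware checksum generation when the stack asks for it (CHECKSUM_PARTIAL), and treat CHECKSUM_NONE and CHECKSUM_UNNECESSARY alike on output:

	/* Hypothetical sketch: CHECKSUM_NONE and CHECKSUM_UNNECESSARY both mean
	 * "nothing to offload" on the transmit path.
	 */
	if (dm->ip_summed != ip_summed) {
		if (ip_summed == CHECKSUM_PARTIAL)
			iow(dm, DM9000_TCCR, TCCR_IP | TCCR_UDP | TCCR_TCP);
		else    /* CHECKSUM_NONE or CHECKSUM_UNNECESSARY */
			iow(dm, DM9000_TCCR, 0);
		dm->ip_summed = ip_summed;
	}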

Thanks.


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Ben Greear



On 03/24/2016 10:33 PM, Cong Wang wrote:

On Thu, Mar 24, 2016 at 10:13 PM, Ben Greear  wrote:



On 03/24/2016 10:06 PM, Cong Wang wrote:


On Thu, Mar 24, 2016 at 9:34 PM, Ben Greear 
wrote:




On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:



Oops, I think my last email didn't go through due to an inadvertent
html attachment from my phone mail client.

Can you send us a copy of a packet you're sending and/or confirm that
the IP and UDP4 checksums are set correctly in the packet?

If those are set right, I think we need to read through the networking
code again to see why this is broken...




Wireshark decodes the packet as having no checksum errors.

I think the contents of the packet is correct, but the 'ip_summed'
field is set incorrectly to 'NONE' when transmitting on a raw packet
socket.



Yeah, these bugs are all due to the different interpretations of
ip_summed on TX path and RX path. I think the following patch
should work, if the comments don't mislead me. Could you give
it a try?

For the long term, we need to unify the meaning of ip_summed
on TX path and RX path, or at least translate it in skb_scrub_packet().



I can test this tomorrow, but I think it will not work.  I'm not sending raw
IP frames, I'm sending full ethernet frames.  Socket is PF_PACKET, SOCK_RAW.

Your patch may still be useful for others though?


Here we go:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1ecfa71..ab66080 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1925,6 +1925,7 @@ static int packet_sendmsg_spkt(struct socket
*sock, struct msghdr *msg,
 goto out_unlock;
 }

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
 skb->protocol = proto;
 skb->dev = dev;
 skb->priority = sk->sk_priority;
@@ -2496,6 +2497,7 @@ static int tpacket_fill_skb(struct packet_sock
*po, struct sk_buff *skb,

 ph.raw = frame;

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
 skb->protocol = proto;
 skb->dev = dev;
 skb->priority = po->sk.sk_priority;
@@ -2805,6 +2807,7 @@ static struct sk_buff *packet_alloc_skb(struct
sock *sk, size_t prepad,
 skb_put(skb, linear);
 skb->data_len = len - linear;
 skb->len += len - linear;
+   skb->ip_summed = CHECKSUM_UNNECESSARY;

 return skb;
  }


I am suspicious that this will break at least some drivers.  I grepped around
for ip_summed, and found this, for instance:

davicom/dm9000.c

	/* The DM9000 is not smart enough to leave fragmented packets alone. */
	if (dm->ip_summed != ip_summed) {
		if (ip_summed == CHECKSUM_NONE)
			iow(dm, DM9000_TCCR, 0);
		else
			iow(dm, DM9000_TCCR, TCCR_IP | TCCR_UDP | TCCR_TCP);
		dm->ip_summed = ip_summed;
	}


It is taking action based on ip_summed == CHECKSUM_NONE, and your change
will probably break that.

I would suggest that we try to make any fix specific only to veth,
at least for now.  A tree-wide audit of drivers is probably required
to safely make the kind of change you propose above.

So, unless you can explain why your change is safe, then I do not plan
to test it.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-25 Thread Ben Greear



On 03/24/2016 10:24 PM, Vijay Pandurangan wrote:

On Fri, Mar 25, 2016 at 1:07 AM, Ben Greear  wrote:

On 03/24/2016 09:45 PM, Vijay Pandurangan wrote:


Actually, maybe they should be set to CHECKSUM_PARTIAL if we want veth
to drop the packets if they have bad checksums before they hit the
application level.



VETH is pretty special in that when you transmit a frame on one
device, it's pair receives it, and unless there is RAM corruption
or bugs in the kernel, then it cannot be corrupted.


Yeah, you're right that that's an optimization. However, I think that
we should first ensure that

a->veth->b

operates exactly like:

a->physical eth 1 -> physical eth 2->b

in all cases.  Once we have that working everywhere we could think
about optimizations.


If we're willing to refactor, we could implement the optimization by
allowing veth devices to know whether their immediate peer is. If a
veth knows it's talking to another veth, it could under some
circumstances elide checksum calculation and verification.  I'm not
sure what abstractions that would break, though. What do you guys
think?


veth ALWAYS transmits to another VETH.  The problem is that when veth is
given a packet to transmit, it is difficult to know where that packet
came from.

And, adding software checksumming to veth for every frame would be a huge
performance hit.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Cong Wang
On Thu, Mar 24, 2016 at 10:13 PM, Ben Greear  wrote:
>
>
> On 03/24/2016 10:06 PM, Cong Wang wrote:
>>
>> On Thu, Mar 24, 2016 at 9:34 PM, Ben Greear 
>> wrote:
>>>
>>>
>>>
>>> On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:


 Oops, I think my last email didn't go through due to an inadvertent
 html attachment from my phone mail client.

 Can you send us a copy of a packet you're sending and/or confirm that
 the IP and UDP4 checksums are set correctly in the packet?

 If those are set right, I think we need to read through the networking
 code again to see why this is broken...
>>>
>>>
>>>
>>> Wireshark decodes the packet as having no checksum errors.
>>>
>>> I think the contents of the packet is correct, but the 'ip_summed'
>>> field is set incorrectly to 'NONE' when transmitting on a raw packet
>>> socket.
>>
>>
>> Yeah, these bugs are all due to the different interpretations of
>> ip_summed on TX path and RX path. I think the following patch
>> should work, if the comments don't mislead me. Could you give
>> it a try?
>>
>> For the long term, we need to unify the meaning of ip_summed
>> on TX path and RX path, or at least translate it in skb_scrub_packet().
>
>
> I can test this tomorrow, but I think it will not work.  I'm not sending raw
> IP frames, I'm sending full ethernet frames.  Socket is PF_PACKET, SOCK_RAW.
>
> Your patch may still be useful for others though?

Here we go:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1ecfa71..ab66080 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1925,6 +1925,7 @@ static int packet_sendmsg_spkt(struct socket
*sock, struct msghdr *msg,
goto out_unlock;
}

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = sk->sk_priority;
@@ -2496,6 +2497,7 @@ static int tpacket_fill_skb(struct packet_sock
*po, struct sk_buff *skb,

ph.raw = frame;

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = po->sk.sk_priority;
@@ -2805,6 +2807,7 @@ static struct sk_buff *packet_alloc_skb(struct
sock *sk, size_t prepad,
skb_put(skb, linear);
skb->data_len = len - linear;
skb->len += len - linear;
+   skb->ip_summed = CHECKSUM_UNNECESSARY;

return skb;
 }


Thanks for testing!


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Vijay Pandurangan
On Fri, Mar 25, 2016 at 1:07 AM, Ben Greear  wrote:
> On 03/24/2016 09:45 PM, Vijay Pandurangan wrote:
>>
>> Actually, maybe they should be set to CHECKSUM_PARTIAL if we want veth
>> to drop the packets if they have bad checksums before they hit the
>> application level.
>
>
> VETH is pretty special in that when you transmit a frame on one
> device, it's pair receives it, and unless there is RAM corruption
> or bugs in the kernel, then it cannot be corrupted.

Yeah, you're right that that's an optimization. However, I think that
we should first ensure that

a->veth->b

operates exactly like:

a->physical eth 1 -> physical eth 2->b

in all cases.  Once we have that working everywhere we could think
about optimizations.


If we're willing to refactor, we could implement the optimization by
allowing veth devices to know what their immediate peer is. If a
veth knows it's talking to another veth, it could under some
circumstances elide checksum calculation and verification.  I'm not
sure what abstractions that would break, though. What do you guys
think?



>
> But, if you are routing frames from the network to veth
> devices, then the original packet could be corrupted, as
> described in your patch.
>
> Probably the right behaviour is to keep the old logic for packets
> originating from sockets, at least.  I am not sure of a good way
> to implement that.
>
> As for testing, I am not sure.  Probably you have to make a good
> effort and then just deal with fallout like what I found.  I would guess
> that any of us who have ever written an interesting patch have also written
> an interesting bug!
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear



On 03/24/2016 10:06 PM, Cong Wang wrote:

On Thu, Mar 24, 2016 at 9:34 PM, Ben Greear  wrote:



On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:


Oops, I think my last email didn't go through due to an inadvertent
html attachment from my phone mail client.

Can you send us a copy of a packet you're sending and/or confirm that
the IP and UDP4 checksums are set correctly in the packet?

If those are set right, I think we need to read through the networking
code again to see why this is broken...



Wireshark decodes the packet as having no checksum errors.

I think the contents of the packet is correct, but the 'ip_summed'
field is set incorrectly to 'NONE' when transmitting on a raw packet
socket.


Yeah, these bugs are all due to the different interpretations of
ip_summed on TX path and RX path. I think the following patch
should work, if the comments don't mislead me. Could you give
it a try?

For the long term, we need to unify the meaning of ip_summed
on TX path and RX path, or at least translate it in skb_scrub_packet().


I can test this tomorrow, but I think it will not work.  I'm not sending raw
IP frames, I'm sending full ethernet frames.  Socket is PF_PACKET, SOCK_RAW.

Your patch may still be useful for others though?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear

On 03/24/2016 09:45 PM, Vijay Pandurangan wrote:

Actually, maybe they should be set to CHECKSUM_PARTIAL if we want veth
to drop the packets if they have bad checksums before they hit the
application level.


VETH is pretty special in that when you transmit a frame on one
device, its pair receives it, and unless there is RAM corruption
or bugs in the kernel, then it cannot be corrupted.

But, if you are routing frames from the network to veth
devices, then the original packet could be corrupted, as
described in your patch.

Probably the right behaviour is to keep the old logic for packets
originating from sockets, at least.  I am not sure of a good way
to implement that.

As for testing, I am not sure.  Probably you have to make a good
effort and then just deal with fallout like what I found.  I would guess
that any of us who have ever written an interesting patch have also written
an interesting bug!

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Cong Wang
On Thu, Mar 24, 2016 at 9:34 PM, Ben Greear  wrote:
>
>
> On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:
>>
>> Oops, I think my last email didn't go through due to an inadvertent
>> html attachment from my phone mail client.
>>
>> Can you send us a copy of a packet you're sending and/or confirm that
>> the IP and UDP4 checksums are set correctly in the packet?
>>
>> If those are set right, I think we need to read through the networking
>> code again to see why this is broken...
>
>
> Wireshark decodes the packet as having no checksum errors.
>
> I think the contents of the packet is correct, but the 'ip_summed'
> field is set incorrectly to 'NONE' when transmitting on a raw packet
> socket.

Yeah, these bugs are all due to the different interpretations of
ip_summed on TX path and RX path. I think the following patch
should work, if the comments don't mislead me. Could you give
it a try?

For the long term, we need to unify the meaning of ip_summed
on TX path and RX path, or at least translate it in skb_scrub_packet().

Thanks.

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 8d22de7..726457e 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -377,7 +377,7 @@ static int raw_send_hdrinc(struct sock *sk, struct
flowi4 *fl4,
iph = ip_hdr(skb);
skb_put(skb, length);

-   skb->ip_summed = CHECKSUM_NONE;
+   skb->ip_summed = CHECKSUM_UNNECESSARY;

	sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4319e65..e6b3e31 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -646,7 +646,7 @@ static int rawv6_send_hdrinc(struct sock *sk,
struct msghdr *msg, int length,
skb_reset_network_header(skb);
iph = ipv6_hdr(skb);

-   skb->ip_summed = CHECKSUM_NONE;
+   skb->ip_summed = CHECKSUM_UNNECESSARY;

skb->transport_header = skb->network_header;
err = memcpy_from_msg(iph, msg, length);


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Vijay Pandurangan
Actually, maybe they should be set to CHECKSUM_PARTIAL if we want veth
to drop the packets if they have bad checksums before they hit the
application level.

On Fri, Mar 25, 2016 at 12:41 AM, Vijay Pandurangan  wrote:
> agreed. It should maybe be set to CHECKSUM_UNNECESSARY. The comment
> seems to imply that it's treated the same as CHECKSUM_NONE but that's
> evidently not true. I think that would fix the checksumming issue but
> I'm fearful it may break something else:
> http://lxr.free-electrons.com/source/include/linux/skbuff.h#L177
>
> I'm really worried there are other equally subtle bugs hidden in this
> code. Do we have any kind of regression test, or any automated way to
> test all possible values on an skb to determine side effects to any
> change here? (I'm new to the kernel so sorry if there's an answer in
> an FAQ somewhere).
>
> If not,
> 1. how should we ensure that our change doesn't break something else?
> 2. Should we audit / simplify the checksum code or come up with a list
> of test cases that covers all uses?
>
>
> On Fri, Mar 25, 2016 at 12:34 AM, Ben Greear  wrote:
>>
>>
>> On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:
>>>
>>> Oops, I think my last email didn't go through due to an inadvertent
>>> html attachment from my phone mail client.
>>>
>>> Can you send us a copy of a packet you're sending and/or confirm that
>>> the IP and UDP4 checksums are set correctly in the packet?
>>>
>>> If those are set right, I think we need to read through the networking
>>> code again to see why this is broken...
>>
>>
>> Wireshark decodes the packet as having no checksum errors.
>>
>> I think the contents of the packet is correct, but the 'ip_summed'
>> field is set incorrectly to 'NONE' when transmitting on a raw packet
>> socket.
>>
>>
>> Thanks,
>> Ben
>>
>> --
>> Ben Greear 
>> Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Vijay Pandurangan
agreed. It should maybe be set to CHECKSUM_UNNECESSARY. The comment
seems to imply that it's treated the same as CHECKSUM_NONE but that's
evidently not true. I think that would fix the checksumming issue but
I'm fearful it may break something else:
http://lxr.free-electrons.com/source/include/linux/skbuff.h#L177

I'm really worried there are other equally subtle bugs hidden in this
code. Do we have any kind of regression test, or any automated way to
test all possible values on an skb to determine side effects to any
change here? (I'm new to the kernel so sorry if there's an answer in
an FAQ somewhere).

If not,
1. how should we ensure that our change doesn't break something else?
2. Should we audit / simplify the checksum code or come up with a list
of test cases that covers all uses?


On Fri, Mar 25, 2016 at 12:34 AM, Ben Greear  wrote:
>
>
> On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:
>>
>> Oops, I think my last email didn't go through due to an inadvertent
>> html attachment from my phone mail client.
>>
>> Can you send us a copy of a packet you're sending and/or confirm that
>> the IP and UDP4 checksums are set correctly in the packet?
>>
>> If those are set right, I think we need to read through the networking
>> code again to see why this is broken...
>
>
> Wireshark decodes the packet as having no checksum errors.
>
> I think the contents of the packet is correct, but the 'ip_summed'
> field is set incorrectly to 'NONE' when transmitting on a raw packet
> socket.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear



On 03/24/2016 06:44 PM, Vijay Pandurangan wrote:

Oops, I think my last email didn't go through due to an inadvertent
html attachment from my phone mail client.

Can you send us a copy of a packet you're sending and/or confirm that
the IP and UDP4 checksums are set correctly in the packet?

If those are set right, I think we need to read through the networking
code again to see why this is broken...


Wireshark decodes the packet as having no checksum errors.

I think the contents of the packet are correct, but the 'ip_summed'
field is set incorrectly to 'NONE' when transmitting on a raw packet
socket.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Vijay Pandurangan
Oops, I think my last email didn't go through due to an inadvertent
html attachment from my phone mail client.

Can you send us a copy of a packet you're sending and/or confirm that
the IP and UDP4 checksums are set correctly in the packet?

If those are set right, I think we need to read through the networking
code again to see why this is broken...



On Thu, Mar 24, 2016 at 9:13 PM, Ben Greear  wrote:
> On 03/24/2016 06:11 PM, Ben Greear wrote:
>>
>> On 03/24/2016 05:06 PM, Ben Greear wrote:
>>>
>>> On 03/24/2016 04:56 PM, Cong Wang wrote:

 On Thu, Mar 24, 2016 at 3:01 PM, Ben Greear 
 wrote:
>
> I have an application that creates two pairs of veth devices.
>
> a <-> b   c <-> d
>
> b and c have a raw packet socket opened on them and I 'bridge' frames
> between b and c to provide network emulation (ie, configurable delay).
>

 IIUC, you create two raw sockets in order to bridge these two veth
 pairs?
 That is, to receive packets on one socket and deliver packets on the
 other?
>>>
>>>
>>> Yes.
>>>
> I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP
> connection
> (using policy based routing to ensure frames are sent on the
> appropriate
> interfaces).
>
> This is user-space only app, and kernel in this case is completely
> unmodified.
>
> The commit below breaks this feature:  UDP frames are sniffed on both a
> and
> d ports
> (in both directions), but the UDP socket does not receive frames.
>
> Using normal ethernet ports, this network emulation feature works fine,
> so
> it is
> specific to VETH.
>
> A similar test with just sending UDP between a single veth pair:  e <->
> f
> works fine.  Maybe it has something to do with raw packets?
>

 Yeah, I have the same feeling. Could you trace kfree_skb() to see
 where these packets are dropped? At UDP layer?
>>>
>>>
>>> Since reverting the patch fixes this, it almost certainly has to be due
>>> to some
>>> checksum checking logic.  Since UDP sockets (between single veth pair)
>>> works, it would appear to be related to my packet bridge, so maybe
>>> it is specific to raw packets and/or sendmmsg api.
>>>
>>> I'll investigate it better tomorrow.
>>
>>
>> So, I found time to poke at it this evening:
>>
>> Sending between two veth pairs, no packet bridge involved.
>
>
> E, to be clear:  I mean sending between two ends of a single veth pair
> here.
>
>>
>> UDP:  ip_summed is 3 (CHECKSUM_PARTIAL)   # Works fine.
>> raw packet frames, custom ether protocol (0x type):  ip_summed is 0
>> (NONE) # Works fine.
>>
>> When I try to send UDP through the veth pairs & pkt bridge, I see this:
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com
>


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear

On 03/24/2016 06:11 PM, Ben Greear wrote:

On 03/24/2016 05:06 PM, Ben Greear wrote:

On 03/24/2016 04:56 PM, Cong Wang wrote:

On Thu, Mar 24, 2016 at 3:01 PM, Ben Greear  wrote:

I have an application that creates two pairs of veth devices.

a <-> b   c <-> d

b and c have a raw packet socket opened on them and I 'bridge' frames
between b and c to provide network emulation (ie, configurable delay).



IIUC, you create two raw sockets in order to bridge these two veth pairs?
That is, to receive packets on one socket and deliver packets on the other?


Yes.


I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP connection
(using policy based routing to ensure frames are sent on the appropriate
interfaces).

This is user-space only app, and kernel in this case is completely
unmodified.

The commit below breaks this feature:  UDP frames are sniffed on both a and
d ports
(in both directions), but the UDP socket does not receive frames.

Using normal ethernet ports, this network emulation feature works fine, so
it is
specific to VETH.

A similar test with just sending UDP between a single veth pair:  e <-> f
works fine.  Maybe it has something to do with raw packets?



Yeah, I have the same feeling. Could you trace kfree_skb() to see
where these packets are dropped? At UDP layer?


Since reverting the patch fixes this, it almost certainly has to be due to some
checksum checking logic.  Since UDP sockets (between single veth pair)
works, it would appear to be related to my packet bridge, so maybe
it is specific to raw packets and/or sendmmsg api.

I'll investigate it better tomorrow.


So, I found time to poke at it this evening:

Sending between two veth pairs, no packet bridge involved.


E, to be clear:  I mean sending between two ends of a single veth pair here.



UDP:  ip_summed is 3 (CHECKSUM_PARTIAL)   # Works fine.
raw packet frames, custom ether protocol (0x type):  ip_summed is 0 (NONE) 
# Works fine.

When I try to send UDP through the veth pairs & pkt bridge, I see this:


Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear

On 03/24/2016 05:06 PM, Ben Greear wrote:

On 03/24/2016 04:56 PM, Cong Wang wrote:

On Thu, Mar 24, 2016 at 3:01 PM, Ben Greear  wrote:

I have an application that creates two pairs of veth devices.

a <-> b   c <-> d

b and c have a raw packet socket opened on them and I 'bridge' frames
between b and c to provide network emulation (ie, configurable delay).



IIUC, you create two raw sockets in order to bridge these two veth pairs?
That is, to receive packets on one socket and deliver packets on the other?


Yes.


I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP connection
(using policy based routing to ensure frames are sent on the appropriate
interfaces).

This is user-space only app, and kernel in this case is completely
unmodified.

The commit below breaks this feature:  UDP frames are sniffed on both a and
d ports
(in both directions), but the UDP socket does not receive frames.

Using normal ethernet ports, this network emulation feature works fine, so
it is
specific to VETH.

A similar test with just sending UDP between a single veth pair:  e <-> f
works fine.  Maybe it has something to do with raw packets?



Yeah, I have the same feeling. Could you trace kfree_skb() to see
where these packets are dropped? At UDP layer?


Since reverting the patch fixes this, it almost certainly has to be due to some
checksum checking logic.  Since UDP sockets (between single veth pair)
works, it would appear to be related to my packet bridge, so maybe
it is specific to raw packets and/or sendmmsg api.

I'll investigate it better tomorrow.


So, I found time to poke at it this evening:

Sending between two veth pairs, no packet bridge involved.

UDP:  ip_summed is 3 (CHECKSUM_PARTIAL)   # Works fine.
raw packet frames, custom ether protocol (0x type):  ip_summed is 0 (NONE)   # Works fine.
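
For decoding those numeric values (here and in the trace below), the ip_summed constants defined in include/linux/skbuff.h are:

	/* include/linux/skbuff.h (descriptions paraphrased) */
	#define CHECKSUM_NONE           0  /* no checksum info; the stack must verify on receive */
	#define CHECKSUM_UNNECESSARY    1  /* checksum already verified, or not needed */
	#define CHECKSUM_COMPLETE       2  /* device supplied a full-packet checksum in skb->csum */
	#define CHECKSUM_PARTIAL        3  /* checksum not yet computed; to be filled in on transmit */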

When I try to send UDP through the veth pairs & pkt bridge, I see this:

(pkt-bridge connects to the 'b' side of the veth pairs)

Mar 24 17:59:34 ben-ota-1 kernel: dev: rddVR0  rcv: rddVR0b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:34 ben-ota-1 kernel: dev: rddVR1b  rcv: rddVR1  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:34 ben-ota-1 kernel: dev: rddVR1  rcv: rddVR1b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:34 ben-ota-1 kernel: dev: rddVR0b  rcv: rddVR0  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:34 ben-ota-1 kernel: dev: rddVR0  rcv: rddVR0b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:34 ben-ota-1 kernel: dev: rddVR1b  rcv: rddVR1  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR1  rcv: rddVR1b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR0b  rcv: rddVR0  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR0  rcv: rddVR0b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR1b  rcv: rddVR1  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR1  rcv: rddVR1b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR0b  rcv: rddVR0  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR0  rcv: rddVR0b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR1b  rcv: rddVR1  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR1  rcv: rddVR1b  ip_summed: 3  rcv-features: 0x184074011e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR0b  rcv: rddVR0  ip_summed: 0  rcv-features: 0x184075b59e9
Mar 24 17:59:35 ben-ota-1 kernel: dev: rddVR0  rcv: rddVR0b  ip_summed: 3  rcv-features: 0x184074011e9


I am guessing the issue is that when my pkt bridge sends a raw frame that is
actually a UDP packet, the fact that it has ip_summed == 0 in the kernel
causes the frame to be dropped.


I modified veth.c like this for this test:

static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct veth_priv *priv = netdev_priv(dev);
	struct net_device *rcv;
	int length = skb->len;

	rcu_read_lock();
	rcv = rcu_dereference(priv->peer);
	if (unlikely(!rcv)) {
		kfree_skb(skb);
		goto drop;
	}

	pr_err("dev: %s  rcv: %s  ip_summed: %d  rcv-features: 0x%llx\n",
	       dev->name, rcv->name, skb->ip_summed, (unsigned long long)rcv->features);
#if 0
	/* don't change ip_summed == CHECKSUM_PARTIAL, as that
	 * will cause bad checksum on forwarded packets
	 */
	if (skb->ip_summed == CHECKSUM_NONE &&
	    rcv->features & NETIF_F_RXCSUM)
		skb->ip_summed = CHECKSUM_UNNECESSARY;
#endif


Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear

On 03/24/2016 04:56 PM, Cong Wang wrote:

On Thu, Mar 24, 2016 at 3:01 PM, Ben Greear  wrote:

I have an application that creates two pairs of veth devices.

a <-> b   c <-> d

b and c have a raw packet socket opened on them and I 'bridge' frames
between b and c to provide network emulation (ie, configurable delay).



IIUC, you create two raw sockets in order to bridge these two veth pairs?
That is, to receive packets on one socket and deliver packets on the other?


Yes.


I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP connection
(using policy based routing to ensure frames are sent on the appropriate
interfaces).

This is user-space only app, and kernel in this case is completely
unmodified.

The commit below breaks this feature:  UDP frames are sniffed on both a and
d ports
(in both directions), but the UDP socket does not receive frames.

Using normal ethernet ports, this network emulation feature works fine, so
it is
specific to VETH.

A similar test with just sending UDP between a single veth pair:  e <-> f
works fine.  Maybe it has something to do with raw packets?



Yeah, I have the same feeling. Could you trace kfree_skb() to see
where these packets are dropped? At UDP layer?


Since reverting the patch fixes this, it almost certainly has to be due to some
checksum checking logic.  Since UDP sockets (between single veth pair)
works, it would appear to be related to my packet bridge, so maybe
it is specific to raw packets and/or sendmmsg api.

I'll investigate it better tomorrow.

Thanks,
Ben



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Cong Wang
On Thu, Mar 24, 2016 at 3:01 PM, Ben Greear  wrote:
> I have an application that creates two pairs of veth devices.
>
> a <-> b   c <-> d
>
> b and c have a raw packet socket opened on them and I 'bridge' frames
> between b and c to provide network emulation (ie, configurable delay).
>

IIUC, you create two raw sockets in order to bridge these two veth pairs?
That is, to receive packets on one socket and deliver packets on the other?


>
> I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP connection
> (using policy based routing to ensure frames are sent on the appropriate
> interfaces).
>
> This is user-space only app, and kernel in this case is completely
> unmodified.
>
> The commit below breaks this feature:  UDP frames are sniffed on both a and
> d ports
> (in both directions), but the UDP socket does not receive frames.
>
> Using normal ethernet ports, this network emulation feature works fine, so
> it is
> specific to VETH.
>
> A similar test with just sending UDP between a single veth pair:  e <-> f
> works fine.  Maybe it has something to do with raw packets?
>

Yeah, I have the same feeling. Could you trace kfree_skb() to see
where these packets are dropped? At UDP layer?

Thanks.


Re: veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear

On 03/24/2016 03:49 PM, Vijay Pandurangan wrote:

Hmm that's troubling. We tested various UDP configurations but I don't think we 
tested this setup. Do you have code or more specific instructions we can use to
replicate this bug?


The user-space app is not open source, and the routing setup is
a bit tricky as well.

I'm not certain it is specific to UDP, probably it is not.

I'm using sendmmsg to transmit frames on the packet socket, so
possibly something in that path is key.
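
For context, a minimal sketch of the transmit path being described, i.e., a PF_PACKET/SOCK_RAW socket bound to one interface with frames pushed out in batches via sendmmsg(); the interface name, batch size, and frame buffers are placeholders rather than anything from Ben's app:

	#define _GNU_SOURCE             /* sendmmsg(), struct mmsghdr */
	#include <arpa/inet.h>          /* htons */
	#include <linux/if_ether.h>     /* ETH_P_ALL */
	#include <linux/if_packet.h>    /* struct sockaddr_ll */
	#include <net/if.h>             /* if_nametoindex */
	#include <string.h>
	#include <sys/socket.h>
	#include <sys/uio.h>            /* struct iovec */

	/* Send a batch of pre-built Ethernet frames on one interface with a
	 * single sendmmsg() call.  Error handling is minimal.
	 */
	static int send_raw_batch(const char *ifname, unsigned char **frames,
				  const size_t *lens, unsigned int nframes)
	{
		struct sockaddr_ll sll = {
			.sll_family   = AF_PACKET,
			.sll_protocol = htons(ETH_P_ALL),
			.sll_ifindex  = (int)if_nametoindex(ifname),
		};
		struct iovec iov[64];
		struct mmsghdr msgs[64];
		unsigned int i;
		int fd;

		fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
		if (fd < 0 || bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0)
			return -1;

		memset(msgs, 0, sizeof(msgs));
		for (i = 0; i < nframes && i < 64; i++) {
			iov[i].iov_base = frames[i];
			iov[i].iov_len  = lens[i];
			msgs[i].msg_hdr.msg_iov    = &iov[i];
			msgs[i].msg_hdr.msg_iovlen = 1;
		}

		/* One syscall queues the whole batch on the bound interface. */
		return sendmmsg(fd, msgs, i, 0);
	}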

Truth is, I can debug this with printk and see what is going
on if you have no immediate ideas what is going wrong.

Thanks,
Ben




On Mar 24, 2016 6:02 PM, "Ben Greear"  wrote:

I have an application that creates two pairs of veth devices.

a <-> b   c <-> d

b and c have a raw packet socket opened on them and I 'bridge' frames
between b and c to provide network emulation (ie, configurable delay).


I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP connection
(using policy based routing to ensure frames are sent on the appropriate
interfaces).

This is user-space only app, and kernel in this case is completely 
unmodified.

The commit below breaks this feature:  UDP frames are sniffed on both a and 
d ports
(in both directions), but the UDP socket does not receive frames.

Using normal ethernet ports, this network emulation feature works fine, so 
it is
specific to VETH.

A similar test with just sending UDP between a single veth pair:  e <-> f
works fine.  Maybe it has something to do with raw packets?


The patch below is the culprit:


[greearb@ben-dt3 linux-2.6]$ git bisect bad
ce8c839b74e3017996fad4e1b7ba2e2625ede82f is the first bad commit
commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f
Author: Vijay Pandurangan 
Date:   Fri Dec 18 14:34:59 2015 -0500

 veth: don’t modify ip_summed; doing so treats packets with bad 
checksums as good.

 Packets that arrive from real hardware devices have ip_summed ==
 CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
 CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
 current version of veth will replace CHECKSUM_NONE with
 CHECKSUM_UNNECESSARY, which causes corrupt packets routed from 
hardware to
 a veth device to be delivered to the application. This caused 
applications
 at Twitter to receive corrupt data when network hardware was corrupting
 packets.

...

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0ef4a5a..ba21d07 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -117,12 +117,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
struct net_device *dev)
 kfree_skb(skb);
 goto drop;
 }
-   /* don't change ip_summed == CHECKSUM_PARTIAL, as that
-* will cause bad checksum on forwarded packets
-*/
-   if (skb->ip_summed == CHECKSUM_NONE &&
-   rcv->features & NETIF_F_RXCSUM)
-   skb->ip_summed = CHECKSUM_UNNECESSARY;

 if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
 struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);


Any suggestions for how to fix this so that I get the old working behaviour 
and
the bug this patch was trying to fix is also resolved?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc http://www.candelatech.com




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



veth regression with "don’t modify ip_summed; doing so treats packets with bad checksums as good."

2016-03-24 Thread Ben Greear

I have an application that creates two pairs of veth devices.

a <-> b   c <-> d

b and c have a raw packet socket opened on them and I 'bridge' frames
between b and c to provide network emulation (ie, configurable delay).


I put IP 1.1.1.1/24 on a, 1.1.1.2/24 on d, and then create a UDP connection
(using policy based routing to ensure frames are sent on the appropriate
interfaces).

This is a user-space only app, and the kernel in this case is completely unmodified.

The commit below breaks this feature:  UDP frames are sniffed on both a and d
ports (in both directions), but the UDP socket does not receive frames.

Using normal ethernet ports, this network emulation feature works fine, so it is
specific to VETH.

A similar test with just sending UDP between a single veth pair:  e <-> f
works fine.  Maybe it has something to do with raw packets?


The patch below is the culprit:


[greearb@ben-dt3 linux-2.6]$ git bisect bad
ce8c839b74e3017996fad4e1b7ba2e2625ede82f is the first bad commit
commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f
Author: Vijay Pandurangan 
Date:   Fri Dec 18 14:34:59 2015 -0500

veth: don’t modify ip_summed; doing so treats packets with bad checksums as 
good.

Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.

...

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0ef4a5a..ba21d07 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -117,12 +117,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
kfree_skb(skb);
goto drop;
}
-   /* don't change ip_summed == CHECKSUM_PARTIAL, as that
-* will cause bad checksum on forwarded packets
-*/
-   if (skb->ip_summed == CHECKSUM_NONE &&
-   rcv->features & NETIF_F_RXCSUM)
-   skb->ip_summed = CHECKSUM_UNNECESSARY;

if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);


Any suggestions for how to fix this so that I get the old working behaviour and
the bug this patch was trying to fix is also resolved?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com