Problem with fragmented packets on tun/tap interface

2015-07-31 Thread Prashant Upadhyaya
Hi,

I have a Linux user-space application which gets data from an
interface (e.g. eth0) using a packet socket. The application has a fast
path and a slow path. In the fast path, packets are processed by the
application and sent out via the packet socket. Certain packets need
processing by the Linux IP stack -- this constitutes the slow path. I
use a tun/tap interface to inject a packet into the kernel when it
needs slow-path processing. When the kernel responds, I read the
packets back from the tap and send them out via the packet socket. I
use iptables rules to drop the packets on entry from the interface so
that they are not processed by the kernel directly (because I read them
via the packet socket and then inject them into the kernel via the tap
interface):

iptables -A INPUT -i eth0 -j DROP
iptables -A FORWARD -i eth0 -j DROP


Now the above mechanism works very well for me except when the slow
path encounters fragmented IP packets. When I inject the fragmented IP
packets into the kernel via the tap interface, the kernel does not
respond (e.g. to a ping bigger than the MTU size). I have checked with
tcpdump on the tap that all the fragments are injected into the kernel.
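
For scale, the number of fragments produced by a given ping size can be
worked out directly. The sketch below is only illustrative: it assumes a
1500-byte MTU, a 20-byte IPv4 header without options and the 8-byte ICMP
echo header, and fragment_sizes is a made-up helper name, not part of
the application described here.

```python
IP_HEADER_LEN = 20    # assumption: IPv4 header without options
ICMP_HEADER_LEN = 8   # ICMP echo header

def fragment_sizes(ping_size, mtu=1500):
    """Return the IP payload bytes carried by each fragment of one echo request."""
    remaining = ping_size + ICMP_HEADER_LEN
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER_LEN) // 8 * 8
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes

# Illustrative ping sizes (the value passed to "ping -s").
for size in (1472, 1473, 10000):
    print(size, len(fragment_sizes(size)), fragment_sizes(size))
```

With these assumptions, a ping of up to 1472 bytes fits in a single
packet, and anything larger reaches the slow path as two or more
fragments.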

Strangely enough, the same use case works if I put delays (usleep) in
two places in my application:

1. Just before writing the packet to the tap (injection into the kernel)
2. Just after reading the kernel response from the tap and just before
sending the packet out using the packet socket.

The delays work for me but are clearly not good for the performance of
the slow path. More importantly, I am looking for the fundamental
reason why it works with the delays and not without them. The issue is
reproducible with a big ping on kernel 3.11.10-301.fc20.x86_64.

Regards
-Prashant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem with fragmented packets on tun/tap interface

2015-08-13 Thread Prashant Upadhyaya
On Fri, Jul 31, 2015 at 4:51 PM, Eric Dumazet wrote:
> On Fri, 2015-07-31 at 16:42 +0530, Prashant Upadhyaya wrote:
>> On Fri, Jul 31, 2015 at 1:26 PM, Eric Dumazet  wrote:
>> > On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:
>> >
>> >> The delays work for me but is clearly not good for the performance of
>> >> the slow path. And more importantly, I was looking for a fundamental
>> >> reason regarding why it works with delays and why not without it. The
>> >> issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)
>> >
>> > How big ping needs to be to reproduce the problem ?
>> >
>> >
>>
>> If the MTU is 1500, I start getting problems anywhere starting from
>> 2900 bytes and surely comes when further big pings are used eg. 10 K.
>> (ping <address> -s <size>, eg. ping 10.3.10.244 -s 1)
>> And the big pings do work, as I said, with the delay hack.
>
> It might help trying this while you receive such frags :
>
> perf record -a -g skb:kfree_skb sleep 10
>
> ...
>
> perf report
>
>
>

Hi,

I think I have a clue to the root cause of my issue, but I do not know
a solution.
Let me describe what I think the problem is.

Fragmented packets enter the kernel through eth0 and the kernel
starts reassembling them.
Simultaneously, my packet socket implementation also injects the very
same packets into the kernel via the tap. The kernel sees them as
overlapping fragments during reassembly and drops the packets injected
via the tap.
Eventually, when reassembly completes inside the kernel for the
fragments which entered via eth0, the whole packet gets dropped due to
the iptables rules that I have set on eth0.
So naturally there is no response to the bigger ping, because
everything got dropped one way or the other.
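
The theory above can be illustrated with a toy model. This is a sketch
only, not kernel code: ToyReassembler and make_header are hypothetical
names, the source address 10.3.10.1 is invented, and the model simply
assumes that fragments are queued per (source, destination, protocol,
Identification) and that a second fragment at an already seen offset is
rejected.

```python
import struct
from collections import defaultdict

def reassembly_key(ip_header):
    """Extract (src, dst, protocol, identification) from a raw IPv4 header."""
    ident = struct.unpack_from("!H", ip_header, 4)[0]
    protocol = ip_header[9]
    return (bytes(ip_header[12:16]), bytes(ip_header[16:20]), protocol, ident)

def fragment_offset(ip_header):
    """Fragment offset in bytes (the header field counts 8-byte units)."""
    flags_and_offset = struct.unpack_from("!H", ip_header, 6)[0]
    return (flags_and_offset & 0x1FFF) * 8

class ToyReassembler:
    """Toy stand-in for a reassembly unit; NOT how the kernel is implemented."""

    def __init__(self):
        self.queues = defaultdict(set)  # key -> offsets already queued

    def submit(self, ip_header):
        key = reassembly_key(ip_header)
        offset = fragment_offset(ip_header)
        if offset in self.queues[key]:
            return "overlap"
        self.queues[key].add(offset)
        return "queued"

def make_header(ident, offset_units, more_fragments):
    """Build a 20-byte ICMP/IPv4 header for the demo (checksum left at zero)."""
    flags_and_offset = (0x2000 if more_fragments else 0) | offset_units
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, ident, flags_and_offset,
                       64, 1, 0, bytes([10, 3, 10, 1]), bytes([10, 3, 10, 244]))

unit = ToyReassembler()
first = make_header(ident=0x1234, offset_units=0, more_fragments=True)
print(unit.submit(first))  # copy that arrived on eth0
print(unit.submit(first))  # identical copy injected via the tap
```

In this model the second submission lands in the same queue as the
first, which is the collision the theory describes.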

When I do introduce the delays (and it turns out that the delay that
matters is the one before injecting via the tap), the kernel has already
completed the reassembly of the packets from eth0 (during the delay I
introduce before submission on the tap), and then the submission via the
tap works well because it undergoes a fresh reassembly (and of course it
does not get dropped, because the iptables drop rule is only on eth0).

Now then, the question is -- how do I prevent the kernel from trying
to reassemble the packets arriving on eth0, and instead drop them right
away, before reassembly is attempted? This way the same packet injected
via the tap would be the only one undergoing reassembly and hopefully it
would work.

Regards
-Prashant


Re: Problem with fragmented packets on tun/tap interface

2015-08-13 Thread Eric Dumazet
On Thu, 2015-08-13 at 12:52 +0530, Prashant Upadhyaya wrote:

> 
> Hi,
> 
> I think I have a clue to the root cause of my issue, but I do not know
> a solution.
> Let me describe what I think is the problem.
> 
> Fragmented packets enter into the kernel through eth0 and the kernel
> starts assembling them.
> Simultaneously, my packet socket implementation also injects the very
> same packets into the kernel via the tap. The kernel sees them as
> overlapped packets during assembly and drops the packets injected via
> the tap.
> Eventually when the assembly gets complete inside kernel for all the
> packets which entered via eth0, the whole packet gets dropped due to
> the iptables rules that I have set on eth0.
> So naturally there is no response to the bigger ping, because
> everything got dropped one way or the other.
> 
> When I do introduce the delays (and it turns out that the delay that
> matters is when injecting via tap), the kernel has already completed
> the assembly of the packets via eth0 (during the delay I introduce for
> submission on tap), and then the submission via tap works well because
> it undergoes a fresh assembly (and ofcourse it does not get dropped
> because iptables drop rule is only on eth0)
> 
> Now then, the question is -- how do I prevent the kernel from trying
> to assemble the packets arriving on eth0 and drop them right away even
> before assembly is attempted. This way the same packet injected via
> the tap would be the only one undergoing assembly and hopefully it
> would work.
> 

Nice theory ! 

What kind of iptables rule do you have to drop packets coming on eth0 ?

Have you tried to install this rule in raw table, PREROUTING hook ?

This should work, because the defrag is attempted from
ip_local_deliver() [after the raw table has given its verdict], not
from ip_rcv().

iptables -t raw -I PREROUTING -i eth0 -j DROP






Re: Problem with fragmented packets on tun/tap interface

2015-08-19 Thread Prashant Upadhyaya
On Thu, Aug 13, 2015 at 7:48 PM, Eric Dumazet wrote:
> On Thu, 2015-08-13 at 12:52 +0530, Prashant Upadhyaya wrote:
>
>>
>> Hi,
>>
>> I think I have a clue to the root cause of my issue, but I do not know
>> a solution.
>> Let me describe what I think is the problem.
>>
>> Fragmented packets enter into the kernel through eth0 and the kernel
>> starts assembling them.
>> Simultaneously, my packet socket implementation also injects the very
>> same packets into the kernel via the tap. The kernel sees them as
>> overlapped packets during assembly and drops the packets injected via
>> the tap.
>> Eventually when the assembly gets complete inside kernel for all the
>> packets which entered via eth0, the whole packet gets dropped due to
>> the iptables rules that I have set on eth0.
>> So naturally there is no response to the bigger ping, because
>> everything got dropped one way or the other.
>>
>> When I do introduce the delays (and it turns out that the delay that
>> matters is when injecting via tap), the kernel has already completed
>> the assembly of the packets via eth0 (during the delay I introduce for
>> submission on tap), and then the submission via tap works well because
>> it undergoes a fresh assembly (and ofcourse it does not get dropped
>> because iptables drop rule is only on eth0)
>>
>> Now then, the question is -- how do I prevent the kernel from trying
>> to assemble the packets arriving on eth0 and drop them right away even
>> before assembly is attempted. This way the same packet injected via
>> the tap would be the only one undergoing assembly and hopefully it
>> would work.
>>
>
> Nice theory !
>
> What kind of iptables rule do you have to drop packets coming on eth0 ?
>
> Have you tried to install this rule in raw table, PREROUTING hook ?
>
> This should work, because the defrag is attempted from
> ip_local_deliver() [ after raw table has given its verdict] , not from
> ip_rcv().
>
> iptables -t raw -I PREROUTING -i eth0 -j DROP
>
>
>
>

Hi Eric,

For some reason, dropping in the raw table does not work for my use
case, though I recognize that, given how the raw table operates and my
theory of the problem, it is the apparent solution.

I think the reason is that I use packet sockets with the defrag option
on, so that the right queue can be selected for load balancing purposes.

Anyway, not disappointed with the above, I stuck to my theory and
tried a simple approach. To tie-break the reassembly/defrag done by
the kernel between the packets from eth0 and the packets submitted
from the tap (via the application), I made a small change in the
application. In the app, I detect that a packet is fragmented, bump up
the 'Identification' field in the IP header, re-checksum the IP header
and then submit the packet to the tap. Since reassembly/defrag is done
on the basis of the srcip, destip, protocol and Identification field
tuple from the IP header, I expected it to work, and it does!
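
A minimal sketch of the workaround described above, operating on a raw
IPv4 packet held in a bytes object. This is not the application's actual
code: the function names and the default delta of 1 are assumptions made
for illustration.

```python
import struct

def ipv4_checksum(header):
    """One's-complement sum of the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def is_fragment(packet):
    """True if the More Fragments flag is set or the fragment offset is non-zero."""
    flags_and_offset = struct.unpack_from("!H", packet, 6)[0]
    return bool(flags_and_offset & 0x2000) or (flags_and_offset & 0x1FFF) != 0

def bump_identification(packet, delta=1):
    """Return a copy of the packet with Identification bumped and the
    header checksum recomputed. Unfragmented packets are returned unchanged."""
    if not is_fragment(packet):
        return bytes(packet)
    header_len = (packet[0] & 0x0F) * 4          # IHL is in 32-bit words
    header = bytearray(packet[:header_len])
    ident = struct.unpack_from("!H", header, 4)[0]
    struct.pack_into("!H", header, 4, (ident + delta) & 0xFFFF)
    struct.pack_into("!H", header, 10, 0)        # zero the checksum field first
    struct.pack_into("!H", header, 10, ipv4_checksum(bytes(header)))
    return bytes(header) + bytes(packet[header_len:])
```

The same delta has to be applied to every fragment of a datagram, so
that the fragments still share one Identification value and reassemble
together once injected via the tap.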

So there we are, I have a nice little solution in place which suits me.


Regards
-Prashant


Re: Problem with fragmented packets on tun/tap interface

2015-08-19 Thread Eric Dumazet
On Wed, 2015-08-19 at 15:44 +0530, Prashant Upadhyaya wrote:


> Hi Eric,
> 
> For some reason, the dropping in the raw table does not work for me
> for the usecase, though I recognize that the raw table operations
> theory, when matched with my usecase theory, is the apparent solution.
> 
> I think the reason is that I use packet sockets with defrag option on
> so that it can select the right queue for load balancing purposes.
> 
> Anyway, not disappointed with the above, I stuck to my theory and
> tried a simple approach. To tie-break the reassembly/defrag done by
> the kernel from the packets from the eth0 and the packets submitted
> from tap (via application), I made a small change in the application.
> I detected that the packets are fragmented in the app, and bumped up
> the 'Identification' field in the IP header and re-checksummed the IP
> header and then submitted it to tap. Since reassembly/defrag is done
> on the basis of srcip, destip, protocol and Identification field
> tupple from IP header, I expected it to work and it does !
> 
> So there we are, I have a nice little solution in place which suits me.

Another idea would have been to put your tap device and ethernet device
in different namespaces, as the defrag unit is namespace aware.

Looks like eth0 could be put in a completely new namespace as it holds
no IP address ?

ip netns add eth0ns
ip link set eth0 netns eth0ns




Re: Problem with fragmented packets on tun/tap interface

2015-07-31 Thread Eric Dumazet
On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:

> The delays work for me but is clearly not good for the performance of
> the slow path. And more importantly, I was looking for a fundamental
> reason regarding why it works with delays and why not without it. The
> issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)

How big ping needs to be to reproduce the problem ?




Re: Problem with fragmented packets on tun/tap interface

2015-07-31 Thread Eric Dumazet
On Fri, 2015-07-31 at 16:42 +0530, Prashant Upadhyaya wrote:
> On Fri, Jul 31, 2015 at 1:26 PM, Eric Dumazet wrote:
> > On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:
> >
> >> The delays work for me but is clearly not good for the performance of
> >> the slow path. And more importantly, I was looking for a fundamental
> >> reason regarding why it works with delays and why not without it. The
> >> issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)
> >
> > How big ping needs to be to reproduce the problem ?
> >
> >
> 
> If the MTU is 1500, I start getting problems anywhere starting from
> 2900 bytes and surely comes when further big pings are used eg. 10 K.
> (ping <address> -s <size>, eg. ping 10.3.10.244 -s 1)
> And the big pings do work, as I said, with the delay hack.

It might help trying this while you receive such frags :

perf record -a -g skb:kfree_skb sleep 10

...

perf report


