Problem with fragmented packets on tun/tap interface
Hi,

I have a Linux user-space application which gets data from an interface (e.g. eth0) using a packet socket. The application has a fast path and a slow path. In the fast path, packets are processed by the application and sent out via the packet socket. Certain packets need processing by the Linux IP stack; this constitutes the slow path. I use a tun/tap interface to inject a packet into the kernel when it needs slow path processing. When the kernel responds, I read the packets back from the tap and send them out via the packet socket.

I use iptables rules to drop the packets on entry from the interface so that they are not processed by the kernel directly (because I read them via the packet socket and then inject them into the kernel via the tap interface):

iptables -A INPUT -i eth0 -j DROP
iptables -A FORWARD -i eth0 -j DROP

The above mechanism works very well for me except when the slow path encounters fragmented IP packets. When I inject fragmented IP packets into the kernel via the tap interface, the kernel does not respond (e.g. for a ping bigger than the MTU size). I have checked with tcpdump on the tap that I have injected all the fragments into the kernel.

Strangely enough, the same use case works if I put delays (usleep) at two places in my application:

1. Just before writing the packet to the tap (injection into the kernel)
2. Just after reading the kernel response from the tap and just before sending the packet out using the packet socket.

The delays work for me but are clearly not good for the performance of the slow path. More importantly, I am looking for the fundamental reason why it works with the delays and not without them. The issue is reproducible with a big ping (3.11.10-301.fc20.x86_64).

Regards
-Prashant
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with fragmented packets on tun/tap interface
On Fri, Jul 31, 2015 at 4:51 PM, Eric Dumazet wrote:
> On Fri, 2015-07-31 at 16:42 +0530, Prashant Upadhyaya wrote:
>> On Fri, Jul 31, 2015 at 1:26 PM, Eric Dumazet wrote:
>> > On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:
>> >
>> >> The delays work for me but is clearly not good for the performance of
>> >> the slow path. And more importantly, I was looking for a fundamental
>> >> reason regarding why it works with delays and why not without it. The
>> >> issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)
>> >
>> > How big ping needs to be to reproduce the problem ?
>> >
>>
>> If the MTU is 1500, I start getting problems anywhere starting from
>> 2900 bytes and surely comes when further big pings are used eg. 10 K.
>> (ping -s eg. ping 10.3.10.244 -s 1)
>> And the big pings do work, as I said, with the delay hack.
>
> It might help trying this while you receive such frags :
>
> perf record -a -g -e skb:kfree_skb sleep 10
>
> ...
>
> perf report
>

Hi,

I think I have a clue to the root cause of my issue, but I do not know a solution. Let me describe what I think is the problem.

Fragmented packets enter the kernel through eth0 and the kernel starts reassembling them. Simultaneously, my packet socket implementation injects the very same packets into the kernel via the tap. The kernel sees them as overlapping fragments during reassembly and drops the packets injected via the tap. Eventually, when reassembly completes inside the kernel for the packets which entered via eth0, the whole packet gets dropped due to the iptables rules that I have set on eth0. So naturally there is no response to the bigger ping, because everything got dropped one way or the other.

When I do introduce the delays (and it turns out that the delay that matters is the one before injecting via the tap), the kernel has already completed the reassembly of the packets from eth0 during the delay, and then the submission via the tap works well because it undergoes a fresh reassembly (and of course it does not get dropped, because the iptables drop rule is only on eth0).

Now then, the question is: how do I prevent the kernel from trying to reassemble the packets arriving on eth0, and instead drop them right away before reassembly is even attempted? This way the same packet injected via the tap would be the only one undergoing reassembly, and hopefully it would work.

Regards
-Prashant
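The collision described above can be illustrated with a minimal sketch, assuming raw IPv4 headers without the Ethernet header. The helper names (build_header, frag_key, is_fragment) and the addresses are hypothetical and are not taken from the application in this thread. It only shows that a fragment seen on eth0 and its copy written to the tap carry the same (source, destination, protocol, Identification) tuple; it does not model what the kernel does with them.

```python
import socket
import struct

IP_MF = 0x2000        # "more fragments" flag
IP_OFFMASK = 0x1FFF   # fragment offset, in units of 8 bytes


def build_header(src, dst, ident, flags_off, proto=1):
    """Hypothetical helper: pack a 20-byte IPv4 header (checksum left at 0)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, ident, flags_off,
                       64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


def frag_key(ip_packet):
    """Return (src, dst, protocol, identification) of an IPv4 packet."""
    (_vihl, _tos, _len, ident, _flags_off, _ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", ip_packet[:20])
    return (socket.inet_ntoa(src), socket.inet_ntoa(dst), proto, ident)


def is_fragment(ip_packet):
    """True if MF is set or the fragment offset is non-zero."""
    flags_off = struct.unpack("!H", ip_packet[6:8])[0]
    return bool(flags_off & (IP_MF | IP_OFFMASK))


# The frame read from eth0 and the copy written to the tap are byte-identical,
# so both map to the same key.
from_eth0 = build_header("10.3.10.1", "10.3.10.244", 0x1234, IP_MF)
to_tap = bytes(from_eth0)
print(frag_key(from_eth0) == frag_key(to_tap))
```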
Re: Problem with fragmented packets on tun/tap interface
On Thu, 2015-08-13 at 12:52 +0530, Prashant Upadhyaya wrote:
>
> Hi,
>
> I think I have a clue to the root cause of my issue, but I do not know
> a solution.
> Let me describe what I think is the problem.
>
> Fragmented packets enter into the kernel through eth0 and the kernel
> starts assembling them.
> Simultaneously, my packet socket implementation also injects the very
> same packets into the kernel via the tap. The kernel sees them as
> overlapped packets during assembly and drops the packets injected via
> the tap.
> Eventually when the assembly gets complete inside kernel for all the
> packets which entered via eth0, the whole packet gets dropped due to
> the iptables rules that I have set on eth0.
> So naturally there is no response to the bigger ping, because
> everything got dropped one way or the other.
>
> When I do introduce the delays (and it turns out that the delay that
> matters is when injecting via tap), the kernel has already completed
> the assembly of the packets via eth0 (during the delay I introduce for
> submission on tap), and then the submission via tap works well because
> it undergoes a fresh assembly (and ofcourse it does not get dropped
> because iptables drop rule is only on eth0)
>
> Now then, the question is -- how do I prevent the kernel from trying
> to assemble the packets arriving on eth0 and drop them right away even
> before assembly is attempted. This way the same packet injected via
> the tap would be the only one undergoing assembly and hopefully it
> would work.
>

Nice theory!

What kind of iptables rule do you have to drop packets coming in on eth0?

Have you tried to install this rule in the raw table, PREROUTING hook?

This should work, because the defrag is attempted from ip_local_deliver() [after the raw table has given its verdict], not from ip_rcv().

iptables -t raw -I PREROUTING -i eth0 -j DROP
Re: Problem with fragmented packets on tun/tap interface
On Thu, Aug 13, 2015 at 7:48 PM, Eric Dumazet wrote:
> On Thu, 2015-08-13 at 12:52 +0530, Prashant Upadhyaya wrote:
>
>>
>> Hi,
>>
>> I think I have a clue to the root cause of my issue, but I do not know
>> a solution.
>> Let me describe what I think is the problem.
>>
>> Fragmented packets enter into the kernel through eth0 and the kernel
>> starts assembling them.
>> Simultaneously, my packet socket implementation also injects the very
>> same packets into the kernel via the tap. The kernel sees them as
>> overlapped packets during assembly and drops the packets injected via
>> the tap.
>> Eventually when the assembly gets complete inside kernel for all the
>> packets which entered via eth0, the whole packet gets dropped due to
>> the iptables rules that I have set on eth0.
>> So naturally there is no response to the bigger ping, because
>> everything got dropped one way or the other.
>>
>> When I do introduce the delays (and it turns out that the delay that
>> matters is when injecting via tap), the kernel has already completed
>> the assembly of the packets via eth0 (during the delay I introduce for
>> submission on tap), and then the submission via tap works well because
>> it undergoes a fresh assembly (and ofcourse it does not get dropped
>> because iptables drop rule is only on eth0)
>>
>> Now then, the question is -- how do I prevent the kernel from trying
>> to assemble the packets arriving on eth0 and drop them right away even
>> before assembly is attempted. This way the same packet injected via
>> the tap would be the only one undergoing assembly and hopefully it
>> would work.
>>
>
> Nice theory !
>
> What kind of iptables rule do you have to drop packets coming on eth0 ?
>
> Have you tried to install this rule in raw table, PREROUTING hook ?
>
> This should work, because the defrag is attempted from
> ip_local_deliver() [ after raw table has given its verdict] , not from
> ip_rcv().
>
> iptables -t raw -I PREROUTING -i eth0 -j DROP
>

Hi Eric,

For some reason, dropping in the raw table does not work for my use case, though I recognize that the raw table theory, when matched with my theory of the problem, is the apparent solution.

I think the reason is that I use packet sockets with the defrag option on, so that the right queue can be selected for load balancing purposes.

Anyway, not disappointed with the above, I stuck to my theory and tried a simple approach. To break the tie between the reassembly/defrag the kernel does for the packets from eth0 and for the packets submitted from the tap (via the application), I made a small change in the application. In the app, I detect that a packet is a fragment, bump up the 'Identification' field in the IP header, re-checksum the IP header, and then submit the packet to the tap. Since reassembly/defrag is done on the basis of the (source IP, destination IP, protocol, Identification) tuple from the IP header, I expected it to work, and it does!

So there we are, I have a nice little solution in place which suits me.

Regards
-Prashant
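The Identification rewrite described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code used by the application in this thread: the function names and the delta parameter are hypothetical, the input is assumed to be a raw IPv4 packet (as with a tun device, or a tap frame with the Ethernet header already stripped), and the checksum is the standard one's-complement sum over the IP header.

```python
import struct

IP_MF = 0x2000        # "more fragments" flag
IP_OFFMASK = 0x1FFF   # fragment offset, in units of 8 bytes


def ipv4_checksum(header):
    """One's-complement checksum over an IPv4 header (even length)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_fragment_id(packet, delta=1):
    """Bump the Identification field of a fragment and fix the checksum.

    Packets that are not fragments are returned unchanged. `delta` is a
    hypothetical parameter; the thread only says the field was "bumped up".
    """
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    ihl = (packet[0] & 0x0F) * 4
    flags_off = struct.unpack("!H", packet[6:8])[0]
    if not flags_off & (IP_MF | IP_OFFMASK):
        return packet
    buf = bytearray(packet)
    ident = struct.unpack("!H", packet[4:6])[0]
    struct.pack_into("!H", buf, 4, (ident + delta) & 0xFFFF)
    struct.pack_into("!H", buf, 10, 0)   # zero the checksum before summing
    struct.pack_into("!H", buf, 10, ipv4_checksum(bytes(buf[:ihl])))
    return bytes(buf)
```

Every fragment of one datagram has to receive the same delta, otherwise the fragments injected via the tap no longer share a key and cannot be reassembled together. A fixed delta can also collide with another in-flight datagram from the same source that happens to use the bumped value, so this remains a workaround rather than a guarantee.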
Re: Problem with fragmented packets on tun/tap interface
On Wed, 2015-08-19 at 15:44 +0530, Prashant Upadhyaya wrote:
> Hi Eric,
>
> For some reason, the dropping in the raw table does not work for me
> for the usecase, though I recognize that the raw table operations
> theory, when matched with my usecase theory, is the apparent solution.
>
> I think the reason is that I use packet sockets with defrag option on
> so that it can select the right queue for load balancing purposes.
>
> Anyway, not disappointed with the above, I stuck to my theory and
> tried a simple approach. To tie-break the reassembly/defrag done by
> the kernel from the packets from the eth0 and the packets submitted
> from tap (via application), I made a small change in the application.
> I detected that the packets are fragmented in the app, and bumped up
> the 'Identification' field in the IP header and re-checksummed the IP
> header and then submitted it to tap. Since reassembly/defrag is done
> on the basis of srcip, destip, protocol and Identification field
> tupple from IP header, I expected it to work and it does !
>
> So there we are, I have a nice little solution in place which suits me.

Another idea would have been to put your tap device and Ethernet device in different namespaces, as the defrag unit is namespace aware.

Looks like eth0 could be put in a completely new namespace, as it holds no IP address?

ip netns add eth0ns
ip link set eth0 netns eth0ns
Re: Problem with fragmented packets on tun/tap interface
On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:

> The delays work for me but is clearly not good for the performance of
> the slow path. And more importantly, I was looking for a fundamental
> reason regarding why it works with delays and why not without it. The
> issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)

How big does the ping need to be to reproduce the problem?
Re: Problem with fragmented packets on tun/tap interface
On Fri, 2015-07-31 at 16:42 +0530, Prashant Upadhyaya wrote:
> On Fri, Jul 31, 2015 at 1:26 PM, Eric Dumazet wrote:
> > On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:
> >
> >> The delays work for me but is clearly not good for the performance of
> >> the slow path. And more importantly, I was looking for a fundamental
> >> reason regarding why it works with delays and why not without it. The
> >> issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)
> >
> > How big ping needs to be to reproduce the problem ?
> >
>
> If the MTU is 1500, I start getting problems anywhere starting from
> 2900 bytes and surely comes when further big pings are used eg. 10 K.
> (ping -s eg. ping 10.3.10.244 -s 1)
> And the big pings do work, as I said, with the delay hack.

It might help to try this while you receive such frags:

perf record -a -g -e skb:kfree_skb sleep 10

...

perf report