Hi Rafal,

On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
> Getting better network performance (mostly for NAT) using some kind
> of acceleration has always been a hot topic and people are still
> looking/asking for it. I'd like to write a short summary and share my
> understanding of the current state so that:
> 1) People can understand it better
> 2) We can have some rough plan
>
> First of all, there are two possible ways of accelerating network
> traffic: in software and in hardware. The software solution is
> independent of the architecture/device and mostly just bypasses the
> in-kernel packet flow. It still uses the device's CPU, which can be a
> bottleneck. Various software implementations are reported to be 2x to
> 5x faster.

This is what I've been observing for the software acceleration here,
see slide 19 at:
https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf

The flowtable representation, in software, provides a faster
forwarding path between two NICs, so it's basically an alternative to
the classic forwarding path that is faster. Packets kick in at the
Netfilter ingress hook (right at the same location as 'tc' ingress);
if there is a hit in the software flowtable, the ttl gets decremented,
NATs are done and the packet is placed in the destination NIC via
neigh_xmit(), i.e. through the neighbour layer.
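The control plane syntax is not final until the userspace bits are
posted, but to give an idea, a minimal ruleset for the software fast
path should look roughly like this (a sketch; eth0/eth1 stand in for
your LAN/WAN devices):

  table inet x {
          # register the flowtable at the ingress hook of both NICs
          flowtable f {
                  hook ingress priority 0; devices = { eth0, eth1 };
          }
          chain forward {
                  type filter hook forward priority 0; policy accept;
                  # once a TCP flow hits this rule, it is added to the
                  # flowtable and takes the faster path from then on
                  ip protocol tcp flow offload @f
          }
  }

The first packets of a flow still traverse the classic forwarding
path; after the flow has been offloaded, subsequent packets are
forwarded right from the ingress hook.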
> Hardware acceleration requires a hw-specific implementation and can
> offload the device's CPU.
>
> Of course, handling network traffic outside of the networking
> subsystem means some features like QoS / throughput limits / advanced
> firewall rules may not/won't work.
>
> The hardest task (for both methods) was always the Linux kernel
> integration. Drivers had to somehow:
> 1) Get/build a table with rules for packet flows
> 2) Update the in-kernel state to e.g. avoid connection timeout & its
>    removal
>
> The problem with all existing implementations was that they used
> various non-upstream patches for kernel integration. Some were less
> invasive, some a bit more. They weren't properly reviewed by kernel
> developers and usually relied on hacks/solutions that couldn't be
> accepted.
>
> The rescue from this was Pablo's work on an offloading
> infrastructure. He worked hard on this, developing & sending his
> patchset for the upstream kernel:
> [1] [PATCH RFC,WIP 0/5] Flow offload infrastructure
> [2] [PATCH nf-next RFC,v2 0/6] Flow offload infrastructure
> [3] [PATCH nf-next,v3 0/7] Flow offload infrastructure
>
> The best news is that his final patchset version was accepted and now
> sits in net-next [4] (it should become part of kernel 4.16).
>
> Now, what does it mean for the LEDE project:
> 1) There is upstream infrastructure that should be ready to use
> 2) It's based on & requires nftables
> 3) LEDE's firewall3 uses the iptables (& friends) C API
> 4) There aren't any drivers for offloading hardware (switches?) yet

Yes, there are no drivers using the hardware offload infrastructure
yet, so the patch that adds the ndo_flow_offload hook to struct
net_device has been kept back for now [1], until there is an initial
driver client for it.

I'll be sending a new version of [1] asap. I will push it to a branch
in my nf-next.git tree [2] and rebase it on top of my master branch,
so people developing a driver that uses it don't need to deal with
this extra work.

[1] http://patchwork.ozlabs.org/patch/852537/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

> One thing I'm not sure about is whether the software accelerator is
> ready or not. Pablo in his e-mail wrote:
> > So far, this is a generic software flow table representation, that
> > matches basic flow table hardware semantics but that also provides a
> > software faster path. So you can use it to purely forward packets
> > between two nics even if they come with no hardware offload support.
> which could suggest the software path is already there.

Yes, software acceleration is working in my testbed; anything that
doesn't work there is a bug that needs to be fixed ;-). I'm still
finishing the userspace bits for libnftnl and nft, to provide the
control plane that users need to configure this. I will post this
patchset asap, so these userspace bits can follow their path to the
upstream repositories.

> So here is my idea of what is needed by LEDE to get it working:
> 1) Rewrite firewall3 to use nftables

There's a tentative C API for nftables:

http://git.netfilter.org/nftables/tree/include/nftables/nftables.h
http://git.netfilter.org/nftables/tree/src/libnftables.c

There are plans to add an API to support batching too, i.e. adding
several rules to the kernel in one go using the nftables transaction
infrastructure. This is almost done, since it was part of the
original work by Eric Leblond.

I can see that firewall3 builds strings that are passed to
iptables/ipset; this approach matches the existing C API that we're
providing, see the sketch below.

At this stage the high-level libnftables library is not yet exposed
as a shared object. There is a static library under
src/.libs/libnftables.a, but we decided to keep it back for the 0.8.1
release, which only included fixes. The plan is to fully expose this
API in the next release.
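Here is a minimal sketch of how a string-based caller like firewall3
could use that API, assuming the interface as it currently sits in
the tree (names/signatures may still change before the shared object
is exposed, and the 'fw3' table/rule is just a hypothetical example):

  #include <stdio.h>
  #include <nftables/nftables.h>

  int main(void)
  {
          /* the rule as a string, the way firewall3 builds its
           * iptables arguments today; table/chain must exist */
          char cmd[] = "add rule inet fw3 forward tcp dport 22 accept";
          struct nft_ctx *nft;
          int err;

          nft = nft_ctx_new(NFT_CTX_DEFAULT);
          if (nft == NULL)
                  return 1;

          /* parse the command and push it into the kernel */
          err = nft_run_cmd_from_buffer(nft, cmd, sizeof(cmd));
          if (err != 0)
                  fprintf(stderr, "failed to add rule\n");

          nft_ctx_free(nft);
          return err != 0 ? 1 : 0;
  }

For now that means linking against the static src/.libs/libnftables.a
mentioned above; with the batching API the same strings could be
accumulated and committed in one single transaction.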
BTW, there is also the "iptables-compat" infrastructure that allows
you to load iptables commands through the nftables engine.
iptables-compat takes the same syntax as iptables, and when listing
your ruleset via 'nft list ruleset' you will get a translation of
your iptables-compat rules to the nft syntax. There's also the
'iptables-translate' tool, which provides translations from iptables
to nftables (see the P.S. below for an example).

> 2) Switch to kernel 4.16 or backport offloading to 4.14
> 3) Work on implementing/enabling software acceleration path

As soon as I post the userspace bits, you can start testing this.

> Let me know if the above description makes sense to you or correct me
> if you think I misunderstood something :)

LGTM, thanks!

> [1] https://www.spinics.net/lists/netfilter-devel/msg50141.html
> [2] https://www.spinics.net/lists/netfilter-devel/msg50555.html
> [3] https://www.spinics.net/lists/netfilter-devel/msg50759.html
> [4] https://www.spinics.net/lists/netfilter-devel/msg50973.html
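P.S.: An example of what iptables-translate produces (the rule itself
is just an arbitrary sample):

  $ iptables-translate -A FORWARD -p tcp --dport 22 -j ACCEPT
  nft add rule ip filter FORWARD tcp dport 22 counter accept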