Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Sorry to keep following up with this; the other thing it gives you is things like sysctl parameters, kernel version and TCP window scaling (pre- and post-test), and a bunch of per-stream and aggregated metadata relating to the entire suite, all in a nice self-contained gzip that can produce lovely graphs using matplotlib. Basically a repeatable, standardized test with all the things you might be interested in, captured for distribution/reference. flent-gui provides a nice interactive graphical interface for interacting with the datasets (but you can just as easily use the CLI). -Joel On 30 January 2018 at 10:52, Joel Wirāmu Pauling wrote: > In terms of what you need on the target, netserver/netperf from ipkg is > tiny and is all you need. > > On 30 January 2018 at 10:51, Joel Wirāmu Pauling wrote: >> FLENT + RRUL testing is 4 up / 4 down TCP streams with 4 different QoS >> markings, and then 4 different QoS-marked UDP probes and ICMP. >> >> It gives you a measure of how well the CPU and network path can cope >> with load conditions, which are more realistic for everyday use. >> >> iperf3 isn't going to give you any measure of that. >> >> On 30 January 2018 at 10:48, Karl Palsson wrote: >>> >>> Joel Wirāmu Pauling wrote: Any chance I can convince you to use netperf + FLENT for doing your tests rather than iperf(3)? flent.org >>> >>> For those playing at home, could you elaborate on _why_? What do >>> you expect to change? By what sort of percentage? >>> >>> Sincerely, >>> Karl Palsson
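For reference, re-plotting or browsing a captured run later works straight off the data file. A minimal sketch, assuming flent is installed on the analysis box (the filename is a made-up example):

    # regenerate the summary plot from an archived run
    flent -i rrul-archer-c7v4.flent.gz -p all_scaled -o summary.png
    # or explore the same data file interactively
    flent-gui rrul-archer-c7v4.flent.gz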
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
In terms of what you need on the target, netserver/netperf from ipkg is tiny and is all you need. On 30 January 2018 at 10:51, Joel Wirāmu Pauling wrote: > FLENT + RRUL testing is 4 up / 4 down TCP streams with 4 different QoS > markings, and then 4 different QoS-marked UDP probes and ICMP. > > It gives you a measure of how well the CPU and network path can cope > with load conditions, which are more realistic for everyday use. > > iperf3 isn't going to give you any measure of that. > > On 30 January 2018 at 10:48, Karl Palsson wrote: >> >> Joel Wirāmu Pauling wrote: >>> Any chance I can convince you to use netperf + FLENT for doing >>> your tests rather than iperf(3)? >>> >>> flent.org >>> >> >> For those playing at home, could you elaborate on _why_? What do >> you expect to change? By what sort of percentage? >> >> Sincerely, >> Karl Palsson
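On a current OpenWrt/LEDE target that would be roughly the following (a sketch; the package name and opkg rather than ipkg are assumptions, and only the server side lives on the device under test):

    opkg install netperf
    # netserver listens on TCP port 12865 by default; flent drives it from the client
    netserver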
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
FLENT + RRUL testing is 4 up / 4 down TCP streams with 4 different QoS markings, and then 4 different QoS-marked UDP probes and ICMP. It gives you a measure of how well the CPU and network path can cope with load conditions, which are more realistic for everyday use. iperf3 isn't going to give you any measure of that. On 30 January 2018 at 10:48, Karl Palsson wrote: > > Joel Wirāmu Pauling wrote: >> Any chance I can convince you to use netperf + FLENT for doing >> your tests rather than iperf(3)? >> >> flent.org >> > > For those playing at home, could you elaborate on _why_? What do > you expect to change? By what sort of percentage? > > Sincerely, > Karl Palsson
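A typical client-side invocation of the RRUL test described above looks like this (the host name and title are placeholders):

    # 60-second RRUL run against a netserver instance on the far side
    flent rrul -p all_scaled -l 60 -H <netserver-host> -t archer-c7v4-trunk -o rrul.png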
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Joel Wirāmu Pauling wrote: > Any chance I can convince you to use netperf + FLENT for doing > your tests rather than iperf(3)? > > flent.org > For those playing at home, could you elaborate on _why_? What do you expect to change? By what sort of percentage? Sincerely, Karl Palsson
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Any chance I can convince you to use netperf + FLENT for doing your tests rather than iperf(3)? flent.org -Joel On 30 January 2018 at 03:12, Michael Richardson wrote: > [...]
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Laurent GUERBY wrote: >> So that means that you have to do the performance testing for routing >> between two subnets. > Hi, > With wired, firewall off and using routing (no MASQUERADE, explicit LAN > route added on the NUC via WAN IP): Thanks for doing this again. This is Openwrt/LEDE, or the stock firmware? I think Openwrt. > - IPv4 590+ Mbit/s up and 690+ Mbit/s down > - IPv6 270+ Mbit/s same both ways > So without NAT/conntrack we gain about 50% on IPv4 and we're closer to > line rate. I wonder why the asymmetry. > For the record I tested without the router to check my iperf3 setup, and > IPv4 and IPv6 are 910+ Mbit/s both ways. > Sincerely, > Laurent > PS: kernel is 4.9.77 on the archer (not 4.4, thinko in my first mail) > NUC and laptop are running 4.9 debian stretch too. -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works| network architect [ ] m...@sandelman.ca http://www.sandelman.ca/| ruby on rails[
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
On Sun, 2018-01-28 at 19:12 -0500, Michael Richardson wrote: > [...] > So that means that you have to do the performance testing for routing > between two subnets. Hi, With wired, firewall off and using routing (no MASQUERADE, explicit LAN route added on the NUC via WAN IP): - IPv4 590+ Mbit/s up and 690+ Mbit/s down - IPv6 270+ Mbit/s same both ways So without NAT/conntrack we gain about 50% on IPv4 and we're closer to line rate. For the record I tested without the router to check my iperf3 setup, and IPv4 and IPv6 are 910+ Mbit/s both ways. Sincerely, Laurent PS: kernel is 4.9.77 on the archer (not 4.4, thinko in my first mail) NUC and laptop are running 4.9 debian stretch too.
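For anyone reproducing this, a rough sketch of the no-NAT routing setup described above, with made-up addresses (192.168.1.0/24 as the LAN subnet, 192.0.2.2 as the router's WAN IP):

    # on the router: stop the firewall and drop the MASQUERADE rule
    /etc/init.d/firewall stop
    iptables -t nat -F
    # on the WAN-side server (the NUC): route the LAN subnet back via the router
    ip route add 192.168.1.0/24 via 192.0.2.2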
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
On Sun, Jan 28, 2018 at 3:43 PM, Florian Fainelli wrote: > (please don't top post). > > On 01/28/2018 02:00 PM, Rosen Penev wrote: >> Compared to the Archer C7v2, the v4 has a single ethernet interface >> switched between all 5 ports. The v2 has two ethernet interfaces with >> 4 ports being switched. >> >> Now the disappointing performance has several reasons behind it. The main >> one being that the ag71xx driver in OpenWrt is not very optimized for >> the hardware. > > The driver certainly contributes to that, but I don't think it is the > main reason behind it. Each time you send or receive a packet, you need > to invalidate your data cache for at least 1500 bytes, or whatever the > nominal packet/buffer size has been allocated (e.g. 2KB); with very > small I and D caches (typically 64KB) and no L2 cache, you do this > thrashing very frequently and you keep hitting the DRAM as well, which > hurts performance a lot. This is just something the networking stack > does, and it is really hard to diverge from this because that is inherently > how it is designed, and how drivers are designed as well. This is why > software bypasses in hardware are so effective for low-power CPUs. > Good point. Even Qualcomm's solution to this (FastPath) gets good results. Felix and I have been backporting some of the cache stuff to get less thrashing. > I would be curious to see the use of XDP redirect and implementing a > software NAT fast path: that is, for the most basic NAPT translation, do > this in XDP as early as possible in the driver receive/transmit part and > send directly to the outgoing interface. This should lower the pressure > on the I and D caches by invalidating not the full packet length, but > just the header portion. For more complex protocols, we would keep using > the conntrack helpers to do the necessary operations (FTP, TFTP, SIP, > etc.) on the packet. This might avoid doing an sk_buff allocation for > each packet making it through, which is expensive. > Unfortunately, someone needs to get this done. Not I. >> >> Qualcomm forked the driver (in 2013, I think) and added some really >> nice features. Some of these need to be backported for ag71xx in >> OpenWrt to be competitive. > > Is it possible to just drop their driver in OpenWrt and get a feeling of > the performance gap? > I've tried. Qualcomm's driver (as well as their ar71xx platform) has some devicetree bindings that I have not been able to make sense of. To make matters worse, I can't find a git history for the driver except for an old one from 2013. They've also removed the driver from the usual location that I found at: https://portland.source.codeaurora.org/quic/qrdk/oss/kernel/linux-msm/ Maybe it's just hiding... The driver cherry-picked from last year is available here: https://github.com/neheb/source/tree/qca-ag71xx >> >> It's going to take quite a bit of work to get the driver up to par. >> Biggest performance boost I imagine would be to add GRO support. It >> turns out that for good routing performance, GRO requires hardware >> checksumming, which is not supported by ag71xx in OpenWrt at the >> moment. > > Does the hardware actually support checksum offloads? > As far as I know, hardware with QCA in the title has support. The ag71xx in OpenWrt supports a lot more platforms that do not support offloads. >> >> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling >> wrote: >>> [...]
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Laurent GUERBY wrote: > On Sun, 2018-01-28 at 17:09 -0500, Michael Richardson wrote: >> Laurent GUERBY wrote: >> > I tested today a few things on a brand new TP-Link Archer C7 >> v4.0, >> > LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275) >> > WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms >> latency >> > (everything on the same table), IPv4 unless specified, >> > using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP). >> >> > With the TP-Link firmware: >> > - wired 930+ Mbit/s both ways >> > - wireless 5G 560+ Mbit/s down 440+ Mbit/s up >> > - wireless 2.4G 100+ Mbit/s both ways >> >> > With OpenWRT/LEDE trunk 20180128 4.4 kernel: >> > - wired 350-400 Mbit/s both ways >> > - wired with firewall deactivated 550 Mbit/s >> > (just "iptables -t nat -A POSTROUTING -j MASQUERADE") >> >> That still means you have conn-tracking loaded. >> Have you tried without that? > What should I do to enable NAT without conn-tracking? > (I see a few nf_conntrack* modules in lsmod) Unfortunately, you don't. It's also hard to get rid of the conntrack modules, other than clearing everything and then rmmod'ing them. Sometimes I've had to rename the .ko files and reboot to get rid of them. So that means that you have to do the performance testing for routing between two subnets. >> > - wired IPv6 routing, no NAT, no firewall 250 Mbit/s >> > - wireless 5G 150-200 Mbit/s >> > - wireless 2.4G forgot to test >> >> Does the TP-Link firmware support any IPv6? >> You could report 0Mb/s for IPv6 :-) > TP-Link has now added full IPv6 support AFAIK. I will > test it and report when I get my hands on another spare. Thanks! >> > IPv6 performance without NAT being below IPv4 with NAT seems >> > to indicate there are potential gains in software :). >> >> Depends upon whether there is hardware support for NAT, >> which many devices have, wrapped up under NDAs. > I don't think OpenWRT has support for NAT accelerators > at this point, IPv4 and IPv6 are both done in software. Yes, that's the case, because the details have been wrapped in NDAs. I see that Qualcomm has released something, so that's exciting. -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works| network architect [ ] m...@sandelman.ca http://www.sandelman.ca/| ruby on rails[
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
FYI - the Openfast path patches are applied to several trees. I am running them on a c7 v2 right now and am able to hit close to stock numbers. The NAT acceleration stuff isn't needed with the open-fastpath patches at all. Relevant thread: https://forum.lede-project.org/t/qualcomm-fast-path-for-lede/4582 -Joel On 29 January 2018 at 12:43, Florian Fainelli wrote: > [...]
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
(please don't top post). On 01/28/2018 02:00 PM, Rosen Penev wrote: > Compared to the Archer C7v2, the v4 has a single ethernet interface > switched between all 5 ports. The v2 has two ethernet interfaces with > 4 ports being switched. > > Now the disappointing performance has several reasons behind it. The main > one being that the ag71xx driver in OpenWrt is not very optimized for > the hardware. The driver certainly contributes to that, but I don't think it is the main reason behind it. Each time you send or receive a packet, you need to invalidate your data cache for at least 1500 bytes, or whatever the nominal packet/buffer size has been allocated (e.g. 2KB); with very small I and D caches (typically 64KB) and no L2 cache, you do this thrashing very frequently and you keep hitting the DRAM as well, which hurts performance a lot. This is just something the networking stack does, and it is really hard to diverge from this because that is inherently how it is designed, and how drivers are designed as well. This is why software bypasses in hardware are so effective for low-power CPUs. I would be curious to see the use of XDP redirect and implementing a software NAT fast path: that is, for the most basic NAPT translation, do this in XDP as early as possible in the driver receive/transmit part and send directly to the outgoing interface. This should lower the pressure on the I and D caches by invalidating not the full packet length, but just the header portion. For more complex protocols, we would keep using the conntrack helpers to do the necessary operations (FTP, TFTP, SIP, etc.) on the packet. This might avoid doing an sk_buff allocation for each packet making it through, which is expensive. > > Qualcomm forked the driver (in 2013, I think) and added some really > nice features. Some of these need to be backported for ag71xx in > OpenWrt to be competitive. Is it possible to just drop their driver in OpenWrt and get a feeling of the performance gap? > > It's going to take quite a bit of work to get the driver up to par. > Biggest performance boost I imagine would be to add GRO support. It > turns out that for good routing performance, GRO requires hardware > checksumming, which is not supported by ag71xx in OpenWrt at the > moment. Does the hardware actually support checksum offloads? > > On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling > wrote: >> [...]
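A minimal sketch of attaching such an XDP fast path with iproute2 (xdp_nat.c and its NAPT logic are hypothetical and would still have to be written; the section name must match the program's SEC() annotation):

    # compile the (hypothetical) XDP program to BPF bytecode
    clang -O2 -target bpf -c xdp_nat.c -o xdp_nat.o
    # attach it at the driver's earliest receive point on the WAN NIC
    ip link set dev eth0 xdp obj xdp_nat.o sec prog
    # detach again
    ip link set dev eth0 xdp off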
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Hi Michael, On Sun, 2018-01-28 at 17:09 -0500, Michael Richardson wrote: > Laurent GUERBY wrote: > > I tested today a few things on a brand new TP-Link Archer C7 > v4.0, > > LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275) > > WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms > latency > > (everything on the same table), IPv4 unless specified, > > using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP). > > > With the TP-Link firmware: > > - wired 930+ Mbit/s both ways > > - wireless 5G 560+ Mbit/s down 440+ Mbit/s up > > - wireless 2.4G 100+ Mbit/s both ways > > > With OpenWRT/LEDE trunk 20180128 4.4 kernel: > > - wired 350-400 Mbit/s both ways > > - wired with firewall deactivated 550 Mbit/s > > (just "iptables -t nat -A POSTROUTING -j MASQUERADE") > > That still means you have conn-tracking loaded. > Have you tried without that? What should I do to enable NAT without conn-tracking? (I see a few nf_conntrack* modules in lsmod) > > - wired IPv6 routing, no NAT, no firewall 250 Mbit/s > > - wireless 5G 150-200 Mbit/s > > - wireless 2.4G forgot to test > > Does the TP-Link firmware support any IPv6? > You could report 0Mb/s for IPv6 :-) TP-Link has now added full IPv6 support AFAIK. I will test it and report when I get my hands on another spare. > > IPv6 performance without NAT being below IPv4 with NAT seems > > to indicate there are potential gains in software :). > > Depends upon whether there is hardware support for NAT, > which many devices have, wrapped up under NDAs. I don't think OpenWRT has support for NAT accelerators at this point; IPv4 and IPv6 are both done in software. Sincerely, Laurent
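For inspection, a rough sketch along the lines of Michael's suggestion (module names are examples from a 4.9-era kernel and vary by build; conntrack(8) comes from the optional conntrack-tools package):

    # see which conntrack/NAT modules are loaded
    lsmod | grep -E 'conntrack|nat'
    # flush tracked connections, then try unloading
    conntrack -F
    rmmod iptable_nat nf_nat_ipv4 nf_conntrack_ipv4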
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Laurent GUERBY wrote: > I tested today a few things on a brand new TP-Link Archer C7 v4.0, > LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275) > WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency > (everything on the same table), IPv4 unless specified, > using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP). > With the TP-Link firmware: > - wired 930+ Mbit/s both ways > - wireless 5G 560+ Mbit/s down 440+ Mbit/s up > - wireless 2.4G 100+ Mbit/s both ways > With OpenWRT/LEDE trunk 20180128 4.4 kernel: > - wired 350-400 Mbit/s both ways > - wired with firewall deactivated 550 Mbit/s > (just "iptables -t nat -A POSTROUTING -j MASQUERADE") That still means you have conn-tracking loaded. Have you tried without that? > - wired IPv6 routing, no NAT, no firewall 250 Mbit/s > - wireless 5G 150-200 Mbit/s > - wireless 2.4G forgot to test Does the TP-Link firmware support any IPv6? You could report 0Mb/s for IPv6 :-) > IPv6 performance without NAT being below IPv4 with NAT seems > to indicate there are potential gains in software :). Depends upon whether there is hardware support for NAT, which many devices have, wrapped up under NDAs. -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works| network architect [ ] m...@sandelman.ca http://www.sandelman.ca/| ruby on rails[
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Compared to the Archer C7v2, the v4 has a single ethernet interface switched between all 5 ports. The v2 has two ethernet interfaces with 4 ports being switched. Now the disappointing performance has several reasons behind it. The main one being that the ag71xx driver in OpenWrt is not very optimized for the hardware. Qualcomm forked the driver (in 2013, I think) and added some really nice features. Some of these need to be backported for ag71xx in OpenWrt to be competitive. It's going to take quite a bit of work to get the driver up to par. Biggest performance boost I imagine would be to add GRO support. It turns out that for good routing performance, GRO requires hardware checksumming, which is not supported by ag71xx in OpenWrt at the moment. On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling wrote: > [...]
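An easy way to see what the driver currently advertises is ethtool (interface name is an example; feature names are as printed by ethtool -k):

    # check checksum and GRO support as exposed by the driver
    ethtool -k eth0 | grep -E 'checksumming|generic-receive-offload'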
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
Hi, as I am also using the archer c7's as my build targets (and c2600's) I am watching this keenly; is anyone else running openvswitch on these with the XDP patches? The c2600, which is an ARM A15, currently really could do with optimization and probably is a much better choice for CPE. I would not be caught dead with the c7 as a 10Gbit CPE myself; the SoC, even with the Openfast path patches, just can't handle complex QoS scheduling (i.e. Cake/PIE) beyond a couple of hundred Mbit. -Joel --- https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1 On 29 January 2018 at 09:43, Laurent GUERBY wrote: > [...]
Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4
On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote: > [...] > This is what I've been observing for the software acceleration here, > see slide 19 at: > https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf > > The flowtable representation, in software, is providing a faster > forwarding path between two nics. So it's basically an alternative to > the classic forwarding path, that is faster. Packets kick in at the > Netfilter ingress hook (right at the same location as 'tc' ingress), > if there is a hit in the software flowtable, ttl gets decremented, > NATs are done and the packet is placed in the destination NIC via > neigh_xmit() - through the neighbour layer. Hi Pablo, I tested today a few things on a brand new TP-Link Archer C7 v4.0, LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275), WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency (everything on the same table), IPv4 unless specified, using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP). With the TP-Link firmware: - wired 930+ Mbit/s both ways - wireless 5G 560+ Mbit/s down 440+ Mbit/s up - wireless 2.4G 100+ Mbit/s both ways With OpenWRT/LEDE trunk 20180128 4.4 kernel: - wired 350-400 Mbit/s both ways - wired with firewall deactivated 550 Mbit/s (just "iptables -t nat -A POSTROUTING -j MASQUERADE") - wired IPv6 routing, no NAT, no firewall 250 Mbit/s - wireless 5G 150-200 Mbit/s - wireless 2.4G forgot to test top on the router shows sirq at 90%+ during network load; other load indicators are under 5%. IPv6 performance without NAT being below IPv4 with NAT seems to indicate there are potential gains in software :). I didn't test OpenWRT in bridge mode, but with LEDE 17.01 on an Archer C7 v2 I got about 550-600 Mbit/s in iperf3, so I think the radio is good on these ath10k routers. So if OpenWRT can do about 2x in software routing performance we're good against our TP-Link firmware friends :). tetaneutral.net (not-for-profit ISP, hosting the OpenWRT and LEDE mirror in FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each with an individual gigabit fiber uplink (TP-Link MC220L fiber converter), and a total 10G uplink (Dell/Force10 S4810 48x10G; yes, some of our members will get 10G on their PC at home :). We build our images from git source, generating the imagebuilder and then running a custom python script. We have 5+ spare C7s, a fast build (20 min from scratch) and a testing environment, and of course we're interested in suggestions on what to do. Thanks in advance for your help, Sincerely, Laurent http://tetaneutral.net
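For reference, the iperf3 invocation pattern used for these numbers (host name is a placeholder):

    # on the WAN server
    iperf3 -s
    # on the LAN client: LAN=>WAN, then WAN=>LAN via reverse mode
    iperf3 -c <wan-server>
    iperf3 -c <wan-server> -R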
Re: [LEDE-DEV] A state of network acceleration
Hi Rafal, On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote: > Getting better network performance (mostly for NAT) using some kind of > acceleration was always a hot topic and people are still > looking/asking for it. I'd like to write a short summary and share my > understanding of the current state so that: > 1) People can understand it better > 2) We can have some rough plan > > First of all there are two possible ways of accelerating network > traffic: in software and in hardware. The software solution is independent > of architecture/device and is mostly just bypassing the in-kernel packet > flow. It still uses the device's CPU, which can be a bottleneck. Various > software implementations are reported to be 2x to 5x faster. This is what I've been observing for the software acceleration here, see slide 19 at: https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf The flowtable representation, in software, is providing a faster forwarding path between two nics. So it's basically an alternative to the classic forwarding path, that is faster. Packets kick in at the Netfilter ingress hook (right at the same location as 'tc' ingress), if there is a hit in the software flowtable, ttl gets decremented, NATs are done and the packet is placed in the destination NIC via neigh_xmit() - through the neighbour layer. > Hardware acceleration requires a hw-specific implementation and can > offload the device's CPU. > > Of course handling network traffic out of the networking subsystem > means some features like QoS / throughput limits / advanced firewall > rules may not/won't work. > > The hardest task (for both methods) was always the Linux kernel > integration. Drivers had to somehow: > 1) Get/build a table with rules for packet flows > 2) Update in-kernel state to e.g. avoid connection timeout & its removal > > The problem with all existing implementations was that they used various > non-upstream patches for kernel integration. Some were less invasive, > some a bit more. They weren't properly reviewed by kernel developers > and usually were using hacks/solutions that couldn't be accepted. > > The rescue came with Pablo's work on the offloading infrastructure. He > worked hard on this, developing & sending his patchset for the upstream > kernel: > [1] [PATCH RFC,WIP 0/5] Flow offload infrastructure > [2] [PATCH nf-next RFC,v2 0/6] Flow offload infrastructure > [3] [PATCH nf-next,v3 0/7] Flow offload infrastructure > > The best news is that his final patchset version was accepted and now sits > in net-next [4] (and should become part of kernel 4.16). > > Now, what does it mean for the LEDE project: > 1) There is upstream infrastructure that should be ready to use > 2) It's based on & requires nftables > 3) LEDE's firewall3 uses the iptables (& friends) C API > 4) There aren't any drivers for offloading hardware (switches?) yet Yes, there are no drivers using the hardware offload infrastructure yet. So the patch to add the ndo_flow_offload hook to struct net_device has been kept back for now [1] until there is an initial driver client for this. I'll be sending a new version for [1] asap. Will push it to a branch in my nf-next.git tree [2] and will rebase it on top of my master so people developing a driver that uses this don't need to deal with this extra work. [1] http://patchwork.ozlabs.org/patch/852537/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git > One thing I'm not sure about is if the software accelerator is ready or not.
> Pablo in his e-mail wrote: > > So far, this is a generic software flow table representation, that > > matches basic flow table hardware semantics but that also provides a > > software faster path. So you can use it to purely forward packets > > between two nics even if they come with no hardware offload support. > > which could suggest the software path is already there. Yes, software acceleration is working in my testbed; anything other than that is a bug that needs to be fixed ;-). I'm still finishing the userspace bits for libnftnl and nft, to provide the control plane for users to configure this. Will post this patchset asap, so these userspace bits can follow their path to the upstream repositories. > So here is my idea of what is needed by LEDE to get it working: > 1) Rewrite firewall3 to use nftables There's a tentative C API for nftables: http://git.netfilter.org/nftables/tree/include/nftables/nftables.h http://git.netfilter.org/nftables/tree/src/libnftables.c There are plans to add an API to support batching too, i.e. adding several rules into the kernel in one go - using the nftables transaction infrastructure - this is almost done since it was part of the original work done by Eric Leblond. I can see firewall3 builds strings that are passed to iptables/ipset; this approach matches the existing C API that we're providing. At this stage the high-level libnftables library is not yet exposed as a shared object, there is a static library under src/.libs/l
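To make the control plane Pablo describes concrete, a hedged sketch of a flowtable ruleset with the nft CLI (syntax as it looked around the nf-next/4.16 era; device names are examples, and the exact keyword may differ in the released nft):

    nft add table inet filter
    nft 'add flowtable inet filter ft { hook ingress priority 0; devices = { eth0, eth1 }; }'
    nft 'add chain inet filter forward { type filter hook forward priority 0; }'
    # established TCP flows get placed on the fast path
    nft add rule inet filter forward ip protocol tcp ct state established flow offload @ft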
Re: [LEDE-DEV] A state of network acceleration
On 17 January 2018 at 16:25, Rafał Miłecki wrote: > The problem with all existing implementations was that they used various > non-upstream patches for kernel integration. Some were less invasive, > some a bit more. They weren't properly reviewed by kernel developers > and usually were using hacks/solutions that couldn't be accepted. If someone is interested in these existing implementations, here is a list of those I'm aware of. One of the earliest ones was Broadcom's CTF (Cut-Through Forwarding). For BCM4706-based devices it could bump NAT from 120 Mb/s to 850 Mb/s. I described it in the: [1] Understanding/reimplementing forwarding acceleration used by Broadcom (ctf) e-mail thread. It consisted of a kernel modification (see ctf.diff in the above thread) and the closed-source ctf.ko. Marvell announced their own "fastpath" implementation in 2014 in the e-mail thread: [2] Introducing "fastpath" - Kernel module for speeding up IP forwarding They referenced a nice article on embedded.com [3]. AFAIU a year or two later they released it as the OpenFastPath project [4] [5] under the BSD 3-clause license. No one has tried integrating it with OpenWrt/LEDE AFAIK. Finally there is Qualcomm's Shortcut Forwarding Engine. It's open source and it's reported to increase NAT performance on an AR9132-based device from ~235 Mb/s to ~525 Mb/s. It's ported to LEDE as a set of patches by the user gwlim, to be applied on top of a cloned repo [6]. [1] https://lists.openwrt.org/pipermail/openwrt-devel/2013-August/021112.html [2] https://lists.openwrt.org/pipermail/openwrt-devel/2014-December/030179.html [3] https://www.embedded.com/design/operating-systems/4403058/Accelerating-network-packet-processing-in-Linux [4] https://openfastpath.org/ [5] https://github.com/MarvellEmbeddedProcessors/ofp-marvell [6] https://github.com/gwlim/Fast-Path-LEDE-OpenWRT
[LEDE-DEV] A state of network acceleration
Getting better network performance (mostly for NAT) using some kind of acceleration was always a hot topic and people are still looking/asking for it. I'd like to write a short summary and share my understanding of the current state so that: 1) People can understand it better 2) We can have some rough plan First of all there are two possible ways of accelerating network traffic: in software and in hardware. The software solution is independent of architecture/device and is mostly just bypassing the in-kernel packet flow. It still uses the device's CPU, which can be a bottleneck. Various software implementations are reported to be 2x to 5x faster. Hardware acceleration requires a hw-specific implementation and can offload the device's CPU. Of course handling network traffic out of the networking subsystem means some features like QoS / throughput limits / advanced firewall rules may not/won't work. The hardest task (for both methods) was always the Linux kernel integration. Drivers had to somehow: 1) Get/build a table with rules for packet flows 2) Update in-kernel state to e.g. avoid connection timeout & its removal The problem with all existing implementations was that they used various non-upstream patches for kernel integration. Some were less invasive, some a bit more. They weren't properly reviewed by kernel developers and usually were using hacks/solutions that couldn't be accepted. The rescue came with Pablo's work on the offloading infrastructure. He worked hard on this, developing & sending his patchset for the upstream kernel: [1] [PATCH RFC,WIP 0/5] Flow offload infrastructure [2] [PATCH nf-next RFC,v2 0/6] Flow offload infrastructure [3] [PATCH nf-next,v3 0/7] Flow offload infrastructure The best news is that his final patchset version was accepted and now sits in net-next [4] (and should become part of kernel 4.16). Now, what does it mean for the LEDE project: 1) There is upstream infrastructure that should be ready to use 2) It's based on & requires nftables 3) LEDE's firewall3 uses the iptables (& friends) C API 4) There aren't any drivers for offloading hardware (switches?) yet One thing I'm not sure about is if the software accelerator is ready or not. Pablo in his e-mail wrote: > So far, this is a generic software flow table representation, that > matches basic flow table hardware semantics but that also provides a > software faster path. So you can use it to purely forward packets > between two nics even if they come with no hardware offload support. which could suggest the software path is already there. So here is my idea of what is needed by LEDE to get it working: 1) Rewrite firewall3 to use nftables 2) Switch to kernel 4.16 or backport the offloading to 4.14 3) Work on implementing/enabling the software acceleration path Let me know if the above description makes sense to you, or correct me if you think I misunderstood something :) [1] https://www.spinics.net/lists/netfilter-devel/msg50141.html [2] https://www.spinics.net/lists/netfilter-devel/msg50555.html [3] https://www.spinics.net/lists/netfilter-devel/msg50759.html [4] https://www.spinics.net/lists/netfilter-devel/msg50973.html
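As a feel for what point 1 involves, here is the MASQUERADE rule used throughout this thread next to a rough nft equivalent (a sketch only; firewall3's actual ruleset is far larger):

    # what firewall3 effectively installs today
    iptables -t nat -A POSTROUTING -j MASQUERADE
    # the nftables counterpart
    nft add table ip nat
    nft 'add chain ip nat postrouting { type nat hook postrouting priority 100; }'
    nft add rule ip nat postrouting masquerade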