Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Joel Wirāmu Pauling
Sorry to keep following up with this;

The other thing it gives you is metadata like the sysctl parameters,
kernel, and TCP window scaling state (pre- and post-test), plus a bunch
of per-stream and aggregated metadata relating to the entire suite, all
in a nice self-contained gzip that can produce lovely graphs using
matplotlib.

Basically a repeatable standardized test with all the things you might
be interested in captured for distribution/reference.

flent-gui provides a nice interactive graphical interface for
interacting with the datasets (but you can just as easily use the CLI).
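As a concrete sketch of that workflow (the host address and test title
below are placeholders; -x asks flent to collect the extended metadata
such as the sysctl values):

  # run a 60-second RRUL test against a box running netserver
  flent rrul -H 192.168.1.1 -l 60 -x -t "archer-c7-v4-trunk"
  # plot a saved dataset from the CLI instead of flent-gui
  flent -i rrul-*.flent.gz -p all_scaled -o rrul.png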


-Joel

On 30 January 2018 at 10:52, Joel Wirāmu Pauling  wrote:
> In terms of what you need on the target: netserver/netperf from opkg
> is tiny and is all you need.
>
> On 30 January 2018 at 10:51, Joel Wirāmu Pauling  wrote:
>> FLENT + RRUL testing is 4 up, 4 down TCP streams with 4 different QoS
>> markings, and then 4 different QoS-marked UDP probes and ICMP.
>>
>> It gives you a measure of how well the CPU and network path cope
>> under load conditions that are more realistic for everyday use.
>>
>> iperf3 isn't going to give you any measure of that.
>>
>> On 30 January 2018 at 10:48, Karl Palsson  wrote:
>>>
>>> Joel Wirāmu Pauling   wrote:
 Any chance I can convince you to use netperf + FLENT for doing
 your tests rather than iperf(3)?

 flent.org

>>>
>>> For those playing at home, could you elaborate on _why_? What do
>>> you expect to change? By what sort of percentage?
>>>
>>> Sincerely,
>>> Karl Palsson

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Joel Wirāmu Pauling
In terms of what you need on the target: netserver/netperf from opkg
is tiny and is all you need.
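A minimal sketch of that target-side setup, assuming the netperf package
from the feeds (netserver listens on TCP port 12865 by default):

  opkg update
  opkg install netperf   # also ships the netserver binary
  netserver              # start the test daemon on the target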

On 30 January 2018 at 10:51, Joel Wirāmu Pauling  wrote:
> FLENT + RRUL testing is 4 up, 4 down TCP streams with 4 different QoS
> markings, and then 4 different QoS-marked UDP probes and ICMP.
>
> It gives you a measure of how well the CPU and network path cope
> under load conditions that are more realistic for everyday use.
>
> iperf3 isn't going to give you any measure of that.
>
> On 30 January 2018 at 10:48, Karl Palsson  wrote:
>>
>> Joel Wirāmu Pauling   wrote:
>>> Any chance I can convince you to use netperf + FLENT for doing
>>> your tests rather than iperf(3)?
>>>
>>> flent.org
>>>
>>
>> For those playing at home, could you elaborate on _why_? What do
>> you expect to change? By what sort of percentage?
>>
>> Sincerely,
>> Karl Palsson

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Joel Wirāmu Pauling
FLENT + RRUL testing is 4 up, 4 down TCP streams with 4 different QoS
markings, and then 4 different QoS-marked UDP probes and ICMP.

It gives you a measure of how well the CPU and network path cope
under load conditions that are more realistic for everyday use.

iperf3 isn't going to give you any measure of that.

On 30 January 2018 at 10:48, Karl Palsson  wrote:
>
> Joel Wirāmu Pauling   wrote:
>> Any chance I can convince you to use netperf + FLENT for doing
>> your tests rather than iperf(3)?
>>
>> flent.org
>>
>
> For those playing at home, could you elaborate on _why_? What do
> you expect to change? By what sort of percentage?
>
> Sincerely,
> Karl Palsson

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Karl Palsson

Joel Wirāmu Pauling   wrote:
> Any chance I can convince you to use netperf + FLENT for doing
> your tests rather than iperf(3)?
> 
> flent.org
> 

For those playing at home, could you elaborate on _why_? What do
you expect to change? By what sort of percentage?

Sincerely,
Karl Palsson

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Joel Wirāmu Pauling
Any chance I can convince you to use netperf + FLENT for doing your
tests rather than iperf(3)?

flent.org

-Joel

On 30 January 2018 at 03:12, Michael Richardson  wrote:
>
> Laurent GUERBY  wrote:
> >> So that means that you have to do the performance testing for routing
> >> between two subnets.
>
> > Hi,
>
> > With wired, firewall off and using routing (no MASQUERADE, explicit LAN
> > route added on the NUC via WAN IP):
>
> Thanks for doing this again.
> This is Openwrt/LEDE, or the stock firmware?
> I think Openwrt.
>
> > - IPv4 590+ Mbit/s up and 690+ Mbit/s down
> > - IPv6 270+ Mbit/s same both ways
>
> > So without NAT/conntrack we gain about 50% on IPv4 and we're closer to
> > line rate.
>
> I wonder why the asymmetry.
>
> > For the record I tested without the router to sanity-check my iperf3
> > setup, and IPv4 and IPv6 are 910+ Mbit/s both ways.
>
> > Sincerely,
>
> > Laurent
>
> > PS: kernel is 4.9.77 on the archer (not 4.4, thinko in my first mail)
> > NUC and laptop are running 4.9 debian stretch too.
>
>
> --
> ]   Never tell me the odds! | ipv6 mesh networks [
> ]   Michael Richardson, Sandelman Software Works| network architect  [
> ] m...@sandelman.ca  http://www.sandelman.ca/|   ruby on rails[
>
>
> ___
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev
>

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Michael Richardson

Laurent GUERBY  wrote:
>> So that means that you have to do the performance testing for routing
>> between two subnets.

> Hi,

> With wired, firewall off and using routing (no MASQUERADE, explicit LAN
> route added on the NUC via WAN IP):

Thanks for doing this again.
This is Openwrt/LEDE, or the stock firmware?
I think Openwrt.

> - IPv4 590+ Mbit/s up and 690+ Mbit/s down
> - IPv6 270+ Mbit/s same both ways

> So without NAT/conntrack we gain about 50% on IPv4 and we're closer to
> line rate.

I wonder why the asymmetry.

> For the record I tested without the router to sanity-check my iperf3
> setup, and IPv4 and IPv6 are 910+ Mbit/s both ways.

> Sincerely,

> Laurent

> PS: kernel is 4.9.77 on the archer (not 4.4, thinko in my first mail)
> NUC and laptop are running 4.9 debian stretch too.


--
]   Never tell me the odds! | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works| network architect  [
] m...@sandelman.ca  http://www.sandelman.ca/|   ruby on rails[



___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-29 Thread Laurent GUERBY
On Sun, 2018-01-28 at 19:12 -0500, Michael Richardson wrote:
> Laurent GUERBY  wrote:
> > On Sun, 2018-01-28 at 17:09 -0500, Michael Richardson wrote:
> >> Laurent GUERBY  wrote:
> >> > I tested today a few things on a brand new TP-Link Archer C7 v4.0,
> >> > LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
> >> > WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
> >> > (everything on the same table), IPv4 unless specified,
> >> > using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
> >>
> >> > With the TP-Link firmware:
> >> > - wired 930+ Mbit/s both ways
> >> > - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
> >> > - wireless 2.4G 100+ Mbit/s both ways
> >>
> >> > With OpenWRT/LEDE trunk 20180128 4.4 kernel:
> >> > - wired 350-400 Mbit/s both ways
> >> > - wired with firewall deactivated 550 Mbit/s
> >> > (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
> >>
> >> That still means you have conn-tracking loaded.
> >> Have you tried without that?
>
> > What should I do to enable NAT without conn-tracking?
> > (I see a few nf_conntrack* modules in lsmod)
>
> Unfortunately, you can't.
>
> It's also hard to get rid of the conntrack modules, other than clearing
> everything and then rmmod'ing them.  Sometimes I've had to rename the .ko
> files and reboot to get rid of them.
>
> So that means that you have to do the performance testing for routing
> between two subnets.

Hi,

With wired, firewall off and using routing (no MASQUERADE, explicit LAN
route added on the NUC via WAN IP):

- IPv4 590+ Mbit/s up and 690+ Mbit/s down
- IPv6 270+ Mbit/s same both ways

So without NAT/conntrack we gain about 50% on IPv4 and we're closer to
line rate.
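For reference, a sketch of that routed setup with made-up addresses
(192.168.1.0/24 as the router's LAN, 192.168.2.1 as its WAN address):

  # on the router: stop the firewall so no NAT/conntrack rules are active
  /etc/init.d/firewall stop

  # on the NUC (WAN side): reach the LAN subnet via the router's WAN IP
  ip route add 192.168.1.0/24 via 192.168.2.1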

For the record I tested without the router to sanity-check my iperf3
setup, and IPv4 and IPv6 are 910+ Mbit/s both ways.

Sincerely,

Laurent

PS: kernel is 4.9.77 on the archer (not 4.4, thinko in my first mail)
NUC and laptop are running 4.9 debian stretch too.


___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Rosen Penev
On Sun, Jan 28, 2018 at 3:43 PM, Florian Fainelli  wrote:
> (please don't top post).
>
> On 01/28/2018 02:00 PM, Rosen Penev wrote:
>> Compared to the Archer C7v2, the v4 has a single ethernet interface
>> switched between all 5 ports. The v2 has two ethernet interfaces with
>> 4 ports being switched.
>>
>> Now the disappointing performance has several reasons behind it. The main
>> one is that the ag71xx driver in OpenWrt is not very optimized for
>> the hardware.
>
> The driver certainly contributes to that, but I don't think it is the
> main reason behind it. Each time you send or receive a packet, you need
> to invalidate your data cache for at least 1500 bytes, or whatever
> nominal packet/buffer size has been allocated (e.g. 2KB); with very
> small I and D caches (typically 64KB) and no L2 cache, you do this
> thrashing very frequently and you keep hitting the DRAM as well, which
> hurts performance a lot. This is just something the networking stack
> does, and it is really hard to diverge from this because that is inherently
> how it is designed, and how drivers are designed as well. This is why
> software bypasses in hardware are so effective for low power CPUs.
>
Good point. Even Qualcomm's solution to this (FastPath) gets good results.
Felix and I have been backporting some of the cache stuff to get less
thrashing.
> I would be curious to see the use of XDP redirect and implementing a
> software NAT fast path, that is, for the most basic NAPT translation, do
> this in XDP as early as possible in the driver receive/transmit part and
> send directly to the outgoing interface, this should lower the pressure
> on the I and D caches by invalidating not the full packet length, but
> just the header portion. For more complex protocols, we would keep using
> the conntrack helpers to do the necessary operation (FTP, TFTP, SIP,
> etc..) on the packet. This might avoid doing a sk_buff allocation for
> each packet making it through, which is expensive.
>
Unfortunately, someone needs to get this done. Not I.
>>
>> Qualcomm forked the driver (in 2013, I think) and added some really
>> nice features. Some of these need to be backported for ag71xx in
>> OpenWrt to be competitive.
>
> Is it possible to just drop their driver in OpenWrt and get a feeling of
> the performance gap?
>
I've tried. Qualcomm's driver (as well as their ar71xx platform) has
some devicetree bindings that I have not been able to make sense of.
To make matters worse, I can't find a git history for the driver
except for an old one from 2013. They've also removed the driver from
the usual location that I found at:
https://portland.source.codeaurora.org/quic/qrdk/oss/kernel/linux-msm/

Maybe it's just hiding...

The driver cherry picked from last year is available here:
https://github.com/neheb/source/tree/qca-ag71xx
>>
>> It's going to take quite a bit of work to get the driver up to par.
>> The biggest performance boost, I imagine, would be to add GRO support. It
>> turns out that, for good routing performance, GRO requires hardware
>> checksumming, which is not supported by ag71xx in OpenWrt at the
>> moment.
>
> Does the hardware actually support checksum offloads?
>
As far as I know, hardware with QCA in the title has support. The
ag71xx in OpenWrt has support for a lot more platforms that do not
support offloads.
>>
>> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling  
>> wrote:
>>> Hi, as I am also using the Archer C7s as my build targets (and C2600s),
>>> I am watching this keenly; is anyone else running openvswitch on these
>>> with the XDP patches?
>>>
>>> The C2600, which is ARM A15, currently really could do with optimization
>>> and is probably a much better choice for CPE. I would not be caught dead
>>> with the C7 as a 10Gbit CPE myself;
>>> the SoC, even with the Openfast path patches, just can't handle complex
>>> QoS scheduling (i.e. Cake/PIE) beyond a couple of hundred Mbit.
>>>
>>>
>>>
>>> -Joel
>>> ---
>>> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>>>
>>> On 29 January 2018 at 09:43, Laurent GUERBY  wrote:

 On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
> Hi Rafal,
>
> On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
>> Getting better network performance (mostly for NAT) using some kind of
>> acceleration was always a hot topic and people are still
>> looking/asking for it. I'd like to write a short summary and share my
>> understanding of current state so that:
>> 1) People can understand it better
>> 2) We can have some rough plan
>>
>> First of all there are two possible ways of accelerating network
>> traffic: in software and in hardware. Software solution is independent
>> of architecture/device and is mostly just bypassing in-kernel packets
>> flow. It still uses device's CPU which can be a bottleneck. Various
>> software implementations are reported to be from 2x to 5x faster.
>
> This is what I've been observing for the software acceleration here,
> see slide 19 at:
> https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf

Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Michael Richardson

Laurent GUERBY  wrote:
> On Sun, 2018-01-28 at 17:09 -0500, Michael Richardson wrote:
>> Laurent GUERBY  wrote:
>> > I tested today a few things on a brand new TP-Link Archer C7 v4.0,
>> > LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
>> > WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
>> > (everything on the same table), IPv4 unless specified,
>> > using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>>
>> > With the TP-Link firmware:
>> > - wired 930+ Mbit/s both ways
>> > - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
>> > - wireless 2.4G 100+ Mbit/s both ways
>>
>> > With OpenWRT/LEDE trunk 20180128 4.4 kernel:
>> > - wired 350-400 Mbit/s both ways
>> > - wired with firewall deactivated 550 Mbit/s
>> > (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
>>
>> That still means you have conn-tracking loaded.
>> Have you tried without that?

> What should I do to enable NAT without conn-tracking?
> (I see a few nf_conntrack* modules in lsmod)

Unfortunately, you can't.

It's also hard to get rid of the conntrack modules, other than clearing
everything and then rmmod'ing them.  Sometimes I've had to rename the .ko
files and reboot to get rid of them.
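Something like the following, with the caveat that the exact module list
and unload order vary by kernel and configuration (conntrack(8) comes
from the conntrack-tools package):

  # drop the rules referencing conntrack, then flush the tracked flows
  iptables -t nat -F
  iptables -F
  conntrack -F
  # then try to unload, most-dependent modules first
  rmmod iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_conntrack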

So that means that you have to do the performance testing for routing
between two subnets.

>> > - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
>> > - wireless 5G 150-200 Mbit/s
>> > - wireless 2.4G forgot to test
>>
>> Does the TP-Link firmware support any IPv6?
>> You could report 0Mb/s for IPv6 :-)

> TP-Link has now added full IPv6 support AFAIK. I will
> test it and report when I get my hands on another spare.

Thanks!

>> > IPv6 performance without NAT being below IPv4 with NAT seems
>> > to indicate there are potential gains in software :).
>>
>> Depends upon whether there is hardware support for NAT,
>> which many devices have, wrapped up under NDAs.

> I don't think OpenWRT has support for NAT accelerators
> at this point, IPv4 and IPv6 are both done in software.

Yes, that's the case, because the details have been wrapped in NDAs.
I see that Qualcomm has released something, so that's exciting.

--
]   Never tell me the odds! | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works| network architect  [
] m...@sandelman.ca  http://www.sandelman.ca/|   ruby on rails[



___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Joel Wirāmu Pauling
FYI - the Openfast path patches are applied to several trees. I am
running them on a C7 v2 right now and am able to hit close to stock
numbers.

The NAT acceleration stuff isn't needed at all with the open-fastpath patches.

relevant thread:
https://forum.lede-project.org/t/qualcomm-fast-path-for-lede/4582

-Joel

On 29 January 2018 at 12:43, Florian Fainelli  wrote:
> (please don't top post).
>
> On 01/28/2018 02:00 PM, Rosen Penev wrote:
>> Compared to the Archer C7v2, the v4 has a single ethernet interface
>> switched between all 5 ports. The v2 has two ethernet interfaces with
>> 4 ports being switched.
>>
>> Now the disappointing performance has several reasons behind it. The main
>> one is that the ag71xx driver in OpenWrt is not very optimized for
>> the hardware.
>
> The driver certainly contributes to that, but I don't think it is the
> main reason behind it. Each time you send or receive a packet, you need
> to invalidate your data cache for at least 1500 bytes, or whatever
> nominal packet/buffer size has been allocated (e.g. 2KB); with very
> small I and D caches (typically 64KB) and no L2 cache, you do this
> thrashing very frequently and you keep hitting the DRAM as well, which
> hurts performance a lot. This is just something the networking stack
> does, and it is really hard to diverge from this because that is inherently
> how it is designed, and how drivers are designed as well. This is why
> software bypasses in hardware are so effective for low power CPUs.
>
> I would be curious to see the use of XDP redirect and implementing a
> software NAT fast path, that is, for the most basic NAPT translation, do
> this in XDP as early as possible in the driver receive/transmit part and
> send directly to the outgoing interface, this should lower the pressure
> on the I and D caches by invalidating not the full packet length, but
> just the header portion. For more complex protocols, we would keep using
> the conntrack helpers to do the necessary operation (FTP, TFTP, SIP,
> etc..) on the packet. This might avoid doing a sk_buff allocation for
> each packet making it through, which is expensive.
>
>>
>> Qualcomm forked the driver (in 2013, I think) and added some really
>> nice features. Some of these need to be backported for ag71xx in
>> OpenWrt to be competitive.
>
> Is it possible to just drop their driver in OpenWrt and get a feeling of
> the performance gap?
>
>>
>> It's going to take quite a bit of work to get the driver up to par.
>> The biggest performance boost, I imagine, would be to add GRO support. It
>> turns out that, for good routing performance, GRO requires hardware
>> checksumming, which is not supported by ag71xx in OpenWrt at the
>> moment.
>
> Does the hardware actually support checksum offloads?
>
>>
>> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling  
>> wrote:
>>> Hi, as I am also using the Archer C7s as my build targets (and C2600s),
>>> I am watching this keenly; is anyone else running openvswitch on these
>>> with the XDP patches?
>>>
>>> The C2600, which is ARM A15, currently really could do with optimization
>>> and is probably a much better choice for CPE. I would not be caught dead
>>> with the C7 as a 10Gbit CPE myself;
>>> the SoC, even with the Openfast path patches, just can't handle complex
>>> QoS scheduling (i.e. Cake/PIE) beyond a couple of hundred Mbit.
>>>
>>>
>>>
>>> -Joel
>>> ---
>>> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>>>
>>> On 29 January 2018 at 09:43, Laurent GUERBY  wrote:

 On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
> Hi Rafal,
>
> On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
>> Getting better network performance (mostly for NAT) using some kind of
>> acceleration was always a hot topic and people are still
>> looking/asking for it. I'd like to write a short summary and share my
>> understanding of current state so that:
>> 1) People can understand it better
>> 2) We can have some rough plan
>>
>> First of all there are two possible ways of accelerating network
>> traffic: in software and in hardware. Software solution is independent
>> of architecture/device and is mostly just bypassing in-kernel packets
>> flow. It still uses device's CPU which can be a bottleneck. Various
>> software implementations are reported to be from 2x to 5x faster.
>
> This is what I've been observing for the software acceleration here,
> see slide 19 at:
>
> https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
>
> The flowtable representation, in software, is providing a faster
> forwarding path between two nics. So it's basically an alternative to
> the classic forwarding path, that is faster. Packets kick in at the
> Netfilter ingress hook (right at the same location as 'tc' ingress),
> if there is a hit in the software flowtable, ttl gets decremented,
> NATs are done and the packet is placed in the destination NIC via
> neigh_xmit() - through the neighbour layer.

Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Florian Fainelli
(please don't top post).

On 01/28/2018 02:00 PM, Rosen Penev wrote:
> Compared to the Archer C7v2, the v4 has a single ethernet interface
> switched between all 5 ports. The v2 has two ethernet interfaces with
> 4 ports being switched.
> 
> Now the disappointing performance has several reasons behind it. The main
> one is that the ag71xx driver in OpenWrt is not very optimized for
> the hardware.

The driver certainly contributes to that, but I don't think it is the
main reason behind it. Each time you send or receive a packet, you need
to invalidate your data cache for at least 1500 bytes, or whatever
nominal packet/buffer size has been allocated (e.g. 2KB); with very
small I and D caches (typically 64KB) and no L2 cache, you do this
thrashing very frequently and you keep hitting the DRAM as well, which
hurts performance a lot. This is just something the networking stack
does, and it is really hard to diverge from this because that is inherently
how it is designed, and how drivers are designed as well. This is why
software bypasses in hardware are so effective for low power CPUs.

I would be curious to see the use of XDP redirect and implementing a
software NAT fast path, that is, for the most basic NAPT translation, do
this in XDP as early as possible in the driver receive/transmit part and
send directly to the outgoing interface, this should lower the pressure
on the I and D caches by invalidating not the full packet length, but
just the header portion. For more complex protocols, we would keep using
the conntrack helpers to do the necessary operation (FTP, TFTP, SIP,
etc..) on the packet. This might avoid doing a sk_buff allocation for
each packet making it through, which is expensive.
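To make the idea concrete, here is a hypothetical sketch of such an XDP
fast path - not an existing driver. The flow map layout is invented, the
actual NAPT address/port rewrite and its RFC 1624 checksum fix-up are
elided, and only the TTL decrement (mirroring the kernel's
ip_decrease_ttl()) and the redirect are shown:

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/ip.h>
  #include <bpf/bpf_endian.h>
  #include <bpf/bpf_helpers.h>

  /* hypothetical flow entry, kept in sync with conntrack from outside */
  struct fastpath_entry {
          __u32 ifindex;          /* egress interface to redirect to */
  };

  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __type(key, __u32);     /* toy key: original source address */
          __type(value, struct fastpath_entry);
          __uint(max_entries, 1024);
  } flow_table SEC(".maps");

  SEC("xdp")
  int xdp_nat_fastpath(struct xdp_md *ctx)
  {
          void *data = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;

          struct ethhdr *eth = data;
          if ((void *)(eth + 1) > data_end)
                  return XDP_PASS;
          if (eth->h_proto != bpf_htons(ETH_P_IP))
                  return XDP_PASS;

          struct iphdr *iph = (void *)(eth + 1);
          if ((void *)(iph + 1) > data_end || iph->ttl <= 1)
                  return XDP_PASS;

          struct fastpath_entry *fe =
                  bpf_map_lookup_elem(&flow_table, &iph->saddr);
          if (!fe)
                  return XDP_PASS; /* miss: fall back to the normal stack */

          /* a real NAPT fast path would rewrite addresses/ports here and
           * patch the L3/L4 checksums incrementally (RFC 1624) */

          /* decrement TTL, patching the checksum like ip_decrease_ttl() */
          __u32 check = iph->check;
          check += bpf_htons(0x0100);
          iph->check = (__u16)(check + (check >= 0xFFFF));
          iph->ttl--;

          /* only the headers were touched, so only those cache lines get
           * dirtied before the packet leaves again */
          return bpf_redirect(fe->ifindex, 0);
  }

  char _license[] SEC("license") = "GPL";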

> 
> Qualcomm forked the driver (in 2013, I think) and added some really
> nice features. Some of these need to be backported for ag71xx in
> OpenWrt to be competitive.

Is it possible to just drop their driver in OpenWrt and get a feeling of
the performance gap?

> 
> It's going to take quite a bit of work to get the driver up to par.
> The biggest performance boost, I imagine, would be to add GRO support. It
> turns out that, for good routing performance, GRO requires hardware
> checksumming, which is not supported by ag71xx in OpenWrt at the
> moment.

Does the hardware actually support checksum offloads?

> 
> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling  
> wrote:
>> Hi, as I am also using the Archer C7s as my build targets (and C2600s),
>> I am watching this keenly; is anyone else running openvswitch on these
>> with the XDP patches?
>>
>> The C2600, which is ARM A15, currently really could do with optimization
>> and is probably a much better choice for CPE. I would not be caught dead
>> with the C7 as a 10Gbit CPE myself;
>> the SoC, even with the Openfast path patches, just can't handle complex
>> QoS scheduling (i.e. Cake/PIE) beyond a couple of hundred Mbit.
>>
>>
>>
>> -Joel
>> ---
>> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>>
>> On 29 January 2018 at 09:43, Laurent GUERBY  wrote:
>>>
>>> On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
 Hi Rafal,

 On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
> Getting better network performance (mostly for NAT) using some kind of
> acceleration was always a hot topic and people are still
> looking/asking for it. I'd like to write a short summary and share my
> understanding of current state so that:
> 1) People can understand it better
> 2) We can have some rough plan
>
> First of all there are two possible ways of accelerating network
> traffic: in software and in hardware. Software solution is independent
> of architecture/device and is mostly just bypassing in-kernel packets
> flow. It still uses device's CPU which can be a bottleneck. Various
> software implementations are reported to be from 2x to 5x faster.

 This is what I've been observing for the software acceleration here,
 see slide 19 at:

 https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf

 The flowtable representation, in software, is providing a faster
 forwarding path between two nics. So it's basically an alternative to
 the classic forwarding path, that is faster. Packets kick in at the
 Netfilter ingress hook (right at the same location as 'tc' ingress),
 if there is a hit in the software flowtable, ttl gets decremented,
 NATs are done and the packet is placed in the destination NIC via
 neigh_xmit() - through the neighbour layer.
>>>
>>> Hi Pablo,
>>>
>>> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
>>> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
>>> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
>>> (everything on the same table), IPv4 unless specified,
>>> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>>>
>>> With the TP-Link firmware:
>>> - wired 930+ Mbit/s both ways
>>> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
>>> - wireless 2.4G 100+ Mbit/s both ways

Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Laurent GUERBY
Hi Michael,

On Sun, 2018-01-28 at 17:09 -0500, Michael Richardson wrote:
> Laurent GUERBY  wrote:
> > I tested today a few things on a brand new TP-Link Archer C7 v4.0,
> > LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
> > WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
> > (everything on the same table), IPv4 unless specified,
> > using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
> 
> > With the TP-Link firmware:
> > - wired 930+ Mbit/s both ways
> > - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
> > - wireless 2.4G 100+ Mbit/s both ways
> 
> > With OpenWRT/LEDE trunk 20180128 4.4 kernel:
> > - wired 350-400 Mbit/s both ways
> > - wired with firewall deactivated 550 Mbit/s
> > (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
> 
> That still means you have conn-tracking loaded.
> Have you tried without that?

What should I do to enable NAT without conn-tracking?
(I see a few nf_conntrack* modules in lsmod)

> > - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
> > - wireless 5G 150-200 Mbit/s
> > - wireless 2.4G forgot to test
> 
> Does the TP-Link firmware support any IPv6?
> You could report 0Mb/s for IPv6 :-)

TP-Link has now added full IPv6 support AFAIK. I will
test it and report when I get my hands on another spare.

> > IPv6 performance without NAT being below IPv4 with NAT seems
> > to indicate there are potential gains in software :).
> 
> Depends upon whether there is hardware support for NAT,
> which many devices have, wrapped up under NDAs.

I don't think OpenWRT has support for NAT accelerators
at this point, IPv4 and IPv6 are both done in software.

Sincerely,

Laurent

> --
> ]   Never tell me the odds! | ipv6 mesh networks [
> ]   Michael Richardson, Sandelman Software Works| network architect  [
> ] m...@sandelman.ca  http://www.sandelman.ca/|   ruby on rails[
> 
> ___
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Michael Richardson

Laurent GUERBY  wrote:
> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
> (everything on the same table), IPv4 unless specified,
> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).

> With the TP-Link firmware:
> - wired 930+ Mbit/s both ways
> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
> - wireless 2.4G 100+ Mbit/s both ways

> With OpenWRT/LEDE trunk 20180128 4.4 kernel:
> - wired 350-400 Mbit/s both ways
> - wired with firewall deactivated 550 Mbit/s
> (just "iptables -t nat -A POSTROUTING -j MASQUERADE")

That still means you have conn-tracking loaded.
Have you tried without that?

> - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
> - wireless 5G 150-200 Mbit/s
> - wireless 2.4G forgot to test

Does the TP-Link firmware support any IPv6?
You could report 0Mb/s for IPv6 :-)

> IPv6 performance without NAT being below IPv4 with NAT seems
> to indicate there are potential gains in software :).

Depends upon whether there is hardware support for NAT,
which many devices have, wrapped up under NDAs.

--
]   Never tell me the odds! | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works| network architect  [
] m...@sandelman.ca  http://www.sandelman.ca/|   ruby on rails[



___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Rosen Penev
Compared to the Archer C7v2, the v4 has a single ethernet interface
switched between all 5 ports. The v2 has two ethernet interfaces with
4 ports being switched.

Now the disappointing performance has several reasons behind it. The main
one is that the ag71xx driver in OpenWrt is not very optimized for
the hardware.

Qualcomm forked the driver (in 2013, I think) and added some really
nice features. Some of these need to be backported for ag71xx in
OpenWrt to be competitive.

It's going to take quite a bit of work to get the driver up to par.
The biggest performance boost, I imagine, would be to add GRO support. It
turns out that, for good routing performance, GRO requires hardware
checksumming, which is not supported by ag71xx in OpenWrt at the
moment.
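If the MAC does turn out to support it, checking and toggling the
relevant features is cheap (eth0 here is a placeholder):

  # see what the driver claims to support
  ethtool -k eth0 | grep -E 'checksumming|generic-receive-offload'
  # try to enable GRO
  ethtool -K eth0 gro on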

On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling  wrote:
> Hi, as I am also using the Archer C7s as my build targets (and C2600s),
> I am watching this keenly; is anyone else running openvswitch on these
> with the XDP patches?
>
> The C2600, which is ARM A15, currently really could do with optimization
> and is probably a much better choice for CPE. I would not be caught dead
> with the C7 as a 10Gbit CPE myself;
> the SoC, even with the Openfast path patches, just can't handle complex
> QoS scheduling (i.e. Cake/PIE) beyond a couple of hundred Mbit.
>
>
>
> -Joel
> ---
> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>
> On 29 January 2018 at 09:43, Laurent GUERBY  wrote:
>>
>> On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
>> > Hi Rafal,
>> >
>> > On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
>> > > Getting better network performance (mostly for NAT) using some kind of
>> > > acceleration was always a hot topic and people are still
>> > > looking/asking for it. I'd like to write a short summary and share my
>> > > understanding of current state so that:
>> > > 1) People can understand it better
>> > > 2) We can have some rough plan
>> > >
>> > > First of all there are two possible ways of accelerating network
>> > > traffic: in software and in hardware. Software solution is independent
>> > > of architecture/device and is mostly just bypassing in-kernel packets
>> > > flow. It still uses device's CPU which can be a bottleneck. Various
>> > > software implementations are reported to be from 2x to 5x faster.
>> >
>> > This is what I've been observing for the software acceleration here,
>> > see slide 19 at:
>> >
>> > https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
>> >
>> > The flowtable representation, in software, is providing a faster
>> > forwarding path between two nics. So it's basically an alternative to
>> > the classic forwarding path, that is faster. Packets kick in at the
>> > Netfilter ingress hook (right at the same location as 'tc' ingress),
>> > if there is a hit in the software flowtable, ttl gets decremented,
>> > NATs are done and the packet is placed in the destination NIC via
>> > neigh_xmit() - through the neighbour layer.
>>
>> Hi Pablo,
>>
>> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
>> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
>> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
>> (everything on the same table), IPv4 unless specified,
>> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>>
>> With the TP-Link firmware:
>> - wired 930+ Mbit/s both ways
>> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
>> - wireless 2.4G 100+ Mbit/s both ways
>>
>> With OpenWRT/LEDE trunk 20180128 4.4 kernel:
>> - wired 350-400 Mbit/s both ways
>> - wired with firewall deactivated 550 Mbit/s
>>   (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
>> - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
>> - wireless 5G 150-200 Mbit/s
>> - wireless 2.4G forgot to test
>>
>> top on the router shows sirq at 90%+ during network load, other load
>> indicators are under 5%.
>>
>> IPv6 performance without NAT being below IPv4 with NAT seems
>> to indicate there are potential gains in software :).
>>
>> I didn't test OpenWRT in bridge mode, but with LEDE 17.01 on an
>> Archer C7 v2 I got about 550-600 Mbit/s with iperf3, so I think the
>> radio is good on these ath10k routers.
>>
>> So if OpenWRT can do about 2x in software routing performance, we're
>> good against our TP-Link firmware friends :).
>>
>> tetaneutral.net (not-for-profit ISP, hosting OpenWRT and LEDE mirror in
>> FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each
>> with individual gigabit fiber uplink (TP-Link MC220L fiber converter),
>> and total 10G uplink (Dell/Force10 S4810 48x10G, yes some of our
>> members will get 10G on their PC at home :).
>>
>> We build our images from git source, generating an imagebuilder and then
>> running a custom python script. We have 5+ spare C7s, a fast build (20 min
>> from scratch) and a testing environment, and of course we're interested in
>> suggestions on what to do.
>>
>> Thanks in advance for your help,

Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Joel Wirāmu Pauling
Hi, as I am also using the Archer C7s as my build targets (and C2600s),
I am watching this keenly; is anyone else running openvswitch on these
with the XDP patches?

The C2600, which is ARM A15, currently really could do with optimization
and is probably a much better choice for CPE. I would not be caught dead
with the C7 as a 10Gbit CPE myself;
the SoC, even with the Openfast path patches, just can't handle complex
QoS scheduling (i.e. Cake/PIE) beyond a couple of hundred Mbit.



-Joel
---
https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1

On 29 January 2018 at 09:43, Laurent GUERBY  wrote:
>
> On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
> > Hi Rafal,
> >
> > On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
> > > Getting better network performance (mostly for NAT) using some kind of
> > > acceleration was always a hot topic and people are still
> > > looking/asking for it. I'd like to write a short summary and share my
> > > understanding of current state so that:
> > > 1) People can understand it better
> > > 2) We can have some rough plan
> > >
> > > First of all there are two possible ways of accelerating network
> > > traffic: in software and in hardware. Software solution is independent
> > > of architecture/device and is mostly just bypassing in-kernel packets
> > > flow. It still uses device's CPU which can be a bottleneck. Various
> > > software implementations are reported to be from 2x to 5x faster.
> >
> > This is what I've been observing for the software acceleration here,
> > see slide 19 at:
> >
> > https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
> >
> > The flowtable representation, in software, is providing a faster
> > forwarding path between two nics. So it's basically an alternative to
> > the classic forwarding path, that is faster. Packets kick in at the
> > Netfilter ingress hook (right at the same location as 'tc' ingress),
> > if there is a hit in the software flowtable, ttl gets decremented,
> > NATs are done and the packet is placed in the destination NIC via
> > neigh_xmit() - through the neighbour layer.
>
> Hi Pablo,
>
> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
> (everything on the same table), IPv4 unless specified,
> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>
> With the TP-Link firmware:
> - wired 930+ Mbit/s both ways
> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
> - wireless 2.4G 100+ Mbit/s both ways
>
> With OpenWRT/LEDE trunk 20180128 4.4 kernel:
> - wired 350-400 Mbit/s both ways
> - wired with firewall deactivated 550 Mbit/s
>   (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
> - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
> - wireless 5G 150-200 Mbit/s
> - wireless 2.4G forgot to test
>
> top on the router shows sirq at 90%+ during network load, other load
> indicators are under 5%.
>
> IPv6 performance without NAT being below IPv4 with NAT seems
> to indicate there are potential gains in software :).
>
> I didn't test OpenWRT in bridge mode, but with LEDE 17.01 on an
> Archer C7 v2 I got about 550-600 Mbit/s with iperf3, so I think the
> radio is good on these ath10k routers.
>
> So if OpenWRT can do about 2x in software routing performance, we're
> good against our TP-Link firmware friends :).
>
> tetaneutral.net (not-for-profit ISP, hosting OpenWRT and LEDE mirror in
> FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each
> with individual gigabit fiber uplink (TP-Link MC220L fiber converter),
> and total 10G uplink (Dell/Force10 S4810 48x10G, yes some of our
> members will get 10G on their PC at home :).
>
> We build our images from git source, generating an imagebuilder and then
> running a custom python script. We have 5+ spare C7s, a fast build (20 min
> from scratch) and a testing environment, and of course we're interested in
> suggestions on what to do.
>
> Thanks in advance for your help,
>
> Sincerely,
>
> Laurent
> http://tetaneutral.net
>
>
> ___
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration / test on Archer C7 v4

2018-01-28 Thread Laurent GUERBY
On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
> Hi Rafal,
> 
> On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
> > Getting better network performance (mostly for NAT) using some kind of
> > acceleration was always a hot topic and people are still
> > looking/asking for it. I'd like to write a short summary and share my
> > understanding of current state so that:
> > 1) People can understand it better
> > 2) We can have some rough plan
> >
> > First of all there are two possible ways of accelerating network
> > traffic: in software and in hardware. Software solution is independent
> > of architecture/device and is mostly just bypassing in-kernel packets
> > flow. It still uses device's CPU which can be a bottleneck. Various
> > software implementations are reported to be from 2x to 5x faster.
> 
> This is what I've been observing for the software acceleration here,
> see slide 19 at:
> 
> https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
> 
> The flowtable representation, in software, is providing a faster
> forwarding path between two nics. So it's basically an alternative to
> the classic forwarding path, that is faster. Packets kick in at the
> Netfilter ingress hook (right at the same location as 'tc' ingress),
> if there is a hit in the software flowtable, ttl gets decremented,
> NATs are done and the packet is placed in the destination NIC via
> neigh_xmit() - through the neighbour layer.

Hi Pablo,

I tested today a few things on a brand new TP-Link Archer C7 v4.0,
LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
(everything on the same table), IPv4 unless specified,
using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
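For reference, the corresponding iperf3 invocations are presumably just
the defaults (the server address is a placeholder):

  iperf3 -s                  # on the WAN-side NUC
  iperf3 -c 192.168.2.10     # LAN=>WAN
  iperf3 -c 192.168.2.10 -R  # WAN=>LAN, reverse mode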

With the TP-Link firmware:
- wired 930+ Mbit/s both ways
- wireless 5G 560+ Mbit/s down 440+ Mbit/s up
- wireless 2.4G 100+ Mbit/s both ways

With OpenWRT/LEDE trunk 20180128 4.4 kernel:
- wired 350-400 Mbit/s both ways
- wired with firewall deactivated 550 Mbit/s
  (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
- wired IPv6 routing, no NAT, no firewall 250 Mbit/s
- wireless 5G 150-200 Mbit/s
- wireless 2.4G forgot to test

top on the router shows sirq at 90%+ during network load, other load
indicators are under 5%.

IPv6 performance without NAT being below IPv4 with NAT seems
to indicate there are potential gains in software :).

I didn't test OpenWRT in bridge mode, but with LEDE 17.01 on an
Archer C7 v2 I got about 550-600 Mbit/s with iperf3, so I think the
radio is good on these ath10k routers.

So if OpenWRT can do about 2x in software routing performance, we're
good against our TP-Link firmware friends :).

tetaneutral.net (not-for-profit ISP, hosting OpenWRT and LEDE mirror in
FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each
with individual gigabit fiber uplink (TP-Link MC220L fiber converter),
and total 10G uplink (Dell/Force10 S4810 48x10G, yes some of our
members will get 10G on their PC at home :).

We build our images from git source, generating an imagebuilder and then
running a custom python script. We have 5+ spare C7s, a fast build (20 min
from scratch) and a testing environment, and of course we're interested in
suggestions on what to do.

Thanks in advance for your help,

Sincerely,

Laurent
http://tetaneutral.net


___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] A state of network acceleration

2018-01-17 Thread Pablo Neira Ayuso
Hi Rafal,

On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
> Getting better network performance (mostly for NAT) using some kind of
> acceleration was always a hot topic and people are still
> looking/asking for it. I'd like to write a short summary and share my
> understanding of current state so that:
> 1) People can understand it better
> 2) We can have some rough plan
> 
> First of all there are two possible ways of accelerating network
> traffic: in software and in hardware. Software solution is independent
> of architecture/device and is mostly just bypassing in-kernel packets
> flow. It still uses device's CPU which can be a bottleneck. Various
> software implementations are reported to be from 2x to 5x faster.

This is what I've been observing for the software acceleration here,
see slide 19 at:

https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf

The flowtable representation, in software, is providing a faster
forwarding path between two NICs. So it's basically an alternative to
the classic forwarding path, one that is faster. Packets kick in at the
Netfilter ingress hook (right at the same location as 'tc' ingress);
if there is a hit in the software flowtable, ttl gets decremented,
NATs are done and the packet is placed in the destination NIC via
neigh_xmit() - through the neighbour layer.
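As a rough sketch of the control plane this maps to - the nft syntax was
still being finalized at this point, so take the exact statement name
with a grain of salt, and the device names are placeholders:

  table inet filter {
          flowtable ft {
                  hook ingress priority 0
                  devices = { eth0, eth1 }
          }
          chain forward {
                  type filter hook forward priority 0; policy accept;
                  ip protocol { tcp, udp } flow add @ft
          }
  }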

> Hardware acceleration requires hw-specific implementation and can
> offload device's CPU.
> 
> Of course handling network traffic out of the networking subsystem
> means some features like QoS / throughput limits / advanced firewall
> rules may not/won't work.
> 
> The hardest task (for both methods) was always Linux kernel
> integration. Drivers had to somehow:
> 1) Get/build a table with rules for packets flow
> 2) Update in-kernel state to e.g. avoid connection timeout & its removal
> 
> The problem with all existing implementations was they used various
> non-upstream patches for kernel integration. Some were less invasive,
> some a bit more. They weren't properly reviewed by kernel developers
> and usually were using hacks/solutions that couldn't be accepted.
> 
> The rescue from this came with Pablo's work on the offloading
> infrastructure. He worked hard on this, developing & sending his
> patchset for the upstream kernel:
> [1] [PATCH RFC,WIP 0/5] Flow offload infrastructure
> [2] [PATCH nf-next RFC,v2 0/6] Flow offload infrastructure
> [3] [PATCH nf-next,v3 0/7] Flow offload infrastructure
> 
> The best news is that his final patchset version was accepted and now
> sits in net-next [4] (and should become part of kernel 4.16).
> 
> Now, what does it mean for LEDE project:
> 1) There is upstream infrastructure that should be ready to use
> 2) It's based on & requires nftables
> 3) LEDE's firewall3 uses iptables (& friends) C API
> 4) There aren't any drivers for offloading hardware (switches?) yet

Yes, there are no drivers using the hardware offload infrastructure yet. So
the patch to add the ndo_flow_offload hook to struct net_device has
been kept back for now [1] until there is an initial driver client for
this. I'll be sending a new version of [1] asap. Will push it to a
branch in my nf-next.git tree [2] and will rebase it on top of my
master so people developing a driver that uses this don't need to
deal with this extra work.

[1] http://patchwork.ozlabs.org/patch/852537/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

> One thing I'm not sure about is whether the software accelerator is ready or not.
> Pablo in his e-mail wrote:
> > So far, this is a generic software flow table representation, that
> > matches basic flow table hardware semantics but that also provides a
> > software faster path. So you can use it to purely forward packets
> > between two nics even if they come with no hardware offload support.
>
> which could suggest the software path is already there.

Yes, software acceleration is working in my testbed; anything other than
that is a bug that needs to be fixed ;-).

I'm still finishing the userspace bits for libnftnl and nft, to
provide the control plane to users to configure this. Will post this
patchset asap, so these userspace bits can follow their path to
upstream repositories.

> So here is my idea of what is needed by LEDE to get it working:
> 1) Rewrite firewall3 to use nftables

There's a tentative C API for nftables:

http://git.netfilter.org/nftables/tree/include/nftables/nftables.h
http://git.netfilter.org/nftables/tree/src/libnftables.c

There are plans to add an API to support batching too, i.e. adding several
rules into the kernel in one go - using the nftables transaction
infrastructure - this is almost done since it was part of the original
work done by Eric Leblond.

I can see firewall3 builds strings that are passed to iptables/ipset;
this approach matches the existing C API that we're providing.
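A hypothetical sketch of what that could look like from firewall3's side,
using the two entry points visible in the headers above (the API was
still settling; early revisions of nft_run_cmd_from_buffer also took a
buffer length argument):

  #include <nftables/nftables.h>

  int main(void)
  {
          struct nft_ctx *ctx = nft_ctx_new(NFT_CTX_DEFAULT);
          if (!ctx)
                  return 1;
          /* hand libnftables the same kind of string fw3 builds today */
          int rc = nft_run_cmd_from_buffer(ctx, "add table inet fw3");
          nft_ctx_free(ctx);
          return rc ? 1 : 0;
  }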

At this stage the high-level libnftables library is not yet exposed as a
shared object, there is a static library under
src/.libs/l

Re: [LEDE-DEV] A state of network acceleration

2018-01-17 Thread Rafał Miłecki
On 17 January 2018 at 16:25, Rafał Miłecki  wrote:
> The problem with all existing implementations was they used various
> non-upstream patches for kernel integration. Some were less invasive,
> some a bit more. They weren't properly reviewed by kernel developers
> and usually were using hacks/solutions that couldn't be accepted.

If someone is interested in these existing implementations, here is a
list of the ones I'm aware of.

One of the earliest ones was Broadcom's CTF (Cut-Through Forwarding).
For a BCM4706-based device it could bump NAT from 120 Mb/s to 850 Mb/s.
I described it in the:
[1] Understanding/reimplementing forwarding acceleration used by Broadcom (ctf)
e-mail thread. It consisted of a kernel modification (see ctf.diff in
the above thread) and the closed-source ctf.ko.

Marvell announced their own "fastpath" implementation in 2014 in the
e-mail thread:
[2] Introducing "fastpath" - Kernel module for speeding up IP forwarding
They referenced a nice article on embedded.com [3].
AFAIU a year or two later they released it as the OpenFastPath project
[4] [5] under the BSD 3-clause license. No one tried integrating it
with OpenWrt/LEDE AFAIK.

Finally there is Qualcomm's Shortcut Forwarding Engine. It's open
source and it's reported to increase NAT performance on an AR9132-based
device from ~235 Mb/s to ~525 Mb/s. It was ported to LEDE by user gwlim
as a set of patches to be applied on top of a cloned repo [6].

[1] https://lists.openwrt.org/pipermail/openwrt-devel/2013-August/021112.html
[2] https://lists.openwrt.org/pipermail/openwrt-devel/2014-December/030179.html
[3] 
https://www.embedded.com/design/operating-systems/4403058/Accelerating-network-packet-processing-in-Linux
[4] https://openfastpath.org/
[5] https://github.com/MarvellEmbeddedProcessors/ofp-marvell
[6] https://github.com/gwlim/Fast-Path-LEDE-OpenWRT

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


[LEDE-DEV] A state of network acceleration

2018-01-17 Thread Rafał Miłecki
Getting better network performance (mostly for NAT) using some kind of
acceleration was always a hot topic and people are still
looking/asking for it. I'd like to write a short summary and share my
understanding of current state so that:
1) People can understand it better
2) We can have some rough plan

First of all there are two possible ways of accelerating network
traffic: in software and in hardware. Software solution is independent
of architecture/device and is mostly just bypassing in-kernel packets
flow. It still uses device's CPU which can be a bottleneck. Various
software implementations are reported to be from 2x to 5x faster.
Hardware acceleration requires hw-specific implementation and can
offload device's CPU.

Of course handling network traffic out of the networking subsystem
means some features like QoS / throughput limits / advanced firewall
rules may not/won't work.

The hardest task (for both methods) was always Linux kernel
integration. Drivers had to somehow:
1) Get/build a table with rules for packets flow
2) Update in-kernel state to e.g. avoid connection timeout & its removal

The problem with all existing implementations was they used various
non-upstream patches for kernel integration. Some were less invasive,
some a bit more. They weren't properly reviewed by kernel developers
and usually were using hacks/solutions that couldn't be accepted.

The rescue from this came with Pablo's work on the offloading
infrastructure. He worked hard on this, developing & sending his
patchset for the upstream kernel:
[1] [PATCH RFC,WIP 0/5] Flow offload infrastructure
[2] [PATCH nf-next RFC,v2 0/6] Flow offload infrastructure
[3] [PATCH nf-next,v3 0/7] Flow offload infrastructure

The best news is that his final patchset version was accepted and now
sits in net-next [4] (and should become part of kernel 4.16).

Now, what does it mean for LEDE project:
1) There is upstream infrastructure that should be ready to use
2) It's based on & requires nftables
3) LEDE's firewall3 uses iptables (& friends) C API
4) There aren't any drivers for offloading hardware (switches?) yet

One thing I'm not sure about is whether the software accelerator is ready or not.
Pablo in his e-mail wrote:
> So far, this is a generic software flow table representation, that
> matches basic flow table hardware semantics but that also provides a
> software faster path. So you can use it to purely forward packets
> between two nics even if they come with no hardware offload support.

which could suggest the software path is already there.

So here is my idea of what is needed by LEDE to get it working:
1) Rewrite firewall3 to use nftables
2) Switch to kernel 4.16 or backport offloading to 4.14
3) Work on implementing/enabling software acceleration path

Let me know if the above description makes sense to you, or correct me if
you think I misunderstood something :)


[1] https://www.spinics.net/lists/netfilter-devel/msg50141.html
[2] https://www.spinics.net/lists/netfilter-devel/msg50555.html
[3] https://www.spinics.net/lists/netfilter-devel/msg50759.html
[4] https://www.spinics.net/lists/netfilter-devel/msg50973.html

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev