[Cerowrt-devel] SQM and PPPoE, more questions than answers...

2014-10-11 Thread Sebastian Moeller
Hi,

just to document my current understanding of using SQM on a router that also 
terminates a PPPoE WAN connection. We basically have two options: either set up 
SQM on the real interface (let’s call it ge00, like cerowrt does) or on the 
associated PPP device, pppoe-ge00. In theory both should produce the same 
results; in practice current SQM produces significantly different results. Let me 
enumerate the main differences that show up when testing with netperf-wrapper’s 
RRUL test:

1) SQM on ge00 does not show a working egress classification in the RRUL test 
(no visible “banding”/stratification of the 4 different priority TCP flows), 
while SQM on pppoe-ge00 does show this stratification.

Now the reason for this is quite obvious once we take into account that 
on ge00 the kernel sees a packet that already contains a PPP header between the 
ethernet and IP headers and has a different ether_type field, and our diffserv 
filters currently ignore everything except straight IPv4 and IPv6 packets; so, 
due to the unexpected/un-handled PPP header, everything lands in the default 
priority class and hence no stratification. If we shape on pppoe-ge00 the 
kernel seems to do all processing before encapsulating the data with PPP, so all 
filters just work. In theory that should be relatively easy to fix (at least 
for the specific PPPoE case, I am unsure about a generic solution) by using 
offsets to access the TOS bits in PPP packets. Most likely we face the same 
issue with other encapsulations that pass through cerowrt to some degree 
(except most of those will use an outer IP header from which we can extract 
DSCPs… but I digress).

2) SQM on ge00 shows better latency under load (LUL): the LUL increases by 
~2*fq_codel's target, so 10ms, while SQM on pppoe-ge00 shows a LUL increase 
(LULI) roughly twice as large, around 20ms.

I have no idea why that is, if anybody has an idea please chime in.

3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM on ge00 (with 
ingress more or less identical between the two). Also, 2) and 3) do not seem to 
be coupled: artificially reducing the egress rate on pppoe-ge00 to match the 
egress rate seen on ge00 does not reduce the LULI to the ge00-typical 
10ms; it stays at 20ms.

For this I also have no good hypothesis, any ideas?


So the current choice is either to accept a noticeable increase in LULI (but 
note that some years ago even an average of 20ms was most likely rare in real 
life) or an equally noticeable decrease in egress bandwidth… 

Best Regards
Sebastian

P.S.: It turns out, at least on my link, that for shaping on pppoe-ge00 the 
kernel does not account for any header automatically, so I need to specify a 
per-packet overhead (PPOH) of 40 bytes (on an ADSL2+ link with ATM linklayer); 
when shaping on ge00 however (with the kernel still terminating the PPPoE link 
to my ISP) I only need to specify a PPOH of 26, as the kernel already adds the 
14 bytes for the ethernet header…
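To make the overhead arithmetic concrete, here is a small sketch (my own illustration, not from the thread): on an ATM link the packet plus the per-packet overhead is padded to a whole number of 48-byte cell payloads, and each cell costs 53 bytes on the wire. This is exactly the effect the shaper's ATM linklayer compensation has to model.

```shell
# On-wire size of a packet on an ATM-framed (ADSL) link: the IP packet
# plus the per-packet overhead is split into 48-byte ATM cell payloads
# (rounding up), and every cell occupies 53 bytes on the wire.
atm_wire_bytes() {
    ip_len=$1; overhead=$2
    cells=$(( (ip_len + overhead + 47) / 48 ))   # ceiling division
    echo $(( cells * 53 ))
}

atm_wire_bytes 1500 40   # full-size packet: 33 cells -> prints 1749
atm_wire_bytes 40 40     # a tiny ACK still needs 2 cells -> prints 106
```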

___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2014-10-14 Thread Sebastian Moeller
Hi All,

some more testing:
On Oct 12, 2014, at 01:12 , Sebastian Moeller  wrote:

> Hi,
> 
> just to document my current understanding of using SQM on a router that also 
> terminates a pppoe wan connection. We basically have two options either set 
> up SQM on the real interface (let’s call it ge00 like cerowrt does) or on the 
> associated PPP device, pppoe-ge00. In theory both should produce the same 
> results; in practice current SQM produces significantly different results. Let me 
> enumerate the main differences that show up when testing with 
> netperf-wrapper’s RRUL test:
> 
> 1) SQM on ge00 does not show a working egress classification in the RRUL test 
> (no visible “banding”/stratification of the 4 different priority TCP flows), 
> while SQM on pppoe-ge00 does show this stratification.
> 
>   Now the reason for this is quite obvious once we take into account that 
> on ge00 the kernel sees a packet that already contains a PPP header between 
> ethernet and IP header and has a different ether_type field, and our diffserv 
> filters currently ignore everything except straight ipv4 and ipv6 packets, so 
> due to the unexpected/un-handled PPP header everything lands in the default 
> priority class and hence no stratification. If we shape on pppoe-ge00 the 
> kernel seems to do all processing before encapsulating the data with PPP so 
> all filters just work. In theory that should be relatively easy to fix (at 
> least for the specific PPPoE case, I am unsure about a generic solution) by 
> using offsets to try to access the TOS bits in PPP-packets. Also most likely 
> we face the same issue in other encapsulations that pass through cerowrt to 
> some degree (except most of those will use an outer IP header from where we 
> can scratch DSCPs…, but I digress)

Using tc u32 filters makes it possible to actually dive into 
PPPoE-encapsulated IPv4 and IPv6 packets and perform classification on 
“pass-through” PPPoE packets (as encountered when starting SQM on ge00 instead 
of pppoe-ge00, if the latter actually handles the WAN connection), so that one 
is solved (but see below).
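For the record, here is a sketch of such a filter (my reconstruction of the idea, not the exact sqm-scripts code): with ethertype 0x8864 the 6-byte PPPoE header is followed by a 2-byte PPP protocol ID (0x0021 for IPv4), so the IPv4 ToS byte ends up at offset 9 from the end of the ethernet header.

```shell
# Classify EF-marked (DSCP 46) IPv4 packets carried inside PPPoE session
# frames on ge00.  Assumes an untagged ethernet link and an HTB root
# with handle 1: plus a pre-existing priority class 1:11; offsets are
# relative to the end of the ethernet header.  Needs root.
tc filter add dev ge00 parent 1:0 protocol 0x8864 prio 10 \
    u32 match u16 0x0021 0xffff at 6 \
        match u8 0xb8 0xfc at 9 \
        flowid 1:11
```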

> 
> 2) SQM on ge00 shows better latency under load (LUL), the LUL increases for 
> ~2*fq_codel's target so 10ms, while SQM on pppoe-ge00 shows a LUL-increase 
> (LULI) roughly twice as large or around 20ms.
> 
>   I have no idea why that is, if anybody has an idea please chime in.

Once SQM on ge00 actually dives into the PPPoE packets and 
applies/tests u32 filters, the LUL increases to be almost identical to 
pppoe-ge00’s, if both ingress and egress classification are active and do work. 
So it looks like the u32 filters I naively set up are quite costly. Maybe there 
is a better way to set these up...

> 
> 3) SQM on pppoe-ge00 has a rough 20% higher egress rate than SQM on ge00 
> (with ingress more or less identical between the two). Also 2) and 3) do not 
> seem to be coupled, artificially reducing the egress rate on pppoe-ge00 to 
> yield the same egress rate as seen on ge00 does not reduce the LULI to the 
> ge00 typical 10ms, but it stays at 20ms.
> 
>   For this I also have no good hypothesis, any ideas?

With classification fixed, the difference in egress rate shrinks to ~10% 
instead of 20%, so this partly seems related to the classification issue as well.

> 
> 
> So the current choice is either to accept a noticeable increase in LULI (but 
> note some years ago even an average of 20ms most likely was rare in the real 
> life) or a equally noticeable decrease in egress bandwidth… 

I guess it is back to the drawing board to figure out how to speed up 
the classification… and then revisit the PPPoE question again…

Regards
Sebastian

> 
> Best Regards
>   Sebastian
> 
> P.S.: It turns out, at least on my link, that for shaping on pppoe-ge00 the 
> kernel does not account for any header automatically, so I need to specify a 
> per-packet-overhead (PPOH) of 40 bytes (on an ADSL2+ link with ATM 
> linklayer); when shaping on ge00 however (with the kernel still terminating 
> the PPPoE link to my ISP) I only need to specify an PPOH of 26 as the kernel 
> already adds the 14 bytes for the ethernet header…
> 



Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2014-10-15 Thread Török Edwin
On 10/15/2014 03:03 AM, Sebastian Moeller wrote:
>   I guess it is back to the drawing board to figure out how to speed up 
> the classification… and then revisit the PPPoE question again…

FWIW I had to add this to /etc/config/network (done via luci actually):
option keepalive '500 30'

Otherwise it uses these default values from /etc/ppp/options, and then I hit: 
https://dev.openwrt.org/ticket/7793:
lcp-echo-failure 5
lcp-echo-interval 1

The symptoms are that if I start a large download, after half a minute or so 
pppd complains that it didn't receive a reply to 5 LCP echo packets and 
disconnects/reconnects.
Sounds like the LCP echo/reply packets should get prioritized, but I don't know 
if it is my router that is dropping them or my ISP.

When you tested PPPoE, did you notice pppd dropping the connection and 
restarting? Because that would affect the timings for sure...

Best regards,
--Edwin




Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2014-10-15 Thread Sebastian Moeller
Hi Edwin,


On Oct 15, 2014, at 14:02 , Török Edwin  wrote:

> On 10/15/2014 03:03 AM, Sebastian Moeller wrote:
>>  I guess it is back to the drawing board to figure out how to speed up 
>> the classification… and then revisit the PPPoE question again…
> 
> FWIW I had to add this to /etc/config/network (done via luci actually):
> option keepalive '500 30'
> 
> Otherwise it uses these default values from /etc/ppp/options, and then I hit: 
> https://dev.openwrt.org/ticket/7793:
> lcp-echo-failure 5
> lcp-echo-interval 1
> 
> The symptoms are that if I start a large download after half a minute or so 
> pppd complains that it didn't receive reply to 5 LCP echo packets and 
> disconnects/reconnects.

I have not yet seen these in the logs, but I will keep my eyes open.

> Sounds like the LCP echo/reply packets should get prioritized, but I don't 
> know if it is my router that is dropping them or my ISP.

I think that is something we should be able to teach SQM (as long as 
the shaper is running on the lower ethernet interface and not the pppoe 
interface). 
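For the shaper-on-ge00 case, a sketch of what such filters might look like (my guess at the offsets, untested; class 1:11 is an assumed priority class, not something from the thread):

```shell
# Lift PPP control traffic into the highest-priority class when shaping
# on the underlying ethernet device (ge00).  Assumes an HTB root with
# handle 1: and a high-priority class 1:11.  Needs root.

# All PPPoE discovery frames (PADI/PADO/PADR/PADS/PADT), ethertype 0x8863:
tc filter add dev ge00 parent 1:0 protocol 0x8863 prio 1 \
    u32 match u32 0 0 flowid 1:11

# LCP inside PPPoE session frames (ethertype 0x8864): the 2-byte PPP
# protocol ID 0xc021 sits right after the 6-byte PPPoE header.
tc filter add dev ge00 parent 1:0 protocol 0x8864 prio 2 \
    u32 match u16 0xc021 0xffff at 6 flowid 1:11
```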

> 
> When you tested PPPoE did you notice pppd dropping the connection and 
> restarting, cause that would affect the timings for sure…

Nope, what I see is simply more variance in the bandwidth and latency 
numbers and a less steep slope on a right-shifted ICMP CDF… I assume that the 
disconnects/reconnects should show up as periods without any data transfer…. 

Mmmh, I will try to put the PPP service packets into the highest priority class 
and see whether that changes things, as well as testing your PPP options.

Thanks for your help

Sebastian

> 
> Best regards,
> --Edwin
> 
> 


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2014-10-15 Thread Dave Taht
hmm. The PPPoE LCP packets are sparse and should already be optimized
by fq_codel, but I guess I'll go look at the construction of those
headers. Perhaps they need to be decoded better in the flow_dissector
code?

I also made some comments re the recent openwrt pull request.

https://github.com/dtaht/ceropackages-3.10/commit/b9e3bafdabb3c5aa47f8f63eae2ecfe34c361855

SQM need not require the advanced qdiscs package, if it checks for
availability of the other qdiscs; and even then nobody's proposed
putting the new nfq_codel stuff into openwrt, as it's still rather
inadequately tested, and it's my hope that cake simplifies matters
significantly when it's baked. I already have patches for sqm for it,
but it's just not baked enough...

Also I think exploring policing at higher ingress bandwidths is warranted...

On Wed, Oct 15, 2014 at 6:39 AM, Sebastian Moeller  wrote:
> Hi Edwin,
>
>
> On Oct 15, 2014, at 14:02 , Török Edwin  wrote:
>
>> On 10/15/2014 03:03 AM, Sebastian Moeller wrote:
>>>  I guess it is back to the drawing board to figure out how to speed up 
>>> the classification… and then revisit the PPPoE question again…
>>
>> FWIW I had to add this to /etc/config/network (done via luci actually):
>> option keepalive '500 30'
>>
>> Otherwise it uses these default values from /etc/ppp/options, and then I 
>> hit: https://dev.openwrt.org/ticket/7793:
>> lcp-echo-failure 5
>> lcp-echo-interval 1
>>
>> The symptoms are that if I start a large download after half a minute or so 
>> pppd complains that it didn't receive reply to 5 LCP echo packets and 
>> disconnects/reconnects.
>
> I have not yet seen these in the logs, but I will keep my eyes open.
>
>> Sounds like the LCP echo/reply packets should get prioritized, but I don't 
>> know if it is my router that is dropping them or my ISP.
>
> I think that is something we should be able to teach SQM (as long as 
> the shaper is running on the lower ethernet interface and not the pppoe 
> interface).
>
>>
>> When you tested PPPoE did you notice pppd dropping the connection and 
>> restarting, cause that would affect the timings for sure…
>
> Nope, what I see is simply more variance in bandwidth and latency 
> numbers and a less step slope on a right shifted ICMP CDF… I assume that the 
> disconnect reconnects should show up as periods without any data transfer….
>
> Mmmh, I will try to put the PPP service packets into the highest priority 
> class and see whether that changes things, as well as testing your PPP 
> options.
>
> Thanks for your help
>
> Sebastian
>
>>
>> Best regards,
>> --Edwin
>>
>>



-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2014-10-15 Thread Sebastian Moeller
Hi Dave,

On Oct 15, 2014, at 19:28 , Dave Taht  wrote:

> hmm. The pppoe LLC packets are sparse and should already be optimized
> by fq_codel, but I guess I'll go look at the construction of those
> headers. Perhaps they need to be decoded better in the flow_dissector
> code?

So when shaping on pppoe-ge00 one does not see the LCP packets at all 
(tested with tcpdump -i pppoe-ge00), since they are added after the shaping 
(tcpdump -i ge00 does see the LCP packets). I have no idea whether pppd issues 
these with higher priority or not.


> 
> I also made some comments re the recent openwrt pull request.
> 
> https://github.com/dtaht/ceropackages-3.10/commit/b9e3bafdabb3c5aa47f8f63eae2ecfe34c361855
> 
> SQM need not require the advanced qdiscs package, if it checks for
> availability of the other qdiscs,

Well, but how to do this? I know of no safe way except testing the 
availability of modules for a known set of qdiscs, but what if the qdiscs are 
built into a monolithic kernel? Does anyone here have a good idea of how to 
detect all qdiscs available to the running kernel?
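One possible approach (an untested sketch of mine, not an sqm-scripts feature): rather than inspecting module files, simply try to instantiate each candidate qdisc on a throwaway interface and see whether the kernel accepts it; that also covers monolithic kernels.

```shell
# Probe which qdiscs the running kernel actually supports by trying to
# instantiate each candidate on a scratch dummy interface.
# Requires root and iproute2.
ip link add qdisc-probe type dummy 2>/dev/null || exit 1

for q in fq_codel codel sfq pie; do
    if tc qdisc replace dev qdisc-probe root "$q" 2>/dev/null; then
        echo "$q: available"
    else
        echo "$q: not available"
    fi
done

ip link del qdisc-probe
```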

Best Regards
Sebastian

> and even then nobody's proposed
> putting the new nfq_codel stuff into openwrt - as it's still rather
> inadequately tested, and it's my hope that cake simplifies matters
> significantly when it's baked. I already have patches for sqm for it,
> but it's just not baked enough...
> 
> Also I think exploring policing at higher ingress bandwidths is warranted…
> 
> On Wed, Oct 15, 2014 at 6:39 AM, Sebastian Moeller  wrote:
>> Hi Edwin,
>> 
>> 
>> On Oct 15, 2014, at 14:02 , Török Edwin  wrote:
>> 
>>> On 10/15/2014 03:03 AM, Sebastian Moeller wrote:
 I guess it is back to the drawing board to figure out how to speed up 
 the classification… and then revisit the PPPoE question again…
>>> 
>>> FWIW I had to add this to /etc/config/network (done via luci actually):
>>> option keepalive '500 30'
>>> 
>>> Otherwise it uses these default values from /etc/ppp/options, and then I 
>>> hit: https://dev.openwrt.org/ticket/7793:
>>> lcp-echo-failure 5
>>> lcp-echo-interval 1
>>> 
>>> The symptoms are that if I start a large download after half a minute or 
>>> so pppd complains that it didn't receive reply to 5 LCP echo packets and 
>>> disconnects/reconnects.
>> 
>>I have not yet seen these in the logs, but I will keep my eyes open.
>> 
>>> Sounds like the LCP echo/reply packets should get prioritized, but I don't 
>>> know if it is my router that is dropping them or my ISP.
>> 
>>I think that is something we should be able to teach SQM (as long as 
>> the shaper is running on the lower ethernet interface and not the pppoe 
>> interface).
>> 
>>> 
>>> When you tested PPPoE did you notice pppd dropping the connection and 
>>> restarting, cause that would affect the timings for sure…
>> 
>>Nope, what I see is simply more variance in bandwidth and latency 
>> numbers and a less step slope on a right shifted ICMP CDF… I assume that the 
>> disconnect reconnects should show up as periods without any data transfer….
>> 
>> Mmmh, I will try to put the PPP service packets into the highest priority 
>> class and see whether that changes things, as well as testing your PPP 
>> options.
>> 
>> Thanks for your help
>> 
>>Sebastian
>> 
>>> 
>>> Best regards,
>>> --Edwin
>>> 
>>> 



Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-18 Thread Alan Jenkins

Hi Seb

I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL.  (On 
Barrier Breaker + sqm-scripts.)  Maybe this is going back a bit and no 
longer interesting to read.  But it seemed suspicious and interesting 
enough that I wanted to test it.


My conclusion was: 1) I should stick with pppoe-wan; 2) the question 
really means "do you want to disable classification?"; and 3) I personally 
want to preserve the upload bandwidth and accept slightly higher latency.



On 15/10/14 01:03, Sebastian Moeller wrote:

Hi All,

some more testing: On Oct 12, 2014, at 01:12 , Sebastian Moeller
 wrote:



1) SQM on ge00 does not show a working egress classification in the
RRUL test (no visible “banding”/stratification of the 4 different
priority TCP flows), while SQM on pppoe-ge00 does show this
stratification.



Using tc u32 filters makes it possible to actually dive into
PPPoE encapsulated ipv4 and ipv6 packets and perform classification
on “pass-through” PPPoE packets (as encountered when starting SQM on
ge00 instead of pppoe-ge00, if the latter actually handles the wan
connection), so that one is solved (but see below).



2) SQM on ge00 shows better latency under load (LUL), the LUL
increases for ~2*fq_codel's target so 10ms, while SQM on pppoe-ge00
shows a LUL-increase (LULI) roughly twice as large or around 20ms.

I have no idea why that is, if anybody has an idea please chime
in.


I saw the same, though with a higher difference in egress rate.  See the 
first three files here:


https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

[netperf-wrapper noob puzzle: most of the ping lines vanish part-way 
through.  Maybe I failed it somehow.]



Once SQM on ge00 actually dives into the PPPoE packets and
applies/tests u32 filters the LUL increases to be almost identical to
pppoe-ge00’s if both ingress and egress classification are active and
do work. So it looks like the u32 filters I naively set up are quite
costly. Maybe there is a better way to set these up...


Later you mentioned testing for coupling with egress rate.  But you 
didn't test coupling with classification!


I switched from simple.qos to simplest.qos, and that achieved the lower 
latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the 
real problem.


I did think ECN wouldn't be applied on eth1, and that would be the cause 
of the latency.  But disabling ECN didn't affect it.  See files 3 to 6:


https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

I also admit surprise at fq_codel working within 20%/10ms on eth1.  I 
thought it'd really hurt, by breaking the FQ part.  Now I guess it 
doesn't.  I still wonder about ECN marking, though I didn't check whether 
my endpoint is using ECN.




3) SQM on pppoe-ge00 has a rough 20% higher egress rate than SQM on
ge00 (with ingress more or less identical between the two). Also 2)
and 3) do not seem to be coupled, artificially reducing the egress
rate on pppoe-ge00 to yield the same egress rate as seen on ge00
does not reduce the LULI to the ge00 typical 10ms, but it stays at
20ms.

For this I also have no good hypothesis, any ideas?


With classification fixed the difference in egress rate shrinks to
~10% instead of 20, so this partly seems related to the
classification issue as well.


My tests look like simplest.qos gives a lower egress rate, but not as 
low as eth1.  (Like 20% vs 40%).  So that's also similar.



So the current choice is either to accept a noticeable increase in
LULI (but note some years ago even an average of 20ms most likely
was rare in the real life) or a equally noticeable decrease in
egress bandwidth…


I guess it is back to the drawing board to figure out how to speed up
the classification… and then revisit the PPPoE question again…


so maybe the question is actually classification vs. not?

 + IMO slow asymmetric links don't want to lose more upload bandwidth 
than necessary.  And I'm losing a *lot* in this test.
 + As you say, having only 20ms excess would still be a big 
improvement.  We could ignore the bait of 10ms right now.


vs

 - lowest latency I've seen testing my link. almost suspicious. looks 
close to 10ms average, when the dsl rate puts a lower bound of 7ms on 
the average.
 - fq_codel honestly works miracles already. classification is the knob 
people had to use previously, who had enough time to twiddle it.
 - on netperf-runner plots the "banding" doesn't look brilliant on slow 
links anyway
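For what it's worth, the 7ms floor mentioned above is plain serialization delay; a sketch of the arithmetic (the 2 Mbit/s uplink rate and 40 bytes of overhead are my illustrative assumptions, not figures from the thread):

```shell
# Microseconds needed to serialize one packet onto an ATM-framed ADSL
# uplink: a ping queued behind one full-size packet waits at least this
# long, no matter how good the qdisc is.
serialization_delay_us() {
    ip_len=$1; overhead=$2; uplink_bps=$3
    cells=$(( (ip_len + overhead + 47) / 48 ))   # 48-byte cell payloads
    echo $(( cells * 53 * 8 * 1000000 / uplink_bps ))
}

# One 1500-byte packet at an assumed 2 Mbit/s uplink:
serialization_delay_us 1500 40 2000000   # prints 6996 (i.e. ~7 ms)
```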




Regards Sebastian



Best Regards Sebastian

P.S.: It turns out, at least on my link, that for shaping on
pppoe-ge00 the kernel does not account for any header
automatically, so I need to specify a per-packet-overhead (PPOH) of
40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
ge00 however (with the kernel still terminating the PPPoE link to
my ISP) I only need to specify an PPOH of 26 as the kernel already
adds the 14 bytes for the ethernet header…


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-18 Thread David Lang

On Wed, 18 Mar 2015, Alan Jenkins wrote:


Once SQM on ge00 actually dives into the PPPoE packets and
applies/tests u32 filters the LUL increases to be almost identical to
pppoe-ge00’s if both ingress and egress classification are active and
do work. So it looks like the u32 filters I naively set up are quite
costly. Maybe there is a better way to set these up...


Later you mentioned testing for coupling with egress rate.  But you didn't 
test coupling with classification!


I switched from simple.qos to simplest.qos, and that achieved the lower 
latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the real 
problem.


I did think ECN wouldn't be applied on eth1, and that would be the cause of 
the latency.  But disabling ECN didn't affect it.  See files 3 to 6:


https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

I also admit surprise at fq_codel working within 20%/10ms on eth1.  I thought 
it'd really hurt, by breaking the FQ part.  Now I guess it doesn't.  I still 
wonder about ECN marking, though I didn't check my endpoint is using ECN.


ECN should never increase latency; if it has any effect it should improve 
latency, because you slow down sending packets when some hop along the path is 
overloaded rather than sending the packets anyway and having them sit in a 
buffer for a while. This doesn't decrease actual throughput either (although if 
you are doing a test that doesn't actually wait for all the packets to arrive at 
the far end, it will look like it decreases throughput).




3) SQM on pppoe-ge00 has a rough 20% higher egress rate than SQM on
ge00 (with ingress more or less identical between the two). Also 2)
and 3) do not seem to be coupled, artificially reducing the egress
rate on pppoe-ge00 to yield the same egress rate as seen on ge00
does not reduce the LULI to the ge00 typical 10ms, but it stays at
20ms.

For this I also have no good hypothesis, any ideas?


With classification fixed the difference in egress rate shrinks to
~10% instead of 20, so this partly seems related to the
classification issue as well.


My tests look like simplest.qos gives a lower egress rate, but not as low as 
eth1.  (Like 20% vs 40%).  So that's also similar.



So the current choice is either to accept a noticeable increase in
LULI (but note some years ago even an average of 20ms most likely
was rare in the real life) or a equally noticeable decrease in
egress bandwidth…


I guess it is back to the drawing board to figure out how to speed up
the classification… and then revisit the PPPoE question again…


so maybe the question is actually classification v.s. not?

+ IMO slow asymmetric links don't want to lose more upload bandwidth than 
necessary.  And I'm losing a *lot* in this test.
+ As you say, having only 20ms excess would still be a big improvement.  We 
could ignore the bait of 10ms right now.


vs

- lowest latency I've seen testing my link. almost suspicious. looks close 
to 10ms average, when the dsl rate puts a lower bound of 7ms on the average.
- fq_codel honestly works miracles already. classification is the knob 
people had to use previously, who had enough time to twiddle it.


That's what most people find when they try it. Classification doesn't result in 
throughput-vs-latency tradeoffs so much as it gives absolute priority to some 
types of traffic. But unless you are really up against your bandwidth limit, 
this seldom matters in the real world. As long as latency is kept low, 
everything works, so you don't need to give VoIP priority over other traffic or 
things like that.


David Lang


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-18 Thread Dave Taht
On Wed, Mar 18, 2015 at 7:43 PM, David Lang  wrote:
> On Wed, 18 Mar 2015, Alan Jenkins wrote:
>
>>> Once SQM on ge00 actually dives into the PPPoE packets and
>>> applies/tests u32 filters the LUL increases to be almost identical to
>>> pppoe-ge00’s if both ingress and egress classification are active and
>>> do work. So it looks like the u32 filters I naively set up are quite
>>> costly. Maybe there is a better way to set these up...
>>
>>
>> Later you mentioned testing for coupling with egress rate.  But you didn't
>> test coupling with classification!
>>
>> I switched from simple.qos to simplest.qos, and that achieved the lower
>> latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the
>> real problem.
>>
>> I did think ECN wouldn't be applied on eth1, and that would be the cause
>> of the latency.  But disabling ECN didn't affect it.  See files 3 to 6:
>>
>> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
>>
>> I also admit surprise at fq_codel working within 20%/10ms on eth1.  I
>> thought it'd really hurt, by breaking the FQ part.  Now I guess it doesn't.
>> I still wonder about ECN marking, though I didn't check my endpoint is using
>> ECN.
>
>
> ECN should never increase latency, if it has any effect it should improve
> latency because you slow down sending packets when some hop along the path
> is overloaded rather than sending the packets anyway and having them sit in
> a buffer for a while. This doesn't decrease actual throughput either
> (although if you are doing a test that doesn't actually wait for all the
> packets to arrive at the far end, it will look like it decreases throughput)

ECN does, provably, increase latency (and loss) for other, non-ECN-marked flows.

Not by a lot, but it does. In the case of a malignantly mis-marked flow, the
present codel AQM algorithm does pretty bad things to itself and to other
non-ECN-marked packets.

(I have fixes for codel, but fq_codel doesn't have this problem; pie
somewhat has it)



 3) SQM on pppoe-ge00 has a rough 20% higher egress rate than SQM on
 ge00 (with ingress more or less identical between the two). Also 2)
 and 3) do not seem to be coupled, artificially reducing the egress
 rate on pppoe-ge00 to yield the same egress rate as seen on ge00
 does not reduce the LULI to the ge00 typical 10ms, but it stays at
 20ms.

 For this I also have no good hypothesis, any ideas?
>>>
>>>
>>> With classification fixed the difference in egress rate shrinks to
>>> ~10% instead of 20, so this partly seems related to the
>>> classification issue as well.

One of the things we really have to get around to doing is more high-rate
testing, and actually measuring how much latency the TCP flows are
experiencing.

>>
>> My tests look like simplest.qos gives a lower egress rate, but not as low
>> as eth1.  (Like 20% vs 40%).  So that's also similar.
>>
 So the current choice is either to accept a noticeable increase in
 LULI (but note some years ago even an average of 20ms most likely
 was rare in the real life) or a equally noticeable decrease in
 egress bandwidth…
>>>
>>>
>>> I guess it is back to the drawing board to figure out how to speed up
>>> the classification… and then revisit the PPPoE question again…
>>
>>
>> so maybe the question is actually classification v.s. not?
>>
>> + IMO slow asymmetric links don't want to lose more upload bandwidth than
>> necessary.  And I'm losing a *lot* in this test.
>> + As you say, having only 20ms excess would still be a big improvement.
>> We could ignore the bait of 10ms right now.
>>
>> vs
>>
>> - lowest latency I've seen testing my link. almost suspicious. looks close
>> to 10ms average, when the dsl rate puts a lower bound of 7ms on the average.
>> - fq_codel honestly works miracles already. classification is the knob
>> people had to use previously, who had enough time to twiddle it.
>
>
> That's what most people find when they try it. Classification doesn't result
> in throughput vs latency tradeoffs as much as it gives absolute priority to
> some types of traffic. But unless you are really up against your bandwidth
> limit, this seldom matters in the real world. As long as latency is kept
> low, everything works so you don't need to give VoIP priority over other
> traffic or things like that.

+10.

>
> David Lang
>



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Sebastian Moeller
Hi Alan,


On Mar 18, 2015, at 23:14 , Alan Jenkins  
wrote:

> Hi Seb
> 
> I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL.  (On Barrier 
> Breaker + sqm-scripts).  Maybe this is going back a bit & no longer 
> interesting to read.  But it seemed suspicious & interesting enough that I 
> wanted to test it.
> 
> My conclusion was 1) I should stick with pppoe-wan,

Not a bad decision, especially given the recent changes to SQM to make 
it survive transient pppoe-interface disappearances. Before those changes, the 
beauty of shaping on the ethernet device was that pppoe could come and go but 
SQM stayed active and working; thanks to your help this problem seems fixed 
now.

> 2) the question really means do you want to disable classification
> 3) I personally want to preserve the upload bandwidth and accept slightly 
> higher latency.

My question still is: is the bandwidth sacrifice really necessary, or is 
this test just showing a corner case in simple.qos that can be fixed? I 
currently lack enough time to tackle this effectively.

> 
> 
> On 15/10/14 01:03, Sebastian Moeller wrote:
>> Hi All,
>> 
>> some more testing: On Oct 12, 2014, at 01:12 , Sebastian Moeller
>>  wrote:
> 
>>> 1) SQM on ge00 does not show a working egress classification in the
>>> RRUL test (no visible “banding”/stratification of the 4 different
>>> priority TCP flows), while SQM on pppoe-ge00 does show this
>>> stratification.
> 
>> Using tc's u32 filters makes it possible to actually dive into
>> PPPoE-encapsulated ipv4 and ipv6 packets and perform classification
>> on “pass-through” PPPoE packets (as encountered when starting SQM on
>> ge00 instead of pppoe-ge00, if the latter actually handles the wan
>> connection), so that one is solved (but see below).
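A hedged sketch of what such a filter can look like (the interface name, filter priority, and flowid are assumptions; the offsets follow the PPPoE session layout: 6-byte PPPoE header, 2-byte PPP protocol field, then the IP header):

```shell
# Sketch: classify DSCP EF inside PPPoE-encapsulated IPv4 when shaping on
# the ethernet device (ge00). PPPoE session frames use ethertype 0x8864;
# relative to the PPPoE header, the PPP protocol field (0x0021 = IPv4)
# sits at offset 6 and the IPv4 ToS byte at offset 9.
tc filter add dev ge00 parent 1:0 protocol ppp_ses prio 40 u32 \
    match u16 0x0021 0xffff at 6 \
    match u8 0xb8 0xfc at 9 \
    flowid 1:11   # hypothetical "priority" class of the HTB tree
```

The same pattern, with the IP offsets shifted by 8 bytes, applies to any field the plain-IP filters already match on.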
>> 
>>> 
>>> 2) SQM on ge00 shows better latency under load (LUL): the LUL
>>> increases by ~2*fq_codel's target, so 10ms, while SQM on pppoe-ge00
>>> shows a LUL increase (LULI) roughly twice as large, around 20ms.
>>> 
>>> I have no idea why that is, if anybody has an idea please chime
>>> in.
> 
> I saw the same, though with higher difference for egress rate.  See first 
> three files here:
> 
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
> 
> [netperf-wrapper noob puzzle: most of the ping lines vanish part-way through. 
>  Maybe I failed it somehow.]

This is not your fault: the UDP probes netperf-wrapper uses do not 
tolerate packet loss; once a packet is lost (I believe) the stream stops. This 
is not ideal, but it gives a good quick indicator of packet loss for sparse 
streams ;)

> 
>> Once SQM on ge00 actually dives into the PPPoE packets and
>> applies/tests u32 filters the LUL increases to be almost identical to
>> pppoe-ge00’s if both ingress and egress classification are active and
>> do work. So it looks like the u32 filters I naively set up are quite
>> costly. Maybe there is a better way to set these up...
> 
> Later you mentioned testing for coupling with egress rate.  But you didn't 
> test coupling with classification!

True, I was interested in getting the 3-tier shaper to behave sanely, 
so I did not look at the 1-tier simplest.qos.

> 
> I switched from simple.qos to simplest.qos, and that achieved the lower 
> latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the real 
> problem.

Erm, but simplest.qos is not using the relevant tc filters, so these 
could still account for the issue; that, or some loss due to the 3 HTB 
shapers...
> 
> I did think ECN wouldn't be applied on eth1, and that would be the cause of 
> the latency.  But disabling ECN didn't affect it.  See files 3 to 6:
> 
> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

We typically only enable ECN on the downlink so far (under the 
assumption that this is a faster congestion signal to the receiver than 
dropping the packet and then having to wait for the next packet to create 
dupACKs; typically the router is close to the end-hosts and the packets have 
already cleared the real bottleneck, so dropping them is not going to help the 
effective bandwidth use). On the uplink the reasoning reverses: there, dropping 
instead of marking saves bandwidth for other packets (also, uplink bandwidth is 
often more precious), and the packets have basically just started their 
journey, so the control loop can still take a long time to complete and other 
hops can drop the packet. (I guess my current link is fast enough to activate 
ECN on the uplink as well to see how that behaves, so I will try that for a 
bit...)
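In sqm-scripts terms this policy corresponds roughly to the following (a sketch of an OpenWrt uci session; the exact option names and values are assumptions that may differ between sqm-scripts versions):

```shell
# Sketch: typical SQM ECN defaults - mark on ingress, drop on egress.
uci set sqm.@queue[0].ingress_ecn='ECN'    # downlink: ECN-mark packets
uci set sqm.@queue[0].egress_ecn='NOECN'   # uplink: drop instead of mark
uci commit sqm
/etc/init.d/sqm restart
```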

> 
> I also admit surprise at fq_codel working within 20%/10ms on eth1.  I thought 
> it'd really hurt, by breaking the FQ part.  Now I guess it doesn't.  I still 
> wonder about ECN marking, though I didn't check my endpoint is using ECN.
> 
>>> 
>>> 3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM on
>>> ge00 (with ingress more or less identical between the two).

Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Sebastian Moeller
Hi David,

On Mar 19, 2015, at 03:43 , David Lang  wrote:

> On Wed, 18 Mar 2015, Alan Jenkins wrote:
> 
>>> Once SQM on ge00 actually dives into the PPPoE packets and
>>> applies/tests u32 filters the LUL increases to be almost identical to
>>> pppoe-ge00’s if both ingress and egress classification are active and
>>> do work. So it looks like the u32 filters I naively set up are quite
>>> costly. Maybe there is a better way to set these up...
>> 
>> Later you mentioned testing for coupling with egress rate.  But you didn't 
>> test coupling with classification!
>> 
>> I switched from simple.qos to simplest.qos, and that achieved the lower 
>> latency on pppoe-wan.  So I think your naive u32 filter setup wasn't the 
>> real problem.
>> 
>> I did think ECN wouldn't be applied on eth1, and that would be the cause of 
>> the latency.  But disabling ECN didn't affect it.  See files 3 to 6:
>> 
>> https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0
>> 
>> I also admit surprise at fq_codel working within 20%/10ms on eth1.  I 
>> thought it'd really hurt, by breaking the FQ part.  Now I guess it doesn't.  
>> I still wonder about ECN marking, though I didn't check my endpoint is using 
>> ECN.
> 
> ECN should never increase latency, if it has any effect it should improve 
> latency because you slow down sending packets when some hop along the path is 
> overloaded rather than sending the packets anyway and having them sit in a 
> buffer for a while. This doesn't decrease actual throughput either (although 
> if you are doing a test that doesn't actually wait for all the packets to 
> arrive at the far end, it will look like it decreases throughput)
> 
 3) SQM on pppoe-ge00 has a roughly 20% higher egress rate than SQM on
 ge00 (with ingress more or less identical between the two). Also 2)
 and 3) do not seem to be coupled: artificially reducing the egress
 rate on pppoe-ge00 to yield the same egress rate as seen on ge00
 does not reduce the LULI to the ge00-typical 10ms; it stays at
 20ms.
 For this I also have no good hypothesis, any ideas?
>>> With classification fixed the difference in egress rate shrinks to
>>> ~10% instead of 20, so this partly seems related to the
>>> classification issue as well.
>> 
>> My tests look like simplest.qos gives a lower egress rate, but not as low as 
>> eth1.  (Like 20% vs 40%).  So that's also similar.
>> 
 So the current choice is either to accept a noticeable increase in
 LULI (though note that some years ago even an average of 20ms was
 most likely rare in real life) or an equally noticeable decrease in
 egress bandwidth…
>>> I guess it is back to the drawing board to figure out how to speed up
>>> the classification… and then revisit the PPPoE question again…
>> 
>> so maybe the question is actually classification v.s. not?
>> 
>> + IMO slow asymmetric links don't want to lose more upload bandwidth than 
>> necessary.  And I'm losing a *lot* in this test.
>> + As you say, having only 20ms excess would still be a big improvement.  We 
>> could ignore the bait of 10ms right now.
>> 
>> vs
>> 
>> - lowest latency I've seen testing my link. almost suspicious. looks close 
>> to 10ms average, when the dsl rate puts a lower bound of 7ms on the average.
>> - fq_codel honestly works miracles already. classification is the knob 
>> people had to use previously, who had enough time to twiddle it.
> 
> That's what most people find when they try it. Classification doesn't result 
> in throughput vs latency tradeoffs as much as it gives absolute priority to 
> some types of traffic. But unless you are really up against your bandwidth 
> limit, this seldom matters in the real world. As long as latency is kept low, 
> everything works so you don't need to give VoIP priority over other traffic 
> or things like that.

But note, not all traffic is equal ;) Take the example from the mail 
Alan was quoting from, shaping on an ethernet interface that handles pppoe 
traffic: the shaper sees all packets, including the packets PPP uses to 
establish and maintain the link. I would argue that these actually need 
guaranteed delivery, as dropping them can take out the PPP link and hence the 
internet connection. I admit it is rare for home users to actually encounter 
such drop-averse packets, but they at least justify the use of 
classification/priorities. Whether VoIP makes the cut really depends on its 
drop probability on each end link (I just want to note that commercial VoIP 
systems at least use precedence and EF markings on their packets, so 
classification of these is a) easy and b) actually performed by many ISPs’ 
home-router offerings for that ISP’s brand of VoIP).
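As a rough illustration (the flowid is an assumption, and whether the `ppp_disc`/`ppp_ses` protocol names are available depends on the iproute2 build), PPP link-maintenance traffic could be pinned to the highest-priority class like this:

```shell
# Sketch: give PPP/PPPoE link-maintenance frames absolute priority when
# shaping on the underlying ethernet device, so LCP keepalives are not
# dropped and the PPP session stays up.
tc filter add dev ge00 parent 1:0 protocol ppp_disc prio 1 u32 \
    match u32 0 0 at 0 flowid 1:10       # PPPoE discovery (ethertype 0x8863)
tc filter add dev ge00 parent 1:0 protocol ppp_ses prio 2 u32 \
    match u16 0xc021 0xffff at 6 flowid 1:10   # LCP inside the PPPoE session
```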

Best Regards
Sebastian

> 
> David Lang



Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Alan Jenkins

On 19/03/15 08:29, Sebastian Moeller wrote:

Hi Alan,


On Mar 18, 2015, at 23:14 , Alan Jenkins  
wrote:


Hi Seb

I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL.  (On Barrier Breaker + 
sqm-scripts).  Maybe this is going back a bit & no longer interesting to read.  But 
it seemed suspicious & interesting enough that I wanted to test it.

My conclusion was 1) I should stick with pppoe-wan,

Not a bad decision, especially given the recent changes to SQM to make 
it survive transient pppoe-interface disappearances. Before those changes the 
beauty of shaping on the ethernet device was that pppoe could come and go, but 
SQM stayed active and working. But due to your help this problem seems fixed 
now.

I'd say your help and my selfish prodding :).


2) the question really means do you want to disable classification
3) I personally want to preserve the upload bandwidth and accept slightly 
higher latency.

My question still is, is the bandwidth sacrifice really necessary or is 
this test just showing a corner case in simple.qos that can be fixed. I 
currently lack enough time to tackle this effectively.

Yep ok (no complaint).


[netperf-wrapper noob puzzle: most of the ping lines vanish part-way through.  
Maybe I failed it somehow.]

This is not your fault, the UDP probes net-perf wrapper uses do not 
accept packet loss, once a packet (I believe) is lost the stream stops. This is 
not ideal, but it gives a good quick indicator of packet loss for sparse 
streams ;)

Heh, thanks.


My tests look like simplest.qos gives a lower egress rate, but not as low as 
eth1.  (Like 20% vs 40%).  So that's also similar.


So the current choice is either to accept a noticeable increase in
LULI (though note that some years ago even an average of 20ms was
most likely rare in real life) or an equally noticeable decrease in
egress bandwidth…

I guess it is back to the drawing board to figure out how to speed up
the classification… and then revisit the PPPoE question again…

so maybe the question is actually classification v.s. not?

+ IMO slow asymmetric links don't want to lose more upload bandwidth than 
necessary.  And I'm losing a *lot* in this test.
+ As you say, having only 20ms excess would still be a big improvement.  We 
could ignore the bait of 10ms right now.

vs

- lowest latency I've seen testing my link. almost suspicious. looks close to 
10ms average, when the dsl rate puts a lower bound of 7ms on the average.

Curious: what is your link speed?


dsl sync 912k up
shaped at 850
fq_codel auto target says => 14.5ms <=

MTU time is
(1500*8)b / 912kbps = 0.0132s
so if the link is filled with MTU packets, there's a hard ~7ms lower
bound on the average icmp ping increase vs. an empty link

and the same logic says on achieving that average, you have >= 7ms jitter


(or 6.5ms, but since my download rate is about 10x better, 6.5 + 0.65 ~= 7).
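Alan's arithmetic above can be reproduced in a couple of lines (a sketch; the 912 kbps sync rate and 1500-byte MTU are taken from his numbers, nothing else is assumed):

```shell
# Sketch: serialization time of one MTU packet on a 912 kbps uplink, and
# the resulting floor on average induced latency (a packet arriving at a
# random time waits, on average, half an MTU time for the link).
awk -v rate_kbps=912 -v mtu_bytes=1500 'BEGIN {
    ms = mtu_bytes * 8 / rate_kbps        # bits / (kbit/s) == milliseconds
    printf "MTU time: %.1f ms, average floor: %.1f ms\n", ms, ms / 2
}'
# → MTU time: 13.2 ms, average floor: 6.6 ms
```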


- fq_codel honestly works miracles already. classification is the knob people 
had to use previously, who had enough time to twiddle it.
- on netperf-runner plots the "banding" doesn't look brilliant on slow links 
anyway

On slow links I always used to add “-s 0.8” (with higher numbers the 
slower the link) to increase the temporal averaging window; this reduces 
the display accuracy for the downlink, but at least allows a better 
understanding of the uplink. I always wanted to see whether I could teach 
netperf-wrapper to allow larger averaging windows after measurement, just for 
display purposes, but I am a total beginner with python...


P.S.: It turns out, at least on my link, that for shaping on
pppoe-ge00 the kernel does not account for any header
automatically, so I need to specify a per-packet-overhead (PPOH) of
40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
ge00 however (with the kernel still terminating the PPPoE link to
my ISP) I only need to specify a PPOH of 26, as the kernel already
adds the 14 bytes for the ethernet header…

Please disregard this part, I need to implement better tests for this 
instead of only relying on netperf-wrapper results ;)

Apart from kernel code, I did wonder how this 
was tested :).
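For reference, the effect of the per-packet overhead on an ATM-based link can be sketched like this (a hedged illustration of ATM cell alignment, not sqm-scripts' actual code; the 40-byte PPOH is the figure from the text above):

```shell
# Sketch: on-wire size of an IP packet on an ATM link-layer, given a
# per-packet overhead (PPOH). The packet plus overhead is segmented into
# 48-byte ATM cell payloads; each cell costs 53 bytes on the wire.
atm_wire_bytes() {  # args: ip_packet_bytes overhead_bytes
    payload=$(( $1 + $2 ))
    cells=$(( (payload + 47) / 48 ))   # round up to whole cells
    echo $(( cells * 53 ))
}
atm_wire_bytes 1500 40   # 1540 payload bytes → 33 cells → 1749 wire bytes
```

This cell quantization is why getting the PPOH wrong by even a few bytes can push small packets into an extra cell and make the shaper over- or under-estimate the link.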


Thanks again
Alan


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Sebastian Moeller
HI Alan,

On Mar 19, 2015, at 10:42 , Alan Jenkins  
wrote:

> On 19/03/15 08:29, Sebastian Moeller wrote:
>> Hi Alan,
>> 
>> 
>> On Mar 18, 2015, at 23:14 , Alan Jenkins 
>>  wrote:
>> 
>>> Hi Seb
>>> 
>>> I tested shaping on eth1 vs pppoe-wan, as it applies to ADSL.  (On Barrier 
>>> Breaker + sqm-scripts).  Maybe this is going back a bit & no longer 
>>> interesting to read.  But it seemed suspicious & interesting enough that I 
>>> wanted to test it.
>>> 
>>> My conclusion was 1) I should stick with pppoe-wan,
>>  Not a bad decision, especially given the recent changes to SQM to make 
>> it survive transient pppoe-interface disappearances. Before those changes 
>> the beauty of shaping on the ethernet device was that pppoe could come and 
>> go, but SQM stayed active and working. But due to your help this problem 
>> seems fixed now.
> I'd say your help and my selfish prodding :).
> 
>>> 2) the question really means do you want to disable classification
>>> 3) I personally want to preserve the upload bandwidth and accept slightly 
>>> higher latency.
>>  My question still is, is the bandwidth sacrifice really necessary or is 
>> this test just showing a corner case in simple.qos that can be fixed. I 
>> currently lack enough time to tackle this effectively.
> Yep ok (no complaint).
> 
>>> [netperf-wrapper noob puzzle: most of the ping lines vanish part-way 
>>> through.  Maybe I failed it somehow.]
>>  This is not your fault, the UDP probes net-perf wrapper uses do not 
>> accept packet loss, once a packet (I believe) is lost the stream stops. This 
>> is not ideal, but it gives a good quick indicator of packet loss for sparse 
>> streams ;)
> Heh, thanks.
> 
>>> My tests look like simplest.qos gives a lower egress rate, but not as low 
>>> as eth1.  (Like 20% vs 40%).  So that's also similar.
>>> 
> So the current choice is either to accept a noticeable increase in
> LULI (though note that some years ago even an average of 20ms was
> most likely rare in real life) or an equally noticeable decrease in
> egress bandwidth…
 I guess it is back to the drawing board to figure out how to speed up
 the classification… and then revisit the PPPoE question again…
>>> so maybe the question is actually classification v.s. not?
>>> 
>>> + IMO slow asymmetric links don't want to lose more upload bandwidth than 
>>> necessary.  And I'm losing a *lot* in this test.
>>> + As you say, having only 20ms excess would still be a big improvement.  We 
>>> could ignore the bait of 10ms right now.
>>> 
>>> vs
>>> 
>>> - lowest latency I've seen testing my link. almost suspicious. looks close 
>>> to 10ms average, when the dsl rate puts a lower bound of 7ms on the average.
>>  Curious: what is your link speed?
> 
> dsl sync 912k up
> shaped at 850
> fq_codel auto target says => 14.5ms <=
> 
> MTU time is
> (1500*8)b / 912kbps = 0.0132s
> so if the link is filled with MTU packets, there's a hard ~7ms lower bound on 
> the average icmp ping increase vs. an empty link
> and the same logic says on achieving that average, you have >= 7ms jitter

Ah I see, 50% chance of getting the link immediately versus having to 
wait for a full packet transmit time.
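The quoted 14.5 ms "auto target" is also consistent with one MTU-plus-overhead transmit time at the shaped rate (a back-of-the-envelope check assuming ~1540 bytes per packet; I don't claim this is exactly sqm-scripts' formula):

```shell
# Rough check: one ~1540-byte packet (1500 MTU + ~40 B link-layer overhead)
# at the 850 kbps shaped rate takes about 14.5 ms to serialize, matching
# the reported fq_codel auto target.
awk 'BEGIN { printf "%.1f ms\n", 1540 * 8 / 850 }'
# → 14.5 ms
```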

> 
> 
> (or 6.5ms, but since my download rate is about 10x better, 6.5 + 0.65 ~= 7).
> 
>>> - fq_codel honestly works miracles already. classification is the knob 
>>> people had to use previously, who had enough time to twiddle it.
>>> - on netperf-runner plots the "banding" doesn't look brilliant on slow 
>>> links anyway
>>  On slow links I always used to add “-s 0.8” (with higher numbers the 
>> slower the link) to increase the temporal averaging window; this reduces 
>> the display accuracy for the downlink, but at least allows a better 
>> understanding of the uplink. I always wanted to see whether I could teach 
>> netperf-wrapper to allow larger averaging windows after measurement, just 
>> for display purposes, but I am a total beginner with python...
>> 
> P.S.: It turns out, at least on my link, that for shaping on
> pppoe-ge00 the kernel does not account for any header
> automatically, so I need to specify a per-packet-overhead (PPOH) of
> 40 bytes (on an ADSL2+ link with ATM linklayer); when shaping on
> ge00 however (with the kernel still terminating the PPPoE link to
> my ISP) I only need to specify a PPOH of 26, as the kernel already
> adds the 14 bytes for the ethernet header…
>>  Please disregard this part, I need to implement better tests for this 
>> instead of only relying on netperf-wrapper results ;)
> Apart from kernel code, I did wonder how this was 
> tested :).

Oh, quite roughly… at that time I was only limited by my DSLAM (now I 
have a lower throttle in the BRAS that is somewhat hard to measure); I realized 
I could get decent RRUL results with egress shaping at 100% if the 
encapsulation and per-packet overhead were set correctly. Increasing the per 
packet overhead 

Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Alan Jenkins

Hi Seb, I have one last suspicion on this topic

On 19/03/15 08:29, Sebastian Moeller wrote:

My question still is, is the bandwidth sacrifice really necessary or is this 
test just showing a corner case in simple.qos that can be fixed. I currently 
lack enough time to tackle this effectively.



2) SQM on ge00 shows better latency under load (LUL): the LUL
increases by ~2*fq_codel's target, so 10ms, while SQM on pppoe-ge00
shows a LUL increase (LULI) roughly twice as large, around 20ms.

I have no idea why that is, if anybody has an idea please chime
in.

I saw the same, though with higher difference for egress rate.  See first three 
files here:

https://www.dropbox.com/sh/shwz0l7j4syp2ea/AAAxrhDkJ3TTy_Mq5KiFF3u2a?dl=0

[netperf-wrapper noob puzzle: most of the ping lines vanish part-way through.  
Maybe I failed it somehow.]

This is not your fault, the UDP probes net-perf wrapper uses do not 
accept packet loss, once a packet (I believe) is lost the stream stops. This is 
not ideal, but it gives a good quick indicator of packet loss for sparse 
streams ;)
Thinking about this, I remembered the issue that sqm de-prioritises ICMP 
ping.  (Back when I used betterspeedtest and netperf-runner, I did 
assume this would be an issue).


I also notice that my test with eth1 (disabling classification) is the 
only one where UDP ping (including UDP EF) is visible for any time at 
all.  (ok, pppoe-wan shows UDP BK, and it very clearly gets higher 
latency as I would expect).


So I don't know if your results were clearer, but the results I showed 
so far should be treated as a measurement problem.



Once SQM on ge00 actually dives into the PPPoE packets and
applies/tests u32 filters the LUL increases to be almost identical to
pppoe-ge00’s if both ingress and egress classification are active and
do work. So it looks like the u32 filters I naively set up are quite
costly. Maybe there is a better way to set these up...

Later you mentioned testing for coupling with egress rate.  But you didn't test 
coupling with classification!

True, I was interested in getting the 3-tier shaper to behave sanely, 
so I did not look at the 1-tier simplest.qos.


I switched from simple.qos to simplest.qos, and that achieved the lower latency 
on pppoe-wan.  So I think your naive u32 filter setup wasn't the real problem.

Erm, but simplest.qos is not using the relevant tc filters, so these 
could still account for the issue; that, or some loss due to the 3 HTB 
shapers...




Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Toke Høiland-Jørgensen
Alan Jenkins  writes:

> I also notice that my test with eth1 (disabling classification) is the
> only one where UDP ping (including UDP EF) is visible for any time at
> all. (ok, pppoe-wan shows UDP BK, and it very clearly gets higher
> latency as I would expect).

FYI the svn version of netperf has a feature to restart the UDP
measurement flows after a timeout. If you build that (don't forget the
--enable-demo switch to ./configure) and stick it in your $PATH,
netperf-wrapper should pick it up automatically and use the option. This
might get you better results on the UDP flows...
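The build steps would look roughly like this (a sketch; the svn URL and install paths are assumptions about the netperf source tree of that era):

```shell
# Sketch: build svn netperf with demo mode, which netperf-wrapper uses
# for its periodic measurement (UDP ping) flows.
svn checkout http://www.netperf.org/svn/netperf2/trunk netperf2
cd netperf2
./configure --enable-demo
make
sudo make install    # or put the netperf/netserver binaries early in $PATH
```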

-Toke


Re: [Cerowrt-devel] SQM and PPPoE, more questions than answers...

2015-03-19 Thread Dave Taht
On Thu, Mar 19, 2015 at 6:59 AM, Toke Høiland-Jørgensen  wrote:
> Alan Jenkins  writes:
>
>> I also notice that my test with eth1 (disabling classification) is the
>> only one where UDP ping (including UDP EF) is visible for any time at
>> all. (ok, pppoe-wan shows UDP BK, and it very clearly gets higher
>> latency as I would expect).
>
> FYI the svn version of netperf has a feature to restart the UDP
> measurement flows after a timeout. If you build that (don't forget the
> --enable-demo switch to ./configure) and stick it in your $PATH,
> netperf-wrapper should pick it up automatically and use the option. This
> might get you better results on the UDP flows...

I note that time of first loss is a valuable statistic in itself, and I would 
like to see it called out more for the measurement flows on the graph.

> -Toke
> ___
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb