Hi Fred,

On Dec 28, 2013, at 21:09 , Fred Stratton <[email protected]> wrote:
>
> On 28/12/13 19:54, Sebastian Moeller wrote:
>> Hi Fred,
>>
>> On Dec 28, 2013, at 15:27 , Fred Stratton <[email protected]> wrote:
>>
>>> On 28/12/13 13:42, Sebastian Moeller wrote:
>>>> Hi Fred,
>>>>
>>>> On Dec 28, 2013, at 12:09 , Fred Stratton <[email protected]> wrote:
>>>>
>>>>> The UK consensus fudge factor has always been 85 per cent of the rate achieved, not 95 or 99 per cent.
>>>>>
>>>> I know that the recommendations have been lower in the past; I think this is partly because, before Jesper Brouer's and Russell Stuart's work to properly account for ATM "quantization", people typically had to deal with a ~10% rate tax for the 5 byte per cell overhead (48 byte payload in 53 byte cells, 90.57% useable rate) plus an additional 5% to stochastically account for the padding of the last cell and the per packet overhead, both of which affect the effective goodput way more for small than for large packets, so the 85% never worked well for all packet sizes. My hypothesis now is that since we can and do properly account for these effects of ATM framing, we can afford to start with a fudge factor of 90% or even 95%. As far as I know the recommended fudge factors are never ever explained by more than "this works empirically"...
>>> The fudge factors are totally empirical. If you are proposing a more formal approach, I shall try a 90 per cent fudge factor, although 'current rate' varies here.
>> My hypothesis is that we can get away with less fudge as we have a better handle on the actual wire size. Personally, I do start at 95% to figure out the trade-off between bandwidth loss and latency increase.
>
> You are now saying something slightly different. You are implying now that you are starting at 95 per cent, and then reducing the nominal download speed until you achieve an unspecified endpoint.

	So I typically start with 95%, run RRUL and look at the ping latency increase under load. I try to go as high with the bandwidth as I can and still keep the latency increase close to 10ms (the default fq_codel target of 5ms will allow RTT increases of 5ms in each direction, so it adds up to 10). The last time I tried this I ended up at 97% of link rate.

>>
>>>>> Devices express 2 values: the sync rate - or 'maximum rate attainable' - and the dynamic value of 'current rate'.
>>>>>
>>>> The actual data rate is the relevant information for shaping; often DSL modems report the link capacity as "maximum rate attainable" or some such, while the actual bandwidth is limited to a rate below what the line would support by contract (often this bandwidth reduction is performed on the PPPoE link to the BRAS).
>>>>
>>>>> As the sync rate is fairly stable for any given installation - ADSL or Fibre - this could be used as a starting value, decremented by the traditional 15 per cent of 'overhead', and the 85 per cent fudge factor applied to that.
>>>>>
>>>> I would like to propose to use the "current rate" as the starting point, as 'maximum rate attainable' >= 'current rate'.
>>> 'current rate' is still a sync rate, and so is conventionally viewed as 15 per cent above the unmeasurable actual rate.
>> No no, the current rate really is the current link capacity between modem and DSLAM (or CPE and CMTS), only this rate typically is for the raw ATM stream, so we have to subtract all the additional layers until we reach the IP layer...
>
> You are saying the same thing as I am.

	I guess the point I want to make is that we are able to measure the unmeasurable actual rate; that is what the link layer adaptation does for us, if configured properly :)

Best Regards
	Sebastian
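A minimal sketch (not part of the original exchange) of the arithmetic discussed above, using an ADSL2+ upstream sync rate of 2558 kbit/s purely as an example: with the ATM link layer adaptation configured, the shaper already expands every packet to whole 53 byte cells, so the starting point is simply a fudge factor applied to the sync ("current") rate; without it, the 48-in-53 cell tax (~90.57% useable) has to come off the top as well.

#! /bin/bash
# Candidate shaper rates derived from the modem's reported sync rate.
SYNC_KBPS=${1:-2558}    # example upstream sync rate in kbit/s; pass the real value as $1
FUDGE=95                # per cent; start high and back off while watching latency under load
WITH_LLA=$(( SYNC_KBPS * FUDGE / 100 ))              # link layer adaptation handles the cell tax per packet
WITHOUT_LLA=$(( SYNC_KBPS * 48 / 53 * FUDGE / 100 )) # otherwise strip the 5-in-53 cell headers first
echo "shape at ${WITH_LLA} kbit/s (with ATM link layer adaptation) or ${WITHOUT_LLA} kbit/s (without)"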
>>
>>> As you are proposing a new approach, I shall take 90 per cent of 'current rate' as a starting point.
>> I would love to learn how that works out for you. Because for all my theories about why 85% was used, the proof still is in the (plum-) pudding...
>>
>>> No one in the UK uses SRA currently. One small ISP used to.
>> That is sad, because on paper SRA looks like a good feature to have (lower bandwidth sure beats synchronization loss).
>>
>>> The ISP I currently use has Dynamic Line Management, which changes target SNR constantly.
>> Now that is much better, as we should neither notice nor care; I assume that this happens on layers below ATM even.
>
>
>>
>>> The DSLAM is made by Infineon.
>>>
>>>>> Fibre - FTTC - connections can suffer quite large download speed fluctuations over the 200 - 500 metre link to the MSAN. This phenomenon is not confined to ADSL links.
>>>>>
>>>> On the actual xDSL link? As far as I know no telco actually uses SRA (seamless rate adaptation or so), so the current link speed will only get lower, not higher, so I would expect a relatively stable current rate (it might take a while, a few days, to actually slowly degrade to the highest link speed supported under all conditions, but I hope you still get my point).
>>> I understand the point, but do not think it is the case, from data I have seen, but cannot find now, unfortunately.
>> I see, maybe my assumption here is wrong; I would love to see data though before changing my hypothesis.
>>
>>>>> An alternative speed test is something like this
>>>>>
>>>>> http://download.bethere.co.uk/downloadMeter.html
>>>>>
>>>>> which, as Be has been bought by Sky, may not exist after the end of April 2014.
>>>>>
>>>> But, if we recommend to run speed tests, we really need to advise our users to start several concurrent up- and downloads to independent servers to actually measure the bandwidth of our bottleneck link; often a single server connection will not saturate a link (I seem to recall that with TCP it is guaranteed to only reach 75% or so averaged over time, is that correct?). But I think this is not the proper way to set the bandwidth for the shaper, because upstream of our link to the ISP we have no guaranteed bandwidth at all and can just hope the ISP is doing the right thing AQM-wise.
>>>>
>>> I quote the Be site as an alternative to a java based approach. I would be very happy to see your suggestion adopted.
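As a rough illustration of the "several concurrent downloads" suggestion above, something along these lines could be used (a sketch only; the URLs are placeholders for large files on independent servers, and curl's %{speed_download} reports the mean rate of each transfer in bytes per second):

#! /bin/bash
# Sum the average rates of several parallel downloads to approximate the bottleneck rate.
URLS="http://example.com/100MB.bin http://example.net/100MB.bin http://example.org/100MB.bin"
TMP=$(mktemp)
for url in ${URLS}; do
    curl -s -o /dev/null --max-time 30 -w "%{speed_download}\n" "${url}" >> "${TMP}" &
done
wait
awk '{ sum += $1 } END { printf "aggregate download: %.0f kbit/s\n", sum * 8 / 1000 }' "${TMP}"
rm -f "${TMP}"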
>>>>
>>>>> • [What is the proper description here?] If you use PPPoE (but not over ADSL/DSL link), PPPoATM, or bridging that isn't Ethernet, you should choose [what?] and set the Per-packet Overhead to [what?]
>>>>>
>>>>> For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, here running ceroWRT.
>>>>>
>>>> This still means you should specify the PPPoA overhead, not PPPoE.
>>> I shall try the PPPoA overhead.
>> Great, let me know how that works.
>>
>>>>> The packet overhead values are written in the dubious man page for tc_stab.
>>>>>
>>>> The only real flaw in that man page, as far as I know, is the fact that it indicates that the kernel will account for the 18 byte ethernet header automatically, while the kernel does no such thing (which I hope to change).
>>> It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no reference anywhere to the last. I do not see its relevance.
>> If you have a look inside the source code for tc and the kernel, you will notice that atm and adsl are aliases for the same thing. I just think that we should keep naming the thing ATM, since that is the problematic layer in the stack that causes most of the useable link rate misjudgements; adsl just happens to use ATM exclusively.
>
> I have reviewed the source. I see what you mean.
>>
>>>>> Sebastian has a potential alternative method of formal calculation.
>>>>>
>>>> So, I have no formal calculation method available, but an empirical way of detecting ATM quantization as well as measuring the per packet overhead of an ATM link. The idea is to measure the RTT of ICMP packets of increasing length and then display the distribution of RTTs by ICMP packet length; on an ATM carrier we expect to see a step function with steps 48 bytes apart, while on a non-ATM carrier we expect to rather see a smooth ramp. We then compare the residuals of a linear fit of the data with the residuals of the best step function fit to the data; the fit with the lower residuals "wins". Attached you will find an example of this approach: ping data in red (median of NNN repetitions for each ICMP packet size), linear fit in blue, and best staircase fit in green. You notice that the data starts somewhere inside a 48 byte ATM cell. Since the ATM encapsulation overhead is maximally 44 bytes and we know the IP and ICMP overhead of the ping probe, we can calculate the overhead preceding the IP header, which is what needs to be put in the overhead field in the GUI. (Note where the green line intersects the y-axis at 0 bytes packet size? This is where the IP header starts; the "missing" part of this ATM cell is the overhead.)
>>>>
>>> You are curve fitting. This is calculation.
>> I see, that is certainly a valid way to look at it, just one that had not occurred to me.
>>
>>>>
>>>> Believe it or not, this method works reasonably well (I tested successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32 bytes), and several PPPoE, LLC (overhead 40) connections, from ADSL1 @ 3008/512 to ADSL2+ @ 16402/2558). But it takes a relatively long time to measure the ping train, especially at the higher rates… and it requires ping time stamps with decent resolution (which rules out windows), and my naive data acquisition scripts create really large raw data files. I guess I should post the code somewhere so others can test and improve it. Fred, I would be delighted to get a data set from your connection, to test a known different encapsulation.
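To make the 48 byte quantization the staircase fit is looking for concrete, here is a small sketch (not from the original exchange) that prints the on-the-wire size of a ping as its payload grows, assuming a link with 40 bytes of per-packet overhead (the PPPoE/LLC value from the overhead table quoted further below) and plain IPv4/ICMP headers:

#! /bin/bash
# On an ATM link every packet is padded up to a whole number of 48 byte cell payloads,
# each carried in a 53 byte cell, so the wire size grows in steps rather than linearly.
OVERHEAD=40                                    # assumed encapsulation overhead before the IP header
for payload in 16 20 24 28 32 36 40 44 48; do
    ip_len=$(( payload + 8 + 20 ))             # ICMP header 8 bytes, IPv4 header 20 bytes
    cells=$(( (ip_len + OVERHEAD + 47) / 48 )) # ceil((ip_len + overhead) / 48) via integer maths
    echo "payload ${payload}B -> IP packet ${ip_len}B -> ${cells} cells = $(( cells * 53 ))B on the wire"
done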
>>> I shall try this. If successful, I shall initially pass you the raw data.
>> Great, but be warned this will be hundreds of megabytes. (For production use the measurement script would need to prune the generated log file down to the essential values… and potentially store the data in binary.)
>>
>>> I have not used MatLab since the 1980s.
>> Lucky you, I sort of have to use matlab in my day job and hence am most "fluent" in matlabese, but the code should also work with octave (I tested version 3.6.4), so it should be relatively easy to run the analysis yourself. That said, I would love to get a copy of the ping sweep :)
>>
>>>>> TYPICAL OVERHEADS
>>>>> The following values are typical for different adsl scenarios (based on [1] and [2]):
>>>>>
>>>>> LLC based:
>>>>>     PPPoA - 14 (PPP - 2, ATM - 12)
>>>>>     PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>>>     Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
>>>>>     IPoA - 16 (ATM - 16)
>>>>>
>>>>> VC Mux based:
>>>>>     PPPoA - 10 (PPP - 2, ATM - 8)
>>>>>     PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>>>     Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
>>>>>     IPoA - 8 (ATM - 8)
>>>>>
>>>>> For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE setting in ceroWRT.
>>>>>
>>>> Yeah, we could put this list into the wiki, but how shall a typical user figure out which encapsulation is used? And good luck in figuring out whether the frame check sequence (FCS) is included or not… BTW, regarding 18: I predict that if PPPoE is only used between cerowrt and the "modem" or gateway, your effective overhead should be 10 bytes; I would love it if you could run the following against your link at night (also attached):
>>>>
>>>> #! /bin/bash
>>>> # TODO use seq or bash to generate a list of the requested sizes (to allow for non-equidistantly spaced sizes)
>>>>
>>>> TECH=ADSL2    # just to give some meaning to the ping trace file name
>>>> # finding a proper target IP is somewhat of an art: just traceroute a remote site
>>>> # and find the nearest host reliably responding to pings showing the smallest variation of ping times
>>>> TARGET=${1}   # the IP against which to run the ICMP pings
>>>> DATESTR=`date +%Y%m%d_%H%M%S`    # to allow multiple sequential records
>>>> LOG=ping_sweep_${TECH}_${DATESTR}.txt
>>>>
>>>> # by default non-root ping will only send one packet per second, so work around that by calling ping independently for each packet
>>>> # empirically figure out the shortest period still giving the standard ping time (to avoid being slow-pathed by our target)
>>>> PINGPERIOD=0.01    # in seconds
>>>> PINGSPERSIZE=10000
>>>>
>>>> # sweep range, needed to find the per packet overhead dependent on the ATM encapsulation;
>>>> # to reliably show ATM quantization one would like to see at least two steps, so cover a range > 2 ATM cells (so > 96 bytes)
>>>> SWEEPMINSIZE=16    # 64bit systems seem to require 16 bytes of payload to include a timestamp...
>>>> SWEEPMAXSIZE=116
>>>>
>>>> n_SWEEPS=`expr ${SWEEPMAXSIZE} - ${SWEEPMINSIZE}`
>>>>
>>>> i_sweep=0
>>>> i_size=0
>>>>
>>>> echo "Running ICMP RTT measurement against: ${TARGET}"
>>>> while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
>>>> do
>>>>     (( i_sweep++ ))
>>>>     echo "Current iteration: ${i_sweep}"
>>>>     # now loop from sweepmin to sweepmax
>>>>     i_size=${SWEEPMINSIZE}
>>>>     while [ ${i_size} -le ${SWEEPMAXSIZE} ]
>>>>     do
>>>>         echo "${i_sweep}. repetition of ping size ${i_size}"
>>>>         ping -c 1 -s ${i_size} ${TARGET} >> ${LOG} &
>>>>         (( i_size++ ))
>>>>         # we need a sleep binary that allows non-integer times (GNU sleep is fine, as is the sleep of macosx 10.8.4)
>>>>         sleep ${PINGPERIOD}
>>>>     done
>>>> done
>>>> echo "Done... ($0)"
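Not part of Sebastian's script: for a quick manual look at the data (his octave/matlab parser mentioned below works on the raw log), something like this prunes the log down to reply size and RTT pairs, assuming the usual Linux/macosx ping output format ("NN bytes from …: … time=X ms"):

awk '/bytes from/ && /time=/ { split($0, a, "time="); split(a[2], b, " "); print $1, b[1] }' ping_sweep_ADSL2_*.txt > ping_sweep_pruned.txt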
>>>>
>>>> This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116 bytes, taking (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able to stop it with ctrl-c if you are not patient enough; with your link I would estimate that 3000 repetitions should be plenty, but if you could run it over night that would be great, and then ~3 hours should not matter much. And then run the following attached code in octave or matlab. Invoke with "tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')". The parser will run on the first invocation and is really, really slow, but further invocations should be faster. If issues arise, let me know, I am happy to help.
>>>>
>>>>> Were I to use a single directly connected gateway, I would input a suitable value for PPPoA in that openWRT firmware.
>>>>>
>>>> I think you should do that right now.
>>> The firmware has not yet been released.
>>>>> In theory, I might need to use a negative value, but the current kernel does not support that.
>>>>>
>>>> If you use tc_stab, negative overheads are fully supported; only htb_private has overhead defined as an unsigned integer and hence does not allow negative values.
>>> Jesper Brouer posted about this. I thought he was referring to tc_stab.
>> I recall having a discussion with Jesper about this topic, where he agreed that tc_stab was not affected, only htb_private.
> Reading what was said on 23rd August, you corrected his error in interpretation.
>
>
>>>>> I have used many different arbitrary values for overhead. All appear to have little effect.
>>>>>
>>>> So the issue here is that only at small packet sizes do the overhead and last cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte overhead plus 47 byte worst case cell padding: 100 * (44+47+64)/64 = 242% effective packet size relative to what the shaper estimated), while at typical packet sizes the maximum error (44 bytes missing overhead and potentially misjudged cell padding of 47 bytes) adds up to a theoretical 100 * (44+47+1500)/1500 = 106% effective packet size relative to what the shaper estimated. It is obvious that at 1500 byte packets the whole ATM issue can be easily dismissed by just reducing the link rate by ~10% for the 48-in-53 framing and an additional ~6% for overhead and cell padding. But once you mix smaller packets into your traffic, for say VoIP, the effective wire size misjudgment will kill your ability to control the queueing. Note that the common wisdom of shaping down to 85% might stem from the ~15% ATM "tax" on 1500 byte packets...
>>>>
>>>>> As I understand it, the current recommendation is to use tc_stab in preference to htb_private. I do not know the basis for this value judgement.
>>>>>
>>>> In short: tc_stab allows negative overheads, and tc_stab works with HTB, TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two advantages: it will estimate the per packet overhead correctly if GSO (generic segmentation offload) is enabled, and it will produce exact ATM link layer estimates for all possible packet sizes. In practice almost everyone uses an MTU of 1500 or less for their internet access, making both htb_private advantages effectively moot. (Plus, if no one beats me to it, I intend to address both theoretical shortcomings of tc_stab next year.)
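For reference, a hedged example (not from the original exchange) of what a tc_stab based setup could look like; the device name ge00, the 2430 kbit/s rate and the PPPoA/VC-Mux overhead of 10 from the table above are placeholders to be adapted to the actual link:

# attach HTB with a stab size table that models the ATM link layer
tc qdisc add dev ge00 root handle 1: stab linklayer atm overhead 10 mtu 2048 htb default 11
# shape to roughly 95% of the sync rate (note that tc_stab also accepts negative
# overhead values, unlike htb's own overhead parameter)
tc class add dev ge00 parent 1: classid 1:11 htb rate 2430kbit
# fq_codel as the leaf qdisc keeps per-flow queueing delay low
tc qdisc add dev ge00 parent 1:11 handle 110: fq_codel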
>>>>
>>>> Best Regards
>>>>	Sebastian
>>>>
>>>>>
>>>>> On 28/12/13 10:01, Sebastian Moeller wrote:
>>>>>> Hi Rich,
>>>>>>
>>>>>> great! A few comments:
>>>>>>
>>>>>> Basic Settings:
>>>>>> [Is 95% the right fudge factor?] I think that ideally, if we can precisely measure the useable link rate, even 99% of that should work out well, to keep the queue in our device. I assume that due to the difficulties in measuring and accounting for the link properties, such as link layer and overhead, people typically rely on setting the shaped rate a bit lower than required, to stochastically/empirically account for the link properties. I predict that if we get a correct description of the link properties to the shaper we should be fine with 95% shaping. Note though, it is not trivial on an adsl link to get the actually useable bit rate from the modem, so 95% of what can be deduced from the modem or the ISP's invoice might be a decent proxy…
>>>>>>
>>>>>> [Do we have a recommendation for an easy way to tell if it's working? Perhaps a link to a new Quick Test for Bufferbloat page.] The linked page looks like a decent probe for bufferbloat.
>>>>>>
>>>>>>> Basic Settings - the details...
>>>>>>>
>>>>>>> CeroWrt is designed to manage the queues of packets waiting to be sent across the slowest (bottleneck) link, which is usually your connection to the Internet.
>>>>>>>
>>>>>> I think we can only actually control the first link to the ISP, which often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) the cumulative sold bandwidth to the customers is larger than the backbone connection (which is called over-subscription and is almost guaranteed to be the case in every DSLAM); this typically is not a problem, as typically people do not use their internet that much. My point being, we can not really control congestion in the DSLAM's uplink (as we have no idea what the reserved rate per customer is in the worst case, if there is any).
>>>>>>
>>>>>>> CeroWrt can automatically adapt to network conditions to improve the delay/latency of data without any settings.
>>>>>>>
>>>>>> Does this describe the default fq_codels on each interface (except ifb?)?
>>>>>>
>>>>>>> However, it can do a better job if it knows more about the actual link speeds available. You can adjust this setting by entering link speeds that are a few percent below the actual speeds.
>>>>>>>
>>>>>>> Note: it can be difficult to get an accurate measurement of the link speeds. The speed advertised by your provider is a starting point, but your experience often won't meet their published specs. You can also use a speed test program or web site like http://speedtest.net to estimate actual operating speeds.
>>>>>>>
>>>>>> While this approach is commonly recommended on the internet, I do not believe that it is that useful. Between a user and the speedtest site there are a number of potential congestion points that can affect (reduce) the throughput, like bad peering.
>>>>>> Now that said, the speedtests will report something <= the actual link speed and hence be conservative (interactivity stays great at 90% of link rate as well as at 80%, so underestimating the bandwidth within reason does not affect the latency gains from traffic shaping, it just sacrifices a bit more bandwidth; and given the difficulty of actually measuring the attainable bandwidth, this might have been effectively a decent recommendation even though the theory of it seems flawed).
>>>>>>
>>>>>>> Be sure to make your measurement when the network is quiet, and others in your home aren't generating traffic.
>>>>>>>
>>>>>> This is great advice.
>>>>>>
>>>>>> I would love to comment further, but after reloading, http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310 just returns a blank page and I can not get back to the page as of yesterday evening… I will have a look later to see whether the page resurfaces…
>>>>>>
>>>>>> Best
>>>>>>	Sebastian
>>>>>>
>>>>>> On Dec 27, 2013, at 23:09 , Rich Brown <[email protected]> wrote:
>>>>>>
>>>>>>>> You are a very good writer and I am on a tablet.
>>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>> I'll take a pass at the wiki tomorrow.
>>>>>>>>
>>>>>>>> The shaper does up and down was my first thought...
>>>>>>>>
>>>>>>> Everyone else… Don't let Dave hog all the fun! Read the tech note and give feedback!
>>>>>>>
>>>>>>> Rich
>>>>>>>
>>>>>>>> On Dec 27, 2013 10:48 AM, "Rich Brown" <[email protected]> wrote:
>>>>>>>> I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.
>>>>>>>>
>>>>>>>> http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310
>>>>>>>>
>>>>>>>> There are still lots of open questions. Comments, please.
>>>>>>>>
>>>>>>>> Rich
>
> _______________________________________________
> Cerowrt-devel mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
