Hi Michael,

> On Jul 10, 2022, at 22:01, Michael Welzl <mich...@ifi.uio.no> wrote:
> 
> Hi !
> 
> 
>> On Jul 10, 2022, at 7:27 PM, Sebastian Moeller <moell...@gmx.de> wrote:
>> 
>> Hi Michael,
>> 
>> so I reread your paper and stewed a bit on it.
> 
> Many thanks for doing that! :)
> 
> 
>> I believe that I do not buy some of your premises.
> 
> you say so, but I don’t really see much disagreement here. Let’s see:
> 
> 
>> e.g. you write:
>> 
>> "We will now examine two factors that make the the present situation 
>> particularly worrisome. First, the way the infrastructure has been evolving 
>> gives TCP an increasingly large operational space in which it does not see 
>> any feedback at all. Second, most TCP connections are extremely short. As a 
>> result, it is quite rare for a TCP connection to even see a single 
>> congestion notification during its lifetime."
>> 
>> And seem to see a problem that flows might be able to finish their data 
>> transfer business while still in slow start. I see the same data, but see no 
>> problem. Unless we have an oracle that tells each sender (over a shared 
>> bottleneck) exactly how much to send at any given time point, different 
>> control loops will interact on those intermediary nodes.
> 
> You really say that you don’t see the solution. The problem is that 
> capacities are underutilized, which means that flows take longer (sometimes, 
> much longer!) to finish than they theoretically could, if we had a better 
> solution.

        [SM] No, IMHO the underutilization is the direct consequence of 
requiring a gradual filling of the "pipes" to probe the available capacity. I 
see no way this could be done differently with the traffic sources/sinks 
being uncoordinated entities at the edge, and I see no way of coordinating 
all end points and handling all paths. In other words, we can fine-tune 
parameters to tweak the probing a bit, making it more or less 
aggressive/fast, but the fact that we need to probe capacity somehow means 
underutilization cannot be avoided unless we find a way of coordinating all 
of the sinks and sources. Being sufficiently dumb, all I can come up with is 
an all-knowing oracle or faster-than-light communication, and neither strikes 
me as realistic ;)
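
To put rough numbers on why short flows finish inside slow start, here is a
purely illustrative Python back-of-the-envelope (the 100 Mbit/s path, 20 ms
RTT, 1500-byte MSS and IW of 10 are my assumptions, not anything from your
paper):

import math

def rtts_to_fill(capacity_bps, rtt_s, mss_bytes=1500, iw_segments=10):
    """RTTs until a doubling-per-RTT sender first covers the path BDP."""
    bdp_segments = capacity_bps * rtt_s / (8 * mss_bytes)  # BDP in segments
    if iw_segments >= bdp_segments:
        return 0
    return math.ceil(math.log2(bdp_segments / iw_segments))

# 100 Mbit/s at 20 ms RTT -> BDP of ~167 segments, ~5 RTTs from IW=10;
# a transfer that fits in a handful of segments is done long before that,
# so it never sees (nor needs) a congestion notification.
print(rtts_to_fill(100e6, 0.020))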


> 
> 
>> I might be limited in my depth of thought here, but having each flow probing 
>> for capacity seems exactly the right approach... and doubling CWND or rate 
>> every RTT is pretty aggressive already (making slow start shorter by 
>> reaching capacity faster within the slow-start framework requires either to 
>> start with a higher initial value (what increasing IW tries to achieve?) or 
>> use a larger increase factor than 2 per RTT). I consider increased IW a 
>> milder approach than the alternative. And once one accepts that a gradual 
>> rate increasing is the way forward it falls out logically that some flows 
>> will finish before they reach steady state capacity especially if that flows 
>> available capacity is large. So what exactly is the problem with short flows 
>> not reaching capacity and what alternative exists that does not lead to 
>> carnage if more-aggressive start-up phases drive the bottleneck load into 
>> emergency drop territory?
> 
> There are various ways to do this; one is to cache information and re-use it, 
> assuming that - at least sometimes - new flows will see the same path again.

        [SM] And, equally important, that a flow's capacity share along a path 
did not change because of other flows appearing on the same path. This is a 
case of speculation which, depending on link and path type, will work out 
well more or less often; the question then becomes whether the improvement on 
successful speculation is worth the cost of unsuccessful speculation (mostly 
the case where the estimate is wildly above the path capacity). Personally I 
think that having each flow start searching for achievable capacity from the 
"bottom" seems more robust and reliable. I would agree though that better 
managing the typical overshoot of slow start is a worthy goal (one that, if 
tackled successfully, might allow a faster capacity search approach).
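
Just so we are talking about the same thing, here is a minimal sketch of the
caching idea as I understand it (hypothetical names, in the spirit of TCB
sharing; the cap and ageing values are made up):

import time

class PathCache:
    """Remember the last delivery rate/RTT per destination prefix and use it
    to seed the next flow's initial window, capped so stale or wildly
    optimistic entries cannot blow up the bottleneck too badly."""
    def __init__(self, max_age_s=600.0, cap_segments=100):
        self.entries = {}  # prefix -> (rate_bps, rtt_s, timestamp)
        self.max_age_s = max_age_s
        self.cap_segments = cap_segments

    def remember(self, prefix, rate_bps, rtt_s):
        self.entries[prefix] = (rate_bps, rtt_s, time.monotonic())

    def initial_window(self, prefix, mss_bytes=1500, default_iw=10):
        entry = self.entries.get(prefix)
        if entry is None or time.monotonic() - entry[2] > self.max_age_s:
            return default_iw  # no fresh history: stay conservative
        rate_bps, rtt_s, _ = entry
        bdp_segments = int(rate_bps * rtt_s / (8 * mss_bytes))
        # This is exactly the speculation I worry about: it is only right if
        # the share seen last time is still available now.
        return max(default_iw, min(bdp_segments, self.cap_segments))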

> Another is to let parallel flows share information.

        [SM] Sounds sweet, but since not even two back-to-back packets sent 
over the internet from A to B are guaranteed to take exactly the same path, 
confirming that flows actually share a sufficiently similar path seems 
tricky. Also, stipulating that two flows do share a common path over a 
capacity-limiting node with, say, 99 other flows, and with our parallel, 
already established flow in equilibrium: our new flow probably could start 
with a CWND close to the established flow's. But if the bottleneck is fully 
occupied by our parallel established flow, the limit would be 50% of the 
existing flow's rate, and only if that flow actually has enough time to give 
way...

Could you elaborate on how that could work, please?

> Yet another is to just be blindly more aggressive.

        [SM] Sure, this works if the "cost" of admitting too much data is 
acceptable. Alas, from an end user's perspective, I know I have flows where I 
do not care much if they overcommit and then start throttling themselves 
(think background bulk transfer), but where I would get unhappy if their 
over-aggression interfered with other flows that are more important to me 
(that is part of why I am a happy flow-queueing user; FQ helps a lot in 
confining the fall-out from overly aggressive flows mainly to those flows 
themselves).
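
In case it helps to spell out why FQ contains the damage, here is a toy
sketch (illustrative only; real fq_codel/CAKE add AQM, quantums and hash
collision handling on top of this):

from collections import defaultdict, deque

class FlowQueues:
    """Each 5-tuple hashes to its own queue and queues are served
    round-robin, so one flow's backlog mostly delays only that flow."""
    def __init__(self, num_queues=1024):
        self.queues = defaultdict(deque)
        self.num_queues = num_queues

    def enqueue(self, five_tuple, packet):
        self.queues[hash(five_tuple) % self.num_queues].append(packet)

    def dequeue_round(self):
        """One crude round-robin pass: at most one packet per backlogged
        queue, so an aggressive flow cannot starve the others."""
        for qid in list(self.queues):
            if self.queues[qid]:
                yield self.queues[qid].popleft()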


> Yet another, chirping.

        [SM] I would love for that to work, but I have seen no convincing data 
yet demonstrating it over the existing internet. We already know from other 
papers that inter-packet delay is a somewhat unreliable estimator for 
capacity, so using it, even in clever ways, requires some accumulation and 
smoothing; I therefore wonder how much faster/better this actually gets 
compared to existing slow start with a sufficiently high starting IW. As meta 
criticism, I am somewhat surprised how little splash paced chirping seems to 
be making given how positively its inventors presented it; that might be the 
typical inertia of the field, or an indication that PC might not yet be ready 
for show-time. However, if you are thinking of something other than paced 
chirping here, could you share a reference, please?
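
To illustrate the noise problem (this is not paced chirping itself, just a
toy packet-gap estimator; the jitter numbers are made up but not untypical):

def gap_capacity_bps(packet_bytes, gap_s):
    """Single packet-pair estimate: bits clocked out per inter-packet gap."""
    return 8 * packet_bytes / gap_s

def smoothed_estimate(samples_bps, alpha=0.1):
    """EWMA over noisy per-gap samples; needs many samples to settle."""
    est = samples_bps[0]
    for s in samples_bps[1:]:
        est = (1 - alpha) * est + alpha * s
    return est

# +/- 50 us of jitter around a 120 us gap swings a single 1500-byte estimate
# between roughly 70 and 170 Mbit/s, hence the need for accumulation and
# smoothing before acting on it.
print(gap_capacity_bps(1500, 170e-6), gap_capacity_bps(1500, 70e-6))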


> 
> 
>> And as an aside, a PEP (performance enhancing proxy) that does not enhance 
>> performance is useless at best and likely harmful (rather a PDP, performance 
>> degrading proxy).
> 
> You’ve made it sound worse by changing the term, for whatever that’s worth. 
> If they never help, why has anyone ever called them PEPs in the first place?

        [SM] I would guess because "marketing" was unhappy with "engineering" 
emphasizing the side-effects/potential problems, and instead focused on the 
best-case scenario? ;)

> Why do people buy these boxes?

        [SM] Because, e.g. for GEO links, latency is in a range where default, 
unadulterated TCP will likely choke on itself, and when faced with either 
requiring customers to change/tune their TCPs or having a "PEP" fudge it, the 
ease of use of fudging won the day. That is a generous explanation (as this 
fudging is beneficial to both the operator and most end-users); I can come up 
with less charitable theories if you want ;) .
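
For a concrete sense of the numbers (my assumptions: a ~600 ms GEO RTT and a
classic 64 KiB receive window, i.e. no window scaling):

def max_throughput_bps(window_bytes, rtt_s):
    """An un-split TCP flow cannot exceed window/RTT, whatever the link rate."""
    return 8 * window_bytes / rtt_s

# 64 KiB over 600 ms caps the flow at roughly 0.87 Mbit/s, however fast the
# satellite link is; a split-connection PEP hides the long RTT from the
# terrestrial sender, which is the "fudge" that sells these boxes.
print(max_throughput_bps(64 * 1024, 0.6) / 1e6)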

>> The network so far has been doing reasonably well with putting more protocol 
>> smarts at the ends than in the parts in between.
> 
> Truth is, PEPs are used a lot: at cellular edges, at satellite links… because 
> the network is *not* always doing reasonably well without them.

        [SM] Fair enough, I accept that there are use cases for those, but 
again, only if they actually enhance the "experience" will users be happy to 
accept them. The goals of the operators and the paying customers are not 
always aligned here; a PEP might be more advantageous to the operator than to 
the end-user (theoretically also the other direction, but since operators pay 
for PEPs they are unlikely to deploy those); think mandatory image 
recompression or forced video quality downscaling... (And sure, these 
examples are not as clear-cut as I pitched them: if, after an emergency, a 
PEP allows most/all users in a cell to still send somewhat degraded images, 
that is better than the network choking itself on a few high-quality images, 
assuming images from the emergency are somewhat useful.)

>> I have witnessed the arguments in the "L4S wars" about how little processing 
>> one can ask the more central network nodes perform, e.g. flow queueing which 
>> would solve a lot of the issues (e.g. a hyper aggressive slow-start flow 
>> would mostly hurt itself if it overshoots its capacity) seems to be a 
>> complete no-go.
> 
> That’s to do with scalability, which depends on how close to the network’s 
> edge one is.

        [SM] I have heard the alternative explanation that it has to do with 
what operators of core links request from their vendors and which features 
they are willing to pay for... but this is very anecdotal, as I have little 
insight into big-iron vendors or core-link operators.

>> I personally think what we should do is have the network supply more 
>> information to the end points to control their behavior better. E.g. if we 
>> would mandate a max_queue-fill-percentage field in a protocol header and 
>> have each node write max(current_value_of_the_field, 
>> queue-filling_percentage_of_the_current_node) in every packet, end points 
>> could estimate how close to congestion the path is (e.g. by looking at the 
>> rate of %queueing changes) and tailor their growth/shrinkage rates 
>> accordingly, both during slow-start and during congestion avoidance.
> 
> That could well be one way to go. Nice if we provoked you to think!

        [SM] You mostly made me realize what the recent increases in IW 
actually aim to accomplish ;) and that current slow start is actually better 
than its reputation; it solves a hard problem surprisingly well. The 
max(path_queue%) idea has been kicking around in my head ever since I read a 
paper about storing queue occupancy in packets to help CC along (sorry, I do 
not recall the authors or the title right now), so it is not even my own 
original idea, but simply something I borrowed from smarter engineers because 
I found the data convincing and the theory sane. (Also because I grudgingly 
accept that latency increases measured over the internet are a tad too noisy 
to be easily useful* and too noisy for a meaningful controller based on the 
latency rate of change**.)
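
To make the quoted proposal a bit more concrete, here is a minimal sketch of
what I have in mind (field name, thresholds and the controller are all
hypothetical and purely illustrative):

def forward_at_node(pkt_fill_pct, local_queue_fill_pct):
    """Each on-path node leaves the larger of the header field and its own
    queue fill percentage in the packet."""
    return max(pkt_fill_pct, local_queue_fill_pct)

class Sender:
    """Crude endpoint reaction based on the echoed fill level and its rate
    of change: grow fast while the path is empty, ease off before loss."""
    def __init__(self):
        self.prev_fill = 0.0

    def next_cwnd(self, echoed_fill_pct, cwnd):
        slope = echoed_fill_pct - self.prev_fill
        self.prev_fill = echoed_fill_pct
        if echoed_fill_pct < 10 and slope <= 0:
            return cwnd * 2.0   # effectively slow start
        if echoed_fill_pct > 80 or slope > 20:
            return cwnd * 0.85  # pre-congestion: back off before drops
        return cwnd + 1.0       # congestion-avoidance-like growth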

>> But alas we seem to go the path of a relative dumb 1 bit signal giving us an 
>> under-defined queue filling state instead and to estimate relative queue 
>> filling dynamics from that we need many samples (so literally too little too 
>> late, or L3T2), but I digress.
> 
> Yeah you do :-)

        [SM] Less than you let on ;). If L4S gets ratified (increasingly 
likely, mostly for political*** reasons) it gets considerably harder to get 
yet more queue-size-related bits into the IP header...


Regards
        Sebastian

*) I participate in discussions about using active latency measurements to 
adapt traffic shapers for variable-rate links, which exposes quite a number 
of latency- and throughput-related issues, albeit for me at an amateur's 
level of understanding: https://github.com/lynxthecat/CAKE-autorate

**) I naively think that to make slow start exit gracefully we need a quick 
and reliable measure of pre-congestion, and latency increases are so noisy 
that neither quick nor reliable can be achieved, let alone both at the same 
time.


***) Well aware that "political" is a "problematic" word in view of the IETF, 
but L4S certainly will not be ratified on its merits, because these have not 
(yet?) been conclusively demonstrated; I am not claiming that the merits 
cannot be realized, just that currently there is not sufficient hard data to 
make a reasonable prediction.



> 
> Cheers,
> Michael

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
