Hello Luca, tsvwg'ers,

        I believe there is some confusion about how video
conference streams, and video *streams* in general, differ from other
forms of traffic.  I believe some of that confusion comes about not
only because of the FEC that many of them use, but also over the terms
"elastic", "greedy" and "capacity seeking."

        Though video streams *do* adapt to network conditions, they
do so in fixed consumption steps; this is the elastic nature of a
video stream.  They do not continually probe for the full available
bandwidth; that would be a greedy or capacity-seeking flow, which video
streams are *not*.
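
        To make the distinction concrete, here is a rough sketch in
Python.  The bitrate ladder, the AIMD constants and the function names
are all made up for illustration; they are not taken from any real
codec or transport.

```python
# Illustrative sketch only: ladder values and AIMD constants are assumptions.
LADDER_KBPS = [300, 750, 1500, 3000, 6000]  # hypothetical encoder bitrate steps


def stepped_rate(available_kbps):
    """Elastic, stepwise sender: pick the highest ladder step that fits.

    Falls back to the lowest step when even that does not fit.
    """
    fitting = [r for r in LADDER_KBPS if r <= available_kbps]
    return fitting[-1] if fitting else LADDER_KBPS[0]


def capacity_seeking_rate(current_kbps, congested,
                          increase_kbps=100, decrease_factor=0.5):
    """Greedy sender (AIMD-style): keep probing upward until congestion
    is signalled, then back off multiplicatively."""
    if congested:
        return current_kbps * decrease_factor
    return current_kbps + increase_kbps


if __name__ == "__main__":
    link_kbps = 4000
    # The stepped sender settles on one ladder step and stays there.
    print("stepped:", stepped_rate(link_kbps))
    # The capacity-seeking sender keeps oscillating around the link rate.
    rate = 1000.0
    for _ in range(10):
        rate = capacity_seeking_rate(rate, congested=rate > link_kbps)
        print("seeking:", rate)
```

        The stepped sender leaves headroom unused between steps, while
the capacity-seeking sender repeatedly pushes into the bottleneck until
it is told to back off.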

        There is a difference between watching a video and downloading
a video on the Internet.

        The above are *rough* statements, as the details are much more
involved, with things like the traffic burst for the next frame chunk and
other techniques that have come onto the market.  I would love to hear from
an expert on the current true nature, but there were certainly some
misstatements about video conference streams during the meeting.


> Hi Jake,
> Thanks for the notes. Very useful.
> The other issue with the meeting was that the virtual mic queue control
> channel was the WebEx Meeting chat that does not exist in WebEx Teams. So,
> I had to switch to Meetings and lost some pieces of the discussion.
> Yes there might be a terminology difference. Elastic traffic is usually
> used in the sense of bandwidth sharing not just to define variable bit
> rates.
> The point is that there are incentives to cheat in L4S.
> There is a priority queue that my application can enter by providing as
> input ECT(1).
> Applications such as on-line meetings will have a relatively low and highly
> paced rate.
> This traffic is conformant to dualQ L queue but is unresponsive to
> congestion notifications.
> This is especially true for FEC streams, which could be used to ameliorate
> the media quality in the presence of losses (e.g. Wi-Fi)
> or increased jitter.
> That was one more point on why using ECT(1) as input assumes trust or a
> black list after being caught.
> In both cases the ECT(1) as input is DoSable.
Luca Muscariello
Tuesday, April 28, 2020 at 1:54 AM
> >
> >
> > TL;DR
> >
> > To Dave: you asked several times what Cisco does on latency reduction in
> > network equipment. I tend to be very shy when replying on these questions
> > as this is not vendor neutral. If chairs think this is not appropriate for
> > the list, please say it and I'll reply privately only.
> >
> > What I write below can be found in Cisco products data sheets and is not
> > trade secret. There are very good blog posts explaining details.
> > Not surprisingly Cisco implements the state of the art on the topic
> > and it is totally feasible to do-the-right-thing in software and hardware.
> >
> > Cisco implements AFD (one queue + a flow table) accompanied by a priority
> > queue for flows that have a certain profile in rate and size. The concept
> > is well known and well studied in the literature. AFD is safe and can well
> > serve a complex traffic mix when accompanied by a priority queue. This
> > prio-queue should not be confused with a strict priority queue (e.g. EF in
> > diffserv). There are subtleties related to the shared medium which would
> > be too long to describe here.
> >
> > This is available in Cisco CMTS for the DOCSIS segment. Bottleneck traffic
> > does not negatively impact non-bottlenecked traffic such as an on-line
> > meeting like the WebEx call we had yesterday. It is safe from a network
> > neutrality point-of-view and no applications get hurt.
> >
> > Cisco implements AFD+prio also for some DC switches such as the Nexus 9k.
> > There is a blog post written by Tom Edsall online that explains pretty well
> > how that works. This includes mechanisms such as pFabric to approximate SRPT
> > (shortest remaining processing time) and minimize flow completion time for
> > many DC workloads. The mix of the two brings FCT minimization AND latency
> > minimization. This is silicon and scales at any speed. For those who are
> > not familiar with these concepts, please search the research work of Balaji
> > Prabhakar and Rong Pan at Stanford.
> >
> > Wi-Fi: Cisco does airtime fairness in Aironet and, I think, in the Meraki
> > series too. The concept is similar to what is described above, but there are
> > several queues, one per STA. Packets are enqueued in the access (category)
> > queue at dequeue time from the air-time packet scheduler.
> >
> > It looks like the majority of what I say below is not related to the
> > fate of the "bit". The push to take the bit was
> > strong with this one, and me... can't we deploy more of what we
> > already got in places where it matters?
> >
> > ...
> >
> > so: A) PLEA: From 10 years now, of me working on bufferbloat, working
> > on real end-user and wifi traffic and real networks....
> >
> > I would like folk here to stop benchmarking two flows that run for a long
> > time and in one direction only... and thus exclusively in tcp congestion
> > avoidance mode.
> >
> > Please. just. stop. Real traffic looks nothing like that. The internet
> > looks nothing like that.
> > The netops folk I know just roll their eyes up at benchmarks like this
> > that prove nothing and tell me to go to ripe meetings instead.
> > When y'all talk about "not looking foolish for not mandating ecn now",
> > you've already lost that audience with benchmarks like these.
> >
> > Sure, set up a background flow(s) like that, but then hit the result
> > with a mix of far more normal traffic? Please? Networks are never used
> > unidirectionally, and both directions congesting is frequent. To
> > illustrate that problem...
> >
> > I have a really robust benchmark that we have used throughout the
> > bufferbloat project that I would like everyone to run in their
> > environments: the flent "rrul" test. Everybody on both sides has big enough
> > testbeds set up that a few hours spent on doing that - and please add in
> > asymmetric networks especially - and perusing the results ought to be
> > enlightening to everyone as to the kind of problems real people have, on
> > real networks.
> >
> > Can the L4S and SCE folk run the rrul test some day soon? Please?
> >
> > I rather liked this benchmark that tested another traffic mix,
> >
> > (https://www.cablelabs.com/wp-content/uploads/2014/06/DOCSIS-AQM_May2014.pdf)
> >
> > although it had many flaws (like not doing dns lookups), I wish it
> > could be dusted off and used to compare this
> > new fangled ecn enabled stuff with the kind of results you can merely get
> > with packet loss and rtt awareness. It would be so great to be able
> > to directly compare all these new algorithms against this benchmark.
> >
> > Adding in a non ecn'd udp based routing protocol on a heavily
> > oversubscribed 100mbit link is also enlightening.
> >
> > I'd rather like to see that benchmark improved for a more modernized
> > home traffic mix where it is projected there may be 30 devices on the
> > network on average, in a few years.
> >
> > If there is any one thing y'all can do to reduce my blood pressure and
> > keep me engaged here whilst you
> > debate the end of the internet as I understand it, it would be to run
> > the rrul test as part of all your benchmarks.
> >
> > thank you.
> >
> > B) Stuart Cheshire regaled us with several anecdotes - one concerning
> > his problems with comcast's 1Gbit/35mbit service being unusable, under
> > load, for videoconferencing. This is true. The overbuffering at the CMTSes
> > still has to be seen to be believed, at all rates. At lower rates
> > it's possible to shape this with another device (which is what
> > the entire SQM deployment does in self defense, and why cake has a
> > specific docsis ingress mode), but it is cpu intensive
> > and requires x86 hardware to do well at rates above 500Mbits, presently.
> >
> > So I wish CMTS makers (Arris and Cisco) were in this room. Are they?
> >
> > (Stuart, if you'd like a box that can make your comcast link pleasurable
> > under all workloads, whenever you get back to los gatos, I've got a few
> > lying around. Was so happy to get a few ietfers this past week to apply
> > what's off the shelf for end users today. :)
> >
> > C) I am glad Bob said that L4S is finally looking at asymmetric
> > networks, and starting to tackle ack-filtering and accecn issues
> > there.
> >
> > But... I would have *started there*. Asymmetric access is the predominant
> > form of all edge technologies.
> >
> > I would love to see flent rrul test results for 1gig/35mbit, 100/10, 200/10
> > services, in particular (from SCE also!). "Lifeline" service (11/2)
> > would be good to have results on. It would be especially good to have
> > baseline comparison data from the measured, current deployment
> > of the CMTSes at these rates, to start with, with no queue management in
> > play, then pie on the uplink, then fq_codel on the uplink, and then
> > this ecn stuff, and so on.
> >
> > D) The two CPE makers in the room have dismissed both fq and sce as
> > being too difficult to implement. They did say that dualpi was
> > actually implemented in software, not hardware.
> >
> > I would certainly like them to benchmark what they plan to offer in L4S
> > vs what is already available in the edgerouter X, as one low end
> > example among thousands.
> >
> > I also have to note, at higher speeds, all the buffering moves into
> > the wifi and the results are currently ugly. I imagine
> > they are exploring how to fix their wifi stacks also? I wish more folk
> > were using RVR + latency benchmarks like this one:
> >
> >
> > http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf
> >
> > Same goes for the LTE folk.
> >
> > E) Andrew McGregor mentioned how great it would be for a closeted musician
> > to be able to play in real time with someone across town. That has been my
> > goal for nearly 30 years now!! And although I rather enjoyed his
> > participation in my last talk on the subject
> > (https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over-yet/),
> > conflating a need for ecn and l4s signalling for low latency audio
> > applications with what I actually said in that talk kind of hurt. I achieved
> > "my 2ms fiber based guitarist to fiber based drummer dream" 4+ years
> > back with fq_codel and diffserv: no ecn required,
> > no changes to the specs, no mandating that packets be undroppable. I
> > would like to rip the opus codec out of that mix one day.
> >
> > F) I agree with Jana that changing the definition of RFC3168 to suit
> > the RED algorithm (which is not pi or anything fancy) often present in
> > network switches today, to suit dctcp, works. But you should say
> > "configuring red to have l4s marking style" and document that.
> >
> > Sometimes I try to point out many switches have a form of DRR in them,
> > and it's helpful to use that in conjunction with whatever diffserv
> > markings you trust in your network.
> >
> > To this day I wish someone would publish how much they use DCTCP style
> > signalling on a dc network relative to their other traffic.
> >
> > To this day I keep hoping that someone will publish a suitable
> > set of RED parameters for a wide variety of switches and routers -
> > for the most common switches and ethernet chips, for correct DCTCP usage.
> >
> > Mellanox's example
> > (https://community.mellanox.com/s/article/howto-configure-ecn-on-mellanox-ethernet-switches--spectrum-x)
> > is not dctcp specific.
> >
> > Many switches have a form of DRR in them, and it's helpful to use that
> > in conjunction with whatever diffserv markings you trust in your network
> > and, as per the above example, segregate two red queues that way. From
> > what I see above, there is no way to differentiate ECT(0) from ECT(1) in
> > that switch. (?)
> >
> > I do keep trying to point out the size of the end user ecn enabled
> > deployment, starting with the data I have from free.fr. Are we
> > building a network for AIs or people?
> >
> > G) Jana also made a point about 2 queues "being enough" (I might be
> > mis-remembering the exact point). Mellanox's ethernet chips at 10Gig expose
> > 64 hardware queues, some new intel hardware exposes 2000+. How do these
> > queues work relative to these algorithms?
> >
> > We have generally found hw mq to be far less of a benefit than the
> > manufacturers think, especially as regards
> > lower latency or reduced cpu usage (as cache crossing is a bear).
> > There is a lot of software work left to be done in this area; however,
> > the hardware queues are needed to match queues to cpus (and tenants).
> > Until sch_pie gained timestamping support recently, the rate estimator
> > did not work correctly in a hw mq environment. Haven't looked over
> > dualpi in this respect.
> >
> >
> >
> >
> >
