[Bloat] mo bettah open source multi-party videoconferencing in an age of bloated uplinks?

2020-03-27 Thread Dave Taht
sort of an outgrowth of this convo:

https://lwn.net/SubscriberLink/815751/786d161d06a90f0e/

I imagine worldwide videoconferencing quality could be much better if
we could convince more folk to finally install SQM or upgrade to a
working DOCSIS 3.1 solution, etc. Maybe some rag somewhere will finally
pick up on bufferbloat solutions and run with it? Or we can write some
articles? Or reach out to school systems? Or?

I've been fiddling with Jitsi, and am about to give FreeSWITCH a try.
Last I looked, FreeSWITCH's otherwise pretty nifty conference bridge
didn't dynamically adjust at all due to e2e signalling, but that was
years ago. (?)

I have to admit that p2p multiparty videoconferencing seems more
plausible in a de-bufferbloated age, but I haven't explored what tools
are available. (?)

There's also been this somewhat entertaining convo on the IETF mboned
list: https://mailarchive.ietf.org/arch/msg/mboned/2thFQk_IYn38XCZBQavhUmOd6tk/

Around me there has been this huge interest in "streaming". The user
agreement for these (see restream.io's) is scary - and the copyright
police have control... but I am very happy to report that even a
couple of really lousy long distance fq_codel'd ath9k links work
*really* well (with Facebook's implementation), where a non-fq_codel'd
link (ath10k) failed miserably... and setting up a reflector in nginx
also failed miserably.

Anyone working on the ath10k AQL backport for OpenWrt as yet?

-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cake] mo bettah open source multi-party videoconferencing in an age of bloated uplinks?

2020-03-27 Thread David P. Reed
Congestion control for real-time video is quite different than for streaming. 
Streaming really is dealt with by a big enough (multi-second) buffering, and 
can in principle work great over TCP (if debloated).

UDP congestion control MUST be end-to-end and done in the application layer, 
which is usually outside the OS kernel. This makes it tricky, because you end 
up with latency variation due to the OS's process scheduler that is on the order 
of magnitude of the real-time requirements for air-to-air or light-to-light 
response (meaning the physical transition from sound or picture to and from the 
transducer).

This creates a godawful mess when trying to do an app. Whether in WebRTC (peer 
to peer UDP) or in a Linux userspace app, the scheduler has huge variance in 
delay.

Now getting rid of bloat currently requires TCP to respond to congestion 
signalling. UDP in the kernel doesn't do that, and it doesn't tell userspace 
much either (you can try to detect packet drops in userspace, but coding that 
up is quite hard because the schedulers get in the way of measurement, and 
forget about ECN being seen in userspace)

This is OS architecture messiness, not a layer 2 or 3 issue.

I've thought about this a lot. Here's my thoughts:

I hate putting things in the kernel! It's insecure. But what this says is that 
for very historical and stupid reasons (related to the ideas of early 
timesharing systems like Unix and Multics) folks try to make real-time 
algorithms look like ordinary "processes" whose notion of controlling temporal 
behavior is abstracted away.

So: 
1. We really should rethink how timing-sensitive algorithms are expressed, and 
it isn't gonna be good to base them on semaphores and threads that run at 
random rates. That means a very different OS conceptual framework. Can this 
share hardware with, say, the Linux we know and love? Yes, the hardware can be 
shared. One should be able to dedicate virtual processors that are not running 
Linux processes, but instead another computational model (dataflow?).
An example of this (though clunky and unsupported by good tools) is in FreeBSD; 
it's called *netgraph*. It's a structured way to write reactive algorithms that 
are demand or arrival driven. It also has some security issues, and since it is 
heavily based on passing mbufs around it's really quirky. But I have found it 
useful for the kind of things that need to get done in teleconferencing voice 
and video.

2. eBPF is interesting, because it is more secure, and is again focused on 
running code at kernel level, event-driven.  I think it would be a seriously 
difficult lift to get it to the point where one could program the networked 
media processing in BPF.

3. One of the nice things about KVM (hardware virtualization) is that 
potentially it lets different low level machine models share a common machine. 
It occurs to me that, using VIRTIO network devices and some kind of VIRTIO 
media processing devices, a KVM virtual machine could be hooked up to the 
packet-level networking drivers in the end device, isolating the 
teleconferencing from the rest of the endpoint OS, and creating the right kind 
of near-bare-metal environment for managing the timing of network packets and 
the paths to the screen and audio that would be simple and clean and tightly 
scheduled. KVM could "own" one or more of the physical cores during the 
teleconference.

You can see, though, that this isn't just a "network protocol design" problem. 
This is only partly a network protocol issue, but one that is coupled with the 
architecture of the end systems.

I reminisce a little bit thinking back to the 1970's and 80's when TCP/IP and 
UDP/IP were being designed. Sadly, it was one of the big problems of 
communicating between the OS community and the protocol community that the OS 
community couldn't think outside the "timesharing" system box, and the protocol 
community thought of networking like phone calls (sessions). This is where the 
need for control of timing and buffering got lost. The timesharing folks 
largely thought of networks as being for reliable, timeless, sequential "streams" of 
data that had no particular urgency. The network protocol folks were focused on 
ARQ.
Only a few of us cared about end-to-end latency bounds (where ends meant 
keyboard click or audio sample to screen display change or speaker motion). The 
packet speech guys did, but most networking guys wanted to toss them under the 
bus as annoying. And those of us doing distributed multinode algorithms did, 
but the remote login and FTP guys were skeptical that would ever matter.

It's the latency, stupid. Not the reliability, nor the consistency, nor 
throughput. Unless both the OS and the path are focused on minimizing latency, 
a vast set of applications will suck. Unfortunately, both the OS and network 
communities are *stuck* in a world where latency is uncontrollable, and there 
are no tools for getting it better.

 


Re: [Bloat] [Cake] mo bettah open source multi-party videoconferencing in an age of bloated uplinks?

2020-03-27 Thread David Lang

On Fri, 27 Mar 2020, David P. Reed wrote:

> Congestion control for real-time video is quite different than for streaming.
> Streaming really is dealt with by a big enough (multi-second) buffering, and
> can in principle work great over TCP (if debloated).
>
> UDP congestion control MUST be end-to-end and done in the application layer,
> which is usually outside the OS kernel. This makes it tricky, because you end
> up with latency variation due to the OS's process scheduler that is on the
> order of magnitude of the real-time requirements for air-to-air or
> light-to-light response (meaning the physical transition from sound or
> picture to and from the transducer).


At some level this is correct, but if the link is clogged with TCP packets, it 
doesn't matter what your UDP application attempts to do, so installing cake to 
keep individual links from being too congested will give your UDP application 
a chance to operate.
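
For anyone who wants to try this on a Linux router, here is a minimal sketch
(not something posted in this thread) that builds the tc command for putting
cake on an uplink. The helper name cake_egress_command, the interface name,
the uplink rate and the 95% headroom figure are all assumptions chosen for
illustration. It only prints the command; running it needs root and a kernel
with sch_cake.

```python
import shlex


def cake_egress_command(interface, uplink_kbit, headroom_percent=95):
    """Build a tc command that shapes egress on `interface` with cake.

    The shaper is set a little below the measured uplink rate (an assumed
    95% by default) so that the queue forms in cake rather than in the
    modem's own buffer.
    """
    shaped_kbit = uplink_kbit * headroom_percent // 100
    return [
        "tc", "qdisc", "replace", "dev", interface, "root",
        "cake", "bandwidth", f"{shaped_kbit}kbit",
    ]


if __name__ == "__main__":
    # Hypothetical 5 Mbit/s uplink on an interface called eth0.
    print(shlex.join(cake_egress_command("eth0", 5000)))
```

This covers the upload direction only; shaping the download side usually
needs an extra ifb device, which the sketch leaves out.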


David Lang


Re: [Bloat] [Cake] mo bettah open source multi-party videoconferencing in an age of bloated uplinks?

2020-03-27 Thread Dave Taht
Of interest given some of what you said, there is a huge discussion on
netdev about how best to implement hardware offloads for network
slicing:

https://www.spinics.net/lists/netdev/msg638836.html

Me, I always rolled my eyes at all the network virtualization stuff
and ran from the room, screaming, given how much I care about low
latency. The UDP vs TCP offload split has been nightmare enough. That
said, to this day I lack a clear idea how any multi-tenant DC
operation really works. I've generally assumed it was policers, and
have deployed sqm (now cake) instead on everything in the cloud that
seemed to need it.

On Fri, Mar 27, 2020 at 12:00 PM David P. Reed wrote:
> [...]

Re: [Bloat] [Cake] mo bettah open source multi-party videoconferencing in an age of bloated uplinks?

2020-03-27 Thread Dave Taht
I don't know to what extent the FreeSWITCH guys would be interested in
this thread. I'd like to find a good list or forum to talk about the
state of the art in videoconferencing? The IETF rmcat and webrtc
lists are mostly dead. Hangouts, Jitsi, Zoom, etc. seem to be pretty
good products nowadays (at least in my fq_codel'd environment), but
solid info on how to make them better in the home and for online
tele-learning

On Fri, Mar 27, 2020 at 12:00 PM David P. Reed  wrote:
>
> Congestion control for real-time video is quite different than for streaming. 
> Streaming really is dealt with by a big enough (multi-second) buffering, and 
> can in principle work great over TCP (if debloated).

Your encoder still has to adjust to the available bandwidth. The
Facebook streaming application did this beautifully through my very
limited, highly shared 5mbit uplink - adjusting quickly to a parallel
rrul test in particular by skipping some frames, then lowering the
frame rate and quality. An early attempt of mine to merely reflect
rtmp streams did not, and neither did an attempt with "OBS Studio".

There was about 30 sec of delay in the Facebook test - I figure some
of this is tuned to visible uplink buffer sizes (still seconds over
cell), but also to give the RIAA a shot at censoring the audio. (A
commercial song crept in - over a mic! - which was detected as
infringing on one attempt, which automatically muted the audio and
keyed a nastygram from FB.)

I'm going to poke into OBS Studio's underlying code (rtsp anyone?) at
some point, and really - udp with a head dropping aqm is the best
thing for transporting video, IMHO.

> UDP congestion control MUST be end-to-end and done in the application layer, 
> which is usually outside the OS kernel. This makes it tricky, because you end 
> up with latency variation due to the OS's process scheduler that is on the 
> order of magnitude of the real-time requirements for air-to-air or 
> light-to-light response (meaning the physical transition from sound or 
> picture to and from the transducer).

We are so far from that point! Encoder latencies today are in the
100+ms range. I always liked the opus codec because it can get down to
2.7ms encoding latencies, and a doubled frame rate camera gets to 8ms,
but I'm out of date on video encoding rates. (?)

One long deferred piece of webrtc/rmcat research I always meant to do
was audio and video on separate ports in the stream, using that 2.7ms
opus clock and depending on fq at the bottleneck to provide better
congestion control information by treating the smaller audio packets
as a clock signal. Due to lack of port space and a widespread
perception that fq isn't out there, most videoconferencing streams
multiplex everything over the same port. With ipv6 in place, well,
port space is no longer a problem.
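
A rough sketch of one way the "audio packets as a clock" idea could be read
(my interpretation, not code from any existing stack): since audio frames
leave the sender at a fixed interval, the receiver can compare each arrival
against where the clock says it should have landed and treat the excess as
queueing delay. The function name, the (seq, arrival_time) input format and
the 10 ms interval in the example are assumptions for illustration.

```python
def relative_delays(arrivals, interval):
    """Estimate per-packet queueing delay from a fixed-rate audio flow.

    arrivals: list of (sequence_number, arrival_time_seconds) for packets
              that were received; lost packets are simply missing.
    interval: the sender's packet spacing in seconds.

    Returns a list of (sequence_number, delay_seconds), where delay is
    measured relative to the least-delayed packet seen, so clock offset
    between sender and receiver cancels out.
    """
    if not arrivals:
        return []
    seq0, t0 = arrivals[0]
    # How far each packet is from the arrival time the clock predicts.
    offsets = [(seq, t - (t0 + (seq - seq0) * interval)) for seq, t in arrivals]
    # The smallest offset is taken as "no queueing".
    base = min(offset for _, offset in offsets)
    return [(seq, offset - base) for seq, offset in offsets]


if __name__ == "__main__":
    # Packet 3 was lost; packets 2 and 4 arrive progressively later.
    sample = [(0, 1.000), (1, 1.010), (2, 1.025), (4, 1.050)]
    for seq, delay in relative_delays(sample, 0.010):
        print(f"seq {seq}: {delay * 1e3:.1f} ms above baseline")
```

A rising delay across consecutive audio packets would be the hint to back
off the video rate; what threshold to use is left open here.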

>
> This creates a godawful mess when trying to do an app. Whether in WebRTC 
> (peer to peer UDP) or in a Linux userspace app, the scheduler has huge 
> variance in delay.

I figure the bounding scheduler latency is still well manageable below
a single 60fps frame.
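
One rough way to check that on a given box is to measure how late a sleeping
userspace thread actually wakes up and compare it with the ~16.7 ms of a
60fps frame. This is only a sketch: the 5 ms period and the sample count are
arbitrary assumptions, and it measures a Python thread, not a tuned C media
pipeline.

```python
import time

FRAME_60FPS = 1.0 / 60.0  # about 16.7 ms


def wakeup_overshoots(period=0.005, count=100):
    """Sleep `count` times for `period` seconds; return how late each wakeup was."""
    late = []
    for _ in range(count):
        start = time.perf_counter()
        time.sleep(period)
        late.append(time.perf_counter() - start - period)
    return late


def summarize(late):
    """Return the median, 99th percentile and maximum of the lateness samples."""
    ordered = sorted(late)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median": ordered[len(ordered) // 2],
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    for name, value in summarize(wakeup_overshoots()).items():
        share = value / FRAME_60FPS
        print(f"{name}: {value * 1e3:.3f} ms ({share:.1%} of a 60fps frame)")
```

Running it while the machine is idle and again under load gives a feel for
how much of the frame budget the scheduler alone can eat.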

> Now getting rid of bloat currently requires TCP to respond to congestion 
> signalling. UDP in the kernel doesn't do that, and it doesn't tell userspace 
> much either (you can try to detect packet drops in userspace, but coding that 
> up is quite hard because the schedulers get in the way of measurement, and 
> forget about ECN being seen in userspace)

ECN in userspace is easy on udp, except that most APIs tend to
abstract into a file handle style abstraction and a single return of
data, not control information, and the API for getting tos options is
ugly. APIs that can return data and info, (data, packetheader) =
getudp_someway(), probably exist for more modern languages like Go, but
rarely C or C++. Totally out of date on this; the last time I looked at
the google congestion control code base was in mozilla... 8 years ago!
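
As a sketch of what reading ECN from a UDP socket can look like (assuming
Linux and IPv4; this is not taken from any existing videoconferencing code),
Python's socket.recvmsg() hands back the TOS byte as ancillary data once
IP_RECVTOS is enabled on the socket. The helper names are made up, and the
fallback value 13 for IP_RECVTOS is the Linux constant, used only because
older Python builds do not export it.

```python
import socket

# Linux value of IP_RECVTOS; older Python builds may not export the constant.
IP_RECVTOS = getattr(socket, "IP_RECVTOS", 13)

# ECN codepoints from RFC 3168 (the two low-order bits of the TOS byte).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_from_tos(tos):
    """Decode the ECN field of an IPv4 TOS byte."""
    return ECN_NAMES[tos & 0b11]


def tos_from_ancdata(ancdata):
    """Pick the TOS byte out of recvmsg() ancillary data, or None if absent."""
    for level, ctype, data in ancdata:
        # Linux reports the byte as an IP_TOS control message; some BSDs
        # use IP_RECVTOS as the message type instead.
        if level == socket.IPPROTO_IP and ctype in (socket.IP_TOS, IP_RECVTOS):
            return data[0]
    return None


def recv_with_ecn(sock, bufsize=2048):
    """Return (payload, sender, ecn_name) for one datagram.

    The socket must have had IP_RECVTOS enabled beforehand:
        sock.setsockopt(socket.IPPROTO_IP, IP_RECVTOS, 1)
    """
    payload, ancdata, _flags, sender = sock.recvmsg(bufsize, socket.CMSG_SPACE(4))
    tos = tos_from_ancdata(ancdata)
    return payload, sender, None if tos is None else ecn_from_tos(tos)
```

On the sending side, setting IP_TOS to 0x02 with setsockopt() marks outgoing
datagrams as ECT(0); a "CE" coming back from recv_with_ecn() then means an
AQM on the path marked the packet instead of dropping it.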

As for doing udp semi-efficiently in batches...

sendmmsg, recvmmsg is a rather underused kernel api. And ugly as sin.
With some major limitations.


>
> This is OS architecture messiness, not a layer 2 or 3 issue.

To me the nightmare starts with most cpu context switch latencies
being 1000s of clocks nowadays.

>
> I've thought about this a lot. Here's my thoughts:
>
> I hate putting things in the kernel! It's insecure. But what this says is 
> that for very historical and stupid reasons (related to the ideas of early 
> timesharing systems like Unix and Multics) folks try to make real-time 
> algorithms look like ordinary "processes" whose notion of controlling 
> temporal behavior is abstracted away.

On the whole, with the rise of QUIC - in particular QUIC, as multiple
userspace libs have been emerging - we've got good bases to move
forward with more stuff in userspace.


Re: [Bloat] [Cake] mo bettah open source multi-party videoconferencing in an age of bloated uplinks

2020-03-27 Thread Hal Murray

> I hate putting things in the kernel! It's insecure. But what this says is
> that for very historical and stupid reasons (related to the ideas of early
> timesharing systems like Unix and Multics) folks try to make real-time
> algorithms look like ordinary "processes" whose notion of controlling
> temporal behavior is abstracted away. 

Could you please say more.

Why doesn't it work to put the time critical stuff in a separate light weight 
thread and give it higher priority than the stuff that needs lots of CPU?

Is the problem in the scheduler?  Is background junk overloading the system?
(Are people rebuilding the kernel while video conferencing?)

Is it too hard to split out the logic that would go in the light weight 
thread? (get tangled on locks or such)


-- 
These are my opinions.  I hate spam.





[Bloat] fcc's coronavirus guidelines

2020-03-27 Thread Dave Taht
"put everyone on a schedule"... sigh

https://www.fcc.gov/home-network-tips-coronavirus-pandemic




Re: [Bloat] fcc's coronavirus guidelines

2020-03-27 Thread Kenneth Porter
--On Friday, March 27, 2020 3:41 PM -0700 Dave Taht wrote:

> "put everyone on a schedule"... sigh
>
> https://www.fcc.gov/home-network-tips-coronavirus-pandemic


How do we educate officials? It's not clear who we'd even address a 
correction to. Is there a bufferbloat page we can point them to?






[Bloat] Windows 10 updates multithread limit Feature request to Microsoft

2020-03-27 Thread cloneman
I don't know if the Windows 10 updates multithreading problem is still
relevant for many users, but it certainly still affects me as a user with
low bufferbloat, low latency, and only moderate bandwidth (50mbit, 4ms
idle, ~9ms loaded).

In any case, I have submitted official feedback to Microsoft, as I've
exhausted any possible workarounds on my end short of implementing a
Windows update cache server on my LAN (go figure, apparently the cache
servers download with a small number of threads).

After posting this link -- I'm done advocating for this issue. I think
Valve's Steam has made some improvements; they still use many threads, but
somehow it doesn't create as many issues. There seems to be no interest in
most discussion forums to explore this with any depth -- or even admit that
20 connections to 1-2 servers is problematic.



Feedback hub link
https://aka.ms/AA7zg1r