On 2025-12-02 23:44, Marc-André Lureau wrote:
> Hi Geoffrey
>
> On Tue, Dec 2, 2025 at 4:31 PM Geoffrey McRae
> <[email protected]> wrote:
>> The PipeWire and PulseAudio backends are used by a large number of
>> users in the VFIO community. Removing these would be an enormous
>> detriment to QEMU.
> They come with GStreamer pulse/pipe elements.
Yes, but through another layer of abstraction/complexity with no real
benefit.
>> Audio output from QEMU has always been problematic, but with the
>> PulseAudio and, later, the PipeWire backends, it became much more
>> user friendly for those who wanted to configure the VM to output
>> native audio into their sound plumbing.
> Could you be more specific?
There are clock sync/drift issues between the emulated audio device's
clock and the real hardware's audio clock. GStreamer won't solve this;
it requires a tuned PID loop that resamples the audio to compensate for
the continual drift between the emulated and hardware clocks. Without
this, over time, the audio can and does get wildly out of sync,
eventually resulting in xruns.

All you have to do is google "QEMU Crackling Sound". JACK, PipeWire
and PulseAudio manage to mostly hide (not solve) this issue from the
user, but it still occurs. It's worse for SPICE clients as the audio
gets buffered in the network stack rather than dropped, which can lead
to many seconds of audio latency.
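To illustrate the idea (a hypothetical sketch, not QEMU or sound-server
code, with illustrative gain values), a small proportional-integral
controller can nudge a resampling ratio so the playback buffer's fill
level converges on a target, absorbing the drift between the two clocks:

```python
# Hypothetical sketch: PI control of a resampling ratio to absorb clock
# drift between a guest (producer) and host (consumer) audio clock.
# Gains are illustrative, not tuned for any real device.

class DriftCompensator:
    def __init__(self, target_fill, kp=2e-3, ki=2e-5):
        self.target = target_fill  # desired buffered frames
        self.kp = kp               # proportional gain
        self.ki = ki               # integral gain (drives steady error to 0)
        self.integral = 0.0

    def ratio(self, current_fill):
        # Fill above target => the guest clock runs fast relative to the
        # host, so consume slightly faster than nominal (ratio > 1).
        error = current_fill - self.target
        self.integral += error
        return 1.0 + self.kp * error + self.ki * self.integral

# Toy simulation: the guest produces at 48048 Hz (0.1% fast), the host
# nominally consumes at 48000 Hz, in 10 ms ticks. Uncompensated, the
# buffer would grow by ~48 frames every second until something breaks;
# with the controller, the ratio settles near 1.001 and the fill level
# stays pinned to the target.
comp = DriftCompensator(target_fill=4800)
fill = 4800.0
r = 1.0
for _ in range(2000):
    r = comp.ratio(fill)
    fill += 480.48 - 480.0 * r  # frames produced minus consumed this tick
```

A real implementation also has to clamp the ratio and handle xruns, but
a loop of roughly this shape is the kind of correction being described.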
As for applications, we have a large number of people using QEMU/KVM
with full GPU pass-through for gaming workloads, many of whom route the
QEMU audio into PipeWire/JACK directly, which enables the host's sound
server to perform DSP, mixing, etc.
Others are streaming the guest via Looking Glass for the video feed, and
using PipeWire from QEMU to feed into OBS for live streaming setups.
The flexibility that JACK & PipeWire bring to the table cannot be
overstated. From a maintenance point of view, JACK and PipeWire are only
~800 lines of code each, fully self-contained and very easy to debug.
All the audio processing/mixing/resampling/routing (and any user
configured DSP) is fully offloaded to the host's audio server, where it
should be.
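As a concrete illustration of that routing (an illustrative invocation,
assuming a QEMU build with the PipeWire audiodev backend), the emulated
HDA device can be pointed straight at the host's PipeWire graph:

```shell
# Illustrative only: emulated Intel HDA output delivered to the host's
# PipeWire graph, where it can be mixed, routed, or fed into OBS.
qemu-system-x86_64 \
    -audiodev pipewire,id=snd0 \
    -device intel-hda \
    -device hda-duplex,audiodev=snd0 \
    ... # remaining VM options elided
```

From there the stream appears as an ordinary PipeWire client and can be
rewired with tools such as pw-link or a patchbay like Helvum.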
>> I do not agree that ALSA is as useful as you state it is; it's
>> dependent on the host system's audio hardware support. If the sound
>> device doesn't support hardware mixing (almost none do anymore), or
>> the bitrate/sample rate QEMU wishes to use, you're out of luck.
>>
>> What I do think needs fixing here is the removal of the forced S16
>> audio format, and the resampler which forces all output to 48kHz.
>> This, though, would require changes to the SPICE protocol, as
>> currently it is also fixed at two-channel 48kHz S16, IIRC.
> Why is it a problem that Spice requires 48kHz? Afaik, you can't have
> both Spice & another backend (unlike VNC which does monitor to
> capture)
For clients like Looking Glass that take the audio via SPICE and render
it locally via their own audio devices, it matters because we do
additional things such as tracking the client/host audio clocks and
resampling to keep the audio latency consistent, correcting for the
clock drift mentioned prior.

There are quite a lot of people also using virt-viewer with Intel GVT-g
these days who are likewise limited to 48kHz S16, again because it uses
SPICE by default.

I digress though; this is a different topic entirely and I should not
have raised it here.
>> IMHO adding GStreamer is unnecessary; we have the modern PipeWire
>> interface, which is compatible with everything. I see absolutely no
>> reason to add so much complexity to the project for little to no
>> gain.
> Pipewire alone is not compatible with Windows or OSX, afaik.
Yes, but there is the DirectSound audio driver for Windows, and the
CoreAudio driver for OSX. While I appreciate that DirectSound is
deprecated, I really think that effort should be put into implementing a
WASAPI backend for QEMU.
I really do not think that adding all the complexity of GStreamer to
QEMU is the right way forward. We should just hand off the audio
processing to the host system's sound server (as we do already),
whatever it might be, and let it do the heavy lifting.
Regards,
Geoffrey McRae (gnif)