Re: Realtime AEC + VAD

Tamás Zahola via Coreaudio-api Fri, 18 Oct 2024 03:55:47 -0700

Hold on a sec, how are you planning to use the AudioDevice VAD on watchOS? It is a macOS-only API. It's not available on watchOS, neither on iOS.

Now, considering what Julian wrote, I think your problem might he that you're using the VPIO unit in conjunction with the AudioDevice VAD. Because if what Julian wrote is true, that AudioDevice already has echo cancellation when VAD is enabled, then what could be happening is that your output signal is subtracted *twice* from the input: first by the echo canceller of AudioDevice, and then by the VPIO unit. So in effect the VPIO unit ends up re-adding the echo with inverted phase.

I would recommend trying just AudioDevice directlt, without the VPIO unit.

Obviously, this is all macOS-only. On iOS (and I guess watchOS) you only have AudioUnits, so you must use your own VAD.

Regards,

Tamás Zahola

On 2024. Oct 18., at 12:30, π via Coreaudio-api <[email protected]> wrote:

Thanks for the pointer Tamás!

Pulling out VAD from WebRTC worked a treat.

I started with https://github.com/daanzu/py-webrtcvad-wheels and knocked together a hello.cpp and CMakeLists.txt (https://gist.github.com/p-i-/598da13d2a1a1e2a6ec978e15fa7d892)

I have to say, it feels hella awkward that I cannot control the pipeline and use native AudioUnits for this kind of work.

Surely it is a mistake on Apple's part to put VAD before AEC, if this is really what they're doing... it's gona trigger VAD callback on incoming/remote audio, rather than user-speech.

For a low-power usage scenario (say WatchOS), I really want to be dynamically rerouting -- if there's no audio being sent thru the speaker, I don't want AEC eating CPU cycles, but I DO want VAD detecting user-speech onset. And if audio IS being sent thru the speaker, I want AEC to be subtracting it, and VAD to be operating on this "cleaned" mic-input. I'd love it if VoiceProcessingIO unit took care of all of this.

I haven't yet managed to scientifically determine exactly what VoiceProcessingIO unit is actually doing, but if I engage its AEC and VAD and play a sine-wave, it disturbs the VAD callbacks, yet successfully subtracts the sinewave from mic-audio. So I strongly suspect they have these two subcomponents wired up in the wrong order.

If this is indeed the case, is there any liklihood of a future fix? Do Apple core-audio devs listen in on this list?

π

On Thu, 17 Oct 2024 at 10:24, Tamás Zahola via Coreaudio-api <[email protected]> wrote:
You can extract the VAD algorithm from WebRTC by starting at this file: https://chromium.googlesource.com/external/webrtc/stable/src/+/master/common_audio/vad/vad_core.h

You'll also need some stuff from the common_audio/signal_processing folder, but otherwise it's self-contained.

It's easy for me to get the audio-output-stream for MY app (it just comes in over the websocket), but I may wish to toggle whether I want my AEC to be cancelling out any output-audio generated by other processes on my mac.

From macOS Ventura onwards it is possible to capture system audio with the ScreenCaptureKit framework, although your app will need extra privacy permissions.

It must be possible on macOS, as apps like soundFlower or blackHole are able to do it.

BlackHole and SoundFlower are using an older technique, where they install a virtual loopback audio device on the system (you can see it listed in Audio MIDI Settings as e.g. "BlackHole 2 ch"), and change the system's default output device to that, then capture from the input port of this loopback device. But this requires installing the virtual device in /Library/Audio/Plug-Ins/HAL, which requires admin privileges.

But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it was impossible to access this. But there's now some mention of v3 audio-units being able to process inter-app audio.

On iOS you must use the voice-processing I/O unit. Normal apps cannot capture the system audio output. Technically there is a way to do it with the ReplayKit framework, but it's a pain in the ass to use, and the primary purpose of that framework is capturing screen content, not audio. If you try e.g. Facebook Messenger on iOS, and initiate screen-sharing in a video call, that's going to use ReplayKit.

Regards,
Tamás Zahola

On 17 Oct 2024, at 08:04, π via Coreaudio-api <[email protected]> wrote:

Thankyou for the replies. I am glad to see that this mailing-list is still alive, despite the dwindling traffic this last few years.

Can I not encapsulate a VPIO unit, and control the input/output audio-streams by implementing input/render callbacks, or making connections?

I'm veering towards this approach of manual implementation: Just to use a (misnamed as it's I/O) HALInput unit on macOS or a RemoteIO unit on the mobile platforms to access the raw I/O buffers, and write my own pipeline.

Would it be a good idea to use https://github.com/apple/AudioUnitSDK to wrap this? My hunch is to minimize the layers/complexity and NOT use this framework.

And for the AEC/VAD, can anyone offer a perspective? Arshia? The two obvious candidates I see are WebRTC and SpeeX. GPT4o reckons WebRTC will be the most-advanced / best-performant solution, with the downside that it's a big project (and maybe a more complicated build process), while SpeeX is more light-weight and will probably do the job well enough for my purposes.

And as both are open-source, I may have the option of pulling out the minimal-dependency files and building just those.

The last question is regarding system-wide audio output. It's easy for me to get the audio-output-stream for MY app (it just comes in over the websocket), but I may wish to toggle whether I want my AEC to be cancelling out any output-audio generated by other processes on my mac. e.g. if I am watching a YouTube video, maybe I want my AI to listen to that, and maybe I want it subtracted. So do I have the option to listen to SYSTEM-level audio output (so as to feed it into my AEC impl)? It must be possible on macOS, as apps like soundFlower or blackHole are able to do it. But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it was impossible to access this. But there's now some mention of v3 audio-units being able to process inter-app audio.

π

On Wed, 16 Oct 2024 at 19:35, Arshia Cont via Coreaudio-api <[email protected]> wrote:
Hi π,

From my experience that’s not possible. VPIO is an option for the lower level IO device; so is VAD. You don’t have much control over their internals, routing and wirings! Also, from our experience, VPIO has different behaviour on different devices. On some iPads we saw “gating” instead of actually removing echo (be aware of that!). In the end for a similar use-case we ended up doing our own AEC and Activity Detection.

Cheers,

Arshia Cont
metronautapp.com

On 15 Oct 2024, at 18:08, π via Coreaudio-api <[email protected]> wrote:

Dear Audio Engineers,

I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional realtime audio over websocket with AI serverside).

To do this, I need to be careful that the AI-speak doesn't make its way out of the speakers, back in thru the mic, and back to their server (else it starts to talk to itself, and gets very confused).

So I need AEC, which I've actually got working, using kAudioUnitSubType_VoiceProcessingIO and AudioUnitSetProperty(kAUVoiceIOProperty_BypassVoiceProcessing, setting to False).

Now I also wish to detect when the speaker (me) is speaking or not speaking, which I've also managed to do via kAudioDevicePropertyVoiceActivityDetectionEnable.

But getting them to play together is another matter, and I'm struggling hard here.

I've rigged up a simple test (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a 440Hz sinewave is generated in the render-callback, and mic-input is recorded to file in the input-callback.

So the AEC works delightfully, subtracting the sinewave and recording my voice.
And if I turn the sine-wave amplitude down to 0, the VAD correctly triggers the speech-started and speech-stopped events.

But if I turn up the sine-wave, it messes up the VAD.

Presumably the VAD is working over the pre-EchoCancelled audio, which is most undesirable.

How can I progress here?

My thought was to create an audio pipeline, using AUGraph, but my efforts have thus far been unsuccessful, and I lack confidence that I'm even pushing in the right direction.

My thought was to have an IO unit that interfaces with the hardware (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.

But I can't see how to set this up.

On iOS there's a RemoteIO unit to deal with the hardware, but I can't see any such unit on macOS. It seems the VoiceProcessing unit wants to do that itself.

And then I wonder: Could I make a second VoiceProcessing unit, and have vp1_aec split send its bus[1(mic)].outputScope to vp2_vad.bus[1].inputScope?

Can I do this kind of work by routing audio, or do I need to get my hands dirty with input/render callbacks?

It feels like I'm going hard against the grain if I am faffing with these callbacks.

If there's anyone out there that would care to offer me some guidance here, I am most grateful!

π

PS Is it not a serious problem that VAD can't operate on post-AEC input?
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/coreaudio-api/arshiacont%40antescofo.com

This email sent to [email protected]

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/coreaudio-api/pipad.org%40gmail.com

This email sent to [email protected]
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/coreaudio-api/tzahola%40gmail.com

This email sent to [email protected]

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/coreaudio-api/pipad.org%40gmail.com

This email sent to [email protected]

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/coreaudio-api/tzahola%40gmail.com

This email sent to [email protected]

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/coreaudio-api/archive%40mail-archive.com


This email sent to [email protected]

Re: Realtime AEC + VAD

Reply via email to