+ Julian, who may be able to help answer.

> On Oct 17, 2024, at 2:22 AM, Tamás Zahola via Coreaudio-api 
> <[email protected]> wrote:
> 
> You can extract the VAD algorithm from WebRTC by starting at this file: 
> https://chromium.googlesource.com/external/webrtc/stable/src/+/master/common_audio/vad/vad_core.h
> 
> You'll also need some stuff from the common_audio/signal_processing folder, 
> but otherwise it's self-contained.
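> 
> The public API in webrtc_vad.h is small; roughly like this (exact signatures 
> vary a little between WebRTC revisions, so treat it as a sketch):
> 
>     #include "webrtc_vad.h"
> 
>     /* Classify one frame: returns 1 (speech), 0 (non-speech), -1 (error).
>        The frame must be 10, 20 or 30 ms of 16-bit mono PCM at 8, 16, 32
>        or 48 kHz. In real use you'd create the VadInst once and reuse it. */
>     int is_speech(const int16_t *frame, size_t samples, int sample_rate) {
>         VadInst *vad = WebRtcVad_Create();
>         if (vad == NULL || WebRtcVad_Init(vad) != 0) return -1;
>         WebRtcVad_set_mode(vad, 2);  /* aggressiveness: 0 (lax)..3 (strict) */
>         int result = WebRtcVad_Process(vad, sample_rate, frame, samples);
>         WebRtcVad_Free(vad);
>         return result;
>     }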
> 
>> It's easy for me to get the audio-output-stream for MY app (it just comes in 
>> over the websocket), but I may wish to toggle whether I want my AEC to be 
>> cancelling out any output-audio generated by other processes on my Mac.
> 
> From macOS Ventura onwards it is possible to capture system audio with the 
> ScreenCaptureKit framework, although your app will need extra privacy 
> permissions.
> 
>> It must be possible on macOS, as apps like Soundflower or BlackHole are able 
>> to do it.
> 
> BlackHole and Soundflower use an older technique: they install a virtual 
> loopback audio device on the system (you can see it listed in Audio MIDI 
> Setup as e.g. "BlackHole 2ch"), change the system's default output device to 
> that, then capture from the input side of this loopback device. But this 
> requires installing the virtual device in /Library/Audio/Plug-Ins/HAL, which 
> requires admin privileges.
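> 
> Programmatically, pointing the default output at the virtual device is just 
> a HAL property write. A minimal sketch (finding the device's AudioDeviceID 
> by enumerating kAudioHardwarePropertyDevices is omitted):
> 
>     #include <CoreAudio/CoreAudio.h>
> 
>     /* Make `device` (e.g. the BlackHole loopback device) the system-wide
>        default output, so that all app audio gets routed into it. */
>     OSStatus set_default_output(AudioDeviceID device) {
>         AudioObjectPropertyAddress addr = {
>             kAudioHardwarePropertyDefaultOutputDevice,
>             kAudioObjectPropertyScopeGlobal,
>             kAudioObjectPropertyElementMain
>         };
>         return AudioObjectSetPropertyData(kAudioObjectSystemObject, &addr,
>                                           0, NULL, sizeof(device), &device);
>     }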
> 
>> But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that 
>> it was impossible to access this. But there's now some mention of v3 
>> audio-units being able to process inter-app audio.
> 
> On iOS you must use the voice-processing I/O unit. Normal apps cannot capture 
> the system audio output. Technically there is a way to do it with the 
> ReplayKit framework, but it's a pain in the ass to use, and the primary 
> purpose of that framework is capturing screen content, not audio. If you try 
> e.g. Facebook Messenger on iOS, and initiate screen-sharing in a video call, 
> that's going to use ReplayKit.
> 
> Regards,
> Tamás Zahola 
> 
>> On 17 Oct 2024, at 08:04, π via Coreaudio-api <[email protected]> wrote:
>> 
>> Thank you for the replies. I am glad to see that this mailing list is still 
>> alive, despite the dwindling traffic these last few years.
>> 
>> Can I not encapsulate a VPIO unit, and control the input/output 
>> audio-streams by implementing input/render callbacks, or making connections?
>> 
>> I'm veering towards this approach of manual implementation: just use a 
>> HALOutput unit (misnamed, as it handles both input and output) on macOS, or 
>> a RemoteIO unit on the mobile platforms, to access the raw I/O buffers, and 
>> write my own pipeline.
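>> 
>> Untested, but I imagine the macOS half would look roughly like this:
>> 
>>     #include <AudioToolbox/AudioToolbox.h>
>> 
>>     /* Open the AUHAL unit with mic input enabled on bus 1 and speaker
>>        output disabled on bus 0. Error handling omitted. */
>>     AudioUnit make_input_unit(void) {
>>         AudioComponentDescription desc = {
>>             .componentType = kAudioUnitType_Output,
>>             .componentSubType = kAudioUnitSubType_HALOutput,
>>             .componentManufacturer = kAudioUnitManufacturer_Apple,
>>         };
>>         AudioUnit unit;
>>         AudioComponentInstanceNew(AudioComponentFindNext(NULL, &desc), &unit);
>> 
>>         UInt32 on = 1, off = 0;
>>         AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
>>                              kAudioUnitScope_Input, 1, &on, sizeof(on));
>>         AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
>>                              kAudioUnitScope_Output, 0, &off, sizeof(off));
>> 
>>         /* ...then set the stream format, hook up an input callback via
>>            kAudioOutputUnitProperty_SetInputCallback, AudioUnitInitialize,
>>            and AudioOutputUnitStart. */
>>         return unit;
>>     }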
>> 
>> Would it be a good idea to use https://github.com/apple/AudioUnitSDK to wrap 
>> this? My hunch is to minimize the layers/complexity and NOT use this 
>> framework.
>> 
>> And for the AEC/VAD, can anyone offer a perspective? Arshia? The two obvious 
>> candidates I see are WebRTC and Speex. GPT-4o reckons WebRTC will be the 
>> most advanced / best-performing solution, with the downside that it's a big 
>> project (and maybe a more complicated build process), while Speex is more 
>> lightweight and will probably do the job well enough for my purposes.
>> 
>> And as both are open-source, I may have the option of pulling out the 
>> minimal-dependency files and building just those.
>> 
>> The last question is regarding system-wide audio output. It's easy for me to 
>> get the audio-output-stream for MY app (it just comes in over the 
>> websocket), but I may wish to toggle whether I want my AEC to be cancelling 
>> out any output-audio generated by other processes on my Mac. E.g. if I am 
>> watching a YouTube video, maybe I want my AI to listen to that, and maybe I 
>> want it subtracted. So do I have the option to listen to SYSTEM-level audio 
>> output (so as to feed it into my AEC impl)? It must be possible on macOS, as 
>> apps like Soundflower or BlackHole are able to do it. But on mobile, I'm not 
>> so sure. My memory of iPhone audio dev (~2008) is that it was impossible to 
>> access this. But there's now some mention of v3 audio-units being able to 
>> process inter-app audio.
>> 
>> π
>> 
>> On Wed, 16 Oct 2024 at 19:35, Arshia Cont via Coreaudio-api 
>> <[email protected]> wrote:
>>> Hi π,
>>> 
>>> From my experience that’s not possible. VPIO is an option for the 
>>> lower-level I/O device; so is VAD. You don’t have much control over their 
>>> internals, routing, and wiring! Also, from our experience, VPIO behaves 
>>> differently on different devices. On some iPads we saw “gating” instead of 
>>> actual echo removal (be aware of that!). In the end, for a similar 
>>> use-case, we ended up doing our own AEC and activity detection.
>>> 
>>> Cheers,
>>> 
>>> Arshia Cont
>>> metronautapp.com
>>> 
>>> 
>>> 
>>>> On 15 Oct 2024, at 18:08, π via Coreaudio-api 
>>>> <[email protected]> wrote:
>>>> 
>>>> Dear Audio Engineers,
>>>> 
>>>> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional 
>>>> realtime audio over a websocket, with the AI server-side).
>>>> 
>>>> To do this, I need to be careful that the AI-speak doesn't make its way 
>>>> out of the speakers, back in through the mic, and back to their server 
>>>> (else it starts to talk to itself, and gets very confused).
>>>> 
>>>> So I need AEC, which I've actually got working, using 
>>>> kAudioUnitSubType_VoiceProcessingIO and setting 
>>>> kAUVoiceIOProperty_BypassVoiceProcessing to false via AudioUnitSetProperty.
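>>>> 
>>>> i.e. roughly:
>>>> 
>>>>     // vpioUnit is the kAudioUnitSubType_VoiceProcessingIO instance;
>>>>     // 0 = voice processing (AEC) active, 1 = bypassed.
>>>>     UInt32 bypass = 0;
>>>>     AudioUnitSetProperty(vpioUnit, kAUVoiceIOProperty_BypassVoiceProcessing,
>>>>                          kAudioUnitScope_Global, 0, &bypass, sizeof(bypass));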
>>>> 
>>>> Now I also wish to detect when the speaker (me) is speaking or not 
>>>> speaking, which I've also managed to do via 
>>>> kAudioDevicePropertyVoiceActivityDetectionEnable.
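>>>> 
>>>> i.e. (macOS 14+), roughly:
>>>> 
>>>>     // `device` is the input AudioDeviceID; vadListener is my
>>>>     // AudioObjectPropertyListenerProc, which reads the UInt32
>>>>     // kAudioDevicePropertyVoiceActivityDetectionState when it fires.
>>>>     AudioObjectPropertyAddress enableAddr = {
>>>>         kAudioDevicePropertyVoiceActivityDetectionEnable,
>>>>         kAudioDevicePropertyScopeInput, kAudioObjectPropertyElementMain };
>>>>     UInt32 on = 1;
>>>>     AudioObjectSetPropertyData(device, &enableAddr, 0, NULL,
>>>>                                sizeof(on), &on);
>>>> 
>>>>     AudioObjectPropertyAddress stateAddr = {
>>>>         kAudioDevicePropertyVoiceActivityDetectionState,
>>>>         kAudioDevicePropertyScopeInput, kAudioObjectPropertyElementMain };
>>>>     AudioObjectAddPropertyListener(device, &stateAddr, vadListener, NULL);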
>>>> 
>>>> But getting them to play together is another matter, and I'm struggling 
>>>> hard here.
>>>> 
>>>> I've rigged up a simple test 
>>>> (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a 
>>>> 440Hz sine wave is generated in the render callback, and mic input is 
>>>> recorded to file in the input callback.
>>>> 
>>>> So the AEC works delightfully, subtracting the sine wave and recording my 
>>>> voice. And if I turn the sine-wave amplitude down to 0, the VAD correctly 
>>>> triggers the speech-started and speech-stopped events.
>>>> 
>>>> But if I turn the sine wave up, it messes up the VAD.
>>>> 
>>>> Presumably the VAD is operating on the pre-echo-cancellation audio, which 
>>>> is most undesirable.
>>>> 
>>>> How can I progress here?
>>>> 
>>>> My thought was to create an audio pipeline, using AUGraph, but my efforts 
>>>> have thus far been unsuccessful, and I lack confidence that I'm even 
>>>> pushing in the right direction.
>>>> 
>>>> My thought was to have an IO unit that interfaces with the hardware 
>>>> (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>>>> 
>>>> But I can't see how to set this up.
>>>> 
>>>> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see 
>>>> any such unit on macOS. It seems the VoiceProcessing unit wants to do that 
>>>> itself.
>>>> 
>>>> And then I wonder: could I make a second VoiceProcessing unit, and have 
>>>> vp1_aec send its bus[1] (mic) outputScope to vp2_vad's bus[1] inputScope?
>>>> 
>>>> Can I do this kind of work by routing audio, or do I need to get my hands 
>>>> dirty with input/render callbacks?
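>>>> 
>>>> For the routing approach, I imagine something like this (untested):
>>>> 
>>>>     // Wire vp1_aec's bus-1 (processed mic) output into vp2_vad's bus-1
>>>>     // input via kAudioUnitProperty_MakeConnection. Whether two VPIO
>>>>     // instances tolerate being chained like this, I don't know.
>>>>     AudioUnitConnection conn = {
>>>>         .sourceAudioUnit    = vp1_aec,
>>>>         .sourceOutputNumber = 1,
>>>>         .destInputNumber    = 1,
>>>>     };
>>>>     AudioUnitSetProperty(vp2_vad, kAudioUnitProperty_MakeConnection,
>>>>                          kAudioUnitScope_Input, 1, &conn, sizeof(conn));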
>>>> 
>>>> It feels like I'm going hard against the grain if I am faffing with these 
>>>> callbacks.
>>>> 
>>>> If there's anyone out there who would care to offer me some guidance 
>>>> here, I'd be most grateful!
>>>> 
>>>> π
>>>> 
>>>> PS Is it not a serious problem that VAD can't operate on post-AEC input?