> So if I want something that works on all 3, I kinda need to roll my own AEC+VAD.

If you insist on running the same audio code on all 3 devices, then yes.

> I'm struggling really hard to extract aec3 out of WebRTC. Whereas VAD was pretty straightforward, aec3 seems to have a dependency on something called abseil.

abseil should be pretty much self-contained. You can get it from here if you can't set up the WebRTC dependencies: https://github.com/abseil/abseil-cpp

I'm afraid I can't address the other questions you've raised about AEC algorithms. I would recommend going with the simplest solution, and only start digging into research papers if that doesn't work for your purposes.

Regards,
Tamás Zahola

> On 18 Oct 2024, at 16:09, π via Coreaudio-api <[email protected]> wrote:
>
> Yikes!
>
> Well, the purpose of my project was to investigate the possibilities of using OpenAI's realtime API on Apple tech, and I'm indeed discovering the gotchas.
>
> So, IIUC:
>
> - on macOS I can get beneath the AudioUnit level and go straight to AudioDevice; down to the wire, so to speak. And roll my own AEC & VAD. Alternatively I can use the VoiceProcessingIO AudioUnit, which gives me AEC & VAD, though they don't play nice together; but if I roll my own VAD (using the WebRTC code) I'm good to go.
>
> - on iOS I can't get at the AudioDevice, but I still have the VoiceProcessingIO technique available as above. Alternatively I could use the RemoteIO AudioUnit and roll my own AEC+VAD. But then I'm not getting system audio-out. Hum ho. Liveable-withable.
>
> - on watchOS we don't have AudioDevice OR the VoiceProcessingIO AudioUnit, but we DO still have the RemoteIO AudioUnit.
>
> So if I want something that works on all 3, I kinda need to roll my own AEC+VAD.
>
> I'm struggling really hard to extract aec3 out of WebRTC. Whereas VAD was pretty straightforward, aec3 seems to have a dependency on something called abseil.
>
> It seems AEC is far from a "Solved Problem".
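On "the simplest solution": for VAD specifically, before extracting the WebRTC implementation it may be worth prototyping around a plain energy gate with a hangover counter. This is only a sketch, not the WebRTC algorithm (which works on spectral features and an adaptive noise model); names and thresholds here are made up for illustration, assuming 16-bit little-endian mono PCM frames:

```python
import array
import math

def frame_rms(frame: bytes) -> float:
    """RMS level of one frame of 16-bit little-endian mono PCM."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

class EnergyVAD:
    """Toy VAD: speech = RMS above a fixed threshold, with a
    hangover counter so brief dips don't end the speech segment."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 5):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self._hangover = 0
        self.speaking = False

    def process(self, frame: bytes) -> bool:
        if frame_rms(frame) >= self.threshold:
            # Loud frame: (re)start speech and reset the hangover.
            self._hangover = self.hangover_frames
            self.speaking = True
        elif self._hangover > 0:
            # Quiet frame, but still within the hangover window.
            self._hangover -= 1
        else:
            self.speaking = False
        return self.speaking
```

Something this crude will misfire on noise and music, but it's enough to exercise the speech-started/speech-stopped plumbing of the pipeline while the real VAD is being extracted.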
> I see Microsoft have recently (2023) issued a challenge inviting novel AEC solutions (https://www.microsoft.com/en-us/research/academic-program/acoustic-echo-cancellation-challenge-icassp-2023/); presumably the current AI boom is gonna shake loose some new approaches, but as an outsider I don't get to see the submissions. E.g. the winning non-Microsoft entry is behind a paywall at https://ieeexplore.ieee.org/document/10096411 (though maybe the same as https://arxiv.org/pdf/2303.06828).
>
> I wonder whether "cheating" buys much; i.e. emitting a periodic sweep/chirp from the speakers to estimate the impulse response of the acoustic environment, in order to deduce an inverse IR. Then I think the AEC is just applying that, possibly together with some delay to compensate for I/O latency.
>
> Does anyone have an intuition whether it's even sensible to be considering realtime AEC on watchOS? Just from a performance PoV it might rinse out the battery really fast.
>
> π
>
> On Fri, 18 Oct 2024 at 11:55, Tamás Zahola via Coreaudio-api <[email protected]> wrote:
>> Hold on a sec, how are you planning to use the AudioDevice VAD on watchOS? It is a macOS-only API. It's not available on watchOS, nor on iOS.
>>
>> Now, considering what Julian wrote, I think your problem might be that you're using the VPIO unit in conjunction with the AudioDevice VAD. Because if what Julian wrote is true, that the AudioDevice already has echo cancellation when VAD is enabled, then what could be happening is that your output signal is subtracted *twice* from the input: first by the echo canceller of the AudioDevice, and then by the VPIO unit. So in effect the VPIO unit ends up re-adding the echo with inverted phase.
>>
>> I would recommend trying just the AudioDevice directly, without the VPIO unit.
>>
>> Obviously, this is all macOS-only. On iOS (and I guess watchOS) you only have AudioUnits, so you must use your own VAD.
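For what it's worth, the classic non-ML approach doesn't need a probe chirp at all: it estimates the echo path continuously with an adaptive FIR filter (typically NLMS) driven by the far-end signal, and subtracts the predicted echo from the mic signal, so it tracks a changing room for free. A toy sketch of the idea (Python/NumPy; function name is made up, and it omits the hard parts: delay estimation, double-talk detection, nonlinear residual suppression):

```python
import numpy as np

def nlms_aec(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Normalized LMS echo canceller (toy, sample-by-sample).

    far_end: reference signal sent to the speaker.
    mic:     microphone signal = echo(far_end) + near-end speech.
    Returns the error signal, i.e. mic with the estimated echo removed.
    """
    w = np.zeros(taps)       # adaptive estimate of the echo path
    x_buf = np.zeros(taps)   # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y_hat = w @ x_buf    # predicted echo at sample n
        e = mic[n] - y_hat   # residual = near-end speech + prediction error
        # NLMS update: step size normalized by the far-end buffer energy
        w += mu * e * x_buf / (x_buf @ x_buf + eps)
        out[n] = e
    return out
```

In practice what separates this toy from aec3-grade cancellers is exactly what it leaves out: tracking the speaker-to-mic delay, freezing adaptation during double-talk, and coping with nonlinear speaker distortion, which is where the recent ML entries in the Microsoft challenge earn their keep.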
>>
>> Regards,
>> Tamás Zahola
>>
>>> On 2024. Oct 18., at 12:30, π via Coreaudio-api <[email protected]> wrote:
>>>
>>> Thanks for the pointer Tamás!
>>>
>>> Pulling out VAD from WebRTC worked a treat.
>>>
>>> I started with https://github.com/daanzu/py-webrtcvad-wheels and knocked together a hello.cpp and CMakeLists.txt (https://gist.github.com/p-i-/598da13d2a1a1e2a6ec978e15fa7d892).
>>>
>>> I have to say, it feels hella awkward that I cannot control the pipeline and use native AudioUnits for this kind of work.
>>>
>>> Surely it is a mistake on Apple's part to put VAD before AEC, if this is really what they're doing... it's gonna trigger the VAD callback on incoming/remote audio, rather than user speech.
>>>
>>> For a low-power usage scenario (say watchOS), I really want to be dynamically rerouting: if there's no audio being sent thru the speaker, I don't want AEC eating CPU cycles, but I DO want VAD detecting user-speech onset. And if audio IS being sent thru the speaker, I want AEC to be subtracting it, and VAD to be operating on this "cleaned" mic input. I'd love it if the VoiceProcessingIO unit took care of all of this.
>>>
>>> I haven't yet managed to scientifically determine exactly what the VoiceProcessingIO unit is actually doing, but if I engage its AEC and VAD and play a sine wave, it disturbs the VAD callbacks, yet successfully subtracts the sine wave from the mic audio. So I strongly suspect they have these two subcomponents wired up in the wrong order.
>>>
>>> If this is indeed the case, is there any likelihood of a future fix? Do Apple core-audio devs listen in on this list?
>>>
>>> π
>>>
>>> On Thu, 17 Oct 2024 at 10:24, Tamás Zahola via Coreaudio-api <[email protected]> wrote:
>>>> You can extract the VAD algorithm from WebRTC by starting at this file: https://chromium.googlesource.com/external/webrtc/stable/src/+/master/common_audio/vad/vad_core.h
>>>>
>>>> You'll also need some stuff from the common_audio/signal_processing folder, but otherwise it's self-contained.
>>>>
>>>>> It's easy for me to get the audio-output stream for MY app (it just comes in over the websocket), but I may wish to toggle whether I want my AEC to be cancelling out any output audio generated by other processes on my Mac.
>>>>
>>>> From macOS Ventura onwards it is possible to capture system audio with the ScreenCaptureKit framework, although your app will need extra privacy permissions.
>>>>
>>>>> It must be possible on macOS, as apps like SoundFlower or BlackHole are able to do it.
>>>>
>>>> BlackHole and SoundFlower use an older technique: they install a virtual loopback audio device on the system (you can see it listed in Audio MIDI Setup as e.g. "BlackHole 2 ch"), change the system's default output device to that, then capture from the input port of this loopback device. But this requires installing the virtual device in /Library/Audio/Plug-Ins/HAL, which requires admin privileges.
>>>>
>>>>> But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it was impossible to access this. But there's now some mention of v3 audio units being able to process inter-app audio.
>>>>
>>>> On iOS you must use the voice-processing I/O unit. Normal apps cannot capture the system audio output. Technically there is a way to do it with the ReplayKit framework, but it's a pain in the ass to use, and the primary purpose of that framework is capturing screen content, not audio. If you try e.g. Facebook Messenger on iOS and initiate screen-sharing in a video call, that's going to use ReplayKit.
>>>>
>>>> Regards,
>>>> Tamás Zahola
>>>>
>>>>> On 17 Oct 2024, at 08:04, π via Coreaudio-api <[email protected]> wrote:
>>>>>
>>>>> Thank you for the replies. I am glad to see that this mailing list is still alive, despite the dwindling traffic these last few years.
>>>>>
>>>>> Can I not encapsulate a VPIO unit, and control the input/output audio streams by implementing input/render callbacks, or making connections?
>>>>>
>>>>> I'm veering towards this approach of manual implementation: just use a (misnamed, as it's I/O) HALInput unit on macOS, or a RemoteIO unit on the mobile platforms, to access the raw I/O buffers, and write my own pipeline.
>>>>>
>>>>> Would it be a good idea to use https://github.com/apple/AudioUnitSDK to wrap this? My hunch is to minimize the layers/complexity and NOT use this framework.
>>>>>
>>>>> And for the AEC/VAD, can anyone offer a perspective? Arshia? The two obvious candidates I see are WebRTC and Speex. GPT-4o reckons WebRTC will be the most advanced / best-performing solution, with the downside that it's a big project (and maybe a more complicated build process), while Speex is more lightweight and will probably do the job well enough for my purposes.
>>>>>
>>>>> And as both are open source, I may have the option of pulling out the minimal-dependency files and building just those.
>>>>>
>>>>> The last question is regarding system-wide audio output. It's easy for me to get the audio-output stream for MY app (it just comes in over the websocket), but I may wish to toggle whether I want my AEC to be cancelling out any output audio generated by other processes on my Mac. E.g. if I am watching a YouTube video, maybe I want my AI to listen to that, and maybe I want it subtracted.
>>>>> So do I have the option to listen to SYSTEM-level audio output (so as to feed it into my AEC impl)? It must be possible on macOS, as apps like SoundFlower or BlackHole are able to do it. But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it was impossible to access this. But there's now some mention of v3 audio units being able to process inter-app audio.
>>>>>
>>>>> π
>>>>>
>>>>> On Wed, 16 Oct 2024 at 19:35, Arshia Cont via Coreaudio-api <[email protected]> wrote:
>>>>>> Hi π,
>>>>>>
>>>>>> From my experience that’s not possible. VPIO is an option for the lower-level IO device; so is VAD. You don’t have much control over their internals, routing and wiring! Also, from our experience, VPIO has different behaviour on different devices. On some iPads we saw “gating” instead of actual echo removal (be aware of that!). In the end, for a similar use case, we ended up doing our own AEC and activity detection.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Arshia Cont
>>>>>> metronautapp.com
>>>>>>
>>>>>>> On 15 Oct 2024, at 18:08, π via Coreaudio-api <[email protected]> wrote:
>>>>>>>
>>>>>>> Dear Audio Engineers,
>>>>>>>
>>>>>>> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional realtime audio over websocket, with the AI server-side).
>>>>>>>
>>>>>>> To do this, I need to be careful that the AI-speak doesn't make its way out of the speakers, back in thru the mic, and back to their server (else it starts to talk to itself, and gets very confused).
>>>>>>>
>>>>>>> So I need AEC, which I've actually got working, using kAudioUnitSubType_VoiceProcessingIO and AudioUnitSetProperty(kAUVoiceIOProperty_BypassVoiceProcessing, setting it to false).
>>>>>>>
>>>>>>> Now I also wish to detect when the speaker (me) is speaking or not speaking, which I've also managed to do via kAudioDevicePropertyVoiceActivityDetectionEnable.
>>>>>>>
>>>>>>> But getting them to play together is another matter, and I'm struggling hard here.
>>>>>>>
>>>>>>> I've rigged up a simple test (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a 440Hz sine wave is generated in the render callback, and mic input is recorded to file in the input callback.
>>>>>>>
>>>>>>> So the AEC works delightfully, subtracting the sine wave and recording my voice. And if I turn the sine-wave amplitude down to 0, the VAD correctly triggers the speech-started and speech-stopped events.
>>>>>>>
>>>>>>> But if I turn up the sine wave, it messes up the VAD.
>>>>>>>
>>>>>>> Presumably the VAD is working on the pre-echo-cancelled audio, which is most undesirable.
>>>>>>>
>>>>>>> How can I progress here?
>>>>>>>
>>>>>>> My thought was to create an audio pipeline using AUGraph, but my efforts have thus far been unsuccessful, and I lack confidence that I'm even pushing in the right direction.
>>>>>>>
>>>>>>> My thought was to have an IO unit that interfaces with the hardware (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>>>>>>>
>>>>>>> But I can't see how to set this up.
>>>>>>>
>>>>>>> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see any such unit on macOS. It seems the VoiceProcessing unit wants to do that itself.
>>>>>>>
>>>>>>> And then I wonder: could I make a second VoiceProcessing unit, and have vp1_aec split/send its bus[1(mic)].outputScope to vp2_vad.bus[1].inputScope?
>>>>>>>
>>>>>>> Can I do this kind of work by routing audio, or do I need to get my hands dirty with input/render callbacks?
>>>>>>>
>>>>>>> It feels like I'm going hard against the grain if I am faffing with these callbacks.
>>>>>>>
>>>>>>> If there's anyone out there that would care to offer me some guidance here, I am most grateful!
>>>>>>>
>>>>>>> π
>>>>>>>
>>>>>>> PS Is it not a serious problem that VAD can't operate on post-AEC input?
>>>>>>> _______________________________________________
>>>>>>> Do not post admin requests to the list. They will be ignored.
>>>>>>> Coreaudio-api mailing list ([email protected])
>>>>>>> Help/Unsubscribe/Update your Subscription:
>>>>>>> https://lists.apple.com/mailman/options/coreaudio-api/arshiacont%40antescofo.com
>>>>>>>
>>>>>>> This email sent to [email protected]
