+ Julian, who may be able to help answer.

> On Oct 17, 2024, at 2:22 AM, Tamás Zahola via Coreaudio-api <[email protected]> wrote:
>
> You can extract the VAD algorithm from WebRTC by starting at this file:
> https://chromium.googlesource.com/external/webrtc/stable/src/+/master/common_audio/vad/vad_core.h
>
> You'll also need some stuff from the common_audio/signal_processing folder, but otherwise it's self-contained.
>
>> It's easy for me to get the audio-output-stream for MY app (it just comes in over the websocket), but I may wish to toggle whether I want my AEC to be cancelling out any output-audio generated by other processes on my mac.
>
> From macOS Ventura onwards it is possible to capture system audio with the ScreenCaptureKit framework, although your app will need extra privacy permissions.
>
>> It must be possible on macOS, as apps like SoundFlower or BlackHole are able to do it.
>
> BlackHole and SoundFlower use an older technique: they install a virtual loopback audio device on the system (you can see it listed in Audio MIDI Setup as e.g. "BlackHole 2ch"), switch the system's default output device to it, and then capture from the input side of that loopback device. But this requires installing the virtual device in /Library/Audio/Plug-Ins/HAL, which requires admin privileges.
>
>> But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it was impossible to access this. But there's now some mention of v3 audio-units being able to process inter-app audio.
>
> On iOS you must use the voice-processing I/O unit. Normal apps cannot capture the system audio output. Technically there is a way to do it with the ReplayKit framework, but it's a pain in the ass to use, and the primary purpose of that framework is capturing screen content, not audio. If you try e.g. Facebook Messenger on iOS and initiate screen-sharing in a video call, that's going to use ReplayKit.
>
> Regards,
> Tamás Zahola
>
>> On 17 Oct 2024, at 08:04, π via Coreaudio-api <[email protected]> wrote:
>>
>> Thank you for the replies. I am glad to see that this mailing list is still alive, despite the dwindling traffic these last few years.
>>
>> Can I not encapsulate a VPIO unit, and control the input/output audio streams by implementing input/render callbacks, or by making connections?
>>
>> I'm veering towards this approach of manual implementation: just use a (misnamed, as it's actually I/O) HALInput unit on macOS or a RemoteIO unit on the mobile platforms to access the raw I/O buffers, and write my own pipeline.
>>
>> Would it be a good idea to use https://github.com/apple/AudioUnitSDK to wrap this? My hunch is to minimize the layers/complexity and NOT use this framework.
>>
>> And for the AEC/VAD, can anyone offer a perspective? Arshia? The two obvious candidates I see are WebRTC and Speex. GPT-4o reckons WebRTC will be the most advanced / best-performing solution, with the downside that it's a big project (and maybe a more complicated build process), while Speex is more lightweight and will probably do the job well enough for my purposes.
>>
>> And as both are open-source, I may have the option of pulling out the minimal-dependency files and building just those.
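Tamás's pointer above is to the VAD core; the public entry points live in webrtc_vad.h, and the surface is small enough that pulling out just the minimal-dependency files is realistic. A rough sketch against that C API follows (the header path and the no-argument WebRtcVad_Create signature match recent WebRTC checkouts; older revisions, including the stable branch linked above, differ slightly, so treat those details as assumptions):

    #include <stdint.h>
    #include <stddef.h>
    #include "webrtc_vad.h"  /* common_audio/vad/include/webrtc_vad.h in recent checkouts */

    /* Feed 10/20/30 ms frames of 16-bit mono PCM at 8/16/32 kHz (48 kHz in newer
     * revisions). Returns 1 = speech, 0 = no speech, -1 = error. */
    static int frame_is_speech(VadInst *vad, const int16_t *frame, size_t n, int sample_rate) {
        return WebRtcVad_Process(vad, sample_rate, frame, n);
    }

    int main(void) {
        VadInst *vad = WebRtcVad_Create();   /* older revisions: int WebRtcVad_Create(VadInst**) */
        WebRtcVad_Init(vad);
        WebRtcVad_set_mode(vad, 2);          /* aggressiveness 0 (lenient) .. 3 (aggressive) */

        int16_t frame[480] = {0};            /* 30 ms at 16 kHz; silence, so expect 0 */
        int is_speech = frame_is_speech(vad, frame, 480, 16000);
        (void)is_speech;

        WebRtcVad_Free(vad);
        return 0;
    }

For comparison, Speex's preprocessor is likewise a small set of C files, so the trade-off described above is mainly about the size of the surrounding project and build system rather than the VAD code itself.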
>> The last question is regarding system-wide audio output. It's easy for me to get the audio-output-stream for MY app (it just comes in over the websocket), but I may wish to toggle whether I want my AEC to be cancelling out any output-audio generated by other processes on my mac. E.g. if I am watching a YouTube video, maybe I want my AI to listen to that, and maybe I want it subtracted. So do I have the option to listen to SYSTEM-level audio output (so as to feed it into my AEC implementation)? It must be possible on macOS, as apps like SoundFlower or BlackHole are able to do it. But on mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it was impossible to access this. But there's now some mention of v3 audio-units being able to process inter-app audio.
>>
>> π
>>
>> On Wed, 16 Oct 2024 at 19:35, Arshia Cont via Coreaudio-api <[email protected]> wrote:
>>> Hi π,
>>>
>>> From my experience that's not possible. VPIO is an option for the lower-level IO device; so is VAD. You don't have much control over their internals, routing and wiring! Also, from our experience, VPIO has different behaviour on different devices. On some iPads we saw "gating" instead of actually removing echo (be aware of that!). In the end, for a similar use-case, we ended up doing our own AEC and Activity Detection.
>>>
>>> Cheers,
>>>
>>> Arshia Cont
>>> metronautapp.com
>>>
>>>> On 15 Oct 2024, at 18:08, π via Coreaudio-api <[email protected]> wrote:
>>>>
>>>> Dear Audio Engineers,
>>>>
>>>> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional realtime audio over websocket with AI server-side).
>>>>
>>>> To do this, I need to be careful that the AI-speak doesn't make its way out of the speakers, back in through the mic, and back to their server (else it starts to talk to itself, and gets very confused).
>>>>
>>>> So I need AEC, which I've actually got working, using kAudioUnitSubType_VoiceProcessingIO and AudioUnitSetProperty(kAUVoiceIOProperty_BypassVoiceProcessing, setting it to false).
>>>>
>>>> Now I also wish to detect when the speaker (me) is speaking or not speaking, which I've also managed to do via kAudioDevicePropertyVoiceActivityDetectionEnable.
>>>>
>>>> But getting them to play together is another matter, and I'm struggling hard here.
>>>>
>>>> I've rigged up a simple test (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a 440 Hz sine wave is generated in the render callback, and mic input is recorded to file in the input callback.
>>>>
>>>> So the AEC works delightfully, subtracting the sine wave and recording my voice. And if I turn the sine-wave amplitude down to 0, the VAD correctly triggers the speech-started and speech-stopped events.
>>>>
>>>> But if I turn up the sine wave, it messes up the VAD.
>>>>
>>>> Presumably the VAD is working on the pre-echo-cancelled audio, which is most undesirable.
>>>>
>>>> How can I progress here?
>>>>
>>>> My thought was to create an audio pipeline using AUGraph, but my efforts have thus far been unsuccessful, and I lack confidence that I'm even pushing in the right direction.
>>>>
>>>> My thought was to have an IO unit that interfaces with the hardware (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>>>>
>>>> But I can't see how to set this up.
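To make the setup described above concrete: the post combines AEC via the voice-processing I/O unit with VAD via a HAL device property. A minimal sketch of those two pieces in C is below. The inputDevice argument and the listener body are placeholders, error handling is omitted, and kAudioDevicePropertyVoiceActivityDetectionState is assumed here to be the companion selector for reading the state that the enable property switches on:

    #include <AudioToolbox/AudioToolbox.h>
    #include <CoreAudio/CoreAudio.h>

    /* Placeholder listener: fires when the device's VAD state changes; read the
     * state property inside to learn whether speech just started or stopped. */
    static OSStatus MyVADStateListener(AudioObjectID device, UInt32 nAddresses,
                                       const AudioObjectPropertyAddress *addresses,
                                       void *clientData) {
        UInt32 state = 0;
        UInt32 size = sizeof(state);
        AudioObjectGetPropertyData(device, addresses, 0, NULL, &size, &state);
        /* state == 1 -> speech detected, 0 -> no speech */
        return noErr;
    }

    static void setUpAECAndVAD(AudioObjectID inputDevice) {
        /* 1. Voice-processing I/O unit: this is what provides the AEC. */
        AudioComponentDescription desc = {
            .componentType = kAudioUnitType_Output,
            .componentSubType = kAudioUnitSubType_VoiceProcessingIO,
            .componentManufacturer = kAudioUnitManufacturer_Apple,
        };
        AudioUnit vpio = NULL;
        AudioComponentInstanceNew(AudioComponentFindNext(NULL, &desc), &vpio);

        /* Keep voice processing active (bypass = 0) so echo cancellation runs. */
        UInt32 bypass = 0;
        AudioUnitSetProperty(vpio, kAUVoiceIOProperty_BypassVoiceProcessing,
                             kAudioUnitScope_Global, 0, &bypass, sizeof(bypass));

        /* 2. Ask the HAL to run voice-activity detection on the input device. */
        AudioObjectPropertyAddress enableAddr = {
            kAudioDevicePropertyVoiceActivityDetectionEnable,
            kAudioDevicePropertyScopeInput, kAudioObjectPropertyElementMain };
        UInt32 enable = 1;
        AudioObjectSetPropertyData(inputDevice, &enableAddr, 0, NULL,
                                   sizeof(enable), &enable);

        /* 3. Watch the detection state to get speech-started/stopped callbacks. */
        AudioObjectPropertyAddress stateAddr = {
            kAudioDevicePropertyVoiceActivityDetectionState,
            kAudioDevicePropertyScopeInput, kAudioObjectPropertyElementMain };
        AudioObjectAddPropertyListener(inputDevice, &stateAddr, MyVADStateListener, NULL);
    }

Note that this only shows where the two switches live; whether the HAL's detector sees pre- or post-echo-cancelled samples is exactly the open question in this thread.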
>>>> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see any such unit on macOS. It seems the VoiceProcessing unit wants to do that itself.
>>>>
>>>> And then I wonder: could I make a second VoiceProcessing unit, and have vp1_aec send its bus[1 (mic)].outputScope to vp2_vad.bus[1].inputScope?
>>>>
>>>> Can I do this kind of work by routing audio, or do I need to get my hands dirty with input/render callbacks?
>>>>
>>>> It feels like I'm going hard against the grain if I am faffing with these callbacks.
>>>>
>>>> If there's anyone out there who would care to offer me some guidance here, I am most grateful!
>>>>
>>>> π
>>>>
>>>> PS Is it not a serious problem that the VAD can't operate on post-AEC input?
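On the routing-versus-callbacks question raised above, the callback route is the well-trodden one for this kind of manual pipeline. Here is a sketch of the input side, assuming an I/O unit (AUHAL/HALInput on macOS, RemoteIO on iOS, or the voice-processing unit) has already been created and configured; the fixed buffer sizing and the AttachInputCallback helper are illustrative, not taken from the linked gist:

    #include <AudioToolbox/AudioToolbox.h>

    /* Input-side callback: the HAL has fresh mic samples; pull them with
     * AudioUnitRender from input element 1 (ioData is NULL for input callbacks). */
    static OSStatus InputCallback(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData) {
        AudioUnit ioUnit = (AudioUnit)inRefCon;

        /* Scratch buffer for up to 4096 frames of 16-bit mono; a real pipeline
         * would size this from the negotiated stream format. */
        static SInt16 samples[4096];
        AudioBufferList bufferList;
        bufferList.mNumberBuffers = 1;
        bufferList.mBuffers[0].mNumberChannels = 1;
        bufferList.mBuffers[0].mDataByteSize = inNumberFrames * sizeof(SInt16);
        bufferList.mBuffers[0].mData = samples;

        OSStatus err = AudioUnitRender(ioUnit, ioActionFlags, inTimeStamp,
                                       1 /* input element */, inNumberFrames, &bufferList);
        /* From here the samples can be handed to an AEC/VAD stage or the websocket. */
        return err;
    }

    /* Illustrative helper: register the callback on the I/O unit's input side. */
    static void AttachInputCallback(AudioUnit ioUnit) {
        AURenderCallbackStruct cb = { .inputProc = InputCallback, .inputProcRefCon = ioUnit };
        AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_SetInputCallback,
                             kAudioUnitScope_Global, 1, &cb, sizeof(cb));
    }

The mirror image on the speaker side is a render callback registered with kAudioUnitProperty_SetRenderCallback on output element 0, which is essentially what the linked gist already does to generate its 440 Hz test tone.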
