Hi guys,

I am looking for some advice on how to use a speech-to-text model with an FPC program designed to teach the reading of invented words composed from 8 Brazilian Portuguese phonemes (four consonants and four vowels).
So, right now (https://github.com/cpicanco/stimulus-control-sdl2/blob/hanna/src/sdl.app.audio.recorder.devices.pas) the program uses SDL2 to record short 4-5 s audio streams and saves each recording to a wav file with fpwavwriter. Each audio stream/file is supposed to be a word spoken by a student in response to a word presented on screen during a recording/playback session. The participant clicks a button to finish the session, and the program then starts a speech-to-text routine and gives some feedback.

There will be two speech-to-text routines. The first is a human transcription (nothing new here for me). The second is an AI transcription. I am looking for an approach to read the raw stream (or the saved file, if there is no direct stream support) and pass it to a speech AI model (for example, Whisper), then get some text output back for further processing. Using Python with Whisper Medium (multilingual), I got good (although slow) results without any fine-tuning. However, I am considering using Transformers if fine-tuning turns out to be necessary.

So, in this context, what would be "the way to go" for using the final model from Free Pascal? Calling a script with TProcess, as in the sketches below? Please, can you shed some light here?
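To make the TProcess idea concrete, here is a minimal sketch of the saved-file case. The transcribe.py script, the python3 executable name, and the wav file name are just placeholders for my own Whisper wrapper; the script is assumed to print the transcription to stdout.

program TranscribeFileDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, Process;

// Run a (hypothetical) external transcription script on a saved wav file
// and return whatever it prints to stdout.
function TranscribeWav(const AWavFile: string): string;
var
  ScriptOutput: string;
begin
  // transcribe.py stands in for my own Whisper wrapper script.
  if RunCommand('python3', ['transcribe.py', AWavFile], ScriptOutput) then
    Result := Trim(ScriptOutput)
  else
    raise Exception.CreateFmt('Transcription failed for %s', [AWavFile]);
end;

begin
  WriteLn('Transcription: ', TranscribeWav('recording_001.wav'));
end.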
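For the raw-stream case, I imagine piping the wav bytes to the script's stdin with TProcess and reading the text back from its stdout, roughly like this. Again, transcribe_stdin.py is hypothetical; it would read audio from stdin and print the transcription to stdout, and the wav file here is only a stand-in for the in-memory recording.

program TranscribeStreamDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, Process;

// Pipe an in-memory wav buffer to a (hypothetical) external script that
// reads audio from stdin and prints the transcription to stdout.
function TranscribeStream(AWav: TStream): string;
var
  Proc: TProcess;
  OutStr: TStringStream;
  Buffer: array[0..4095] of byte;
  Count: longint;
begin
  Proc := TProcess.Create(nil);
  OutStr := TStringStream.Create('');
  try
    Proc.Executable := 'python3';
    Proc.Parameters.Add('transcribe_stdin.py');
    Proc.Options := [poUsePipes];
    Proc.Execute;

    // Feed the raw wav bytes to the child's stdin, then close the pipe
    // so the script knows the stream has ended.
    AWav.Position := 0;
    repeat
      Count := AWav.Read(Buffer, SizeOf(Buffer));
      if Count > 0 then
        Proc.Input.WriteBuffer(Buffer, Count);
    until Count = 0;
    Proc.CloseInput;

    // Collect everything the script prints to stdout.
    repeat
      Count := Proc.Output.Read(Buffer, SizeOf(Buffer));
      if Count > 0 then
        OutStr.WriteBuffer(Buffer, Count);
    until Count = 0;
    Proc.WaitOnExit;

    Result := Trim(OutStr.DataString);
  finally
    OutStr.Free;
    Proc.Free;
  end;
end;

var
  Wav: TMemoryStream;
begin
  Wav := TMemoryStream.Create;
  try
    // Stand-in for the recording that the program keeps in memory.
    Wav.LoadFromFile('recording_001.wav');
    WriteLn('Transcription: ', TranscribeStream(Wav));
  finally
    Wav.Free;
  end;
end.

This write-then-read pattern assumes the script only prints after it has consumed all of stdin; otherwise the two pipes could deadlock and some extra buffering or a reader thread would be needed.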
Best regards,
R

_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal