I just came across the article "Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges" by Per E. Kummervold, Javier de la Rosa, Freddy Wetjen, Rolv-Arild Braaten and Per Erik Solberg, <URL: https://arxiv.org/pdf/2402.01917.pdf >.
I found this quote particularly interesting:

  "Although the original PyTorch training code was not released by
   OpenAI, a collaborative effort with HuggingFace led to an
   alternative implementation in the Transformers library.  This has
   also been adapted for Jax.  The project participated in developing
   and open-sourcing training scripts for TPU-v4-pods, enabling
   dynamic changes to the training data during runtime (The National
   Library of Norway, 2024)."

The reference points to <URL: https://www.github.com/NbAiLab/nostram >.
I have not investigated further.  Perhaps the alternative
implementation can be used to train a model from scratch and provide
source for the files requested by the ftpmasters?  (See the rough
sketch at the end of this message.)

Unrelated to this, there is an alternative implementation using the
Whisper models called whisper.cpp, available from
<URL: https://github.com/ggerganov/whisper.cpp.git >.  It might be
easier to package than the OpenAI whisper implementation.
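Regarding the from-scratch idea: I have not tested this, but as far
as I can tell the Transformers implementation allows instantiating a
Whisper model with random weights, i.e. without any of the OpenAI
checkpoints.  Something along these lines might be a starting point
(untested, and the dimensions are picked purely for illustration, not
taken from any published Whisper variant):

  from transformers import WhisperConfig, WhisperForConditionalGeneration

  # Illustrative dimensions only; a real model would follow one of
  # the published Whisper variants.
  config = WhisperConfig(
      vocab_size=51865,
      d_model=384,
      encoder_layers=4,
      decoder_layers=4,
      encoder_attention_heads=6,
      decoder_attention_heads=6,
  )

  # Randomly initialised model, so no OpenAI weights are involved.
  model = WhisperForConditionalGeneration(config)
  print(model.num_parameters())

Training such a model would of course still require a data set like
the one described in the article, but at least everything needed to
rebuild it would be available as source.

-- 
Happy hacking
Petter Reinholdtsen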