Hi, I've applied some changes and created a pull request: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20022
> > > + frames = FFMAX(0, FFMIN(frames, wctx->audio_buffer_fill_size));
> >
> > I would call it samples, sample_count or nb_samples.
> >
> > Why are you clipping the number of samples? I assume
> > run_transcription() would be called with the correct number,
> > or am I missing something?

When using the VAD option, we want to process only a portion of the
total samples stored in the buffer (up to the detected silence).

> A bigger problem is that the input frame->pts are not passed through
> to the output srt/json timestamps.
>
> To understand why this is a problem, consider some audio input device
> which samples at 16 kHz. For simplicity, let's say this hardware
> contains a 16 kHz crystal and samples based on that. But depending on
> the temperature of this crystal, it will really sample at, say,
> between 15990 and 16010 Hz. So simply counting samples alone is not
> enough; the frame->pts need to be used too, if the subtitles should
> be perfectly in sync with the video.
>
> It's probably best to give the user the option to produce srt/json
> times based purely on sample numbers, but also on pts.

Ok, let me think about using pts instead.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".