Hi, I've applied some changes and created a pull request: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20022
> > > + frames = FFMAX(0, FFMIN(frames, wctx->audio_buffer_fill_size));
> >
> > I would call it samples, sample_count or nb_samples.
> >
> > Why are you clipping the number of samples? I assume
> > run_transcription() would be called with the correct number,
> > or am I missing something?

When using the VAD option, we want to process only a portion of the
total samples stored in the buffer (up to the detected silence).

> A bigger problem is that the input frame->pts are not passed through
> to the output srt/json timestamps.
>
> To understand why this is a problem, consider some audio input device
> which samples at 16 kHz. For simplicity, let's say this hardware
> contains a 16 kHz crystal and samples based on that. But depending on
> the temperature of this crystal, it will really sample at, say,
> between 15990 and 16010 Hz. So simply counting samples alone is not
> enough; the frame->pts need to be used too, if the subtitles should
> be perfectly in sync with the video.
>
> It's probably best to give the user the option to produce srt/json
> times based purely on sample numbers, but also on pts.

Ok, let me think about using pts instead.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".