Re: [FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling
On Fri, Apr 28, 2023 at 03:11:16PM +0200, Anton Khirnov wrote: > Quoting Michael Niedermayer (2023-04-28 13:42:17) > > On Thu, Apr 27, 2023 at 04:25:56PM +0200, Anton Khirnov wrote: > > > Stop using InputStream.dts for generating missing timestamps for decoded > > > frames, because it contains pre-decoding timestamps and there may be > > > arbitrary amount of delay between input packets and output frames (e.g. > > > dependent on the thread count when frame threading is used). It is also > > > in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary > > > rounding issues. > > > > > > > > New code maintains a timebase that is the inverse of the LCM of all the > > > samplerates seen so far, and thus can accurately represent every audio > > > > if the LCM fits in int32 > > > > This can hit some pathologic cases though > > consider a 192khz stream that starts with a damaged packet thats read as > > 11.197 khz > > lcm of 192000 and 11197 > 2^31 so the whole stream will then be stuck with > > 11.197khz > > that seems like a bad choice > > the code should favor standard sample rates as well as the higher sample > > rate if > > the lcm is not representable > > > > also if lets say there are 48khz and 48.001khz where again lcm doesnt work > > then a multiple of 48khz may be a better choice than either itself > > Thank you, there are good points. I'm wondering if just picking the LCM > of all common samplerates (28224000 = lcm(192000, 44100)) wouldn't be > sufficient for all these pathological cases. Or do you have a better > general algorithm in mind? Maybe fall back on AV_TIME_BASE instead? 28224000 is an option, so is AV_TIME_BASE. Neither contain 90khz though which is the mpeg-ps/ts "timebase". But i have the feeling if we keep adding things then we will hit 32bit soon. 28224000 is better for exactly representing timestamps, AV_TIME_BASE may be easier for debuging. thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB He who knows, does not speak. He who speaks, does not know. -- Lao Tsu signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling
Quoting Michael Niedermayer (2023-04-28 13:42:17) > On Thu, Apr 27, 2023 at 04:25:56PM +0200, Anton Khirnov wrote: > > Stop using InputStream.dts for generating missing timestamps for decoded > > frames, because it contains pre-decoding timestamps and there may be > > arbitrary amount of delay between input packets and output frames (e.g. > > dependent on the thread count when frame threading is used). It is also > > in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary > > rounding issues. > > > > > New code maintains a timebase that is the inverse of the LCM of all the > > samplerates seen so far, and thus can accurately represent every audio > > if the LCM fits in int32 > > This can hit some pathologic cases though > consider a 192khz stream that starts with a damaged packet thats read as > 11.197 khz > lcm of 192000 and 11197 > 2^31 so the whole stream will then be stuck with > 11.197khz > that seems like a bad choice > the code should favor standard sample rates as well as the higher sample rate > if > the lcm is not representable > > also if lets say there are 48khz and 48.001khz where again lcm doesnt work > then a multiple of 48khz may be a better choice than either itself Thank you, there are good points. I'm wondering if just picking the LCM of all common samplerates (28224000 = lcm(192000, 44100)) wouldn't be sufficient for all these pathological cases. Or do you have a better general algorithm in mind? Maybe fall back on AV_TIME_BASE instead? > also what happens if the audio timestamps are known more precissely than > the audio sample rate ? I'm not sure that is very relevant in real-world uses cases, but I've added this snippet which should address this: // keep the frame timebase if it is strictly better than // the samplerate-defined one if (frame->time_base.num == 1 && frame->time_base.den > tb_new.den && !(frame->time_base.den % tb_new.den)) tb_new = frame->time_base; -- Anton Khirnov ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling
On Thu, Apr 27, 2023 at 04:25:56PM +0200, Anton Khirnov wrote: > Stop using InputStream.dts for generating missing timestamps for decoded > frames, because it contains pre-decoding timestamps and there may be > arbitrary amount of delay between input packets and output frames (e.g. > dependent on the thread count when frame threading is used). It is also > in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary > rounding issues. > > New code maintains a timebase that is the inverse of the LCM of all the > samplerates seen so far, and thus can accurately represent every audio if the LCM fits in int32 This can hit some pathologic cases though consider a 192khz stream that starts with a damaged packet thats read as 11.197 khz lcm of 192000 and 11197 > 2^31 so the whole stream will then be stuck with 11.197khz that seems like a bad choice the code should favor standard sample rates as well as the higher sample rate if the lcm is not representable also if lets say there are 48khz and 48.001khz where again lcm doesnt work then a multiple of 48khz may be a better choice than either itself also what happens if the audio timestamps are known more precissely than the audio sample rate ? thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Opposition brings concord. Out of discord comes the fairest harmony. -- Heraclitus signature.asc Description: PGP signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling
Stop using InputStream.dts for generating missing timestamps for decoded frames, because it contains pre-decoding timestamps and there may be arbitrary amount of delay between input packets and output frames (e.g. dependent on the thread count when frame threading is used). It is also in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary rounding issues. New code maintains a timebase that is the inverse of the LCM of all the samplerates seen so far, and thus can accurately represent every audio sample. This timebase is used to generate missing timestamps after decoding. Changes the result of the following FATE tests * pcm_dvd-16-5.1-96000 * lavf-smjpeg * adpcm-ima-smjpeg In all of these the timestamps now better correspond to actual frame durations. --- fftools/ffmpeg.c| 90 +++- fftools/ffmpeg.h| 8 +- fftools/ffmpeg_demux.c | 6 +- tests/ref/fate/adpcm-ima-smjpeg | 658 ++-- tests/ref/fate/pcm_dvd-16-5.1-96000 | 8 +- tests/ref/lavf/smjpeg | 2 +- 6 files changed, 415 insertions(+), 357 deletions(-) diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c index 8829a163e0..d074d3e03b 100644 --- a/fftools/ffmpeg.c +++ b/fftools/ffmpeg.c @@ -881,6 +881,77 @@ static int send_frame_to_filters(InputStream *ist, AVFrame *decoded_frame) return ret; } +static AVRational audio_samplerate_update(InputStream *ist, const AVFrame *frame) +{ +const int prev = ist->last_frame_tb.den; +const int sr = frame->sample_rate; + +AVRational tb_new; +int64_t gcd; + +if (frame->sample_rate == ist->last_frame_sample_rate) +goto finish; + +gcd = av_gcd(prev, sr); + +if (prev / gcd >= INT_MAX / sr) { +av_log(ist, AV_LOG_WARNING, + "Audio timestamps cannot be represented exactly after " + "sample rate change: %d -> %d\n", prev, sr); +goto finish; +} +tb_new = (AVRational){ 1, prev / gcd * sr }; + +if (ist->last_frame_pts != AV_NOPTS_VALUE) +ist->last_frame_pts = av_rescale_q(ist->last_frame_pts, + ist->last_frame_tb, tb_new); +ist->last_frame_duration_est = av_rescale_q(ist->last_frame_duration_est, +ist->last_frame_tb, tb_new); + +ist->last_frame_tb = tb_new; +ist->last_frame_sample_rate = frame->sample_rate; + +finish: +return ist->last_frame_tb; +} + +static void audio_ts_process(InputStream *ist, AVFrame *frame) +{ +AVRational tb_filter = (AVRational){1, frame->sample_rate}; +AVRational tb; +int64_t pts_pred; + +// on samplerate change, choose a new internal timebase for timestamp +// generation that can represent timestamps from all the samplerates +// seen so far +tb = audio_samplerate_update(ist, frame); +pts_pred = ist->last_frame_pts == AV_NOPTS_VALUE ? 0 : + ist->last_frame_pts + ist->last_frame_duration_est; + +if (frame->pts == AV_NOPTS_VALUE) { +frame->pts = pts_pred; +frame->time_base = tb; +} else if (ist->last_frame_pts != AV_NOPTS_VALUE && + frame->pts > av_rescale_q_rnd(pts_pred, tb, frame->time_base, + AV_ROUND_UP)) { +// there was a gap in timestamps, reset conversion state +ist->filter_in_rescale_delta_last = AV_NOPTS_VALUE; +} + +frame->pts = av_rescale_delta(frame->time_base, frame->pts, + tb, frame->nb_samples, + &ist->filter_in_rescale_delta_last, tb); + +ist->last_frame_pts = frame->pts; +ist->last_frame_duration_est = av_rescale_q(frame->nb_samples, +tb_filter, tb); + +// finally convert to filtering timebase +frame->pts = av_rescale_q(frame->pts, tb, tb_filter); +frame->duration = frame->nb_samples; +frame->time_base = tb_filter; +} + static int decode_audio(InputStream *ist, AVPacket *pkt, int *got_output, int *decode_failed) { @@ -910,23 +981,7 @@ static int decode_audio(InputStream *ist, AVPacket *pkt, int *got_output, ist->next_dts += ((int64_t)AV_TIME_BASE * decoded_frame->nb_samples) / decoded_frame->sample_rate; -if (decoded_frame->pts == AV_NOPTS_VALUE) { -decoded_frame->pts = ist->dts; -decoded_frame->time_base = AV_TIME_BASE_Q; -} -if (pkt && pkt->duration && ist->prev_pkt_pts != AV_NOPTS_VALUE && -pkt->pts != AV_NOPTS_VALUE && pkt->pts - ist->prev_pkt_pts > pkt->duration) -ist->filter_in_rescale_delta_last = AV_NOPTS_VALUE; -if (pkt) -ist->prev_pkt_pts = pkt->pts; -if (decoded_frame->pts != AV_NOPTS_VALUE) { -AVRational tb_filter = (AVRational){1, decoded_frame->sample_rate}; -decoded_frame->pts = av_rescale_delta(decoded_fram