Re: [FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling

2023-04-28 Thread Michael Niedermayer
On Fri, Apr 28, 2023 at 03:11:16PM +0200, Anton Khirnov wrote:
> Quoting Michael Niedermayer (2023-04-28 13:42:17)
> > On Thu, Apr 27, 2023 at 04:25:56PM +0200, Anton Khirnov wrote:
> > > Stop using InputStream.dts for generating missing timestamps for decoded
> > > frames, because it contains pre-decoding timestamps and there may be
> > > arbitrary amount of delay between input packets and output frames (e.g.
> > > dependent on the thread count when frame threading is used). It is also
> > > in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary
> > > rounding issues.
> > > 
> > 
> > > New code maintains a timebase that is the inverse of the LCM of all the
> > > samplerates seen so far, and thus can accurately represent every audio
> > 
> > if the LCM fits in int32
> > 
> > This can hit some pathologic cases though
> > consider a 192khz stream that starts with a damaged packet thats read as 
> > 11.197 khz
> > lcm of 192000 and 11197 > 2^31 so the whole stream will then be stuck with 
> > 11.197khz
> > that seems like a bad choice
> > the code should favor standard sample rates as well as the higher sample 
> > rate if
> > the lcm is not representable
> > 
> > also if lets say there are 48khz and 48.001khz where again lcm doesnt work 
> > then a multiple of 48khz may be a better choice than either itself
> 
> Thank you, there are good points. I'm wondering if just picking the LCM
> of all common samplerates (28224000 = lcm(192000, 44100)) wouldn't be
> sufficient for all these pathological cases. Or do you have a better
> general algorithm in mind? Maybe fall back on AV_TIME_BASE instead?

28224000 is an option, so is AV_TIME_BASE. Neither contain 90khz though 
which is the mpeg-ps/ts "timebase". But i have the feeling if we keep adding
things then we will hit 32bit soon. 
28224000 is better for exactly representing timestamps, AV_TIME_BASE may be
easier for debuging.

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling

2023-04-28 Thread Anton Khirnov
Quoting Michael Niedermayer (2023-04-28 13:42:17)
> On Thu, Apr 27, 2023 at 04:25:56PM +0200, Anton Khirnov wrote:
> > Stop using InputStream.dts for generating missing timestamps for decoded
> > frames, because it contains pre-decoding timestamps and there may be
> > arbitrary amount of delay between input packets and output frames (e.g.
> > dependent on the thread count when frame threading is used). It is also
> > in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary
> > rounding issues.
> > 
> 
> > New code maintains a timebase that is the inverse of the LCM of all the
> > samplerates seen so far, and thus can accurately represent every audio
> 
> if the LCM fits in int32
> 
> This can hit some pathologic cases though
> consider a 192khz stream that starts with a damaged packet thats read as 
> 11.197 khz
> lcm of 192000 and 11197 > 2^31 so the whole stream will then be stuck with 
> 11.197khz
> that seems like a bad choice
> the code should favor standard sample rates as well as the higher sample rate 
> if
> the lcm is not representable
> 
> also if lets say there are 48khz and 48.001khz where again lcm doesnt work 
> then a multiple of 48khz may be a better choice than either itself

Thank you, there are good points. I'm wondering if just picking the LCM
of all common samplerates (28224000 = lcm(192000, 44100)) wouldn't be
sufficient for all these pathological cases. Or do you have a better
general algorithm in mind? Maybe fall back on AV_TIME_BASE instead?

> also what happens if the audio timestamps are known more precissely than
> the audio sample rate ?

I'm not sure that is very relevant in real-world uses cases, but I've
added this snippet which should address this:

// keep the frame timebase if it is strictly better than
// the samplerate-defined one
if (frame->time_base.num == 1 && frame->time_base.den > tb_new.den &&
!(frame->time_base.den % tb_new.den))
tb_new = frame->time_base;

-- 
Anton Khirnov
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling

2023-04-28 Thread Michael Niedermayer
On Thu, Apr 27, 2023 at 04:25:56PM +0200, Anton Khirnov wrote:
> Stop using InputStream.dts for generating missing timestamps for decoded
> frames, because it contains pre-decoding timestamps and there may be
> arbitrary amount of delay between input packets and output frames (e.g.
> dependent on the thread count when frame threading is used). It is also
> in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary
> rounding issues.
> 

> New code maintains a timebase that is the inverse of the LCM of all the
> samplerates seen so far, and thus can accurately represent every audio

if the LCM fits in int32

This can hit some pathologic cases though
consider a 192khz stream that starts with a damaged packet thats read as 11.197 
khz
lcm of 192000 and 11197 > 2^31 so the whole stream will then be stuck with 
11.197khz
that seems like a bad choice
the code should favor standard sample rates as well as the higher sample rate if
the lcm is not representable

also if lets say there are 48khz and 48.001khz where again lcm doesnt work 
then a multiple of 48khz may be a better choice than either itself

also what happens if the audio timestamps are known more precissely than
the audio sample rate ?

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus


signature.asc
Description: PGP signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH 16/21] fftools/ffmpeg: rework audio-decode timestamp handling

2023-04-27 Thread Anton Khirnov
Stop using InputStream.dts for generating missing timestamps for decoded
frames, because it contains pre-decoding timestamps and there may be
arbitrary amount of delay between input packets and output frames (e.g.
dependent on the thread count when frame threading is used). It is also
in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary
rounding issues.

New code maintains a timebase that is the inverse of the LCM of all the
samplerates seen so far, and thus can accurately represent every audio
sample. This timebase is used to generate missing timestamps after
decoding.

Changes the result of the following FATE tests
* pcm_dvd-16-5.1-96000
* lavf-smjpeg
* adpcm-ima-smjpeg
In all of these the timestamps now better correspond to actual frame
durations.
---
 fftools/ffmpeg.c|  90 +++-
 fftools/ffmpeg.h|   8 +-
 fftools/ffmpeg_demux.c  |   6 +-
 tests/ref/fate/adpcm-ima-smjpeg | 658 ++--
 tests/ref/fate/pcm_dvd-16-5.1-96000 |   8 +-
 tests/ref/lavf/smjpeg   |   2 +-
 6 files changed, 415 insertions(+), 357 deletions(-)

diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 8829a163e0..d074d3e03b 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -881,6 +881,77 @@ static int send_frame_to_filters(InputStream *ist, AVFrame 
*decoded_frame)
 return ret;
 }
 
+static AVRational audio_samplerate_update(InputStream *ist, const AVFrame 
*frame)
+{
+const int prev = ist->last_frame_tb.den;
+const int sr   = frame->sample_rate;
+
+AVRational tb_new;
+int64_t gcd;
+
+if (frame->sample_rate == ist->last_frame_sample_rate)
+goto finish;
+
+gcd  = av_gcd(prev, sr);
+
+if (prev / gcd >= INT_MAX / sr) {
+av_log(ist, AV_LOG_WARNING,
+   "Audio timestamps cannot be represented exactly after "
+   "sample rate change: %d -> %d\n", prev, sr);
+goto finish;
+}
+tb_new = (AVRational){ 1, prev / gcd * sr };
+
+if (ist->last_frame_pts != AV_NOPTS_VALUE)
+ist->last_frame_pts = av_rescale_q(ist->last_frame_pts,
+   ist->last_frame_tb, tb_new);
+ist->last_frame_duration_est = av_rescale_q(ist->last_frame_duration_est,
+ist->last_frame_tb, tb_new);
+
+ist->last_frame_tb  = tb_new;
+ist->last_frame_sample_rate = frame->sample_rate;
+
+finish:
+return ist->last_frame_tb;
+}
+
+static void audio_ts_process(InputStream *ist, AVFrame *frame)
+{
+AVRational tb_filter = (AVRational){1, frame->sample_rate};
+AVRational tb;
+int64_t pts_pred;
+
+// on samplerate change, choose a new internal timebase for timestamp
+// generation that can represent timestamps from all the samplerates
+// seen so far
+tb = audio_samplerate_update(ist, frame);
+pts_pred = ist->last_frame_pts == AV_NOPTS_VALUE ? 0 :
+   ist->last_frame_pts + ist->last_frame_duration_est;
+
+if (frame->pts == AV_NOPTS_VALUE) {
+frame->pts = pts_pred;
+frame->time_base = tb;
+} else if (ist->last_frame_pts != AV_NOPTS_VALUE &&
+   frame->pts > av_rescale_q_rnd(pts_pred, tb, frame->time_base,
+ AV_ROUND_UP)) {
+// there was a gap in timestamps, reset conversion state
+ist->filter_in_rescale_delta_last = AV_NOPTS_VALUE;
+}
+
+frame->pts = av_rescale_delta(frame->time_base, frame->pts,
+  tb, frame->nb_samples,
+  &ist->filter_in_rescale_delta_last, tb);
+
+ist->last_frame_pts  = frame->pts;
+ist->last_frame_duration_est = av_rescale_q(frame->nb_samples,
+tb_filter, tb);
+
+// finally convert to filtering timebase
+frame->pts   = av_rescale_q(frame->pts, tb, tb_filter);
+frame->duration  = frame->nb_samples;
+frame->time_base = tb_filter;
+}
+
 static int decode_audio(InputStream *ist, AVPacket *pkt, int *got_output,
 int *decode_failed)
 {
@@ -910,23 +981,7 @@ static int decode_audio(InputStream *ist, AVPacket *pkt, 
int *got_output,
 ist->next_dts += ((int64_t)AV_TIME_BASE * decoded_frame->nb_samples) /
  decoded_frame->sample_rate;
 
-if (decoded_frame->pts == AV_NOPTS_VALUE) {
-decoded_frame->pts = ist->dts;
-decoded_frame->time_base = AV_TIME_BASE_Q;
-}
-if (pkt && pkt->duration && ist->prev_pkt_pts != AV_NOPTS_VALUE &&
-pkt->pts != AV_NOPTS_VALUE && pkt->pts - ist->prev_pkt_pts > 
pkt->duration)
-ist->filter_in_rescale_delta_last = AV_NOPTS_VALUE;
-if (pkt)
-ist->prev_pkt_pts = pkt->pts;
-if (decoded_frame->pts != AV_NOPTS_VALUE) {
-AVRational tb_filter = (AVRational){1, decoded_frame->sample_rate};
-decoded_frame->pts = av_rescale_delta(decoded_fram