On Wed, Nov 27, 2019 at 4:50 PM Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Wed, Nov 27, 2019 at 08:25:13AM +0100, Andreas Rheinhardt wrote: > > On Mon, Nov 25, 2019 at 2:57 PM Michael Niedermayer > <mich...@niedermayer.cc> > > wrote: > > > > > On Sun, Nov 24, 2019 at 11:05:03PM +0100, Andreas Rheinhardt wrote: > > > > On Sun, Nov 24, 2019 at 8:28 PM Michael Niedermayer > > > <mich...@niedermayer.cc> > > > > wrote: > > > > > > > > > On Sun, Nov 24, 2019 at 09:08:00AM +0000, Andreas Rheinhardt wrote: > > > > > > Michael Niedermayer: > > > > > > > On Wed, Nov 06, 2019 at 03:49:02AM +0100, Andreas Rheinhardt > wrote: > > > > > > >> Up until now, the TrackUID of a Matroska track which is > supposed > > > to be > > > > > > >> random was not random at all: It always coincided with the > > > TrackNumber > > > > > > >> which is usually the 1-based index of the corresponding > stream in > > > the > > > > > > >> array of AVStreams. This has been changed: It is now set via > an > > > AVLFG > > > > > > >> if AVFMT_FLAG_BITEXACT is not set. Otherwise it is set like > it is > > > set > > > > > > >> now (the only change happens if an explicit track number has > been > > > > > > >> choosen via dash_track_number, because the system used in the > > > normal > > > > > > >> situation is now used, too). In particular, no FATE tests > need to > > > be > > > > > > >> updated. > > > > > > >> > > > > > > >> This also fixes a bug in case the dash_track_number option was > > > used: > > > > > > >> In this case the TrackUID was set to the track number, but the > > > tags > > > > > were > > > > > > >> written with a TagTrackUID simply based upon the index, so > that > > > the > > > > > tags > > > > > > >> didn't apply to the track they ought to apply to. > > > > > > >> > > > > > > >> Signed-off-by: Andreas Rheinhardt < > andreas.rheinha...@gmail.com> > > > > > > >> --- > > > > > > >> mkv_get_uid() might be overkill, but I simply wanted to be > sure > > > that > > > > > > >> there are no collisions. > > > > > > >> > > > > > > >> libavformat/matroskaenc.c | 65 > > > > > ++++++++++++++++++++++++++++++++++----- > > > > > > >> 1 file changed, 57 insertions(+), 8 deletions(-) > > > > > > >> > > > > > > >> diff --git a/libavformat/matroskaenc.c > b/libavformat/matroskaenc.c > > > > > > >> index de57e474be..b87d15b7ff 100644 > > > > > > >> --- a/libavformat/matroskaenc.c > > > > > > >> +++ b/libavformat/matroskaenc.c > > > > > > >> @@ -94,6 +94,7 @@ typedef struct mkv_cues { > > > > > > >> typedef struct mkv_track { > > > > > > >> int write_dts; > > > > > > >> int has_cue; > > > > > > >> + uint32_t uid; > > > > > > >> int sample_rate; > > > > > > >> int64_t sample_rate_offset; > > > > > > >> int64_t last_timestamp; > > > > > > >> @@ -1199,8 +1200,7 @@ static int > mkv_write_track(AVFormatContext > > > *s, > > > > > MatroskaMuxContext *mkv, > > > > > > >> track = start_ebml_master(pb, MATROSKA_ID_TRACKENTRY, 0); > > > > > > >> put_ebml_uint (pb, MATROSKA_ID_TRACKNUMBER, > > > > > > >> mkv->is_dash ? mkv->dash_track_number : i > + > > > 1); > > > > > > >> - put_ebml_uint (pb, MATROSKA_ID_TRACKUID, > > > > > > >> - mkv->is_dash ? mkv->dash_track_number : i > + > > > 1); > > > > > > >> + put_ebml_uint (pb, MATROSKA_ID_TRACKUID, > mkv->tracks[i].uid); > > > > > > >> put_ebml_uint (pb, MATROSKA_ID_TRACKFLAGLACING , 0); > // no > > > > > lacing (yet) > > > > > > >> > > > > > > >> if ((tag = av_dict_get(st->metadata, "title", NULL, 0))) > > > > > > >> @@ -1650,7 +1650,8 @@ static int > mkv_write_tags(AVFormatContext > > > *s) > > > > > > >> if (!mkv_check_tag(st->metadata, > > > > > MATROSKA_ID_TAGTARGETS_TRACKUID)) > > > > > > >> continue; > > > > > > >> > > > > > > >> - ret = mkv_write_tag(s, st->metadata, > > > > > MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1); > > > > > > >> + ret = mkv_write_tag(s, st->metadata, > > > > > MATROSKA_ID_TAGTARGETS_TRACKUID, > > > > > > >> + mkv->tracks[i].uid); > > > > > > >> if (ret < 0) return ret; > > > > > > >> } > > > > > > >> > > > > > > >> @@ -1658,13 +1659,15 @@ static int > mkv_write_tags(AVFormatContext > > > *s) > > > > > > >> for (i = 0; i < s->nb_streams; i++) { > > > > > > >> AVIOContext *pb; > > > > > > >> AVStream *st = s->streams[i]; > > > > > > >> + mkv_track *track = &mkv->tracks[i]; > > > > > > >> ebml_master tag_target; > > > > > > >> ebml_master tag; > > > > > > >> > > > > > > >> if (st->codecpar->codec_type == > > > AVMEDIA_TYPE_ATTACHMENT) > > > > > > >> continue; > > > > > > >> > > > > > > >> - mkv_write_tag_targets(s, > > > > > MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1, &tag_target); > > > > > > >> + mkv_write_tag_targets(s, > > > MATROSKA_ID_TAGTARGETS_TRACKUID, > > > > > > >> + track->uid, &tag_target); > > > > > > >> pb = mkv->tags_bc; > > > > > > >> > > > > > > >> tag = start_ebml_master(pb, > MATROSKA_ID_SIMPLETAG, > > > 0); > > > > > > >> @@ -1863,10 +1866,6 @@ static int > > > mkv_write_header(AVFormatContext *s) > > > > > > >> version = 4; > > > > > > >> } > > > > > > >> > > > > > > >> - mkv->tracks = av_mallocz_array(s->nb_streams, > > > > > sizeof(*mkv->tracks)); > > > > > > >> - if (!mkv->tracks) { > > > > > > >> - return AVERROR(ENOMEM); > > > > > > >> - } > > > > > > >> ebml_header = start_ebml_master(pb, EBML_ID_HEADER, > > > > > MAX_EBML_HEADER_SIZE); > > > > > > >> put_ebml_uint (pb, EBML_ID_EBMLVERSION , > 1); > > > > > > >> put_ebml_uint (pb, EBML_ID_EBMLREADVERSION , > 1); > > > > > > >> @@ -2667,8 +2666,42 @@ static int webm_query_codec(enum > AVCodecID > > > > > codec_id, int std_compliance) > > > > > > >> return 0; > > > > > > >> } > > > > > > >> > > > > > > >> +static uint32_t mkv_get_uid(const mkv_track *tracks, int i, > > > AVLFG *c) > > > > > > >> +{ > > > > > > >> + uint32_t uid; > > > > > > >> + > > > > > > >> + for (int j = 0, k; j < 5; j++) { > > > > > > >> + uid = av_lfg_get(c); > > > > > > >> + if (!uid) > > > > > > >> + continue; > > > > > > >> + for (k = 0; k < i; k++) { > > > > > > >> + if (tracks[k].uid == uid) { > > > > > > >> + /* Was something wrong with our random seed? > */ > > > > > > >> + av_lfg_init(c, av_get_random_seed()); > > > > > > >> + break; > > > > > > >> + } > > > > > > >> + } > > > > > > >> + if (k == i) > > > > > > >> + return uid; > > > > > > >> + } > > > > > > >> + > > > > > > >> + /* Test the numbers from 1 to i. */ > > > > > > >> + for (int j = 1, k; j < i + 1; j++) { > > > > > > >> + for (k = 0; k < i; k++) { > > > > > > >> + if (tracks[k].uid == j) > > > > > > >> + break; > > > > > > >> + } > > > > > > >> + if (k == i) > > > > > > >> + return j; > > > > > > >> + } > > > > > > >> + /* Return i + 1. It can't coincide with another uid. */ > > > > > > >> + return i + 1; > > > > > > >> +} > > > > > > >> + > > > > > > >> static int mkv_init(struct AVFormatContext *s) > > > > > > >> { > > > > > > >> + MatroskaMuxContext *mkv = s->priv_data; > > > > > > >> + AVLFG c; > > > > > > >> int i; > > > > > > >> > > > > > > >> if (s->nb_streams > MAX_TRACKS) { > > > > > > >> @@ -2697,7 +2730,23 @@ static int mkv_init(struct > AVFormatContext > > > *s) > > > > > > >> s->internal->avoid_negative_ts_use_pts = 1; > > > > > > >> } > > > > > > >> > > > > > > >> + mkv->tracks = av_mallocz_array(s->nb_streams, > > > > > sizeof(*mkv->tracks)); > > > > > > >> + if (!mkv->tracks) { > > > > > > >> + return AVERROR(ENOMEM); > > > > > > >> + } > > > > > > >> + > > > > > > > > > > > > > >> + if (!(s->flags & AVFMT_FLAG_BITEXACT)) > > > > > > >> + av_lfg_init(&c, av_get_random_seed()); > > > > > > > > > > > > > > would there be a disadvantage if the configuration of a stream > / > > > its > > > > > content > > > > > > > is used a seed ? > > > > > > > That way it would be more deterministic > > > > > > > > > > > > > I see several disadvantages: > > > > > > 1. If one duplicates a stream (i.e. muxes it twice in the same > file), > > > > > > they would get the same uid (unless one takes additional > measures). > > > > > > > > > > If you took the configuration of all streams as a single seed that > > > would > > > > > not > > > > > happen > > > > > > > > > > > > > > > > 2. If the packet content is used, one can determine the uid only > > > after > > > > > > having received a packet. This would mean that one has to > postpone > > > > > > writing the Tracks element until one has a packet and the same > goes > > > > > > for the tags given that the uid is used to convey what they > apply to. > > > > > > One could solve this problem in the seekable, non-livestreaming > case > > > > > > by reserving size to be overwritten later, but this is > nontrivial and > > > > > > I'd like to avoid that. > > > > > > 3. If the configuration record is used, then you won't get > anything > > > > > > remotely unique: Using the same encoder settings will likely > produce > > > > > > files with identical uids. > > > > > > > > > > > > Furthermore, this approach means that I can reuse the code for > both > > > > > > attachments as well as tracks. (See the next patch in this > series.) > > > > > > > > > > fair enough, if its considered that avoiding some complexity is > more > > > > > important than deterministic output > > > > > > > > > > > > > Just to be sure: The output will be deterministic if the bitexact > flag is > > > > set. > > > > I don't see a reason for deterministic output if it is not set. > > > > > > the bitexact flag is used for multiple things > > > 1. produce the same output between platforms and minor code changes > > > 2. produce the same output when running the exact same binary with the > > > exact > > > same input > > > > > > This has a few problems > > > code pathes disabled because of 1, cannot be tested easily because > > > if bitexact is enabled the code to be tested is disabled > > > if bitexact is disabled random values are put in the file making it not > > > as easy to see if the code under test produces the same results each > time > > > > > > Another problem is, obscuring problems, because > > > if with a deterministic binary you get different output on 2 runs then > you > > > know you have a bug, a race condition or maybe a uninitialized > variable or > > > such. > > > But with random values stored all over this is no longer vissible > unless > > > one uses bitexact which most people dont do by default > > > so this could cause a decrease in chance detections of some bugs > > > > > > The Matroska muxer only behaves differently when in bitexact mode in > two > > ways: The UIDs are chosen deterministically (or in case of the > SegmentUID: > > not written at all) and certain strings that usually contain the > > libavformat version no longer do so. All these things are in the file > > header. There are no (and won't be) random values stored all over the > > place. The code paths are very small and so I don't think that they will > > obscure any bug. > > If a user encodes the same input twice with the same options and > as its popular uses mkv as output and then notices that the files > are different then with deterministic components she can conclude > there was a bug. With randomly seeded UIDs that conclusion is not > possible anymore. > Retesting with bitexact in this case also doesnt help, the bug > might not occur during 2 bitexact retests > > Iam not saying thats enough of a problem to change something but what > i mean is that its a real thing not have absolutely 0 disadvantage. > > I consider this only a slight disadvantage: E.g. the streamhash or the framehash muxers would provide easy ways to examine whether the actual content of the files (besides the UIDs) are different. > > [...] > > > > > > [...] > > > > > > > > > > > > > and, this > > > > > + /* Test the numbers from 1 to i. */ > > > > > + for (int j = 1, k; j < i + 1; j++) { > > > > > + for (k = 0; k < i; k++) { > > > > > + if (tracks[k].uid == j) > > > > > + break; > > > > > + } > > > > > + if (k == i) > > > > > + return j; > > > > > + } > > > > > > > > > > just smells "wrong" > > > > > > > > > > This code would only be executed if using AVLFG would repeatedly > lead > > > to > > > > collisions; > > > > > > yes but this doesnt smell right > > > if your random number generator produces no random numbers i dont think > > > adding +1 is the solution. > > > I mean if that really happens, theres some bug either in LFG the seed > > > generation > > > or in how it is used. > > > > > > Yes, this code was only intended to be run if the random number > generator > > is completely buggy. But it seems that this can unfortunately not ruled > out > > completely ( > > > > https://www.phoronix.com/scan.php?page=news_item&px=AMD-RdRand-Disable-15h-16h > > ). > > The fix to this does not belong into a caller of av_get_random_seed() > if thats fixed / worked around in CPU, Bios, OS, or in av_get_random_seed() > i dont know but definitly not the user of av_get_random_seed(). > If av_get_random_seed() is bad thats bad and needs to be fixed > > Also lets assume av_get_random_seed() is bad and returns -1 always > or another constant. > The code still must not fail like this. the seeded PRNG must still > function properly and produce a few differentg values. > If its statistics are 100% duplicates thats a failure of the PRNG. > > First, are you saying that a caller can rely on the values being different so that I can simply remove the last-resort code and assert that everything is fine? (Or should I let the initial loop using AVLFG run until it has found a good value in the hope that it will terminate eventually?) Second, given that only one call to av_get_random_seed() is ever performed, the question of whether it always returns -1 is irrelevant here: If some seed exists that leads to these collisions, then it is possible for these collisions to happen even when the random seed is truly random (in such a case, it would be unlikely, of course). Is there such a seed? I don't know. I only know that if the state of the AVLFG is completely zero, then av_lfg_get() will always return zero. This is the only state with the property that the state is not altered by any av_lfg_get(), but this does not preclude the output to be e.g. periodic. - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".