Hello, here is the patchset for the Matroska demuxer that I already announced last week [1]. It is an improvement upon an earlier patchset from March [2] which has been kindly reviewed by Steve Lhomme. Just as the earlier attempt, it is intended to get rid of a bogus error message introduced in 9326117b.
1. Here is a recap of why this bogus error message exists: a) The ebml parser contained in the Matroska muxer basically has two modes of operation: It can parse simple elements and it can parse whole master elements all at once. The latter is of course implemented via the former. Also one can designate certain element IDs as "EBML_STOP" type so that when such an element is encountered, parsing immediately stops without parsing the length and performing the checks (regarding whether an element is truly contained in the master element it is supposed to be contained in --these checks were introduced in 9326117b). b) Normal parsing (via matroska_parse_cluster_incremental) uses a syntax list for parsing that contains elements at different levels in the hierarchy: The elements that can be found in a cluster and (certain) level 1 elements (i.e. the level of the cluster); in particular, it contains a cluster as EBML_STOP type element. When a cluster is encountered and ebml_parse returns, the current level is ended (if there already was an open cluster) and the new cluster is entered (via a designated syntax list containing a cluster as master element) and parsed until a SimpleBlock or a BlockGroup is encountered (these two are of course EBML_STOP elements in the used syntax). Then the first syntax list (the one containing both the cluster as well as the elements permitted in a cluster as entries) is used again for parsing of said blocks and all the blocks that follow until one encounters another cluster or an error occurs. c) The problem with this approach is that clusters are not closed automatically when they are finished, but only when another cluster is encountered. This works (i.e. yields no error message) when a cluster follows another cluster (because no checks are performed for EBML_STOP type elements), but if e.g. the cues (the index) is located after a cluster (as happens with most Matroska files in existence), then parsing the cues will yield an error: After all, the preceding cluster ended right before the cues began, but the level hasn't been ended, so the length check will fail if the preceding cluster wasn't an unknown-length cluster. (In the latter case, the cues are considered part of the preceding cluster by the current code.) That's the reason behind this (harmless) error. 2. a) When ebml_parse parses a Master element by itself, it also takes care of ending the level (i.e. the level automatically entered during parsing of the master element) itself; but there is no check at the end of ebml_parse whether a level has just ended right after the last element; if there were, then the above problem wouldn't exist. b) So my approach is to add such a check and make ebml_parse end the levels all by itself; the Matroska functions shouldn't have to end levels manually (as happens currently when a new cluster is encountered), but the EBML functions would do so given that the levels are an EBML concept after all. c) But all is slightly complicated by the fact that there also can be unknown-length master elements. These end when an element not allowed in them but on a higher level is encountered. So in this situation it is impossible to perform the check whether a level has ended after consuming an element; instead, said check needs to be performed directly after the id of the next element is read, without reading any more data if it turns out that said element belongs to a higher level. And that's the way it has been implemented. 3. I have also added improvements to various checks and other things. You can find them explained in the various commit messages. Just two things: a) I am unsure how to check whether a read error or attempted reading beyond EOF should be checked: Via avio_feof(pb) or via pb->eof_reached? The former tries to read again (refill the buffer) when pb->eof_reached was already set; does this mean that if one wants to know whether a read was not successfull one should check pb->eof_reached, but when one is more interested into whether one can perform future reads, one should use avio_feof (after all, in case of livestreaming, new data might have arrived after the earlier failed read)? That's at least the interpretation that I had when I wrote the patchset. b) ffio_limit is used to check for whether there is enough data left to skip if an element intended to be skipped is encountered. There are three issues with this function: i) It takes an int, although it should be an int64_t for our needs. ii) It emits its own error message (in case it fails) which is not appropriate for our needs (it claims to be truncating a packet, but in our case, no packet is truncated; instead it is treated as an indication that an error happened and a resync is performed). iii) For some reason it allows the remaining size to be one less than the size given as argument. Not understanding iii) is what made me hesitate to factor the needed part of ffio_limit out of it and using the new function in the Matroska demuxer. Would be nice if someone could explain the rationale behind this. This originated (without any explanation) in commit 559ae20d and got carried over from there since then. - Andreas [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243769.html [2]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-March/241694.html Andreas Rheinhardt (37): avformat/matroskadec: Remove unused variables avformat/matroskadec: Correct outdated error message avformat/matroskadec: Compactify structure avformat/matroskadec: Don't zero unnecessarily avformat/matroskadec: Get rid of cluster size field assumption avformat/matroskadec: Use generic size check for signed integers avformat/matroskadec: Set offset of first cluster avformat/matroskadec: Don't copy attached pictures avformat/matroskadec: Remove redundant initialization avformat/matroskadec: Properly check return values avformat/matroskadec: Improve read error/EOF checks I avformat/matroskadec: Improve read error/EOF checks II avformat/matroskadec: Improve error/EOF checks III avformat/matroskadec: Remove non-incremental parsing of clusters avformat/matroskadec: Don't keep old blocks avformat/matroskadec: Treat SimpleBlock as EBML_BIN avformat/matroskadec: Don't abort resyncing upon seek failure avformat/matroskadec: Add function to reset status avformat/matroskadec: Use proper levels after discontínuity avformat/matroskadec: Refactor some functions avformat/matroskadec: Introduce a "last known good" position avformat/matroskadec: Link to parents in syntax tables avformat/matroskadec: Redo level handling avformat/matroskadec: Make cluster parsing level compatible avformat/matroskadec: Don't reset cluster position avformat/matroskadec: Combine arrays avformat/matroskadec: Redo EOF handling avformat/matroskadec: Reuse positions avformat/matroskadec: Typos, nits and cosmetics avformat/matroskadec: Don't skip too much when unseekable avformat/matroskadec: Improve invalid length error handling avformat/matroskadec: Accept more unknown-length elements avformat/matroskadec: Fix probing of unknown-length headers avformat/matroskadec: Accept more unknown-length elements II avformat/matroskadec: Reindent after previous commit avformat/matroskadec: Use file offsets for level 1 elements avformat/matroskadec: Improve check for level 1 duplicates libavformat/matroskadec.c | 1006 +++++++++++++++++++++---------------- 1 file changed, 582 insertions(+), 424 deletions(-) -- 2.21.0 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".