On 01/06/2018 04:10 PM, Carl Eugen Hoyos wrote:
2018-01-06 11:07 GMT+01:00 Jörn Heusipp <osm...@problemloesungsmaschine.de>:

libopenmpt file header probing is tested regularly against the FATE
suite and other diverse file collections by libopenmpt upstream in
order to avoid false positives.

You could also test tools/probetest

I was not aware of that tool. Thanks for the suggestion.

It currently lists a failure related to libopenmpt:
Failure of libopenmpt probing code with score=76 type=0 p=FDC size=1024

Looking at tools/probetest.c, that would be a file with very few bits set. libopenmpt detects the random data in question as M15 .MOD files (original Amiga 15 samples .mod files), and there is sadly not much that can be done about that. There are perfectly valid real-world M15 .MOD files with only 73 bits set in the first 600 bytes (untouchables-station.mod, <https://modarchive.org/index.php?request=view_by_moduleid&query=104280>). The following up to 64*4*4*63 bytes could even be completely 0 (empty pattern data) for valid files (even without the file being totally silent). The generated random data that tools/probetest complains about here contains 46 bits set to 1 in the first 600 bytes. What follows are various other examples with up to 96 bits set to 1. Completely loading a file like that would very likely reject it (in particular, libopenmpt can deduce the expected file size from the sample headers and, with some slack to account for corrupted real-world examples, reject files with non-matching size), however, that is not possible when only probing the file header. The libopenmpt API allows for the file header probing functions to return false-positives, however false-negatives are not allowed.

Performance numbers shown by tools/probetest are what I had expected (measured on an AMD A4-5000, during normal Xorg session (i.e. not 100% idle)):

  1110194971 cycles,           aa
 24986722468 cycles,          aac
 26418545168 cycles,          ac3
  1484717267 cycles,          acm
  1627888281 cycles,          act
  2109884646 cycles,          adp
  2316235992 cycles,          ads
  1244706028 cycles,          adx
  1132390431 cycles,          aea
  1729241354 cycles,         aiff
  1728288238 cycles,          aix
  2662531158 cycles,          amr
 16189546067 cycles,        amrnb
 10342883200 cycles,        amrwb
  1487752343 cycles,          anm
  2268900502 cycles,          apc
  1140814303 cycles,          ape
  2181170710 cycles,         apng
 18698762054 cycles,      aqtitle
  2656908730 cycles,          asf
  2402762967 cycles,        asf_o
 18148196647 cycles,          ass
  1392503829 cycles,          ast
  1774264703 cycles,           au
  1807159562 cycles,          avi
  1745391230 cycles,          avr
  1370939762 cycles,          avs
  1555620708 cycles,  bethsoftvid
  1459171160 cycles,          bfi
  2640635742 cycles,         bink
  2022320986 cycles,          bit
  1664933324 cycles,        bfstm
  1588023172 cycles,        brstm
  1769430536 cycles,          boa
  2294286860 cycles,          c93
  1022646071 cycles,          caf
  9063207678 cycles,    cavsvideo
  1898790300 cycles,         cdxl
  1037718383 cycles,         cine
  3358938768 cycles,       concat
  2367399953 cycles,        dcstr
  1795803759 cycles,          dfa
  1454750468 cycles,        dirac
  1167905836 cycles,        dnxhd
  2076678208 cycles,          dsf
  1226761232 cycles,       dsicin
  1157816261 cycles,          dss
 31466350564 cycles,          dts
  1357475606 cycles,        dtshd
 15626181281 cycles,           dv
 12227021709 cycles,       dvbsub
  1747998309 cycles,       dvbtxt
  1941371107 cycles,          dxa
  1988122049 cycles,           ea
  1395161162 cycles,     ea_cdata
 21013119067 cycles,         eac3
  1282697126 cycles,         epaf
  1658521102 cycles,          ffm
  2919787300 cycles,   ffmetadata
  3786264941 cycles,         fits
  2700385826 cycles,         flac
  1840776863 cycles,         flic
  1317695853 cycles,          flv
  1511756606 cycles,     live_flv
  1135064427 cycles,          4xm
  1830871233 cycles,          frm
  3011403748 cycles,          fsb
  1462985803 cycles,          gdv
  1708440935 cycles,         genh
  3480430744 cycles,          gif
  2533542048 cycles,          gsm
  2412598563 cycles,          gxf
 21637989787 cycles,         h261
 22268834035 cycles,         h263
 22135718754 cycles,         h264
 13939886275 cycles,         hevc
  1979375582 cycles, hls,applehttp
  1658646375 cycles,          hnm
  1507634977 cycles,          ico
  2534774499 cycles,        idcin
  1684324336 cycles,          idf
  1353664382 cycles,          iff
  2978779893 cycles,         ilbc
  1892353081 cycles,    alias_pix
  2456259645 cycles,  brender_pix
  2077466815 cycles,    ingenient
 11281657144 cycles,      ipmovie
  1840789384 cycles,        ircam
  2455541614 cycles,          iss
  1114518907 cycles,          iv8
  1750327098 cycles,          ivf
  3803895407 cycles,          ivr
 30510491919 cycles,      jacosub
  1271391143 cycles,           jv
  1504674165 cycles,        lmlm4
 28284647311 cycles,         loas
  2746771768 cycles,          lrc
  1630546444 cycles,          lvf
  2198871369 cycles,          lxf
 15210250791 cycles,          m4v
  2074024051 cycles, matroska,webm
  1756348463 cycles,        mgsts
 13894318111 cycles,     microdvd
 15146276963 cycles,        mjpeg
 13215378411 cycles,   mjpeg_2000
 21505153187 cycles,          mlp
  1623684275 cycles,          mlv
  2009009898 cycles,           mm
  1401453493 cycles,          mmf
  3614852044 cycles, mov,mp4,m4a,3gp,3g2,mj2
 37065167696 cycles,          mp3
  2003306237 cycles,          mpc
  1695842377 cycles,         mpc8
 20922947044 cycles,         mpeg
 26950626806 cycles,       mpegts
 12903395151 cycles,    mpegvideo
  1861191163 cycles,       mpjpeg
 11292546869 cycles,         mpl2
 10904909514 cycles,        mpsub
  2556705558 cycles,          msf
 14520727615 cycles,     msnwctcp
  1513345014 cycles,         mtaf
  1498181103 cycles,          mtv
  2100567692 cycles,         musx
  1398481833 cycles,           mv
  3839928046 cycles,          mxf
  1084340183 cycles,           nc
  2260039804 cycles,   nistsphere
  1557302811 cycles,          nsp
 14077588650 cycles,          nsv
 12804865958 cycles,          nut
  3498085105 cycles,          nuv
  2785399093 cycles,          ogg
  2800628120 cycles,          oma
  2241873172 cycles,          paf
 11630567717 cycles,          pjs
  1538360044 cycles,          pmp
  1966776985 cycles,          pva
  2051297210 cycles,          pvf
  1464824135 cycles,          qcp
  1395151376 cycles,          r3d
 13872717447 cycles,     realtext
  1648061451 cycles,     redspark
  1881530375 cycles,          rl2
  1865198787 cycles,           rm
  1848791502 cycles,          roq
  3141932957 cycles,          rpl
  2379252069 cycles,          rsd
 31146518791 cycles,        s337m
  7497815228 cycles,         sami
 24830800138 cycles,          sbg
 15351196732 cycles,          scc
  9758760073 cycles,          sdp
  2159674057 cycles,         sdr2
  1555316250 cycles,          sds
  1533405328 cycles,          sdx
  1681270049 cycles,     film_cpk
  2303851902 cycles,          shn
  1761647489 cycles,         siff
  1510520120 cycles,          smk
  2859907925 cycles,       smjpeg
  1643498999 cycles,        smush
  1545689291 cycles,          sol
  1912740702 cycles,          sox
 17486361594 cycles,        spdif
 20080502425 cycles,          srt
  2659637846 cycles,       psxstr
 17633213722 cycles,          stl
  8032855323 cycles,   subviewer1
  8572013351 cycles,    subviewer
  2043897951 cycles,          sup
  2980746200 cycles,         svag
  1617398584 cycles,          swf
  2842115745 cycles,          tak
  5320163051 cycles,  tedcaptions
  1884107745 cycles,          thp
  4320119922 cycles,       3dostr
  2018755118 cycles,   tiertexseq
  1714617022 cycles,          tmv
 21456317423 cycles,       truehd
  1050826275 cycles,          tta
  2065773077 cycles,          txd
  1577829281 cycles,           ty
  3450802460 cycles,          vag
 19179500628 cycles,          vc1
  1860036853 cycles,      vc1test
  2035593194 cycles,         vivo
  1518758455 cycles,          vmd
  2696860615 cycles,       vobsub
  2762235280 cycles,          voc
  1957794567 cycles,          vpk
 15280000639 cycles,      vplayer
  1763355055 cycles,          vqf
  1879310121 cycles,          w64
  1717961542 cycles,          wav
  2095837026 cycles,     wc3movie
  2960188092 cycles,       webvtt
  1922356839 cycles,        wsaud
  1978715237 cycles,          wsd
  1468438585 cycles,        wsvqa
  2668937770 cycles,          wtv
  3193222838 cycles,          wve
  1744694735 cycles,           wv
  1677278541 cycles,           xa
  1759862474 cycles,         xbin
  2077217647 cycles,          xmv
  2161496331 cycles,         xvag
  2330794326 cycles,         xwma
  1103137131 cycles,          yop
  2154690280 cycles, yuv4mpegpipe
  1842301899 cycles,     bmp_pipe
  2039875920 cycles,     dds_pipe
  1627504710 cycles,     dpx_pipe
  1463019740 cycles,     exr_pipe
  1539585051 cycles,     j2k_pipe
  1187861714 cycles,    jpeg_pipe
  1682815484 cycles,  jpegls_pipe
  1840465166 cycles,     pam_pipe
  1755858395 cycles,     pbm_pipe
  1211589601 cycles,     pcx_pipe
  2002446954 cycles,  pgmyuv_pipe
  1818965412 cycles,     pgm_pipe
  1654095834 cycles,  pictor_pipe
  1404252441 cycles,     png_pipe
  1211120882 cycles,     ppm_pipe
  1205883539 cycles,     psd_pipe
  1764091290 cycles,   qdraw_pipe
  1091809273 cycles,     sgi_pipe
  2994663150 cycles,     svg_pipe
  1348938514 cycles, sunrast_pipe
  1464347337 cycles,    tiff_pipe
  1142572756 cycles,    webp_pipe
  1412715104 cycles,     xpm_pipe
  3550700989 cycles,   libmodplug
109589637233 cycles,   libopenmpt

  2672917981           libopenmpt (per module format)

At first glance, libopenmpt looks huge here in comparison. However one should consider that libopenmpt internally has to probe for (currently) 41 different module file formats, going through 41 separate probing functions internally.

Dividing 109589637233 by 41 gives 2672917981, which is in the ballpark of all other probing functions in ffmpeg.

+#if OPENMPT_API_VERSION_AT_LEAST(0,3,0)
+    if (p->buf && p->buf_size > 0) {
+        probe_result = openmpt_probe_file_header_without_filesize(
+                           OPENMPT_PROBE_FILE_HEADER_FLAGS_DEFAULT,
+                           p->buf, p->buf_size,
+                           &openmpt_logfunc, NULL, NULL, NULL, NULL, NULL);
+        if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_FAILURE) {
+            score = score_fail;

What's wrong with return 0;?

Nothing. If preferred, I can get rid of all score_* constants and use 0 or AVPROBE_SCORE_* directly.

+        } else if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS) {
+            score = FFMAX(score, score_data);

What does OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS mean?

It is documented as "OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS: The file will most likely be supported by libopenmpt." (see <https://lib.openmpt.org/doc/group__libopenmpt__c.html#ga92cdc66eb529a8a4a67987b659ed3c5e>). An ultimately precise answer is never possible as that would require actually trying to load the complete file in some cases:
 * Not all module file formats store feature flags in the file header.
* Some module file formats provide very little file magic numbers, and/or file magic numbers at strange offsets (like at 1080 for M.K. .MOD). * Some formats store header-like information in the file footer, which is not accessible during probing. * The extreme case of M15 (original 15 samples Amiga .MOD files) provides absolutely no true file header or magic numbers. libopenmpt implements heuristics to reliably identify and probe even those, however there is only so much it can do. * Some container formats (Unreal Music .UMX, which can contain module music files) theoretically potentially require seeking to arbitrary locations in the file in order to determine the format.

Why not return MAX?

For all the reasons listed above, even though libopenmpt tries to be as pessimistic as possible, false positives fundamentally cannot be avoided completely. As the libopenmpt probing logic is code outside of ffmpeg, the effects of such a false positive could potentially cause mis-detection of other formats supported by ffmpeg, which would not be immediately or easily fixable by ffmpeg itself. I used the lowest possible score that makes sense in order to reduce the risk of potential impact. The probing result in this case is deduced from looking at the actual file data, as opposed to just trusting a mime-type which is external to the file and could be inconsistent/wrong, which is why I used a score higher than AVPROBE_SCORE_MIME.
I opted for AVPROBE_SCORE_MIME+1, which seemed reasonable to me.
Should I add a comment explaining the reasoning to the code?

+        } else if (probe_result == 
OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA) {
 > I believe this should return 0 but maybe you found that this is bad?

Would 0 be semantically right here? OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA means that libopenmpt requires more data to come to any usable conclusion, which is what I thought AVPROBE_SCORE_RETRY would mean. I do not see any particular problem with returning 0 in this case either, given the probing logic in av_probe_input_format() (and it would reduce the whole probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA block to a single line). However, if client code directly calls .read_probe() on AVInputFormat ff_libopenmpt_demuxer, I think returning AVPROBE_SCORE_RETRY (or similar) makes more sense.

+            if (score > score_fail) {
+                /* known file extension */
+                score = FFMAX(score, score_ext_retry);
+            } else {
+                /* unknown file extension */
+                if (p->buf_size >= 
openmpt_probe_file_header_get_recommended_size()) {
+                    /* We have already received the recommended amount of data
+                     * and still cannot decide. Return a rather low score.
+                     */
+                    score = FFMAX(score, score_retry);
+                } else {
+                    /* The file extension is unknown and we have very few data
+                     * bytes available. libopenmpt cannot decide anything here,
+                     * and returning any score > 0 would result in successfull
+                     * probing of random data.
+                     */
+                    score = score_fail;

This patch indicates that it may be a good idea to require libopenmpt 0.3,

The amount of #ifdef needed to support 0.2 and 0.3 is rather small, I think.

I understand that the current (and future libopenmpt 0.2) way of solely relying on the file extension is far from optimal, but I do not see any reason to drop libopenmpt 0.2 support right now; in particular, continuing 0.2 support as is would be no regression. Additionally, libopenmpt 0.2 can be built with C++03 compilers while libopenmpt 0.3 requires a C++11 compiler, thus, libopenmpt 0.3 cannot easily be built on older platforms.

libopenmpt 0.2 also allows for file probing, however the API and code path is very heavy-weight (goes through the normal file loader and discards unneeded data), which I fear would be way too heavy performance-wise for ffmpeg.

when was it released, which distributions do not include it?

The first version of libopenmpt 0.3 was released 2017-09-28.

I am not aware of any stable, non-rolling distribution that ships libopenmpt 0.3 as of now.

Debian 9 has libopenmpt 0.2.7386~beta20.3-3+deb9u2
Ubuntu 17.10 has libopenmpt 0.2.8760~beta27-1
Ubuntu 16.04 LTS has no libopenmpt at all
even openSUSE Tumbleweed only has libopenmpt 0.2.8461~beta26
Debian Testing and Ubuntu Bionic both have libopenmpt 0.3.4.

I do not think ffmpeg should drop libopenmpt 0.2 support at the moment.


Regards,
Jörn
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to