Hi again, GNUnet people. Is this the right place to discuss libextractor? I have two points.

#1 I often see key-value pairs categorized as EXTRACTOR_METATYPE_UNKNOWN:

    unknown: chroma-format=4:2:0
    unknown: bit-depth-chroma=8
    unknown: colorimetry=bt709
    unknown: stream-format=avc
    unknown: stream-format=raw
    unknown: bit-depth-luma=8
    unknown: base-profile=lc
    unknown: mpegversion=4
    unknown: profile=high
    unknown: alignment=au
    unknown: parsed=true
    unknown: framed=true
    unknown: variant=iso
    unknown: profile=lc
    unknown: level=4.1

One point is that these are often numerous. Another is that a key-value pair is a really interesting metatype to have, and it is not really “unknown”, since the key is self-explanatory. Would it not make sense to add an EXTRACTOR_METATYPE_KEY_VALUE_PAIR to the list of MetaTypes?

    ...
    /* generic attributes */
    EXTRACTOR_METATYPE_UNKNOWN = 45,
    EXTRACTOR_METATYPE_DESCRIPTION = 46,
    EXTRACTOR_METATYPE_COPYRIGHT = 47,
    EXTRACTOR_METATYPE_RIGHTS = 48,
    EXTRACTOR_METATYPE_KEYWORDS = 49,
    EXTRACTOR_METATYPE_ABSTRACT = 50,
    EXTRACTOR_METATYPE_SUMMARY = 51,
    EXTRACTOR_METATYPE_SUBJECT = 52,
    EXTRACTOR_METATYPE_CREATOR = 53,
    EXTRACTOR_METATYPE_FORMAT = 54,
    EXTRACTOR_METATYPE_FORMAT_VERSION = 55,
    *EXTRACTOR_METATYPE_KEY_VALUE_PAIR* = XXX,
    ...

#2 I often see files tagged with multiple mime types by libextractor:

    mimetype: video/quicktime
    mimetype: video/x-h264
    mimetype: audio/mpeg
    mimetype: video/mp4

That never reflects reality, since a file should have only one mime type (or at most several mime types that mean the same thing). Compare this with how file names are handled: there is only one EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME, but there can be many EXTRACTOR_METATYPE_FILENAMEs (in the case of archives, for example):

    EXTRACTOR_METATYPE_FILENAME = 2,
    ...
    EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME = 180,

Would it not make sense to do something similar for mime types? Only one “original mime type”, and an arbitrary number of secondary mime types?

    EXTRACTOR_METATYPE_MIMETYPE = 1,
    ...
    *EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE* = XXX,

So, two simple proposals:

1. Create EXTRACTOR_METATYPE_KEY_VALUE_PAIR
2. Create EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE

What do you think? Does it make sense?

--madmurphy
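
P.S. To illustrate proposal #1: at the moment a consumer has to guess which "unknown" items are really key-value pairs. Here is a minimal sketch of that guesswork, assuming the item arrives as a NUL-terminated "key=value" string. Note that split_key_value is a hypothetical helper made up for this mail, not a libextractor function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of libextractor.
 * Splits "key=value" at the first '='.
 * On success: stores the key length in *key_len, points *value at the
 * text right after the '=', and returns 1.
 * Returns 0 if there is no '=' or the key is empty; such an item would
 * remain a plain "unknown" entry. */
int
split_key_value (const char *data, size_t *key_len, const char **value)
{
  const char *eq = strchr (data, '=');

  if (NULL == eq || eq == data)
    return 0;
  *key_len = (size_t) (eq - data);
  *value = eq + 1;
  return 1;
}
```

For "chroma-format=4:2:0" this yields a key length of 13 and the value "4:2:0". A front end could run something like this on every item of type EXTRACTOR_METATYPE_UNKNOWN, but with a dedicated EXTRACTOR_METATYPE_KEY_VALUE_PAIR the plugin would state the intent directly and nobody would need to guess.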
