D19109: [Extractor] Add metadata to extractors

Stefan Brüns Tue, 19 Feb 2019 14:17:36 -0800

bruns added a comment.

  In D19109#415710 <https://phabricator.kde.org/D19109#415710>, @astippich wrote:

  > In D19109#414968 <https://phabricator.kde.org/D19109#414968>, @bruns wrote:
  >
  > > In D19109#414758 <https://phabricator.kde.org/D19109#414758>, @astippich wrote:
  > >
  > > > A few general remarks:
  > > >
  > > > - I really do not like that there are now two lists of supported mimetypes which have to be kept in sync
  > >
  > >
  > > I think this is trivial enough. Also this is covered by the unit test.
  >
  >
  > My fear is that it is easily forgotten, but I did not see the autotest. Still, do you think it is feasible to generate the mimetype stringlist from the JSON data to remove the duplication?

  These are not complete duplicates: e.g. the officeextractor (pre-2007) uses runtime detection of some binary helpers. If these are not found, the list returned by the plugin is empty. The plugin has no direct access to its metadata, as the metadata is only available to the loader and there is no way to pass it back, so the plugin cannot fall back to it.

  >>> - Do we really need versioning per mimetype? IMHO it is sufficient to have a version number per extractor. From my experience, fixing an extractor usually impacts all its supported mimetypes, and rarely affects only one mimetype.
  >> 
  >> Past experience tells otherwise. There have been feature extensions and bugfixes for specific mimetypes; just look at your own commits:
  >> 
  >> - "fix ape disc number extraction"
  >> - "implement more tags for asf metadata"
  >> - ...
  >> 
  >>   I want to reduce reindexing as much as possible.
  > 
  > And I can give you examples where this was not the case :).

  ... which does not **prohibit** bumping the version for **all** affected mimetypes. Also, nothing prevents skipping versions: e.g. if "foo/bar" is at 2.1 and "foo/baz" is at 1.3, and both get a major bump, both can be set to 3.0.

  This is also only the case because TagLibExtractor was stupidly written (which D18826 <https://phabricator.kde.org/D18826> fixes). The other extractors do not have that many special codepaths.

  > Well, I find it cumbersome to implement this fine-grained control, but otherwise people will probably yell because of high CPU usage...
  > At least, I would like to group duplicated mimetypes such as audio/wav and audio/x-wav, but that is not possible with JSON, is it?

  You can reorder the entries so that any aliasing mimetypes are next to each other.

  Another question is: why do we have both "audio/wav" and "audio/x-wav" in the first place? Is one type really reported for some files and the other for other files? Wouldn't it be better to just use the canonical type? At least on my computer, shared-mime-info only has audio/x-wav, listing audio/wav and audio/vnd.wave as aliases. Aliases should never be returned by QMimeDatabase.

REPOSITORY
  R286 KFileMetaData

REVISION DETAIL
  https://phabricator.kde.org/D19109

To: bruns, #baloo, #frameworks, ngraham, astippich, poboiko
Cc: kde-frameworks-devel, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, abrahams
