D20011: Be more precise with mimetype detection

2019-03-30 Thread Alexander Stippich
This revision was automatically updated to reflect the committed changes.
Closed by commit R293:a256687a1d11: Be more precise with mimetype detection 
(authored by astippich).

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D20011?vs=54959&id=55055

REVISION DETAIL
  https://phabricator.kde.org/D20011

AFFECTED FILES
  src/file/extractor/app.cpp

To: astippich, #baloo, bruns
Cc: kde-frameworks-devel, gennad, domson, ashaposhnikov, michaelh, astippich, 
spoorun, ngraham, bruns, abrahams


D20011: Be more precise with mimetype detection

2019-03-27 Thread Stefan Brüns
bruns accepted this revision.
This revision is now accepted and ready to land.

REPOSITORY
  R293 Baloo

BRANCH
  mimetypes

REVISION DETAIL
  https://phabricator.kde.org/D20011


D20011: Be more precise with mimetype detection

2019-03-27 Thread Alexander Stippich
astippich edited the summary of this revision.
astippich edited the test plan for this revision.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D20011


D20011: Be more precise with mimetype detection

2019-03-27 Thread Alexander Stippich
astippich updated this revision to Diff 54959.
astippich added a comment.


  - use new mime type helper

REPOSITORY
  R293 Baloo

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D20011?vs=54649&id=54959

BRANCH
  mimetypes

REVISION DETAIL
  https://phabricator.kde.org/D20011

AFFECTED FILES
  src/file/extractor/app.cpp


D20011: Be more precise with mimetype detection

2019-03-25 Thread Stefan Brüns
bruns requested changes to this revision.
bruns added inline comments.
This revision now requires changes to proceed.

INLINE COMMENTS

> app.cpp:147
>  {
> -QString mimetype = m_mimeDb.mimeTypeForFile(url, QMimeDatabase::MatchContent).name();
> +QMimeType extensionMimeType = m_mimeDb.mimeTypeForFile(url, QMimeDatabase::MatchExtension);
> +QMimeType contentMimeType = m_mimeDb.mimeTypeForFile(url, QMimeDatabase::MatchContent);

This only works correctly when the actual mimetype is the preferred one for
this extension, but an extension can match several mimetypes. See
https://doc.qt.io/qt-5/qmimedatabase.html#mimeTypesForFileName

Example of a mismatching file (it should be detected as graphviz):

  $> cat test.dot
  
  # some comment
  graph {}
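
One possible way to handle an extension that matches several mimetypes is to
check every candidate for the extension, not only the preferred one, and take
the first candidate that inherits from the content-detected type. The sketch
below is a toy model in plain C++ and does not use Qt: the names ParentMap,
inherits, pickAmongCandidates and dotFileHierarchy are hypothetical, the
hierarchy table is illustrative and not read from a real mime database, and
the candidate order is assumed to be chosen by the caller. It is not
necessarily what this revision ended up doing.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for the mimetype hierarchy: child name -> parent name.
// Assumes single inheritance and no cycles.
using ParentMap = std::map<std::string, std::string>;

// True if `type` equals `ancestor` or reaches it by following parent links.
bool inherits(const ParentMap &parents, std::string type, const std::string &ancestor)
{
    for (;;) {
        if (type == ancestor) {
            return true;
        }
        auto it = parents.find(type);
        if (it == parents.end()) {
            return false;
        }
        type = it->second;
    }
}

// Return the first extension candidate that refines the content-detected
// type. If no candidate does, fall back to the content-detected type.
std::string pickAmongCandidates(const ParentMap &parents,
                                const std::vector<std::string> &candidates,
                                const std::string &byContent)
{
    for (const std::string &candidate : candidates) {
        if (inherits(parents, candidate, byContent)) {
            return candidate;
        }
    }
    return byContent;
}

// Illustrative hierarchy for a ".dot" file (assumed, not authoritative).
ParentMap dotFileHierarchy()
{
    return {
        {"text/plain", "application/octet-stream"},
        {"text/vnd.graphviz", "text/plain"},
        {"application/msword", "application/octet-stream"},
        {"application/msword-template", "application/msword"},
    };
}
```

With the candidates "application/msword-template" and "text/vnd.graphviz" and
a content match of "text/plain", the first candidate is rejected because it
does not inherit from "text/plain", and the graphviz type is selected.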

> app.cpp:154
> +mimetype = contentMimeType.name();
> +}
>  qCDebug(BALOO) << "Indexing" << id << url << mimetype;

This should be a standalone function so that it can be reused, e.g. in the
baloo-widgets temp extractor.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D20011


D20011: Be more precise with mimetype detection

2019-03-24 Thread Alexander Stippich
astippich edited the summary of this revision.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D20011


D20011: Be more precise with mimetype detection

2019-03-24 Thread Alexander Stippich
astippich added a comment.


  Ideally Qt would provide this, but I could not find anything suitable.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D20011


D20011: Be more precise with mimetype detection

2019-03-24 Thread Alexander Stippich
astippich created this revision.
astippich added reviewers: Baloo, bruns.
Herald added projects: Frameworks, Baloo.
Herald added a subscriber: kde-frameworks-devel.
astippich requested review of this revision.

REVISION SUMMARY
  D18819 changed the mimetype detection to content matching. However,
  this has side effects, as the content matching algorithm quite often
  returns ancestor mimetypes, in some cases even "application/octet-stream".
  To solve this, query both the extension and the content mimetype. Use
  the extension mimetype only when the content mimetype is one of its
  ancestors; otherwise use the content mimetype.
  This yields the same precise mimetype as before the change, while still
  preventing false detection based only on the extension, which D18819
  tried to fix.
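
  The selection rule in this summary can be sketched as a small standalone
  function. This is a toy model in plain C++, not the code in app.cpp: the
  hierarchy is a hand-written table, and the names ParentMap, inherits,
  pickMimeType and sampleHierarchy are hypothetical. In Qt, the ancestor
  check would correspond to QMimeType::inherits().

```cpp
#include <map>
#include <string>

// Toy stand-in for the mimetype hierarchy: child name -> parent name.
// Assumes single inheritance and no cycles.
using ParentMap = std::map<std::string, std::string>;

// True if `type` equals `ancestor` or reaches it by following parent links.
bool inherits(const ParentMap &parents, std::string type, const std::string &ancestor)
{
    for (;;) {
        if (type == ancestor) {
            return true;
        }
        auto it = parents.find(type);
        if (it == parents.end()) {
            return false;
        }
        type = it->second;
    }
}

// Use the extension-based type only when the content-based type is the same
// type or one of its ancestors; otherwise trust the content-based type.
std::string pickMimeType(const ParentMap &parents,
                         const std::string &byExtension,
                         const std::string &byContent)
{
    return inherits(parents, byExtension, byContent) ? byExtension : byContent;
}

// Illustrative hierarchy (assumed, not read from a real mime database).
ParentMap sampleHierarchy()
{
    return {
        {"text/plain", "application/octet-stream"},
        {"text/x-csrc", "text/plain"},
        {"image/png", "application/octet-stream"},
    };
}
```

  In this model, a ".c" file whose content is detected only as "text/plain"
  keeps the more precise "text/x-csrc", while a PNG image that was renamed to
  ".c" is reported as "image/png".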

TEST PLAN
  Unfortunately, there are no unit tests.

REPOSITORY
  R293 Baloo

BRANCH
  master

REVISION DETAIL
  https://phabricator.kde.org/D20011

AFFECTED FILES
  src/file/extractor/app.cpp
