[ https://issues.apache.org/jira/browse/TIKA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182146#comment-15182146 ]
Nick Burch commented on TIKA-1882:
----------------------------------

Just because other people think it's a magic doesn't necessarily mean it is - many others just blindly find a few bytes that look common without trying to understand the underlying format, and consequently can get it wrong...

As the QuickTime container is a base for MP4, and our MP4 Video mime type declares QuickTime Video as its parent, if things are common then QuickTime is the right place to put it.

I've had a go in bee1a87d7d9ad3a1c5f45cf65082b9505dbe9fc0 to better express the QuickTime/MP4 relationship in the mime types hierarchy. If you could merge that and re-test, and all tests pass, plus switch hex strings to text where possible (see pull request comments), then I think we should be fine to apply.

> Updating the tika-mimetypes.xml for new mime magic patterns
> -----------------------------------------------------------
>
>                 Key: TIKA-1882
>                 URL: https://issues.apache.org/jira/browse/TIKA-1882
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.11
>            Reporter: Manisha Kampasi
>            Priority: Minor
>              Labels: patch
>
> The following mime magic can be added to better detect the below mime-types:
> 1. vnd.ms-cab-compressed (.cab files) - pattern "MSCF" in the first 4 bytes
> 2. application/vnd.xara (.xar files) - pattern "xar!" in the first 4 bytes
> 3. application/x-mobipocket-ebook (.mobi files) - pattern "BOOKMOBI" starting at byte position 60
> 4. video/quicktime (.mov files) - patterns "free" and "wide" seen starting at byte position 4
> The changes can be seen here:
> https://github.com/mkampasi/tika/commit/f7433daf434a44937ba3ae8b15813a768f95e334

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
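
For readers following the thread, the offset-based matching described in the issue can be sketched roughly as below. This is only an illustration in Python, not how Tika implements detection (Tika declares its magic in tika-mimetypes.xml). The names MAGIC_RULES and sniff are hypothetical, and the CAB signature is written in the byte order the Cabinet format uses, "MSCF". As the comment above cautions, a match on "free" or "wide" at offset 4 is a heuristic and may not by itself prove a file is QuickTime.

```python
# Hypothetical sketch of offset-based magic matching; not Tika's code.
# Each rule is (byte offset, expected bytes, mime type), taken from the
# patterns proposed in the issue description.
MAGIC_RULES = [
    (0, b"MSCF", "application/vnd.ms-cab-compressed"),
    (0, b"xar!", "application/vnd.xara"),
    (60, b"BOOKMOBI", "application/x-mobipocket-ebook"),
    (4, b"free", "video/quicktime"),
    (4, b"wide", "video/quicktime"),
]


def sniff(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the mime type of the first rule whose bytes match at its offset."""
    for offset, pattern, mime in MAGIC_RULES:
        # Slicing past the end of the data yields a shorter (or empty)
        # bytes object, so short inputs simply fail to match.
        if data[offset:offset + len(pattern)] == pattern:
            return mime
    return default
```

Usage would look like sniff(open(path, "rb").read(128)); reading the first 68 bytes or more is enough to cover every rule listed here.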