[ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750520#comment-17750520 ]
Luís Filipe Nassif commented on TIKA-1180:
------------------------------------------

Great and thank you [~tc-wleite]! Between your test files, does it happen to be possible to share 1 small MKV and 1 small WEBM without sensitive info that weren't detected properly before and are detected by your custom signatures, so we could write a proper unit test to avoid future regressions?

> Matroska (mkv, mka, webm) Detector
> ----------------------------------
>
>                 Key: TIKA-1180
>                 URL: https://issues.apache.org/jira/browse/TIKA-1180
>             Project: Tika
>          Issue Type: New Feature
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Nick Burch
>            Priority: Major
>              Labels: new-parser
>
> Following the work on TIKA-1177, we now have mimetype entries for the various
> formats which are based on the Matroska container (mkv, mka, webm etc).
> However, we are unable to properly identify the specific type just from some
> mime magic.
> Instead, for fully accurate detection, we'll need a new Detector for the
> Matroska family, which does some very simple container/stream processing to
> work out what the container contains



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
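
For illustration only, the "very simple container processing" mentioned in the issue description could start by reading the DocType string from the EBML header that opens every Matroska-family file. The sketch below is not Tika's detector and does not use Tika's API; the class name `EbmlDocTypeSketch`, the method names, and the 4096-byte header limit are assumptions made for this example. It only returns the DocType (typically `matroska` or `webm`); telling audio-only files from video files would additionally require inspecting the track entries, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: reads the EBML DocType of a stream.
class EbmlDocTypeSketch {
    static final long EBML_HEADER_ID = 0x1A45DFA3L; // EBML header element ID
    static final long DOCTYPE_ID = 0x4282L;         // DocType element ID
    static final int MAX_HEADER = 4096;             // assumed sanity limit

    // Reads an EBML variable-length integer. Element IDs keep the length
    // marker bit (keepMarker = true); element sizes drop it.
    static long readVint(DataInputStream in, boolean keepMarker) throws IOException {
        int first = in.readUnsignedByte();
        if (first == 0) {
            throw new IOException("vint longer than 8 bytes");
        }
        int length = Integer.numberOfLeadingZeros(first) - 24 + 1; // 1..8 bytes
        long value = keepMarker ? first : (first & (0xFF >> length));
        for (int i = 1; i < length; i++) {
            value = (value << 8) | in.readUnsignedByte();
        }
        return value;
    }

    // Returns the DocType string, or null if the stream does not start with
    // a readable EBML header that contains a DocType element.
    static String readDocType(InputStream stream) {
        try {
            DataInputStream in = new DataInputStream(stream);
            if (readVint(in, true) != EBML_HEADER_ID) {
                return null;
            }
            long headerSize = readVint(in, false);
            if (headerSize > MAX_HEADER) {
                return null; // unknown-size or implausibly large header
            }
            byte[] header = new byte[(int) headerSize];
            in.readFully(header);

            // Walk the child elements of the header until DocType is found.
            DataInputStream h = new DataInputStream(new ByteArrayInputStream(header));
            while (h.available() > 0) {
                long id = readVint(h, true);
                long size = readVint(h, false);
                if (size > h.available()) {
                    return null; // child element overruns the header
                }
                byte[] payload = new byte[(int) size];
                h.readFully(payload);
                if (id == DOCTYPE_ID) {
                    return new String(payload, StandardCharsets.US_ASCII).trim();
                }
            }
            return null;
        } catch (IOException e) {
            return null; // truncated or malformed input
        }
    }

    public static void main(String[] args) {
        // Hand-built header: EBMLVersion = 1, DocType = "webm".
        byte[] sample = {
            0x1A, 0x45, (byte) 0xDF, (byte) 0xA3, (byte) 0x8B,
            0x42, (byte) 0x86, (byte) 0x81, 0x01,
            0x42, (byte) 0x82, (byte) 0x84, 'w', 'e', 'b', 'm'
        };
        System.out.println(readDocType(new ByteArrayInputStream(sample)));
    }
}
```

A regression test along the lines requested in the comment could feed small real MKV and WEBM samples through a check like this and assert on the detected type; the hand-built byte array in `main` merely stands in for such files.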