[ 
https://issues.apache.org/jira/browse/TIKA-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992896#comment-13992896
 ] 

Nick Burch commented on TIKA-1204:
----------------------------------

Any chance of a much smaller sample DWFX file? The one supplied is a little 
larger than we generally like to use for unit testing.

> DWFX files detection
> --------------------
>
>                 Key: TIKA-1204
>                 URL: https://issues.apache.org/jira/browse/TIKA-1204
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, mime
>    Affects Versions: 1.4
>            Reporter: Marco Quaranta
>            Priority: Minor
>         Attachments: General assembly filter.dwfx
>
>
> DWFX files are AutoCAD [Design Web 
> Format|http://en.wikipedia.org/wiki/Design_Web_Format] files and follow the [Open 
> Packaging 
> Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions]. 
> Tika "correctly" detects these files as application/zip. 
> It would be better if Tika could recognize the true mimetype: 
> model/vnd.dwfx+xps. (y)
> Please add logic to ZipContainerDetector so that it can detect DWFX 
> files. We need a method that behaves like detectOfficeOpenXML(OPCPackage 
> pkg): 
> {noformat}
> PackageRelationshipCollection core = 
> pkg.getRelationshipsByType("http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
> if (core.size() != 1) {
>  // Invalid DWFX Package received
>  return null;
> }
> PackagePart corePart = pkg.getPart(core.getRelationship(0));
> String coreType = corePart.getContentType();
> return MediaType.parse(coreType);
> {noformat}
> Thank you,
> Marco
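
For illustration only, the detection idea quoted above can be sketched without 
POI. This is not the proposed ZipContainerDetector change and not Tika's 
implementation: it swaps OPCPackage for plain java.util.zip, reads the package's 
root relationships part (_rels/.rels, as defined by the Open Packaging 
Conventions) and looks for the DWFX relationship type with a simple substring 
match. The class name DwfxSniffer and both helper methods are hypothetical, and 
returning the fixed type model/vnd.dwfx+xps (instead of the target part's 
content type, as in the quoted snippet) is an assumption made to keep the 
sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika or POI.
class DwfxSniffer {
    // Relationship type taken from the issue description.
    static final String DWFX_REL =
        "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";
    // Media type requested in the issue description.
    static final String DWFX_TYPE = "model/vnd.dwfx+xps";

    /**
     * Returns DWFX_TYPE if the zip's root relationships part mentions the
     * DWFX relationship type, otherwise null. Assumption: a substring match
     * is good enough here; a real detector would parse the XML.
     */
    static String detect(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // OPC part names are case-insensitive.
                if ("_rels/.rels".equalsIgnoreCase(entry.getName())) {
                    // readAllBytes() stops at the end of the current entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return xml.contains(DWFX_REL) ? DWFX_TYPE : null;
                }
            }
        }
        return null; // no root relationships part: not an OPC package
    }

    /** Demo helper: builds an in-memory zip holding only _rels/.rels. */
    static byte[] zipWithRels(String relsXml) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("_rels/.rels"));
            zip.write(relsXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up relationship entry; Id and Target are placeholders.
        String rels = "<Relationships><Relationship Id=\"r1\" Type=\""
            + DWFX_REL + "\" Target=\"/doc\"/></Relationships>";
        System.out.println(detect(new ByteArrayInputStream(zipWithRels(rels))));
    }
}
```

Because it only needs the root relationships part, a check like this can be 
exercised with a few hundred bytes of synthetic zip data rather than a full 
drawing.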


