[ 
https://issues.apache.org/jira/browse/TIKA-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787608#comment-17787608
 ] 

Nick Burch commented on TIKA-4148:
----------------------------------

For detection of the OLE2-based files, we don't need to find unique byte 
combinations; we only need to find unique OLE2 entry names / sets of names.

See 
[https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/detect/microsoft/POIFSContainerDetector.java#L362]
 for an example of "must have this then one of those"
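
A rough sketch of that "must have this then one of those" rule, in plain Java with no Tika or POI dependency. The entry names and the matched media type are hypothetical placeholders, not real Inventor entries and not what POIFSContainerDetector actually contains; the fallback is assumed to be Tika's generic OLE2 type:

```java
import java.util.Set;

class EntryNameRule {
    // Hypothetical placeholders: real names would come from inspecting sample files.
    static final String REQUIRED = "RequiredEntry";
    static final Set<String> ONE_OF = Set.of("VariantEntryA", "VariantEntryB");
    static final String MATCH_TYPE = "application/x-hypothetical-format";
    // Assumed fallback: the generic OLE2 type when no specific rule matches.
    static final String FALLBACK_TYPE = "application/x-tika-msoffice";

    // Returns MATCH_TYPE only if the required entry is present
    // AND at least one of the variant entries is present.
    static String detect(Set<String> rootEntryNames) {
        if (rootEntryNames.contains(REQUIRED)) {
            for (String name : ONE_OF) {
                if (rootEntryNames.contains(name)) {
                    return MATCH_TYPE;
                }
            }
        }
        return FALLBACK_TYPE;
    }
}
```

The required entry alone is not enough to match; that is what keeps such a rule from colliding with other formats that happen to share a single entry name.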

If you can run POIFSLister (and/or POIFSDumper) on a bunch of files, and spot 
the entry names that are common (+ ideally not already in POIFSContainerDetector 
for other ones), that's what we need.
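
Once the entry names from each sample file have been collected (by hand from the lister output, for instance), finding the candidates is just a set intersection minus the names already claimed by other formats. A minimal sketch in plain Java; the class, the method name and the input shapes are made up for illustration:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CommonEntryNames {
    // perFileNames: one set of entry names per sample file.
    // alreadyKnown: names already used to detect other formats.
    // Returns the names present in every file and not in alreadyKnown.
    static Set<String> commonNew(List<Set<String>> perFileNames,
                                 Set<String> alreadyKnown) {
        if (perFileNames.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the first file, then keep only names seen in all others.
        Set<String> common = new TreeSet<>(perFileNames.get(0));
        for (Set<String> names : perFileNames) {
            common.retainAll(names);
        }
        common.removeAll(alreadyKnown);
        return common;
    }
}
```

Whatever survives is a candidate for a detection rule; the more sample files (and the more varied), the more trustworthy the result.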

> Support Autodesk Inventor files (.ipt) (.iam) (.ipn) (.idw)
> -----------------------------------------------------------
>
>                 Key: TIKA-4148
>                 URL: https://issues.apache.org/jira/browse/TIKA-4148
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Alexey Pismenskiy
>            Priority: Major
>
> Add support for Autodesk Inventor files in Tika. 
> Examples of the files can be downloaded from 
> [https://www.autodesk.com/support/technical/article/caas/tsarticles/ts/3gnm93P9sPAWE6vndk7fjq.html]
> It would be great to start at least at the metadata level and then add 
> content parsing later. 
> I suspect it would be something similar to 
> [DWGParser|https://tika.apache.org/0.9/api/org/apache/tika/parser/dwg/DWGParser.html],
>  
> any suggestions where to start looking are appreciated. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
