[ 
https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173820#comment-13173820
 ] 

Nick Burch commented on TIKA-823:
---------------------------------

Note that the strings appear to be prefixed with a 4-byte length field and 
are null-terminated. The first one may always start at the same place in the 
file; if so, you can probably skip forward to that offset and then use the 
POI utilities to read the string from the DocumentInputStream.
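
A minimal sketch of that approach, under stated assumptions. It uses only 
java.io, so it runs without POI; since POI's DocumentInputStream is a 
java.io.InputStream, the same method should accept one. The class name, the 
offset parameter and MAX_LENGTH are hypothetical, and the little-endian byte 
order, the length counting the trailing null, and the ASCII encoding are all 
assumptions that have not been checked against the attached test files.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedStrings {
    // Hypothetical sanity limit so a corrupt length cannot trigger a huge allocation.
    private static final int MAX_LENGTH = 4096;

    /**
     * Skips to the given offset (hypothetical; the real position of the first
     * string is not established here), then reads one string laid out as a
     * 4-byte length followed by that many bytes, the last of which is a null.
     */
    static String readStringAt(InputStream in, long offset) throws IOException {
        skipFully(in, offset);
        DataInputStream data = new DataInputStream(in);

        // Assumption: the length is stored little-endian, as is usual in OLE2 files.
        byte[] prefix = new byte[4];
        data.readFully(prefix);
        int length = (prefix[0] & 0xFF)
                | ((prefix[1] & 0xFF) << 8)
                | ((prefix[2] & 0xFF) << 16)
                | ((prefix[3] & 0xFF) << 24);
        if (length <= 0 || length > MAX_LENGTH) {
            throw new IOException("Implausible string length: " + length);
        }

        byte[] raw = new byte[length];
        data.readFully(raw);

        // Assumption: the length includes the null terminator, so drop it if present.
        int end = (raw[length - 1] == 0) ? length - 1 : length;
        return new String(raw, 0, end, StandardCharsets.US_ASCII);
    }

    // InputStream.skip may skip fewer bytes than requested, so loop until done.
    private static void skipFully(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException("Stream ended before the requested offset");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }
}
```

The length check is there because a detector sees arbitrary files, and a 
bogus length field should fail fast rather than allocate a large buffer.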
                
> Detect StarOffice files
> -----------------------
>
>                 Key: TIKA-823
>                 URL: https://issues.apache.org/jira/browse/TIKA-823
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Antoni Mylka
>         Attachments: testStarOffice-5.2-calc.sdc, 
> testStarOffice-5.2-draw.sda, testStarOffice-5.2-impress.sdd, 
> testStarOffice-5.2-write.sdw
>
>
> I would like both MimeTypes and the POIFSContainerDetector to be able to 
> detect files created with StarOffice Draw, Impress, Writer and Calc.
> I started working on this, but stumbled upon a POI issue, which I posted to 
> poi-user:
> http://thread.gmane.org/gmane.comp.jakarta.poi.user/17857
> Nick? Yegor? I know you're on the Tika list as well. Could you take a look? 
> How can I get the raw content of the CompObj entry?


        
