[ 
https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876431#action_12876431
 ] 

Jukka Zitting commented on TIKA-402:
------------------------------------

> XML root element detection

See the org.apache.tika.detect.XmlRootExtractor class and the <root-XML/> 
entries in the tika-mimetypes.xml configuration file.
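
To illustrate the general idea of root element detection, here is a minimal 
sketch that uses only the JDK's StAX parser. It is not Tika's implementation 
and does not call XmlRootExtractor; the class name RootElementSketch, the 
method rootOf and the urn:example:presentation namespace are all hypothetical 
names made up for this example. It reads just far enough into the stream to 
find the first start tag and returns that element's qualified name, which a 
detector could then compare against configured local name and namespace pairs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika.
final class RootElementSketch {

    /**
     * Returns the qualified name of the first (root) element,
     * or null if the bytes cannot be parsed as XML.
     */
    static QName rootOf(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(
                XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(data));
            // Skip the prolog, comments and whitespace until the first start tag.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML: report "no root element".
            return null;
        }
        return null;
    }

    public static void main(String[] args) {
        // The namespace below is a placeholder, not a real iWork namespace.
        String xml = "<?xml version=\"1.0\"?>"
                + "<key:presentation xmlns:key=\"urn:example:presentation\"/>";
        QName root = rootOf(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(root.getNamespaceURI() + " " + root.getLocalPart());
    }
}
```

Only the start of the document has to be read, so a check like this stays 
cheap even for large XML files.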

> directory

My idea is that if you point a file system crawler at uncompressed iWork 
directories, we should still be able to produce reasonable output when the 
crawler feeds the XML file inside such a directory to Tika.

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, 
> iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, 
> testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and 
> Pages applications. Both file formats are described in 
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these 
> formats or if we'd need to directly process the XML content.
