[ 
https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135996#comment-13135996
 ] 

Uwe Schindler commented on TIKA-736:
------------------------------------

Hi Michael,

Thanks for this simple improvement. Can you also check that parsing styles.xml 
of e.g. Writer or Calc documents does no harm?

About the order: I have it somewhere in the back of my head that the order of 
files in the ZIP file is somehow part of the standard. At least I know that 
the mimetype file must be the first one in the ZIP file, to make detection of 
the format easy. As far as I remember there was also the requirement that 
meta.xml must come before content.xml. Unfortunately I am not able to 
download the ODF spec and verify this; maybe you have a copy that mentions it.
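
To check the entry order of a given document without the spec at hand, a small script along these lines could help. It is only a sketch in Python, not part of Tika: the function names are made up for illustration, and it assumes the usual ODF package entry name "mimetype". Entries are sorted by their local header offset so the result reflects the physical order in the file rather than the order of the central directory.

```python
import io
import zipfile


def entry_order(data: bytes) -> list:
    """Return entry names in the physical order they appear in the ZIP data."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # header_offset is the byte offset of each entry's local file header.
        infos = sorted(zf.infolist(), key=lambda info: info.header_offset)
        return [info.filename for info in infos]


def mimetype_is_first(data: bytes) -> bool:
    """True if the first physical entry is named 'mimetype' (assumed ODF name)."""
    names = entry_order(data)
    return bool(names) and names[0] == "mimetype"
```

Calling entry_order() on the bytes of a test document such as the attached testMasterFooter.odp would show where the metadata and content entries sit relative to each other.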

I still don't get the reason for problems with metadata if the order of files is 
different. The metadata is parsed into another structure and not the 
HTMLContentHandler, so where is the problem if content comes first? The 
Metadata object should be filled in all cases once the parsing process is 
finished.
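
To illustrate the expectation, here is a rough sketch in Python. It is not how Tika's parser is structured; parse_package and the choice to read only dc:title are assumptions made for the example. Metadata and text go into two separate structures, so the final result should not depend on which entry is read first.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def parse_package(data: bytes):
    """Collect metadata and paragraph text into two independent structures."""
    metadata = {}
    paragraphs = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Walk the entries in whatever physical order the file has.
        for info in sorted(zf.infolist(), key=lambda i: i.header_offset):
            if info.filename == "meta.xml":
                root = ET.fromstring(zf.read(info))
                for el in root.iter("{%s}title" % DC_NS):
                    metadata["title"] = el.text or ""
            elif info.filename == "content.xml":
                root = ET.fromstring(zf.read(info))
                for p in root.iter("{%s}p" % TEXT_NS):
                    paragraphs.append("".join(p.itertext()))
    return metadata, paragraphs
```

With this shape, both structures are complete once the loop ends, regardless of entry order.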
                
> OpenOffice parser: master footer text isn't extracted
> -----------------------------------------------------
>
>                 Key: TIKA-736
>                 URL: https://issues.apache.org/jira/browse/TIKA-736
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: TIKA-736.patch, TIKA-736.patch, testMasterFooter.odp
>
>
> If I edit the footer text on the master slide of an OpenOffice presentation, 
> I see that text rendered on the slide, but it's not extracted by Tika.
> Digging into the document, curiously the footer text is in the styles.xml, 
> under office:master-styles -> style:master-page -> draw:frame -> 
> draw:text-box -> text:p.  I think somehow we're not linking up each slide's 
> master text elements to that slide, similar to TIKA-712.
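
The element path in the description above can be followed with a short script like this one. It is a sketch in Python, not Tika code: master_page_text is a hypothetical helper, and it assumes the standard ODF namespace URIs for the office, style, draw and text prefixes.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs, keyed by the prefixes used in the description.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

MASTER_TEXT_PATH = (
    "office:master-styles/style:master-page/"
    "draw:frame/draw:text-box/text:p"
)


def master_page_text(styles_xml) -> list:
    """Return the text of each text:p found under the master pages."""
    root = ET.fromstring(styles_xml)
    # The path is relative to the root element of styles.xml.
    return ["".join(p.itertext()) for p in root.findall(MASTER_TEXT_PATH, NS)]
```

Feeding it the styles.xml entry of a presentation should list the master slide text, including an edited footer if it is stored at that path.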

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
