[jira] [Commented] (TIKA-906) Headers, footers, and footnotes not extracted from Pages documents

Nick Burch (JIRA) Fri, 27 Apr 2012 15:51:15 -0700

    [ https://issues.apache.org/jira/browse/TIKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264059#comment-13264059 ]


Nick Burch commented on TIKA-906:
---------------------------------

Support added in r1331618. We can now extract headers, footers and footnotes, 
assuming a file has only one set of each, with the default names. (If a file 
has multiple styles with different sets, the code will likely just end up with 
the last one.)

Note that we are rapidly approaching the point where the current model for the 
parser won't cope. At that point, we'll need to start holding things like 
styles, headers and footers properly, tracking more state as we process the 
file (a single state level isn't really enough), and being aware of the styles 
applied to text.
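
To illustrate the point about state levels, here is a minimal sketch of
stack-based tracking. It is not the code from r1331618 and not Tika's actual
Pages parser: the class name RegionStackSketch, its methods, and the region
kinds and names ("header", "odd", and so on) are all hypothetical, chosen only
to show why a footnote nested in body text needs more than one level of state,
and how keying text by name avoids keeping only the last header.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; names and structure are assumptions. */
class RegionStackSketch {
    // Stack of currently open regions; the top one receives incoming text.
    private final Deque<String> open = new ArrayDeque<>();
    // Text collected per "kind:name" key, so several named headers or
    // footers can coexist instead of overwriting each other.
    private final Map<String, StringBuilder> text = new LinkedHashMap<>();

    /** Called when a region (body, header, footnote, ...) opens. */
    void start(String kind, String name) {
        open.push(kind + ":" + name);
    }

    /** Called for character data; appends to the innermost open region. */
    void characters(String chars) {
        String key = open.peek();
        if (key == null) {
            return; // text outside any tracked region is ignored
        }
        text.computeIfAbsent(key, k -> new StringBuilder()).append(chars);
    }

    /** Called when the innermost region closes; the outer one resumes. */
    void end() {
        open.pop();
    }

    /** Returns the text collected for a region, or "" if none was seen. */
    String textOf(String kind, String name) {
        StringBuilder sb = text.get(kind + ":" + name);
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        RegionStackSketch s = new RegionStackSketch();
        s.start("header", "odd");
        s.characters("Odd header");
        s.end();
        s.start("header", "even");
        s.characters("Even header");
        s.end();
        s.start("body", "main");
        s.characters("Hello");
        s.start("footnote", "1"); // nested inside the body
        s.characters("a note");
        s.end();                  // back to the body
        s.characters(" world");
        s.end();
        System.out.println(s.textOf("body", "main"));
        System.out.println(s.textOf("footnote", "1"));
        System.out.println(s.textOf("header", "odd"));
        System.out.println(s.textOf("header", "even"));
    }
}
```

With a single state field, the text after the footnote would have nowhere to
return to. With the stack, closing the footnote simply resumes the body.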
                
> Headers, footers, and footnotes not extracted from Pages documents
> ------------------------------------------------------------------
>
>                 Key: TIKA-906
>                 URL: https://issues.apache.org/jira/browse/TIKA-906
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7
>            Reporter: Gabriel Valencia
>              Labels: iWork
>             Fix For: 1.2
>
>         Attachments: testPagesHeadersFootersFootnotesJIRA.pages
>
>
> Tika does not extract anything from the header or footer area and also does 
> not extract footnotes.

