[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171040#comment-13171040 ]

Jeremy Anderson edited comment on TIKA-810 at 12/16/11 4:50 PM:
----------------------------------------------------------------

It appears that the PdfParserTest failures I'm seeing are related to the 
inclusion of Tika's PDFParser and PDF2XHTML files into PDFBox on October 13, 
rev 1182880 (PDFBOX-1132). Subsequent patches made to Tika's PDFParser file, 
which the test case relies upon, are overridden by the parser version 
contained in PDFBox (the AutoDetectParser returns the parser contained in 
PDFBox rather than Tika's PDFParser).

I believe this has been discussed before, in terms of which parser gets used 
when the dependencies are or are not present.

But as things stand, when using the daily builds of PDFBox and Tika, fixes 
applied to these two files in Tika should probably be replicated in the PDFBox 
versions as well. As of 12/16, the following Tika issues have caused changes 
to these files and should likely be applied to the files on PDFBox's side: 
TIKA-612, TIKA-724, TIKA-738, TIKA-767, TIKA-778.
                
      was (Author: rpialum):
    It appears that the PdfParserTest failures I'm seeing are related to the 
inclusion of Tika's PDFParser and PDF2XHTML files into PDFBox on October 13, 
rev 1182880. Subsequent patches made to Tika's PDFParser file, which the test 
case relies upon, are overridden by the parser version contained in PDFBox.

I believe this has been discussed before, in terms of which parser gets used 
when the dependencies are or are not present.

But as things stand, when using the daily builds of PDFBox and Tika, fixes 
applied to these two files in Tika should probably be replicated in the PDFBox 
versions as well. As of 12/16, the following Tika issues have caused changes 
to these files: TIKA-612, TIKA-724, TIKA-738, TIKA-767, TIKA-778.
                  
> Upgrade to PDFbox 1.7.0 as available
> ------------------------------------
>
>                 Key: TIKA-810
>                 URL: https://issues.apache.org/jira/browse/TIKA-810
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jeremy Anderson
>            Priority: Minor
>         Attachments: pdfbox-1.7.0.diff
>
>
> This issue is to track upgrading the PDFBox dependency to 1.7.0 Final once 
> it's available, and the daily builds before then.

--
This message is automatically generated by JIRA.
