[ 
https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205376#comment-13205376
 ] 

Nick Burch commented on TIKA-612:
---------------------------------

The conclusion was to expose the options on the PDFParser directly instead. 
PDFParser already supports setEnableAutoSpace.

If you know you have a PDF, create a PDFParser, set the options, then parse.
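
A minimal sketch of that direct approach, assuming Tika 1.1 or later is on the 
classpath. The class and method names (DirectPdfParse, configuredPdfParser, 
extractText) are made up for illustration, and the file path is whatever PDF 
you pass on the command line:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, not part of Tika.
public class DirectPdfParse {

    // Creates a PDFParser and sets the auto-space option on it directly.
    static PDFParser configuredPdfParser(boolean enableAutoSpace) {
        PDFParser parser = new PDFParser();
        parser.setEnableAutoSpace(enableAutoSpace);
        return parser;
    }

    // Extracts plain text from a stream that is already known to be a PDF.
    static String extractText(InputStream pdf, boolean enableAutoSpace) throws Exception {
        // A write limit of -1 disables BodyContentHandler's default size cap.
        BodyContentHandler handler = new BodyContentHandler(-1);
        configuredPdfParser(enableAutoSpace)
                .parse(pdf, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in, false));
        }
    }
}
```

No type detection happens here, so this only makes sense when the input is 
known to be a PDF.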

If you want to use something like AutoDetectParser but with special PDF 
options, you have two options. One is to fetch the parsers from the 
AutoDetectParser, recursing into nested parsers if necessary, until you find 
the PDFParser, and set the options on it. The other is to create a new 
AutoDetectParser from an explicitly created and configured PDFParser, with 
the DefaultParser as a fallback.
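
Both routes might look roughly like the sketch below. The class and method 
names are hypothetical. It rests on two assumptions that are worth checking 
against your Tika version: that the PDFParser is reachable as a plain instance 
(not wrapped in a decorator) through CompositeParser.getParsers(), and that 
when two parsers claim the same media type the one listed later wins, which is 
why the PDFParser is passed after the DefaultParser:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.CompositeParser;
import org.apache.tika.parser.DefaultParser;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;

// Hypothetical helper class, not part of Tika.
public class AutoDetectWithPdfOptions {

    // Option 1: walk the parser tree and configure any PDFParser found.
    // Returns true if at least one PDFParser was configured.
    static boolean configureNested(Parser parser, boolean enableAutoSpace,
                                   Set<Parser> visited) {
        // getParsers() maps every media type to its parser, so the same
        // instance shows up many times; visit each instance only once.
        if (!visited.add(parser)) {
            return false;
        }
        if (parser instanceof PDFParser) {
            ((PDFParser) parser).setEnableAutoSpace(enableAutoSpace);
            return true;
        }
        boolean found = false;
        if (parser instanceof CompositeParser) {
            for (Parser child : ((CompositeParser) parser).getParsers().values()) {
                found |= configureNested(child, enableAutoSpace, visited);
            }
        }
        return found;
    }

    static boolean configureNested(Parser root, boolean enableAutoSpace) {
        Set<Parser> visited =
                Collections.newSetFromMap(new IdentityHashMap<Parser, Boolean>());
        return configureNested(root, enableAutoSpace, visited);
    }

    // Option 2: build an AutoDetectParser around a pre-configured PDFParser,
    // with DefaultParser handling every other type.
    static AutoDetectParser withExplicitPdfParser(boolean enableAutoSpace) {
        PDFParser pdfParser = new PDFParser();
        pdfParser.setEnableAutoSpace(enableAutoSpace);
        // Assumption: the later parser overrides the earlier one for
        // application/pdf, so the configured PDFParser goes last.
        return new AutoDetectParser(new DefaultParser(), pdfParser);
    }
}
```

With option 1 you would call configureNested(new AutoDetectParser(), false) 
and check the return value; with option 2 you use the returned parser in place 
of a default AutoDetectParser.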
                
> Specify PDFBox options via ParseContext 
> ----------------------------------------
>
>                 Key: TIKA-612
>                 URL: https://issues.apache.org/jira/browse/TIKA-612
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Julien Nioche
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: TIKA-612-testcase.patch, TIKA-612.patch, Tika-612.patch, 
> testPDFTwoColumns.pdf
>
>
> See https://issues.apache.org/jira/browse/TIKA-611. The options used by 
> PDFBox are currently hard-coded in the PDFParser code; we will allow them to 
> be specified via the ParseContext objects.

