[ https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205376#comment-13205376 ]
Nick Burch commented on TIKA-612:
---------------------------------

The conclusion was to expose the options on the PDFParser directly instead. setEnableAutoSpace is already supported by PDFParser.

If you know you have a PDF, create a PDFParser, set the options, then parse.

If you want to use something like AutoDetectParser but with special PDF options, you have two options. One is to fetch the parsers from the AutoDetectParser, possibly recursing, until you find the PDFParser, and set the options on it. The other is to create a new AutoDetectParser on an explicitly created PDFParser, with the DefaultParser as a fallback.

> Specify PDFBox options via ParseContext
> ----------------------------------------
>
>                 Key: TIKA-612
>                 URL: https://issues.apache.org/jira/browse/TIKA-612
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Julien Nioche
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: TIKA-612-testcase.patch, TIKA-612.patch, Tika-612.patch, testPDFTwoColumns.pdf
>
>
> See https://issues.apache.org/jira/browse/TIKA-611. The options used by PDFBox are currently hard-coded in the PDFParser code; we will allow them to be specified via the ParseContext objects.
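To illustrate the first of the two AutoDetectParser approaches in the comment above (walking the nested parsers, possibly recursing, until the PDF parser is found, then setting the option on it), here is a minimal self-contained sketch. It deliberately does not use Tika's classes: SketchParser, SketchCompositeParser, SketchLeafParser and SketchPdfParser are hypothetical stand-ins invented for this example, and only the setter name setEnableAutoSpace is taken from the comment. The default value of the option in the stand-in is also an assumption. Real code would need to go through whatever accessors the actual composite parsers provide.

```java
import java.util.Arrays;
import java.util.List;

public class PdfOptionSketch {

    // Hypothetical stand-in for a parser type; this is NOT Tika's Parser interface.
    interface SketchParser {
        String name();
    }

    // Stand-in for a PDF parser that exposes its option directly as a setter.
    static final class SketchPdfParser implements SketchParser {
        private boolean enableAutoSpace = true; // assumed default, for illustration only

        public void setEnableAutoSpace(boolean value) { enableAutoSpace = value; }
        public boolean getEnableAutoSpace() { return enableAutoSpace; }
        public String name() { return "pdf"; }
    }

    // Stand-in for any other concrete parser.
    static final class SketchLeafParser implements SketchParser {
        private final String name;
        SketchLeafParser(String name) { this.name = name; }
        public String name() { return name; }
    }

    // Stand-in for a composite (auto-detecting) parser that wraps other parsers.
    static final class SketchCompositeParser implements SketchParser {
        private final List<SketchParser> children;
        SketchCompositeParser(SketchParser... children) {
            this.children = Arrays.asList(children);
        }
        List<SketchParser> children() { return children; }
        public String name() { return "composite"; }
    }

    // Depth-first search through nested composites.
    // Returns the first PDF parser found, or null if there is none.
    static SketchPdfParser findPdfParser(SketchParser parser) {
        if (parser instanceof SketchPdfParser) {
            return (SketchPdfParser) parser;
        }
        if (parser instanceof SketchCompositeParser) {
            for (SketchParser child : ((SketchCompositeParser) parser).children()) {
                SketchPdfParser found = findPdfParser(child);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SketchPdfParser pdf = new SketchPdfParser();
        SketchParser root = new SketchCompositeParser(
                new SketchLeafParser("html"),
                new SketchCompositeParser(new SketchLeafParser("txt"), pdf));

        // Locate the nested PDF parser and change its option in place.
        SketchPdfParser found = findPdfParser(root);
        if (found != null) {
            found.setEnableAutoSpace(false);
        }
        System.out.println(pdf.getEnableAutoSpace()); // prints "false"
    }
}
```

The second approach from the comment avoids the search entirely: the PDF parser is created and configured up front and then handed to the composite, so the configured instance is known from the start.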