[ https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148695#comment-13148695 ]
Michael McCandless commented on TIKA-612:
-----------------------------------------

I agree, we probably shouldn't expose PDFTextStripper directly; it'd be better (less API surface area) if we pick certain options and expose them ourselves. Then if PDFTextStripper changes things, or if we somehow switch to a different PDF lib, we won't break our users.

Alternatively, can we just expose options on PDFParser directly? This is more intuitive and direct (you just use setters on the parser), and we can name/genericize the options and choose which to expose. (This is what I've been doing on the last few PDF issues.)

> Specify PDFBox options via ParseContext
> ----------------------------------------
>
>                 Key: TIKA-612
>                 URL: https://issues.apache.org/jira/browse/TIKA-612
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Minor
>         Attachments: TIKA-612-testcase.patch, Tika-612.patch, testPDFTwoColumns.pdf
>
>
> See https://issues.apache.org/jira/browse/TIKA-611. The options used by PDFBox are currently hard-coded in the PDFParser code; we will allow them to be specified via the ParseContext objects.
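
To make the setter-based alternative from the comment concrete, here is a minimal sketch of the idea. It is not Tika's or PDFBox's actual code: SketchPdfParser, BackendStripper, and the two option names are hypothetical stand-ins chosen for illustration. The point it shows is that callers only see generic setters on the parser, and a single method translates them to whatever extraction library sits underneath.

```java
/**
 * Hypothetical stand-in for a third-party text extractor
 * (the role PDFTextStripper plays in the discussion). Not a real library class.
 */
class BackendStripper {
    boolean sortByPosition;
    boolean suppressDuplicateOverlappingText;

    String describe() {
        return "sortByPosition=" + sortByPosition
                + ", suppressDuplicateOverlappingText=" + suppressDuplicateOverlappingText;
    }
}

/**
 * Hypothetical parser exposing a small, library-neutral set of options
 * through plain setters. Option names are illustrative assumptions.
 */
class SketchPdfParser {
    private boolean sortByPosition = false;
    private boolean suppressDuplicateOverlappingText = false;

    public void setSortByPosition(boolean value) {
        this.sortByPosition = value;
    }

    public boolean getSortByPosition() {
        return sortByPosition;
    }

    public void setSuppressDuplicateOverlappingText(boolean value) {
        this.suppressDuplicateOverlappingText = value;
    }

    public boolean getSuppressDuplicateOverlappingText() {
        return suppressDuplicateOverlappingText;
    }

    /**
     * The only place that knows about the backend. If the backend's API
     * changes, or a different library is swapped in, only this method changes;
     * the public setters stay the same.
     */
    BackendStripper configureBackend() {
        BackendStripper stripper = new BackendStripper();
        stripper.sortByPosition = sortByPosition;
        stripper.suppressDuplicateOverlappingText = suppressDuplicateOverlappingText;
        return stripper;
    }
}

/** Small usage demo for the sketch above. */
class SetterSketchDemo {
    public static void main(String[] args) {
        SketchPdfParser parser = new SketchPdfParser();
        parser.setSortByPosition(true); // caller never touches BackendStripper
        System.out.println(parser.configureBackend().describe());
    }
}
```

The trade-off against the ParseContext approach in the issue title is that setters configure one parser instance, whereas a context object would let the same options vary per parse call; the sketch only illustrates the former.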