[ 
https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148695#comment-13148695
 ] 

Michael McCandless commented on TIKA-612:
-----------------------------------------

I agree, we probably shouldn't expose PDFTextStripper directly; it'd
be better (less API surface area) if we pick certain options and
expose them ourselves.  Then if PDFTextStripper changes things, or if
we somehow switch to a different PDF lib, we won't break our users.
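
To make that concrete, here is a rough sketch of a library-neutral
options object passed through a type-keyed context.  Every class name
below (PdfTextOptions, SimpleContext, StubStripper, SketchPdfParser)
is made up for illustration; SimpleContext only imitates the idea of a
ParseContext and StubStripper stands in for the underlying PDF
library, so none of this is the actual Tika or PDFBox API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, library-neutral options object; names are illustrative only.
final class PdfTextOptions {
    private boolean sortByPosition = false;
    private boolean suppressDuplicateText = false;

    boolean isSortByPosition() { return sortByPosition; }
    void setSortByPosition(boolean value) { sortByPosition = value; }

    boolean isSuppressDuplicateText() { return suppressDuplicateText; }
    void setSuppressDuplicateText(boolean value) { suppressDuplicateText = value; }
}

// Minimal stand-in for a type-keyed context (an assumption, not Tika's class).
final class SimpleContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) { entries.put(key, value); }

    // Returns the stored value for the key, or the default if none was set.
    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value == null ? defaultValue : key.cast(value);
    }
}

// Stand-in for the underlying PDF library's text stripper.
final class StubStripper {
    boolean sortByPosition;
    boolean suppressDuplicates;
}

final class SketchPdfParser {
    // The only place that knows how our options map onto the backend.
    StubStripper configure(SimpleContext context) {
        PdfTextOptions options =
                context.get(PdfTextOptions.class, new PdfTextOptions());
        StubStripper stripper = new StubStripper();
        stripper.sortByPosition = options.isSortByPosition();
        stripper.suppressDuplicates = options.isSuppressDuplicateText();
        return stripper;
    }
}
```

With this shape, a change in the backend only touches configure();
callers keep using PdfTextOptions unchanged.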

Alternatively, can we just expose options on PDFParser directly?  This
is more intuitive and direct (you just use setters on the parser), and
we can name/genericize the options and choose which to expose.  (This
is what I've been doing on the last few PDF issues.)
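
A minimal sketch of the setter approach, under the same caveats: the
class and method names here (DirectOptionsPdfParser, BackendStripper,
newStripper) are hypothetical and only illustrate the shape, not the
real PDFParser.

```java
// Stand-in for whatever PDF library performs the extraction.
final class BackendStripper {
    boolean sortByPosition;
    boolean suppressDuplicateOverlappingText;
}

// Hypothetical parser that exposes a chosen subset of options as setters.
final class DirectOptionsPdfParser {
    private boolean sortByPosition = false;
    private boolean suppressDuplicateOverlappingText = false;

    public void setSortByPosition(boolean value) {
        this.sortByPosition = value;
    }

    public boolean getSortByPosition() {
        return sortByPosition;
    }

    public void setSuppressDuplicateOverlappingText(boolean value) {
        this.suppressDuplicateOverlappingText = value;
    }

    public boolean getSuppressDuplicateOverlappingText() {
        return suppressDuplicateOverlappingText;
    }

    // Applies the parser-level options to a fresh backend stripper.
    BackendStripper newStripper() {
        BackendStripper stripper = new BackendStripper();
        stripper.sortByPosition = sortByPosition;
        stripper.suppressDuplicateOverlappingText = suppressDuplicateOverlappingText;
        return stripper;
    }
}
```

Usage would then be just: create the parser, call
setSortByPosition(true), and parse; the backend type never appears in
the caller's code.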

                
> Specify PDFBox options via ParseContext 
> ----------------------------------------
>
>                 Key: TIKA-612
>                 URL: https://issues.apache.org/jira/browse/TIKA-612
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Minor
>         Attachments: TIKA-612-testcase.patch, Tika-612.patch, 
> testPDFTwoColumns.pdf
>
>
> See https://issues.apache.org/jira/browse/TIKA-611. The options used by 
> PDFBox are currently hard-coded in the PDFParser code; we will allow them to 
> be specified via the ParseContext objects.
