[ 
https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995298#comment-13995298
 ] 

Ray Gauss II commented on TIKA-1278:
------------------------------------

Hi [~tallison],

I thought about adding these settings to {{PDFParser.properties}} but decided 
against it: PDFBox could change the default values, or the properties' scale 
or use, and if we weren't aware of that change we'd be inadvertently 
overriding those defaults.

The same reasoning applies to {{PDFParserConfig.configure}}: PDFBox's defaults 
seem to work well for most people.
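
To illustrate the "only override when explicitly set" idea described above, here is a minimal, self-contained sketch. It does not use the real Tika or PDFBox classes: {{SketchStripper}} and {{SketchPdfConfig}} are hypothetical stand-ins, and the default values shown are placeholders for illustration rather than values taken from PDFBox.

```java
/** Hypothetical stand-in for the library's text stripper. */
class SketchStripper {
    // Placeholder defaults; the real library owns these and may change them.
    float averageCharTolerance = 0.3f;
    float spacingTolerance = 0.5f;
}

/** Hypothetical config that leaves library defaults alone unless told otherwise. */
class SketchPdfConfig {
    // null means "not configured": the library default stays in effect.
    private Float averageCharTolerance;
    private Float spacingTolerance;

    void setAverageCharTolerance(Float value) {
        this.averageCharTolerance = value;
    }

    void setSpacingTolerance(Float value) {
        this.spacingTolerance = value;
    }

    /** Applies only the values that were explicitly set. */
    void configure(SketchStripper stripper) {
        if (averageCharTolerance != null) {
            stripper.averageCharTolerance = averageCharTolerance;
        }
        if (spacingTolerance != null) {
            stripper.spacingTolerance = spacingTolerance;
        }
    }
}
```

With nullable fields, a config that never sets a tolerance has no effect on the stripper, so a change to the library's defaults would pass through untouched. A subclass could also override {{configure}} to add its own behavior.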

We can certainly reconsider setting those defaults and/or adding other config 
if there are particular parameters people would find useful.

> Expose PDF Avg Char and Spacing Tolerance Config Params
> -------------------------------------------------------
>
>                 Key: TIKA-1278
>                 URL: https://issues.apache.org/jira/browse/TIKA-1278
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Ray Gauss II
>            Assignee: Ray Gauss II
>             Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's 
> {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
> comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
> slightly to allow for extension of that config class and its configuration 
> behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
