[ 
https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155788#comment-17155788
 ] 

ASF GitHub Bot commented on TIKA-3131:
--------------------------------------

clarkperkins opened a new pull request #325:
URL: https://github.com/apache/tika/pull/325


   …olerance to match PDFBox defaults


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> PDFParserConfig default values were accidentally swapped
> --------------------------------------------------------
>
>                 Key: TIKA-3131
>                 URL: https://issues.apache.org/jira/browse/TIKA-3131
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.24.1
>            Reporter: Clark Perkins
>            Priority: Major
>
> When default values were added for averageCharTolerance and spacingTolerance 
> as a part of TIKA-3091, their values appear to have been inadvertently 
> swapped.
> From PDFBox:
> {noformat}
>     private float spacingTolerance = .5f;
>     private float averageCharTolerance = .3f;
> {noformat}
> From Tika 1.24.1:
> {noformat}
>     //The character width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float averageCharTolerance = 0.5f;
>     //The space width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float spacingTolerance = 0.3f;
> {noformat}
> This effective change in defaults has caused PDFParser to start adding more 
> spaces than it did in 1.24 and earlier.
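
The swap described above can be illustrated with a small, self-contained check. This is only a sketch: the class ToleranceDefaultsCheck and its nested Tolerances type are hypothetical stand-ins, not Tika or PDFBox classes, and the numeric values are the ones quoted in the issue description.

```java
public class ToleranceDefaultsCheck {
    // PDFBox defaults, as quoted in the issue description.
    static final float PDFBOX_SPACING_TOLERANCE = 0.5f;
    static final float PDFBOX_AVERAGE_CHAR_TOLERANCE = 0.3f;

    // Hypothetical stand-in for the two tolerance fields of a parser config.
    static final class Tolerances {
        final float averageCharTolerance;
        final float spacingTolerance;

        Tolerances(float averageCharTolerance, float spacingTolerance) {
            this.averageCharTolerance = averageCharTolerance;
            this.spacingTolerance = spacingTolerance;
        }
    }

    // True when both values equal the PDFBox defaults.
    static boolean matchesPdfBox(Tolerances t) {
        return Float.compare(t.averageCharTolerance, PDFBOX_AVERAGE_CHAR_TOLERANCE) == 0
            && Float.compare(t.spacingTolerance, PDFBOX_SPACING_TOLERANCE) == 0;
    }

    // True when each field holds the other field's PDFBox default.
    static boolean looksSwapped(Tolerances t) {
        return Float.compare(t.averageCharTolerance, PDFBOX_SPACING_TOLERANCE) == 0
            && Float.compare(t.spacingTolerance, PDFBOX_AVERAGE_CHAR_TOLERANCE) == 0;
    }

    public static void main(String[] args) {
        // Values as reported for Tika 1.24.1 in the issue.
        Tolerances tika1241 = new Tolerances(0.5f, 0.3f);
        // Values after aligning with the PDFBox defaults.
        Tolerances corrected = new Tolerances(0.3f, 0.5f);

        System.out.println("1.24.1 defaults swapped: " + looksSwapped(tika1241));
        System.out.println("corrected defaults match PDFBox: " + matchesPdfBox(corrected));
    }
}
```

Until a fixed release is available, a possible workaround is to set both values explicitly on PDFParserConfig through its setters for these two fields (assumed here to be setAverageCharTolerance and setSpacingTolerance), using 0.3f and 0.5f respectively.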



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
