[ https://issues.apache.org/jira/browse/PDFBOX-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson closed PDFBOX-2800.
-------------------------------
Resolution: Won't Fix
Unfortunately, this is a known problem with 1.8 that isn't fixable without
introducing major breaking changes to the APIs. The good news is that it's
been fixed in 2.0.
> PDFTextStripper calculates the character bounding boxes incorrectly
> -------------------------------------------------------------------
>
> Key: PDFBOX-2800
> URL: https://issues.apache.org/jira/browse/PDFBOX-2800
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.8.9
> Environment: java version "1.6.0_35"
> Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
> Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
> Reporter: Evgeny Chesnokov
> Labels: text_extraction
> Attachments: C9002-highlighted.png, C9002.pdf
>
>
> For a specific file, the coordinates reported by the TextPosition objects
> stored in the charactersByArticle variable do not match the actual positions
> of the characters on the page. Some of the rectangles come back with zero
> height, and others appear shifted along the vertical axis. I am attaching
> files illustrating the issue: the sample file itself and an image
> highlighting the mismatched bounding rectangles on the 2nd page.
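
The zero-height symptom described in the report can be checked mechanically.
What follows is only a minimal sketch, assuming the glyph boxes have already
been copied out of each TextPosition into java.awt.geom.Rectangle2D objects.
The class GlyphBoxCheck and the method zeroHeightIndices are hypothetical
names used for illustration and are not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: flags degenerate glyph boxes.
final class GlyphBoxCheck {

    // Heights below this threshold are treated as zero (assumed tolerance).
    private static final double EPSILON = 1e-6;

    private GlyphBoxCheck() {
    }

    // Returns the indices of all boxes whose height is effectively zero.
    static List<Integer> zeroHeightIndices(List<? extends Rectangle2D> boxes) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = 0; i < boxes.size(); i++) {
            if (Math.abs(boxes.get(i).getHeight()) < EPSILON) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Running such a check over the boxes collected for a page would list the
characters worth inspecting by hand. It does not detect the vertical shift,
which requires comparing against the rendered page.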