[removing this from jira]
Do you have a suggestion for how PDFBox could best handle your use
case? Could you get the information you need by writing a class that
extends PDFTextStripper and overrides processTextPosition()? That way
you would see every TextPosition and where it is located.
On Feb 11, 2009, at 5:27 PM, Gustavo Hexsel (JIRA) wrote:
[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ]
Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------
Thanks for the prompt response.
Yes, I saw the methods; they just don't carry the text position
anymore (also, blocks get merged).
This is fine; the class is doing what it is supposed to (according to
its name). We had a use case (specifically document redaction)
that needed to retrieve the text and the associated position of
each character, which we did by using the start of the text
block and each individual character width.
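The approach described above (a block's starting position plus each
character's width) can be sketched roughly as follows. This is only an
illustration, not PDFBox code: CharBoxes, CharBox, and layout are
hypothetical names, and the sketch assumes a single horizontal line of
text whose widths are already in the same units as the start
coordinate, ignoring character spacing, word spacing, and any text
matrix scaling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class CharBoxes {

    /** One character and the horizontal span it occupies. */
    static final class CharBox {
        final char ch;
        final float startX;
        final float endX;

        CharBox(char ch, float startX, float endX) {
            this.ch = ch;
            this.startX = startX;
            this.endX = endX;
        }
    }

    /**
     * Walks the block left to right, advancing a running x offset by
     * each character's width. Assumes widths[i] belongs to text.charAt(i).
     */
    static List<CharBox> layout(String text, float blockStartX, float[] widths) {
        if (text.length() != widths.length) {
            throw new IllegalArgumentException("one width per character is required");
        }
        List<CharBox> boxes = new ArrayList<>();
        float x = blockStartX;
        for (int i = 0; i < text.length(); i++) {
            float end = x + widths[i];
            boxes.add(new CharBox(text.charAt(i), x, end));
            x = end; // the next character starts where this one ends
        }
        return boxes;
    }
}
```

For redaction, the resulting spans could then be matched against the
characters that need to be blacked out.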
Methods are marked as deprecated but they're effectively dead
-------------------------------------------------------------
Key: PDFBOX-422
URL: https://issues.apache.org/jira/browse/PDFBOX-422
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 0.8.0-incubator
Reporter: Gustavo Hexsel
There are several methods on PDFTextStripper and PDFStreamEngine
that are marked @deprecated, but the existing infrastructure no
longer calls them.
This would be OK if these methods weren't callbacks. Because they
are, existing overrides silently stop being invoked, which breaks
pre-existing code, and the compiler cannot tell you that the methods
should no longer be used.
Simply removing the methods would have been a much better solution
in this case.
Examples of these methods:
org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
org.apache.pdfbox.util.PDFTextStripper#writeCharacters
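The failure mode reported here can be reproduced with a small,
self-contained sketch. None of this is PDFBox code: Stripper,
writeWord, writeToken, LegacyClient, and CurrentClient are
hypothetical stand-ins. The point is that an override of a deprecated
callback still compiles (at most with a deprecation warning) but is
never invoked, whereas removing the method from the base class would
turn the @Override into a compile error.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a library base class; not PDFBox code. */
class Stripper {

    /** Old callback: still declared, but run() no longer invokes it. */
    @Deprecated
    protected void writeWord(String word) {
    }

    /** New callback that run() actually invokes. */
    protected void writeToken(String token) {
    }

    public final void run(String text) {
        for (String token : text.split(" ")) {
            writeToken(token); // only the new hook is called
        }
    }
}

/** Client written against the old API: compiles, but sees nothing. */
class LegacyClient extends Stripper {
    final List<String> seen = new ArrayList<>();

    @Override
    protected void writeWord(String word) {
        seen.add(word); // never reached
    }
}

/** Client written against the new API: receives every token. */
class CurrentClient extends Stripper {
    final List<String> seen = new ArrayList<>();

    @Override
    protected void writeToken(String token) {
        seen.add(token);
    }
}
```

Running both clients over the same input leaves LegacyClient's list
empty while CurrentClient's list is filled, with no error at compile
time or at run time to explain the difference.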