[removing this from jira]

Do you have a suggestion for how PDFBox could best solve your situation? Could you get the needed information by writing a class that extends PDFTextStripper and overrides processTextPosition()? That way you would see every TextPosition and where it is located.

On Feb 11, 2009, at 5:27 PM, Gustavo Hexsel (JIRA) wrote:


[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ]

Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------

Thanks for the prompt response.

Yes, I saw the methods; they just don't carry the text position anymore (also, blocks get merged).

This is fine; the class is doing what it is supposed to (according to its name). We had a use case (specifically document redaction) that needed to bring back the text and the associated position of each char, which we were doing by using the start of the text block and each individual character width.
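
The position arithmetic described above can be sketched roughly as follows. This is only an illustration in plain Java, not PDFBox code: CharPositions, CharBox and layout() are hypothetical names, and the sketch assumes a horizontal, unrotated run of text with no extra character or word spacing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class CharPositions {

    /** One character plus the horizontal span it occupies. */
    static final class CharBox {
        final char ch;
        final float x;      // left edge of the character
        final float width;  // advance width of the character

        CharBox(char ch, float x, float width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Derives a left edge for every character by starting at the block's
     * x coordinate and adding up the widths of the preceding characters.
     * Assumption: widths[i] is the advance width of text.charAt(i).
     */
    static List<CharBox> layout(String text, float startX, float[] widths) {
        if (text.length() != widths.length) {
            throw new IllegalArgumentException("need one width per character");
        }
        List<CharBox> boxes = new ArrayList<>(text.length());
        float x = startX;
        for (int i = 0; i < text.length(); i++) {
            boxes.add(new CharBox(text.charAt(i), x, widths[i]));
            x += widths[i]; // the next character starts where this one ends
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Made-up values: a block starting at x=72 with three character widths.
        for (CharBox b : layout("PDF", 72f, new float[] {6f, 7f, 5f})) {
            System.out.println(b.ch + " x=" + b.x + " width=" + b.width);
        }
    }
}
```

For redaction, each CharBox would then be matched against the region to blank out. A real document would also need the text matrix, font size and spacing parameters, which this sketch leaves out.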


Methods are marked as deprecated but they're effectively dead
-------------------------------------------------------------

                Key: PDFBOX-422
                URL: https://issues.apache.org/jira/browse/PDFBOX-422
            Project: PDFBox
         Issue Type: Bug
         Components: Text extraction
   Affects Versions: 0.8.0-incubator
           Reporter: Gustavo Hexsel

There are several methods on PDFTextStripper and PDFStreamEngine that are marked @deprecated, but they are not really used by the existing infrastructure anymore. This would be ok if such methods weren't callbacks. In this case, it breaks pre-existing code, and prevents the compiler from letting you know the methods are not to be used anymore. Simply removing the methods would have been a much better solution in this case.
Examples of said methods:
org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
org.apache.pdfbox.util.PDFTextStripper#writeCharacters
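
A minimal sketch of the failure mode, using stand-in classes rather than the actual PDFBox sources (Engine, LegacyClient, processToken and writeToken are all hypothetical names): the old callback is still declared and marked @Deprecated, but the engine no longer calls it, so a subclass written against it compiles and then silently does nothing.

```java
/** Hypothetical stand-in for a library class; not actual PDFBox code. */
class Engine {
    private final StringBuilder out = new StringBuilder();

    /** The current extension point: called once per token by run(). */
    protected void processToken(String token) {
        out.append(token);
    }

    /** @deprecated Still declared, but run() no longer calls it. */
    @Deprecated
    protected void writeToken(String token) {
        out.append(token);
    }

    final String run(String... tokens) {
        for (String t : tokens) {
            processToken(t); // the old callback is never invoked
        }
        return out.toString();
    }
}

/** Client code written against the old callback. */
class LegacyClient extends Engine {
    int calls = 0;

    @Override
    @Deprecated
    protected void writeToken(String token) {
        calls++; // never reached, because Engine.run() skips this method
        super.writeToken(token.toUpperCase());
    }
}

class DeadCallbackDemo {
    public static void main(String[] args) {
        LegacyClient legacy = new LegacyClient();
        String result = legacy.run("a", "b");
        // The override compiled, but it had no effect on the output.
        System.out.println(result + " calls=" + legacy.calls); // prints "ab calls=0"
    }
}
```

Had writeToken been removed from Engine instead, the @Override in LegacyClient would fail to compile, which is the early warning asked for above.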

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

