[
https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672426#action_12672426
]
Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------
These methods used to be called from the flushPage() method, so we used them as
callbacks since we need the geometry as well as the text in our code.
The new code for PDFTextStripper is truer to its name: it deals with text and
text only. The problem is that the methods are still there but they no longer
get called. So our code compiled, but all the text was null (since our extras
weren't valid anymore).
It would have been much more useful simply to remove the methods since at least
the compiler would have flagged our code as not being a callback anymore.
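To illustrate the failure mode, here is a minimal sketch in plain Java. It does
not use PDFBox at all: Stripper, GeometryCollector and process() are
hypothetical stand-ins, and only the method name writeCharacters is borrowed
from the list in the issue description. It assumes a base class whose
deprecated hook still exists but is never invoked by the rewritten pipeline.

```java
import java.util.ArrayList;
import java.util.List;

public class DeadCallbackDemo {

    /** Hypothetical stand-in for a library base class (not the real PDFTextStripper). */
    static class Stripper {
        /** Used to be invoked for each run of text; still declared, never called. */
        @Deprecated
        protected void writeCharacters(String text) {
            // intentionally empty: kept only for source compatibility
        }

        /** The rewritten pipeline: it joins the text itself and skips the hook. */
        public String process(List<String> chunks) {
            return String.join("", chunks);
        }
    }

    /** Client code that relied on the hook being called as a callback. */
    static class GeometryCollector extends Stripper {
        final List<String> seen = new ArrayList<>();

        @Override
        protected void writeCharacters(String text) {
            seen.add(text); // never reached, because process() no longer calls the hook
        }
    }

    public static void main(String[] args) {
        GeometryCollector collector = new GeometryCollector();
        String text = collector.process(List.of("Hello", " ", "world"));
        System.out.println(text);                  // prints "Hello world"
        System.out.println(collector.seen.size()); // prints 0: compiles, but collects nothing
    }
}
```

If writeCharacters were deleted from the base class instead of being left in
place, the @Override annotation on the subclass method would turn this silent
failure into a compile error.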
We might fork the old PDFTextStripper into a TextGeometryStripper or the like,
if I can get management to approve it (probably not, my contract is up tomorrow
and I'm going on vacation :) ).
I'll post a patch if we do that.
> Methods are marked as deprecated but they're effectively dead
> -------------------------------------------------------------
>
> Key: PDFBOX-422
> URL: https://issues.apache.org/jira/browse/PDFBOX-422
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Reporter: Gustavo Hexsel
>
> There are several methods on PDFTextStripper and PDFStreamEngine that are
> marked @deprecated, but they are not really used by the existing
> infrastructure anymore.
> This would be ok if such methods weren't callbacks. In this case, it breaks
> pre-existing code, and prevents the compiler from letting you know the
> methods are not to be used anymore.
> Simply removing the methods would have been a much better solution in this
> case.
> Example of said methods:
> org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
> org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
> org.apache.pdfbox.util.PDFTextStripper#writeCharacters