[https://issues.apache.org/jira/browse/PDFBOX-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758969#comment-13758969]
Andreas Lehmkühler commented on PDFBOX-1662:
--------------------------------------------
As I already said, this should be quite simple:
- get the resources of each page
- iterate through the XObjects and filter all XObjectForm objects (see
ExtractImages.java)
- the stream of each XObjectForm is a content stream of its own and should be
processed exactly like the content of the page
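The three steps above can be sketched in plain Java against a hypothetical, simplified model. The types `Op`, `XObject`, `Image` and `Form` are invented for illustration and are NOT PDFBox classes. The sketch also assumes that "processing the content" means what a text-removal pass does: dropping the PDF text-showing operators `Tj`, `TJ`, `'` and `"` together with their operands. The same `removeText` filter is applied to the page stream and to each form stream, and images are left untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of page content; NOT the PDFBox API.
// A content stream is a list of tokens: operators are Op instances,
// everything else (strings, numbers, arrays) is an operand.
final class TextRemovalSketch {

    record Op(String name) {}

    // Stand-ins for the entries of a page's XObject resources.
    interface XObject {}
    record Image() implements XObject {}
    record Form(List<Object> tokens, Map<String, XObject> xobjects) implements XObject {}

    // The PDF text-showing operators.
    static final Set<String> TEXT_OPS = Set.of("Tj", "TJ", "'", "\"");

    // Drops every text-showing operator together with the operands that precede it.
    static List<Object> removeText(List<Object> tokens) {
        List<Object> out = new ArrayList<>();
        List<Object> operands = new ArrayList<>();
        for (Object t : tokens) {
            if (t instanceof Op op) {
                if (!TEXT_OPS.contains(op.name())) {
                    out.addAll(operands); // keep non-text operators and their operands
                    out.add(op);
                }
                operands.clear();
            } else {
                operands.add(t);
            }
        }
        out.addAll(operands); // trailing operands without an operator
        return out;
    }

    // One level deep: filter the stream of every form XObject in the resources.
    static Map<String, XObject> stripForms(Map<String, XObject> xobjects) {
        Map<String, XObject> result = new LinkedHashMap<>();
        for (Map.Entry<String, XObject> e : xobjects.entrySet()) {
            XObject x = e.getValue();
            if (x instanceof Form f) {
                result.put(e.getKey(), new Form(removeText(f.tokens()), f.xobjects()));
            } else {
                result.put(e.getKey(), x); // images and others are left untouched
            }
        }
        return result;
    }
}
```

Collecting operands until the next operator avoids hard-coding operand counts (`"` takes three, the others one). This sketch only goes one level deep; forms nested inside forms are not reached.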
Be aware that there may be a recursive structure, e.g. the text of your PDF
is in an XObjectForm which is itself part of another XObjectForm. You should
use the PDFDebugger to visualize the structure of your PDF.
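The recursive case can be handled with a depth-first walk over the form XObjects. Below is a minimal sketch in plain Java, again against a hypothetical model (`Form` and `walk` are invented for illustration, not PDFBox API). The maps are assumed to hold only forms, images having already been filtered out. The `process` callback stands for whatever per-stream processing is applied to the page content. An identity-based visited set is an added safeguard, not something the comment prescribes: it ensures a form shared by several parents is processed once, and it prevents endless recursion if a malformed file has a form that references itself.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of nested form XObjects; NOT the PDFBox API.
final class FormWalkSketch {

    // A form XObject with its own XObject resources, which may contain
    // further forms (or, in malformed files, the form itself).
    static final class Form {
        final String name;
        final Map<String, Form> xobjects = new LinkedHashMap<>();
        Form(String name) { this.name = name; }
    }

    // Depth-first: process a form, then descend into its own resources.
    static void walk(Map<String, Form> xobjects, Consumer<Form> process, Set<Form> visited) {
        for (Form form : xobjects.values()) {
            if (!visited.add(form)) {
                continue; // already processed: shared form or a cycle
            }
            process.accept(form);
            walk(form.xobjects, process, visited);
        }
    }

    // Entry point for the forms found in one page's resources.
    static void walk(Map<String, Form> pageXObjects, Consumer<Form> process) {
        walk(pageXObjects, process, Collections.newSetFromMap(new IdentityHashMap<>()));
    }
}
```

For example, with a page form `Fm0` that contains `Fm1`, which in turn points back to `Fm0`, the walk calls `process` exactly once for each of the two forms and then stops.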
> The Example RemoveAllText does not remove text from certain pdf
> ---------------------------------------------------------------
>
> Key: PDFBOX-1662
> URL: https://issues.apache.org/jira/browse/PDFBOX-1662
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 1.8.2
> Environment: Windows XP, JDK 6.x
> Reporter: Paul Heinrich
> Attachments: grswum02s_1306_01.6796191.1.pdf
>
>
> The Example RemoveAllText does not remove text from certain pdf