[ 
https://issues.apache.org/jira/browse/PDFBOX-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758969#comment-13758969
 ] 

Andreas Lehmkühler commented on PDFBOX-1662:
--------------------------------------------

As I already said, it should be quite simple:

- get the resources of each page
- iterate through the XObjects and filter all XObjectForm objects (see 
ExtractImages.java)
- the stream of the XObjectForm is the content and should be processed exactly 
like the content of the page

Be aware that there may be some recursive structure, e.g. the text of your PDF 
is in an XObjectForm which is itself part of another XObjectForm, so the same 
processing has to be applied to every nested level. You should use the 
PDFDebugger to visualize the structure of your PDF.
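
The steps above could be sketched roughly as follows. This is only a 
self-contained model of the recursion, not PDFBox code: ContentNode and 
FormTextRemover are hypothetical names, a ContentNode stands in for a content 
stream together with the form XObjects found in its resources, and tokens are 
plain strings instead of parsed PDF objects. Treating "remove text" as 
dropping each Tj/TJ operator together with its single operand is an assumption 
made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a content stream plus the form XObjects
// listed in its resources. Not a PDFBox class.
class ContentNode {
    List<String> tokens = new ArrayList<String>();
    final Map<String, ContentNode> forms = new LinkedHashMap<String, ContentNode>();

    ContentNode(String... tokens) {
        this.tokens.addAll(Arrays.asList(tokens));
    }
}

// Hypothetical helper: strips text-showing operators from a page-like node
// and from every form nested below it.
class FormTextRemover {
    static void removeText(ContentNode root) {
        // Identity-based set so a form shared by several parents, or a
        // cyclic reference, is processed only once.
        Set<ContentNode> visited =
            Collections.newSetFromMap(new IdentityHashMap<ContentNode, Boolean>());
        removeText(root, visited);
    }

    private static void removeText(ContentNode node, Set<ContentNode> visited) {
        if (!visited.add(node)) {
            return; // already handled
        }
        List<String> kept = new ArrayList<String>();
        for (String token : node.tokens) {
            if (token.equals("Tj") || token.equals("TJ")) {
                // Drop the operator and the one operand that precedes it.
                if (!kept.isEmpty()) {
                    kept.remove(kept.size() - 1);
                }
                continue;
            }
            kept.add(token);
        }
        node.tokens = kept;
        // The content of each form is processed exactly like the page content.
        for (ContentNode form : node.forms.values()) {
            removeText(form, visited);
        }
    }
}
```

With a page that draws a form which in turn draws a second form containing 
the text, a single call to FormTextRemover.removeText(page) reaches all three 
levels. In a real document the nodes would come from the page resources, and 
each modified token list would have to be written back to its own stream.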
                
> The Example RemoveAllText does not remove text from certain pdf
> ---------------------------------------------------------------
>
>                 Key: PDFBOX-1662
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1662
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.2
>         Environment: Windows XP, JDK 6.x
>            Reporter: Paul Heinrich
>         Attachments: grswum02s_1306_01.6796191.1.pdf
>
>
> The Example RemoveAllText does not remove text from certain pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
