[ https://issues.apache.org/jira/browse/PDFBOX-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144816#comment-16144816 ]
Tilman Hausherr commented on PDFBOX-3912:
-----------------------------------------
Well, the problem is with the creator of that PDF. Maybe he/she got confused
and made content invisible instead of deleting it. PDFBox just extracts what's
there. It can't know that something is "invisible".
> Command line : ExtractText, Duplicated text
> -------------------------------------------
>
> Key: PDFBOX-3912
> URL: https://issues.apache.org/jira/browse/PDFBOX-3912
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.7
> Reporter: Hasan Karaoğlu
> Attachments: bugzilla867751.html
>
>
> When I convert some pages of a PDF file to HTML, the output contains
> duplicated pages. For example, when I convert the seventh page of a PDF file,
> the result also contains the sixth page's content.