Re: [iText-questions] getPageContent bug

Bruno Lowagie Sun, 18 Jun 2006 01:00:14 -0700

sjf wrote:

> Hi,
> I don't know if I should send the bug report here.


This is the best place to report bugs.

> I downloaded the latest iText and iTextSharp and found a bug. If I
> burst a PDF file into pages and merge them into one PDF file again
> using pdfsam (http://sourceforge.net/projects/pdfsam), getPageContent
> does not return the correct content of the remerged PDF file, while
> ExtractText from PDFBox (http://www.pdfbox.org) can extract all the
> text correctly from the same PDF file.

It's not a bug; you are mixing up two different concepts.
1. You DO get the correct content of the remerged PDF,
but it's different from the content of the original PDF.
In the merged PDF the content is added as a PDF Form XObject,
so the page's own content stream only refers to that object.
2. The text extracted with PDFBox is the text that is inside the
Form XObject. PDFBox parses the page content, discovers
that the real content is in a different object, and fetches
that object to retrieve the text.
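
The two points above can be illustrated with a small, self-contained
sketch. It is a toy model only: it uses neither the iText nor the
PDFBox API, it treats content streams as plain strings, and the class
name FormXObjectDemo, the method extractText, and the XObject name
Xi0 are all hypothetical. A real extractor also has to handle
compression, fonts, encodings, and nested resources.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormXObjectDemo {
    // Matches "(text) Tj" (show text) or "/Name Do" (paint an XObject).
    private static final Pattern OP =
        Pattern.compile("\\(([^)]*)\\)\\s*Tj|/(\\S+)\\s+Do\\b");

    // Collects text from a content stream and follows every "Do"
    // operator into the named Form XObject (hypothetical helper;
    // no protection against cyclic references in this sketch).
    public static String extractText(String content, Map<String, String> xobjects) {
        StringBuilder out = new StringBuilder();
        Matcher m = OP.matcher(content);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1));                    // text shown directly
            } else {
                String form = xobjects.get(m.group(2));    // look up the XObject
                if (form != null) {
                    out.append(extractText(form, xobjects));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Content stream of the original page: the text is right there.
        String original = "BT /F1 12 Tf (Hello) Tj ET";

        // After merging, the page stream only paints a Form XObject
        // that wraps the original content.
        String mergedPage = "q 1 0 0 1 0 0 cm /Xi0 Do Q";
        Map<String, String> xobjects = new HashMap<>();
        xobjects.put("Xi0", original);

        // Looking at the raw page content alone finds no text.
        System.out.println(mergedPage.contains("Hello"));        // false

        // Resolving the XObject recovers the text.
        System.out.println(extractText(mergedPage, xobjects));   // Hello
    }
}
```

The raw page stream is correct but contains no text of its own; the
text only shows up once the "Do" reference is followed into the
Form XObject.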
br,
Bruno

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
