Please post this to the mailing-list.

sjf wrote:

> Hi, Mr Bruno Lowagie,
>
> > > I downloaded the latest iText and iTextSharp and found a bug. If I burst
> > > a PDF file into pages and merge them into one PDF file again using
> > > pdfsam (http://sourceforge.net/projects/pdfsam), getPageContent does
> > > not return the correct content of the remerged PDF file, while
> > > ExtractText from PDFBox (http://www.pdfbox.org) can
> > > extract all the text correctly from the same PDF file.
> >
> > It's not a bug.
> > You are mixing two different concepts.
> > 1. you DO get the correct content of the remerged PDF,
> > but it's different from the content of the original PDF.
> > In the merged PDF the content is added as a PDF Form XObject.
> > 2. The text extracted with PDFBox is the text that is in the
> > Form XObject. PDFBox parses the page content and discovers
> > that the real content is in a different object. It gets
> > that object to retrieve the text.
> Are there any examples of how to get the text in the Form XObject?
>
> A new bug:
> When I call GetPageContent in iTextSharp, I get an exception:
> System.IO.EndOfStreamException: Trying to read content after the end of the stream
>   iTextSharp.text.pdf.RandomAccessFileOrArray.ReadFully(Byte[] b, Int32 off, Int32 len)
>   iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw(PRStream stream, RandomAccessFileOrArray file)
>   iTextSharp.text.pdf.PdfReader.GetStreamBytes(PRStream stream, RandomAccessFileOrArray file)
>   iTextSharp.text.pdf.PdfReader.GetPageContent(Int32 pageNum, RandomAccessFileOrArray file)
> But there IS a picture (and nothing else) on the page, and
> GetImportedPage works fine.
> Thanks,
> sjf
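
On the Form XObject question quoted above, the lookup that a text extractor
has to perform can be sketched roughly as follows. This is a toy model in
plain Python, not iText or PDFBox code: the dictionary layout, the names
`page`, `image_page` and `extract_text`, and the sample streams are all
made up for illustration. It assumes uncompressed content streams and
simple `(...) Tj` strings; real files usually need the stream filters
decoded and the font encodings applied first.

```python
import re

# Hypothetical stand-in for a merged page: the page's own content stream
# only invokes a Form XObject by name; the text lives in that XObject.
page = {
    "content": b"q 1 0 0 1 0 0 cm /Xf1 Do Q",
    "xobjects": {
        "Xf1": {
            "subtype": "Form",
            "content": b"BT /F1 12 Tf (Hello) Tj ( world) Tj ET",
            "xobjects": {},
        },
    },
}

# Hypothetical page that holds a picture and nothing else.
image_page = {
    "content": b"q 200 0 0 100 0 0 cm /Im1 Do Q",
    "xobjects": {"Im1": {"subtype": "Image"}},
}

# Either a literal string shown with Tj, or a named XObject painted with Do.
TOKEN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj|/([^\s/()<>\[\]]+)\s+Do")


def extract_text(container, depth=0):
    """Collect Tj strings in stream order, descending into Form XObjects."""
    if depth > 10:  # guard against XObjects that reference each other
        return ""
    parts = []
    for match in TOKEN.finditer(container["content"]):
        if match.group(1) is not None:
            parts.append(match.group(1).decode("latin-1"))
            continue
        name = match.group(2).decode("latin-1")
        xobject = container["xobjects"].get(name)
        # Image XObjects carry no text, so only Form XObjects are followed.
        if xobject is not None and xobject["subtype"] == "Form":
            parts.append(extract_text(xobject, depth + 1))
    return "".join(parts)
```

Reading only `page["content"]` yields no text at all, which matches what is
described above for the merged file; following the `Do` operator into the
Form XObject is what recovers it.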



_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
