Please post this to the mailing-list. sjf wrote:
> Hi, Mr Bruno Lowagie,
>
> > > I downloaded the latest iText and iTextSharp and found a bug. If I burst
> > > a PDF file into pages and merge them into one PDF file again using
> > > pdfsam (http://sourceforge.net/projects/pdfsam), getPageContent will
> > > not return the correct content of the remerged PDF file, while
> > > ExtractText from PDFBox (www.pdfbox.org <http://www.pdfbox.org>) can
> > > extract all the text correctly from the same PDF file.
> >
> > It's not a bug.
> > You are mixing two different concepts.
> > 1. You DO get the correct content of the remerged PDF,
> >    but it's different from the content of the original PDF.
> >    In the merged PDF the content is added as a PDF Form XObject.
> > 2. The text extracted with PDFBox is the text that is in the
> >    Form XObject. PDFBox parses the page content and discovers
> >    that the real content is in a different object. It gets
> >    that object to retrieve the text.
>
> Are there any examples of how to get the text in the Form XObject?
>
> A new bug:
> When I use iTextSharp to call GetPageContent, I get an exception:
>
> System.IO.EndOfStreamException: Trying to read content after the end
> of the stream
>   iTextSharp.text.pdf.RandomAccessFileOrArray.ReadFully(Byte[] b, Int32 off, Int32 len)
>   iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw(PRStream stream, RandomAccessFileOrArray file)
>   iTextSharp.text.pdf.PdfReader.GetStreamBytes(PRStream stream, RandomAccessFileOrArray file)
>   iTextSharp.text.pdf.PdfReader.GetPageContent(Int32 pageNum, RandomAccessFileOrArray file)
>
> But there IS a picture (and nothing else) in the page, and
> GetImportedPage runs well.
>
> Thanks,
> sjf
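
To illustrate the point about Form XObjects, here is a minimal sketch in plain Java. It does not use the iText or PDFBox API. It only shows why the page content of the merged file looks "empty": the page stream merely invokes a named XObject with the Do operator, and the text operators live in that XObject's own stream. The class name, the method names and the two sample streams are made up for illustration, and the regular expressions only handle simple "(literal) Tj" strings, not TJ arrays, hex strings or font encodings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormXObjectSketch {
    // "/Name Do" paints the XObject registered under /Name in the page resources.
    private static final Pattern DO_OP =
            Pattern.compile("/([^\\s/\\[\\]()<>]+)\\s+Do\\b");
    // "(literal string) Tj" shows a string; escaped characters are kept raw.
    private static final Pattern TJ_OP =
            Pattern.compile("\\(((?:\\\\.|[^\\\\()])*)\\)\\s*Tj\\b");

    // Names of all XObjects invoked by a content stream.
    static List<String> xObjectNames(String content) {
        List<String> names = new ArrayList<String>();
        Matcher m = DO_OP.matcher(content);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Concatenation of all simple Tj string operands in a content stream.
    static String simpleText(String content) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TJ_OP.matcher(content);
        while (m.find()) {
            sb.append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical page content of a merged file: no text, only an XObject call.
        String pageContent = "q 1 0 0 1 0 0 cm /Xi0 Do Q";
        // Hypothetical content stream of the Form XObject named Xi0.
        String formContent = "BT /F1 12 Tf 72 720 Td (Hello) Tj ( world) Tj ET";

        System.out.println("XObjects used by page: " + xObjectNames(pageContent));
        System.out.println("Text in page stream:   '" + simpleText(pageContent) + "'");
        System.out.println("Text in form stream:   '" + simpleText(formContent) + "'");
    }
}
```

In a real file you would take each name found in the page content, look it up in the /XObject dictionary of the page's /Resources, check that its /Subtype is /Form, decode that stream, and then run the text extraction on it (recursively, since a form can invoke other forms).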
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
