Parsing PDF requires a lot of random access. Access tends to be chunked: seek to a particular offset in the file, then parse forward as a stream (this is why paging makes sense, and why memory mapping is effective until the file gets too big). But the parsing itself is incredibly complex: nested object structures, lots of alternative representations for the same type of data, etc...
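That seek-then-parse pattern can be sketched in plain Java NIO (a minimal, hypothetical example -- the file contents and offsets are made up here, and this is not iText's actual reader code):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class RandomAccessDemo {
    // Memory-map a small window of the file at `offset` and read `len`
    // bytes from it -- the "jump to an offset, then parse forward" pattern.
    static String readAt(File f, long offset, int len) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            byte[] buf = new byte[len];
            map.get(buf);
            return new String(buf, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("demo", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.writeBytes("xref-table-here obj 1 0"); // stand-in file contents
        }
        // Jump straight to byte 16 (a hypothetical object record) without
        // touching the preceding bytes.
        System.out.println(readAt(f, 16, 7)); // prints "obj 1 0"
    }
}
```

Mapping a window per read is wasteful in real code (you would map once and reuse the buffer); it is shown this way only to keep the sketch self-contained.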
And we definitely don't know the size of any of these structures ahead of time.

Hmm - just had a thought on I/O performance. I'll post that in a separate message so we can keep the technical discussion separate.

- K

Mike Marchywka-2 wrote:
>
> You can have alt implementations in the mean time if you know
> size a priori. Ideally you would
> like to be able to operate on a stream and scrap random access.
>

--
View this message in context: http://old.nabble.com/performance-follow-up-tp28322800p28345099.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
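A concrete illustration of why "scrapping random access" is hard for PDF (a hypothetical sketch, not iText's reader code): the offset of the cross-reference table is recorded after the `startxref` keyword near the end of the file, so a parser has to jump to the tail before it can locate anything else.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StartXrefDemo {
    // Scan the last `tail` bytes of a file for the "startxref" keyword and
    // return the decimal offset recorded after it, or -1 if it is absent.
    static long findStartXref(File f, int tail) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            long start = Math.max(0, raf.length() - tail);
            raf.seek(start); // jump to the tail: inherently random access
            byte[] buf = new byte[(int) (raf.length() - start)];
            raf.readFully(buf);
            String s = new String(buf, StandardCharsets.US_ASCII);
            int i = s.lastIndexOf("startxref");
            if (i < 0) return -1;
            // The token following the keyword is the xref table's byte offset.
            String rest = s.substring(i + "startxref".length()).trim();
            return Long.parseLong(rest.split("\\s+")[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("demo", ".pdf");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            // Minimal stand-in for a PDF trailer; 1234 is a made-up offset.
            raf.writeBytes("%PDF-1.4\n...body...\nstartxref\n1234\n%%EOF\n");
        }
        System.out.println(findStartXref(f, 64)); // prints 1234
    }
}
```

A forward-only stream would have to buffer the whole body before it even learns where the xref table lives, which is Mike's point about needing sizes a priori.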