Parsing PDF requires a lot of random access.  It tends to be chunked - move
to a particular offset in the file, then parse forward as a stream (this is
why paging makes sense, and why memory mapping is effective until the file
gets too big).  But the parsing itself is incredibly complex: nested object
structures, lots of alternative representations for the same type of data,
and so on.

And we definitely don't know the size of any of these structures ahead of time.
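
To make that access pattern concrete, here is a minimal sketch in Java (not
iText's actual reader code - the class and method names are made up for the
example) of the two styles: seek to a known offset and read forward as a
stream, versus mapping the file with FileChannel.map, which in Java is capped
at 2 GB per mapping - one practical reason mapping stops being an option once
the file gets too big.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;

public class PdfAccessSketch {

    // Chunked access: seek to an offset (e.g. one taken from the xref table),
    // then read forward.  The object's length isn't known up front, so a real
    // parser keeps pulling chunks until it sees "endobj" / "endstream"; here
    // we just return the first chunk.
    static byte[] readChunkAt(RandomAccessFile raf, long offset, int chunkSize)
            throws IOException {
        raf.seek(offset);
        byte[] buf = new byte[chunkSize];
        int read = raf.read(buf);
        return read < 0 ? new byte[0] : Arrays.copyOf(buf, read);
    }

    // Memory-mapped variant: cheap random access while the file fits in the
    // address space / page cache.  Assumes the file is under 2 GB, since a
    // single MappedByteBuffer can't be larger than that; bigger files need
    // multiple mappings or plain seeks.
    static MappedByteBuffer mapWhole(FileChannel channel) throws IOException {
        return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }
}

Either way, the parser still has to discover where each object ends by
scanning for delimiters, which is the part that stays complex no matter how
the bytes are fetched.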


Hmm - just had a thought on IO performance.  I'll post it in a separate
message so we can keep that discussion separate from this one.

- K


Mike Marchywka-2 wrote:
> You can have alternative implementations in the meantime if you know the
> size a priori.  Ideally you would like to be able to operate on a stream
> and scrap random access.
