I'd love to discuss specific ideas on prediction - are you familiar enough
with the PDF spec to provide any suggestions?

One obvious candidate is the xref table - but iText reads that entirely into
memory once and holds onto it, so it seems unlikely that pre-fetching would
do much there (other than making the last 1MB of the file the first block
pre-fetched - but any sort of paging implementation would handle that
already).
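Even if the xref table is read only once, it can still feed prediction. It maps each object number to a byte offset, so sorting those offsets gives an upper bound on how many bytes each object can occupy: it can't extend past the next object's offset. That addresses "we don't know the size ahead of time" without any extra reads. Below is a minimal sketch under a few assumptions. It uses a classic xref table, because objects stored in PDF 1.5 object streams have no direct offset. The input is a plain `Map<Integer, Long>` from object number to offset. `ObjectExtents.maxLengths` is a hypothetical helper, not part of iText.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of iText): bounds each object's size using only xref offsets.
 * Assumes a classic xref where every in-use object has a direct file offset; objects stored
 * in PDF 1.5+ object streams have no such offset and are out of scope here.
 */
public final class ObjectExtents {

    private ObjectExtents() {}

    /**
     * @param offsets    object number -> byte offset of its "N G obj" header, from the xref table
     * @param fileLength total file length in bytes
     * @return object number -> maximum number of bytes that object can occupy
     */
    public static Map<Integer, Long> maxLengths(Map<Integer, Long> offsets, long fileLength) {
        // Re-key by offset so each object's physical neighbour is simply the next-higher key.
        TreeMap<Long, Integer> byOffset = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : offsets.entrySet()) {
            byOffset.put(e.getValue(), e.getKey());
        }
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Long, Integer> e : byOffset.entrySet()) {
            Long next = byOffset.higherKey(e.getKey());
            // The gap may also hold an xref section, a trailer or stale data from an
            // incremental update, so this is an upper bound rather than an exact size.
            long end = (next != null) ? next : fileLength;
            result.put(e.getValue(), end - e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> xref = Map.of(1, 15L, 2, 80L, 3, 200L);
        System.out.println(maxLengths(xref, 400L)); // prints {1=65, 2=120, 3=200}
    }
}
```

With these bounds, a reader could request one right-sized range per object instead of guessing, and neighbouring small objects could be coalesced into a single read.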

The rest... well, from my experience with this, you've got objects that
refer to other objects that refer to other objects. And there's really no
way to know where in the object graph you need to go next until you've
parsed the current object and followed its references. So I think I'll
need some concrete examples of how this might be done with PDF structure -
just to get my creativity going!
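One concrete angle on the "parse, then go there" problem is that the costly part is the read, not the decision. iText already holds the xref table in memory. So as soon as an object such as a /Pages node has been parsed, the offsets of everything it references are known. Reads for all of those can be queued at once on a background thread, along the lines of the multithreading suggestion. Below is a minimal sketch under a few assumptions. The xref is a plain object-number-to-offset map, and the file is read in fixed-size blocks. `ReferencePrefetcher` and its methods are hypothetical, not iText API. The regex is a naive stand-in for a real tokenizer, and objects inside object streams are not handled.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not iText API): after an object is parsed, queue background reads
 * of the fixed-size blocks that hold every object it references, using the in-memory xref.
 */
public final class ReferencePrefetcher implements AutoCloseable {
    // Naive match for indirect references such as "12 0 R"; a real tokenizer would skip
    // strings, comments and stream data instead of scanning raw text.
    private static final Pattern REF = Pattern.compile("\\b(\\d+)\\s+(\\d+)\\s+R\\b");

    private final FileChannel channel;
    private final Map<Integer, Long> xref; // object number -> byte offset
    private final int blockSize;
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final ConcurrentMap<Long, Future<ByteBuffer>> blocks = new ConcurrentHashMap<>();

    public ReferencePrefetcher(Path pdf, Map<Integer, Long> xref, int blockSize) throws IOException {
        this.channel = FileChannel.open(pdf, StandardOpenOption.READ);
        this.xref = xref;
        this.blockSize = blockSize;
    }

    /** Object numbers of the indirect references found in an object's raw text. */
    public static List<Integer> referencedObjects(String objectText) {
        List<Integer> refs = new ArrayList<>();
        Matcher m = REF.matcher(objectText);
        while (m.find()) {
            refs.add(Integer.parseInt(m.group(1)));
        }
        return refs;
    }

    /** Called right after an object is parsed: start loading everything it points at. */
    public void prefetchReferences(String objectText) {
        for (int num : referencedObjects(objectText)) {
            Long offset = xref.get(num);
            if (offset != null) {
                block(offset / blockSize * blockSize); // align to the containing block
            }
        }
    }

    /** Returns the block starting at blockStart, scheduling the read only if not already queued. */
    public Future<ByteBuffer> block(long blockStart) {
        return blocks.computeIfAbsent(blockStart, start -> pool.submit(() -> {
            ByteBuffer buf = ByteBuffer.allocate(blockSize);
            long pos = start;
            // Positional reads leave the channel position alone, so worker threads can share it.
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos);
                if (n < 0) {
                    break; // end of file: the last block may be short
                }
                pos += n;
            }
            buf.flip();
            return buf;
        }));
    }

    @Override
    public void close() throws IOException {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            channel.close();
        }
    }
}
```

Because `block` deduplicates through the map, the parser's own foreground reads could go through the same call. If the prefetch hasn't finished yet, the parser simply waits on the `Future`. A real version would also need to evict old blocks, which this sketch never does.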

- K


Mike Marchywka-2 wrote:
> 
> 
> 
>>
>>
>> Parsing PDF requires a lot of random access. It tends to be chunked -
>> move to a particular offset in the file, then parse as a stream (this is
>> why paging makes sense, and why memory mapping is effective until the file
>> gets
> 
> Yes, that is great, but instead of a generic MRU approach, are
> there better predictions you can make, e.g. starting to load pages
> before you would otherwise have to wait for them? Maybe
> multithreading makes sense here.
>  
>  
>  
>> too big). But the parsing is incredibly complex. You can have nested
>> object structures, lots of alternative representations for the same type
>> of data, etc...
> 
> Surely there are rules, and I'm sure this topic has been beaten
> to death in many CS courses (as have stats, LOL). Profiling
> should point to some suspects. Algorithmic optimizations may
> be possible, as may simple coding changes. Most compilers
> operate sequentially on their input, possibly in multiple passes;
> I'm sure you can easily find ideas in a variety of sources.
>  
>  
>>
>> And we definitely don't know the size of any of these structures
>> ahead of time.
> 
> Well, you don't need to know it a week ahead of time, but
> you could maybe spend an access or two finding sizes, if that
> can be done more quickly than just reading everything.
>  
> ------------------------------------------------------------------------------
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 
> Buy the iText book: http://www.itextpdf.com/book/
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list:
> http://1t3xt.info/tutorials/keywords/
> 
> 

-- 
View this message in context: 
http://old.nabble.com/performance-follow-up-tp28322800p28346601.html
Sent from the iText - General mailing list archive at Nabble.com.

