Hi,

2010/3/3 Andreas Lehmkühler <[email protected]>:
> From: Johannes Koch <[email protected]>
>> How will caching PD objects synchronize their cached PD objects with
>> underlying COS data changed by other PD objects?
> I don't remember a concrete example, but I'm sure that there are a few. But I
> think the solution is obvious. You just have to reinitialize your cached value
> when calling the corresponding setter.

See PDFont.get/setEncoding for a good example of this.

The problem that I believe Johannes is referring to is that there's currently no way for the PD object to know when the underlying COS object (typically a dictionary) is changed, which makes all the current caching solutions a bit brittle. This is also why I was opposed to the earlier idea of extending the current COSObjectable mechanism and would in fact prefer to avoid it as much as possible.

PS. I've been trying (see PDFBOX-626) to reduce the memory impact of the full COS object hierarchy that we keep in memory for all PDF documents, but it looks like there are no more big improvements to be made without some radical design changes. One thing I've been considering is making the PD model the canonical data layer and using COS objects only during parsing and serialization. This should give us dramatic memory improvements for text extraction and rendering use cases, but may be troublesome for all use cases where existing PDF documents are being modified. Perhaps we should consider creating an optimized "read only" version of PDFBox in addition to the fully featured version we now have.

BR,

Jukka Zitting
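
To make the brittleness concrete, here is a minimal sketch of the pattern discussed above. It does not use PDFBox at all: `CachingWrapper` is a hypothetical stand-in for a PD object, a plain `java.util.Map` stands in for the underlying COS dictionary, and the key name and encoding strings are only illustrative. The setter refreshes the cache as suggested, but a change made directly to the dictionary goes unnoticed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a PD-level wrapper; this is not a PDFBox class.
class CachingWrapper {
    // Stand-in for the underlying COS dictionary (assumption: a plain Map).
    private final Map<String, String> dict;
    // Value read from the dictionary and cached on first access.
    private String cachedEncoding;

    CachingWrapper(Map<String, String> dict) {
        this.dict = dict;
    }

    String getEncoding() {
        if (cachedEncoding == null) {
            cachedEncoding = dict.get("Encoding");
        }
        return cachedEncoding;
    }

    void setEncoding(String encoding) {
        dict.put("Encoding", encoding);
        // Reinitialize the cached value in the corresponding setter.
        cachedEncoding = encoding;
    }
}

class CacheSketchDemo {
    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<>();
        dict.put("Encoding", "WinAnsiEncoding");
        CachingWrapper font = new CachingWrapper(dict);

        System.out.println(font.getEncoding()); // WinAnsiEncoding

        // Going through the setter keeps cache and dictionary in sync.
        font.setEncoding("MacRomanEncoding");
        System.out.println(font.getEncoding()); // MacRomanEncoding

        // Changing the dictionary directly bypasses the wrapper,
        // so the wrapper keeps returning its stale cached value.
        dict.put("Encoding", "StandardEncoding");
        System.out.println(font.getEncoding()); // MacRomanEncoding (stale)
    }
}
```

The last call shows the case Johannes asked about: the wrapper has no way to learn that the dictionary changed underneath it, so the setter-only approach holds only as long as every modification goes through the same wrapper instance.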
