> On 7 Jan 2015, at 15:01, Leonard Rosenthol <[email protected]> wrote:
>
> I admit to never actually looking at the PDFBox Cos implementation, but
> every other implementation that I’ve worked with (and it’s been quite a
> few) has a VERY deep connection between the object and the source
> document. This is necessary in order to enable various features such as
> “on-demand read” (especially important for large arrays and streams),
> incremental updates and more.
>
> It’s your library, but I would personally strongly recommend NOT going
> in this direction…

Thanks, however I’m not proposing any changes to how PDFBox works. We already do on-demand reading for COS streams.

When I say that there is nothing about a COS object that is specific to a given document, I mean only that there’s no problem sharing our Java COSStream instances between two or more COSDocument instances. This is somewhat similar to the issue of sharing PDPage instances between threads in Java (not safe). It’s a specific detail of PDFBox, rather than something to do with COS in general.

Currently we do a clone() on all COS object instances when copying them to another document, because someone assumed that COS objects read from one document cannot be written to another document. However, as I think you’re trying to point out, because of on-demand read our COSStream class maintains a reference to its source document, which means that a COSStream is self-contained and the same Java COS object can be safely shared between different document instances. Indeed, the on-demand read you mention is exactly what I’m counting on.

One could imagine an alternative where a COSStream contained only an integer offset, so that simply copying a COSStream instance between documents would result in corruption when that offset is later used to look up data in the target document. Fortunately PDFBox doesn’t use that design, but some of the code we have which uses clone() works on that assumption, and I’d like to get rid of that.

Cheers
-- John

> Leonard
>
> On 1/7/15, 9:56 PM, "Andreas Lehmkuehler" <[email protected]> wrote:
>
>> Hi,
>>
>> On 07.01.2015 at 22:42, John Hewson wrote:
>>> Hi All,
>>>
>>> I’d like to bring PDFBOX-2592 to the attention of the dev mailing list.
>>>
>>> A number of users on the mailing list have asked about how to import
>>> pages from other PDFs as forms. Our current solution is LayerUtility,
>>> which depends on PDFCloneUtility.
>>>
>>> However, the design of the COS API allows for sharing of COS objects
>>> between documents (in the same thread). So there’s no need for all the
>>> copying and cloning. With only a few minor changes we could get this
>>> working robustly. It might also help simplify splitting and merging.
>>>
>>> I like this idea a lot and it’s pretty simple - any thoughts?
>> We should wait until the COSStream is refactored (split compressed and
>> uncompressed stream, optimize the data handling memory vs. file) and see
>> if your idea will still work.
>>
>>> -- John
>>
>> BR
>> Andreas Lehmkühler
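
To illustrate the distinction drawn in the reply at the top of this message (a stream that keeps a reference to its own source versus one that holds only an integer offset), here is a minimal Java sketch. None of this is PDFBox API: `Doc`, `OffsetOnlyStream` and `SelfContainedStream` are hypothetical stand-ins invented for the example, and a real COSStream is considerably more involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; these classes are NOT part of PDFBox.
final class SharedStreamSketch {

    /** Stand-in for a document: some backing bytes plus the objects it holds. */
    static final class Doc {
        final byte[] source;
        final List<Object> objects = new ArrayList<>();

        Doc(String content) {
            this.source = content.getBytes(StandardCharsets.US_ASCII);
        }
    }

    /** Fragile design: knows only an offset, so it reads from whichever document it is given. */
    static final class OffsetOnlyStream {
        final int offset;
        final int length;

        OffsetOnlyStream(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        String read(Doc doc) {
            return new String(doc.source, offset, length, StandardCharsets.US_ASCII);
        }
    }

    /** Self-contained design: keeps a reference to its origin and reads from it on demand. */
    static final class SelfContainedStream {
        private final Doc origin;
        private final int offset;
        private final int length;

        SelfContainedStream(Doc origin, int offset, int length) {
            this.origin = origin;
            this.offset = offset;
            this.length = length;
        }

        String read() {
            // Data is only pulled from the origin when asked for.
            return new String(origin.source, offset, length, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) {
        Doc a = new Doc("0123HELLO9");
        Doc b = new Doc("abcdefghij");

        // Share the same instance with a second document, no clone involved.
        SelfContainedStream shared = new SelfContainedStream(a, 4, 5);
        a.objects.add(shared);
        b.objects.add(shared);
        System.out.println(shared.read());      // prints "HELLO"

        // The offset-only variant silently reads the wrong bytes in the target document.
        OffsetOnlyStream fragile = new OffsetOnlyStream(4, 5);
        System.out.println(fragile.read(a));    // prints "HELLO"
        System.out.println(fragile.read(b));    // prints "efghi"
    }
}
```

In this sketch the shared instance stays valid in both documents because it never consults the target document for its data, which is the property the reply relies on. The usual thread-safety caveat still applies: the sketch assumes single-threaded use.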
