Timo Boehme <timo.boe...@ontochem.com> wrote on 16 July 2012 at 18:02:
> Hi,
>
> On 16.07.2012 17:48, Andreas Lehmkuehler wrote:
> > On 10.07.2012 09:16, Timo Boehme wrote:
> >> ...
> >> looks good to me. Some mention about the preflight module which will be
> >> integrated in the next major release?
> >
> > Thanks for your comment. I added some information about preflight/xmpbox
> > as you may already have seen.
>
> Yes, thank you very much for all the time spent on administrative
> tasks/improvements on PDFBOX.
>
> In the near future I plan to improve the parser's robustness against
> broken documents by doing a first scan over the document (in case of
> parsing failure), collecting object start/end points and using them to
> repair the xref table. Seems to be necessary, at least for some PDFs. :-(
> Another task I would like to do is reducing the amount of memory needed
> by using the existing file as input stream resource instead of copying
> an object stream first to a temporary buffer (in cases where an input
> file exists).
> Maybe for this we should change from assuming we have an input stream to
> assuming we have an input file, and if we have an input stream a
> temporary file is created on the fly - WDYT?

I guess internally we have to use something abstract, and as everything is a
stream, that might be a good choice. AFAIU the current implementation, one
reason for the usage of a temporary buffer is the fact that the data is
modified (decompressing, decrypting) and we must not alter the input data.
It is perhaps a better idea to somehow split the inputstream and the
unfilteredinputstream, e.g. read from the inputstream every time an object is
dereferenced and store the (decompressed) data in the corresponding object.

> Kind regards,
> Timo

BR
Andreas Lehmkühler
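
To make the "first scan" idea from the quoted mail a bit more concrete, here is a rough sketch of how object start points could be collected from a damaged file. This is not PDFBox code: the class name XrefScanSketch and the method scanObjectOffsets are hypothetical, and the approach (a plain regex search for "<number> <generation> obj" at line starts) is only an assumption about how such a scan might look. A real implementation would also have to skip binary stream data, which can contain byte sequences that look like object headers.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of PDFBox.
final class XrefScanSketch {

    // Matches "<objNr> <genNr> obj" at the start of the input or after a line break.
    private static final Pattern OBJ_START =
            Pattern.compile("(?:^|[\\r\\n])(\\d+)[ \\t]+(\\d+)[ \\t]+obj\\b");

    // Returns a map from "objNr genNr" to the byte offset where that object starts.
    static Map<String, Long> scanObjectOffsets(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char index == byte offset.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_START.matcher(text);
        while (m.find()) {
            // A later definition of the same object overwrites an earlier one,
            // which mirrors how incremental updates supersede older objects.
            offsets.put(m.group(1) + " " + m.group(2), (long) m.start(1));
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n2 0 obj\n<< >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(scanObjectOffsets(sample));
    }
}
```

The collected offsets could then be used to write a replacement xref table when the regular parse fails. Object end points ("endobj") could be gathered the same way if needed.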