Re: Replace methods using an InputStream from Loader.loadPDF

sahy...@fileaffairs.de Sun, 31 Jul 2022 06:31:58 -0700

Hi,

I'm very much in favour of simplifying as much as possible and not doing
too much magic under the hood that can be better handled individually
by a developer. This will also leave room for developers to come up
with an optimized version for specific use cases.


+1 from my side.

BR
Maruan


On Sunday, 31.07.2022 at 15:18 +0200, Andreas Lehmkuehler wrote:
> Hi fellow devs,
> 
> 
> there was a discussion on JIRA [1] about the changed behaviour of the
> parser due to the removal of the ScratchFileBuffer when reading a PDF.
> 
> Additionally, there was the post "High memory usage with pdfbox 3" on
> users@pdfbox targeting the very same topic.
> 
> After explaining myself and my changes twice, I came to the conclusion
> that I will have to do so again and again in the future if we don't
> change the API of Loader.loadPDF.
> 
> People simply realize that all methods used for loading a PDF have been
> moved from PDDocument to Loader. They expect the very same behaviour when
> using a similar API, and that is understandable from a user's point of view.
> 
> We have to remove the loadPDF variants using InputStream and replace
> them with RandomAccessRead.
> 
> When it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile or
>   RandomAccessReadMemoryMappedFile
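A minimal sketch of the two InputStream options above, using only the JDK. The helper class StreamMaterializer is hypothetical, and the PDFBox 3 wrapper constructors named in the comments (RandomAccessReadBuffer(byte[]), RandomAccessReadBufferedFile(File), RandomAccessReadMemoryMappedFile(File)) are assumptions to check against the Javadoc of the release in use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper: materializes an InputStream either in memory or in a
 * temporary file before it is handed to PDFBox (PDFBox calls only in comments).
 */
public final class StreamMaterializer {

    private StreamMaterializer() {
    }

    /**
     * Option 1: copy the whole stream into memory (Java 9+). The array could
     * then be wrapped, e.g. new RandomAccessReadBuffer(bytes) (assumed API).
     */
    public static byte[] toMemory(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    /**
     * Option 2: spool the stream to a temporary file. The file could then be
     * opened with new RandomAccessReadBufferedFile(path.toFile()) or
     * new RandomAccessReadMemoryMappedFile(path.toFile()) (assumed API).
     * The caller is responsible for deleting the file afterwards.
     */
    public static Path toTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdfbox-", ".pdf");
        try {
            // REPLACE_EXISTING is needed because createTempFile already created the file.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave a partial file behind
            throw e;
        }
        return tmp;
    }
}
```

The in-memory copy costs heap proportional to the size of the PDF; the temporary file trades that for disk I/O and the need to clean up afterwards.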
> 
> This would make it more transparent what happens under the hood when
> using the different kinds of loadPDF methods:
> 
> * a byte array as source is already in memory, and the obvious choice is
>   to use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file, and the most obvious choice is
>   to use RandomAccessReadBufferedFile as a wrapper. We should document
>   that RandomAccessReadMemoryMappedFile is offered as an alternative in
>   this case
> * RandomAccessRead as source is the most obvious one, and the user
>   decides how to create it. Additionally, it is possible to implement
>   one's own caching and/or loading mechanism
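As an illustration of such a user-side mechanism, here is a sketch of one hypothetical policy: keep small inputs in memory and spool large ones to a temporary file. The class ThresholdSpooler, its threshold parameter and the Spooled result type are all invented for this example and are not part of PDFBox; the wrapper constructors in the comments are assumed, as above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical policy: small inputs stay in memory, large ones go to a temp file. */
public final class ThresholdSpooler {

    /** Result holder: exactly one of bytes / file is non-null. */
    public static final class Spooled {
        public final byte[] bytes;
        public final Path file;

        private Spooled(byte[] bytes, Path file) {
            this.bytes = bytes;
            this.file = file;
        }

        public boolean inMemory() {
            return bytes != null;
        }
    }

    private ThresholdSpooler() {
    }

    public static Spooled spool(InputStream in, int maxInMemory) throws IOException {
        if (maxInMemory < 0 || maxInMemory == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxInMemory out of range: " + maxInMemory);
        }
        // Read at most maxInMemory + 1 bytes (Java 11+) to learn whether the input is "small".
        byte[] head = in.readNBytes(maxInMemory + 1);
        if (head.length <= maxInMemory) {
            // Stream ended within the limit: keep it in memory,
            // e.g. for new RandomAccessReadBuffer(bytes) (assumed API).
            return new Spooled(head, null);
        }
        // Too large: write the bytes already read, followed by the rest, to a temp file,
        // e.g. for new RandomAccessReadBufferedFile(file.toFile()) (assumed API).
        Path tmp = Files.createTempFile("pdfbox-", ".pdf");
        // Deliberately not closed: closing a SequenceInputStream also closes the caller's stream.
        InputStream whole = new SequenceInputStream(new ByteArrayInputStream(head), in);
        try {
            Files.copy(whole, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return new Spooled(null, tmp);
    }
}
```

Reading one byte past the threshold is what distinguishes "exactly at the limit" from "larger than the limit" without knowing the stream length up front.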
> 
> I know this will lead to some changes in our users' codebases, but they
> have to make changes in any case because the method was moved, so why not
> change the data type as well?
> 
> 
> WDYT? Am I missing something?
> 
> Andreas
> 
> [1] https://issues.apache.org/jira/browse/PDFBOX-5462
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: dev-h...@pdfbox.apache.org
> 

-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827
