Hi,

I'm very much in favour of simplifying as much as possible and not doing too much magic under the hood that a developer can handle better individually. This will also leave room for an individual to come up with an optimized version for specific use cases.
+1 from my side.

BR
Maruan

On Sunday, 31.07.2022 at 15:18 +0200, Andreas Lehmkuehler wrote:
> Hi fellow devs,
>
> there was a discussion on JIRA [1] about the changed behaviour of the
> parser due to the removal of the ScratchFileBuffer when reading a pdf.
>
> Additionally there was the post "High memory usage with pdfbox 3" on
> users@pdfbox targeting the very same topic.
>
> After explaining myself and my changes twice I came to the conclusion
> that I'm going to have to do so in the future again and again if we
> don't change the API of Loader.loadPDF.
>
> People simply realize that all methods to be used for loading a pdf
> are moved from PDDocument to Loader. They expect the very same
> behaviour when using a similar api and that is understandable from a
> user point of view.
>
> We have to remove the loadPDF variants using InputStream and replace
> them with RandomAccessRead.
>
> If it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile
>   or RandomAccessReadMemoryMappedFile
>
> This would make it more transparent what happens under the hood when
> using the different kinds of loadPDF methods:
>
> * a byte array as source is already in memory and the obvious choice
>   is to use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file and the most obvious choice
>   is to use RandomAccessReadBufferedFile as a wrapper. We should
>   document that RandomAccessReadMemoryMappedFile is offered as an
>   alternative in this case
> * RandomAccessRead as source is the most obvious one and the user
>   decides how to create it. Additionally, it is possible to implement
>   a custom caching and/or loading mechanism
>
> I know, this will lead to some changes in the codebase of our users,
> but they have to do it in any case as the method was moved, so why not
> change the data type as well?
>
> WDYT? Am I missing something?
>
> Andreas
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-5462
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: dev-h...@pdfbox.apache.org

-- 
Maruan Sahyoun
FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen
Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de
Managing Director: Maruan Sahyoun
Commercial Register: AG Düsseldorf, HRB 53837
VAT ID: DE248275827
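
As a rough illustration of the two InputStream options listed in the quoted proposal (copy to memory, or copy to a file), here is a minimal sketch using only the JDK. The class name `StreamBuffering` and its two helper methods are hypothetical and not part of PDFBox; the hand-off to RandomAccessReadBuffer, RandomAccessReadBufferedFile or RandomAccessReadMemoryMappedFile is only indicated in comments, since the exact Loader.loadPDF signatures are what the thread is still deciding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox: shows the decision a caller
// would make explicitly once loadPDF(InputStream) is gone.
final class StreamBuffering {

    // Option 1: read the whole stream into the heap.
    // The resulting byte[] is what a caller might then wrap in
    // RandomAccessReadBuffer (assumption: see the quoted proposal).
    static byte[] toMemory(InputStream in) throws IOException {
        return in.readAllBytes(); // requires Java 9 or later
    }

    // Option 2: spill the stream to a temporary file.
    // The resulting file is what a caller might then open with
    // RandomAccessReadBufferedFile or RandomAccessReadMemoryMappedFile.
    // The caller is responsible for deleting the file afterwards.
    static Path toTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input-", ".pdf");
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for a real PDF stream.
        byte[] sample = "%PDF-1.7 placeholder".getBytes(StandardCharsets.US_ASCII);

        byte[] inMemory = toMemory(new ByteArrayInputStream(sample));
        System.out.println("bytes held in memory: " + inMemory.length);

        Path onDisk = toTempFile(new ByteArrayInputStream(sample));
        try {
            System.out.println("bytes written to file: " + Files.size(onDisk));
        } finally {
            Files.deleteIfExists(onDisk);
        }
    }
}
```

The memory variant is simplest but holds the whole document on the heap; the temp-file variant trades that for disk I/O and cleanup, which is exactly the choice the proposal wants to leave to the caller.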