Replace methods using an InputStream from Loader.loadPDF

Andreas Lehmkuehler Sun, 31 Jul 2022 06:18:33 -0700

Hi fellow devs,

there was a discussion on JIRA [1] about the changed behaviour of the parser due to the removal of the ScratchFileBuffer when reading a PDF.

Additionally, there was the post "High memory usage with pdfbox 3" on users@pdfbox targeting the very same topic.

After explaining myself and my changes twice, I came to the conclusion that I'm going to have to do so again and again in the future if we don't change the API of Loader.loadPDF.

People simply notice that all methods used for loading a PDF have moved from PDDocument to Loader. They expect the very same behaviour when using a similar API, and that is understandable from a user's point of view.

We have to remove the loadPDF variants using InputStream and replace them with RandomAccessRead.


When it comes to InputStreams, users have to decide how to proceed:
* copy the InputStream to memory by using RandomAccessReadBuffer

* copy the InputStream to a file and use RandomAccessReadBufferedFile or RandomAccessReadMemoryMappedFile
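
As a rough sketch of those two options, using only the JDK: the class and method names below (StreamSources, toMemory, toTempFile) are made up for illustration and are not part of PDFBox. The resulting byte array or file would then be handed to the matching RandomAccessRead wrapper named in the bullets above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox: it only shows the two ways a
// caller could materialize an InputStream before loading the PDF.
final class StreamSources {

    // Option 1: copy the whole stream into memory (requires Java 9+).
    // The byte array could then be wrapped by RandomAccessReadBuffer.
    static byte[] toMemory(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Option 2: spool the stream to a temporary file. The file could then
    // be wrapped by RandomAccessReadBufferedFile or
    // RandomAccessReadMemoryMappedFile. The caller must delete the file.
    static Path toTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input-", ".pdf");
        // createTempFile already created an empty file, so overwrite it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for a real PDF stream.
        byte[] data = "%PDF-1.7 sample".getBytes(StandardCharsets.US_ASCII);

        byte[] inMemory = toMemory(new ByteArrayInputStream(data));
        System.out.println("in memory: " + inMemory.length + " bytes");

        Path onDisk = toTempFile(new ByteArrayInputStream(data));
        try {
            System.out.println("on disk: " + Files.size(onDisk) + " bytes");
        } finally {
            Files.deleteIfExists(onDisk);
        }
    }
}
```

The point of making the caller pick one of these explicitly is that the memory cost (option 1) or the temp-file lifecycle (option 2) becomes visible in user code instead of being hidden inside loadPDF.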

This would make it more transparent what happens under the hood when using the different kinds of loadPDF methods:

* a byte array as source is already in memory, and the obvious choice is to use RandomAccessReadBuffer as a wrapper

* a file as source targets a local file, and the most obvious choice is to use RandomAccessReadBufferedFile as a wrapper. We should document that RandomAccessReadMemoryMappedFile is offered as an alternative in this case

* RandomAccessRead as source is the most obvious one, and the user decides how to create it. Additionally, it is possible to implement one's own caching and/or loading mechanism

I know this will lead to some changes in the codebase of our users, but they have to change it in any case because the method was moved, so why not change the data type as well?



WDYT? Am I missing something?

Andreas

[1] https://issues.apache.org/jira/browse/PDFBOX-5462
