On 01.08.22 at 20:20, Tilman Hausherr wrote:
+1 but
- the explanation below (when to use which class) should be in the javadoc
- the removal should be in the migration guide
It is already on my TODO list
Andreas
Tilman
On 31.07.2022 at 15:18, Andreas Lehmkuehler wrote:
Hi fellow devs,
there was a discussion on JIRA [1] about the changed behaviour of the parser
due to the removal of the ScratchFileBuffer when reading a pdf.
Additionally there was the post "High memory usage with pdfbox 3" on
users@pdfbox targeting the very same topic.
After explaining myself and my changes twice, I came to the conclusion that I'm
going to have to do so again and again in the future if we don't change the
API of Loader.loadPDF.
People simply notice that all methods used for loading a PDF have moved
from PDDocument to Loader. They expect the very same behaviour when using a
similar API, and that is understandable from a user's point of view.
We have to remove the loadPDF variants using InputStream and replace them with
RandomAccessRead.
When it comes to InputStreams, users have to decide how to proceed:
* copy the InputStream to memory by using RandomAccessReadBuffer
* copy the InputStream to a file and use RandomAccessReadBufferedFile or
RandomAccessReadMemoryMappedFile
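The two choices above can be sketched with plain JDK calls. This is only an illustration of the copy step, not PDFBox code: the class and method names are made up for the example, and wrapping the resulting byte array or file in RandomAccessReadBuffer or RandomAccessReadBufferedFile is left out so that the sketch runs without PDFBox on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of PDFBox.
class StreamBufferingSketch {

    // Option 1: copy the whole stream into memory.
    // Heap usage grows with the size of the input.
    static byte[] copyToMemory(InputStream in) throws IOException {
        return in.readAllBytes(); // available since Java 9
    }

    // Option 2: copy the stream to a temporary file.
    // Heap usage stays small; the caller has to delete the file afterwards.
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input-", ".pdf");
        // createTempFile already created the file, so it must be replaced
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "%PDF-1.7 sample".getBytes(StandardCharsets.US_ASCII);

        byte[] inMemory = copyToMemory(new ByteArrayInputStream(sample));
        Path onDisk = copyToTempFile(new ByteArrayInputStream(sample));

        System.out.println(inMemory.length + " bytes in memory, "
                + Files.size(onDisk) + " bytes on disk");
        Files.delete(onDisk);
    }
}
```

The point of making the user pick one of these explicitly is that the memory cost is visible at the call site instead of being hidden behind an InputStream overload.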
This would make it more transparent what happens under the hood when using the
different kinds of loadPDF methods:
* a byte array as source is already in memory and the obvious choice is to use
RandomAccessReadBuffer as a wrapper
* a file as source targets a local file and the most obvious choice is to use
RandomAccessReadBufferedFile as a wrapper. We should document that
RandomAccessReadMemoryMappedFile is offered as an alternative in this case
* RandomAccessRead as source is the most obvious one and the user decides how
to create it. Additionally, it is possible to implement a custom caching
and/or loading mechanism
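To make the difference between the two file-based options more concrete, here is a JDK-only sketch of the two access styles their names suggest: seek-and-read through RandomAccessFile versus a memory-mapped FileChannel. That the PDFBox classes work this way internally is an assumption based on their names, and the class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not part of PDFBox.
class FileAccessSketch {

    // Seek-and-read access: only the requested byte is pulled into the heap.
    // Returns -1 if the position is at or beyond the end of the file.
    static int readByteWithSeek(Path file, long position) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);
            return raf.read();
        }
    }

    // Memory-mapped access: the operating system pages the file in on demand.
    // A single mapping is limited to 2 GB, hence the checked int conversion.
    static int readByteMapped(Path file, long position) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return map.get(Math.toIntExact(position)) & 0xFF;
        }
    }
}
```

Both methods return the same byte for a valid position; they differ in where the data lives while it is being read, which is the trade-off the javadoc would need to explain.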
I know, this will lead to some changes in the codebase of our users, but they
have to do it in any case as the method was moved, so why not change the data
type as well?
WDYT? Am I missing something?
Andreas
[1] https://issues.apache.org/jira/browse/PDFBOX-5462
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org