P.S.: thank you for having investigated and reported this!
Tilman
On 01.02.2024 16:06, Tilman Hausherr wrote:
Oh. I had looked at the trunk and not at 3.0. That was likely a
mistake in refactoring. Fixed in
https://issues.apache.org/jira/browse/PDFBOX-5757
and you can get a snapshot here
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/
Tilman
On 01.02.2024 15:25, Lars Juel Jensen wrote:
That is weird. The source file I am looking at for version 3.0.1 does
not pass it:
-->
https://github.com/apache/pdfbox/blob/3.0.1/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L91
On Wed, Jan 31, 2024 at 4:57 PM Tilman Hausherr <thaush...@t-online.de> wrote:
On 31.01.2024 16:19, Lars Juel Jensen wrote:
Well, that's my problem. It works with PDFBox 2 for reasonably sized
files, but it crashes on the big ones. Reading the migration guide for
PDFBox 3.0, I thought I saw some light at the end of the tunnel, as it
says I can create my own reader and stream cache. I see that I can
provide my own RandomAccessRead when I call Loader.loadPDF, but the
loadPDF method that takes a StreamCacheCreateFunction does not work as
promised, because the StreamCacheCreateFunction is not passed from
PDFParser to COSParser in the PDFParser constructor. This works in
v3.0.0, but not in v3.0.1. I guess this is a bug?
I don't know if there is a bug, but it is passed:
    public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore,
            String alias, StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
    {
        super(source, decryptionPassword, keyStore, alias, streamCacheCreateFunction);
    }
and here's COSParser:
    public COSParser(RandomAccessRead source, String password, InputStream keyStore,
            String keyAlias, StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
    {
        super(source);
        this.password = password;
        this.keyAlias = keyAlias;
        fileLen = source.length();
        keyStoreInputStream = keyStore;
        init(streamCacheCreateFunction);
    }
If you think 3.0.1 has a bigger memory footprint than 3.0.0, can you
create a scenario to reproduce this? Preferably without using a
container.
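One way to check whether a custom factory is actually reached through a
constructor chain like the one quoted above is to wrap it in a counter.
The following is only a minimal sketch in plain Java, not PDFBox code:
java.util.function.Supplier stands in for StreamCacheCreateFunction,
ByteArrayOutputStream stands in for the cache, and the class names
CountingFactory, BaseParserSketch, ForwardingParserSketch and
DroppingParserSketch are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Wraps a factory and counts how often it is invoked. */
final class CountingFactory<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final AtomicInteger calls = new AtomicInteger();

    CountingFactory(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        calls.incrementAndGet();
        return delegate.get();
    }

    int calls() {
        return calls.get();
    }
}

/** Hypothetical base parser: uses a default cache when no factory is given. */
class BaseParserSketch {
    final ByteArrayOutputStream cache;

    BaseParserSketch(Supplier<ByteArrayOutputStream> cacheFactory) {
        if (cacheFactory != null) {
            cache = cacheFactory.get();
        } else {
            // Silent fallback: the caller never learns its factory was ignored.
            cache = new ByteArrayOutputStream();
        }
    }
}

/** Forwards the factory to the base class. */
final class ForwardingParserSketch extends BaseParserSketch {
    ForwardingParserSketch(Supplier<ByteArrayOutputStream> cacheFactory) {
        super(cacheFactory);
    }
}

/** Drops the factory on the way up, so the base class uses its default. */
final class DroppingParserSketch extends BaseParserSketch {
    DroppingParserSketch(Supplier<ByteArrayOutputStream> cacheFactory) {
        super(null);
    }
}
```

With the forwarding variant the counter ends up at 1, with the dropping
variant it stays at 0. The same counting idea should carry over to the
function handed to Loader.loadPDF, which would give a small reproduction
that does not need a container.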
Tilman
On Wed, Jan 31, 2024 at 3:46 PM Tilman Hausherr <thaush...@t-online.de> wrote:
On 31.01.2024 14:48, Lars Juel Jensen wrote:
This creates another problem for me. I am running PDFBox in an
on-premises Kubernetes cluster with limited resources. I cannot set up
persistent volume claims or ephemeral volumes, and I cannot change how
my pods are started. All I have is an emptyDir mounted on /tmp, where
the temporary files go. The emptyDir is mapped to a portion of the
Kubernetes node's memory, and this memory is shared with many other
services. All in all, I need to keep a very low memory and temp-file
footprint, hence the InputStream. Using RandomAccessReadBuffer with an
InputStream loads the entire PDF into memory, and I can encounter PDF
documents that are over 1 GB in size, so loading everything into memory
is not an option.
You can try to create your own class implementing RandomAccessRead.
If your /tmp is mapped to main memory, then it doesn't make sense to
use a temp file at all; you're just wasting time.
Btw, PDFBox 2 was also loading the whole PDF file into memory (or into
a scratch file) and had an even bigger footprint because it was also
parsing the complete PDF. So if your project was working with PDFBox 2,
then it should work with PDFBox 3.
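As an illustration of what a custom reader could be built on, here is a
minimal sketch in plain Java that spools an InputStream to a file and
then serves reads at arbitrary offsets, so the heap only ever holds
small buffers. It is not a RandomAccessRead implementation: the class
name SpooledInput and its methods are hypothetical, and wiring it into
the PDFBox interface is left out. It only helps if the target directory
is backed by real disk rather than by memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Spools a stream to a file in a chosen directory and reads it back by offset. */
final class SpooledInput implements AutoCloseable {
    private final Path file;
    private final RandomAccessFile raf;

    SpooledInput(InputStream in, Path directory) throws IOException {
        this.file = spool(in, directory);
        this.raf = new RandomAccessFile(file.toFile(), "r");
    }

    /** Copies the stream to a new temp file; the copy works through a small buffer. */
    private static Path spool(InputStream in, Path directory) throws IOException {
        Path target = Files.createTempFile(directory, "spool", ".pdf");
        try {
            // createTempFile already created the file, so it has to be replaced.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            Files.deleteIfExists(target);
            throw e;
        }
    }

    /** Total number of bytes that were spooled. */
    long length() throws IOException {
        return raf.length();
    }

    /** Reads exactly len bytes starting at offset; throws EOFException if fewer remain. */
    byte[] readFully(long offset, int len) throws IOException {
        byte[] buf = new byte[len];
        raf.seek(offset);
        raf.readFully(buf);
        return buf;
    }

    /** Closes the file handle and removes the spooled file. */
    @Override
    public void close() throws IOException {
        try {
            raf.close();
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The directory is a constructor parameter on purpose, so the spool file
can be placed on whatever disk-backed location is available instead of
the default temp directory.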
Tilman
On Wed, Jan 31, 2024 at 10:10 AM Tilman Hausherr <thaush...@t-online.de> wrote:
On 31.01.2024 09:50, Lars Juel Jensen wrote:
In PDFBox 2 I could do:
    PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())
But there is no equivalent to this in PDFBox 3. How do I read a PDF
from an InputStream?
    Loader.loadPDF(new RandomAccessReadBuffer(inputStream),
            IOUtils.createTempFileOnlyStreamCache());
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org