[jira] [Commented] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF

Michael Klink (Jira) Wed, 03 Aug 2022 02:26:08 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574627#comment-17574627
 ]


Michael Klink commented on PDFBOX-5483:
---------------------------------------

Indeed, I didn't necessarily mean keeping the original method signature but at 
least keeping it simple.

E.g. one can introduce an enumeration {{PdfCaching}} with values {{inMemory}}, 
{{inFile}}, and {{inMemoryMappedFile}}. Then one could change
{code:java}
public static PDDocument loadPDF(InputStream input) throws IOException
{code}
to
{code:java}
public static PDDocument loadPDF(InputStream input, PdfCaching pdfCaching)
        throws IOException
{code}

IMO it is friendlier and less frustrating to write
{code:java}
PDDocument pdDocument = Loader.loadPDF(inputStream, PdfCaching.inMemory);
{code}
than
{code:java}
PDDocument pdDocument =
        Loader.loadPDF(RandomAccessReadBuffer.createBufferFromStream(inputStream));
{code}
particularly since IDEs often offer completion proposals for enumeration values there.

To keep things in one place, the actual code for creating the 
{{RandomAccessRead}} for an {{InputStream}} may be a method of the enumeration.
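
A rough sketch of that last idea, under loud assumptions: {{RandomAccessSource}} below is a hypothetical two-method stand-in for {{RandomAccessRead}} so that the example compiles without PDFBox, and the method name {{open}} and the helper {{copyToTempFile}} are made up for illustration. In PDFBox itself the three values would presumably return {{RandomAccessReadBuffer}}, {{RandomAccessReadBufferedFile}} and {{RandomAccessReadMemoryMappedFile}} instead.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for PDFBox's RandomAccessRead, cut down to two
// methods so the sketch compiles without PDFBox on the classpath.
interface RandomAccessSource extends Closeable {
    long length() throws IOException;

    // Returns the byte at the given position as 0..255, or -1 if out of range.
    int readAt(long position) throws IOException;
}

// The proposed enumeration: each value knows how to cache an InputStream.
enum PdfCaching {
    inMemory {
        @Override
        RandomAccessSource open(InputStream input) throws IOException {
            byte[] data = input.readAllBytes();
            return new RandomAccessSource() {
                @Override public long length() { return data.length; }
                @Override public int readAt(long pos) {
                    return pos < 0 || pos >= data.length ? -1 : data[(int) pos] & 0xFF;
                }
                @Override public void close() { }
            };
        }
    },
    inFile {
        @Override
        RandomAccessSource open(InputStream input) throws IOException {
            Path tmp = copyToTempFile(input);
            RandomAccessFile file = new RandomAccessFile(tmp.toFile(), "r");
            return new RandomAccessSource() {
                @Override public long length() throws IOException { return file.length(); }
                @Override public int readAt(long pos) throws IOException {
                    if (pos < 0 || pos >= file.length()) {
                        return -1;
                    }
                    file.seek(pos);
                    return file.read();
                }
                @Override public void close() throws IOException {
                    file.close();
                    Files.deleteIfExists(tmp);
                }
            };
        }
    },
    inMemoryMappedFile {
        @Override
        RandomAccessSource open(InputStream input) throws IOException {
            Path tmp = copyToTempFile(input);
            // A file may not be deletable while mapped, so defer cleanup to JVM exit.
            tmp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(tmp)) {
                // The mapping stays valid after the channel is closed.
                MappedByteBuffer map =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                return new RandomAccessSource() {
                    @Override public long length() { return map.capacity(); }
                    @Override public int readAt(long pos) {
                        return pos < 0 || pos >= map.capacity() ? -1 : map.get((int) pos) & 0xFF;
                    }
                    @Override public void close() { }
                };
            }
        }
    };

    // Creates the random-access view for a stream; implemented per value.
    abstract RandomAccessSource open(InputStream input) throws IOException;

    // Shared helper (hypothetical): spools the stream into a temporary file.
    static Path copyToTempFile(InputStream input) throws IOException {
        Path tmp = Files.createTempFile("pdfcaching", ".tmp");
        Files.copy(input, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

With something like this, a {{loadPDF(InputStream, PdfCaching)}} overload would only need to call {{pdfCaching.open(input)}} and hand the result to the existing {{RandomAccessRead}} based loader, so the caching choice stays visible at the call site.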

> Replace methods using an InputStream from Loader.loadPDF
> --------------------------------------------------------
>
>                 Key: PDFBOX-5483
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5483
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>
> As discussed on dev@pdfbox
> {quote}
> We have to remove the loadPDF variants using InputStream and replace them 
> with RandomAccessRead.
> When it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile or 
> RandomAccessReadMemoryMappedFile
> This would make it more transparent what happens under the hood when using 
> the different kinds of loadPDF methods:
> * a byte array as source is already in memory and the obvious choice is to 
> use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file and the most obvious choice is to use 
> RandomAccessReadBufferedFile as a wrapper. We should document that 
> RandomAccessReadMemoryMappedFile is offered as an alternative in this case
> * RandomAccessRead as source is the most obvious one and the user decides how 
> to create it. Additionally, it is possible to implement a custom caching 
> and/or loading mechanism
> {quote}
> see PDFBOX-5462 and [High memory usage with pdfbox 
> 3|https://lists.apache.org/thread/6mmgp23v8b2yztj4hghkgkd14s1gzs8g] as well



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
