[jira] [Updated] (PDFBOX-5526) Apply subsampling and region to masks

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5526:
---
Fix Version/s: 3.0.0 PDFBox

> Apply subsampling and region to masks
> -
>
> Key: PDFBOX-5526
> URL: https://issues.apache.org/jira/browse/PDFBOX-5526
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.26
>Reporter: Velislava Yanchina
>Assignee: Andreas Lehmkühler
>Priority: Trivial
> Fix For: 2.0.28, 3.0.0 PDFBox
>
>
> When `PDImageXObject.getImage()` is invoked with subsampling and a region, 
> it internally loads the entire mask into memory: 
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L465]
> and applies the entire mask to the subsampled image.
> This is extra work and can cause `OOM` exceptions. 
> The proposed optimisation is to pass the `region` and `subsampling` params to 
> `PDImageXObject.getOpaqueImage()` here: 
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L548]
> so that masks are also subsampled before they are applied...
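
The effect of applying region and subsampling while an image is decoded, instead of decoding it fully first, can be tried out with plain `javax.imageio`. This is only a sketch under assumptions: the mask is stood in for by an in-memory PNG, and `SubsampledMaskRead`, `readMask` and `encodeGrayPng` are hypothetical names, not PDFBox code.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

final class SubsampledMaskRead {

    /** Decodes only the given region, keeping every subsampling-th pixel. */
    static BufferedImage readMask(byte[] encodedMask, Rectangle region, int subsampling)
            throws IOException {
        try (ImageInputStream in =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(encodedMask))) {
            ImageReader reader = ImageIO.getImageReaders(in).next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                param.setSourceRegion(region);
                param.setSourceSubsampling(subsampling, subsampling, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }

    /** Test helper (hypothetical): an empty gray image encoded as PNG. */
    static byte[] encodeGrayPng(int width, int height) throws IOException {
        ImageIO.setUseCache(false); // keep the sketch free of temp files
        BufferedImage mask = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(mask, "png", out);
        return out.toByteArray();
    }
}
```

Reading a 100x60 region of a 200x200 mask with a subsampling of 4 gives a 25x15 image, so the mask that gets applied already has the size of the subsampled image.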



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Assigned] (PDFBOX-5526) Apply subsampling and region to masks

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5526:
--

Assignee: Andreas Lehmkühler

> Apply subsampling and region to masks
> -
>
> Key: PDFBOX-5526
> URL: https://issues.apache.org/jira/browse/PDFBOX-5526
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.26
>Reporter: Velislava Yanchina
>Assignee: Andreas Lehmkühler
>Priority: Trivial
> Fix For: 2.0.28
>
>






[jira] [Updated] (PDFBOX-5479) PDFTextStripper needs 1GB heap for a 3.6 MB pdf

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5479:
---
Issue Type: Improvement  (was: Bug)

> PDFTextStripper needs 1GB heap for a 3.6 MB pdf
> ---
>
> Key: PDFBOX-5479
> URL: https://issues.apache.org/jira/browse/PDFBOX-5479
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Text extraction
>Affects Versions: 2.0.26
> Environment: JDK11.0.2 on MacOS 12.4
>Reporter: Manfred Schauer
>Priority: Minor
> Attachments: heapDump.png, x.pdf
>
>
> Extracting text from the attached x.pdf:
> try (PDDocument pdDocument = PDDocument.load(new File("/tmp/x.pdf"))) {
>     PDFTextStripper stripper = new PDFTextStripper();
>     stripper.getText(pdDocument);
> }
> succeeds with -Xmx1G but throws an OOME with -Xmx900m.
> The heap dump shows 2923 instances of TrueTypeFont; PDResources.cache contains 
> SoftReferences to lots of fonts keyed by different COSObjects.
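
The heap-dump observation can be illustrated in isolation. The following is a minimal sketch, assuming a cache that maps a key object to a softly referenced value; `SoftCacheSketch` and its methods are hypothetical and not the PDFBox cache. Every distinct key triggers its own load, so if equivalent font data is reachable under many different keys, it is parsed and held once per key until the garbage collector clears the soft references.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class SoftCacheSketch {
    private final Map<Object, SoftReference<byte[]>> cache = new HashMap<>();
    private int loads;

    /** Returns the cached value for the key, loading it when absent or collected. */
    byte[] get(Object key, Supplier<byte[]> loader) {
        SoftReference<byte[]> ref = cache.get(key);
        byte[] value = ref == null ? null : ref.get();
        if (value == null) {
            value = loader.get(); // stands in for parsing a font
            loads++;
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }

    /** Number of times the loader had to run. */
    int loads() {
        return loads;
    }
}
```

With two different keys the loader runs twice even when both loaders produce identical data; a repeated lookup with an existing key reuses the cached instance.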






[jira] [Closed] (PDFBOX-5532) COSString field non-ascii characters

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-5532.
--
Resolution: Not A Problem

Closed as this is not a bug.

Please join the [users-mailing 
list|https://pdfbox.apache.org/mailinglists.html] for further discussions.

> COSString field non-ascii characters
> 
>
> Key: PDFBOX-5532
> URL: https://issues.apache.org/jira/browse/PDFBOX-5532
> Project: PDFBox
>  Issue Type: Bug
>Reporter: David
>Priority: Major
>  Labels: how-to
>
>  
> Hello,
> I am reading a PDF document, but non-ASCII characters are being retrieved from 
> the COSString fields. What could be the cause? I am using 
> pdfbox-2.0.24.jar.
> This is an example of the parsed PDF document:
> {code}
> COSInt{50}
> COSInt{0}
> PDFOperator{Td}
> COSString{åÅÕãÁâ@}
> PDFOperator{Tj}
> COSFloat{770.18}
> COSInt{0}
> PDFOperator{Td}
> COSString{×–Ž–©@}
> PDFOperator{Tj}
> COSFloat{520.21}
> COSInt{0}
> {code}
> The Java function:
> {code}
> import java.io.IOException;
> import java.io.OutputStream;
> import java.util.List;
> import org.apache.pdfbox.contentstream.operator.Operator;
> import org.apache.pdfbox.cos.COSString;
> import org.apache.pdfbox.pdfparser.PDFStreamParser;
> import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDPageTree;
> import org.apache.pdfbox.pdmodel.common.PDStream;
>
> public static PDDocument replaceText(PDDocument document, String searchString,
>         String replacement) throws IOException {
>     PDPageTree pages = document.getDocumentCatalog().getPages();
>     for (PDPage page : pages) {
>         PDFStreamParser parser = new PDFStreamParser(page);
>         parser.parse();
>         List<Object> tokens = parser.getTokens();
>         for (int j = 1; j < tokens.size(); j++) {
>             Object next = tokens.get(j);
>             // "Tj" shows a string; its operand is the token right before it.
>             if (next instanceof Operator
>                     && "Tj".equals(((Operator) next).getName())
>                     && tokens.get(j - 1) instanceof COSString) {
>                 COSString previous = (COSString) tokens.get(j - 1);
>                 String string = previous.getString();
>                 System.out.println("previous:=" + string);
>                 if (string.equals(searchString)) {
>                     previous.setValue(new COSString(replacement).getBytes());
>                 }
>             }
>         }
>         // Now that the tokens are updated, replace the page content stream.
>         PDStream updatedStream = new PDStream(document);
>         try (OutputStream out = updatedStream.createOutputStream()) {
>             new ContentStreamWriter(out).writeTokens(tokens);
>         }
>         page.setContents(updatedStream);
>     }
>     return document;
> }
> {code}
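
As general PDF background to the question: the operand of `Tj` is a sequence of character codes that are interpreted through the current font's encoding, so printing the raw bytes as text can show unrelated characters. A minimal sketch of that effect follows; the code table `FONT_ENCODING` is made up for illustration and is not taken from the reporter's file, and `CharCodeSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class CharCodeSketch {
    // Hypothetical font encoding: character code -> text of the glyph it selects.
    static final Map<Integer, String> FONT_ENCODING =
            Map.of(0xC8, "H", 0xC5, "E", 0xD3, "L", 0xD6, "O");

    /** Interprets each byte as a character code of the (hypothetical) font. */
    static String decodeWithFont(byte[] codes) {
        StringBuilder text = new StringBuilder();
        for (byte code : codes) {
            text.append(FONT_ENCODING.getOrDefault(code & 0xFF, "\uFFFD"));
        }
        return text.toString();
    }

    /** Interprets the same bytes as ISO-8859-1 text, ignoring the font. */
    static String decodeAsLatin1(byte[] codes) {
        return new String(codes, StandardCharsets.ISO_8859_1);
    }
}
```

For the bytes `C8 C5 D3 D3 D6`, the font-aware path yields `HELLO`, while the Latin-1 path yields accented letters. Comparing such a string with a plain-text search string therefore only works when the font's encoding happens to match.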






[jira] [Updated] (PDFBOX-5532) COSString field non-ascii characters

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5532:
---
Labels: how-to  (was: )

> COSString field non-ascii characters
> 
>
> Key: PDFBOX-5532
> URL: https://issues.apache.org/jira/browse/PDFBOX-5532
> Project: PDFBox
>  Issue Type: Bug
>Reporter: David
>Priority: Major
>  Labels: how-to
>
>  






[jira] [Closed] (PDFBOX-5531) wrong image data is extracted from PDF having single image

2022-11-01 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5531.
---
Resolution: Cannot Reproduce

> wrong image data is extracted from PDF having single image
> --
>
> Key: PDFBOX-5531
> URL: https://issues.apache.org/jira/browse/PDFBOX-5531
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.26
>Reporter: Komal
>Priority: Major
>
> Dear Concerned,
> We are trying to extract an image from a PDF that contains a single image with 
> the following properties: CCITTFaxDecode (G4 compression), 150 dpi. But when the 
> following PDFBox code is used, we get an LZW image with 96 dpi:
>     PDDocument document = PDDocument.load(
>             new File("D:\\extractImage\\in\\20211125174048BT Exception Documents.pdf"));
>     PDPageTree list = document.getPages();
>     for (PDPage page : list) {
>         PDResources pdResources = page.getResources();
>         for (COSName c : pdResources.getXObjectNames()) {
>             PDXObject o = pdResources.getXObject(c);
>             if (o instanceof PDImageXObject) {
>                 BufferedImage img = ((PDImageXObject) o).getImage();
>             }
>         }
>     }
>  
> Also, when we try to get the raw stream byte data of the image using the 
> following method, the resulting byte array is incorrect:
>     PDPage page1 = reader.getPage(pageNumber - 1);
>     PDResources pdResources = page1.getResources();
>     for (COSName c : pdResources.getXObjectNames()) {
>         PDXObject o = pdResources.getXObject(c);
>         PDImageXObject ob = (PDImageXObject) o;
>         ImageXObject xObj1 = new ImageXObject();
>         xObj1.xObject = (PDImageXObject) o;
>         COSStream imageStream = ob.getCOSObject();
>         PDStream stream = new PDStream(imageStream);
>         // BufferedImage image = ob.getImage();
>         byte[] streamDataBuffer = stream.toByteArray();
>     }
>  
> Kindly provide a method that returns a black-and-white image object and the 
> raw stream byte array of the image.
> Thanks in advance.
> Regards,
> Komal Walia






PDFBOX-5538: introduction of a new interface/functional interface to handle a stream cache

2022-11-01 Thread Andreas Lehmkuehler

Hi,

the new on-demand parser doesn't use the ScratchFileBuffer anymore, and my idea 
was to overhaul/remove the usage of the ScratchFileBuffer for the creation of 
new COSStreams as well. My plan was to wait for the 4.0 release. A couple of 
days ago I realized it might be a good idea to introduce an interface to make the 
usage of that stream a little bit more flexible. And I'd like to do it in 3.0.


In the end it wasn't that hard to do, see PDFBOX-5538 for details. But I had to 
adjust some of the public method signatures such as those load-methods using 
MemoryUsageSetting as parameter.


The main benefit is that the usage of the stream cache within the parser is 
decoupled from the actual implementation. Furthermore users have the opportunity 
to implement their own cache. Most likely we don't have to change the signature 
of the loader methods in a possible 4.0 version.
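
The decoupling described here can be sketched roughly as follows. This is an illustration under assumptions, not the code committed for PDFBOX-5538: `StreamCacheSketch`, `CreateFunction`, `InMemoryCacheSketch` and `LoaderSketch` are hypothetical names, and a `ByteArrayOutputStream` stands in for whatever buffer type the real interface hands out.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cache abstraction: callers only ask for buffers. */
interface StreamCacheSketch extends Closeable {
    ByteArrayOutputStream createBuffer() throws IOException;

    /** Hypothetical functional interface that creates a cache on demand. */
    @FunctionalInterface
    interface CreateFunction {
        StreamCacheSketch create() throws IOException;
    }
}

/** One possible user-supplied implementation that keeps everything in memory. */
final class InMemoryCacheSketch implements StreamCacheSketch {
    private final List<ByteArrayOutputStream> buffers = new ArrayList<>();
    private boolean closed;

    @Override
    public ByteArrayOutputStream createBuffer() throws IOException {
        if (closed) {
            throw new IOException("cache already closed");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffers.add(buffer);
        return buffer;
    }

    @Override
    public void close() {
        closed = true;
        buffers.clear();
    }
}

final class LoaderSketch {
    /** Depends only on the functional interface, not on a concrete cache. */
    static int copyThroughCache(byte[] data, StreamCacheSketch.CreateFunction cacheFunction)
            throws IOException {
        try (StreamCacheSketch cache = cacheFunction.create()) {
            ByteArrayOutputStream buffer = cache.createBuffer();
            buffer.write(data);
            return buffer.size();
        }
    }
}
```

A caller would pass a constructor reference such as `InMemoryCacheSketch::new`, so swapping the cache implementation does not require another change to the loading method's signature.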



Andreas




[jira] [Comment Edited] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF

2022-11-01 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627181#comment-17627181
 ] 

Andreas Lehmkühler edited comment on PDFBOX-5483 at 11/1/22 2:18 PM:
-

I've refactored some of the ScratchFile-stuff and had to adjust the method 
signatures of the Loader, see PDFBOX-5538
I'd prefer to do it now instead of waiting for 4.0 to change the signature 
another time.



was (Author: lehmi):
I've refactored some of the ScratchFile-stuff and had to adjust the method 
signatures of the Loader, see PDFBOX-5538

> Replace methods using an InputStream from Loader.loadPDF
> 
>
> Key: PDFBOX-5483
> URL: https://issues.apache.org/jira/browse/PDFBOX-5483
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> As discussed on dev@pdfbox
> {quote}
> We have to remove the loadPDF variants using InputStream and replace them 
> with RandomAccessRead.
> When it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile or 
> RandomAccessReadMemoryMappedFile
> This would make it more transparent what happens under the hood when using 
> the different kinds of loadPDF methods:
> * a byte array as source is already in memory and the obvious choice is to 
> use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file and the most obvious choice is to use 
> RandomAccessReadBufferedFile as a wrapper. We should document that 
> RandomAccessReadMemoryMappedFile is offered as an alternative in this case
> * RandomAccessRead as source is the most obvious one and the user decides how 
> to create it. Additionally, it is possible to implement a custom caching 
> and/or loading mechanism
> {quote}
> see PDFBOX-5462 and [High memory usage with pdfbox 
> 3|https://lists.apache.org/thread/6mmgp23v8b2yztj4hghkgkd14s1gzs8g] as well
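
The two options for an InputStream listed in the quote can be sketched with plain JDK calls. `InputStreamSources` and its methods are hypothetical helper names; the PDFBox wrappers named in the issue appear only in comments, since their constructors are not shown in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class InputStreamSources {

    /** Option 1: copy the stream to memory (candidate for RandomAccessReadBuffer). */
    static byte[] toMemory(InputStream in) throws IOException {
        return in.readAllBytes(); // requires Java 9 or later
    }

    /**
     * Option 2: copy the stream to a temporary file (candidate for
     * RandomAccessReadBufferedFile or RandomAccessReadMemoryMappedFile).
     * The caller is responsible for deleting the file afterwards.
     */
    static Path toTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-source-", ".pdf");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The first variant trades heap for speed, the second keeps the heap small at the cost of disk I/O; making that choice explicit is the point of the proposal.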






[jira] [Commented] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF

2022-11-01 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627181#comment-17627181
 ] 

Andreas Lehmkühler commented on PDFBOX-5483:


I've refactored some of the ScratchFile-stuff and had to adjust the method 
signatures of the Loader, see PDFBOX-5538

> Replace methods using an InputStream from Loader.loadPDF
> 
>
> Key: PDFBOX-5483
> URL: https://issues.apache.org/jira/browse/PDFBOX-5483
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches

2022-11-01 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627179#comment-17627179
 ] 

Andreas Lehmkühler commented on PDFBOX-5538:


I've replaced all occurrences of 
ScratchFile/ScratchFileBuffer/MemoryUsageSettings outside of 
org.apache.pdfbox.io with the new interface/functional interface.

> Introduce interface/functional interface to handle stream caches
> 
>
> Key: PDFBOX-5538
> URL: https://issues.apache.org/jira/browse/PDFBOX-5538
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing, Writing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> The parser of PDFBox uses the current implementation of 
> ScratchFile/ScratchFileBuffer exclusively as cache for streams when 
> creating/writing COSStreams.
> Using an interface to handle those caches 
> * makes the parser code of PDFBox independent of a specific implementation of 
> that cache
> * gives any user the opportunity to implement some other kind of a cache
> * the current implementation of ScratchFile/ScratchFileBuffer can still be 
> used as default






[jira] [Resolved] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5538.

Resolution: Fixed

> Introduce interface/functional interface to handle stream caches
> 
>
> Key: PDFBOX-5538
> URL: https://issues.apache.org/jira/browse/PDFBOX-5538
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing, Writing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627178#comment-17627178
 ] 

ASF subversion and git services commented on PDFBOX-5538:
-

Commit 1904976 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904976 ]

PDFBOX-5538: update comment

> Introduce interface/functional interface to handle stream caches
> 
>
> Key: PDFBOX-5538
> URL: https://issues.apache.org/jira/browse/PDFBOX-5538
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing, Writing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Resolved] (PDFBOX-5534) Remove finalize from ScratchFileBuffer

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5534.

Resolution: Fixed

> Remove finalize from ScratchFileBuffer
> --
>
> Key: PDFBOX-5534
> URL: https://issues.apache.org/jira/browse/PDFBOX-5534
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 2.0.27, 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.28, 3.0.0 PDFBox
>
>
> The use of finalize is discouraged, so it is a good idea to remove it.
> I've found a way to do so for ScratchFileBuffer. All created buffers are 
> collected within ScratchFile. Once a buffer is closed, it is removed from the 
> collection. When ScratchFile is closed on closing the PDF, all remaining 
> buffers are closed as well.
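
The bookkeeping described above can be sketched like this. It is a simplified, single-threaded illustration, not the committed change; `OwnerSketch` plays the role of ScratchFile and `BufferSketch` the role of ScratchFileBuffer, and both names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

/** Owner that tracks every buffer it hands out instead of relying on finalize. */
final class OwnerSketch implements AutoCloseable {
    private final Set<BufferSketch> openBuffers = new LinkedHashSet<>();

    BufferSketch createBuffer() {
        BufferSketch buffer = new BufferSketch(this);
        openBuffers.add(buffer);
        return buffer;
    }

    /** Called by a buffer when it is closed. */
    void release(BufferSketch buffer) {
        openBuffers.remove(buffer);
    }

    int openCount() {
        return openBuffers.size();
    }

    @Override
    public void close() {
        // Iterate over a copy because each close() removes the buffer from the set.
        for (BufferSketch buffer : new ArrayList<>(openBuffers)) {
            buffer.close();
        }
    }
}

final class BufferSketch implements AutoCloseable {
    private final OwnerSketch owner;
    private boolean closed;

    BufferSketch(OwnerSketch owner) {
        this.owner = owner;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice is harmless
        }
        closed = true;
        owner.release(this);
    }
}
```

Cleanup becomes deterministic this way: a buffer that is never closed explicitly is still released when its owner is closed, without any finalizer.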






[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627174#comment-17627174
 ] 

ASF subversion and git services commented on PDFBOX-5538:
-

Commit 1904975 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904975 ]

PDFBOX-5538: use introduced interface/functional interface to handle stream 
cache

> Introduce interface/functional interface to handle stream caches
> 
>
> Key: PDFBOX-5538
> URL: https://issues.apache.org/jira/browse/PDFBOX-5538
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing, Writing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627150#comment-17627150
 ] 

ASF subversion and git services commented on PDFBOX-5538:
-

Commit 1904974 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904974 ]

PDFBOX-5538: introduce new interface RandomAccessStreamCache including 
functional interface StreamCacheCreateFunction

> Introduce interface/functional interface to handle stream caches
> 
>
> Key: PDFBOX-5538
> URL: https://issues.apache.org/jira/browse/PDFBOX-5538
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing, Writing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Created] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches

2022-11-01 Thread Jira
Andreas Lehmkühler created PDFBOX-5538:
--

 Summary: Introduce interface/functional interface to handle stream 
caches
 Key: PDFBOX-5538
 URL: https://issues.apache.org/jira/browse/PDFBOX-5538
 Project: PDFBox
  Issue Type: Improvement
  Components: Parsing, Writing
Affects Versions: 3.0.0 PDFBox
Reporter: Andreas Lehmkühler
Assignee: Andreas Lehmkühler
 Fix For: 3.0.0 PDFBox


The parser of PDFBox uses the current implementation of 
ScratchFile/ScratchFileBuffer exclusively as cache for streams when 
creating/writing COSStreams.

Using an interface to handle those caches 
* makes the parser code of PDFBox independent of a specific implementation of 
that cache
* gives any user the opportunity to implement some other kind of a cache
* the current implementation of ScratchFile/ScratchFileBuffer can still be used 
as default






[jira] [Resolved] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no longer needed

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5537.

Resolution: Fixed

> PDFMergerUtility: partioned memory settings no longer needed
> 
>
> Key: PDFBOX-5537
> URL: https://issues.apache.org/jira/browse/PDFBOX-5537
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> PDFBox no longer uses a ScratchFileBuffer when reading a PDF, so it is 
> safe to remove the partitioned memory settings from the legacy merge mode.






[jira] [Commented] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no longer needed

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627063#comment-17627063
 ] 

ASF subversion and git services commented on PDFBOX-5537:
-

Commit 1904969 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904969 ]

PDFBOX-5537: remove partioned memory settings

> PDFMergerUtility: partioned memory settings no longer needed
> 
>
> Key: PDFBOX-5537
> URL: https://issues.apache.org/jira/browse/PDFBOX-5537
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Updated] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no longer needed

2022-11-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5537:
---
Summary: PDFMergerUtility: partioned memory settings no longer needed  
(was: PDFMergerUtility: partioned memory settings no lomnger needed)

> PDFMergerUtility: partioned memory settings no longer needed
> 
>
> Key: PDFBOX-5537
> URL: https://issues.apache.org/jira/browse/PDFBOX-5537
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>






[jira] [Created] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no lomnger needed

2022-11-01 Thread Jira
Andreas Lehmkühler created PDFBOX-5537:
--

 Summary: PDFMergerUtility: partioned memory settings no lomnger 
needed
 Key: PDFBOX-5537
 URL: https://issues.apache.org/jira/browse/PDFBOX-5537
 Project: PDFBox
  Issue Type: Improvement
  Components: Utilities
Affects Versions: 3.0.0 PDFBox
Reporter: Andreas Lehmkühler
Assignee: Andreas Lehmkühler
 Fix For: 3.0.0 PDFBox


PDFBox no longer uses a ScratchFileBuffer when reading a PDF, so it is safe 
to remove the partitioned memory settings from the legacy merge mode.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627028#comment-17627028
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1904968 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904968 ]

PDFBOX-4892: removed obviously mixed up method

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for the task of improving code quality by using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627026#comment-17627026
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1904967 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904967 ]

PDFBOX-4892: revert accidentically committed code

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-11-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627025#comment-17627025
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1904966 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904966 ]

PDFBOX-4892: removed obviously mixed up method

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>


