[jira] [Updated] (PDFBOX-5526) Apply subsampling and region to masks
[ https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5526:
---------------------------------------
    Fix Version/s: 3.0.0 PDFBox

> Apply subsampling and region to masks
> -------------------------------------
>
>                 Key: PDFBOX-5526
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5526
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.26
>            Reporter: Velislava Yanchina
>            Assignee: Andreas Lehmkühler
>            Priority: Trivial
>             Fix For: 2.0.28, 3.0.0 PDFBox
>
>
> When `PDImageXObject.getImage()` is invoked with subsampling and a region, it internally loads the entire mask into memory:
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L465]
> and applies the entire mask to the subsampled image.
> This is extra work and can cause `OOM` exceptions.
> The proposed optimisation is to pass the `region` and `subsampling` params to `PDImageXObject.getOpaqueImage()` here:
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L548]
> so that masks are also subsampled before they are applied...

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
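The idea behind the proposal can be sketched without PDFBox. The snippet below is only an illustration under stated assumptions: `MaskSubsampling`, `readMask` and the `IntBinaryOperator` sample source are hypothetical names, and picking every n-th sample is an assumed strategy, not necessarily what PDFBox does. It shows that only the samples of the requested region are materialised, instead of the full mask.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical illustration; not the PDFBox implementation.
final class MaskSubsampling {
    private MaskSubsampling() {}

    /** Number of samples kept along one axis: ceil(length / subsampling). */
    static int subsampledLength(int length, int subsampling) {
        return (length + subsampling - 1) / subsampling;
    }

    /**
     * Reads only every subsampling-th mask sample inside the region
     * (x, y, width, height). maskSource maps (x, y) to a sample value,
     * standing in for whatever decodes the mask data.
     */
    static int[][] readMask(IntBinaryOperator maskSource,
                            int x, int y, int width, int height, int subsampling) {
        if (subsampling < 1) {
            throw new IllegalArgumentException("subsampling must be >= 1");
        }
        int w = subsampledLength(width, subsampling);
        int h = subsampledLength(height, subsampling);
        int[][] out = new int[h][w];
        for (int row = 0; row < h; row++) {
            for (int col = 0; col < w; col++) {
                // Step through the source in strides of 'subsampling'.
                out[row][col] = maskSource.applyAsInt(
                        x + col * subsampling, y + row * subsampling);
            }
        }
        return out;
    }
}
```

With a subsampling factor of 4, the result holds roughly one sixteenth of the region's samples, which is the memory saving the ticket is after.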
[jira] [Assigned] (PDFBOX-5526) Apply subsampling and region to masks
[ https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-5526:
------------------------------------------
    Assignee: Andreas Lehmkühler
[jira] [Updated] (PDFBOX-5479) PDFTextStripper needs 1GB heap for a 3.6 MB pdf
[ https://issues.apache.org/jira/browse/PDFBOX-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5479:
---------------------------------------
    Issue Type: Improvement  (was: Bug)

> PDFTextStripper needs 1GB heap for a 3.6 MB pdf
> -----------------------------------------------
>
>                 Key: PDFBOX-5479
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5479
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.26
>         Environment: JDK11.0.2 on MacOS 12.4
>            Reporter: Manfred Schauer
>            Priority: Minor
>         Attachments: heapDump.png, x.pdf
>
>
> Extracting text from the attached x.pdf:
>     PDDocument pdDocument = PDDocument.load(new File("/tmp/x.pdf"));
>     PDFTextStripper stripper = new PDFTextStripper();
>     stripper.getText(pdDocument);
> succeeds with -Xmx1G but throws an OOME with -Xmx900m.
> The heap dump shows 2923 instances of TrueTypeFont; PDResources.cache contains SoftReferences to lots of fonts keyed by different COSObjects.
[jira] [Closed] (PDFBOX-5532) COSString field non-ascii characters
[ https://issues.apache.org/jira/browse/PDFBOX-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-5532.
--------------------------------------
    Resolution: Not A Problem

Closed as this is not a bug. Please join the [users-mailing list|https://pdfbox.apache.org/mailinglists.html] for further discussions.

> COSString field non-ascii characters
> ------------------------------------
>
>                 Key: PDFBOX-5532
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5532
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: David
>            Priority: Major
>              Labels: how-to
>
>
> Hello,
> I am reading a pdf document, but non-ascii characters are being retrieved from the COSString field. What could be the cause? I am using version pdfbox-2.0.24.jar.
> This would be an example of the parsed pdf document:
> {code}
> COSInt{50}
> COSInt{0}
> PDFOperator{Td}
> COSString{åÅÕãÁâ@}
> PDFOperator{Tj}
> COSFloat{770.18}
> COSInt{0}
> PDFOperator{Td}
> COSString{×–Ž–©@}
> PDFOperator{Tj}
> COSFloat{520.21}
> COSInt{0}
> {code}
> Java function:
> {code}
> public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
>     PDPageTree pages = document.getDocumentCatalog().getPages();
>     for (PDPage page : pages) {
>         PDFStreamParser parser = new PDFStreamParser(page);
>         parser.parse();
>         List tokens = parser.getTokens();
>         for (int j = 0; j < tokens.size(); j++) {
>             Object next = tokens.get(j);
>             if (next instanceof Operator) {
>                 Operator op = (Operator) next;
>                 if (op.getName().equals("Tj")) {
>                     COSString previous = (COSString) tokens.get(j - 1);
>                     String string = previous.getString();
>                     System.out.println("previous:=" + string);
>                     if (string.equals(searchString)) {
>                         COSString sx = new COSString(replacement);
>                         previous.setValue(sx.getBytes());
>                     }
>                 }
>             }
>         }
>         // now that the tokens are updated we will replace the page content stream.
>         PDStream updatedStream = new PDStream(document);
>         OutputStream out = updatedStream.createOutputStream();
>         ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>         tokenWriter.writeTokens(tokens);
>         page.setContents(updatedStream);
>         out.close();
>     }
>     return document;
> }
> {code}
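As background to the question: the operand of `Tj` holds character codes whose meaning is defined by the current font's encoding, so printing the raw bytes with a generic charset can look like garbage. The following sketch uses only the JDK and is not taken from the ticket; `StringOperandDecoding` is a hypothetical helper. It assumes each character printed in the report corresponds to one byte, and it uses EBCDIC code page 037 (`IBM037`, an optional JDK charset) purely as an illustration, since the ticket does not say which encoding the font uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; the actual font encoding is not stated in the ticket.
final class StringOperandDecoding {
    private StringOperandDecoding() {}

    /** Decodes raw string-operand bytes with the given charset. */
    static String decode(byte[] codes, Charset charset) {
        return new String(codes, charset);
    }

    /**
     * The bytes behind the first string in the report, under the assumption
     * that every printed character stands for exactly one byte.
     */
    static byte[] sampleCodes() {
        return "\u00e5\u00c5\u00d5\u00e3\u00c1\u00e2@"
                .getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Calling `decode(sampleCodes(), StandardCharsets.ISO_8859_1)` reproduces the characters shown in the report. If `Charset.isSupported("IBM037")` is true, decoding the same bytes with `Charset.forName("IBM037")` gives `VENTAS ` followed by nothing else, which suggests (but does not prove) that the strings are ordinary text in a font-specific encoding. In PDFBox itself the mapping from codes to text would come from the font that is active when `Tj` is executed, not from a fixed charset.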
[jira] [Updated] (PDFBOX-5532) COSString field non-ascii characters
[ https://issues.apache.org/jira/browse/PDFBOX-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5532:
---------------------------------------
    Labels: how-to  (was: )
[jira] [Closed] (PDFBOX-5531) wrong image data is extracted from PDF having single image
[ https://issues.apache.org/jira/browse/PDFBOX-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-5531.
-----------------------------------
    Resolution: Cannot Reproduce

> wrong image data is extracted from PDF having single image
> ----------------------------------------------------------
>
>                 Key: PDFBOX-5531
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5531
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.26
>            Reporter: Komal
>            Priority: Major
>
>
> Dear Concerned,
> We are trying to extract the image from a PDF that contains a single image with the following properties: CCITTFaxDecode, G4 compression, 150 dpi. However, when the following PDFBox code is used, we get an LZW image with 96 dpi:
>     PDDocument document = PDDocument.load(new File("D:\\extractImage\\in\\20211125174048BT Exception Documents.pdf"));
>     PDPageTree list = document.getPages();
>     for (PDPage page : list) {
>         PDResources pdResources = page.getResources();
>         for (COSName c : pdResources.getXObjectNames()) {
>             PDXObject o = pdResources.getXObject(c);
>             if (o instanceof org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) {
>                 BufferedImage img = ((org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) o).getImage();
>             }
>         }
>     }
>
> Also, when we try to get the raw stream byte data of the image using the following method, the resulting byte array is incorrect:
>     PDPage page1 = reader.getPage(pageNumber-1);
>     PDResources pdResources = page1.getResources();
>     for (COSName c : pdResources.getXObjectNames()) {
>         PDXObject o = pdResources.getXObject(c);
>         PDImageXObject ob = (PDImageXObject)o;
>         ImageXObject xObj1 = new ImageXObject();
>         xObj1.xObject = (PDImageXObject) o;
>         COSStream imageStream = ob.getCOSObject();
>         PDStream stream = (new PDStream(imageStream));
>         // BufferedImage image = ob.getImage();
>         byte[] streamDataBuffer = stream.toByteArray();
>
> Kindly provide a method that returns the black-and-white image object and the raw stream byte array of the image.
> Thanks in advance.
> Regards,
> Komal Walia
PDFBOX-5538: introduction of a new interface/functional interface to handle a stream cache
Hi,

the new on-demand parser doesn't use the ScratchFileBuffer anymore, and my idea was to overhaul/remove the usage of the ScratchFileBuffer for the creation of new COSStreams as well. My plan was to wait for the 4.0 release.

A couple of days ago I realized it might be a good idea to introduce an interface to make the usage of that stream a little bit more flexible, and I'd like to do it in 3.0.

In the end it wasn't that hard to do, see PDFBOX-5538 for details. But I had to adjust some of the public method signatures, such as those load methods using MemoryUsageSetting as a parameter.

The main benefit is that the usage of the stream cache within the parser is decoupled from the actual implementation. Furthermore, users have the opportunity to implement their own cache. Most likely we won't have to change the signatures of the loader methods in a possible 4.0 version.

Andreas
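The decoupling described above can be illustrated with a small, self-contained sketch. All names in it (`StreamCache`, `StreamCacheFactory`, `InMemoryStreamCache`, `StreamCacheDemo`) are hypothetical stand-ins chosen for this example; they are not the interfaces introduced in PDFBOX-5538, whose signatures are not quoted in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cache abstraction: hands out buffers for stream data. */
interface StreamCache extends AutoCloseable {
    OutputStream createBuffer();

    @Override
    void close();
}

/** Hypothetical functional interface, so callers can pass a lambda or constructor reference. */
@FunctionalInterface
interface StreamCacheFactory {
    StreamCache create();
}

/** One possible implementation: keeps every buffer on the heap. */
final class InMemoryStreamCache implements StreamCache {
    private final List<ByteArrayOutputStream> buffers = new ArrayList<>();

    @Override
    public OutputStream createBuffer() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffers.add(buffer);
        return buffer;
    }

    int bufferCount() {
        return buffers.size();
    }

    @Override
    public void close() {
        buffers.clear(); // drop all cached data at once
    }
}

/** A consumer that only knows the interfaces, never the implementation. */
final class StreamCacheDemo {
    private StreamCacheDemo() {}

    /** Writes each payload into its own buffer and returns the total byte count. */
    static int writeAll(StreamCacheFactory factory, byte[]... payloads) {
        int written = 0;
        try (StreamCache cache = factory.create()) {
            for (byte[] payload : payloads) {
                try (OutputStream out = cache.createBuffer()) {
                    out.write(payload);
                    written += payload.length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

A caller would pick the implementation at the call site, for example `StreamCacheDemo.writeAll(InMemoryStreamCache::new, data)`. Swapping in a file-backed cache would then need no change to the consuming code, which is the benefit the mail describes.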
[jira] [Comment Edited] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF
[ https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627181#comment-17627181 ]

Andreas Lehmkühler edited comment on PDFBOX-5483 at 11/1/22 2:18 PM:
---------------------------------------------------------------------

I've refactored some of the ScratchFile-stuff and had to adjust the method signatures of the Loader, see PDFBOX-5538.

I'd prefer to do it now instead of waiting for 4.0 to change the signature another time.

was (Author: lehmi):
I've refactored some of the ScratchFile-stuff and had to adjust the method signatures of the Loader, see PDFBOX-5538

> Replace methods using an InputStream from Loader.loadPDF
> --------------------------------------------------------
>
>                 Key: PDFBOX-5483
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5483
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>
> As discussed on dev@pdfbox
> {quote}
> We have to remove the loadPDF variants using InputStream and replace them with RandomAccessRead.
> When it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile or RandomAccessReadMemoryMappedFile
> This would make it more transparent what happens under the hood when using the different kinds of loadPDF methods:
> * a byte array as source is already in memory and the obvious choice is to use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file and the most obvious choice is to use RandomAccessReadBufferedFile as a wrapper. We should document that, since the alternative RandomAccessReadMemoryMappedFile is also offered in this case
> * RandomAccessRead as source is the most obvious one and the user decides how to create it. Additionally, it is possible to implement a custom loading and/or caching mechanism
> {quote}
> see PDFBOX-5462 and [High memory usage with pdfbox 3|https://lists.apache.org/thread/6mmgp23v8b2yztj4hghkgkd14s1gzs8g] as well
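The two options for InputStreams listed in the quote can be sketched with the JDK alone. This is a minimal sketch under assumptions: `InputStreamSources` and its method names are hypothetical, and the PDFBox wrappers are mentioned only in comments because their constructor signatures are not quoted in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; shows the caller's choice, not PDFBox internals.
final class InputStreamSources {
    private InputStreamSources() {}

    /**
     * Option 1: copy the stream to memory. Per the ticket, the result would
     * then be handed to an in-memory wrapper such as RandomAccessReadBuffer.
     */
    static byte[] toMemory(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Option 2: spool the stream to a temporary file. Per the ticket, the file
     * would then be opened with RandomAccessReadBufferedFile or
     * RandomAccessReadMemoryMappedFile. The caller is responsible for deleting it.
     */
    static Path toTempFile(InputStream in) {
        try {
            Path tmp = Files.createTempFile("pdf-input-", ".pdf");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The first option trades heap for speed and suits small documents; the second keeps the heap small at the cost of disk I/O and cleanup. Making that trade-off explicit at the call site is the transparency the quote argues for.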
[jira] [Commented] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF
[ https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627181#comment-17627181 ]

Andreas Lehmkühler commented on PDFBOX-5483:
--------------------------------------------

I've refactored some of the ScratchFile-stuff and had to adjust the method signatures of the Loader, see PDFBOX-5538
[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches
[ https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627179#comment-17627179 ]

Andreas Lehmkühler commented on PDFBOX-5538:
--------------------------------------------

I've replaced all occurrences of ScratchFile/ScratchFileBuffer/MemoryUsageSetting outside of org.apache.pdfbox.io with the new interface/functional interface.

> Introduce interface/functional interface to handle stream caches
> ----------------------------------------------------------------
>
>                 Key: PDFBOX-5538
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5538
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Writing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>
> The parser of PDFBox uses the current implementation of ScratchFile/ScratchFileBuffer exclusively as cache for streams when creating/writing COSStreams.
> Using an interface to handle those caches
> * makes the parser code of PDFBox independent of a specific implementation of that cache
> * gives any user the opportunity to implement some other kind of cache
> * still allows the current implementation of ScratchFile/ScratchFileBuffer to be used as the default
[jira] [Resolved] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches
[ https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-5538.
----------------------------------------
    Resolution: Fixed
[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches
[ https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627178#comment-17627178 ]

ASF subversion and git services commented on PDFBOX-5538:
----------------------------------------------------------

Commit 1904976 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904976 ]

PDFBOX-5538: update comment
[jira] [Resolved] (PDFBOX-5534) Remove finalize from ScratchFileBuffer
[ https://issues.apache.org/jira/browse/PDFBOX-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-5534.
----------------------------------------
    Resolution: Fixed

> Remove finalize from ScratchFileBuffer
> --------------------------------------
>
>                 Key: PDFBOX-5534
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5534
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.27, 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.28, 3.0.0 PDFBox
>
>
> The usage of finalize is discouraged, so it is a good idea to remove it.
> I've found a way to do so for ScratchFileBuffer. All created buffers are collected within ScratchFile. Once a buffer is closed, it is removed from the collection. If ScratchFile is closed when closing the pdf, all remaining buffers are closed as well.
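The bookkeeping described in the ticket can be sketched as follows. This is a hedged illustration, not the PDFBox code: `ScratchSketch` and `BufferSketch` are hypothetical names, and the sketch ignores thread safety, which a real implementation would have to consider.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical owner that tracks every buffer it hands out. */
final class ScratchSketch implements AutoCloseable {
    private final Set<BufferSketch> openBuffers = new LinkedHashSet<>();
    private boolean closed;

    BufferSketch createBuffer() {
        if (closed) {
            throw new IllegalStateException("owner already closed");
        }
        BufferSketch buffer = new BufferSketch(this);
        openBuffers.add(buffer);
        return buffer;
    }

    /** Called by a buffer when it is closed, so it is no longer tracked. */
    void release(BufferSketch buffer) {
        openBuffers.remove(buffer);
    }

    int openBufferCount() {
        return openBuffers.size();
    }

    /** Closes every buffer that is still open; no finalizer is needed. */
    @Override
    public void close() {
        closed = true;
        // Iterate over a copy because closing a buffer modifies the set.
        for (BufferSketch buffer : new ArrayList<>(openBuffers)) {
            buffer.close();
        }
    }
}

/** Hypothetical buffer that reports back to its owner on close. */
final class BufferSketch implements AutoCloseable {
    private final ScratchSketch owner;
    private boolean closed;

    BufferSketch(ScratchSketch owner) {
        this.owner = owner;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice is harmless
        }
        closed = true;
        owner.release(this);
    }
}
```

Because the owner closes whatever is left, cleanup happens deterministically when the document is closed rather than whenever the garbage collector gets around to running a finalizer.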
[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches
[ https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627174#comment-17627174 ]

ASF subversion and git services commented on PDFBOX-5538:
----------------------------------------------------------

Commit 1904975 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904975 ]

PDFBOX-5538: use introduced interface/functional interface to handle stream cache
[jira] [Commented] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches
[ https://issues.apache.org/jira/browse/PDFBOX-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627150#comment-17627150 ]

ASF subversion and git services commented on PDFBOX-5538:
----------------------------------------------------------

Commit 1904974 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904974 ]

PDFBOX-5538: introduce new interface RandomAccessStreamCache including functional interface StreamCacheCreateFunction
[jira] [Created] (PDFBOX-5538) Introduce interface/functional interface to handle stream caches
Andreas Lehmkühler created PDFBOX-5538:
---------------------------------------

             Summary: Introduce interface/functional interface to handle stream caches
                 Key: PDFBOX-5538
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5538
             Project: PDFBox
          Issue Type: Improvement
          Components: Parsing, Writing
    Affects Versions: 3.0.0 PDFBox
            Reporter: Andreas Lehmkühler
            Assignee: Andreas Lehmkühler
             Fix For: 3.0.0 PDFBox


The parser of PDFBox uses the current implementation of ScratchFile/ScratchFileBuffer exclusively as cache for streams when creating/writing COSStreams.
Using an interface to handle those caches
* makes the parser code of PDFBox independent of a specific implementation of that cache
* gives any user the opportunity to implement some other kind of cache
* still allows the current implementation of ScratchFile/ScratchFileBuffer to be used as the default
[jira] [Resolved] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no longer needed
[ https://issues.apache.org/jira/browse/PDFBOX-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-5537.
----------------------------------------
    Resolution: Fixed

> PDFMergerUtility: partioned memory settings no longer needed
> ------------------------------------------------------------
>
>                 Key: PDFBOX-5537
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5537
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>
> PDFBox no longer uses a ScratchFileBuffer when reading a pdf, so it is safe to remove the partitioned memory settings from the legacy merge mode.
[jira] [Commented] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no longer needed
[ https://issues.apache.org/jira/browse/PDFBOX-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627063#comment-17627063 ]

ASF subversion and git services commented on PDFBOX-5537:
----------------------------------------------------------

Commit 1904969 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904969 ]

PDFBOX-5537: remove partioned memory settings
[jira] [Updated] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no longer needed
[ https://issues.apache.org/jira/browse/PDFBOX-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5537:
---------------------------------------
    Summary: PDFMergerUtility: partioned memory settings no longer needed  (was: PDFMergerUtility: partioned memory settings no lomnger needed)
[jira] [Created] (PDFBOX-5537) PDFMergerUtility: partioned memory settings no lomnger needed
Andreas Lehmkühler created PDFBOX-5537:
---------------------------------------

             Summary: PDFMergerUtility: partioned memory settings no lomnger needed
                 Key: PDFBOX-5537
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5537
             Project: PDFBox
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 3.0.0 PDFBox
            Reporter: Andreas Lehmkühler
            Assignee: Andreas Lehmkühler
             Fix For: 3.0.0 PDFBox


PDFBox no longer uses a ScratchFileBuffer when reading a pdf, so it is safe to remove the partitioned memory settings from the legacy merge mode.
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627028#comment-17627028 ]

ASF subversion and git services commented on PDFBOX-4892:
----------------------------------------------------------

Commit 1904968 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904968 ]

PDFBOX-4892: removed obviously mixed up method

> Improve code quality (4)
> ------------------------
>
>                 Key: PDFBOX-4892
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4892
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 2.0.20
>            Reporter: Tilman Hausherr
>            Priority: Minor
>
>
> This is a long-term issue for the task of improving code quality by using the [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627026#comment-17627026 ]

ASF subversion and git services commented on PDFBOX-4892:
----------------------------------------------------------

Commit 1904967 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904967 ]

PDFBOX-4892: revert accidentically committed code
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627025#comment-17627025 ]

ASF subversion and git services commented on PDFBOX-4892:
----------------------------------------------------------

Commit 1904966 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904966 ]

PDFBOX-4892: removed obviously mixed up method