[jira] [Commented] (PDFBOX-5526) Apply subsampling and region to masks
[ https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626407#comment-17626407 ]

Velislava Yanchina commented on PDFBOX-5526:

Thanks, created a PR.

> Apply subsampling and region to masks
>
> Key: PDFBOX-5526
> URL: https://issues.apache.org/jira/browse/PDFBOX-5526
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.26
> Reporter: Velislava Yanchina
> Priority: Trivial
> Fix For: 2.0.28
>
> When `PDImageXObject.getImage()` is invoked with subsampling and a region, it internally loads the entire mask into memory:
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L465]
> and then applies the entire mask to the subsampled image. This is extra work and can cause `OOM` exceptions.
> The proposed optimisation is to pass the `region` and `subsampling` params to `PDImageXObject.getOpaqueImage()` here:
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L548]
> so that masks are also subsampled before they are applied...

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[GitHub] [pdfbox] velyan opened a new pull request, #147: [PDFBOX-5526] Adding subsampling for masks
velyan opened a new pull request, #147:
URL: https://github.com/apache/pdfbox/pull/147

See [Jira ticket](https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17618666#comment-17618666)

When `PDImageXObject.getImage()` is invoked with subsampling and a region, it internally loads the entire mask into memory:
https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L465
and then applies the entire mask to the subsampled image. This is extra work and can cause `OOM` exceptions.

The proposed optimisation is to pass the `region` and `subsampling` params to `PDImageXObject.getOpaqueImage()` here:
https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L548
so that masks are also subsampled before they are applied...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
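A minimal, JDK-only sketch of the idea behind the proposal, assuming the mask is a single-channel array of 8-bit alpha values stored row-major with the same dimensions as the base image (in PDF an SMask may have different dimensions, which this sketch ignores), and assuming the subsampled size is the region size divided by the factor, rounded up. `MaskSubsampling`, `subsampleMask` and `applyMask` are hypothetical names, not PDFBox API. Sampling the mask on the same grid (same region, same step) as the image yields a mask that already matches the subsampled image, so the full-size mask never has to be held or rescaled.

```java
import java.awt.Rectangle;
import java.util.Arrays;

public class MaskSubsampling {

    // Hypothetical helper: keeps every `subsampling`-th mask sample inside `region`,
    // so the result has the same dimensions as the subsampled image and the
    // full-size mask never has to be copied or rescaled.
    public static int[] subsampleMask(int[] mask, int maskWidth, Rectangle region, int subsampling) {
        int outW = (region.width + subsampling - 1) / subsampling;   // rounded up (assumption)
        int outH = (region.height + subsampling - 1) / subsampling;
        int[] out = new int[outW * outH];
        for (int y = 0; y < outH; y++) {
            int srcY = region.y + y * subsampling;
            for (int x = 0; x < outW; x++) {
                int srcX = region.x + x * subsampling;
                out[y * outW + x] = mask[srcY * maskWidth + srcX];
            }
        }
        return out;
    }

    // Hypothetical helper: combines opaque RGB pixels with 8-bit alpha samples into ARGB.
    public static int[] applyMask(int[] rgb, int[] alpha) {
        if (rgb.length != alpha.length) {
            throw new IllegalArgumentException("image and mask sizes differ");
        }
        int[] argb = new int[rgb.length];
        for (int i = 0; i < rgb.length; i++) {
            argb[i] = ((alpha[i] & 0xFF) << 24) | (rgb[i] & 0xFFFFFF);
        }
        return argb;
    }

    public static void main(String[] args) {
        // 4x4 mask whose sample value encodes its position as 16 * y + x
        int[] mask = new int[16];
        for (int i = 0; i < mask.length; i++) {
            mask[i] = 16 * (i / 4) + (i % 4);
        }
        // region starting at (1,1), step 2: samples (1,1), (3,1), (1,3), (3,3)
        int[] sub = subsampleMask(mask, 4, new Rectangle(1, 1, 3, 3), 2);
        System.out.println(Arrays.toString(sub)); // [17, 19, 49, 51]
    }
}
```

Because `subsampleMask` returns an array whose length equals that of the subsampled image, `applyMask` can combine the two pixel by pixel without any scaling step.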
[jira] [Commented] (PDFBOX-5536) Refactor OperatorProcessor
[ https://issues.apache.org/jira/browse/PDFBOX-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626259#comment-17626259 ]

ASF subversion and git services commented on PDFBOX-5536:

Commit 1904937 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904937 ]
PDFBOX-5536: use main memory instead of temp files for tests

> Refactor OperatorProcessor
>
> Key: PDFBOX-5536
> URL: https://issues.apache.org/jira/browse/PDFBOX-5536
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> I've stumbled upon the following Sonar issue
> {code}
> Child class fields should not shadow parent class fields
> {code}
> and assumed it wasn't a big deal to fix it. It isn't, but a lot of classes are affected, so I've decided to create a new issue instead of using PDFBOX-4892.
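For readers unfamiliar with the Sonar rule quoted above, a minimal sketch of what it flags and one common remedy. The classes are hypothetical and are not PDFBox's `OperatorProcessor` hierarchy; the remedy shown (a single private field in the parent, set through its constructor and exposed via an accessor) is one typical approach, not necessarily the one taken in these commits.

```java
public class ShadowingDemo {

    // Parent class with a field that parent code relies on.
    public static class BaseProcessor {
        protected String context = "base";

        public String describe() {
            return context; // always reads BaseProcessor.context
        }
    }

    // What the Sonar rule flags: this field hides BaseProcessor.context,
    // so every instance now carries two independent "context" fields.
    public static class ShadowingProcessor extends BaseProcessor {
        protected String context = "child";

        public String own() {
            return context; // reads ShadowingProcessor.context
        }
    }

    // One common remedy: a single private field set once by the parent,
    // exposed to subclasses through an accessor.
    public static class Parent {
        private final String context;

        public Parent(String context) {
            this.context = context;
        }

        protected String getContext() {
            return context;
        }
    }

    public static class Child extends Parent {
        public Child(String context) {
            super(context);
        }

        public String own() {
            return getContext();
        }
    }

    public static void main(String[] args) {
        ShadowingProcessor p = new ShadowingProcessor();
        System.out.println(p.describe() + " / " + p.own()); // base / child
        Child c = new Child("shared");
        System.out.println(c.own()); // shared
    }
}
```

The shadowed variant compiles without warnings from `javac`, which is why such issues tend to be found by static analysis rather than by the compiler: parent and child code silently read different fields.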
[jira] [Commented] (PDFBOX-5536) Refactor OperatorProcessor
[ https://issues.apache.org/jira/browse/PDFBOX-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626229#comment-17626229 ]

ASF subversion and git services commented on PDFBOX-5536:

Commit 1904934 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904934 ]
PDFBOX-5536: add tests
[jira] [Commented] (PDFBOX-5536) Refactor OperatorProcessor
[ https://issues.apache.org/jira/browse/PDFBOX-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626225#comment-17626225 ]

ASF subversion and git services commented on PDFBOX-5536:

Commit 1904933 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904933 ]
PDFBOX-5536: sonar fix
[jira] [Commented] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF
[ https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626221#comment-17626221 ]

Andreas Lehmkühler commented on PDFBOX-5483:

[~mkl] I totally understand your point of view, but sorry, I don't like it. Let me explain why.

The code of {{org.apache.pdfbox.io}} is still a work in progress. Most likely the following features will be added in the near future:
* setting the buffer size for {{org.apache.pdfbox.io.RandomAccessReadBuffer}}
* setting the buffer size for {{org.apache.pdfbox.io.RandomAccessReadBufferedFile}}
* paging support for memory mapped files, so that we might want to set the buffer size for {{org.apache.pdfbox.io.RandomAccessReadMemoryMappedFile}} as well
* I'm thinking of a replacement for the current implementation and usage of {{org.apache.pdfbox.io.ScratchFile}}, something that isn't buried somewhere in {{org.apache.pdfbox.cos}}

Maybe there will be other implementations of {{org.apache.pdfbox.io.RandomAccessRead}}, and I'm pretty sure there are other things I can't imagine now.

However, if the code is located somewhere in the parser and/or loader, all of those modifications require changes within the parser/loader code and, depending on the kind of change, different method signatures. IMHO that code should not be responsible for managing the source of the data. That belongs in {{org.apache.pdfbox.io}}. That said, if someone wants to provide some convenience code, it should be added somewhere within {{org.apache.pdfbox.io}}.
> Replace methods using an InputStream from Loader.loadPDF
>
> Key: PDFBOX-5483
> URL: https://issues.apache.org/jira/browse/PDFBOX-5483
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> As discussed on dev@pdfbox
> {quote}
> We have to remove the loadPDF variants using InputStream and replace them with RandomAccessRead.
> When it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile or RandomAccessReadMemoryMappedFile
> This would make it more transparent what happens under the hood when using the different kinds of loadPDF methods:
> * a byte array as source is already in memory, and the obvious choice is to use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file, and the most obvious choice is to use RandomAccessReadBufferedFile as a wrapper. We should document that RandomAccessReadMemoryMappedFile is offered as the other alternative in this case
> * RandomAccessRead as source is the most obvious one, and the user decides how to create it. Additionally, it is possible to implement one's own caching and/or loading mechanism
> {quote}
> See PDFBOX-5462 and [High memory usage with pdfbox 3|https://lists.apache.org/thread/6mmgp23v8b2yztj4hghkgkd14s1gzs8g] as well.
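The two choices the quoted proposal leaves to callers who only have an `InputStream` can be sketched with the JDK alone (Java 9+ for `InputStream.readAllBytes()`). The class and method names are hypothetical. The resulting `byte[]` or `File` is what one would then wrap in `RandomAccessReadBuffer` or `RandomAccessReadBufferedFile` respectively before passing it to `Loader.loadPDF`; that last step appears only in comments because it needs PDFBox 3.0 on the classpath, and the exact constructors should be checked against its javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class StreamSourceChoice {

    // Option 1: buffer the whole stream in main memory (Java 9+ readAllBytes).
    // Assumed next step with PDFBox 3.0: wrap the array in a RandomAccessReadBuffer.
    public static byte[] copyToMemory(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Option 2: spill the stream to a temporary file so the heap only holds I/O buffers.
    // Assumed next step with PDFBox 3.0: open the file with RandomAccessReadBufferedFile
    // (or RandomAccessReadMemoryMappedFile).
    public static File copyToTempFile(InputStream in) throws IOException {
        File tmp = File.createTempFile("pdf-input-", ".pdf");
        tmp.deleteOnExit();
        Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        // The caller owns the stream; neither helper closes it.
        System.out.println(copyToMemory(new ByteArrayInputStream(data)).length); // 8
        System.out.println(copyToTempFile(new ByteArrayInputStream(data)).length()); // 8
    }
}
```

Keeping these helpers outside the loader mirrors the argument in the comment above: the loader only ever sees a random-access source, and the decision between heap and disk stays with the caller or with code in the io package.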
[jira] [Updated] (PDFBOX-5034) Add paging support for memory mapped files
[ https://issues.apache.org/jira/browse/PDFBOX-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5034:

Summary: Add paging support for memory mapped files (was: Add paging support fpr memory mapped files)

> Add paging support for memory mapped files
>
> Key: PDFBOX-5034
> URL: https://issues.apache.org/jira/browse/PDFBOX-5034
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Priority: Major
> Fix For: 4.0.0
>
> PDFBOX-4855 introduced support for memory mapped files, but there is still some room for improvement. The current implementation doesn't support paging (the whole file is mapped into memory) and therefore doesn't support files larger than Integer.MAX_VALUE bytes.
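One way around the `Integer.MAX_VALUE` limit described above is to map the file in fixed-size pages and translate a `long` position into a page index plus an offset, mapping each page lazily on first access. This is a JDK-only sketch with a hypothetical class `PagedMappedFile`; it is not PDFBox's implementation, and a production version would also need to cap how many pages stay mapped at once.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PagedMappedFile implements AutoCloseable {
    private final FileChannel channel;
    private final long size;
    private final int pageSize;
    private final MappedByteBuffer[] pages;   // lazily mapped, one entry per page

    public PagedMappedFile(Path path, int pageSize) throws IOException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.size = channel.size();
        this.pageSize = pageSize;
        long pageCount = (size + pageSize - 1) / pageSize;
        if (pageCount > Integer.MAX_VALUE) {
            channel.close();
            throw new IOException("file too large for page size " + pageSize);
        }
        this.pages = new MappedByteBuffer[(int) pageCount];
    }

    public long length() {
        return size;
    }

    // Returns the byte at an absolute (long) position as 0..255, or -1 past the end.
    public int read(long position) throws IOException {
        if (position < 0) {
            throw new IllegalArgumentException("negative position");
        }
        if (position >= size) {
            return -1;
        }
        int index = (int) (position / pageSize);
        int offset = (int) (position % pageSize);
        return page(index).get(offset) & 0xFF;
    }

    // Maps a page on first access; the last page may be shorter than pageSize.
    private MappedByteBuffer page(int index) throws IOException {
        if (pages[index] == null) {
            long start = (long) index * pageSize;
            long length = Math.min(pageSize, size - start);
            pages[index] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
        }
        return pages[index];
    }

    @Override
    public void close() throws IOException {
        channel.close(); // per the FileChannel docs, existing mappings remain valid
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("paged-", ".bin");
        Files.write(tmp, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        try (PagedMappedFile f = new PagedMappedFile(tmp, 4)) {
            System.out.println(f.read(5) + " " + f.read(9) + " " + f.read(10)); // 5 9 -1
        }
    }
}
```

Each individual mapping stays below the `int`-sized limit of a single `MappedByteBuffer`, while positions are tracked as `long`, which is what lets the total file size exceed `Integer.MAX_VALUE`.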
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626214#comment-17626214 ]

ASF subversion and git services commented on PDFBOX-4892:

Commit 1904931 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904931 ]
PDFBOX-4892: sonar fix

> Improve code quality (4)
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.20
> Reporter: Tilman Hausherr
> Priority: Minor
>
> This is a long-term issue for the task of improving code quality by using the [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.