[jira] [Commented] (PDFBOX-5526) Apply subsampling and region to masks

2022-10-30 Thread Velislava Yanchina (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626407#comment-17626407
 ] 

Velislava Yanchina commented on PDFBOX-5526:


Thanks, created a PR

> Apply subsampling and region to masks
> -
>
> Key: PDFBOX-5526
> URL: https://issues.apache.org/jira/browse/PDFBOX-5526
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.26
>Reporter: Velislava Yanchina
>Priority: Trivial
> Fix For: 2.0.28
>
>
> When `PDImageXObject.getImage()` gets invoked with 
> subsampling and region, internally it loads the entire mask into memory: 
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L465]
> and applies the entire mask to the subsampled image.
> This is extra work and can cause `OOM` exceptions. 
> The proposed optimisation is to pass the `region` and `subsampling` params to 
> `PDImageXObject.getOpaqueImage()` here: 
> [https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L548]
> so that masks are also subsampled before being applied.
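A minimal Java sketch of the proposed idea, under stated assumptions: the mask is read with the same `region` and `subsampling` as the base image, so only the retained samples are ever allocated, and the result is then used as alpha. `MaskSubsamplingSketch`, `readMask` and `applyMask` are hypothetical names, and the `IntBinaryOperator` merely stands in for a mask decoder; this is not PDFBox's actual code. The ceiling-division output size is an assumption, and the sketch also assumes mask and image share the same dimensions, whereas in PDF a mask image may have a different size than the image it masks.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.util.function.IntBinaryOperator;

// Sketch only: hypothetical names, not PDFBox's implementation.
class MaskSubsamplingSketch {

    // Builds an 8-bit grey mask holding only every `subsampling`-th sample inside `region`.
    // `sampleAt` stands in for a decoder returning the mask sample at full-resolution (x, y),
    // so the full-size mask is never allocated.
    static BufferedImage readMask(IntBinaryOperator sampleAt, int fullWidth, int fullHeight,
                                  Rectangle region, int subsampling) {
        Rectangle clipped = region.intersection(new Rectangle(0, 0, fullWidth, fullHeight));
        if (clipped.isEmpty() || subsampling < 1) {
            throw new IllegalArgumentException("empty region or invalid subsampling");
        }
        int w = (clipped.width + subsampling - 1) / subsampling;   // ceiling division
        int h = (clipped.height + subsampling - 1) / subsampling;
        BufferedImage mask = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = mask.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sample = sampleAt.applyAsInt(clipped.x + x * subsampling,
                                                 clipped.y + y * subsampling);
                raster.setSample(x, y, 0, sample);
            }
        }
        return mask;
    }

    // Uses the grey mask as alpha for an image read with the same region and subsampling.
    static BufferedImage applyMask(BufferedImage image, BufferedImage mask) {
        if (image.getWidth() != mask.getWidth() || image.getHeight() != mask.getHeight()) {
            throw new IllegalArgumentException("image and mask dimensions differ");
        }
        BufferedImage out = new BufferedImage(image.getWidth(), image.getHeight(),
                                              BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < out.getHeight(); y++) {
            for (int x = 0; x < out.getWidth(); x++) {
                int alpha = mask.getRaster().getSample(x, y, 0);
                out.setRGB(x, y, (alpha << 24) | (image.getRGB(x, y) & 0x00FFFFFF));
            }
        }
        return out;
    }
}
```

For a 100×80 mask with region (10, 10, 50, 40) and subsampling 3, only a 17×14 mask is allocated, matching the size of the subsampled image.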



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[GitHub] [pdfbox] velyan opened a new pull request, #147: [PDFBOX-5526] Adding subsampling for masks

2022-10-30 Thread GitBox


velyan opened a new pull request, #147:
URL: https://github.com/apache/pdfbox/pull/147

   See [Jira ticket](https://issues.apache.org/jira/browse/PDFBOX-5526?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17618666#comment-17618666)
   
   When `PDImageXObject.getImage()` gets invoked with subsampling and region, 
internally it loads the entire mask into memory: 
https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L465
   
   and applies the entire mask to the subsampled image.
   
   This is extra work and can cause `OOM` exceptions. 
   
   The proposed optimisation is to pass the `region` and `subsampling` params to 
`PDImageXObject.getOpaqueImage()` here: 
https://github.com/apache/pdfbox/blob/961c052d52dd9ab2dd3d7cd762a5046e5cc85a91/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PDImageXObject.java#L548
   
   so that masks are also subsampled before being applied.
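To put the `OOM` concern into rough numbers, here is a back-of-the-envelope sketch. It assumes one byte per mask pixel (8-bit grey) and ignores object overhead; the class name and the 20000×20000 example size are hypothetical, and the raster type PDFBox actually uses may need more memory.

```java
// Back-of-the-envelope sketch; assumes 1 byte per mask pixel and ignores object overhead.
class MaskMemoryEstimate {
    // Bytes needed for a width x height mask read with the given subsampling factor.
    static long maskBytes(long width, long height, int subsampling) {
        if (subsampling < 1) {
            throw new IllegalArgumentException("subsampling must be >= 1");
        }
        long w = (width + subsampling - 1) / subsampling;   // ceiling division
        long h = (height + subsampling - 1) / subsampling;
        return w * h;
    }
}
```

For a hypothetical 20000×20000 mask this gives 400,000,000 bytes (about 381 MiB) when the whole mask is loaded, versus 25,000,000 bytes (about 24 MiB) with subsampling 4.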


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (PDFBOX-5536) Refactor OperatorProcessor

2022-10-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626259#comment-17626259
 ] 

ASF subversion and git services commented on PDFBOX-5536:
-

Commit 1904937 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904937 ]

PDFBOX-5536: use main memory instead of temp files for tests

> Refactor OperatorProcessor
> --
>
> Key: PDFBOX-5536
> URL: https://issues.apache.org/jira/browse/PDFBOX-5536
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> I've stumbled upon the following sonar issue
> {code}
> Child class fields should not shadow parent class fields
> {code}
> and assumed it wasn't a big deal to fix it. It isn't, but a lot of classes are 
> affected, so I've decided to create a new issue instead of using 
> PDFBOX-4892
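A small example of what the Sonar rule is about. The classes and the `context` field below are hypothetical and are not the actual `OperatorProcessor` hierarchy: when a child redeclares a field, code in the parent keeps reading the parent's copy, so the child's value is silently ignored there.

```java
// Hypothetical classes illustrating field shadowing; not PDFBox code.
class BaseProcessor {
    protected String context = "base";

    // Parent code always reads BaseProcessor.context.
    String describe() {
        return context;
    }
}

// Non-compliant: redeclaring `context` hides the inherited field.
class ShadowingProcessor extends BaseProcessor {
    protected String context = "child";
}

// Compliant: no redeclaration; the inherited field is assigned instead.
class FixedProcessor extends BaseProcessor {
    FixedProcessor() {
        context = "child";
    }
}
```

`new ShadowingProcessor().describe()` still returns `"base"`, while `new FixedProcessor().describe()` returns `"child"`, which is why removing the duplicate declaration is the usual fix.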






[jira] [Commented] (PDFBOX-5536) Refactor OperatorProcessor

2022-10-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626229#comment-17626229
 ] 

ASF subversion and git services commented on PDFBOX-5536:
-

Commit 1904934 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904934 ]

PDFBOX-5536: add tests




[jira] [Commented] (PDFBOX-5536) Refactor OperatorProcessor

2022-10-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626225#comment-17626225
 ] 

ASF subversion and git services commented on PDFBOX-5536:
-

Commit 1904933 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904933 ]

PDFBOX-5536: sonar fix




[jira] [Commented] (PDFBOX-5483) Replace methods using an InputStream from Loader.loadPDF

2022-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626221#comment-17626221
 ] 

Andreas Lehmkühler commented on PDFBOX-5483:


[~mkl] I totally understand your point of view, but sorry, I don't like it. Let 
me explain why.

The code of {{org.apache.pdfbox.io}} is still a work in progress. Most likely 
the following features will be added in the near future:
* setting the buffer size for {{org.apache.pdfbox.io.RandomAccessReadBuffer}}
* setting the buffer size for 
{{org.apache.pdfbox.io.RandomAccessReadBufferedFile}}
* paging support for memory mapped files, so that we might want to set the 
buffer size for 
{{org.apache.pdfbox.io.RandomAccessReadMemoryMappedFile}} as well
* I'm thinking of a replacement for the current implementation and usage of 
{{org.apache.pdfbox.io.ScratchFile}}, something that isn't buried somewhere in 
org.apache.pdfbox.cos

Maybe there will be some other implementations of 
{{org.apache.pdfbox.io.RandomAccessRead}} and I'm pretty sure there are other 
things I can't imagine now.

However, if that code is located somewhere in the parser and/or loader, all of 
those modifications require changes within the code of the parser/loader and, 
depending on the 
kind of change, different method signatures. IMHO that code should not be 
responsible for managing the source of the data. That stuff belongs in 
{{org.apache.pdfbox.io}}.

That said, if someone wants to provide some convenience code, it should be 
added somewhere within {{org.apache.pdfbox.io}}.
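A minimal sketch of the separation argued for above. All names (`RandomSource`, `IoSources`, `HeaderCheck`) are hypothetical and are not PDFBox API: parser-side code depends only on an interface, while a convenience factory in the io layer decides where the bytes live and which chunk size to use. Adding a file-backed source or a configurable buffer size would then change only the io layer, not the parser's method signatures.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; these types are NOT PDFBox API.
// Parser-side code only ever sees this interface.
interface RandomSource {
    long length();

    int read(long position); // unsigned byte, or -1 outside the data
}

// io-layer convenience code: decides how the data is stored and which chunk size to use.
final class IoSources {
    private IoSources() {
    }

    static RandomSource fromInputStream(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        byte[] data = out.toByteArray();
        return new RandomSource() {
            public long length() {
                return data.length;
            }

            public int read(long position) {
                return position >= 0 && position < data.length ? data[(int) position] & 0xFF : -1;
            }
        };
    }
}

// Parser-side code: unchanged no matter how the RandomSource was created.
final class HeaderCheck {
    static boolean startsWithPdfHeader(RandomSource src) {
        String magic = "%PDF-";
        if (src.length() < magic.length()) {
            return false;
        }
        for (int i = 0; i < magic.length(); i++) {
            if (src.read(i) != magic.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```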



> Replace methods using an InputStream from Loader.loadPDF
> 
>
> Key: PDFBOX-5483
> URL: https://issues.apache.org/jira/browse/PDFBOX-5483
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> As discussed on dev@pdfbox
> {quote}
> We have to remove the loadPDF variants using InputStream and replace them 
> with RandomAccessRead.
> When it comes to InputStreams, users have to decide how to proceed:
> * copy the InputStream to memory by using RandomAccessReadBuffer
> * copy the InputStream to a file and use RandomAccessReadBufferedFile or 
> RandomAccessReadMemoryMappedFile
> This would make it more transparent what happens under the hood when using 
> the different kinds of loadPDF methods:
> * a byte array as source is already in memory and the obvious choice is to 
> use RandomAccessReadBuffer as a wrapper
> * a file as source targets a local file and the most obvious choice is to use 
> RandomAccessReadBufferedFile as a wrapper. We should document that 
> RandomAccessReadMemoryMappedFile is offered as an alternative in this case
> * RandomAccessRead as source is the most obvious one and the user decides how 
> to create it. Additionally, it is possible to implement one's own caching 
> and/or loading mechanism
> {quote}
> see PDFBOX-5462 and [High memory usage with pdfbox 
> 3|https://lists.apache.org/thread/6mmgp23v8b2yztj4hghkgkd14s1gzs8g] as well
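The two user-side options from the quote can be sketched with plain JDK calls (Java 9+ for `InputStream.readAllBytes`). The helper names are hypothetical; per the quoted proposal, the resulting byte array or file would then be wrapped in RandomAccessReadBuffer or RandomAccessReadBufferedFile, which this sketch deliberately does not call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helpers; only JDK APIs are used.
final class StreamToSource {
    private StreamToSource() {
    }

    // Option 1: copy the whole stream into main memory.
    static byte[] copyToMemory(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Option 2: copy the stream to a temporary file so the document is not held on the heap.
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input-", ".pdf");
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

Keeping this choice explicit on the caller's side is the point of the proposal: the memory cost of each variant is visible at the call site instead of being hidden inside `loadPDF`.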






[jira] [Updated] (PDFBOX-5034) Add paging support for memory mapped files

2022-10-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5034:
---
Summary: Add paging support for memory mapped files  (was: Add paging 
support fpr memory mapped files)

> Add paging support for memory mapped files
> --
>
> Key: PDFBOX-5034
> URL: https://issues.apache.org/jira/browse/PDFBOX-5034
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Priority: Major
> Fix For: 4.0.0
>
>
> PDFBOX-4855 introduced support for memory mapped files, but there is still 
> some room for improvement. The current implementation doesn't support paging 
> (the whole file is mapped to memory) and therefore doesn't support files 
> larger than Integer.MAX_VALUE bytes
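A rough sketch of what paging could look like, assuming pages are mapped lazily with `FileChannel.map`, which accepts at most `Integer.MAX_VALUE` bytes per mapping. `PagedMappedFile` is a hypothetical class, not the PDFBox implementation; it keeps every mapped page forever, so a real version would also need some eviction strategy.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of paged memory mapping; not PDFBox code.
final class PagedMappedFile implements AutoCloseable {
    private final FileChannel channel;
    private final long length;
    private final int pageSize;
    private final MappedByteBuffer[] pages; // mapped lazily, one entry per page

    PagedMappedFile(Path file, int pageSize) throws IOException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.length = channel.size();
        this.pageSize = pageSize;
        // Each single mapping stays <= Integer.MAX_VALUE; the page count must fit an int here.
        this.pages = new MappedByteBuffer[Math.toIntExact((length + pageSize - 1) / pageSize)];
    }

    long length() {
        return length;
    }

    // Returns the unsigned byte at a long position, or -1 outside the file.
    int read(long position) throws IOException {
        if (position < 0 || position >= length) {
            return -1;
        }
        int index = (int) (position / pageSize);
        long start = (long) index * pageSize;
        MappedByteBuffer page = pages[index];
        if (page == null) {
            long size = Math.min(pageSize, length - start); // last page may be shorter
            page = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
            pages[index] = page;
        }
        return page.get((int) (position - start)) & 0xFF;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

With a page size well below `Integer.MAX_VALUE`, files of any size can be addressed through a `long` position while each individual mapping stays within the JDK limit.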






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-10-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626214#comment-17626214
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1904931 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1904931 ]

PDFBOX-4892: sonar fix

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for the task of improving code quality by using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.


