[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-12 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854362#comment-17854362
 ] 

Jonathan Prates commented on PDFBOX-5824:
-

Thank you both once again for your hard work maintaining this library. It 
solves the issue.

> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> --
>
>              Key: PDFBOX-5824
>              URL: https://issues.apache.org/jira/browse/PDFBOX-5824
>          Project: PDFBox
>       Issue Type: Improvement
>       Components: PDModel
> Affects Versions: 3.0.3 PDFBox, 4.0.0
>         Reporter: Jonathan Prates
>         Assignee: Andreas Lehmkühler
>         Priority: Minor
>          Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Screenshot 2024-05-21 at 11.00.25.jpg
>
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
> controls which Map class is used to optimize memory usage. By default, a 
> SmallMap is used. However, if the number of items in a COSDictionary reaches 
> the MAP_THRESHOLD value (hardcoded to 1,000), the references 
> [are copied|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]
> to a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially 
> bigger than this limit, this copying occurs frequently. Additionally, 
> [SmallMap.keySet is not 
> efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
> The attached screenshot shows PDFBox performance with SmallMap (in red) 
> versus using LinkedHashMap, ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the 
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.
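
The proposal quoted above could be sketched roughly as follows. This is only
an illustration of the idea, not the code from the linked pull request and
not what was finally committed (the commits in this thread removed the
SmallMap optimisation instead). The property name, the class name and the
helper methods are hypothetical; only the default of 1,000, the "reaches the
threshold" rule and the meaning of 0 come from the issue description.

```java
public class MapThresholdSketch {
    // Hypothetical property name, chosen only for this illustration.
    static final String PROPERTY = "example.cosdictionary.mapThreshold";

    // Default taken from the issue description ("hardcoded to 1,000").
    static final int DEFAULT_THRESHOLD = 1000;

    /**
     * Resolves the threshold from the system property.
     * Integer.getInteger returns the default when the property is
     * missing or cannot be parsed as an integer. Negative values are
     * treated as "not set" (an assumption of this sketch).
     */
    static int resolveThreshold() {
        int value = Integer.getInteger(PROPERTY, DEFAULT_THRESHOLD);
        return value < 0 ? DEFAULT_THRESHOLD : value;
    }

    /**
     * True when a dictionary holding 'size' entries has reached the
     * threshold and would be backed by a LinkedHashMap. With a
     * threshold of 0 this is true for every size.
     */
    static boolean useLinkedHashMap(int size, int threshold) {
        return size >= threshold;
    }

    public static void main(String[] args) {
        int threshold = resolveThreshold();
        System.out.println("threshold = " + threshold);
        System.out.println("LinkedHashMap for 10 entries: "
                + useLinkedHashMap(10, threshold));
    }
}
```

Under these assumptions, starting the JVM with
-Dexample.cosdictionary.mapThreshold=0 would select LinkedHashMap from the
first entry, while leaving the property unset keeps the current behaviour.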



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854126#comment-17854126
 ] 

ASF subversion and git services commented on PDFBOX-5824:
-

Commit 1918261 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918261 ]

PDFBOX-5824: remove SmallMap optimisation based on a proposal from Jonathan 
Prates




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854125#comment-17854125
 ] 

ASF subversion and git services commented on PDFBOX-5824:
-

Commit 1918260 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918260 ]

PDFBOX-5824: remove SmallMap optimisation based on a proposal from Jonathan 
Prates




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-11 Thread Andreas Lehmkühler (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854124#comment-17854124
 ] 

Andreas Lehmkühler commented on PDFBOX-5824:


I ran some more tests and I can't find any substantial differences concerning 
memory consumption. I'm going to remove the SmallMap optimisation from the 
trunk and the 3.0 branch.




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853750#comment-17853750
 ] 

Tilman Hausherr commented on PDFBOX-5824:
-

It was introduced in PDFBOX-3284.




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-10 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853748#comment-17853748
 ] 

Jonathan Prates commented on PDFBOX-5824:
-

Thank you both. SmallMap seems to be part of some legacy optimisation. I 
believe removing it can be beneficial even for smaller hash maps.




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-10 Thread Andreas Lehmkühler (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853727#comment-17853727
 ] 

Andreas Lehmkühler commented on PDFBOX-5824:


I'm inclined to remove the SmallMap, but I'm still running some tests.




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853675#comment-17853675
 ] 

Tilman Hausherr commented on PDFBOX-5824:
-

I support it. I will have time in a few days, unless Andreas or someone else 
does it first. I'll put it on my todo list.




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-06-10 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853668#comment-17853668
 ] 

Jonathan Prates commented on PDFBOX-5824:
-

[~tilman] [~lehmi] can you please have a look at this change? We've been 
running a patched version in production for a few weeks and the results are 
consistent.

Also, this patch does not introduce any breaking change: unless the property 
is explicitly set, the behaviour stays the same.




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848182#comment-17848182
 ] 

Jonathan Prates commented on PDFBOX-5824:
-

Patch used: https://github.com/apache/pdfbox/pull/196
