[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854362#comment-17854362 ]

Jonathan Prates commented on PDFBOX-5824:

Thank you both once again for your hard work maintaining this library. It solves the issue.

> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 3.0.3 PDFBox, 4.0.0
> Reporter: Jonathan Prates
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Screenshot 2024-05-21 at 11.00.25.jpg
>
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
> controls which Map class is used to optimize memory usage. By default, a
> SmallMap is used. However, if the number of items in a COSDictionary reaches
> the MAP_THRESHOLD value (hardcoded to 1,000), the references
> [are copied|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]
> to a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially
> larger than this limit, this copying occurs frequently. Additionally,
> [SmallMap.keySet is not efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
> The attached screenshot shows PDFBox performance with SmallMap (in red)
> versus using LinkedHashMap, ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.
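
The behaviour proposed in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PDFBox code and not the patch itself: the class name ThresholdDictionary, its list-based compact storage (a stand-in for SmallMap), and the property name example.cos.mapThreshold are all hypothetical. It only shows the idea of reading the threshold from a System property, defaulting to 1,000, and copying the entries to a LinkedHashMap once the threshold is reached (or immediately when the threshold is 0).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a dictionary that starts compact and switches to a LinkedHashMap. */
final class ThresholdDictionary {
    // Hypothetical property name chosen for this sketch; not a PDFBox setting.
    static final String THRESHOLD_PROPERTY = "example.cos.mapThreshold";
    // Mirrors the hardcoded value of 1,000 mentioned in the issue description.
    static final int DEFAULT_THRESHOLD = 1000;

    private final int threshold;
    // Compact storage: keys at even indexes, values at odd indexes, linear lookups.
    private List<Object> compact = new ArrayList<>();
    // Non-null once the entries have been copied to a hash-based map.
    private Map<String, Object> hashed;

    /** Reads the threshold from the system property, falling back to the default. */
    public ThresholdDictionary() {
        this(Integer.getInteger(THRESHOLD_PROPERTY, DEFAULT_THRESHOLD));
    }

    public ThresholdDictionary(int threshold) {
        this.threshold = threshold;
        if (threshold <= 0) {
            // A threshold of 0 means: use the LinkedHashMap from the start.
            switchToHashMap();
        }
    }

    public void put(String key, Object value) {
        if (hashed != null) {
            hashed.put(key, value);
            return;
        }
        // Linear scan to replace an existing key.
        for (int i = 0; i < compact.size(); i += 2) {
            if (compact.get(i).equals(key)) {
                compact.set(i + 1, value);
                return;
            }
        }
        compact.add(key);
        compact.add(value);
        // Once the entry count reaches the threshold, copy everything over.
        if (compact.size() / 2 >= threshold) {
            switchToHashMap();
        }
    }

    public Object get(String key) {
        if (hashed != null) {
            return hashed.get(key);
        }
        for (int i = 0; i < compact.size(); i += 2) {
            if (compact.get(i).equals(key)) {
                return compact.get(i + 1);
            }
        }
        return null;
    }

    public int size() {
        return hashed != null ? hashed.size() : compact.size() / 2;
    }

    public boolean usesHashMap() {
        return hashed != null;
    }

    // Copies the compact entries, in insertion order, into a LinkedHashMap.
    private void switchToHashMap() {
        Map<String, Object> copy = new LinkedHashMap<>();
        for (int i = 0; i < compact.size(); i += 2) {
            copy.put((String) compact.get(i), compact.get(i + 1));
        }
        hashed = copy;
        compact = null;
    }
}
```

With this sketch, starting the JVM with -Dexample.cos.mapThreshold=0 would make every new ThresholdDictionary use the LinkedHashMap from the start, while leaving the property unset keeps the default threshold of 1,000.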
--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854126#comment-17854126 ]

ASF subversion and git services commented on PDFBOX-5824:

Commit 1918261 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918261 ]

PDFBOX-5824: remove SmallMap optimisation based on a proposal from Jonathan Prates
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854125#comment-17854125 ]

ASF subversion and git services commented on PDFBOX-5824:

Commit 1918260 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918260 ]

PDFBOX-5824: remove SmallMap optimisation based on a proposal from Jonathan Prates
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854124#comment-17854124 ]

Andreas Lehmkühler commented on PDFBOX-5824:

I ran some more tests and I can't find any substantial differences concerning memory consumption. I'm going to remove the SmallMap optimisation from the trunk and the 3.0 branch.
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853750#comment-17853750 ]

Tilman Hausherr commented on PDFBOX-5824:

It was introduced in PDFBOX-3284.
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853748#comment-17853748 ]

Jonathan Prates commented on PDFBOX-5824:

Thank you both. SmallMap seems to be part of some legacy optimisation. I believe removing it can be beneficial even for smaller hash maps.
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853727#comment-17853727 ]

Andreas Lehmkühler commented on PDFBOX-5824:

I'm inclined to remove the SmallMap; I'm still running some tests.
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853675#comment-17853675 ]

Tilman Hausherr commented on PDFBOX-5824:

I support it. I will have the time in a few days, unless Andreas or someone else does it first. I'll put it on my todo list.
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853668#comment-17853668 ]

Jonathan Prates commented on PDFBOX-5824:

[~tilman] [~lehmi] could you please have a look at this change? We've been running a patched version in production for a few weeks and the results are consistent. Also, this patch does not introduce any breaking change: unless the property is explicitly set, the behaviour stays the same.
[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property
[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848182#comment-17848182 ]

Jonathan Prates commented on PDFBOX-5824:

Patch used: https://github.com/apache/pdfbox/pull/196