[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Prates updated PDFBOX-5824:
------------------------------------
    Description:
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54] controls which Map class is used to optimize memory usage. By default, a SmallMap is used. However, if the number of items in a COSDictionary reaches the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208] to a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantially bigger than this limit, this copying occurs frequently. Additionally, [SmallMap.keySet is not efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].

The attached screenshot shows PDFBox performance with SmallMap (in red) versus using LinkedHashMap and ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.

  was:
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54] controls which Map class is used to optimize memory usage. By default, a SmallMap is used. However, if the number of items in a COSDictionary reaches the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208] to a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantial, this copying occurs frequently. Additionally, [SmallMap.keySet is not efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].

The attached screenshot shows PDFBox performance with SmallMap (in red) versus using LinkedHashMap and ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.

> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-5824
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5824
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 3.0.3 PDFBox, 4.0.0
>            Reporter: Jonathan Prates
>            Priority: Minor
>         Attachments: Screenshot 2024-05-21 at 11.00.25.png
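
The proposal in the description could look roughly like the sketch below. It is not code from PDFBox: the property name `pdfbox.cos.mapThreshold`, the class `MapThresholdSketch`, and both helper methods are hypothetical, and falling back to the default on a negative or non-numeric value is an assumption the issue does not cover. Only the default of 1,000 and the meaning of 0 come from the description.

```java
/**
 * Sketch of resolving the map threshold from a system property.
 * All names here are hypothetical and are not part of PDFBox.
 */
class MapThresholdSketch {
    // Hypothetical property name, chosen only for illustration.
    static final String PROPERTY_NAME = "pdfbox.cos.mapThreshold";

    // The currently hardcoded value, as stated in the issue description.
    static final int DEFAULT_THRESHOLD = 1000;

    /** Returns the configured threshold, or the default if unset or invalid. */
    static int resolveThreshold() {
        String raw = System.getProperty(PROPERTY_NAME);
        if (raw == null) {
            return DEFAULT_THRESHOLD; // not set: current behaviour is unchanged
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: negative values are treated as invalid.
            return value >= 0 ? value : DEFAULT_THRESHOLD;
        } catch (NumberFormatException e) {
            // Assumption: unparsable values fall back to the default.
            return DEFAULT_THRESHOLD;
        }
    }

    /** A threshold of 0 means LinkedHashMap is used from the start. */
    static boolean startsWithLinkedHashMap(int threshold) {
        return threshold == 0;
    }

    public static void main(String[] args) {
        int threshold = resolveThreshold();
        System.out.println("threshold = " + threshold);
        System.out.println("LinkedHashMap from the start = "
                + startsWithLinkedHashMap(threshold));
    }
}
```

With such a property, a user processing large documents could pass `-Dpdfbox.cos.mapThreshold=0` on the JVM command line, while anyone who sets nothing would keep the existing SmallMap behaviour.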