[ https://issues.apache.org/jira/browse/PDFBOX-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600344#comment-17600344 ]
Michael Klink commented on PDFBOX-5499:
---------------------------------------

Just to make sure: your code measures not only _parsing_ but also _closing_ the document (the try-with-resources statement closes it inside the timed region). Please test whether the performance issue really is in the parsing and not in the closing.

> Performance issue since 2.0.18
> ------------------------------
>
>                 Key: PDFBOX-5499
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5499
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.18
>            Reporter: Thomas Debray Luyat
>            Priority: Major
>         Attachments: image-2022-09-05-12-48-04-608.png
>
>
> Our PDF is parsed in less than 200 ms in 2.0.18 and in more than 8 seconds
> in 2.0.19. The same issue is still present in 2.0.26.
>
> SmallMap was introduced in version 2.0.19. We have been facing a performance
> issue since this modification.
> !image-2022-09-05-12-48-04-608.png|width=968,height=377!
> We patched our code to replace the SmallMap implementation like this:
> {code:java}
> package org.apache.pdfbox.util;
>
> import java.util.LinkedHashMap;
>
> public class SmallMap<K, V> extends LinkedHashMap<K, V> {
>     // nothing: use the standard LinkedHashMap
> }{code}
> With this patch, the performance issue disappears.
> Our test is really simple:
> {code:java}
> long start = System.currentTimeMillis();
> try (PDDocument document = PDDocument.load(new File(inFile))) {
>     // nothing: only parsing is evaluated
> }
> long duration = System.currentTimeMillis() - start;
> assertTrue(duration < 500);{code}
>
> I understand that SmallMap can solve issues in some cases, but would it be
> possible to implement a factory that creates this map, so that we can
> configure which Map implementation is used?
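
One way to run the check suggested in the comment above is to time opening and closing as two separate phases. The following is only a minimal sketch and not part of PDFBox: `PhaseTimer`, `Timings`, and `measure` are hypothetical names, and the demo in `main` uses a stand-in resource instead of a real PDF.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: times opening and closing of a resource as two separate phases. */
final class PhaseTimer {

    /** Elapsed wall-clock time of each phase, in milliseconds. */
    static final class Timings {
        final long openMillis;
        final long closeMillis;

        Timings(long openMillis, long closeMillis) {
            this.openMillis = openMillis;
            this.closeMillis = closeMillis;
        }
    }

    /** Runs the opener, then closes the returned resource, timing each step on its own. */
    static Timings measure(Callable<? extends AutoCloseable> opener) throws Exception {
        long t0 = System.nanoTime();
        AutoCloseable resource = opener.call(); // phase 1: open (parsing, for a PDF)
        long t1 = System.nanoTime();
        resource.close();                       // phase 2: close
        long t2 = System.nanoTime();
        return new Timings((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in resource (an assumption for the demo): cheap to open, slow to close.
        Timings t = measure(() -> {
            AutoCloseable slowToClose = () -> Thread.sleep(50);
            return slowToClose;
        });
        System.out.println("open: " + t.openMillis + " ms, close: " + t.closeMillis + " ms");
    }
}
```

With PDFBox 2.0.x on the classpath, the reporter's test could call `PhaseTimer.measure(() -> PDDocument.load(new File(inFile)))` and assert on `openMillis` alone, which would show whether the time is spent in the parsing or in the closing.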