[ https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836373#comment-16836373 ]
ASF subversion and git services commented on PDFBOX-4539:
---------------------------------------------------------

Commit 1859006 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1859006 ]

PDFBOX-4539: don't construct new decoder each time, as suggested by Jonathan

> Cache CharsetDecoder
> --------------------
>
>                 Key: PDFBOX-4539
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4539
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.14
>            Reporter: Jonathan
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.0.16
>
>
> We were using PDFBox to parse and process a large number of PDFs, which could
> potentially contain thousands of pages in total, so performance mattered to
> us.
> Thus, we'd like to suggest caching the CharsetDecoder, which is currently
> instantiated on each call of `isValidUTF8(byte[])`.
> Our suggestion in BaseParser.java
> {code:java}
> private static final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder();
>
> /**
>  * Returns true if a byte sequence is valid UTF-8.
>  */
> private boolean isValidUTF8(byte[] input)
> {
>     try
>     {
>         csUTF_8.decode(ByteBuffer.wrap(input));
>         return true;
>     }
>     catch (CharacterCodingException e)
>     {
>         return false;
>     }
> }
> {code}
>
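
For illustration, here is a minimal, self-contained sketch of the caching idea suggested above. It is not the code from the commit. The class name Utf8Validator is hypothetical, and java.nio.charset.StandardCharsets stands in for the Charsets helper used in the suggestion. One deliberate difference: the decoder is held in an instance field rather than a static one, because a CharsetDecoder is stateful and not safe for concurrent use by multiple threads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the parser class; not PDFBox code.
final class Utf8Validator
{
    // Created once per instance and reused across calls.
    // Not static: CharsetDecoder instances must not be shared between threads.
    private final CharsetDecoder utf8Decoder = StandardCharsets.UTF_8.newDecoder();

    /**
     * Returns true if a byte sequence is valid UTF-8.
     */
    boolean isValidUTF8(byte[] input)
    {
        try
        {
            // decode(ByteBuffer) resets the decoder before decoding, so an
            // earlier failed call does not affect this one. A new decoder
            // reports malformed input by throwing, which is the default action.
            utf8Decoder.decode(ByteBuffer.wrap(input));
            return true;
        }
        catch (CharacterCodingException e)
        {
            return false;
        }
    }
}
```

If a single validator had to be shared between threads, one option would be to hold the decoder in a ThreadLocal instead. Whether that is needed depends on how the parser instances are used.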