Axel Howind created PDFBOX-5997:
-----------------------------------
Summary: avoid creation of temporary objects when parsing hex
values
Key: PDFBOX-5997
URL: https://issues.apache.org/jira/browse/PDFBOX-5997
Project: PDFBox
Issue Type: Improvement
Reporter: Axel Howind
Attachments:
avoid_the_creation_of_temporary_string_instances_when_parsing_hex_values_version1.patch
Hex numbers are currently parsed in two places in PDFBox: the Hex and
COSString classes. The current implementation instantiates several temporary
objects for each conversion:
1. trim() is called on the String, creating a copy if the String is not yet
trimmed.
2. a StringBuilder is created containing the String and possibly a padding 0.
This has to copy the whole character array every time.
3. for each pair of hex digits, substring() is called, creating a new String
instance (or looking it up in the String pool).
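To make the three allocation points above concrete, here is a minimal sketch
of that parsing pattern. It is a hypothetical reconstruction for illustration
only, not the actual PDFBox code; the class name AllocatingHexSketch and the
method parseHex are invented.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical reconstruction of the pattern described above; NOT PDFBox code.
final class AllocatingHexSketch {

    static byte[] parseHex(String hex) {
        // 1. trim() returns a new String if there is surrounding whitespace
        String trimmed = hex.trim();

        // 2. the StringBuilder copies every character of the input
        StringBuilder sb = new StringBuilder(trimmed);
        if (sb.length() % 2 != 0) {
            sb.append('0'); // pad an odd number of digits with a trailing 0
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream(sb.length() / 2);
        for (int i = 0; i < sb.length(); i += 2) {
            // 3. one short-lived String per pair of hex digits
            String pair = sb.substring(i, i + 2);
            // throws NumberFormatException if the pair is not valid hex
            out.write(Integer.parseInt(pair, 16));
        }
        return out.toByteArray();
    }
}
```

For an input of n hex digits this creates roughly n/2 short-lived String
objects on top of the trimmed copy and the StringBuilder.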
I have created two different patches for this: one that also replaces the
Integer.parseInt() call, and one that uses an overload of that method. Both
should be noticeably faster and reduce GC activity. You might want to run a
benchmark to decide which one to use.
Version 1 also does not rely on exception handling, which is inherently slow,
to handle incorrect hex data. Version 2 still uses exception handling, but
should nevertheless improve performance and reduce GC activity.
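As a rough illustration of the two approaches, here is a sketch under stated
assumptions. It is not the attached patch. The class and method names are
hypothetical, and the second method assumes that the overload in question is
Integer.parseInt(CharSequence, int, int, int), which requires Java 9 or later.

```java
// Hypothetical sketch; names are invented and this is NOT the attached patch.
final class HexSketch {

    /** Returns the value of a hex digit, or -1 if c is not a hex digit. */
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * No parseInt() and no exceptions: decodes directly from the input and
     * returns null for invalid data. Only the result array is allocated.
     */
    static byte[] parseHex(CharSequence hex) {
        int start = 0;
        int end = hex.length();
        // skip surrounding whitespace by moving indices instead of trim()
        while (start < end && hex.charAt(start) <= ' ') start++;
        while (end > start && hex.charAt(end - 1) <= ' ') end--;

        byte[] result = new byte[(end - start + 1) / 2];
        for (int i = 0; i < result.length; i++) {
            int pos = start + 2 * i;
            int hi = hexValue(hex.charAt(pos));
            // a missing final digit is treated as 0 (the padding case)
            int lo = pos + 1 < end ? hexValue(hex.charAt(pos + 1)) : 0;
            if (hi < 0 || lo < 0) {
                return null; // invalid hex data, reported without an exception
            }
            result[i] = (byte) ((hi << 4) | lo);
        }
        return result;
    }

    /**
     * Keeps parseInt() but uses the index-based overload (Java 9+), so no
     * substring is created. Invalid data still throws NumberFormatException.
     * Note that parseInt() also accepts a leading '+' or '-' sign.
     * Assumes the input is already trimmed.
     */
    static byte[] parseHexWithOverload(CharSequence hex) {
        int len = hex.length();
        byte[] result = new byte[(len + 1) / 2];
        for (int i = 0; i < result.length; i++) {
            int pos = 2 * i;
            int value = pos + 1 < len
                    ? Integer.parseInt(hex, pos, pos + 2, 16)
                    : Integer.parseInt(hex, pos, pos + 1, 16) << 4;
            result[i] = (byte) value;
        }
        return result;
    }
}
```

Which variant is faster in practice would still need to be measured, for
example with a JMH benchmark on representative input.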