[
https://issues.apache.org/jira/browse/PDFBOX-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler reassigned PDFBOX-6002:
------------------------------------------
Assignee: Andreas Lehmkühler
> change parse methods to take CharSequence argument
> --------------------------------------------------
>
> Key: PDFBOX-6002
> URL: https://issues.apache.org/jira/browse/PDFBOX-6002
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Axel Howind
> Assignee: Andreas Lehmkühler
> Priority: Major
> Attachments: image-2025-05-02-07-00-52-161.png
>
>
> PDFBox parsing works on Strings in almost all places. Often, StringBuilder
> instances are created to prepare a fragment to parse, and then another parse
> method is called using the result of calling toString() on the StringBuilder.
> If the parse methods were changed to take CharSequence instead, the
> StringBuilder instance could be passed on without creating a temporary String
> instance. This would reduce memory consumption and load on the GC.
> I did some profiling with async-profiler. In BaseParser.parseCOSNumber(),
> for example, about 25% of the runtime is spent in StringBuilder.toString().
> That cost would be eliminated completely if the parse methods worked on
> CharSequences instead of Strings (see image):
> !image-2025-05-02-07-00-52-161.png!
> A consequence would be that user code needs to be recompiled against the new
> version because the method signatures change. No code changes would be
> needed on the user side.
> An alternative approach is to introduce new methods with the prefix CS, like
> parseCOSNumberCS(), and to delegate parseCOSNumber() to the new method. This
> would be a PDFBox 3 compatible change.
> Please let me know whether you would accept such a patch and, if so, which
> of the two variants. I would then create incremental patches to provide
> this functionality.
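
The two options described in the issue can be sketched as follows. This is a minimal illustration, not PDFBox code: the class CharSequenceParseSketch and the methods parseLongCS() and parseLong() are hypothetical stand-ins for methods such as parseCOSNumber(). The sketch also assumes Java 9 or later, because it relies on the Long.parseLong(CharSequence, int, int, int) overload.

```java
public final class CharSequenceParseSketch {

    private CharSequenceParseSketch() {
    }

    /**
     * Hypothetical CharSequence-based variant (the "CS" method from the
     * proposal). A StringBuilder can be passed in directly, so no
     * temporary String has to be created by the caller.
     */
    public static long parseLongCS(CharSequence digits) {
        // This overload of Long.parseLong exists since Java 9.
        return Long.parseLong(digits, 0, digits.length(), 10);
    }

    /**
     * The existing String-based signature stays in place and delegates,
     * so already compiled callers keep linking against it.
     */
    public static long parseLong(String digits) {
        // String implements CharSequence, so delegation needs no conversion.
        return parseLongCS(digits);
    }

    public static void main(String[] args) {
        // Fragment assembled piece by piece, as a parser would do it.
        StringBuilder buffer = new StringBuilder();
        buffer.append('-').append("12").append('3');

        // No buffer.toString() call is needed here.
        System.out.println(parseLongCS(buffer));

        // Callers that still hold a String use the old method unchanged.
        System.out.println(parseLong("456"));
    }
}
```

With the delegating variant, existing callers of the String method keep working without recompilation, while code that already holds a StringBuilder can skip the toString() call.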