I would like to report a performance regression in StringSubstitutor with 
large strings, which our application hit after upgrading to v1.9+. There is no 
longer a public signup for the ASF Jira, so I’m hoping this report will 
suffice. If not, please create an account for me and I’ll file a proper ticket 
for it.

As of v1.9, StringSubstitutor no longer pre-converts the TextStringBuilder to a 
char[] and opts to reuse the TextStringBuilder API instead; see this commit:
https://github.com/apache/commons-text/commit/248af06171e14648e00ce0873c5f95e03041a6c7

A new default method was added to StringMatcher that takes a CharSequence 
(which TextStringBuilder implements) to handle the conversion:
https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/matcher/StringMatcher.java#L146

However, it calls CharSequenceUtils.toCharArray(buffer), which is not aware of 
TextStringBuilder. The buffer is not a String (which can be converted 
directly), and CharSequence itself has no way to do the conversion, so the 
call cannot be optimized. A custom StringMatcher implementation that does not 
override this default method (the stock matchers do override it) therefore 
makes a full copy of the CharSequence, which adds up very quickly when the 
text is large and lots of replacements are being made.
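
A minimal sketch of the effect, assuming a simplified stand-in interface 
rather than the real StringMatcher. The names CopyCostSketch, Matcher and 
charsCopiedDuringScan are made up for illustration, and the char-by-char copy 
only approximates what CharSequenceUtils.toCharArray(buffer) does for a 
non-String input:

```java
import java.util.concurrent.atomic.AtomicLong;

final class CopyCostSketch {
    /** Counts the chars copied by the default CharSequence overload. */
    static final AtomicLong COPIED = new AtomicLong();

    /** Simplified stand-in for StringMatcher (hypothetical, not the library interface). */
    interface Matcher {
        int isMatch(char[] buffer, int pos, int bufferStart, int bufferEnd);

        default int isMatch(CharSequence buffer, int pos, int bufferStart, int bufferEnd) {
            // A generic CharSequence exposes no backing array, so copy char by char.
            char[] copy = new char[buffer.length()];
            for (int i = 0; i < copy.length; i++) {
                copy[i] = buffer.charAt(i);
            }
            COPIED.addAndGet(copy.length);
            return isMatch(copy, pos, bufferStart, bufferEnd);
        }
    }

    /** Asks the matcher once per position and returns the total chars copied. */
    static long charsCopiedDuringScan(CharSequence text) {
        COPIED.set(0);
        // Custom matcher that implements only the char[] method.
        Matcher dollar = (buf, pos, start, end) -> buf[pos] == '$' ? 1 : 0;
        for (int pos = 0; pos < text.length(); pos++) {
            dollar.isMatch(text, pos, 0, text.length());
        }
        return COPIED.get();
    }
}
```

With one full copy per position, a single pass over an n-character buffer 
copies n * n chars in this sketch: a 1,000-character StringBuilder yields 
1,000,000 copied chars.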

Methods of ours that used to take 3 seconds now take upwards of a minute. 
Fortunately, our custom matcher is a simple OrStringMatcher (not provided 
out-of-the-box) that delegates to stock StringMatchers created by 
StringMatcherFactory.stringMatcher(…), which have their own optimized 
implementation of the method. We were able to resolve this ourselves by 
overriding the method and delegating it directly to the optimized 
implementation of the stock matchers, thus bypassing the 
CharSequenceUtils.toCharArray(buffer) penalty completely.
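
The shape of that workaround, sketched with stand-in types instead of the 
real commons-text classes. Matcher, LiteralMatcher, OrMatcher and matchAt are 
hypothetical names, and LiteralMatcher only imitates a stock matcher that 
reads the CharSequence in place:

```java
final class OrMatcherSketch {
    /** Simplified stand-in for StringMatcher (hypothetical). */
    interface Matcher {
        int isMatch(char[] buffer, int pos, int bufferStart, int bufferEnd);

        /** Default path: copies the whole sequence before matching. */
        default int isMatch(CharSequence buffer, int pos, int bufferStart, int bufferEnd) {
            return isMatch(buffer.toString().toCharArray(), pos, bufferStart, bufferEnd);
        }
    }

    /** Stand-in for a stock matcher with its own copy-free CharSequence overload. */
    static final class LiteralMatcher implements Matcher {
        private final char[] literal;

        LiteralMatcher(String literal) {
            this.literal = literal.toCharArray();
        }

        @Override
        public int isMatch(char[] buffer, int pos, int bufferStart, int bufferEnd) {
            if (pos + literal.length > bufferEnd) return 0;
            for (int i = 0; i < literal.length; i++) {
                if (buffer[pos + i] != literal[i]) return 0;
            }
            return literal.length;
        }

        @Override
        public int isMatch(CharSequence buffer, int pos, int bufferStart, int bufferEnd) {
            if (pos + literal.length > bufferEnd) return 0;
            for (int i = 0; i < literal.length; i++) {
                if (buffer.charAt(pos + i) != literal[i]) return 0;
            }
            return literal.length;
        }
    }

    /** Returns the first non-zero match length among the delegates. */
    static final class OrMatcher implements Matcher {
        private final Matcher[] delegates;

        OrMatcher(Matcher... delegates) {
            this.delegates = delegates.clone();
        }

        @Override
        public int isMatch(char[] buffer, int pos, int bufferStart, int bufferEnd) {
            for (Matcher m : delegates) {
                int len = m.isMatch(buffer, pos, bufferStart, bufferEnd);
                if (len > 0) return len;
            }
            return 0;
        }

        // The workaround: forward the CharSequence untouched, so no copy is made here.
        @Override
        public int isMatch(CharSequence buffer, int pos, int bufferStart, int bufferEnd) {
            for (Matcher m : delegates) {
                int len = m.isMatch(buffer, pos, bufferStart, bufferEnd);
                if (len > 0) return len;
            }
            return 0;
        }
    }

    /** Matches "${" or "$[" at the given position of the text. */
    static int matchAt(CharSequence text, int pos) {
        Matcher or = new OrMatcher(new LiteralMatcher("${"), new LiteralMatcher("$["));
        return or.isMatch(text, pos, 0, text.length());
    }
}
```

Without the second override in OrMatcher, every call would fall through to 
the copying default before reaching the delegates.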

But others may not be as fortunate. Perhaps the default method could be made 
aware of TextStringBuilder and use its package-private getBuffer() method 
instead? Or maybe there’s a better way to solve it.

Hopefully, I’ve explained it clearly enough. Please reach out with any further 
questions.

--
David Becker
Senior IT Engineer
