On Thu, 24 Aug 2023 10:38:57 GMT, Glavo <[email protected]> wrote:

>> I mainly made these optimizations:
>> 
>> * Avoid allocating `StringBuilder` when there are no characters in the URL 
>> that need to be encoded;
>> * Implement a fast path for UTF-8.
>> 
>> In addition to improving performance, these optimizations also reduce 
>> temporary objects:
>> 
>> * It no longer allocates any object when there are no characters in the URL 
>> that need to be encoded;
>> * The initial size of StringBuilder is larger to avoid expansion as much as 
>> possible;
>> * For UTF-8, the temporary `CharArrayWriter`, strings and byte arrays are no 
>> longer needed.
>> 
>> The results of the `URLEncodeDecode` benchmark:
>> 
>> 
>> Before:
>> Benchmark                       (count)  (maxLength)  (mySeed)  Mode  Cnt  Score   Error  Units
>> URLEncodeDecode.testEncodeUTF8     1024         1024         3  avgt   15  5.587 ± 0.010  ms/op
>> 
>> After:
>> Benchmark                       (count)  (maxLength)  (mySeed)  Mode  Cnt  Score   Error  Units
>> URLEncodeDecode.testEncodeUTF8     1024         1024         3  avgt   15  3.582 ± 0.054  ms/op
>> 
>> 
>> I also updated the tests to add more test cases.
>
> Glavo has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Remove UTF-8 fast path

Does your benchmark test a healthy mix of strings: some that need encoding and
some that don't? Ideally the mix would be weighted so that most inputs need
encoding only in the latter half, which is common since protocol+host seldom
needs encoding.

For strings that don't need encoding at all, this optimization alone should get
you close to the numbers for the full change.
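
A minimal sketch of that no-encoding fast path, for illustration only. The
names `NoEncodeFastPath`, `firstCharToEncode` and `encodeOrSame` are
hypothetical and not taken from the PR, and the safe-character set is assumed
to be the one documented for `java.net.URLEncoder` (alphanumerics plus `.`,
`-`, `*`, `_`). Encoding of strings that do need it is simply delegated to
`URLEncoder.encode` to keep the sketch short:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class NoEncodeFastPath {
    // Assumption: chars that URLEncoder leaves unchanged (per its Javadoc).
    private static final BitSet DONT_NEED_ENCODING = new BitSet(128);
    static {
        DONT_NEED_ENCODING.set('a', 'z' + 1);
        DONT_NEED_ENCODING.set('A', 'Z' + 1);
        DONT_NEED_ENCODING.set('0', '9' + 1);
        DONT_NEED_ENCODING.set('-');
        DONT_NEED_ENCODING.set('_');
        DONT_NEED_ENCODING.set('.');
        DONT_NEED_ENCODING.set('*');
    }

    // Returns the index of the first char that must be encoded, or -1 if none.
    static int firstCharToEncode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!DONT_NEED_ENCODING.get(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical wrapper: returns the input itself when nothing needs
    // encoding, so no StringBuilder or other temporary is allocated here.
    static String encodeOrSame(String s) {
        if (firstCharToEncode(s) < 0) {
            return s;
        }
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

A benchmark input set where most strings hit the early return would mainly
measure this scan, which is why the mix of inputs matters.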

The heuristic to size the `StringBuilder` could perhaps discount the chars we
copy 1:1 to reduce allocation pressure (`i + ((s.length() - i) << 1)`).
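
Spelled out, that sizing expression might look like the sketch below. It
assumes `i` is the index of the first char that needs encoding, so the `i`
chars before it are copied 1:1 and each remaining char is budgeted at two
chars of output. `SbSizing` and `estimateCapacity` are hypothetical names, not
code from the PR. Since an escaped byte takes three chars (`%XX`), this is
only an estimate and the builder may still grow:

```java
public class SbSizing {
    // Capacity estimate: the first i chars are copied unchanged, the rest
    // are budgeted at 2x. Overflow for extremely long inputs is not handled.
    static int estimateCapacity(String s, int i) {
        return i + ((s.length() - i) << 1);
    }

    public static void main(String[] args) {
        String s = "example.com/path?q=a b";
        // Assumption for the demo: treat the first '/' as the first char
        // that needs encoding.
        int i = s.indexOf('/');
        StringBuilder sb = new StringBuilder(estimateCapacity(s, i));
        sb.append(s, 0, i); // copy the unencoded prefix 1:1
        System.out.println("prefix=" + sb + " capacity=" + sb.capacity());
    }
}
```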

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1691559924
