Re: RFR: 8364320: String encodeUTF8 latin1 with negatives

Chen Liang Fri, 01 Aug 2025 09:15:33 -0700

On Fri, 1 Aug 2025 12:39:05 GMT, Brett Okken <[email protected]> wrote:


>> As suggested on the mailing list, when encoding latin1 bytes to UTF-8, we can
>> count the leading positive bytes. If a negative byte is found, we can copy
>> all the leading positive bytes to the target byte[] before processing the
>> remaining data one byte at a time.
>> 
>> https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149417.html
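
For readers following along, the idea in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR: the class name `Latin1ToUtf8Sketch` and the helper `countLeadingPositives` are hypothetical, and a plain loop stands in for whatever optimized counting routine the JDK uses internally.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of the PR.
final class Latin1ToUtf8Sketch {

    // Counts the leading bytes that are non-negative, i.e. plain ASCII (0x00..0x7F).
    // Assumption: a simple loop is used here in place of any JDK-internal helper.
    static int countLeadingPositives(byte[] src) {
        int i = 0;
        while (i < src.length && src[i] >= 0) {
            i++;
        }
        return i;
    }

    // Encodes latin1 bytes as UTF-8.
    static byte[] encode(byte[] latin1) {
        int positives = countLeadingPositives(latin1);
        if (positives == latin1.length) {
            // All ASCII: the UTF-8 form is byte-for-byte identical.
            return latin1.clone();
        }
        // Worst case: every remaining byte expands to two UTF-8 bytes.
        byte[] dst = new byte[positives + (latin1.length - positives) * 2];
        // Bulk-copy the leading ASCII prefix in one step.
        System.arraycopy(latin1, 0, dst, 0, positives);
        int dp = positives;
        // Process the rest one byte at a time.
        for (int sp = positives; sp < latin1.length; sp++) {
            byte b = latin1[sp];
            if (b < 0) {
                // 0x80..0xFF becomes a two-byte UTF-8 sequence.
                dst[dp++] = (byte) (0xC0 | ((b & 0xFF) >> 6));
                dst[dp++] = (byte) (0x80 | (b & 0x3F));
            } else {
                dst[dp++] = b;
            }
        }
        return Arrays.copyOf(dst, dp);
    }
}
```

The bulk copy is what should help inputs with a long ASCII prefix followed by a few non-ASCII bytes, which matches the shape of the encodeLatin1LongEnd benchmark.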
>
> Benchmark on win64
> 
> Baseline:
> 
> 
> Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
> StringEncode.encodeAllMixed                 UTF-8  avgt   10  20067.519 ± 528.152  ns/op
> StringEncode.encodeAsciiLong                UTF-8  avgt   10  12115.389 ± 307.491  ns/op
> StringEncode.encodeAsciiShort               UTF-8  avgt   10     70.098 ±   1.696  ns/op
> StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10   1974.391 ± 162.405  ns/op
> StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    270.097 ±  13.840  ns/op
> StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   1876.366 ±  51.971  ns/op
> StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   4973.070 ± 130.426  ns/op
> StringEncode.encodeLatin1Short              UTF-8  avgt   10     96.227 ±   2.816  ns/op
> StringEncode.encodeShortMixed               UTF-8  avgt   10    360.586 ±   8.691  ns/op
> StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1534.748 ±  34.584  ns/op
> StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    528.919 ±  15.143  ns/op
> StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2275.117 ±  50.152  ns/op
> StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4398.943 ± 116.607  ns/op
> StringEncode.encodeUTF16Short               UTF-8  avgt   10    152.219 ±   8.677  ns/op
> 
> 
> 
> Patch:
> 
> Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
> StringEncode.encodeAllMixed                 UTF-8  avgt   10  18876.056 ± 330.644  ns/op
> StringEncode.encodeAsciiLong                UTF-8  avgt   10  12040.590 ± 165.905  ns/op
> StringEncode.encodeAsciiShort               UTF-8  avgt   10     69.895 ±   0.318  ns/op
> StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10    574.455 ±  14.769  ns/op
> StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    284.553 ±   1.886  ns/op
> StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   2230.789 ±  11.043  ns/op
> StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   3278.998 ±  96.779  ns/op
> StringEncode.encodeLatin1Short              UTF-8  avgt   10     99.332 ±   1.977  ns/op
> StringEncode.encodeShortMixed               UTF-8  avgt   10    378.183 ±  17.504  ns/op
> StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1531.960 ±  19.300  ns/op
> StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    563.810 ±   4.811  ns/op
> StringEncode.encodeUTF16LongS...

@bokken FYI, to make JMH comparisons easier, you can have JMH generate JSON
reports, upload them to GitHub gists, and use https://jmh.morethan.io/ to
compare the results from the two gists.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26597#issuecomment-3145088238
