Re: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86

Claes Redestad Fri, 24 Sep 2021 09:04:51 -0700

On Tue, 21 Sep 2021 21:58:48 GMT, Claes Redestad <redes...@openjdk.org> wrote:


> This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to 
> work also for ASCII encoding, which makes for example the `UTF_8$Encoder` 
> perform on par with (or outperform) similarly getting charset encoded bytes 
> from a String. The former took a small performance hit in JDK 9, and the 
> latter improved greatly in the same release.
> 
> Extending the `EncodeIsoArray` intrinsics on other platforms should be 
> possible, but I'm unfamiliar with the macro assembler in general and unlike 
> the x86 intrinsic they don't use a simple vectorized mask to implement the 
> latin-1 check. For example, aarch64 seems to filter out the low bytes and then 
> check if there are any bits set in the high bytes. Clever, but very different 
> to the 0xFF80 2-byte mask that an ASCII test wants.

The current version (cef05f4) copies the `ISO_8859_1.implEncodeISOArray` 
intrinsic and adapts it to work on ASCII encoding, which makes the 
`UTF_8$Encoder` perform on par with (or outperform) encoding from a String. 
Results below use the microbenchmarks provided by @carterkozak: 
https://github.com/carterkozak/stringbuilder-encoding-performance

Baseline:


Benchmark                                                      (charsetName)                         (message)  (timesToAppend)  Mode  Cnt     Score    Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8    This is a simple ASCII message                3  avgt    8   270.237 ± 10.504  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode 😊                3  avgt    8   568.353 ±  2.331  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8    This is a simple ASCII message                3  avgt    8   324.889 ± 17.466  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode 😊                3  avgt    8   633.720 ± 22.703  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8    This is a simple ASCII message                3  avgt    8  1132.436 ± 30.661  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode 😊                3  avgt    8  1379.207 ± 66.982  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8    This is a simple ASCII message                3  avgt    8    91.253 ±  3.848  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode 😊                3  avgt    8   519.489 ± 12.516  ns/op


Patch:

Benchmark                                                      (charsetName)                         (message)  (timesToAppend)  Mode  Cnt     Score     Error  Units
EncoderBenchmarks.charsetEncoder                                       UTF-8    This is a simple ASCII message                3  avgt    4    82.535 ±  20.310  ns/op
EncoderBenchmarks.charsetEncoder                                       UTF-8  This is a message with unicode 😊                3  avgt    4   522.679 ±  13.456  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8    This is a simple ASCII message                3  avgt    4   127.831 ±  32.612  ns/op
EncoderBenchmarks.charsetEncoderWithAllocation                         UTF-8  This is a message with unicode 😊                3  avgt    4   549.343 ±  59.899  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8    This is a simple ASCII message                3  avgt    4  1182.835 ± 153.735  ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder          UTF-8  This is a message with unicode 😊                3  avgt    4  1416.407 ± 130.551  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8    This is a simple ASCII message                3  avgt    4    97.770 ±  15.742  ns/op
EncoderBenchmarks.toStringGetBytes                                     UTF-8  This is a message with unicode 😊                3  avgt    4   516.351 ±  58.580  ns/op


This can probably be simplified further, say by adding a flag to the intrinsic 
that indicates whether we're encoding ASCII only or ISO-8859-1. It also needs 
to be implemented and tested on all architectures.
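
To illustrate the flag idea, here is a minimal scalar sketch in plain Java. It 
is not the intrinsic or the JDK's actual code: the class name, method name, and 
`ascii` parameter are hypothetical. It only shows how one loop could serve both 
encodings by switching between the 0xFF80 mask mentioned in the quoted 
description and a 0xFF00 mask for latin-1.

```java
public class EncodeArraySketch {
    // Hypothetical scalar stand-in for a flag-driven intrinsic (illustration only).
    // Copies chars to bytes until one does not fit the target encoding and
    // returns the number of chars copied.
    static int encodeArray(char[] sa, int sp, byte[] da, int dp, int len, boolean ascii) {
        // 0xFF80 rejects any char above 0x7F (ASCII);
        // 0xFF00 rejects any char above 0xFF (ISO-8859-1).
        int mask = ascii ? 0xFF80 : 0xFF00;
        int i = 0;
        for (; i < len; i++) {
            char c = sa[sp + i];
            if ((c & mask) != 0) {
                break; // caller falls back to the slow path from here
            }
            da[dp + i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "abc\u00e9d".toCharArray(); // U+00E9 is latin-1 but not ASCII
        byte[] dst = new byte[src.length];
        System.out.println(encodeArray(src, 0, dst, 0, src.length, true));  // prints 3
        System.out.println(encodeArray(src, 0, dst, 0, src.length, false)); // prints 5
    }
}
```

A vectorized version would apply the same mask to many chars at once, which is 
why a single flag selecting the mask looks sufficient, at least on x86.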

(edit: I accidentally edited this comment rather than quote-replying; the 
original comment is restored above)

On the JDK-included `CharsetEncodeDecode.encode` microbenchmark, I get these 
numbers in the baseline (18-b09):


Benchmark                   (size)       (type)  Mode  Cnt    Score   Error  Units
CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   39.962 ± 1.703  us/op
CharsetEncodeDecode.encode   16384         BIG5  avgt   30  153.282 ± 4.521  us/op
CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30  192.040 ± 4.543  us/op
CharsetEncodeDecode.encode   16384        ASCII  avgt   30   40.051 ± 1.210  us/op
CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  302.815 ± 9.490  us/op


With the proposed patch:

Benchmark                   (size)       (type)  Mode  Cnt    Score    Error  Units
CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    4.081 ±  0.182  us/op
CharsetEncodeDecode.encode   16384         BIG5  avgt   30  150.374 ±  3.579  us/op
CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    4.010 ±  0.179  us/op
CharsetEncodeDecode.encode   16384        ASCII  avgt   30    3.961 ±  0.176  us/op
CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  302.235 ± 11.395  us/op


That is: on my system, encoding 16K chars of ASCII data is roughly 10x faster 
for UTF-8 and ASCII, and roughly 48x faster for ASCII-compatible charsets like 
ISO-8859-15. On the third-party microbenchmarks, performance for non-ASCII 
input either doesn't change or improves when messages have an ASCII prefix.
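
For readers who have not looked at the benchmarks, the two paths being compared 
look roughly like the sketch below. This is an assumed shape, not the benchmark 
source: the class and method names are hypothetical, and only standard 
`java.nio.charset` calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodePathsSketch {
    // Path 1: encode a CharSequence through a CharsetEncoder (UTF_8$Encoder here).
    static byte[] viaEncoder(CharSequence text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        int capacity = (int) Math.ceil(text.length() * encoder.maxBytesPerChar());
        ByteBuffer out = ByteBuffer.allocate(capacity);
        CoderResult result = encoder.encode(CharBuffer.wrap(text), out, true);
        if (result.isError()) {
            throw new IllegalStateException(result.toString());
        }
        encoder.flush(out);
        return Arrays.copyOf(out.array(), out.position());
    }

    // Path 2: materialize a String first, then ask it for its encoded bytes.
    static byte[] viaString(CharSequence text) {
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            sb.append("This is a simple ASCII message");
        }
        // Both paths should produce identical bytes; only their cost differs.
        System.out.println(Arrays.equals(viaEncoder(sb), viaString(sb))); // prints true
    }
}
```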

-------------

PR: https://git.openjdk.java.net/jdk/pull/5621
