Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding

Corey Ashford Wed, 26 Aug 2020 09:52:47 -0700

Thanks for your careful review, Martin. I will consider what you havesaid, and reply with comments/questions and possibly a revised webrev ifI think I can satisfy your concerns.


Regards,


- Corey

On 8/26/20 8:26 AM, Doerr, Martin wrote:

Hi Corey,

I should explain my comments regarding Base64.java better.

Let's be precise: "should process a multiple of four" => "must process a
multiple of four"

Did you try to support non-multiple of 4 and this was intended as 
recommendation?
I think making it a requirement and simplifying the logic in decode0 is better.
Or what's the benefit of the recommendation?

If any illegal base64 bytes are encountered in the source by the
intrinsic, the intrinsic can return a data length of zero or any
number of bytes before the place where the illegal base64 byte
was encountered.

I think this has a drawback. Somebody may use a debugger and want to stop
when throwing IllegalArgumentException. He should see the position which
matches the Java implementation.

This is probably hard to understand. Let me try to explain it by example:
1. 80 Bytes get processed by the intrinsic and 60 Bytes written to the 
destination array.
2. The intrinsic sees an illegal base64 Byte and it returns 12 which is allowed 
by your specification.
3. The compiled method containing the intrinsic hits a safepoint (e.g. in the 
large while loop in decodeBlockSlow).
4. A JVMTI agent (debugger) reads dp and dst.
5. The person using the debugger gets angry because more bytes than dp were 
written into dst. The JVM didn't follow the specified behavior.

I guess we can and should avoid it by specifying that the intrinsic needs to 
return the dp value matching the number of Bytes written.

Best regards,
Martin

-----Original Message-----
From: Doerr, Martin
Sent: Dienstag, 25. August 2020 15:38
To: Corey Ashford <[email protected]>; Michihiro Horie
<[email protected]>
Cc: [email protected]; core-libs-dev <core-libs-
[email protected]>; Kazunori Ogata <[email protected]>;
[email protected]
Subject: RE: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
API for Base64 decoding

Hi Corey,

thanks for proposing this change. I have comments and suggestions
regarding various files.

Base64.java

This is the only file which needs another review from core-libs-dev.
First of all, I like the idea to use a HotSpotIntrinsicCandidate which can
consume as many bytes as the implementation wants.

Comment before decodeBlock:
Let's be precise: "should process a multiple of four" => "must process a
multiple of four"

If any illegal base64 bytes are encountered in the source by the
intrinsic, the intrinsic can return a data length of zero or any
number of bytes before the place where the illegal base64 byte
was encountered.

I think this has a drawback. Somebody may use a debugger and want to stop
when throwing IllegalArgumentException. He should see the position which
matches the Java implementation.

Please note that the comment indentation differs from other comments.

decode0: Final "else" after return is redundant.


stubGenerator_ppc.cpp

"__vector" breaks AIX build!
Does it work on Big Endian linux with old gcc (we require 7.3.1, now)?
Please either support Big Endian properly or #ifdef it out.
What exactly does it on linux?
I remember that we had tried such prefixes but were not satisfied. I think it
didn't enforce 16 Byte alignment if I remember correctly.

Attention: C2 does no longer convert int/bool to 64 bit values (since JDK-
8086069). So the argument registers for offset, length and isURL may contain
garbage in the higher bits.

You may want to use load_const_optimized which produces shorter code.

You may want to use __ align(32) to align unrolled_loop_start.

I'll review the algorithm in detail when I find more time.


assembler_ppc.hpp
assembler_ppc.inline.hpp
vm_version_ppc.cpp
vm_version_ppc.hpp
Please rebase. Parts of the change were pushed as part of 8248190: Enable
Power10 system and implement new byte-reverse instructions


vmSymbols.hpp
Indentation looks odd at the end.


library_call.cpp
Good. Indentation style of the call parameters differs from encodeBlock.


runtime.cpp
Good.


aotCodeHeap.cpp
vmSymbols.cpp
shenandoahSupport.cpp
vmStructs_jvmci.cpp
shenandoahSupport.cpp
escape.cpp
runtime.hpp
stubRoutines.cpp
stubRoutines.hpp
vmStructs.cpp
Good and trivial.


Tests:
I think we should have JTREG tests to check for regressions in the future.

Best regards,
Martin

-----Original Message-----
From: Corey Ashford <[email protected]>
Sent: Mittwoch, 19. August 2020 20:11
To: Michihiro Horie <[email protected]>
Cc: [email protected]; core-libs-dev <core-libs-
[email protected]>; Kazunori Ogata <[email protected]>;
[email protected]; Doerr, Martin <[email protected]>
Subject: Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and
API for Base64 decoding

Michihiro Horie posted up a new iteration of this webrev for me.  This
time the webrev includes a complete implementation of the intrinsic for
Power9 and Power10.

You can find it here:
http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/

Changes in webrev.02 vs. webrev.01:

    * The method header for the intrinsic in the Base64 code has been
rewritten using the Javadoc style.  The clarity of the comments has been
improved and some verbosity has been removed.  There are no additional
functional changes to Base64.java.

    * The code needed to martial and check the intrinsic parameters has
been added, using the base64 encodeBlock intrinsic as a guideline.

    * A complete intrinsic implementation for Power9 and Power10 is

included.


    * Adds some Power9 and Power10 assembler instructions needed by the
intrinsic which hadn't been defined before.

The intrinsic implementation in this patch accelerates the decoding of
large blocks of base64 data by a factor of about 3.5X on Power9.

I'm attaching two Java test cases I am using for testing and
benchmarking.  The TestBase64_VB encodes and decodes randomly-sized
buffers of random data and checks that original data matches the
encoded-then-decoded data.  TestBase64Errors encodes a 48K block of
random bytes, then corrupts each byte of the encoded data, one at a
time, checking to see if the decoder catches the illegal byte.

Any comments/suggestions would be appreciated.

Thanks,

- Corey

On 7/27/20 6:49 PM, Corey Ashford wrote:

Michihiro Horie uploaded a new revision of the Base64 decodeBlock
intrinsic API for me:

http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/

It has the following changes with respect to the original one posted:

   * In the event of encountering a non-base64 character, instead of
having a separate error code of -1, the intrinsic can now just return
either 0, or the number of data bytes produced up to the point where

the

illegal base64 character was encountered.  This reduces the number of
special cases, and also provides a way to speed up the process of
finding the bad character by the slower, pure-Java algorithm.

   * The isMIME boolean is removed from the API for two reasons:
     - The current API is not sufficient to handle the isMIME case,
because there isn't a strict relationship between the number of input
bytes and the number of output bytes, because there can be an arbitrary
number of non-base64 characters in the source.
     - If an intrinsic only implements the (isMIME == false) case as ours
does, it will always return 0 bytes processed, which will slightly slow
down the normal path of processing an (isMIME == true) instantiation.
     - We considered adding a separate hotspot candidate for the (isMIME
== true) case, but since we don't have an intrinsic implementation to
test that, we decided to leave it as a future optimization.

Comments and suggestions are welcome.  Thanks for your consideration.

- Corey

On 6/23/20 6:23 PM, Michihiro Horie wrote:

Hi Corey,

Following is the issue I created.
https://bugs.openjdk.java.net/browse/JDK-8248188

I will upload a webrev when you're ready as we talked in private.

Best regards,
Michihiro

Inactive hide details for "Corey Ashford" ---2020/06/24
09:40:10---Currently in java.util.Base64, there is a
HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently
in java.util.Base64, there is a HotSpotIntrinsicCandidate and API for
encodeBlock, but no

From: "Corey Ashford" <[email protected]>
To: "[email protected]"
<[email protected]>,
"[email protected]" <ppc-aix-port-

[email protected]>

Cc: Michihiro Horie/Japan/IBM@IBMJP, Kazunori

Ogata/Japan/IBM@IBMJP,

[email protected]
Date: 2020/06/24 09:40
Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for
Base64 decoding

------------------------------------------------------------------------

Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and
API for encodeBlock, but none for decoding.  This means that only
encoding gets acceleration from the underlying CPU's vector hardware.

I'd like to propose adding a new intrinsic for decodeBlock.  The
considerations I have for this new intrinsic's API:

   * Don't make any assumptions about the underlying capability of the
hardware.  For example, do not impose any specific block size
granularity.

   * Don't assume the underlying intrinsic can handle isMIME or isURL
modes, but also let them decide if they will process the data regardless
of the settings of the two booleans.

   * Any remaining data that is not processed by the intrinsic will be
processed by the pure Java implementation.  This allows the intrinsic to
process whatever block sizes it's good at without the complexity of
handling the end fragments.

   * If any illegal character is discovered in the decoding process, the
intrinsic will simply return -1, instead of requiring it to throw a
proper exception from the context of the intrinsic.  In the event of
getting a -1 returned from the intrinsic, the Java Base64 library code
simply calls the pure Java implementation to have it find the error and
properly throw an exception.  This is a performance trade-off in the
case of an error (which I expect to be very rare).

   * One thought I have for a further optimization (not implemented in
the current patch), is that when the intrinsic decides not to process a
block because of some combination of isURL and isMIME settings it
doesn't handle, it could return extra bits in the return code, encoded
as a negative number.  For example:

Illegal_Base64_char   = 0b001;
isMIME_unsupported    = 0b010;
isURL_unsupported     = 0b100;

These can be OR'd together as needed and then negated (flip the sign).
The Base64 library code could then cache these flags, so it will know
not to call the intrinsic again when another decodeBlock is requested
but with an unsupported mode.  This will save the performance hit of
calling the intrinsic when it is guaranteed to fail.

I've tested the attached patch with an actual intrinsic coded up for
Power9/Power10, but those runtime intrinsics and arch-specific patches
aren't attached today.  I want to get some consensus on the
library-level intrinsic API first.

Also attached is a simple test case to test that the new intrinsic API
doesn't break anything.

I'm open to any comments about this.

Thanks for your consideration,

- Corey

Corey Ashford
IBM Systems, Linux Technology Center, OpenJDK team
cjashfor at us dot ibm dot com
[attachment "decodeBlock_api-20200623.patch" deleted by Michihiro
Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro
Horie/Japan/IBM]

Re: RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding

Reply via email to