Hi Corey,

The idea I had in mind is refactoring the fast path into the method you call decodeBlock.
Base64: lines 751-768.

It leaves all the unknown/illegal character handling to the Java code.
And yes, it does not need to handle MIME, except to return on illegal characters.

The patch is attached.

Regards, Roger



On 8/31/20 6:22 PM, Corey Ashford wrote:
On 8/29/20 1:19 PM, Corey Ashford wrote:
Hi Roger,

Thanks for your reply and thoughts!  Comments interspersed below:

On 8/28/20 10:54 AM, Roger Riggs wrote:
...
Comparing with the way that the Base64 encoder was intrinsified, the
method that is intrinsified should have a method body that does
the same function, so it is interchangable.  That likely will just shift
the "fast path" code into the decodeBlock method.
Keeping the symmetry between encoder and decoder will
make it easier to maintain the code.

Good point.  I'll investigate what this looks like in terms of the actual code, and will report back (perhaps in a new webrev).


Having looked at this again, I don't think it makes sense.  One thing that differs significantly from the encodeBlock intrinsic is that the decodeBlock intrinsic only needs to process a prefix of the data, and so it can leave virtually any amount of data at the end of the src buffer unprocessed, where as with the encodeBlock intrinsic, if it exists, it must process the entire buffer.

In the (common) case where the decodeBlock intrinsic returns not having processed everything, it still needs to call the Java code, and if that Java code is "replaced" by the intrinsic, it's inaccessible.

Is there something I'm overlooking here?  Basically I want the decode API to behave differently than the encode API, mostly to make the arch-specific intrinsic easier to implement. If that's not acceptable, then I need to rethink the API, and also figure out how to deal with the illegal character case.  The latter could perhaps be done by throwing an exception from the intrinsic, or maybe by returning a negative length that specifies the index of the illegal src byte, and then have the Java code throw the exception).

Regards,

- Corey


diff --git a/src/java.base/share/classes/java/util/Base64.java 
b/src/java.base/share/classes/java/util/Base64.java
index 34b39b18a54..e2b3a686d70 100644
--- a/src/java.base/share/classes/java/util/Base64.java
+++ b/src/java.base/share/classes/java/util/Base64.java
@@ -741,6 +741,27 @@ public class Base64 {
             return 3 * (int) ((len + 3L) / 4) - paddings;
         }
 
+        // Fast path for full 4 byte -> 3 byte conversion w/o errors
+        private int decodeBlock(byte[] src, int sp, int sl, byte[] dst, 
boolean isURL) {
+            int[] base64 = isURL ? fromBase64URL : fromBase64;
+            int dp = 0;
+            int sl0 = sp + ((sl - sp) & ~0b11);
+            while (sp < sl0) {
+                int b1 = base64[src[sp++] & 0xff];
+                int b2 = base64[src[sp++] & 0xff];
+                int b3 = base64[src[sp++] & 0xff];
+                int b4 = base64[src[sp++] & 0xff];
+                if ((b1 | b2 | b3 | b4) < 0) {    // non base64 byte
+                    return dp;
+                }
+                int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;
+                dst[dp++] = (byte)(bits0 >> 16);
+                dst[dp++] = (byte)(bits0 >>  8);
+                dst[dp++] = (byte)(bits0);
+            }
+            return dp;
+        }
+
         private int decode0(byte[] src, int sp, int sl, byte[] dst) {
             int[] base64 = isURL ? fromBase64URL : fromBase64;
             int dp = 0;
@@ -749,23 +770,34 @@ public class Base64 {
 
             while (sp < sl) {
                 if (shiftto == 18 && sp + 4 < sl) {       // fast path
-                    int sl0 = sp + ((sl - sp) & ~0b11);
-                    while (sp < sl0) {
-                        int b1 = base64[src[sp++] & 0xff];
-                        int b2 = base64[src[sp++] & 0xff];
-                        int b3 = base64[src[sp++] & 0xff];
-                        int b4 = base64[src[sp++] & 0xff];
-                        if ((b1 | b2 | b3 | b4) < 0) {    // non base64 byte
-                            sp -= 4;
-                            break;
-                        }
-                        int bits0 = b1 << 18 | b2 << 12 | b3 << 6 | b4;
-                        dst[dp++] = (byte)(bits0 >> 16);
-                        dst[dp++] = (byte)(bits0 >>  8);
-                        dst[dp++] = (byte)(bits0);
-                    }
-                    if (sp >= sl)
-                        break;
+                    int dl = decodeBlock(src, sp, sl, dst, isURL);
+                    /*
+                     * Calculate how many characters were processed by how many
+                     * bytes of data were returned.
+                     */
+
+                    /*
+                     * Base64 characters always come in groups of four,
+                     * producing three bytes of binary data (except for on
+                     * the final four-character piece where it can produce
+                     * one to three data bytes depending on how many fill
+                     * characters there - zero, one, or two).  The only
+                     * case where there should be a non-multiple of three
+                     * returned is if the intrinsic has processed all of
+                     * the characters passed to it.  At this point in the
+                     * logic, however, we know the instrinsic hasn't
+                     * processed all of the chracters.
+                     *
+                     * Round dl down to the nearest three-byte boundary.
+                     */
+                    dl = (dl / 3) * 3;
+
+                    // Recalculate chars_decoded based on the rounded dl
+                    int chars_decoded = (dl / 3) * 4;
+
+                    sp += chars_decoded;
+                    dp += dl;
+                    continue;
                 }
                 int b = src[sp++] & 0xff;
                 if ((b = base64[b]) < 0) {

Reply via email to