Hello again,

I sent this email a week ago and have received no replies.  Is there any step I 
have missed necessary to contribute to the JDK libraries?

I am very interested in making your lives easier, so please let me know if I am 
in the wrong place or are otherwise misguided.

Thanks!
Steven


On Dec 29, 2012, at 9:25 AM, Steven Schlansker <stevenschlans...@gmail.com> 
wrote:

> Hello core-libs-dev,
> 
> My company uses UUIDs throughout our software.  We recently identified that 
> the java.util.UUID implementations of fromString and toString are horridly 
> inefficient.
> 
> An incomplete list of the inefficiencies:
> 
> * fromString uses a regular expression that is not even precompiled
> * fromString uses a regular expression to parse out the "-" markers, 
> requiring the allocation of many String and String[] objects, when a simple 
> linear scan works just fine
> * fromString unnecessarily allocates multiple Long objects
> 
> * toString creates a char[64], a String object which copies that, and a 
> sub-String object for *each* of the 5 hex digit segments
> * toString produces a fixed-length result but involves multiple StringBuilder 
> concatenations
> 
> Everyone that I have shown the relevant code to has been horrified by the 
> complete lack of care towards object allocations.  While I am a big fan of 
> the "object allocation is free until otherwise proven" philosophy, we traced 
> multiple production problems down to these hotspots.
> 
> But instead of just whining about it, I wish to contribute a patch which I 
> believe fixes the problem.  This is my first attempt to submit anything to 
> the JDK, so please be nice :-)
> 
> My initial attempt has yielded micro benchmarks with very favorable outcomes. 
>  By taking the appropriate code from java.lang.Long, modifying it to work on 
> a single pre-allocated array+offset rather than returning a String, and 
> scanning for dashes instead of regex splitting I reduced times 3-6x and 
> object allocations to the low single digits.
> 
> My Google Caliper benchmark is available here, to ensure that I have not 
> committed any benchmark sins:
> https://gist.github.com/4356621
> 
>        benchmark instances  Bytes    ns linear runtime
> JdkUuidFromString     51.00 1544.0 608.2 ==============================
> NewUuidFromString      2.00   72.0 179.1 ========
> 
>  JdkUuidToString     31.00 1720.0 321.4 ===============
>  NewUuidToString      3.00  200.0  51.5 ==
> 
> In particular, the reduction of object allocations from ~1.5KB to 72/200 
> bytes dramatically reduces GC pressure when you sit in a tight loop 
> deserializing millions of UUIDs from strings.
> 
> I believe the same patch can (and should?) apply to jdk7u and jdk8, as the 
> code does not seem to have changed.
> 
> I would appreciate any feedback on the code style, efficiency, or correctness 
> that you can offer.  I have run the "java/util/UUID" jtreg suite against this 
> and it passes.  We stole the more thorough Apache Harmony tests for this and 
> they pass as well: 
> https://github.com/stevenschlansker/components-ness-core/blob/master/src/test/java/com/nesscomputing/uuid/TestNessUUID.java
> 
> I have not yet secured a CLA, but my company is in the process of signing one.
> 
> The rest of the message is a "hg export" of my change set, which is current 
> to the tip of jdk7.
> Happy holidays, and thank you for your time!
> Steven Schlansker
> 
> 
> 
> 
> # HG changeset patch
> # User Steven Schlansker <ste...@nesscomputing.com>
> # Date 1356737383 0
> # Node ID a812c963b96f08112f6805cbc380b12af196f788
> # Parent  9b8c96f96a0f9a5801b55530a387fefbe50482a3
> java.util.UUID: improve performance of UUID#toString and UUID#fromString by 
> avoiding many unnecessary object allocations.
> 
>        benchmark instances  Bytes    ns linear runtime
> JdkUuidFromString     51.00 1544.0 608.2 ==============================
> NewUuidFromString      2.00   72.0 179.1 ========
> 
>  JdkUuidToString     31.00 1720.0 321.4 ===============
>  NewUuidToString      3.00  200.0  51.5 ==
> 
> diff -r 9b8c96f96a0f -r a812c963b96f src/share/classes/java/util/UUID.java
> --- a/src/share/classes/java/util/UUID.java   Mon Jun 27 13:21:34 2011 -0700
> +++ b/src/share/classes/java/util/UUID.java   Fri Dec 28 23:29:43 2012 +0000
> @@ -189,26 +189,79 @@
>     *          described in {@link #toString}
>     *
>     */
> -    public static UUID fromString(String name) {
> -        String[] components = name.split("-");
> -        if (components.length != 5)
> -            throw new IllegalArgumentException("Invalid UUID string: "+name);
> -        for (int i=0; i<5; i++)
> -            components[i] = "0x"+components[i];
> +    public static UUID fromString(String str) {
> +        int dashCount = 4;
> +        int [] dashPos = new int [6];
> +        dashPos[0] = -1;
> +        dashPos[5] = str.length();
> 
> -        long mostSigBits = Long.decode(components[0]).longValue();
> +        for (int i = str.length()-1; i >= 0; i--) {
> +            if (str.charAt(i) == '-') {
> +                if (dashCount == 0) {
> +                    throw new IllegalArgumentException("Invalid UUID string: 
> " + str);
> +                }
> +                dashPos[dashCount--] = i;
> +            }
> +        }
> +
> +        if (dashCount > 0) {
> +            throw new IllegalArgumentException("Invalid UUID string: " + 
> str);
> +        }
> +
> +        long mostSigBits = decode(str, dashPos, 0) & 0xffffffffL;
>        mostSigBits <<= 16;
> -        mostSigBits |= Long.decode(components[1]).longValue();
> +        mostSigBits |= decode(str, dashPos, 1) & 0xffffL;
>        mostSigBits <<= 16;
> -        mostSigBits |= Long.decode(components[2]).longValue();
> +        mostSigBits |= decode(str,  dashPos, 2) & 0xffff);
> 
> -        long leastSigBits = Long.decode(components[3]).longValue();
> +        long leastSigBits = decode(str,  dashPos, 3) & 0xffffL;
>        leastSigBits <<= 48;
> -        leastSigBits |= Long.decode(components[4]).longValue();
> +        leastSigBits |= decode(str,  dashPos, 4) & 0xffffffffffffL;
> 
>        return new UUID(mostSigBits, leastSigBits);
>    }
> 
> +    private static long decode(final String str, final int [] dashPos, final 
> int field) {
> +        int start = dashPos[field]+1;
> +        int end = dashPos[field+1];
> +        if (start >= end) {
> +            throw new IllegalArgumentException("Invalid UUID string: " + 
> str);
> +        }
> +        long curr = 0;
> +        for (int i = start; i < end; i++) {
> +            int x = getNibbleFromChar(str.charAt(i));
> +            curr <<= 4;
> +            if (curr < 0) {
> +                throw new NumberFormatException("long overflow");
> +            }
> +            curr |= x;
> +        }
> +        return curr;
> +    }
> +    
> +    private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1;
> +    private static final int LOWER_UPPER_DIFF = 'a' - 'A';
> +
> +    private static int getNibbleFromChar(final char c)
> +    {
> +        int x = c - '0';
> +        if (x > 9) {
> +            x -= NUM_ALPHA_DIFF; // difference between '9' and 'A'
> +            if (x > 15) {
> +                x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A'
> +            }
> +            if (x < 10) {
> +                throw new IllegalArgumentException(c + " is not a valid 
> character for an UUID string");
> +            }
> +        }
> +
> +        if (x < 0 || x > 15) {
> +            throw new IllegalArgumentException(c + " is not a valid 
> character for an UUID string");
> +        }
> +
> +        return x;
> +    }
> +
>    // Field Accessor Methods
> 
>    /**
> @@ -372,19 +425,46 @@
>     * @return  A string representation of this {@code UUID}
>     */
>    public String toString() {
> -        return (digits(mostSigBits >> 32, 8) + "-" +
> -                digits(mostSigBits >> 16, 4) + "-" +
> -                digits(mostSigBits, 4) + "-" +
> -                digits(leastSigBits >> 48, 4) + "-" +
> -                digits(leastSigBits, 12));
> +        return toString(getMostSignificantBits(), getLeastSignificantBits());
>    }
> 
> -    /** Returns val represented by the specified number of hex digits. */
> -    private static String digits(long val, int digits) {
> -        long hi = 1L << (digits * 4);
> -        return Long.toHexString(hi | (val & (hi - 1))).substring(1);
> +    public static String toString(long msb, long lsb) {
> +        char[] uuidChars = new char[36];
> +
> +        hexDigits(uuidChars, 0, 8, msb >> 32);
> +        uuidChars[8] = '-';
> +        hexDigits(uuidChars, 9, 4, msb >> 16);
> +        uuidChars[13] = '-';
> +        hexDigits(uuidChars, 14, 4, msb);
> +        uuidChars[18] = '-';
> +        hexDigits(uuidChars, 19, 4, lsb >> 48);
> +        uuidChars[23] = '-';
> +        hexDigits(uuidChars, 24, 12, lsb);
> +
> +        return new String(uuidChars);
>    }
> 
> +    private static void hexDigits(char[] dest, int offset, int digits, long 
> val) {
> +        long hi = 1L << (digits * 4);
> +        toUnsignedString(dest, offset, digits, hi | (val & (hi - 1)), 4);
> +    }
> +
> +    private final static char[] HEX_DIGITS = {
> +        '0' , '1' , '2' , '3' , '4' , '5' ,
> +        '6' , '7' , '8' , '9' , 'a' , 'b' ,
> +        'c' , 'd' , 'e' , 'f'
> +    };
> +
> +    private static void toUnsignedString(char[] dest, int offset, int len, 
> long i, int shift) {
> +        int charPos = len;
> +        int radix = 1 << shift;
> +        long mask = radix - 1;
> +        do {
> +            dest[offset + --charPos] = HEX_DIGITS[(int)(i & mask)];
> +            i >>>= shift;
> +        } while (i != 0 && charPos > 0);
> +    }
> +    
>    /**
>     * Returns a hash code for this {@code UUID}.
>     *

Reply via email to