Hello again, I sent this email a week ago and have received no replies. Is there any step I have missed necessary to contribute to the JDK libraries?
I am very interested in making your lives easier, so please let me know if I am in the wrong place or are otherwise misguided. Thanks! Steven On Dec 29, 2012, at 9:25 AM, Steven Schlansker <stevenschlans...@gmail.com> wrote: > Hello core-libs-dev, > > My company uses UUIDs throughout our software. We recently identified that > the java.util.UUID implementations of fromString and toString are horridly > inefficient. > > An incomplete list of the inefficiencies: > > * fromString uses a regular expression that is not even precompiled > * fromString uses a regular expression to parse out the "-" markers, > requiring the allocation of many String and String[] objects, when a simple > linear scan works just fine > * fromString unnecessarily allocates multiple Long objects > > * toString creates a char[64], a String object which copies that, and a > sub-String object for *each* of the 5 hex digit segments > * toString produces a fixed-length result but involves multiple StringBuilder > concatenations > > Everyone that I have shown the relevant code to has been horrified by the > complete lack of care towards object allocations. While I am a big fan of > the "object allocation is free until otherwise proven" philosophy, we traced > multiple production problems down to these hotspots. > > But instead of just whining about it, I wish to contribute a patch which I > believe fixes the problem. This is my first attempt to submit anything to > the JDK, so please be nice :-) > > My initial attempt has yielded micro benchmarks with very favorable outcomes. > By taking the appropriate code from java.lang.Long, modifying it to work on > a single pre-allocated array+offset rather than returning a String, and > scanning for dashes instead of regex splitting I reduced times 3-6x and > object allocations to the low single digits. > > My Google Caliper benchmark is available here, to ensure that I have not > committed any benchmark sins: > https://gist.github.com/4356621 > > benchmark instances Bytes ns linear runtime > JdkUuidFromString 51.00 1544.0 608.2 ============================== > NewUuidFromString 2.00 72.0 179.1 ======== > > JdkUuidToString 31.00 1720.0 321.4 =============== > NewUuidToString 3.00 200.0 51.5 == > > In particular, the reduction of object allocations from ~1.5KB to 72/200 > bytes dramatically reduces GC pressure when you sit in a tight loop > deserializing millions of UUIDs from strings. > > I believe the same patch can (and should?) apply to jdk7u and jdk8, as the > code does not seem to have changed. > > I would appreciate any feedback on the code style, efficiency, or correctness > that you can offer. I have run the "java/util/UUID" jtreg suite against this > and it passes. We stole the more thorough Apache Harmony tests for this and > they pass as well: > https://github.com/stevenschlansker/components-ness-core/blob/master/src/test/java/com/nesscomputing/uuid/TestNessUUID.java > > I have not yet secured a CLA, but my company is in the process of signing one. > > The rest of the message is a "hg export" of my change set, which is current > to the tip of jdk7. > Happy holidays, and thank you for your time! > Steven Schlansker > > > > > # HG changeset patch > # User Steven Schlansker <ste...@nesscomputing.com> > # Date 1356737383 0 > # Node ID a812c963b96f08112f6805cbc380b12af196f788 > # Parent 9b8c96f96a0f9a5801b55530a387fefbe50482a3 > java.util.UUID: improve performance of UUID#toString and UUID#fromString by > avoiding many unnecessary object allocations. > > benchmark instances Bytes ns linear runtime > JdkUuidFromString 51.00 1544.0 608.2 ============================== > NewUuidFromString 2.00 72.0 179.1 ======== > > JdkUuidToString 31.00 1720.0 321.4 =============== > NewUuidToString 3.00 200.0 51.5 == > > diff -r 9b8c96f96a0f -r a812c963b96f src/share/classes/java/util/UUID.java > --- a/src/share/classes/java/util/UUID.java Mon Jun 27 13:21:34 2011 -0700 > +++ b/src/share/classes/java/util/UUID.java Fri Dec 28 23:29:43 2012 +0000 > @@ -189,26 +189,79 @@ > * described in {@link #toString} > * > */ > - public static UUID fromString(String name) { > - String[] components = name.split("-"); > - if (components.length != 5) > - throw new IllegalArgumentException("Invalid UUID string: "+name); > - for (int i=0; i<5; i++) > - components[i] = "0x"+components[i]; > + public static UUID fromString(String str) { > + int dashCount = 4; > + int [] dashPos = new int [6]; > + dashPos[0] = -1; > + dashPos[5] = str.length(); > > - long mostSigBits = Long.decode(components[0]).longValue(); > + for (int i = str.length()-1; i >= 0; i--) { > + if (str.charAt(i) == '-') { > + if (dashCount == 0) { > + throw new IllegalArgumentException("Invalid UUID string: > " + str); > + } > + dashPos[dashCount--] = i; > + } > + } > + > + if (dashCount > 0) { > + throw new IllegalArgumentException("Invalid UUID string: " + > str); > + } > + > + long mostSigBits = decode(str, dashPos, 0) & 0xffffffffL; > mostSigBits <<= 16; > - mostSigBits |= Long.decode(components[1]).longValue(); > + mostSigBits |= decode(str, dashPos, 1) & 0xffffL; > mostSigBits <<= 16; > - mostSigBits |= Long.decode(components[2]).longValue(); > + mostSigBits |= decode(str, dashPos, 2) & 0xffff); > > - long leastSigBits = Long.decode(components[3]).longValue(); > + long leastSigBits = decode(str, dashPos, 3) & 0xffffL; > leastSigBits <<= 48; > - leastSigBits |= Long.decode(components[4]).longValue(); > + leastSigBits |= decode(str, dashPos, 4) & 0xffffffffffffL; > > return new UUID(mostSigBits, leastSigBits); > } > > + private static long decode(final String str, final int [] dashPos, final > int field) { > + int start = dashPos[field]+1; > + int end = dashPos[field+1]; > + if (start >= end) { > + throw new IllegalArgumentException("Invalid UUID string: " + > str); > + } > + long curr = 0; > + for (int i = start; i < end; i++) { > + int x = getNibbleFromChar(str.charAt(i)); > + curr <<= 4; > + if (curr < 0) { > + throw new NumberFormatException("long overflow"); > + } > + curr |= x; > + } > + return curr; > + } > + > + private static final int NUM_ALPHA_DIFF = 'A' - '9' - 1; > + private static final int LOWER_UPPER_DIFF = 'a' - 'A'; > + > + private static int getNibbleFromChar(final char c) > + { > + int x = c - '0'; > + if (x > 9) { > + x -= NUM_ALPHA_DIFF; // difference between '9' and 'A' > + if (x > 15) { > + x -= LOWER_UPPER_DIFF; // difference between 'a' and 'A' > + } > + if (x < 10) { > + throw new IllegalArgumentException(c + " is not a valid > character for an UUID string"); > + } > + } > + > + if (x < 0 || x > 15) { > + throw new IllegalArgumentException(c + " is not a valid > character for an UUID string"); > + } > + > + return x; > + } > + > // Field Accessor Methods > > /** > @@ -372,19 +425,46 @@ > * @return A string representation of this {@code UUID} > */ > public String toString() { > - return (digits(mostSigBits >> 32, 8) + "-" + > - digits(mostSigBits >> 16, 4) + "-" + > - digits(mostSigBits, 4) + "-" + > - digits(leastSigBits >> 48, 4) + "-" + > - digits(leastSigBits, 12)); > + return toString(getMostSignificantBits(), getLeastSignificantBits()); > } > > - /** Returns val represented by the specified number of hex digits. */ > - private static String digits(long val, int digits) { > - long hi = 1L << (digits * 4); > - return Long.toHexString(hi | (val & (hi - 1))).substring(1); > + public static String toString(long msb, long lsb) { > + char[] uuidChars = new char[36]; > + > + hexDigits(uuidChars, 0, 8, msb >> 32); > + uuidChars[8] = '-'; > + hexDigits(uuidChars, 9, 4, msb >> 16); > + uuidChars[13] = '-'; > + hexDigits(uuidChars, 14, 4, msb); > + uuidChars[18] = '-'; > + hexDigits(uuidChars, 19, 4, lsb >> 48); > + uuidChars[23] = '-'; > + hexDigits(uuidChars, 24, 12, lsb); > + > + return new String(uuidChars); > } > > + private static void hexDigits(char[] dest, int offset, int digits, long > val) { > + long hi = 1L << (digits * 4); > + toUnsignedString(dest, offset, digits, hi | (val & (hi - 1)), 4); > + } > + > + private final static char[] HEX_DIGITS = { > + '0' , '1' , '2' , '3' , '4' , '5' , > + '6' , '7' , '8' , '9' , 'a' , 'b' , > + 'c' , 'd' , 'e' , 'f' > + }; > + > + private static void toUnsignedString(char[] dest, int offset, int len, > long i, int shift) { > + int charPos = len; > + int radix = 1 << shift; > + long mask = radix - 1; > + do { > + dest[offset + --charPos] = HEX_DIGITS[(int)(i & mask)]; > + i >>>= shift; > + } while (i != 0 && charPos > 0); > + } > + > /** > * Returns a hash code for this {@code UUID}. > *