Hi guys, It turns out that much of the world's computers is busy doing UTF-8 decoding, so lots of people have expended much effort to make it faster. Much of that text is ASCII, so it's definitely worth having special ASCII-only loops. Like this:
static boolean isAscii1(byte[] bytes) { for (byte b : bytes) if (b < 0) return false; return true; } Hotspot does a pretty good job on that, but we can go faster if we reach for Unsafe, processing data a long at a time. static boolean isAscii2(byte[] bytes) { long i, limit; for (i = ABASE, limit = ABASE + bytes.length - 7; i < limit; i += 8) if ((U.getLong(bytes, i) & 0x8080808080808080L) != 0) return false; for (limit = ABASE + bytes.length; i < limit; i += 1) if (U.getByte(bytes, i) < 0) return false; return true; } We can also try the same strategy with ByteBuffer.getLong, hoping it will do atomic reads for us: static boolean isAscii3(byte[] bytes) { ByteBuffer b = ByteBuffer.wrap(bytes); int i, limit; for (i = 0, limit = bytes.length - 7; i < limit; i += 8) if ((b.getLong(i) & 0x8080808080808080L) != 0) return false; for (limit = bytes.length; i < limit; i += 1) if (b.get(i) < 0) return false; return true; } A stupid benchmark (sorry Aleksey - maybe you already have a jmh version of this?) http://cr.openjdk.java.net/~martin/Utf8Bench.java gives me: (jdk9) Unsafe stride 8: 0.35626781581093897 Buffer stride 8: 0.4825880453758097 (jdk8) Unsafe stride 8: 0.33818706070673143 Buffer stride 8: 1.9958109701624323 That is, the Unsafe code is 3x faster than the simple code. The ByteBuffer code used to be 2x slower and is now 2x faster - well done - crowd goes wild! I see it uses new and well-hidden Unsafe.getLongUnaligned ... all the performance fanatics want to use that, but y'all don't want to give it to us. And we don't want to allocate a temp ByteBuffer object. What to do? If you provided a static variant of ByteBuffer.getLong for byte[], it would probably have enough performance for us to live without Unsafe. It's hard to resist Unsafe when it gives 3x improvement in a performance hotspot. Maybe C2 can optimize the simple loop even more than it is today? Maybe we can add a static method on j.u.Arrays? Maybe static int asciiCount(byte[] bytes, int start, int end) Still a performance hack, but far less of one.