On 06/10/2013 08:06 PM, Steven Schlansker wrote:
Hi core-libs-dev,
Hi Steven,
the main issue is that intern() doesn't work in isolation.
For example, if you write
"foo" == new String("foo").intern()
the result must always be true, so the String cache must be accessible
not only from the Java side but also from the VM side.
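
A minimal sketch of that identity guarantee, for illustration only. The class and method names (InternIdentityDemo and its helpers) are hypothetical and not part of the JDK; the only behaviour relied upon is that string literals are interned and that intern() returns the pooled instance.

```java
// Illustrative only: InternIdentityDemo is a hypothetical name, not JDK code.
final class InternIdentityDemo {

    // Compares a literal with the interned form of a fresh copy.
    static boolean literalMatchesInterned() {
        String copy = new String("foo");   // distinct heap object, same contents
        return "foo" == copy.intern();     // intern() hands back the pooled instance
    }

    // Compares a literal with the fresh copy itself, without interning.
    static boolean literalMatchesCopy() {
        String copy = new String("foo");
        return "foo" == copy;              // identity comparison against a new object
    }

    public static void main(String[] args) {
        System.out.println(literalMatchesInterned());
        System.out.println(literalMatchesCopy());
    }
}
```

The literal is placed in the pool by the VM when the constant is resolved, while intern() is called from Java code, which is why both sides have to share one table.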
Given that we can switch on a String since Java 7,
we don't really need to compare strings with == in code anymore.
I think it's better to change the JSON parser implementation to use its
own cache (or not) and not rely on String.intern().
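
As a rough illustration of the switch point, here is a sketch in which a freshly created, non-interned String still selects the right branch. The class name, method name, and the values "low", "medium" and "high" are made up for the example. Note that case labels must be compile-time constants, so this only helps where the set of values is known in advance.

```java
// Illustrative only: FieldKindDemo and its case labels are hypothetical.
final class FieldKindDemo {

    // A String switch matches on contents (hashCode plus equals), not on identity,
    // so the argument does not need to be interned.
    static int priority(String level) {
        switch (level) {
            case "low":    return 0;
            case "medium": return 1;
            case "high":   return 2;
            default:       return -1;
        }
    }

    public static void main(String[] args) {
        // Simulates a value produced by a parser: equal contents, different object.
        String parsed = new String("high");
        System.out.println(priority(parsed));
    }
}
```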
cheers,
Rémi
While doing performance profiling of my application, I discovered that nearly
50% of the time deserializing JSON was spent within String.intern(). I
understand that in general interning Strings is not the best approach for
things, but I think I have a decent use case -- the value of a certain field is
one of a very limited number of valid values (that are not known at compile
time, so I cannot use an Enum), and is repeated many millions of times in the
JSON stream.
I discovered that replacing String.intern() with a ConcurrentHashMap improved
performance by almost an order of magnitude.
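
A sketch of the kind of replacement described above, not the actual code from the application. LocalInterner is a hypothetical name, and the design assumes the set of distinct values is small, since the map holds strong references and never evicts anything.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: LocalInterner is a hypothetical, application-level canonicalizer.
final class LocalInterner {

    // Maps each distinct value to its canonical instance.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering s itself if it is the first seen.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct values currently held.
    int size() {
        return pool.size();
    }
}
```

Unlike String.intern(), the instances returned here are not guaranteed to be identical to string literals, so callers should still compare with equals() outside the parser.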
I'm not the only person who discovered this and was surprised:
http://stackoverflow.com/questions/10624232/performance-penalty-of-string-intern
I've been excited about starting to contribute to OpenJDK, so I am thinking
that this might be a fun project for me to take on and then contribute back.
But I figured I should check in on the list before spending a lot of time
tracking this down. I have a couple of preparatory questions:
* Has this bottleneck been examined thoroughly before? Am I wishing too hard
for performance here?
* String.intern() is a native method currently. My understanding is that there is a
nontrivial penalty to invoking native methods (at least via JNI, not sure if this is also
true for "built ins"?). I assume the reason that this is native is so the Java
intern is the same as C++-invoked interns from within the JVM itself. Is this an actual
requirement, or could String.intern be replaced with Java code?
* If the interning itself must be handled by a symbol table in C++ land as it is today, would a "second
level cache" in Java land that invokes a native "intern0" method be acceptable, so that there
is a low-penalty "fast path"? If so, this would involve a nonzero memory cost, although I assume
that a few thousand references inside of a Map is an OK price to pay for a (for example) 5x speedup.
* I assume the String class itself is loaded at a very sensitive time during VM
initialization. Having String initialization trigger (for example) ConcurrentHashMap
class initialization may cause problems or circularities. If this is the case, would
triggering such a load lazily on the first intern() call be "late enough" as to
not cause problems?
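
The last two questions can be sketched together in user-level code, under some loud assumptions: TwoLevelInterner is a hypothetical name, the existing String.intern() stands in for the proposed native "intern0", and the nested holder class is just one way to defer ConcurrentHashMap loading until the first call. This is not a proposal for how java.lang.String itself would be changed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: TwoLevelInterner is a hypothetical user-level wrapper.
final class TwoLevelInterner {

    // Holder idiom: Holder is initialized (and ConcurrentHashMap touched)
    // only when Holder.CACHE is first accessed, i.e. on the first intern() call.
    private static final class Holder {
        static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();
    }

    private TwoLevelInterner() {}

    static String intern(String s) {
        // Fast path: Java-level lookup, no native call.
        String cached = Holder.CACHE.get(s);
        if (cached != null) {
            return cached;
        }
        // Slow path: String.intern() stands in for the hypothetical native intern0.
        String canonical = s.intern();
        // Every thread stores the same canonical object, so a lost race is harmless.
        Holder.CACHE.putIfAbsent(canonical, canonical);
        return canonical;
    }
}
```

Because the cache only ever stores what the VM-level table returned, identity with string literals is preserved. The cost is the one mentioned above: the map keeps strong references, so cached entries can no longer be reclaimed from the VM's table.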
I'm sure that if I get anywhere with this I will have more questions, but this
should get me started. Thank you for any advice / insight you may be able to
provide!
Steven