Re: [drlvm] string interning in java

Salikh Zakirov Sat, 29 Jul 2006 08:52:29 -0700

Alex Blewitt wrote:

> Yes, but I disagree with your conclusion; to me, it doesn't make
> sense.


I see your point.
Still, we need to have this functionality, because it is in the spec.

> However, by hard-coding the message like this, you're
> guaranteeing that the VM will intern() the string "how are you?", even
> if it never prints this message out (because it's in the constant
> pool).

This is not the case, at least for DRLVM (I do not know about other JVMs).
A string literal in the .java file is compiled to a constant
pool entry in the class file, and essentially the whole contents
of the class file are loaded into memory (native memory).
No Java object instances are created at this point.

Later, at execution time, the LDC instruction calls a VM helper
to resolve the constant pool entry. Resolution is done lazily,
so if your code is never called, the string literal will
never be resolved. The Java string instance is created and interned
only at resolution time.

So the issue you have described does not apply to DRLVM:
unused string literals are never interned.
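
To illustrate the lazy resolution described above, here is a minimal Java
sketch. The class and method names are made up for the example. The only
thing the language guarantees is that a literal and the result of intern()
on an equal string are the same object; the moment at which the String
object for a literal is created (assumed here to be the first execution of
the corresponding LDC) is up to the VM.

```java
// Sketch only: shows that a literal and intern() share one canonical instance.
class LiteralInterningDemo {
    // Hypothetical method that this demo never calls. On a VM that resolves
    // constant pool entries lazily, no String object is created for its literal.
    static String rarelyUsedMessage() {
        return "how are you?";
    }

    // Builds the string at run time so that it is not a compile-time constant.
    static String buildAtRuntime() {
        return new StringBuilder().append("hel").append("lo").toString();
    }

    static boolean literalIsInterned() {
        String runtime = buildAtRuntime();
        // The constant pool entry for "hello" is resolved when this line runs.
        String literal = "hello";
        // Different objects before interning, same object after.
        return runtime != literal && runtime.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalIsInterned()); // prints "true"
    }
}
```

Whether "how are you?" ever becomes a heap object cannot be observed from
Java code itself, so the first method is there only to mark where the
unused literal lives.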

We could consider saving memory by not keeping the
class file contents in memory and loading them on demand instead, but
I personally do not know for sure how it is done now.

> Many, many, messages are never printed out in the course of a
> normal code flow; yet they're all cached in the intern() pool, and in
> fact, have to be because of the way that constant pools are supposed
> to work.

The Java specification does not require eager resolution of string
literals, and in fact DRLVM does not do it.

> So this 'perfect sense' is trading off speed of lookup of a message
> for memory requirements, even if it's never used. If it's never used,
> the speed of lookup is irrelevant (think exception messages like
> 'Invalid URL format') but it still takes up space.

If it is never used, it only takes up space in the loaded class file data.
Saving space on class loading is a separate issue, not related
to string interning.


> Reading the Eclipse Performance Bloopers is a very good summary of
> the techniques taken by the Eclipse development team to try and limit
> the amount of memory used, and should be read by anyone with an
> interest in intern()'d strings, or for that matter, messages in
> general:
> 
> http://wiki.eclipse.org/index.php/Performance_Bloopers

Thanks, this is very interesting reading.

> Importantly, it explains why the Eclipse NLS class uses static string
> variables to refer to messages, and not to String literals for exactly
> this reason. If you use a static string variable that is dynamically
> initialised from a properties file, you only take the memory up when
> you need to refer to that message

Actually, I think this should read "... when you need to refer
to any of the messages contained in the resource bundle ..."

> Furthermore, because it's a
> dynamically read in string, it doesn't pollute the intern() pool, and
> thus when the class is unloaded, the string is unloaded too.

The interned string design does not preclude garbage collection of
interned strings. The patches I've sent in fact do just that.
Before the patch, a native interned string pool is used, and interned
strings are not garbage collectable. After the patch, the interned string
pool is implemented in Java, and unreachable interned strings are
reclaimed from the pool too.

Resolved literals are still strongly referenced from the constant pool,
so they can be reclaimed only if the class is unloaded
(and class unloading is not yet fully supported in DRLVM, though a prototype
exists).
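
To show the general idea of a collectable pool, here is a small sketch in
Java. This is not the code from my patch; the class name and the choice of
WeakHashMap are assumptions made for the example. It only demonstrates that
an intern pool can hand out canonical instances without keeping them alive.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Sketch of a collectable intern pool; NOT the actual DRLVM implementation.
final class WeakInternPool {
    // Keys are held weakly by the map. The values must be weak as well,
    // because a strong value would keep its own key reachable forever.
    private final WeakHashMap<String, WeakReference<String>> pool =
            new WeakHashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists.
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref != null) ? ref.get() : null;
        if (canonical == null) {
            canonical = s;
            pool.put(s, new WeakReference<>(s));
        }
        return canonical;
    }

    // Approximate number of live entries; stale entries are expunged lazily.
    synchronized int size() {
        return pool.size();
    }
}
```

Once nothing else refers to a canonical string, the collector is free to
clear both the key and the value, and the entry disappears from the pool.
A literal resolved from a constant pool would still be pinned by that
constant pool, as noted above.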

> Once a
> String is intern()'d, it's like a memory leak -- you'll never see that
> memory again.

Not true for DRLVM with my patch.

> Lastly, the Eclipse approach ensures that there are no String values
> saved as keys anywhere in the system -- they're always referred to by
> the static field contents. So regardless of whether they're intern()'d
> or not, you are always comparing by reference anyway.

I would rather say that the Eclipse approach eliminates the need for any lookup,
because static field accesses are compiled into direct address accesses.
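
For readers who have not seen the technique, here is a rough sketch of the
static-field approach. It is not the Eclipse NLS class: Messages,
MessageLoader and the message keys are hypothetical, and the properties are
read from an inline string instead of a .properties file so the example is
self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

// Hypothetical message holder; field names double as property keys.
class Messages {
    public static String invalidUrl;
    public static String greeting;

    static {
        // Runs once, on the first access to ANY field of this class,
        // so the whole bundle is loaded together.
        MessageLoader.fill(Messages.class, MessageLoader.parse(
                "invalidUrl=Invalid URL format\ngreeting=how are you?\n"));
    }
}

// Hypothetical helper that copies property values into static String fields.
class MessageLoader {
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    static void fill(Class<?> holder, Properties values) {
        for (Field f : holder.getDeclaredFields()) {
            int m = f.getModifiers();
            boolean eligible = f.getType() == String.class
                    && Modifier.isStatic(m) && Modifier.isPublic(m);
            if (!eligible) {
                continue;
            }
            try {
                // A missing key gets a visible placeholder instead of null.
                f.set(null, values.getProperty(f.getName(), "!" + f.getName() + "!"));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Client code then simply reads Messages.invalidUrl. The strings come from
the properties data at run time, so they are not string literals of the
Messages class.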

> In other words, all of the benefits of intern(), and yet none of the
> disadvantages. Remind me again why we're not using this method  for
> doing messages in Harmony?

The Eclipse approach to localizing messages in Java code looks promising,
but it requires some duplication of work or complex tooling support.

As far as I understand, the Eclipse approach is completely orthogonal to the
traditional localization approach, so nothing prevents using the two approaches
side by side. Feel free to submit patches to use it :)

