Re: String.equals optimisation
I stand corrected. -- Steve ___ Classpath mailing list Classpath@gnu.org http://lists.gnu.org/mailman/listinfo/classpath
Re: String.equals optimisation
Stephen Crawley wrote:
> [EMAIL PROTECTED] said:
>> I'd be interested to hear of other reasons for Java's requirement to intern all literal strings and constants.
>
> Backwards compatibility. At this point we can only conjecture as to why Java was originally defined this way. My guess is that this decision was made in the early days when the language was being targeted at embedded computing and machines with not a lot of memory. Having the JVM do interning of literals could save enough memory to matter.

Er, no. String literals in JDK 1.0 were *not* interned. That was changed in JDK 1.1.

> The world has moved on, and nobody thinks much about conserving string space these days.

I don't believe saving space was the motivation, but consistency. Assume:

    static String x = "a";
    static String y = "a";

Then you had (x==y) if x and y were defined in the same class, but (x!=y) if they were defined in different classes. People expect that ("a"=="a").

--
	--Per Bothner
[EMAIL PROTECTED]   http://per.bothner.com/
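A minimal sketch of the consistency point above, under current Java rules where string literals are always interned. The class names Holder1, Holder2 and LiteralInternDemo are hypothetical and exist only for this illustration.

```java
// Two unrelated classes that each mention the literal "a".
// Hypothetical names, for illustration only.
class Holder1 { static String x = "a"; }
class Holder2 { static String y = "a"; }

class LiteralInternDemo {
    /** True when the literals in both classes resolve to one String object. */
    static boolean literalsShared() {
        return Holder1.x == Holder2.y;
    }

    /** A String built at run time is a distinct object until it is interned. */
    static boolean runtimeCopyIsDistinctUntilInterned() {
        String copy = new String("a");
        return copy != Holder1.x && copy.intern() == Holder1.x;
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());                     // true
        System.out.println(runtimeCopyIsDistinctUntilInterned()); // true
    }
}
```

Without interning across classes, the first check would depend on where x and y happened to be declared, which is the inconsistency described above.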
RE: String.equals optimisation
Archie Cobbs wrote:
> Simon Kitching wrote:
>> * Class.getName returns strings that have been interned. I don't think this is explicitly required by the java specs but is certainly true for Sun's JVM and seems likely to be done by any sensible JVM.
>
> I.e., is there something special about class names which means they should be treated differently from any other String randomly created and used in a Java application? (rhetorical question) Otherwise, why not intern all Strings? Etc.
>
> In any case, to provide two concrete counter-examples:
>
> $ cat zz.java
> public class zz {
>     public static void main(String[] args) {
>         zz z = new zz();
>         System.out.println(z.getClass().getName() == "zz");
>     }
> }
> $ javac zz.java
> $ java zz
> true
> $ jc -Xint zz
> false
> $ jamvm zz
> false

$ ikvm zz
true

He did say any sensible JVM... gdr

Regards,
Jeroen
Re: String.equals optimisation
Simon Kitching wrote:
>>> * Class.getName returns strings that have been interned. I don't think this is explicitly required by the java specs but is certainly true for Sun's JVM and seems likely to be done by any sensible JVM.
>>
>> You definitely make some good arguments, but this one is not necessarily true. In fact, I'd argue a JVM that interns every class' name (even if only on demand) is potentially wasting a bunch of heap space.
>
> I'm assuming that the Class object would contain a reference to the interned string, so there is only one copy of the string, ie somewhere

Not a valid assumption.. in JC no String is associated with Class objects. VMClass.getName() is native and the returned String is created on demand, based on the UTF-8 name stored internally in memory.

In fact, one could argue that storing class names in any other way than in their native UTF-8 form is a big waste of memory. E.g., for loaded classes...

  If the VM can find its UTF-8 name and create a String dynamically:
    Then also storing the class name persistently as a String is a 200% increase in memory (a char[] array is twice as big as UTF-8)
  Else:
    The VM must store the class name as a String, which is a 100% increase in memory vs. storing it as UTF-8

> The extra space used for interning is therefore just a single extra reference (as a reference to the string is contained in both the Class object and the String class internal pool). Yes that is a little space wasted, but not a bunch.

Right, the wasted space is not much.. at first I was forgetting that intern'd strings are stored with weak keys and will get flushed out after they're no longer referenced (just like normal Strings)... replace "big waste of memory" with "waste of memory" :-)

-Archie

__
Archie Cobbs      *CTO, Awarix*      http://www.awarix.com
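The 200% and 100% figures above can be checked with a little arithmetic. This is a rough sketch that counts payload bytes only (no object headers or array length fields) and assumes, as the message does, a String backed by a char[] at two bytes per char. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ClassNameFootprint {
    /** Payload bytes for the name stored as UTF-8. */
    static int utf8Bytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Payload bytes for the same name as a char[] (two bytes per char). */
    static int charArrayBytes(String name) {
        return name.length() * 2;
    }

    /** Extra memory expressed as a percentage of the UTF-8 baseline. */
    static int extraPercent(int extraBytes, int baselineBytes) {
        return 100 * extraBytes / baselineBytes;
    }

    public static void main(String[] args) {
        String name = "java/lang/String";   // ASCII-only, as most class names are
        int utf8 = utf8Bytes(name);         // 16
        int chars = charArrayBytes(name);   // 32

        // Case 1: the VM keeps the UTF-8 name and also stores a String.
        System.out.println(extraPercent(chars, utf8) + "% extra");        // 200% extra

        // Case 2: the VM stores only the String, in place of the UTF-8 name.
        System.out.println(extraPercent(chars - utf8, utf8) + "% extra"); // 100% extra
    }
}
```

The ratio only holds for ASCII names; a name with non-ASCII characters takes more than one byte per character in UTF-8, which narrows the gap.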
Re: String.equals optimisation
Hi,

On 7/12/05, Archie Cobbs [EMAIL PROTECTED] wrote:
> Simon Kitching wrote:
>>>> * Class.getName returns strings that have been interned. I don't think this is explicitly required by the java specs but is certainly true for Sun's JVM and seems likely to be done by any sensible JVM.
>>>
>>> You definitely make some good arguments, but this one is not necessarily true. In fact, I'd argue a JVM that interns every class' name (even if only on demand) is potentially wasting a bunch of heap space.
>>
>> I'm assuming that the Class object would contain a reference to the interned string, so there is only one copy of the string, ie somewhere
>
> Not a valid assumption.. in JC no String is associated with Class objects. VMClass.getName() is native and the returned String is created on demand, based on the UTF-8 name stored internally in memory.

It might also be worthwhile to mention that the internal class name uses slash as a separator, instead of the dot returned by getName(), e.g. java/lang/String, rather than java.lang.String.

Rob.
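To make the separator difference concrete, here is a small sketch of converting between the two forms for ordinary (non-array) class names. ClassNameForms and its methods are hypothetical helpers, not part of any VM or of GNU Classpath.

```java
class ClassNameForms {
    /** Internal (class file) form to the form returned by Class.getName(). */
    static String toBinaryName(String internalName) {
        return internalName.replace('/', '.');
    }

    /** The form returned by Class.getName() to the internal form. */
    static String toInternalName(String binaryName) {
        return binaryName.replace('.', '/');
    }

    public static void main(String[] args) {
        String internal = "java/lang/String";
        System.out.println(toBinaryName(internal));
        // The converted name matches what the running VM reports.
        System.out.println(toBinaryName(internal).equals(String.class.getName()));
    }
}
```

Because the two forms differ, a VM that keeps only the slash-separated name cannot hand that stored name back unchanged from getName(); it has to build a new String at some point.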
Re: String.equals optimisation
On Tue, 2005-07-12 at 11:02 +1200, Simon Kitching wrote:
> It would certainly be nice to know that collection methods will automatically work more efficiently when the objects being manipulated are String objects that have been interned (of course String.intern has to be used appropriately).

Umm..sorry, this particular argument doesn't work.

The proposed optimisation only improves the speed of comparing an object to itself - not the most common operation in collection work. Operations that speed up determining when two objects are NOT equal would help much more. [1]

I still think the original patch is relevant though (just not this point).

[1] eg org.apache.commons.collections.map.IdentityMap, but this relies on unique objects for keys rather than just trying to optimise for them.

Regards,

Simon
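For readers who have not seen the patch, the idea under discussion can be sketched on a stand-in class. This is not the GNU Classpath String code; Name is a hypothetical type invented here. The identity test only pays off when an object is compared to itself, while the length test is an example of the cheap "definitely not equal" check mentioned above.

```java
import java.util.Arrays;

final class Name {
    private final char[] chars;

    Name(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;                    // fast path: same object
        }
        if (!(other instanceof Name)) {
            return false;
        }
        Name that = (Name) other;
        if (chars.length != that.chars.length) {
            return false;                   // cheap negative: lengths differ
        }
        return Arrays.equals(chars, that.chars); // full content comparison
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(chars);
    }

    public static void main(String[] args) {
        Name a = new Name("zz");
        System.out.println(a.equals(a));               // identity path
        System.out.println(a.equals(new Name("zz")));  // content comparison
        System.out.println(a.equals(new Name("zzz"))); // length check rejects
    }
}
```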
Re: String.equals optimisation
Simon Kitching wrote:
> * Class.getName returns strings that have been interned. I don't think this is explicitly required by the java specs but is certainly true for Sun's JVM and seems likely to be done by any sensible JVM.

You definitely make some good arguments, but this one is not necessarily true. In fact, I'd argue a JVM that interns every class' name (even if only on demand) is potentially wasting a bunch of heap space.

I.e., is there something special about class names which means they should be treated differently from any other String randomly created and used in a Java application? (rhetorical question) Otherwise, why not intern all Strings? Etc.

In any case, to provide two concrete counter-examples:

$ cat zz.java
public class zz {
    public static void main(String[] args) {
        zz z = new zz();
        System.out.println(z.getClass().getName() == "zz");
    }
}
$ javac zz.java
$ java zz
true
$ jc -Xint zz
false
$ jamvm zz
false

On the other hand, comparing reference equality is very low cost, so it seems like adding == to equals() might make good sense. Of course, the real answer lies in empirical testing (something I can't claim to have done).

-Archie

__
Archie Cobbs      *CTO, Awarix*      http://www.awarix.com
Re: String.equals optimisation
Hi Archie,

On Mon, 2005-07-11 at 20:27 -0500, Archie Cobbs wrote:
> Simon Kitching wrote:
>> * Class.getName returns strings that have been interned. I don't think this is explicitly required by the java specs but is certainly true for Sun's JVM and seems likely to be done by any sensible JVM.
>
> You definitely make some good arguments, but this one is not necessarily true. In fact, I'd argue a JVM that interns every class' name (even if only on demand) is potentially wasting a bunch of heap space.

I'm assuming that the Class object would contain a reference to the interned string, so there is only one copy of the string, ie somewhere in the ClassLoader.defineClass method there is this sort of thing:

    public Class defineClass(String name, ...)
        Class newClass = new Class();
        newClass.setName(name.intern());
        ...

The extra space used for interning is therefore just a single extra reference (as a reference to the string is contained in both the Class object and the String class internal pool). Yes that is a little space wasted, but not a bunch.

> I.e., is there something special about class names which means they should be treated differently from any other String randomly created and used in a Java application? (rhetorical question) Otherwise, why not intern all Strings? Etc.

I do wonder why Java specified that all literal and constant strings in a class file are automatically interned. Being able to compare literals with == is not that useful. Maybe the most important goal was to compress class representations in memory. In particular, when class A references a static final String field from some other class, A gets a copy of that string not a reference to it, so without the intern mechanism to merge instances back together again such strings would get duplicated in RAM when the classes were loaded. I'd be interested to hear of other reasons for Java's requirement to intern all literal strings and constants.

But, strangely, I do think that interning classnames (which is optional) is particularly useful. When ClassLoader resolves a class it has loaded, it must do lots of lookups to find other classes. Surely being able to do this using identity to compare classnames would be a significant timesaver. And class resolution is the biggest issue in application startup time, so improving this seems like a good idea.

In the general case, whether interning a string proves useful or not depends upon the usage pattern for that string. I guess the usage patterns for class literals and classnames are pretty well known: long lifetimes, and either:
 * comparisons against them are common, or
 * duplication of the content is common.

But only users know the usage patterns for the dynamic string objects they create, so it's up to them to decide when to use intern...

> In any case, to provide two concrete counter-examples:
>
> $ cat zz.java
> public class zz {
>     public static void main(String[] args) {
>         zz z = new zz();
>         System.out.println(z.getClass().getName() == "zz");
>     }
> }
> $ javac zz.java
> $ java zz
> true
> $ jc -Xint zz
> false
> $ jamvm zz
> false

Hmm..interesting.

$ gij
false
$ gcj -o zz --main=zz zz.java
$ zz
false

Note for others reading this thread: all this is really irrelevant anyway. The classname stuff was just one example I suggested for when strings being compared might perform better with String.equals optimised for comparison by identity. As shown above, Sun's java might benefit from this; other JVMs currently won't. But it's only one example.

> On the other hand, comparing reference equality is very low cost, so it seems like adding == to equals() might make good sense. Of course, the real answer lies in empirical testing (something I can't claim to have done).

Regards,

Simon
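As a rough illustration of looking names up by identity, the sketch below keys a java.util.IdentityHashMap on interned strings. It is not how any particular class loader works; InternedNameTable is a hypothetical class, and the approach is only correct if every caller passes an interned String.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class InternedNameTable {
    // IdentityHashMap compares keys with ==, never with equals().
    private final Map<String, Object> byName = new IdentityHashMap<>();

    /** Registers a value under the interned form of the given name. */
    void put(String name, Object value) {
        byName.put(name.intern(), value);
    }

    /** Looks up by identity; the caller must pass an interned String. */
    Object getInterned(String internedName) {
        return byName.get(internedName);
    }

    public static void main(String[] args) {
        InternedNameTable table = new InternedNameTable();
        table.put(new String("zz"), "entry for zz");

        // A literal is already interned, so the identity lookup finds it.
        System.out.println(table.getInterned("zz"));
        // A fresh copy has equal content but is a different object.
        System.out.println(table.getInterned(new String("zz")));
    }
}
```

The second lookup returns null even though the content matches, which is the trade-off noted in the IdentityMap footnote earlier in the thread: identity-keyed tables rely on unique key objects instead of merely optimising for them.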