Re: String.equals optimisation

2005-07-12 Thread Stephen Crawley

I stand corrected.

-- Steve



___
Classpath mailing list
Classpath@gnu.org
http://lists.gnu.org/mailman/listinfo/classpath


Re: String.equals optimisation

2005-07-12 Thread Per Bothner

Stephen Crawley wrote:

[EMAIL PROTECTED] said:


I'd be interested to hear of other reasons for Java's requirement to
intern all literal strings and constants. 


Backwards compatibility.

At this point we can only conjecture as to why Java was originally defined
this way.  My guess is that this decision was made in the early days when
the language was being targeted at embedded computing and machines with
not a lot of memory.  Having the JVM do interning of literals could save
enough memory to matter.


Er, no.

String literals in JDK 1.0 were *not* interned.
That was changed in JDK 1.1.

The world has moved on, and nobody thinks much about conserving string 
space these days.


I don't believe saving space was the motivation, but consistency.
Assume:

static String x = "a";
static String y = "a";

Then you had (x==y) if x and y were defined in the same class, but
(x!=y) if they were defined in different classes.

People expect that ("a"=="a").
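
The consistency point above can be checked with a small sketch. The class
names LiteralsA, LiteralsB and InternDemo are made up for illustration, and
the behaviour shown assumes a JVM that follows the current language
specification, which requires string literals to be interned:

```java
// Illustration only: LiteralsA, LiteralsB and InternDemo are hypothetical names.
final class LiteralsA {
    static String x = "a";
}

final class LiteralsB {
    static String y = "a";
}

final class InternDemo {
    /** Two literals with the same content, defined in different classes. */
    static boolean literalsShareIdentity() {
        return LiteralsA.x == LiteralsB.y;
    }

    /** A string built at run time is a separate object, even with equal content. */
    static boolean runtimeStringSharesIdentity() {
        String built = new String(new char[] {'a'});
        return built == LiteralsA.x;
    }

    /** Interning the run-time string maps it back to the pooled literal. */
    static boolean internedStringSharesIdentity() {
        String built = new String(new char[] {'a'});
        return built.intern() == LiteralsA.x;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareIdentity());        // true
        System.out.println(runtimeStringSharesIdentity());  // false
        System.out.println(internedStringSharesIdentity()); // true
    }
}
```

With interned literals, x==y holds regardless of which class each field lives
in, which is the consistency being argued for.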
--
--Per Bothner
[EMAIL PROTECTED]   http://per.bothner.com/




RE: String.equals optimisation

2005-07-12 Thread Jeroen Frijters
Archie Cobbs wrote:
 Simon Kitching wrote:
  * Class.getName returns strings that have been interned. I don't
think this is explicitly required by the java specs but is
certainly true for Sun's JVM and seems likely to be done by
any sensible JVM.
 
 I.e., is there something special about class names which means
 they should be treated differently from any other String randomly
 created and used in a Java application? (rhetorical question)
 Otherwise, why not intern all Strings? Etc.
 
 In any case, to provide two concrete counter-examples:
 
$ cat  zz.java
public class zz {
  public static void main(String[] args) {
    zz z = new zz();
    System.out.println(z.getClass().getName() == "zz");
  }
}
$ javac zz.java
$ java zz
true
$ jc -Xint zz
false
$ jamvm zz
false

$ ikvm zz
true

He did say any sensible JVM... gdr

Regards,
Jeroen




Re: String.equals optimisation

2005-07-12 Thread Archie Cobbs

Simon Kitching wrote:

* Class.getName returns strings that have been interned. I don't
 think this is explicitly required by the java specs but is
 certainly true for Sun's JVM and seems likely to be done by
 any sensible JVM.


You definitely make some good arguments, but this one is not
necessarily true. In fact, I'd argue a JVM that interns every
class' name (even if only on demand) is potentially wasting
a bunch of heap space.


I'm assuming that the Class object would contain a reference to the
interned string, so there is only one copy of the string, ie somewhere


Not a valid assumption.. in JC no String is associated with Class
objects.  VMClass.getName() is native and the returned String is
created on demand, based on the UTF-8 name stored internally in memory.

In fact, one could argue that storing class names in any other way
than in their native UTF-8 form is a big waste of memory. E.g.,
for loaded classes...

If the VM can find its UTF-8 name and create a String dynamically:
  Then also storing the class name persistently as a String is a
  200% increase in memory (a char[] array is twice as big as UTF-8)
Else:
  The VM must store the class name as a String, which is a 100%
  increase in memory vs. storing it as UTF-8
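
The percentages above can be reproduced with a rough calculation. This is a
sketch only: ClassNameFootprint and its methods are hypothetical names, the
name is assumed to be pure ASCII, a String is assumed to hold its text in a
char[] at 2 bytes per char as described above, and object headers are ignored:

```java
import java.nio.charset.StandardCharsets;

// Illustration only: ClassNameFootprint is a hypothetical helper, not VM code.
final class ClassNameFootprint {
    /** Bytes needed to hold the name as UTF-8. */
    static int utf8Bytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to hold the name in a char[] (2 bytes per char, headers ignored). */
    static int charArrayBytes(String name) {
        return 2 * name.length();
    }

    /** Percentage by which total exceeds base. */
    static int percentIncrease(int base, int total) {
        return 100 * (total - base) / base;
    }

    public static void main(String[] args) {
        String name = "java/lang/String";
        int utf8 = utf8Bytes(name);
        int chars = charArrayBytes(name);
        // UTF-8 copy kept, String stored as well: 200% more than UTF-8 alone.
        System.out.println(percentIncrease(utf8, utf8 + chars));
        // Only the String is stored: 100% more than UTF-8 alone.
        System.out.println(percentIncrease(utf8, chars));
    }
}
```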


The extra space used for interning is therefore just a single extra
reference (as a reference to the string is contained in both the Class
object and the String class internal pool). Yes that is a little space
wasted, but not a bunch.


Right, the wasted space is not much.. at first I was forgetting
that intern'd strings are stored with weak keys and will get
flushed out after they're no longer referenced (just like normal
Strings)... replace "big waste of memory" with "waste of memory" :-)

-Archie

__
Archie Cobbs  *CTO, Awarix*  http://www.awarix.com




Re: String.equals optimisation

2005-07-12 Thread Robert Lougher
Hi,

On 7/12/05, Archie Cobbs [EMAIL PROTECTED] wrote:
 Simon Kitching wrote:
 * Class.getName returns strings that have been interned. I don't
   think this is explicitly required by the java specs but is
   certainly true for Sun's JVM and seems likely to be done by
   any sensible JVM.
 
 You definitely make some good arguments, but this one is not
 necessarily true. In fact, I'd argue a JVM that interns every
 class' name (even if only on demand) is potentially wasting
 a bunch of heap space.
 
  I'm assuming that the Class object would contain a reference to the
  interned string, so there is only one copy of the string, ie somewhere
 
 Not a valid assumption.. in JC no String is associated with Class
 objects.  VMClass.getName() is native and the returned String is
 created on demand, based on the UTF-8 name stored internally in memory.
 

It might also be worthwhile to mention that the internal class name
uses slash as a separator, instead of the dot returned by getName(),
e.g. java/lang/String, rather than java.lang.String.
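
As a rough sketch of the conversion a VM would have to do when building the
getName() result from the internal form (ClassNameForm and its methods are
hypothetical names, not taken from any VM's source):

```java
// Illustration only: ClassNameForm is a hypothetical helper class.
final class ClassNameForm {
    /** Internal form (slashes) to the form returned by Class.getName() (dots). */
    static String toBinaryName(String internalName) {
        return internalName.replace('/', '.');
    }

    /** The reverse direction, from dotted name to internal form. */
    static String toInternalName(String binaryName) {
        return binaryName.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(toBinaryName("java/lang/String"));
        System.out.println(toInternalName(String.class.getName()));
    }
}
```

Because the two forms differ in content, the String handed out by getName()
cannot simply share the internally stored name.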

Rob.




Re: String.equals optimisation

2005-07-11 Thread Simon Kitching
On Tue, 2005-07-12 at 11:02 +1200, Simon Kitching wrote:
 It would certainly be nice to know that collection methods will
 automatically work more efficiently when the objects being manipulated
 are String objects that have been interned (of course String.intern has
 to be used appropriately).

Umm..sorry, this particular argument doesn't work. The proposed
optimisation only improves the speed of comparing an object to itself -
not the most common operation in collection work. Operations that speed
up determining when two objects are NOT equal would help much more. [1]

I still think the original patch is relevant though (just not this
point).

[1] eg org.apache.commons.collections.map.IdentityMap, but this relies
on unique objects for keys rather than just trying to optimise for them.
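
To illustrate the footnote, here is a sketch using java.util.IdentityHashMap
from the JDK in place of the commons-collections IdentityMap mentioned above
(a substitution made for this example; IdentityKeyDemo is a hypothetical
name). Lookups compare keys with == only, so they succeed only when the
caller supplies the very same object, for example an interned string:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustration only: IdentityKeyDemo is a hypothetical class.
final class IdentityKeyDemo {
    /** Builds a map whose keys are compared by reference, not by equals(). */
    static Map<String, Integer> build() {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put("key", 1); // the literal "key" is interned
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        String dynamic = new String(new char[] {'k', 'e', 'y'});
        // Equal content but a different object: the lookup misses.
        System.out.println(map.get(dynamic));
        // After interning, the key is the same object as the literal: the lookup hits.
        System.out.println(map.get(dynamic.intern()));
    }
}
```

This is why such a map relies on unique key objects rather than merely
benefiting from them.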

Regards,

Simon






Re: String.equals optimisation

2005-07-11 Thread Archie Cobbs

Simon Kitching wrote:

* Class.getName returns strings that have been interned. I don't
  think this is explicitly required by the java specs but is
  certainly true for Sun's JVM and seems likely to be done by
  any sensible JVM.


You definitely make some good arguments, but this one is not
necessarily true. In fact, I'd argue a JVM that interns every
class' name (even if only on demand) is potentially wasting
a bunch of heap space.

I.e., is there something special about class names which means
they should be treated differently from any other String randomly
created and used in a Java application? (rhetorical question)
Otherwise, why not intern all Strings? Etc.

In any case, to provide two concrete counter-examples:

  $ cat  zz.java
  public class zz {
    public static void main(String[] args) {
      zz z = new zz();
      System.out.println(z.getClass().getName() == "zz");
    }
  }
  $ javac zz.java
  $ java zz
  true
  $ jc -Xint zz
  false
  $ jamvm zz
  false

On the other hand, comparing reference equality is very low cost,
so it seems like adding == to equals() might make good sense.

Of course, the real answer lies in empirical testing (something
I can't claim to have done).
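
For reference, a minimal sketch of what "adding == to equals()" could look
like. It uses a hypothetical Text class that wraps a char[], since
java.lang.String itself cannot be modified from user code, and it is not the
patch under discussion:

```java
import java.util.Arrays;

// Illustration only: Text is a hypothetical stand-in for a string class.
final class Text {
    private final char[] chars;

    Text(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true; // identity fast path: only helps when an object meets itself
        }
        if (!(other instanceof Text)) {
            return false;
        }
        Text that = (Text) other;
        if (chars.length != that.chars.length) {
            return false; // cheap rejection of strings that cannot be equal
        }
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] != that.chars[i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(chars);
    }

    public static void main(String[] args) {
        Text a = new Text("classpath");
        System.out.println(a.equals(a));                     // fast path
        System.out.println(a.equals(new Text("classpath"))); // full comparison
        System.out.println(a.equals(new Text("class")));     // length rejection
    }
}
```

The reference check costs one comparison on every call, so whether it pays
off depends on how often equal strings are in fact the same object.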

-Archie

__
Archie Cobbs  *CTO, Awarix*  http://www.awarix.com




Re: String.equals optimisation

2005-07-11 Thread Simon Kitching
Hi Archie,

On Mon, 2005-07-11 at 20:27 -0500, Archie Cobbs wrote:
 Simon Kitching wrote:
  * Class.getName returns strings that have been interned. I don't
think this is explicitly required by the java specs but is
certainly true for Sun's JVM and seems likely to be done by
any sensible JVM.
 
 You definitely make some good arguments, but this one is not
 necessarily true. In fact, I'd argue a JVM that interns every
 class' name (even if only on demand) is potentially wasting
 a bunch of heap space.

I'm assuming that the Class object would contain a reference to the
interned string, so there is only one copy of the string, ie somewhere
in the ClassLoader.defineClass method there is this sort of thing:

public Class defineClass(String name, ...) {
  Class newClass = new Class();
  newClass.setName(name.intern());
  ...
}
   

The extra space used for interning is therefore just a single extra
reference (as a reference to the string is contained in both the Class
object and the String class internal pool). Yes that is a little space
wasted, but not a bunch.

 
 I.e., is there something special about class names which means
 they should be treated differently from any other String randomly
 created and used in a Java application? (rhetorical question)
 Otherwise, why not intern all Strings? Etc.

I do wonder why Java specified that all literal and constant strings in
a class file are automatically interned. Being able to compare literals
with == is not that useful. 

Maybe the most important goal was to compress class representations in
memory. In particular, when class A references a static final String
field from some other class, A gets a copy of that string not a
reference to it, so without the intern mechanism to merge instances back
together again such strings would get duplicated in ram when the classes
were loaded.

I'd be interested to hear of other reasons for Java's requirement to
intern all literal strings and constants.

But, strangely, I do think that interning classnames (which is optional)
is particularly useful. When ClassLoader resolves a class it has loaded,
it must do lots of lookups to find other classes. Surely being able to
do this using identity to compare classnames would be a significant
timesaver. And class resolution is the biggest issue in application
startup time, so improving this seems like a good idea.

In the general case, whether interning a string proves useful or not
depends upon the usage pattern for that string. I guess the usage
patterns for class literals and classnames are pretty well known: long
lifetimes, and either:
* comparisons against them are common, or
* duplication of the content is common. 
But only users know the usage patterns for the dynamic string objects
they create, so it's up to them to decide when to use intern...

 In any case, to provide two concrete counter-examples:
 
$ cat  zz.java
public class zz {
  public static void main(String[] args) {
    zz z = new zz();
    System.out.println(z.getClass().getName() == "zz");
  }
}
$ javac zz.java
$ java zz
true
$ jc -Xint zz
false
$ jamvm zz
false

Hmm..interesting.

$ gij
false

$ gcj -o zz --main=zz zz.java
$ zz
false

Note for others reading this thread: all this is really irrelevant
anyway. The classname stuff was just one example I suggested for when
strings being compared might perform better with String.equals optimised
for comparison by identity. As shown above, sun's java might benefit
from this; other JVMs currently won't. But it's only one example.

 
 
 On the other hand, comparing reference equality is very low cost,
 so it seems like adding == to equals() might make good sense.
 
 Of course, the real answer lies in empirical testing (something
 I can't claim to have done).


Regards,

Simon


