[cp-patches] RFC: changes to java.lang.Integer, Long...

2008-04-14 Thread Ian Rogers

Hi,

Please comment on the attached patch. It tries to reduce the size of
the char[] used for strings that hold numbers. It changes Float/Double
equals to use bit-based comparisons rather than division. It increases
the use of valueOf methods. It adds a cache of values from -128 to 127
for Long, and a cache of the values zero and one for Float and Double.
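
The equals hunk itself is not visible in this archive, so the following is only a minimal sketch of what a bit-based comparison could look like. The class and method names are hypothetical. It uses doubleToLongBits/floatToIntBits, which collapse all NaN values to one pattern, whereas the ChangeLog speaks of "raw bits"; which variant the patch uses is not shown here.

```java
// Hypothetical illustration, not the code from the patch.
final class BitEqualsSketch {
    /** Bit-based equality: NaN equals NaN, and 0.0 differs from -0.0. */
    static boolean sameDouble(double a, double b) {
        // doubleToLongBits maps every NaN to a single canonical bit pattern,
        // so no floating point division or extra NaN check is needed.
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    /** Same idea for float, using the 32-bit representation. */
    static boolean sameFloat(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }
}
```

A plain `a == b` would treat 0.0 and -0.0 as equal and NaN as unequal to itself, which is the opposite of what the equals contract of the wrapper classes asks for; comparing bit patterns gets both cases right in one step.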


The string size is an estimate. For decimal numbers it divides the
value repeatedly by 8 (using shifts), so the string length is
overestimated by one character for values like 999. This is still
better than the current estimate of 33 characters, and it avoids both
division and lookup tables.
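
As a rough illustration of the estimate described above, here is a minimal sketch for non-negative int values. The class and method names are hypothetical and the stringSize method in the patch may differ in detail.

```java
// Hypothetical illustration of the shift-based size estimate.
final class StringSizeSketch {
    /** Upper bound on the number of decimal digits of a non-negative int. */
    static int estimateDecimalSize(int value) {
        int size = 1; // even 0 needs one character
        // Each unsigned shift by 3 divides by 8. Because 8 < 10, the loop
        // runs at least once per decimal digit and never undercounts.
        while ((value >>>= 3) != 0) {
            size++;
        }
        return size;
    }
}
```

For 999 the loop sees 124, 15, 1 and then 0, giving an estimate of 4 characters for a 3-character string, while 100 gives exactly 3.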


Regards,
Ian
Index: ChangeLog
===
RCS file: /sources/classpath/classpath/ChangeLog,v
retrieving revision 1.9572
diff -u -r1.9572 ChangeLog
--- ChangeLog   9 Apr 2008 20:23:11 -   1.9572
+++ ChangeLog   14 Apr 2008 13:54:25 -
@@ -1,3 +1,50 @@
+2008-04-14  Ian Rogers  [EMAIL PROTECTED]
+
+   * java/lang/Byte.java (static): initialize byteCache.
+   (valueOf(String,int)): use valueOf(byte) rather than new.
+   (valueOf(String)): likewise.
+   (valueOf(byte)): remove synchronization.
+   (decode): use valueOf(byte) rather than new.
+   * java/lang/Character.java (static): initialize charCache.
+   (valueOf): remove synchronization.
+   * java/lang/Double.java (ZERO): new private field.
+   (ONE): likewise.
+   (valueOf(double)): don't create new doubles for the case of 0 and 1.
+   (valueOf(String)): use valueOf(double) rather than new.
+   (equals): use raw bits for comparison to avoid division.
+   * java/lang/Float.java (ZERO): new private field.
+   (ONE): likewise.
+   (valueOf(double)): don't create new floats for the case of 0 and 1.
+   (valueOf(String)): use valueOf(float) rather than new.
+   (equals): use raw bits for comparison to avoid division.
+   * java/lang/Integer.java (static): initialize intCache.
+   (stringSize): new private method to estimate size of string for an int.
+   (toString): reuse digits for single character strings, for multiple
+   character strings estimate their length using string size method.
+   (valueOf(String,int)): use valueOf(int) rather than new.
+   (valueOf(String)): likewise.
+   (valueOf(int)): remove synchronization.
+   (getInteger): use valueOf(int) rather than new.
+   (decode): use valueOf(int) rather than new.
+   (signum): use shift and subtract to compute value.
+   (toUnsignedString): calculate string size rather than using 32 chars.
+   * java/lang/Long.java (longCache): new private field.
+   (stringSize): new private method to estimate size of string for a long.
+   (toString): reuse digits for single character strings, for multiple
+   character strings estimate their length using string size method.
+   (valueOf(String,int)): use valueOf(long) rather than new.
+   (valueOf(String)): likewise.
+   (valueOf(long)): use cache.
+   (decode): use valueOf(long) rather than new.
+   (getLong): likewise.
+   (signum): use shift and subtract to compute value.
+   (toUnsignedString): calculate string size rather than using 64 chars.
+   * java/lang/Short.java (static): initialize shortCache.
+   (valueOf(String,int)): use valueOf(short) rather than new.
+   (valueOf(String)): likewise.
+   (valueOf(short)): remove synchronization.
+   (decode): use valueOf(short) rather than new.
+
 2008-04-09  Mario Torre  [EMAIL PROTECTED]
  
* java/io/File.java (canWrite): use canWriteDirectory(String). 
Index: java/lang/Byte.java
===
RCS file: /sources/classpath/classpath/java/lang/Byte.java,v
retrieving revision 1.26
diff -u -r1.26 Byte.java
--- java/lang/Byte.java 10 Dec 2006 20:25:44 -  1.26
+++ java/lang/Byte.java 14 Apr 2008 13:54:25 -
@@ -88,6 +88,11 @@
   // This caches Byte values, and is used by boxing conversions via
   // valueOf().  We're required to cache all possible values here.
   private static Byte[] byteCache = new Byte[MAX_VALUE - MIN_VALUE + 1];
+  static
+  {
+    // int counter: a byte counter would wrap past MAX_VALUE and never exit.
+    for (int i = MIN_VALUE; i <= MAX_VALUE; i++)
+      byteCache[i - MIN_VALUE] = new Byte((byte) i);
+  }
 
 
   /**
@@ -185,7 +190,7 @@
    */
   public static Byte valueOf(String s, int radix)
   {
-    return new Byte(parseByte(s, radix));
+    return valueOf(parseByte(s, radix));
   }
 
   /**
@@ -201,7 +206,7 @@
    */
   public static Byte valueOf(String s)
   {
-    return new Byte(parseByte(s, 10));
+    return valueOf(parseByte(s, 10));
   }
 
   /**
@@ -214,12 +219,7 @@
    */
   public static Byte valueOf(byte val)
   {
-    synchronized (byteCache)
-      {
-        if (byteCache[val - MIN_VALUE] == null)
-          byteCache[val - MIN_VALUE] = 

Re: [cp-patches] RFC: changes to java.lang.Integer, Long...

2008-04-14 Thread David Daney

I would like to know your motivation for doing this.  Do you have any 
evidence that this will reduce memory usage and speed up real applications?



That said, in our gcj-3.4 based application, we had to create a cache of 
Integers because we were creating large numbers of them all with a small 
set of values.


So in principle this could be a good approach, but I don't know if we 
can assume that there is universal benefit from a patch like this.  Can 
you point to any benchmarks where this helps?



Thanks,
David Daney



Re: [cp-patches] RFC: changes to java.lang.Integer, Long...

2008-04-14 Thread Ian Rogers

Hi David,

I'm cracking down on wasted memory in the Jikes RVM.

For DaCapo fop (single iteration) there are 270 and 977 occurrences of
Double 0 and 1 respectively, and 20 occurrences of other Doubles. On
the other hand, DaCapo bloat has very few 0 and 1 values. My motivation
to cache these, other than fop, is that they exist as bytecodes
(fconst_0/fconst_1 and dconst_0/dconst_1, although I'm ignoring
fconst_2). We already cache Integers in the intCache. I extend this
concept to Long, as is done in OpenJDK, and to Float and Double.
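
A minimal sketch of the zero/one cache idea for Double follows. The class, method and constant names are hypothetical, and `Double.valueOf` stands in for `new Double(...)` because recent JDKs deprecate that constructor. Comparing raw bits rather than using `==` is an assumption made here so that -0.0 is not mistaken for the cached 0.0.

```java
// Hypothetical illustration of caching the boxed values 0.0 and 1.0.
final class DoubleCacheSketch {
    // Double.valueOf stands in for "new Double(...)" (deprecated in newer JDKs).
    private static final Double ZERO = Double.valueOf(0.0);
    private static final Double ONE = Double.valueOf(1.0);
    private static final long ZERO_BITS = Double.doubleToRawLongBits(0.0);
    private static final long ONE_BITS = Double.doubleToRawLongBits(1.0);

    static Double cachedValueOf(double val) {
        long bits = Double.doubleToRawLongBits(val);
        if (bits == ZERO_BITS) {
            return ZERO; // -0.0 has a different bit pattern and is not cached
        }
        if (bits == ONE_BITS) {
            return ONE;
        }
        return Double.valueOf(val); // every other value gets its own box
    }
}
```

The extra comparisons are the "slower valueOf" side of the trade-off; the benefit is that repeated boxing of 0.0 and 1.0 returns the same two objects.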


Currently we always allocate a 33-char array to hold the string value,
which is 4.625 times the size of the minimum object in the RVM. For a
single-character string (18.86% of Integer strings in DaCapo bloat),
this code doesn't allocate any char array. For other integers the char
array is reduced to either the exact length or (20% of the time for
decimal values) one character longer. This is at the cost of up to 32
compares, branches and shifts. For DaCapo bloat a little under 50% of
integer strings created are for values between -128 and 127.
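
To make the two cases above concrete, here is a minimal sketch of a decimal toString that returns a shared string for single digits and otherwise fills an estimated-size buffer from the end. All names are hypothetical and this is not the code from the patch. Note also that the public `new String(char[], int, int)` constructor copies the array, so the sketch only shows the sizing logic; whether a class library can hand the buffer over without copying is an assumption outside what this thread shows.

```java
// Hypothetical illustration of the single-digit fast path and sized buffer.
final class ToStringSketch {
    private static final String[] SINGLE_DIGITS =
        {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};

    static String decimalString(int value) {
        if (value >= 0 && value <= 9) {
            return SINGLE_DIGITS[value]; // no char[] is allocated
        }
        boolean negative = value < 0;
        // Work on a non-positive number so Integer.MIN_VALUE needs no special case.
        int n = negative ? value : -value;
        // Estimate: one character per 3 bits of magnitude, plus one for the sign.
        int size = negative ? 1 : 0;
        for (int m = -n; m != 0; m >>>= 3) {
            size++;
        }
        char[] buffer = new char[size];
        int pos = size;
        do {
            buffer[--pos] = (char) ('0' - (n % 10)); // n <= 0, so n % 10 is in -9..0
            n /= 10;
        } while (n != 0);
        if (negative) {
            buffer[--pos] = '-';
        }
        // Any overestimate leaves unused slots at the front of the buffer.
        return new String(buffer, pos, size - pos);
    }
}
```

For example, `decimalString(999)` allocates a 4-char buffer and uses the last 3 slots, while `decimalString(7)` allocates nothing.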


So the trade-offs in the code are: slower Float/Double valueOf code,
but fewer Float and Double objects (hopefully improving GC); and a
small time to calculate string sizes, in exchange for smaller strings
and less GC pressure.


For the Jikes RVM we measure performance 4 times a day [1]. I
introduced this patch in r14113 and there are no peaks or troughs at
that point. Given the patch is performance neutral but saves memory
(although without markedly improving GC performance for the RVM), I
think it's worth including. GC is less than 6% of execution time, so
any time saved may be difficult to measure in the bigger picture
(unless it pushes you under or over a particular threshold).


Regards,
Ian

[1] http://jikesrvm.anu.edu.au/cattrack/results/rvmx86lnx32.anu.edu.au/perf/3437/performance_report