Re: [cp-patches] Float/Double compare
On Tuesday 03 July 2007 23:16:19 Ian Rogers wrote: Here's a revised version of the patch following suggestions from Andrew Haley. Ian Committed: 2007-10-12 Ian Rogers [EMAIL PROTECTED] 2007-10-12 Andrew Haley [EMAIL PROTECTED] PR classpath/33741: * java/lang/Double.java: (compare(double,double)): Increase performance of this method. * java/lang/Float.java: (compare(float,float)): Likewise. -- Andrew :)
Re: [cp-patches] Float/Double compare
Here's a revised version of the patch following suggestions from Andrew Haley. Ian --- java/lang/Float.java2006-12-10 15:25:44.0 -0500 +++ java/lang/Float.java2007-07-02 12:17:29.0 -0400 @@ -596,16 +596,25 @@ */ public static int compare(float x, float y) { -if (isNaN(x)) - return isNaN(y) ? 0 : 1; -if (isNaN(y)) - return -1; -// recall that 0.0 == -0.0, so we convert to infinities and try again -if (x == 0 y == 0) - return (int) (1 / x - 1 / y); -if (x == y) - return 0; + // handle the easy cases: + if (x y) + return -1; + if (x y) + return 1; -return x y ? 1 : -1; + // handle equality respecting that 0.0 != -0.0 (hence not using x == y): + int ix = floatToRawIntBits(x); + int iy = floatToRawIntBits(y); + if (ix == iy) + return 0; + + // handle NaNs: + if (x != x) + return (y != y) ? 0 : 1; + else if (y != y) + return -1; + + // handle +/- 0.0 + return (ix iy) ? -1 : 1; } } --- java/lang/Double.java 2006-12-10 15:25:44.0 -0500 +++ java/lang/Double.java 2007-07-02 12:18:36.0 -0400 @@ -587,16 +587,25 @@ */ public static int compare(double x, double y) { -if (isNaN(x)) - return isNaN(y) ? 0 : 1; -if (isNaN(y)) - return -1; -// recall that 0.0 == -0.0, so we convert to infinites and try again -if (x == 0 y == 0) - return (int) (1 / x - 1 / y); -if (x == y) - return 0; + // handle the easy cases: + if (x y) + return -1; + if (x y) + return 1; -return x y ? 1 : -1; + // handle equality respecting that 0.0 != -0.0 (hence not using x == y): + long lx = doubleToRawLongBits(x); + long ly = doubleToRawLongBits(y); + if (lx == ly) + return 0; + + // handle NaNs: + if (x != x) + return (y != y) ? 0 : 1; + else if (y != y) + return -1; + + // handle +/- 0.0 + return (lx ly) ? -1 : 1; } }
Re: Float/Double compare
Ian Rogers writes: Ian Rogers wrote: public static int compare(float x, float y) { if (isNaN(x)) return isNaN(y) ? 0 : 1; if (isNaN(y)) return -1; // recall that 0.0 == -0.0, so we convert to infinities and try again if (x == 0 y == 0) return (int) (1 / x - 1 / y); if (x == y) return 0; return x y ? 1 : -1; } Below is a variant of the compare code and a comparison with the cost of the current compare code: public static int compare(float x, float y) { int ix = floatAsIntBits(x); int iy = floatAsIntBits(y); if (ix == iy) return 0; if (isNaN(x)) return 1; if (isNaN(y)) return -1; int ix_sign = ix31; int iy_sign = iy31; if (ix_sign == iy_sign) { return x y ? 1 : -1; } else { return ix_sign - iy_sign; } } Case: x == y neither are NaN or 0.0 New Cost: 2 float as ints, 1 int compare Old Cost: 5 float compares Case: x y and different sign New Cost: 2 float as ints, 2 int compares, 2 float compares, 2 shifts, one subtract Old Cost: 6 float compares Case: x y and same sign New Cost: 2 float as ints, 2 int compares, 3 float compares, 2 shifts Old Cost: 6 float compares Case: x is NaN New Cost: 2 float as ints, 1 int compare, 1 float compare Old Cost: 2 float compares Case: y is NaN New Cost: 2 float as ints, 1 int compare, 2 float compares Old Cost: 2 float compares Case: x == 0.0 and y == 0.0 New Cost: either 2 float as ints, 1 int compare or 2 float as ints, 2 int compares, 2 float compares, 2 shifts, one subtract Old Cost: 4 float compares, 2 float divides, 1 float subtract, 1 float to int The same code but for doubles would be: public static int compare(double x, double y) { long lx = doubleAsLongBits(x); long ly = doubleAsLongBits(y); if (lx == ly) return 0; if (isNaN(x)) return 1; if (isNaN(y)) return -1; int lx_sign = (int)(lx63); int ly_sign = (int)(ly63); if (lx_sign == ly_sign) { return x y ? 1 : -1; } else { return lx_sign - ly_sign; } } So the case when y is a NaN is slower with the new code but I think all of the more common cases are similar or faster. Of course it depends on the speed of all the operations. If you cared about a CPU with no floating point, the new code is a lot faster. If you've got great floating point and a lot of cases of y being a NaN the old code will be better. My guess is that in the general case the new code wins out. There may be a bug I can't see in this code :-) We know that any calculation involving NaN returns false, right? So, simple cases can be done first without the (possibly expensive) move out of the FPU: if (x y) return -1; if (x y) return 1; then you can do the doubleToLongBits: long lx = doubleAsLongBits(x); long ly = doubleAsLongBits(y); if (lx == ly) return 0; At this point we know they're unequal. Also, either one of them is NaN, or one of them is -0 and the other +0. Java's doubleToLongBits alwayse uses the canonical NaN, so we know that they can't both be NaN. if (lx ly) return -1; return 1; Andrew.
Re: Float/Double compare
Andrew Haley wrote: We know that any calculation involving NaN returns false, right? So, simple cases can be done first without the (possibly expensive) move out of the FPU: if (x y) return -1; if (x y) return 1; then you can do the doubleToLongBits: long lx = doubleAsLongBits(x); long ly = doubleAsLongBits(y); if (lx == ly) return 0; At this point we know they're unequal. Also, either one of them is NaN, or one of them is -0 and the other +0. Java's doubleToLongBits alwayse uses the canonical NaN, so we know that they can't both be NaN. if (lx ly) return -1; return 1; Hi Andrew, this is great. I'd forgotten about the NaN test buried in doubleToLongBits, and I like the logic to push the and tests to the top. Using these insights I've tried to figure if there's a better way than this and I just wonder whether as the long bits conversions do a NaN test is the following cheaper: // handle the easy cases: if (x y) return -1; if (x y) return 1; // handle equality respecting that 0.0 != -0.0 (hence not using x == y): int ix = floatToRawIntBits(x); int iy = floatToRawIntBits(y); if (ix == iy) return 0; // handle NaNs: if (x != x) return (y != y) ? 0 : 1; else if (y != y) return -1; // handle +/- 0.0 return (ix iy) ? -1 : 1; Although its almost certain a VM can do better with a VM specific implementation. For example, on Intel you can reuse the fact that a floating point compare gives you unordered information (in fact in many cases you need to test for this anyway). I imagine GCC is better at doing this kind of machine specific optimization than most JITs are. Regards, Ian
Re: Float/Double compare
Ian Rogers writes: Andrew Haley wrote: We know that any calculation involving NaN returns false, right? So, simple cases can be done first without the (possibly expensive) move out of the FPU: if (x y) return -1; if (x y) return 1; then you can do the doubleToLongBits: long lx = doubleAsLongBits(x); long ly = doubleAsLongBits(y); if (lx == ly) return 0; At this point we know they're unequal. Also, either one of them is NaN, or one of them is -0 and the other +0. Java's doubleToLongBits alwayse uses the canonical NaN, so we know that they can't both be NaN. if (lx ly) return -1; return 1; Hi Andrew, this is great. I'd forgotten about the NaN test buried in doubleToLongBits, and I like the logic to push the and tests to the top. Using these insights I've tried to figure if there's a better way than this and I just wonder whether as the long bits conversions do a NaN test is the following cheaper: // handle the easy cases: if (x y) return -1; if (x y) return 1; // handle equality respecting that 0.0 != -0.0 (hence not using x == y): int ix = floatToRawIntBits(x); int iy = floatToRawIntBits(y); if (ix == iy) return 0; // handle NaNs: if (x != x) return (y != y) ? 0 : 1; else if (y != y) return -1; // handle +/- 0.0 return (ix iy) ? -1 : 1; It probably is cheaper, yes. Andrew.
Re: Float/Double compare
Hi Ian, On Sat, 2007-06-30 at 15:25 -0400, Ian Rogers wrote: public static int compare(float x, float y) { int ix = floatAsIntBits(x); int iy = floatAsIntBits(y); if (ix == iy) return 0; if (isNaN(x)) return 1; if (isNaN(y)) return -1; int ix_sign = ix31; int iy_sign = iy31; if (ix_sign == iy_sign) { return x y ? 1 : -1; } else { return ix_sign - iy_sign; } } I am not really comfortable with writing this as is since unless the compiler is smart you will do a multiple isNaN() and floatToIntBits() calls even if not necessary. Could we try to do something smarter and factor some of this out like: // First check for NaNs if (x != x || y != y) { int ix = floatToIntBits(x); int iy = floatToIntBits(y); if (ix == iy) return 0; if (x != x) return 1; if (y != y) return -1; } else { // Normal cases, or positive/negative zeros if (x y) return 1; if (y x) return -1; // Either equal or positive/negative zeros if (x == 0 y == 0) { int ix = floatToIntBits(x); int iy = floatToIntBits(y); int ix_sign = ix 31; int iy_sign = iy 31; return ix_sign - iy_sign; } return 0; } But maybe I am not giving the compiler/jit enough credit for actually producing such an optimized version in the first place. And this isn't actually the same as your code since I moved the NaN checks first again. The above assumes the NaN test (x != x) is relatively cheap since NaN is a canonical value is java. That might be a wrong assumption. You might be able to do something more clever with the actual ordering of floats according to (since if any argument is NaN or both are zero it evaluates to false) or with the ordering of floatToIntBits, but my head was spinning trying to keep all the corner cases right (this calls for expanding the Mauve Float.compareTo() test). There may be a bug I can't see in this code :-) Same with mine (I admit to not even having tried to compile it...) Cheers, Mark
Re: Float/Double compare
Hi Mark, Mark Wielaard wrote: Hi Ian, On Sat, 2007-06-30 at 15:25 -0400, Ian Rogers wrote: public static int compare(float x, float y) { int ix = floatAsIntBits(x); int iy = floatAsIntBits(y); if (ix == iy) return 0; if (isNaN(x)) return 1; if (isNaN(y)) return -1; int ix_sign = ix31; int iy_sign = iy31; if (ix_sign == iy_sign) { return x y ? 1 : -1; } else { return ix_sign - iy_sign; } } I am not really comfortable with writing this as is since unless the compiler is smart you will do a multiple isNaN() and floatToIntBits() calls even if not necessary. Could we try to do something smarter and factor some of this out like: // First check for NaNs if (x != x || y != y) { int ix = floatToIntBits(x); int iy = floatToIntBits(y); if (ix == iy) return 0; if (x != x) return 1; if (y != y) return -1; } else { // Normal cases, or positive/negative zeros if (x y) return 1; if (y x) return -1; // Either equal or positive/negative zeros if (x == 0 y == 0) { int ix = floatToIntBits(x); int iy = floatToIntBits(y); int ix_sign = ix 31; int iy_sign = iy 31; return ix_sign - iy_sign; } return 0; } But maybe I am not giving the compiler/jit enough credit for actually producing such an optimized version in the first place. And this isn't actually the same as your code since I moved the NaN checks first again. The above assumes the NaN test (x != x) is relatively cheap since NaN is a canonical value is java. That might be a wrong assumption. You might be able to do something more clever with the actual ordering of floats according to (since if any argument is NaN or both are zero it evaluates to false) or with the ordering of floatToIntBits, but my head was spinning trying to keep all the corner cases right (this calls for expanding the Mauve Float.compareTo() test). Thanks for the feedback. I think your code may well be better. I think its good to have identified that there is at least an ~8% potential speed up on IA32 with my code, and potentially better with yours. There is of course a big dependence on what values are being compared and on the architecture. For the arrays of floats I'm seeing in the Jikes RVM a lot of the values are identical (either 0.0 or 1.0). With your code that has a path of 6 compares to get through. With mine its just one compare, but because a floating point compare of 0.0 and -0.0 will yield that they are identical I'm forced into doing it on their int bit representation to keep the 0.0 case sane. Here's my hand wavy costing of the now 3 ways we have a doing things: Case: x == y neither are NaN or 0.0 Newer Cost: 5 float compares New Cost: 2 float as ints, 1 int compare Old Cost: 5 float compares Case: x y and different sign Newer Cost: 4 float compares New Cost: 2 float as ints, 2 int compares, 2 float compares, 2 shifts, one subtract Old Cost: 6 float compares Case: x y and same sign Newer Cost: 4 float compares New Cost: 2 float as ints, 2 int compares, 3 float compares, 2 shifts Old Cost: 6 float compares Case: x is NaN Newer Cost: 2 float compares (I think you could reuse the original code which is slightly cheaper than yours) New Cost: 2 float as ints, 1 int compare, 1 float compare Old Cost: 2 float compares Case: y is NaN Newer Cost: 2 float compares (again assuming reuse of the original code) New Cost: 2 float as ints, 1 int compare, 2 float compares Old Cost: 2 float compares Case: x == 0.0 and y == 0.0 Newer Cost: 2 float as ints, 6 float compares, 2 shifts, one subtract New Cost: either 2 float as ints, 1 int compare or 2 float as ints, 2 int compares, 2 float compares, 2 shifts, one subtract Old Cost: 4 float compares, 2 float divides, 1 float subtract, 1 float to int So I think the cases where the different ways win out are: most values are / then the newer way has the shortest path, most values are == then the new way has the shortest path, for two zeros (common in Jikes RVM :-( ) the new way wins out, and no one really cares about NaNs :-) I agree about the desirable properties of your code, and I think they also address Pinski's concern, that floatToIntBits on the common path seems bad. I also agree that its good to have a short and path. Possibly the only redeeming feature of the new code was that it made == cheap (depending on the floatToIntBits cost). So my variant of your code would be: // First check for NaNs if (x != x) { if (y != y) return 0; else return 1; } else if (y != y) { return -1; } else { // Normal cases, or positive/negative zeros if (x y) return 1; if (y x) return -1; // Either equal or positive/negative zeros if (x == 0 y == 0) { int ix = floatToIntBits(x); int iy = floatToIntBits(y); int ix_sign = ix 31; int iy_sign = iy 31; return ix_sign -
Re: Float/Double compare
On Sat, 2007-06-30 at 16:57 -0400, Ian Rogers wrote: In terms of performance I just tried sorting an array of floats with the new code (floatAsIntBits should have been floatToIntBits, sorry). The test was to initialize an array of floats backwards and then sort it to be ascending. Running a large enough array and enough iterations to get a ~4 second run, using the Jikes RVM's optimizing compiler, the speed up was around 7.77% on a Pentium 4 Linux box. Hi Ian! Could you try that also on an Athlon box? As I can remember significant performance differences on these two implementations. I/you could also try the different Java implementations with CACAO on different architectures. Although we don't optimize that much as JikesRVM. - twisti
[cp-patches] Float/Double compare
Attached is a patch that improves the performance of an Float/Double compare by exploiting information carried in their bit integer/long versions sign bit. It removes one comparison from the normal path, as well as making other compares with ints/longs. When sorting an array of floats this can yield a little under a 8% speed up on a Pentium 4. There's a thread discussing whether this is a good or bad idea on the main mailing list, but so far I'm merrily talking to myself :-) Ian --- java/lang/Double.java 2006-12-10 20:25:44.0 + +++ java/lang/Double.java 2007-06-30 22:21:39.98610 +0100 @@ -587,16 +587,17 @@ */ public static int compare(double x, double y) { -if (isNaN(x)) - return isNaN(y) ? 0 : 1; -if (isNaN(y)) - return -1; -// recall that 0.0 == -0.0, so we convert to infinites and try again -if (x == 0 y == 0) - return (int) (1 / x - 1 / y); -if (x == y) - return 0; - -return x y ? 1 : -1; +long lx = doubleToLongBits(x); +long ly = doubleToLongBits(y); +if (lx == ly) return 0; +if (isNaN(x)) return 1; +if (isNaN(y)) return -1; +int lx_sign = (int)(lx63); +int ly_sign = (int)(ly63); +if (lx_sign == ly_sign) { + return x y ? 1 : -1; +} else { + return lx_sign - ly_sign; +} } } --- java/lang/Float.java2006-12-10 20:25:44.0 + +++ java/lang/Float.java2007-06-30 22:10:46.099648000 +0100 @@ -596,16 +596,17 @@ */ public static int compare(float x, float y) { -if (isNaN(x)) - return isNaN(y) ? 0 : 1; -if (isNaN(y)) - return -1; -// recall that 0.0 == -0.0, so we convert to infinities and try again -if (x == 0 y == 0) - return (int) (1 / x - 1 / y); -if (x == y) - return 0; - -return x y ? 1 : -1; +int ix = floatToIntBits(x); +int iy = floatToIntBits(y); +if (ix == iy) return 0; +if (isNaN(x)) return 1; +if (isNaN(y)) return -1; +int ix_sign = ix31; +int iy_sign = iy31; +if (ix_sign == iy_sign) { + return x y ? 1 : -1; +} else { + return ix_sign - iy_sign; +} } }
Re: [cp-patches] Float/Double compare
Andrew Pinski wrote: On 6/30/07, Ian Rogers [EMAIL PROTECTED] wrote: Attached is a patch that improves the performance of an Float/Double compare by exploiting information carried in their bit integer/long versions sign bit. It removes one comparison from the normal path, as well as making other compares with ints/longs. When sorting an array of floats this can yield a little under a 8% speed up on a Pentium 4. There's a thread discussing whether this is a good or bad idea on the main mailing list, but so far I'm merrily talking to myself :-) This is going to slow down this function for PowerPC (except for Power6+) where you have to take a huge hit to transfer between register sets via the stack. For an example on the Cell, this is going to be an extra 40-100 cycle hit. I guess there are a lot of factors to do with the underlying architecture. My feeling is that we should move the code to VMFloat/VMDouble and then the VM can choose the appropriate version. Having the regular version optimized for Intel would make sense to me. Its not straight forward for me to benchmark the code on non-IA32 architectures, in particular Cell. I'll try to do a Jikes RVM benchmark on a PowerPC I have access to in the week. My feeling is that the performance may not be as bad as you imagine, but it's hard to predict - and this may not be an architecture you care about anyway. Regards, Ian
Float/Double compare
Hi everyone, I was tracking a bug and came across the code in Float.compare which is very similar to Double.compare. It just so happened that the bug was being caused by sorting arrays containing lots of floating point 0s. This occurs quite frequently in the Jikes RVM when we have an array of branch profile data. So the current code looks like: public static int compare(float x, float y) { if (isNaN(x)) return isNaN(y) ? 0 : 1; if (isNaN(y)) return -1; // recall that 0.0 == -0.0, so we convert to infinities and try again if (x == 0 y == 0) return (int) (1 / x - 1 / y); if (x == y) return 0; return x y ? 1 : -1; } When comparing 0 with 0 we do two NaN tests, two tests with 0, two floating point divides, floating point subtract and a float to int. An alternate way of coding the case of (x == 0 y == 0) would be: if (x == 0 y == 0) return (floatAsIntBits(x)31) - (floatAsIntBits(y) 31) The shift basically turns the ints being subtracted into 0 or -1, 0 in the case the value is +ve and -1 when it's negative. This changes the cost of 2 float divides, float subtract and float to int, into a cost of 2 float as ints, 2 int shifts, 1 int subtract. On x86 the float to int is expensive (set and restore control bits yadda yadda), where as a float as int is just a float store and then int load. I think the latter pattern could be quite a lot cheaper. In any case, it seems that this code is quite performance important and possibly the VM could do some clever optimizations. I wonder if it wouldn't be better to move compare into VMFloat and VMDouble? Regards, Ian
Re: Float/Double compare
Ian Rogers wrote: public static int compare(float x, float y) { if (isNaN(x)) return isNaN(y) ? 0 : 1; if (isNaN(y)) return -1; // recall that 0.0 == -0.0, so we convert to infinities and try again if (x == 0 y == 0) return (int) (1 / x - 1 / y); if (x == y) return 0; return x y ? 1 : -1; } Below is a variant of the compare code and a comparison with the cost of the current compare code: public static int compare(float x, float y) { int ix = floatAsIntBits(x); int iy = floatAsIntBits(y); if (ix == iy) return 0; if (isNaN(x)) return 1; if (isNaN(y)) return -1; int ix_sign = ix31; int iy_sign = iy31; if (ix_sign == iy_sign) { return x y ? 1 : -1; } else { return ix_sign - iy_sign; } } Case: x == y neither are NaN or 0.0 New Cost: 2 float as ints, 1 int compare Old Cost: 5 float compares Case: x y and different sign New Cost: 2 float as ints, 2 int compares, 2 float compares, 2 shifts, one subtract Old Cost: 6 float compares Case: x y and same sign New Cost: 2 float as ints, 2 int compares, 3 float compares, 2 shifts Old Cost: 6 float compares Case: x is NaN New Cost: 2 float as ints, 1 int compare, 1 float compare Old Cost: 2 float compares Case: y is NaN New Cost: 2 float as ints, 1 int compare, 2 float compares Old Cost: 2 float compares Case: x == 0.0 and y == 0.0 New Cost: either 2 float as ints, 1 int compare or 2 float as ints, 2 int compares, 2 float compares, 2 shifts, one subtract Old Cost: 4 float compares, 2 float divides, 1 float subtract, 1 float to int The same code but for doubles would be: public static int compare(double x, double y) { long lx = doubleAsLongBits(x); long ly = doubleAsLongBits(y); if (lx == ly) return 0; if (isNaN(x)) return 1; if (isNaN(y)) return -1; int lx_sign = (int)(lx63); int ly_sign = (int)(ly63); if (lx_sign == ly_sign) { return x y ? 1 : -1; } else { return lx_sign - ly_sign; } } So the case when y is a NaN is slower with the new code but I think all of the more common cases are similar or faster. Of course it depends on the speed of all the operations. If you cared about a CPU with no floating point, the new code is a lot faster. If you've got great floating point and a lot of cases of y being a NaN the old code will be better. My guess is that in the general case the new code wins out. There may be a bug I can't see in this code :-) Regards, Ian
Re: Float/Double compare
In terms of performance I just tried sorting an array of floats with the new code (floatAsIntBits should have been floatToIntBits, sorry). The test was to initialize an array of floats backwards and then sort it to be ascending. Running a large enough array and enough iterations to get a ~4 second run, using the Jikes RVM's optimizing compiler, the speed up was around 7.77% on a Pentium 4 Linux box. Ian