Re: [PHP-DEV] Floating point comparisons - A fix

George Whiffen Wed, 20 Mar 2002 03:48:45 -0800

Rasmus,

It seems performance is your main concern, mine too.

Just because nobody else does it is of course not reason enough not to do
it, but doing 2 sprintf's for every floating point comparison makes me
cringe.

Two sprintfs made me cringe too when I tried it. That's why the code I suggest takes this into account:-

> The effect on double comparisons is not negligible.  sprintf's are
  > relatively expensive in terms of performance and adding the overhead of
  > two sprintfs to every single double compare would have been nasty.
  >
  > To minimise this the new compare_function code first of all checks that
  > the operands are not already equal and are "close enough" that a string
  > compare is necessary before forcing the sprintfs. The test is for (a-b)
  > != 0 && ((a-b)/a) < 1e-(precision-1) e.g. (0.2 - 0.1) != 0 && ((0.2 -
  > 0.1)/0.1) < 1e-13.

The overall performance hit is basically the same as single character comparisons i.e. fixed 0.8 == 0.7 + 0.1 runs as fast as 'a' == 'b'. That doesn't seem unreasonable to me, strings are after all the most important type in php. It runs faster than '0.8' == '0.8'. The sprintfs hit is typically only required when we are genuinely hitting recurring binary representation of decimals problems e.g. 0.1. It may be serious, but it will also be rare.

That'd be a pretty serious performance hit to take. Do you know of any
language where floating point comparisons work like that? Certainly in
both C and Perl you will never get 0.8 to equal 0.7+0.1 exactly.

As to other languages, you appear to be correct when it comes to php's e.g. perl/mysql. However, the reason this struck me as so ridiculous in the first place, is that for 15 years I used 4GL/RADs (Info, Structure/4, sqr), where I never dreamt of this problem. Of course, these were commercial products heavily used in financial applications in big corporates, so obviously this was resolved in their very early days.

I am obdurate on this issue and will remain so. It may seem entirely natural to a C programmer, but, php is not for C programmers and it's ridiculous to do so much superb work on making php type-less and then let this through.

I'd expect it from Perl, which is why I don't use Perl. I don't expect it from MySQL and PHP. In one sense MySQL is much worse in that the recurring binary problem is not even resolved for fields with their own explicit precisions e.g. decimal(5,2). On the other hand the MySQL comparison/floor functions are much less central to the tool's core useage. You can be confident that I will be taking this up with the MySQL developers, but I thought I'd better start with php!

George

-Rasmus
On Tue, 19 Mar 2002, George Whiffen wrote:
> Hi Folks,
>
> 0.8 == 0.7 + 0.1 or does it?
>
> I know I've brought this up before, but it's still driving me nuts, and
> I still don't know how to explain it to a novice, so rather than go on
> ranting I thought I'd try a fix.
>
> I'm no C programmer, and what I know about php source can be written on
> the back of a very small envelope. But at least you can laugh at my
> appalling code, and, hopefully, explain to me why it can't go in the
> next release ;).
>
> Summary
> =======
> The fix is based on a string convert and comparison. It seems to work,
> is not too serious from a performance point of view, and only involves
> two sources, four functions and a 100 lines or so of code. It does not
> create cumulative rounding errors, is easily controlled via an existing
> ini variable i.e. "precision" and improves the consistency of php's use
> of precision.
>
> The Bugs/Undesirable Features
> =====================
> 0.8 == 0.7 + 0.1 evaluates to false
> (int) (8.2 - 0.2) evaluates 7
> intval(8.2 - 0.2) evaluates to 7
> floor(10 * (0.7 + 0.1)) evaluates to 7
> ceil (10 * (-0.7 + -0.1) evaluates to -7
>
> Basis of the Fix
> ==========
> In general these would all evaluate correctly if they were evaluated to
> the precision specified in php.ini (typically 14 for IEEE 64bit).
>
> The fix replaces the current equality test on doubles, (a - b == 0),
> with a string compare to 'precision' decimal places e.g.
> sprintf("%.14G",a) == sprintf("%.14G",b). This is a modification to the
> doubles section of the Zend/zend_operators.c compare_function, which
> ultimately handles all comparisons, ==, !=, >=, >, <, <=.
>
> Floor, ceil, (int), intval(), are fixed with an equality check of their
> integer result to one above or below it, (as appropriate) via the
> modified compare_function.
>
> e.g. floor becomes
> if a == floor(a) + 1
> return floor(a) + 1;
> else
> return floor(a)
>
> Apart from some performance tweaks, that's about it i.e.
> Zend/zend_operators.c:compare_function
> Zend/zend_operators.c:convert_to_long_base
> ext/standard/math.c:ceil
> ext/standard/math.c:floor
>
> Testing
> =======
> The fixes seem to work. However, they have only been tested on a 4.1.2
> source under Linux 2.4.8-26mdk. They solve the problems listed above
> and do as well as a basic cgi build of 4.1.2 on run-tests.php (i.e.
> they pass everything except pow.phpt, pear_registry.phpt and 029.phpt).
> They also seem to behave identically to an unfixed version if
> 'precision' is set high enough e.g. set_ini('precision',18).
>
>
> Backward Compatibility
> ================
> I've tried and failed to come up with a realistic scenario where this
> fix compromises existing user code.
>
> The main reason is that string, printf, sprintf already force the
> precision set in php.ini. This means it quite difficult for someone to
> have exploited the fact that compare_function does not. To hit problems
> with the fix they would have to have first decided they care about the
> digits beyond 'precision', but do not care enough to use the bc
> library. They then have to gone to some trouble to get hold of those
> extra digits. They could not, for instance, easily get them into a page
> or database without a conversion to string automatically removing the
> extra digits along the way.
>
> In contrast to a "round to precision after each floating point
> operation" approach, (which would be a nightmare), these fixes should
> not create any new issues with cumulative rounding errors. All the
> changed functions already return booleans, ints, or integer-rounded
> floats.
>
> In any case, everything can easily be reverted to the old functionality
> at execution time simply by increasing the value of precision e.g.
> set_ini('precision',18).
>
>
> Performance
> ===========
> Only doubles are effected. Long comparisons, such as integer for-loops
> (for($i=0;$i<$sizeof($aray);++$i) etc.), are unchanged and run just as
> fast.
>
> The effect on double comparisons is not negligible. sprintf's are
> relatively expensive in terms of performance and adding the overhead of
> two sprintfs to every single double compare would have been nasty.
>
> To minimise this the new compare_function code first of all checks that
> the operands are not already equal and are "close enough" that a string
> compare is necessary before forcing the sprintfs. The test is for (a-b)
> != 0 && ((a-b)/a) < 1e-(precision-1) e.g. (0.2 - 0.1) != 0 && ((0.2 -
> 0.1)/0.1) < 1e-13.
>
> Even when the operands are close and this test is the only overhead,
> this still means an extra floating point division which accounts for
> nearly all the performance degradation. compare_function is so fast
> already, that this extra division seems to make comparisons of non-equal
> doubles about twice as slow.
>
> Fortunately this is pretty small in absolute terms and when compared to
> other simple comparisons. A double comparison using the fix e.g. 0.1 ==
> 0.2, is still no slower than a single character non-numeric string
> comparison e.g. 'a' == 'b'. They remain much faster than a numeric
> string compare e.g. '0.1' == '0.2'.
>
> When operands are close e.g. 1.2345678901234e123 == 1.2345678901233e123
> then the performance hit is much bigger as the two sprintfs and a string
> compare are required. Even then it is still faster than the old
> workaround of casting to string before the compare i.e. (string) 0.1 ==
> (string) 0.2.
>
> Further Optimisations
> =====================
>
> The code can definitely be optimised further.
>
> The main slowdown on the ordinary, non-close, comparisons comes from the
> floating point divide which is needed to make sure that it is the
> "relative" difference not the absolute difference which is compared to
> the precision. If the precision were converted to binary and adjusted
> by the value of the double's exponent it should be possible to avoid the
> floating point division. But this is significantly more complicated, or
> rather I haven't worked out how to do it!
>
> The sprintf/string compare could also be improved.   For example,
> sprintf("%a") seems to be about twice as fast as sprintf("%G").
> Unfortunately %a can return un-normalized formats which would need
> tweaking to stop them failing the string comparison.
>
> There are also significant performance improvements possible in the
> floor, ceil, intval functions by not using compare_function but instead
> doing an in-line comparison. Since the results of these functions are
> themselves integers their differences can be compared directly to
> 1e(-precision) without the floating point divide, (provided of course
> they are less than 1e13 themselves).
>
>
> Many thanks for taking the time to consider this. Any feedback will be
> much appreciated.
>
>
> George
>
> Changes to Zend/zend_operators.c:
>
> compare_function
> ============
>
> OLD CODE:
> --------------
> if ((op1->type == IS_DOUBLE || op1->type == IS_LONG)
>   && (op2->type == IS_DOUBLE || op2->type == IS_LONG)) {
>   result->value.dval = (op1->type == IS_LONG ? (double) op1->value.lval
> : op1->value.dval) - (op2->type == IS_LONG ? (double) op2->value.lval :
> op2->value.dval);
>    result->value.lval = ZEND_NORMALIZE_BOOL(result->value.dval);
>    result->type = IS_LONG;
>    return SUCCESS;
> }
>
> NEW CODE:
> --------------
> ...
>         double dval1, dval2;
>         char   *sval1, *sval2;
>         long   slen1, slen2;
> ...
> if ((op1->type == IS_DOUBLE || op1->type == IS_LONG)
>   && (op2->type == IS_DOUBLE || op2->type == IS_LONG)) {
>
>           if (op1->type == IS_DOUBLE) {
>             dval1 = op1->value.dval;
>           } else {
>             dval1 = (double) op1->value.lval;
>           }
>           if (op2->type == IS_DOUBLE) {
>             dval2 = op2->value.dval;
>           } else {
>             dval2 = (double) op2->value.lval;
>           }
>           result->value.dval = dval1 - dval2;
>
>           if (result->value.dval != 0 && fabs((dval1 - dval2)/(dval1 ==
> 0 ? 1 : dval1)) < pow(10.0,0 - (EG(precision) -1))) {
>        sval1 = (char *) emalloc(MAX_LENGTH_OF_DOUBLE + EG(precision) +
> 1);
>        slen1 = sprintf(sval1, "%.*G", (int) EG(precision),dval1); /*
> SAFE */
>        sval2 = (char *) emalloc(MAX_LENGTH_OF_DOUBLE + EG(precision) +
> 1);
>        slen2 = sprintf(sval2, "%.*G", (int) EG(precision),dval2); /*
> SAFE */
>        if (0 == zend_binary_strcmp(sval1,slen1,sval2,slen2))
>       {
>          result->value.dval = 0;
>        }
>        efree(sval1);
>        efree(sval2);
>      }
>      result->value.lval = ZEND_NORMALIZE_BOOL(result->value.dval);
>      result->type = IS_LONG;
>      return SUCCESS;
> }
>
>
> convert_to_long_base
> ===============
>
> OLD CODE:
> --------------
> ...
>   case IS_DOUBLE:
>    DVAL_TO_LVAL(op->value.dval, op->value.lval);
>    break;
> ...
>
> NEW CODE:
> --------------
> ...
> zval *op1, *op2, *result;
> ...
>   case IS_DOUBLE:
>     MAKE_STD_ZVAL(op1);
>     MAKE_STD_ZVAL(op2);
>     MAKE_STD_ZVAL(result);
>     op1->type = IS_DOUBLE;
>     op2->type = IS_LONG;
>                   op1->value.dval = op->value.dval;
>     DVAL_TO_LVAL(op->value.dval, op->value.lval);
>                   if (op1->value.dval >= 0)
>                    {
>                       op2->value.lval = op->value.lval + 1;
>                     } else {
>                       op2->value.lval = op->value.lval - 1;
>                     }
>     compare_function(result, op1, op2 TSRMLS_CC);
>     if (result->value.lval == 0) {
>       op->value.lval = op2->value.lval;
>     }
>     zval_dtor(result);
>     zval_dtor(op1);
>     zval_dtor(op2);
>    break;
>
> The changes to floor and ceil in ext/standard/math.c are very similar to
> the convert_to_long_base change.
>

Re: [PHP-DEV] Floating point comparisons - A fix

Reply via email to