Tomas Volf <[email protected]> writes:

> the performance increase looks nice, but see few comment below please.
>
> "Dr. Arne Babenhauserheide" <[email protected]> writes:
>> While optimizing an implementation of levenshtein distance, I found that
>> (min ...) took up a significant part of the profile (16%), so I tried
>> replacing it with a simple implementation in Scheme. That worked out
>> surprisingly well (shaved off about 25% of the total runtime) and when I
>> benchmarked it directly it turned out to be between 1.5 and 3 times
>> faster than the min in numbers.c.
>
> There are few difference in behavior

Thank you for catching them!

Those look like we should fix them. But we should base the fixes on the
more efficient version by David, not on my initial one.

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (min -0.0 0.0)
> $8 = -0.0
> scheme@(guile-user)> (mins -0.0 0.0)
> $9 = 0.0

This means that (< -0.0 0.0) ⇒ #f

What I also see: (negative? -0.0) ⇒ #f

Something is odd there. That -0.0 is neither larger nor smaller than
+0.0 may be correct, but shouldn’t it be negative?

> scheme@(guile-user)> (min (nan) 1)
> $15 = +nan.0
> scheme@(guile-user)> (mins (nan) 1)
> $16 = 1
> --8<---------------cut here---------------end--------------->8---
>
> I would also expect `min' to not care about the order of arguments.  The
> original implementation does not, but yours does
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (min (nan) 1)
> $21 = +nan.0
> scheme@(guile-user)> (min 1 (nan))
> $22 = +nan.0
>
> scheme@(guile-user)> (mins (nan) 1)
> $19 = 1
> scheme@(guile-user)> (mins 1 (nan))
> $20 = +nan.0
> --8<---------------cut here---------------end--------------->8---

NaN is a special case; it’s neither smaller nor larger, but any
computation with NaN must be NaN.

I’m not sure however how min should deal with NaN. It should obviously
be order-independent, but returning NaN for min seems strange when
there’s a non-min number.

I lack the background information about numbers to judge what’s correct
here. Can you judge that?

It seems like we’ll need a full set of tests for min and max before we
can replace them.

I found an article about min and max in C++:
https://pixorblog.wordpress.com/2016/06/27/some-remarks-about-minmax-functions/
>  The IEEE 754 says:
>     In section 6.2 of the revised IEEE 754-2008 standard there are two 
> anomalous functions (the maxnum and minnum functions that return the maximum 
> of two operands that are expected to be numbers) that favor numbers — if just 
> one of the operands is a NaN then the value of the other operand is returned.

So min and max should never return NaN if the arguments contain any
other value.

But from r7rs:

    An arithmetic operation where one operand is NaN
    returns NaN, unless the implementation can prove that the
    result would be the same if the NaN were replaced by any
    rational number.

Are min and max arithmetic operations? Our numbers.c implementation
treats it as such.

> One last observation is that your version does not preserve exactness of
> the arguments, which seems to be a requirement in both R6RS (11.7.4,
> Arithmetic operations) and R7RS (6.2.2)
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (min +inf.0 1)
> $29 = 1.0
> scheme@(guile-user)> (mins +inf.0 1)
> $30 = 1
> --8<---------------cut here---------------end--------------->8---

From r7rs:

    Note: If any argument is inexact, then the result will also be
    inexact (unless the procedure can prove that the inaccuracy is not
    large enough to affect the result, which is possible only in unusual
    implementations). If min or max is used to compare numbers of mixed
    exactness, and the numerical value of the result cannot be
    represented as an inexact number without loss of accuracy, then the
    procedure may report a violation of an implementation restriction.

Here’s a case where this causes my implementation to be incorrect:

(mins 1/7 (exact->inexact 1/7))
$27 = 0.14285714285714285
(mins 1/13 (exact->inexact 1/13))
$32 = 1/13

Also the r7rs says about < and >:

    Note: While it is not an error to compare inexact numbers using
    these predicates, the results are unreliable because a small
    inaccuracy can affect the result; this is especially true of = and
    zero?. When in doubt, consult a numerical analyst.


I have not found a fix to this, yet. Doing an exactness check inside the
loop may be quite expensive.

This is the optimized version by David in hoot:

  (import (ice-9 match))
  (define-syntax-rule (define-minmax-expansion f %f compare)
    (begin
      (define %f
        (case-lambda
          ((x) x)
          ((x y) (if (compare x y) x y))
          ((x . rest)
           (let lp ((m x)
                    (rest rest))
             (match rest
               (() m)
               ((x . rest)
                (if (compare x m)
                    (lp x rest)
                    (lp m rest))))))))
      (define-syntax f
        (lambda (stx)
          (syntax-case stx ()
            (f (identifier? #'f) #'%f)
            ((_ x) #'x)
            ((_ x y)
             #'(let ((x* x)
                     (y* y))
                 (if (compare x* y*) x* y*)))
            ((_ x . y)
             #'(f x (f . y))))))))

  (define-minmax-expansion min %min <)
  (define-minmax-expansion max %max >)


> As for the performance, while your version is a lot faster for integers
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define x (iota #_1_000_000))

What is #_? Is that a specialized prefix (it does not work in my Guile)?
Do I need to import something?

> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) (begin 
> (apply min x) (loop (1- n)))))
> ;; 2.208355s real time, 2.208266s run time.  0.000000s spent in GC.
>
> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) (begin 
> (apply mins x) (loop (1- n)))))
> ;; 1.044301s real time, 1.044260s run time.  0.000000s spent in GC.
> --8<---------------cut here---------------end--------------->8---
>
> it is at the cost of performance regressions elsewhere
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (define x (map (cut / <> 3) (iota #_1_000_000)))
>
> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) (begin 
> (apply min x) (loop (1- n)))))
> ;; 2.720850s real time, 2.720736s run time.  0.000000s spent in GC.
>
> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) (begin 
> (apply mins x) (loop (1- n)))))
> ;; 3.476514s real time, 3.476266s run time.  0.000000s spent in GC.
>
> scheme@(guile-user)> (define x (map (cut + <> 0.1) (iota #_1_000_000)))
>
> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) (begin 
> (apply min x) (loop (1- n)))))
> ;; 2.144861s real time, 2.144699s run time.  0.000000s spent in GC.
>
> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) (begin 
> (apply mins x) (loop (1- n)))))
> ;; 2.719851s real time, 2.719682s run time.  0.000000s spent in GC.
> --8<---------------cut here---------------end--------------->8---
>
> Please correct me if I have messed up somewhere in the above text.
>
> Tomas

Looking at the benchmark (with the nicer version of the apply-part by
David), I don’t see such a clear picture:

The data:

(define L1 (map (λ (x) (+ 0.1 x)) (iota 1)))
(define L2 (map (λ (x) (+ 0.1 x)) (iota 2)))
(define L3 (map (λ (x) (+ 0.1 x)) (iota 3)))
(define L4 (map (λ (x) (+ 0.1 x)) (iota 6)))
(define L5 (map (λ (x) (+ 0.1 x)) (iota 100)))
(define L6 (map (λ (x) (+ 0.1 x)) (iota 1000)))
(define L7 (map (λ (x) (+ 0.1 x)) (iota 1000000)))

The code:

(define min-match
  (case-lambda
    ((x) x)
    ((x y) (if (< x y) x y))
    ((x . rest)
     (let lp ((m x)
              (rest rest))
       (match rest
         (() m)
         ((x . rest)
          (if (< x m)
              (lp x rest)
              (lp m rest))))))))

The results:

("arithmetic.bm: fixnum: min-1" 1000000 real 0.013668326 real/iteration 
1.3668326e-8 run/iteration 1.3577348e-8 core/iteration 1.0069007699999998e-8 gc 
0.0)
("arithmetic.bm: fixnum: min-match-1" 1000000 real 0.011865027 real/iteration 
1.1865027e-8 run/iteration 1.1769995e-8 core/iteration 8.261654699999999e-9 gc 
0.0)

("arithmetic.bm: fixnum: min-2" 100000 real 0.002098059 real/iteration 
2.098059e-8 run/iteration 2.092736e-8 core/iteration 1.74190197e-8 gc 0.0)
("arithmetic.bm: fixnum: min-match-2" 100000 real 0.002045983 real/iteration 
2.045983e-8 run/iteration 2.044696e-8 core/iteration 1.69386197e-8 gc 0.0)

("arithmetic.bm: fixnum: min-3" 100000 real 0.003547318 real/iteration 
3.547318e-8 run/iteration 3.546066e-8 core/iteration 3.19523197e-8 gc 0.0)
("arithmetic.bm: fixnum: min-match-3" 100000 real 0.004556954 real/iteration 
4.556954e-8 run/iteration 4.407558e-8 core/iteration 4.0567239699999994e-8 gc 
0.0)

("arithmetic.bm: fixnum: min-4" 10000 real 5.55016e-4 real/iteration 5.55016e-8 
run/iteration 5.53546e-8 core/iteration 5.18462597e-8 gc 0.0)
("arithmetic.bm: fixnum: min-match-4" 10000 real 0.001111687 real/iteration 
1.111687e-7 run/iteration 1.110488e-7 core/iteration 1.0754045970000001e-7 gc 
0.0)

^ this shows a factor 2 regression for apply on lists with 6 elements.

("arithmetic.bm: fixnum: min-5" 100000 real 0.176713492 real/iteration 
1.76713492e-6 run/iteration 2.39227622e-6 core/iteration 2.3887678797e-6 gc 
0.120515234)
("arithmetic.bm: fixnum: min-match-5" 100000 real 0.204362918 real/iteration 
2.0436291799999998e-6 run/iteration 2.69216163e-6 core/iteration 
2.6886532897000004e-6 gc 0.124272761)


^ but between 100 and one million elements, the regression is down to 10%.

("arithmetic.bm: fixnum: min-6" 10000 real 0.170988928 real/iteration 
1.70988928e-5 run/iteration 2.42823128e-5 core/iteration 2.42788044597e-5 gc 
0.128306471)
("arithmetic.bm: fixnum: min-match-6" 10000 real 0.20625809 real/iteration 
2.0625809e-5 run/iteration 2.73242117e-5 core/iteration 2.73207033597e-5 gc 
0.124695717)

("arithmetic.bm: fixnum: min-7" 10 real 0.217954551 real/iteration 
0.021795455100000003 run/iteration 0.030831519 core/iteration 
0.0308315154916597 gc 0.204189321)
("arithmetic.bm: fixnum: min-match-7" 10 real 0.238831113 real/iteration 
0.0238831113 run/iteration 0.030176484899999998 core/iteration 
0.0301764813916597 gc 0.112002681)


What’s not tested here is min with a fixed number of arguments. Using as
mins the definition of David in hoot (and given above):

scheme@(guile-user) [2]> ,t (let loop ((n 100000000)) (if (zero? n) (when #f 
#f) (begin (mins 1 2 3 4 5 6 7) (loop (1- n)))))
;; 0.086388s real time, 0.085594s run time.  0.000000s spent in GC.
scheme@(guile-user) [2]> ,t (let loop ((n 100000000)) (if (zero? n) (when #f 
#f) (begin (min 1 2 3 4 5 6 7) (loop (1- n)))))
;; 10.321335s real time, 13.731729s run time.  6.150539s spent in GC.



Another advantage of a native Scheme implementation is that we avoid the
round-trip through C, enabling the JIT compiler to optimize more.


Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de

Attachment: signature.asc
Description: PGP signature

Reply via email to