Hello, "Dr. Arne Babenhauserheide" <[email protected]> writes:
> Tomas Volf <[email protected]> writes: > >> the performance increase looks nice, but see few comment below please. >> >> "Dr. Arne Babenhauserheide" <[email protected]> writes: >>> While optimizing an implementation of levenshtein distance, I found that >>> (min ...) took up a significant part of the profile (16%), so I tried >>> replacing it with a simple implementation in Scheme. That worked out >>> surprisingly well (shaved off about 25% of the total runtime) and when I >>> benchmarked it directly it turned out to be between 1.5 and 3 times >>> faster than the min in numbers.c. >> >> There are few difference in behavior > > Thank you for catching them! > > Those look like we should fix them. But we should base the fixes on the > more efficient version by David, not on my initial one. I believe it has the same problems due to fairly similar design, so any fixes should be transferable. I have picked your version for testing and benchmarking, since the version from David did not finish running in a reasonable time, and I had to ^C it. > >> --8<---------------cut here---------------start------------->8--- >> scheme@(guile-user)> (min -0.0 0.0) >> $8 = -0.0 >> scheme@(guile-user)> (mins -0.0 0.0) >> $9 = 0.0 > > This means that (< -0.0 0.0) ⇒ #f > > What I also see: (negative? -0.0) ⇒ #f > > Something is odd there. That -0.0 is neither larger nor smaller than > +0.0 may be correct, but shouldn’t it be negative? I am not sure, but 0.0 is not positive, so -0.0 not being negative does have some logic. FWIW, Chez scheme does behave the same: --8<---------------cut here---------------start------------->8--- > (negative? -0.0) #f > (positive? 0.0) #f --8<---------------cut here---------------end--------------->8--- > >> scheme@(guile-user)> (min (nan) 1) >> $15 = +nan.0 >> scheme@(guile-user)> (mins (nan) 1) >> $16 = 1 >> --8<---------------cut here---------------end--------------->8--- >> >> I would also expect `min' to not care about the order of arguments. The >> original implementation does not, but yours does >> >> --8<---------------cut here---------------start------------->8--- >> scheme@(guile-user)> (min (nan) 1) >> $21 = +nan.0 >> scheme@(guile-user)> (min 1 (nan)) >> $22 = +nan.0 >> >> scheme@(guile-user)> (mins (nan) 1) >> $19 = 1 >> scheme@(guile-user)> (mins 1 (nan)) >> $20 = +nan.0 >> --8<---------------cut here---------------end--------------->8--- > > NaN is a special case; it’s neither smaller nor larger, but any > computation with NaN must be NaN. > > I’m not sure however how min should deal with NaN. It should obviously > be order-independent, but returning NaN for min seems strange when > there’s a non-min number. > > I lack the background information about numbers to judge what’s correct > here. Can you judge that? Returning NaN is consistent with Chez, but I am not qualified to say whether it is correct. I just did basically white-box testing of both yours and the built-in version and took note of the differences. ^_^ > > It seems like we’ll need a full set of tests for min and max before we > can replace them. > > I found an article about min and max in C++: > https://pixorblog.wordpress.com/2016/06/27/some-remarks-about-minmax-functions/ >> The IEEE 754 says: >> In section 6.2 of the revised IEEE 754-2008 standard there are two >> anomalous functions (the maxnum and minnum functions that return the maximum >> of two operands that are expected to be numbers) that favor numbers — if just >> one of the operands is a NaN then the value of the other operand is returned. > > So min and max should never return NaN if the arguments contain any > other value. I believe this simplifies it a bit too much. The 2019 version (well, the draft, I am not buying the final document, and consider the very need to *buy* a standard to be absurd) specifies both `minimum' and `minimumNumber' with different behavior regarding NaNs (section 9.6). I do not know whether r7rs prescribes which of those shall be used, but probably not. > > But from r7rs: > > An arithmetic operation where one operand is NaN > returns NaN, unless the implementation can prove that the > result would be the same if the NaN were replaced by any > rational number. > > Are min and max arithmetic operations? ¯\_(ツ)_/¯ > Our numbers.c implementation treats it as such. > >> One last observation is that your version does not preserve exactness of >> the arguments, which seems to be a requirement in both R6RS (11.7.4, >> Arithmetic operations) and R7RS (6.2.2) >> >> --8<---------------cut here---------------start------------->8--- >> scheme@(guile-user)> (min +inf.0 1) >> $29 = 1.0 >> scheme@(guile-user)> (mins +inf.0 1) >> $30 = 1 >> --8<---------------cut here---------------end--------------->8--- > > From r7rs: > > Note: If any argument is inexact, then the result will also be > inexact (unless the procedure can prove that the inaccuracy is not > large enough to affect the result, which is possible only in unusual > implementations). If min or max is used to compare numbers of mixed > exactness, and the numerical value of the result cannot be > represented as an inexact number without loss of accuracy, then the > procedure may report a violation of an implementation restriction. > > Here’s a case where this causes my implementation to be incorrect: > > (mins 1/7 (exact->inexact 1/7)) > $27 = 0.14285714285714285 > (mins 1/13 (exact->inexact 1/13)) > $32 = 1/13 > > Also the r7rs says about < and >: > > Note: While it is not an error to compare inexact numbers using > these predicates, the results are unreliable because a small > inaccuracy can affect the result; this is especially true of = and > zero?. When in doubt, consult a numerical analyst. > > > I have not found a fix to this, yet. Doing an exactness check inside the > loop may be quite expensive. Presumably this is the reason the built-in version is perceived as "slow". :) I am unsure how to solve this. > > This is the optimized version by David in hoot: > > (import (ice-9 match)) > (define-syntax-rule (define-minmax-expansion f %f compare) > (begin > (define %f > (case-lambda > ((x) x) > ((x y) (if (compare x y) x y)) > ((x . rest) > (let lp ((m x) > (rest rest)) > (match rest > (() m) > ((x . rest) > (if (compare x m) > (lp x rest) > (lp m rest)))))))) > (define-syntax f > (lambda (stx) > (syntax-case stx () > (f (identifier? #'f) #'%f) > ((_ x) #'x) > ((_ x y) > #'(let ((x* x) > (y* y)) > (if (compare x* y*) x* y*))) > ((_ x . y) > #'(f x (f . y)))))))) > > (define-minmax-expansion min %min <) > (define-minmax-expansion max %max >) > > >> As for the performance, while your version is a lot faster for integers >> >> --8<---------------cut here---------------start------------->8--- >> scheme@(guile-user)> (define x (iota #_1_000_000)) > > What is #_? Is that a specialized prefix (it does not work in my Guile)? > Do I need to import something? Ah, my bad, it is an extension[0] from my library, it helps to make large numbers more readable. It is sourced automatically on REPL start and I did not realize I have used it. You can just drop all of the _, including the #_ prefix, and it will work. Sorry about that. 0: https://git.wolfsden.cz/guile-wolfsden/tree/wolfsden/x/underscore.scm > >> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) >> (begin (apply min x) (loop (1- n))))) >> ;; 2.208355s real time, 2.208266s run time. 0.000000s spent in GC. >> >> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) >> (begin (apply mins x) (loop (1- n))))) >> ;; 1.044301s real time, 1.044260s run time. 0.000000s spent in GC. >> --8<---------------cut here---------------end--------------->8--- >> >> it is at the cost of performance regressions elsewhere >> >> --8<---------------cut here---------------start------------->8--- >> scheme@(guile-user)> (define x (map (cut / <> 3) (iota #_1_000_000))) >> >> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) >> (begin (apply min x) (loop (1- n))))) >> ;; 2.720850s real time, 2.720736s run time. 0.000000s spent in GC. >> >> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) >> (begin (apply mins x) (loop (1- n))))) >> ;; 3.476514s real time, 3.476266s run time. 0.000000s spent in GC. >> >> scheme@(guile-user)> (define x (map (cut + <> 0.1) (iota #_1_000_000))) >> >> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) >> (begin (apply min x) (loop (1- n))))) >> ;; 2.144861s real time, 2.144699s run time. 0.000000s spent in GC. >> >> scheme@(guile-user)> ,t (let loop ((n 100)) (if (zero? n) (when #f #f) >> (begin (apply mins x) (loop (1- n))))) >> ;; 2.719851s real time, 2.719682s run time. 0.000000s spent in GC. >> --8<---------------cut here---------------end--------------->8--- >> >> Please correct me if I have messed up somewhere in the above text. >> >> Tomas > > Looking at the benchmark (with the nicer version of the apply-part by > David), I don’t see such a clear picture: > > The data: > > (define L1 (map (λ (x) (+ 0.1 x)) (iota 1))) > (define L2 (map (λ (x) (+ 0.1 x)) (iota 2))) > (define L3 (map (λ (x) (+ 0.1 x)) (iota 3))) > (define L4 (map (λ (x) (+ 0.1 x)) (iota 6))) > (define L5 (map (λ (x) (+ 0.1 x)) (iota 100))) > (define L6 (map (λ (x) (+ 0.1 x)) (iota 1000))) > (define L7 (map (λ (x) (+ 0.1 x)) (iota 1000000))) > > The code: > > (define min-match > (case-lambda > ((x) x) > ((x y) (if (< x y) x y)) > ((x . rest) > (let lp ((m x) > (rest rest)) > (match rest > (() m) > ((x . rest) > (if (< x m) > (lp x rest) > (lp m rest)))))))) > > The results: > > ("arithmetic.bm: fixnum: min-1" 1000000 real 0.013668326 real/iteration > 1.3668326e-8 run/iteration 1.3577348e-8 core/iteration 1.0069007699999998e-8 > gc 0.0) > ("arithmetic.bm: fixnum: min-match-1" 1000000 real 0.011865027 real/iteration > 1.1865027e-8 run/iteration 1.1769995e-8 core/iteration 8.261654699999999e-9 > gc 0.0) > > ("arithmetic.bm: fixnum: min-2" 100000 real 0.002098059 real/iteration > 2.098059e-8 run/iteration 2.092736e-8 core/iteration 1.74190197e-8 gc 0.0) > ("arithmetic.bm: fixnum: min-match-2" 100000 real 0.002045983 real/iteration > 2.045983e-8 run/iteration 2.044696e-8 core/iteration 1.69386197e-8 gc 0.0) > > ("arithmetic.bm: fixnum: min-3" 100000 real 0.003547318 real/iteration > 3.547318e-8 run/iteration 3.546066e-8 core/iteration 3.19523197e-8 gc 0.0) > ("arithmetic.bm: fixnum: min-match-3" 100000 real 0.004556954 real/iteration > 4.556954e-8 run/iteration 4.407558e-8 core/iteration 4.0567239699999994e-8 gc > 0.0) > > ("arithmetic.bm: fixnum: min-4" 10000 real 5.55016e-4 real/iteration > 5.55016e-8 run/iteration 5.53546e-8 core/iteration 5.18462597e-8 gc 0.0) > ("arithmetic.bm: fixnum: min-match-4" 10000 real 0.001111687 real/iteration > 1.111687e-7 run/iteration 1.110488e-7 core/iteration 1.0754045970000001e-7 gc > 0.0) > > ^ this shows a factor 2 regression for apply on lists with 6 elements. > > ("arithmetic.bm: fixnum: min-5" 100000 real 0.176713492 real/iteration > 1.76713492e-6 run/iteration 2.39227622e-6 core/iteration 2.3887678797e-6 gc > 0.120515234) > ("arithmetic.bm: fixnum: min-match-5" 100000 real 0.204362918 real/iteration > 2.0436291799999998e-6 run/iteration 2.69216163e-6 core/iteration > 2.6886532897000004e-6 gc 0.124272761) > > > ^ but between 100 and one million elements, the regression is down to 10%. > > ("arithmetic.bm: fixnum: min-6" 10000 real 0.170988928 real/iteration > 1.70988928e-5 run/iteration 2.42823128e-5 core/iteration 2.42788044597e-5 gc > 0.128306471) > ("arithmetic.bm: fixnum: min-match-6" 10000 real 0.20625809 real/iteration > 2.0625809e-5 run/iteration 2.73242117e-5 core/iteration 2.73207033597e-5 gc > 0.124695717) > > ("arithmetic.bm: fixnum: min-7" 10 real 0.217954551 real/iteration > 0.021795455100000003 run/iteration 0.030831519 core/iteration > 0.0308315154916597 gc 0.204189321) > ("arithmetic.bm: fixnum: min-match-7" 10 real 0.238831113 real/iteration > 0.0238831113 run/iteration 0.030176484899999998 core/iteration > 0.0301764813916597 gc 0.112002681) > > > What’s not tested here is min with a fixed number of arguments. Using as > mins the definition of David in hoot (and given above): > > scheme@(guile-user) [2]> ,t (let loop ((n 100000000)) (if (zero? n) (when #f > #f) (begin (mins 1 2 3 4 5 6 7) (loop (1- n))))) > ;; 0.086388s real time, 0.085594s run time. 0.000000s spent in GC. > scheme@(guile-user) [2]> ,t (let loop ((n 100000000)) (if (zero? n) (when #f > #f) (begin (min 1 2 3 4 5 6 7) (loop (1- n))))) > ;; 10.321335s real time, 13.731729s run time. 6.150539s spent in GC. In all the above I see large variances in GC time, so I wonder whether that had some effect on the result. In my tests I have always executed ,gc first, so you can see the GC is always 0 (min/max should not really need to trigger GC anyway). I am unsure how large the effect of that is though, possibly not large. > Another advantage of a native Scheme implementation is that we avoid the > round-trip through C, enabling the JIT compiler to optimize more. True. Have a nice day, Tomas -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.
