Fair enough! I replied as I found a factor of 60 a bit surprising. Though I shouldn't have been -- my n digits of pi scheme program was consistently about 16-17 times slower than gmp based one (at least up to 1E9), using the same algorithm (Chudnovsky's), which is also highly dependent on multiplication speed. And both (gambit-scheme and gmp) use faster algorithms for very large numbers. Adding these faster algorithms to math/big is probably not justifiable....
Julia seems to use libgmp. "info gmp" reveals it implements Luchny's algorithm. > On Jan 13, 2024, at 4:07 PM, Rob Pike <r...@golang.org> wrote: > > This is getting interesting but maybe only for me, so I'll stop here. > > I did some profiling and the majority of the time is spent in > math/big.addMulVVW. Thinking I might be doing something stupid, I compared it > with the Go implementation at > http://www.luschny.de/math/factorial/scala/FactorialScalaCsharp.htm, and > found that my version (although "straightforward"), is about the same amount > of code but without special casing and support tables, yet about 25% faster - > and yes, they get the same answer. > > So I believe the factor of 60 comes from comparing an implementation in a > language and libraries designed for numerical computation against a much less > specialized and optimized world. Or perhaps from Julia using a different and > dramatically more efficient algorithm. It does seem like a big gap, but I am > no expert in this area. > > Maybe worth investigating further but not by me. > > -rob > > On Sun, Jan 14, 2024 at 9:31 AM Rob Pike <r...@golang.org > <mailto:r...@golang.org>> wrote: >> Oh, I did say my implementation was straightforward. It's free of any clever >> multiplication algorithms or mathematical delights. It could easily be >> giving up 10x or more for that reason alone. And I haven't even profiled it >> yet. >> >> -rob >> >> >> On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah <ba...@iitbombay.org >> <mailto:ba...@iitbombay.org>> wrote: >>> FYI Julia (on M1 MBP) seems much faster: >>> >>> julia> @which factorial(big(100000000)) >>> factorial(x::BigInt) in Base.GMP at gmp.jl:645 >>> >>> julia> @time begin; factorial(big(100000000)); 1; end >>> 27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time) >>> >>> Probably they use Schönhage-Strassen multiplication algorithm for very >>> large numbers as the 1E8! result will have over a 3/4 billion digits. I >>> should try this in Gambit-Scheme (which has an excellent multiply >>> implementation). >>> >>>> On Jan 12, 2024, at 9:32 PM, Rob Pike <r...@golang.org >>>> <mailto:r...@golang.org>> wrote: >>>> >>>> Thanks for the tip. A fairly straightforward implementation of this >>>> algorithm gives me about a factor of two speedup for pretty much any >>>> value. I went up to 1e8!, which took about half an hour compared to nearly >>>> an hour for MulRange. >>>> >>>> I'll probably stick in ivy after a little more tuning. I may even try >>>> parallelization. >>>> >>>> -rob >>>> >>>> >>>> On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah <ba...@iitbombay.org >>>> <mailto:ba...@iitbombay.org>> wrote: >>>>> For that you may wish to explore Peter Luschny's "prime swing" factorial >>>>> algorithm and variations! >>>>> https://oeis.org/A000142/a000142.pdf >>>>> >>>>> And implementations in various languages including go: >>>>> https://github.com/PeterLuschny/Fast-Factorial-Functions >>>>> >>>>>> On Jan 8, 2024, at 9:22 PM, Rob Pike <r...@golang.org >>>>>> <mailto:r...@golang.org>> wrote: >>>>>> >>>>>> Here's an example where it's the bottleneck: ivy factorial >>>>>> >>>>>> >>>>>> !1e7 >>>>>> 1.20242340052e+65657059 >>>>>> >>>>>> )cpu >>>>>> 1m10s (1m10s user, 167.330ms sys) >>>>>> >>>>>> >>>>>> -rob >>>>>> >>>>>> >>>>>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah <ba...@iitbombay.org >>>>>> <mailto:ba...@iitbombay.org>> wrote: >>>>>>> Perhaps you were thinking of this? >>>>>>> >>>>>>> At iteration number k, the value xk contains O(klog(k)) digits, thus >>>>>>> the computation of xk+1 = kxk has cost O(klog(k)). Finally, the total >>>>>>> cost with this basic approach is O(2log(2)+¼+n log(n)) = O(n2log(n)). >>>>>>> A better approach is the binary splitting : it just consists in >>>>>>> recursively cutting the product of m consecutive integers in half. It >>>>>>> leads to better results when products on large integers are performed >>>>>>> with a fast method. >>>>>>> >>>>>>> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html >>>>>>> >>>>>>> I think you can do recursive splitting without using function recursion >>>>>>> by allocating N/2 array (where b = a+N-1) and iterating over it; each >>>>>>> time the array "shrinks" by half. A "cleverer" algorithm would allocate >>>>>>> an array of *words* of a bignum, as you know that the upper limit on >>>>>>> size is N*64 (for 64 bit numbers) so you can just reuse the same space >>>>>>> for each outer iteration (N/2 multiplie, N/4 ...) and apply Karatsuba >>>>>>> 2nd outer iteration onwards. Not sure if this is easy in Go. >>>>>>> >>>>>>>> On Jan 8, 2024, at 11:47 AM, Robert Griesemer <g...@golang.org >>>>>>>> <mailto:g...@golang.org>> wrote: >>>>>>>> >>>>>>>> Hello John; >>>>>>>> >>>>>>>> Thanks for your interest in this code. >>>>>>>> >>>>>>>> In a (long past) implementation of the factorial function, I noticed >>>>>>>> that computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when >>>>>>>> computed in a recursive fashion than when computed iteratively: the >>>>>>>> reason (I believed) was that the iterative approach seemed to produce >>>>>>>> a lot more "internal fragmentation", that is medium-size intermediate >>>>>>>> results where the most significant word (or "limb" as is the term in >>>>>>>> other implementations) is only marginally used, resulting in more work >>>>>>>> than necessary if those words were fully used. >>>>>>>> >>>>>>>> I never fully investigated, it was enough at the time that the >>>>>>>> recursive approach was much faster. In retrospect, I don't quite >>>>>>>> believe my own theory. Also, that implementation didn't have Karatsuba >>>>>>>> multiplication, it just used grade-school multiplication. >>>>>>>> >>>>>>>> Since a, b are uint64 values (words), this could probably be >>>>>>>> implemented in terms of mulAddVWW directly, with a suitable initial >>>>>>>> allocation for the result - ideally this should just need one >>>>>>>> allocation (not sure how close we can get to the right size). That >>>>>>>> would cut down the allocations massively. >>>>>>>> >>>>>>>> In a next step, one should benchmark the implementation again. >>>>>>>> >>>>>>>> But at the very least, the overflow bug should be fixed, thanks for >>>>>>>> finding it! I will send out a CL to fix that today. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> - gri >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com >>>>>>>> <mailto:janno...@gmail.com>> wrote: >>>>>>>>> Actually, both implementations have bugs! >>>>>>>>> >>>>>>>>> The recursive implementation ends with: >>>>>>>>> ``` >>>>>>>>> m := (a + b) / 2 >>>>>>>>> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b)) >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> That's a bug whenever `(a+b)` overflows, making `m` small. >>>>>>>>> FIX: `m := a + (b-a)/2` >>>>>>>>> >>>>>>>>> My iterative implementation went into an infinite loop here: >>>>>>>>> `for m := a + 1; m <= b; m++ {` >>>>>>>>> if b is `math.MaxUint64` >>>>>>>>> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a >>>>>>>>> small penalty for the vast majority of calls that don't have >>>>>>>>> b=MaxUint64 >>>>>>>>> >>>>>>>>> I would add these to `mulRangesN` of the unit test: >>>>>>>>> ``` >>>>>>>>> {math.MaxUint64 - 3, math.MaxUint64 - 1, >>>>>>>>> "6277101735386680760773248120919220245411599323494568951784"}, >>>>>>>>> {math.MaxUint64 - 3, math.MaxUint64, >>>>>>>>> "115792089237316195360799967654821100226821973275796746098729803619699194331160"} >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com >>>>>>>>> <mailto:janno...@gmail.com>> wrote: >>>>>>>>>> I'm equally curious. >>>>>>>>>> >>>>>>>>>> FWIW, I realized the loop should perhaps be >>>>>>>>>> ``` >>>>>>>>>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, >>>>>>>>>> even on 32-bit arch >>>>>>>>>> for m := a + 1; m <= b; m++ { >>>>>>>>>> mb.setUint64(m) >>>>>>>>>> z = z.mul(z, mb) >>>>>>>>>> } >>>>>>>>>> ``` >>>>>>>>>> to avoid allocating repeatedly for `m`, which yields: >>>>>>>>>> BenchmarkIterativeMulRangeN-10 354685 3032 ns/op 2129 >>>>>>>>>> B/op 48 allocs/op >>>>>>>>>> >>>>>>>>>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org >>>>>>>>>> <mailto:r...@golang.org>> wrote: >>>>>>>>>>> It seems reasonable but first I'd like to understand why the >>>>>>>>>>> recursive method is used. I can't deduce why, but the CL that adds >>>>>>>>>>> it, by gri, does Karatsuba multiplication, which implies something >>>>>>>>>>> deep is going on. I'll add him to the conversation. >>>>>>>>>>> >>>>>>>>>>> -rob >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com >>>>>>>>>>> <mailto:janno...@gmail.com>> wrote: >>>>>>>>>>>> I enjoy bignum implementations, so I was looking through nat.go >>>>>>>>>>>> and saw that `mulRange` is implemented in a surprising, recursive >>>>>>>>>>>> way,. In the non-base case, `mulRange(a, b)` returns `mulrange(a, >>>>>>>>>>>> (a+b)/2) * mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony >>>>>>>>>>>> elided). >>>>>>>>>>>> >>>>>>>>>>>> That's fine, but I didn't see any advantage over the >>>>>>>>>>>> straightforward (and simpler?) for loop. >>>>>>>>>>>> >>>>>>>>>>>> ``` >>>>>>>>>>>> z = z.setUint64(a) >>>>>>>>>>>> for m := a + 1; m <= b; m++ { >>>>>>>>>>>> z = z.mul(z, nat(nil).setUint64(m)) >>>>>>>>>>>> } >>>>>>>>>>>> return z >>>>>>>>>>>> ``` >>>>>>>>>>>> >>>>>>>>>>>> In fact, I suspected the existing code was slower, and allocated a >>>>>>>>>>>> lot more. That seems true. A quick benchmark, using the existing >>>>>>>>>>>> unit test as the benchmark, yields >>>>>>>>>>>> BenchmarkRecusiveMulRangeN-10 169417 6856 >>>>>>>>>>>> ns/op 9452 B/op 338 allocs/op >>>>>>>>>>>> BenchmarkIterativeMulRangeN-10 265354 4269 >>>>>>>>>>>> ns/op 2505 B/op 196 allocs/op >>>>>>>>>>>> >>>>>>>>>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! >>>>>>>>>>>> But it is exported as `int.MulRange` so I guess it's viewed with >>>>>>>>>>>> some value. And seeing as how the for-loop seems even easier to >>>>>>>>>>>> understand that the recursive version, maybe it's worth submitting >>>>>>>>>>>> a PR? (If so, should I create an issue first?) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>>>>> Groups "golang-nuts" group. >>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>>>>> send an email to golang-nuts+unsubscr...@googlegroups.com >>>>>>>>>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>. >>>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com >>>>>>>>>>>> >>>>>>>>>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer>. >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "golang-nuts" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>>>> an email to golang-nuts+unsubscr...@googlegroups.com >>>>>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>. >>>>>>>> To view this discussion on the web visit >>>>>>>> https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com >>>>>>>> >>>>>>>> <https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>. >>>>>>> >>>>> >>> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/4D0207B1-9A65-45E3-A63B-DD8FDE35591C%40iitbombay.org.