Hello John, thanks for your interest in this code.
In a (long past) implementation of the factorial function, I noticed that computing a * (a+1) * (a+2) * ... * (b-1) * b was much faster when done recursively than when done iteratively. The reason, I believed, was that the iterative approach seemed to produce a lot more "internal fragmentation": medium-size intermediate results whose most significant word (or "limb", as other implementations call it) is only marginally used, which means more work than would be necessary if those words were fully used. I never investigated this fully; at the time it was enough that the recursive approach was much faster. In retrospect, I don't quite believe my own theory. Also, that implementation didn't have Karatsuba multiplication; it just used grade-school multiplication.

Since a and b are uint64 values (words), this could probably be implemented directly in terms of mulAddVWW, with a suitable initial allocation for the result. Ideally this would need just one allocation (I'm not sure how close we can get to the right size), which would cut down the allocations massively. As a next step, one should then benchmark the implementation again.

But at the very least, the overflow bug should be fixed; thanks for finding it! I will send out a CL to fix that today.

Thanks,
- gri

On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com> wrote:

> Actually, both implementations have bugs!
>
> The recursive implementation ends with:
> ```
> m := (a + b) / 2
> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
> ```
>
> That's a bug whenever `(a+b)` overflows, making `m` small.
> FIX: `m := a + (b-a)/2`
>
> My iterative implementation went into an infinite loop here:
> `for m := a + 1; m <= b; m++ {`
> if b is `math.MaxUint64`.
> FIX: adding `&& m > a` to the exit condition is an easy fix, but it pays a
> small penalty for the vast majority of calls that don't have b=MaxUint64.
>
> I would add these to `mulRangesN` of the unit test:
> ```
> {math.MaxUint64 - 3, math.MaxUint64 - 1,
> "6277101735386680760773248120919220245411599323494568951784"},
> {math.MaxUint64 - 3, math.MaxUint64,
> "115792089237316195360799967654821100226821973275796746098729803619699194331160"}
> ```
>
> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com> wrote:
>
>> I'm equally curious.
>>
>> FWIW, I realized the loop should perhaps be
>> ```
>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even on 32-bit arch
>> for m := a + 1; m <= b; m++ {
>> 	mb = mb.setUint64(m) // reuses mb's backing array; keep the returned length
>> 	z = z.mul(z, mb)
>> }
>> ```
>> to avoid allocating repeatedly for `m`, which yields:
>> BenchmarkIterativeMulRangeN-10 354685 3032 ns/op 2129 B/op 48 allocs/op
>>
>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org> wrote:
>>
>>> It seems reasonable, but first I'd like to understand why the recursive
>>> method is used. I can't deduce why, but the CL that adds it, by gri, does
>>> Karatsuba multiplication, which implies something deep is going on. I'll
>>> add him to the conversation.
>>>
>>> -rob
>>>
>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com> wrote:
>>>
>>>> I enjoy bignum implementations, so I was looking through nat.go and saw
>>>> that `mulRange` is implemented in a surprising, recursive way. In the
>>>> non-base case, `mulRange(a, b)` returns `mulRange(a, (a+b)/2) *
>>>> mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided).
>>>>
>>>> That's fine, but I didn't see any advantage over the straightforward
>>>> (and simpler?) for loop.
>>>>
>>>> ```
>>>> z = z.setUint64(a)
>>>> for m := a + 1; m <= b; m++ {
>>>> 	z = z.mul(z, nat(nil).setUint64(m))
>>>> }
>>>> return z
>>>> ```
>>>>
>>>> In fact, I suspected the existing code was slower and allocated a lot
>>>> more. That seems true. A quick benchmark, using the existing unit test as
>>>> the benchmark, yields
>>>> BenchmarkRecusiveMulRangeN-10 169417 6856 ns/op 9452 B/op 338 allocs/op
>>>> BenchmarkIterativeMulRangeN-10 265354 4269 ns/op 2505 B/op 196 allocs/op
>>>>
>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! But it
>>>> is exported as `Int.MulRange`, so I guess it's viewed with some value. And
>>>> seeing as how the for loop seems even easier to understand than the
>>>> recursive version, maybe it's worth submitting a PR? (If so, should I
>>>> create an issue first?)
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to golang-nuts+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com
>>>> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com.