The overflow fix is pending: CL 554617 I filed https://github.com/golang/go/issues/65027 for a possibly faster mulRange implementation. - gri
On Mon, Jan 8, 2024 at 11:47 AM Robert Griesemer <g...@golang.org> wrote: > Hello John; > > Thanks for your interest in this code. > > In a (long past) implementation of the factorial function, I noticed that > computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed > in a recursive fashion than when computed iteratively: the reason (I > believed) was that the iterative approach seemed to produce a lot more > "internal fragmentation", that is medium-size intermediate results where > the most significant word (or "limb" as is the term in other > implementations) is only marginally used, resulting in more work than > necessary if those words were fully used. > > I never fully investigated, it was enough at the time that the recursive > approach was much faster. In retrospect, I don't quite believe my own > theory. Also, that implementation didn't have Karatsuba multiplication, it > just used grade-school multiplication. > > Since a, b are uint64 values (words), this could probably be implemented > in terms of mulAddVWW directly, with a suitable initial allocation for the > result - ideally this should just need one allocation (not sure how close > we can get to the right size). That would cut down the allocations > massively. > > In a next step, one should benchmark the implementation again. > > But at the very least, the overflow bug should be fixed, thanks for > finding it! I will send out a CL to fix that today. > > Thanks, > - gri > > > > On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com> wrote: > >> Actually, both implementations have bugs! >> >> The recursive implementation ends with: >> ``` >> m := (a + b) / 2 >> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b)) >> ``` >> >> That's a bug whenever `(a+b)` overflows, making `m` small. >> FIX: `m := a + (b-a)/2` >> >> My iterative implementation went into an infinite loop here: >> `for m := a + 1; m <= b; m++ {` >> if b is `math.MaxUint64` >> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a >> small penalty for the vast majority of calls that don't have b=MaxUint64 >> >> I would add these to `mulRangesN` of the unit test: >> ``` >> {math.MaxUint64 - 3, math.MaxUint64 - 1, >> "6277101735386680760773248120919220245411599323494568951784"}, >> {math.MaxUint64 - 3, math.MaxUint64, >> "115792089237316195360799967654821100226821973275796746098729803619699194331160"} >> ``` >> >> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com> wrote: >> >>> I'm equally curious. >>> >>> FWIW, I realized the loop should perhaps be >>> ``` >>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even >>> on 32-bit arch >>> for m := a + 1; m <= b; m++ { >>> mb.setUint64(m) >>> z = z.mul(z, mb) >>> } >>> ``` >>> to avoid allocating repeatedly for `m`, which yields: >>> BenchmarkIterativeMulRangeN-10 354685 3032 ns/op 2129 B/op >>> 48 allocs/op >>> >>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org> wrote: >>> >>>> It seems reasonable but first I'd like to understand why the recursive >>>> method is used. I can't deduce why, but the CL that adds it, by gri, does >>>> Karatsuba multiplication, which implies something deep is going on. I'll >>>> add him to the conversation. >>>> >>>> -rob >>>> >>>> >>>> >>>> >>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com> >>>> wrote: >>>> >>>>> I enjoy bignum implementations, so I was looking through nat.go and >>>>> saw that `mulRange` is implemented in a surprising, recursive way,. In >>>>> the >>>>> non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) * >>>>> mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided). >>>>> >>>>> That's fine, but I didn't see any advantage over the straightforward >>>>> (and simpler?) for loop. >>>>> >>>>> ``` >>>>> z = z.setUint64(a) >>>>> for m := a + 1; m <= b; m++ { >>>>> z = z.mul(z, nat(nil).setUint64(m)) >>>>> } >>>>> return z >>>>> ``` >>>>> >>>>> In fact, I suspected the existing code was slower, and allocated a lot >>>>> more. That seems true. A quick benchmark, using the existing unit test as >>>>> the benchmark, yields >>>>> BenchmarkRecusiveMulRangeN-10 169417 6856 ns/op 9452 >>>>> B/op 338 allocs/op >>>>> BenchmarkIterativeMulRangeN-10 265354 4269 ns/op 2505 >>>>> B/op 196 allocs/op >>>>> >>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! But >>>>> it is exported as `int.MulRange` so I guess it's viewed with some value. >>>>> And seeing as how the for-loop seems even easier to understand that the >>>>> recursive version, maybe it's worth submitting a PR? (If so, should I >>>>> create an issue first?) >>>>> >>>>> >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "golang-nuts" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to golang-nuts+unsubscr...@googlegroups.com. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com >>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7LowZWoxULxH6f5Y4AYF5VKD%2BWUvSkC_Rx_1sL9oty2Q%40mail.gmail.com.