Hello John;

Thanks for your interest in this code.

In a (long past) implementation of the factorial function, I noticed that
computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed
in a recursive fashion than when computed iteratively: the reason (I
believed) was that the iterative approach seemed to produce a lot more
"internal fragmentation", that is medium-size intermediate results where
the most significant word (or "limb" as is the term in other
implementations) is only marginally used, resulting in more work than
necessary if those words were fully used.

I never fully investigated, it was enough at the time that the recursive
approach was much faster. In retrospect, I don't quite believe my own
theory. Also, that implementation didn't have Karatsuba multiplication, it
just used grade-school multiplication.

Since a, b are uint64 values (words), this could probably be implemented in
terms of mulAddVWW directly, with a suitable initial allocation for the
result - ideally this should just need one allocation (not sure how close
we can get to the right size). That would cut down the allocations
massively.

In a next step, one should benchmark the implementation again.

But at the very least, the overflow bug should be fixed, thanks for finding
it! I will send out a CL to fix that today.

Thanks,
- gri



On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com> wrote:

> Actually, both implementations have bugs!
>
> The recursive implementation ends with:
> ```
> m := (a + b) / 2
> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
> ```
>
> That's a bug whenever `(a+b)` overflows, making `m` small.
> FIX: `m := a + (b-a)/2`
>
> My iterative implementation went into an infinite loop here:
> `for m := a + 1; m <= b; m++ {`
> if b is `math.MaxUint64`
> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a small
> penalty for the vast majority of calls that don't have b=MaxUint64
>
> I would add these to `mulRangesN` of the unit test:
> ```
>  {math.MaxUint64 - 3, math.MaxUint64 - 1,
> "6277101735386680760773248120919220245411599323494568951784"},
> {math.MaxUint64 - 3, math.MaxUint64,
> "115792089237316195360799967654821100226821973275796746098729803619699194331160"}
> ```
>
> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com> wrote:
>
>> I'm equally curious.
>>
>> FWIW, I realized the loop should perhaps be
>> ```
>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even on
>> 32-bit arch
>> for m := a + 1; m <= b; m++ {
>>   mb.setUint64(m)
>>   z = z.mul(z, mb)
>> }
>> ```
>> to avoid allocating repeatedly for `m`, which yields:
>> BenchmarkIterativeMulRangeN-10      354685      3032 ns/op    2129 B/op
>>    48 allocs/op
>>
>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org> wrote:
>>
>>> It seems reasonable but first I'd like to understand why the recursive
>>> method is used. I can't deduce why, but the CL that adds it, by gri, does
>>> Karatsuba multiplication, which implies something deep is going on. I'll
>>> add him to the conversation.
>>>
>>> -rob
>>>
>>>
>>>
>>>
>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com> wrote:
>>>
>>>> I enjoy bignum implementations, so I was looking through nat.go and saw
>>>> that `mulRange` is implemented in a surprising, recursive way,.  In the
>>>> non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) *
>>>> mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided).
>>>>
>>>> That's fine, but I didn't see any advantage over the straightforward
>>>> (and simpler?) for loop.
>>>>
>>>> ```
>>>> z = z.setUint64(a)
>>>> for m := a + 1; m <= b; m++ {
>>>> z = z.mul(z, nat(nil).setUint64(m))
>>>> }
>>>> return z
>>>> ```
>>>>
>>>> In fact, I suspected the existing code was slower, and allocated a lot
>>>> more.  That seems true. A quick benchmark, using the existing unit test as
>>>> the benchmark, yields
>>>> BenchmarkRecusiveMulRangeN-10       169417       6856 ns/op     9452
>>>> B/op      338 allocs/op
>>>> BenchmarkIterativeMulRangeN-10       265354       4269 ns/op     2505
>>>> B/op      196 allocs/op
>>>>
>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! But it
>>>> is exported as `int.MulRange` so I guess it's viewed with some value.  And
>>>> seeing as how the for-loop seems even easier to understand that the
>>>> recursive version, maybe it's worth submitting a PR? (If so, should I
>>>> create an issue first?)
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to golang-nuts+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com.

Reply via email to