I wrote the single allocation version and added it to the issue, but Bakul
is absolutely right that the recursive solution has merit when the input is
large enough.  By building up large values on each "side" of the recursion,
Karatsuba gets used for the larger multiplies and the recursive version
begins to win. On `mulRange(1_000, 200_000)`, it's more than 20 times
faster than the single allocation iterative version.

I can put together a hybrid that uses iterative for shorter spans, and
recursive on larger, if that sounds interesting.

To get this off the mailing list, I will add this note to the issue, and
discussion can continue there. https://github.com/golang/go/issues/65027

On Mon, Jan 8, 2024 at 10:21 PM Bakul Shah <ba...@iitbombay.org> wrote:

> Perhaps you were thinking of this?
>
> At iteration number k, the value xk contains O(klog(k)) digits, thus the
> computation of xk+1 = kxk has cost O(klog(k)). Finally, the total cost
> with this basic approach is O(2log(2)+¼+n log(n)) = O(n2log(n)).
>
> A better approach is the *binary splitting* : it just consists in
> recursively cutting the product of m consecutive integers in half. It leads
> to better results when products on large integers are performed with a fast
> method.
>
> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
>
>
> I think you can do recursive splitting without using function recursion by
> allocating N/2 array (where b = a+N-1) and iterating over it; each time the
> array "shrinks" by half. A "cleverer" algorithm would allocate an array of
> *words* of a bignum, as you know that the upper limit on size is N*64 (for
> 64 bit numbers) so you can just reuse the same space for each outer
> iteration (N/2 multiplie, N/4 ...) and apply Karatsuba 2nd outer iteration
> onwards. Not sure if this is easy in Go.
>
> On Jan 8, 2024, at 11:47 AM, Robert Griesemer <g...@golang.org> wrote:
>
> Hello John;
>
> Thanks for your interest in this code.
>
> In a (long past) implementation of the factorial function, I noticed that
> computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed
> in a recursive fashion than when computed iteratively: the reason (I
> believed) was that the iterative approach seemed to produce a lot more
> "internal fragmentation", that is medium-size intermediate results where
> the most significant word (or "limb" as is the term in other
> implementations) is only marginally used, resulting in more work than
> necessary if those words were fully used.
>
> I never fully investigated, it was enough at the time that the recursive
> approach was much faster. In retrospect, I don't quite believe my own
> theory. Also, that implementation didn't have Karatsuba multiplication, it
> just used grade-school multiplication.
>
> Since a, b are uint64 values (words), this could probably be implemented
> in terms of mulAddVWW directly, with a suitable initial allocation for the
> result - ideally this should just need one allocation (not sure how close
> we can get to the right size). That would cut down the allocations
> massively.
>
> In a next step, one should benchmark the implementation again.
>
> But at the very least, the overflow bug should be fixed, thanks for
> finding it! I will send out a CL to fix that today.
>
> Thanks,
> - gri
>
>
>
> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com> wrote:
>
>> Actually, both implementations have bugs!
>>
>> The recursive implementation ends with:
>> ```
>> m := (a + b) / 2
>> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
>> ```
>>
>> That's a bug whenever `(a+b)` overflows, making `m` small.
>> FIX: `m := a + (b-a)/2`
>>
>> My iterative implementation went into an infinite loop here:
>> `for m := a + 1; m <= b; m++ {`
>> if b is `math.MaxUint64`
>> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a
>> small penalty for the vast majority of calls that don't have b=MaxUint64
>>
>> I would add these to `mulRangesN` of the unit test:
>> ```
>>  {math.MaxUint64 - 3, math.MaxUint64 - 1,
>> "6277101735386680760773248120919220245411599323494568951784"},
>> {math.MaxUint64 - 3, math.MaxUint64,
>> "115792089237316195360799967654821100226821973275796746098729803619699194331160"}
>> ```
>>
>> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com> wrote:
>>
>>> I'm equally curious.
>>>
>>> FWIW, I realized the loop should perhaps be
>>> ```
>>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even
>>> on 32-bit arch
>>> for m := a + 1; m <= b; m++ {
>>>   mb.setUint64(m)
>>>   z = z.mul(z, mb)
>>> }
>>> ```
>>> to avoid allocating repeatedly for `m`, which yields:
>>> BenchmarkIterativeMulRangeN-10      354685      3032 ns/op    2129 B/op
>>>      48 allocs/op
>>>
>>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org> wrote:
>>>
>>>> It seems reasonable but first I'd like to understand why the recursive
>>>> method is used. I can't deduce why, but the CL that adds it, by gri, does
>>>> Karatsuba multiplication, which implies something deep is going on. I'll
>>>> add him to the conversation.
>>>>
>>>> -rob
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com>
>>>> wrote:
>>>>
>>>>> I enjoy bignum implementations, so I was looking through nat.go and
>>>>> saw that `mulRange` is implemented in a surprising, recursive way,.  In 
>>>>> the
>>>>> non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) *
>>>>> mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided).
>>>>>
>>>>> That's fine, but I didn't see any advantage over the straightforward
>>>>> (and simpler?) for loop.
>>>>>
>>>>> ```
>>>>> z = z.setUint64(a)
>>>>> for m := a + 1; m <= b; m++ {
>>>>> z = z.mul(z, nat(nil).setUint64(m))
>>>>> }
>>>>> return z
>>>>> ```
>>>>>
>>>>> In fact, I suspected the existing code was slower, and allocated a lot
>>>>> more.  That seems true. A quick benchmark, using the existing unit test as
>>>>> the benchmark, yields
>>>>> BenchmarkRecusiveMulRangeN-10       169417       6856 ns/op     9452
>>>>> B/op      338 allocs/op
>>>>> BenchmarkIterativeMulRangeN-10       265354       4269 ns/op     2505
>>>>> B/op      196 allocs/op
>>>>>
>>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! But
>>>>> it is exported as `int.MulRange` so I guess it's viewed with some value.
>>>>> And seeing as how the for-loop seems even easier to understand that the
>>>>> recursive version, maybe it's worth submitting a PR? (If so, should I
>>>>> create an issue first?)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "golang-nuts" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to golang-nuts+unsubscr...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com
> <https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CALZHwkqA3weRhrCbUKiKL9deu8kx5YMLnb2iHVtHTPnb6BQ_oA%40mail.gmail.com.

Reply via email to