This is getting interesting but maybe only for me, so I'll stop here.

I did some profiling and the majority of the time is spent in
math/big.addMulVVW. Thinking I might be doing something stupid, I compared
it with the Go implementation at
http://www.luschny.de/math/factorial/scala/FactorialScalaCsharp.htm, and
found that my version (although "straightforward"), is about the same
amount of code but without special casing and support tables, yet about 25%
faster - and yes, they get the same answer.

So I believe the factor of 60 comes from comparing an implementation in a
language and libraries designed for numerical computation against a much
less specialized and optimized world. Or perhaps from Julia using a
different and dramatically more efficient algorithm. It does seem like a
big gap, but I am no expert in this area.

Maybe worth investigating further but not by me.

-rob

On Sun, Jan 14, 2024 at 9:31 AM Rob Pike <r...@golang.org> wrote:

> Oh, I did say my implementation was straightforward. It's free of any
> clever multiplication algorithms or mathematical delights. It could easily
> be giving up 10x or more for that reason alone. And I haven't even profiled
> it yet.
>
> -rob
>
>
> On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah <ba...@iitbombay.org> wrote:
>
>> FYI Julia (on M1 MBP) seems much faster:
>>
>> julia> @which factorial(big(100000000))
>> factorial(x::BigInt) in Base.GMP at gmp.jl:645
>>
>> julia> @time begin; factorial(big(100000000)); 1; end
>>  27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)
>>
>>
>> Probably they use Schönhage-Strassen multiplication algorithm for very
>> large numbers as the 1E8! result will have over a 3/4 billion digits. I
>> should try this in Gambit-Scheme (which has an excellent multiply
>> implementation).
>>
>> On Jan 12, 2024, at 9:32 PM, Rob Pike <r...@golang.org> wrote:
>>
>> Thanks for the tip. A fairly straightforward implementation of this
>> algorithm gives me about a factor of two speedup for pretty much any value.
>> I went up to 1e8!, which took about half an hour compared to nearly an hour
>> for MulRange.
>>
>> I'll probably stick in ivy after a little more tuning. I may even try
>> parallelization.
>>
>> -rob
>>
>>
>> On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah <ba...@iitbombay.org> wrote:
>>
>>> For that you may wish to explore Peter Luschny's "prime swing" factorial
>>> algorithm and variations!
>>> https://oeis.org/A000142/a000142.pdf
>>>
>>> And implementations in various languages including go:
>>> https://github.com/PeterLuschny/Fast-Factorial-Functions
>>>
>>> On Jan 8, 2024, at 9:22 PM, Rob Pike <r...@golang.org> wrote:
>>>
>>> Here's an example where it's the bottleneck: ivy factorial
>>>
>>>
>>> !1e7
>>> 1.20242340052e+65657059
>>>
>>> )cpu
>>> 1m10s (1m10s user, 167.330ms sys)
>>>
>>>
>>> -rob
>>>
>>>
>>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah <ba...@iitbombay.org> wrote:
>>>
>>>> Perhaps you were thinking of this?
>>>>
>>>> At iteration number k, the value xk contains O(klog(k)) digits, thus
>>>> the computation of xk+1 = kxk has cost O(klog(k)). Finally, the total
>>>> cost with this basic approach is O(2log(2)+¼+n log(n)) = O(n2log(n)).
>>>>
>>>> A better approach is the *binary splitting* : it just consists in
>>>> recursively cutting the product of m consecutive integers in half. It leads
>>>> to better results when products on large integers are performed with a fast
>>>> method.
>>>>
>>>> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
>>>>
>>>>
>>>> I think you can do recursive splitting without using function recursion
>>>> by allocating N/2 array (where b = a+N-1) and iterating over it; each time
>>>> the array "shrinks" by half. A "cleverer" algorithm would allocate an array
>>>> of *words* of a bignum, as you know that the upper limit on size is N*64
>>>> (for 64 bit numbers) so you can just reuse the same space for each outer
>>>> iteration (N/2 multiplie, N/4 ...) and apply Karatsuba 2nd outer iteration
>>>> onwards. Not sure if this is easy in Go.
>>>>
>>>> On Jan 8, 2024, at 11:47 AM, Robert Griesemer <g...@golang.org> wrote:
>>>>
>>>> Hello John;
>>>>
>>>> Thanks for your interest in this code.
>>>>
>>>> In a (long past) implementation of the factorial function, I noticed
>>>> that computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when
>>>> computed in a recursive fashion than when computed iteratively: the reason
>>>> (I believed) was that the iterative approach seemed to produce a lot more
>>>> "internal fragmentation", that is medium-size intermediate results where
>>>> the most significant word (or "limb" as is the term in other
>>>> implementations) is only marginally used, resulting in more work than
>>>> necessary if those words were fully used.
>>>>
>>>> I never fully investigated, it was enough at the time that the
>>>> recursive approach was much faster. In retrospect, I don't quite believe my
>>>> own theory. Also, that implementation didn't have Karatsuba multiplication,
>>>> it just used grade-school multiplication.
>>>>
>>>> Since a, b are uint64 values (words), this could probably be
>>>> implemented in terms of mulAddVWW directly, with a suitable initial
>>>> allocation for the result - ideally this should just need one allocation
>>>> (not sure how close we can get to the right size). That would cut down the
>>>> allocations massively.
>>>>
>>>> In a next step, one should benchmark the implementation again.
>>>>
>>>> But at the very least, the overflow bug should be fixed, thanks for
>>>> finding it! I will send out a CL to fix that today.
>>>>
>>>> Thanks,
>>>> - gri
>>>>
>>>>
>>>>
>>>> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com>
>>>> wrote:
>>>>
>>>>> Actually, both implementations have bugs!
>>>>>
>>>>> The recursive implementation ends with:
>>>>> ```
>>>>> m := (a + b) / 2
>>>>> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
>>>>> ```
>>>>>
>>>>> That's a bug whenever `(a+b)` overflows, making `m` small.
>>>>> FIX: `m := a + (b-a)/2`
>>>>>
>>>>> My iterative implementation went into an infinite loop here:
>>>>> `for m := a + 1; m <= b; m++ {`
>>>>> if b is `math.MaxUint64`
>>>>> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a
>>>>> small penalty for the vast majority of calls that don't have b=MaxUint64
>>>>>
>>>>> I would add these to `mulRangesN` of the unit test:
>>>>> ```
>>>>>  {math.MaxUint64 - 3, math.MaxUint64 - 1,
>>>>> "6277101735386680760773248120919220245411599323494568951784"},
>>>>> {math.MaxUint64 - 3, math.MaxUint64,
>>>>> "115792089237316195360799967654821100226821973275796746098729803619699194331160"}
>>>>> ```
>>>>>
>>>>> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm equally curious.
>>>>>>
>>>>>> FWIW, I realized the loop should perhaps be
>>>>>> ```
>>>>>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b,
>>>>>> even on 32-bit arch
>>>>>> for m := a + 1; m <= b; m++ {
>>>>>>   mb.setUint64(m)
>>>>>>   z = z.mul(z, mb)
>>>>>> }
>>>>>> ```
>>>>>> to avoid allocating repeatedly for `m`, which yields:
>>>>>> BenchmarkIterativeMulRangeN-10      354685      3032 ns/op    2129
>>>>>> B/op      48 allocs/op
>>>>>>
>>>>>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org> wrote:
>>>>>>
>>>>>>> It seems reasonable but first I'd like to understand why the
>>>>>>> recursive method is used. I can't deduce why, but the CL that adds it, 
>>>>>>> by
>>>>>>> gri, does Karatsuba multiplication, which implies something deep is 
>>>>>>> going
>>>>>>> on. I'll add him to the conversation.
>>>>>>>
>>>>>>> -rob
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I enjoy bignum implementations, so I was looking through nat.go and
>>>>>>>> saw that `mulRange` is implemented in a surprising, recursive way,.  
>>>>>>>> In the
>>>>>>>> non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) *
>>>>>>>> mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided).
>>>>>>>>
>>>>>>>> That's fine, but I didn't see any advantage over the
>>>>>>>> straightforward (and simpler?) for loop.
>>>>>>>>
>>>>>>>> ```
>>>>>>>> z = z.setUint64(a)
>>>>>>>> for m := a + 1; m <= b; m++ {
>>>>>>>> z = z.mul(z, nat(nil).setUint64(m))
>>>>>>>> }
>>>>>>>> return z
>>>>>>>> ```
>>>>>>>>
>>>>>>>> In fact, I suspected the existing code was slower, and allocated a
>>>>>>>> lot more.  That seems true. A quick benchmark, using the existing unit 
>>>>>>>> test
>>>>>>>> as the benchmark, yields
>>>>>>>> BenchmarkRecusiveMulRangeN-10       169417       6856 ns/op
>>>>>>>> 9452 B/op      338 allocs/op
>>>>>>>> BenchmarkIterativeMulRangeN-10       265354       4269 ns/op
>>>>>>>> 2505 B/op      196 allocs/op
>>>>>>>>
>>>>>>>> I doubt `mulRange` is a performance bottleneck in anyone's code!
>>>>>>>> But it is exported as `int.MulRange` so I guess it's viewed with some
>>>>>>>> value.  And seeing as how the for-loop seems even easier to understand 
>>>>>>>> that
>>>>>>>> the recursive version, maybe it's worth submitting a PR? (If so, 
>>>>>>>> should I
>>>>>>>> create an issue first?)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "golang-nuts" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to golang-nuts+unsubscr...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to golang-nuts+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>>
>>>>
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOXNBZQJraHp5YJBEcEw8OAu6%3DNw%3DYUUCwWV0%3DJYjE9ObDJEnw%40mail.gmail.com.

Reply via email to