I also timed the basecase and divide and conquer division in GMP. The
basecase seems to be the same speed in both (our code is derived from
some version of theirs, so no big surprise there). Their divide and
conquer division is about 10% faster for large sizes. This is probably
down to their very fast mul_basecase code versus our fairly slow
mulmid_basecase code in MPIR. We do have assembly-optimised code for
it, but it is not as highly optimised as the ordinary mul_basecase
code.

The upshot is that if my new code proves to be a major speedup, then
we should also beat GMP handily beyond the basecase range (i.e. above
about 30 limbs). Given that I have already developed basecase code up
to 20% faster than what we currently have in MPIR, overall we should
be ahead, I hope.

Lots and lots of work before that happens though. Especially a lot of
assembly to write and optimise for numerous arches.
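
As a rough illustration of why the generic C path in longlong.h
(discussed in the quoted thread below) matters, here is a minimal sketch
of the double-limb multiply primitive written two ways: a portable
version built from four 32-bit half-word multiplies, and one using the
GCC/Clang unsigned __int128 extension, which a compiler can usually turn
into a single multiply instruction on 64-bit targets. The names limb_t,
umul_ppmm_portable, umul_ppmm_int128 and self_check are hypothetical and
are not the actual macros in MPIR, GMP or BSDNT; 64-bit limbs are
assumed.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical limb type, assumed 64 bits */

/* Portable C: 64x64 -> 128-bit product from four 32x32 -> 64 products. */
static void umul_ppmm_portable(limb_t *hi, limb_t *lo, limb_t a, limb_t b)
{
    limb_t a0 = a & 0xffffffffu, a1 = a >> 32;
    limb_t b0 = b & 0xffffffffu, b1 = b >> 32;
    limb_t p00 = a0 * b0;
    limb_t p01 = a0 * b1;
    limb_t p10 = a1 * b0;
    limb_t p11 = a1 * b1;

    /* Sum of three values below 2^32, so it cannot overflow 64 bits. */
    limb_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);

    *lo = (mid << 32) | (p00 & 0xffffffffu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Same operation using the unsigned __int128 extension (GCC/Clang,
   64-bit targets only); no inline assembly involved. */
static void umul_ppmm_int128(limb_t *hi, limb_t *lo, limb_t a, limb_t b)
{
    unsigned __int128 p = (unsigned __int128) a * b;
    *hi = (limb_t) (p >> 64);
    *lo = (limb_t) p;
}

/* Check that both versions agree on the largest possible inputs. */
static void self_check(void)
{
    limb_t h1, l1, h2, l2;
    limb_t a = UINT64_C(0xffffffffffffffff);

    umul_ppmm_portable(&h1, &l1, a, a);
    umul_ppmm_int128(&h2, &l2, a, a);
    assert(h1 == h2 && l1 == l2);
}
```

Both functions compute the same result; the difference is only in how
much work the compiler has to undo to reach the hardware multiply, which
is where a generic build can lose out to hand-written assembly.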

Bill.

On 29 March 2013 21:43, Bill Hart <goodwillh...@googlemail.com> wrote:
> On 29 March 2013 21:37, Brian Gladman <b...@gladman.plus.com> wrote:
>> On 29/03/2013 21:15, Bill Hart wrote:
>>> Someone has replaced longlong.h in MPIR with a really, really slow
>>> version! Why on earth did we do that!?
>>
>> IIRC Jason decided to split longlong.h into constant and processor
>> dependent sections so that it could be more conveniently generated and
>> maintained.
>
> Yes, I see each arch directory has its own longlong.h now.
>
>>
>> Sadly, although I remember having to write the Windows code to do this
>> auto-generation, I don't remember Jason's rationale for this change in
>> any detail.
>
> I think this is to clean up longlong.h, which was a sprawling mess.
> You do lose out for generic C builds, but after all, why should you
> get assembly optimisation for a generic C build? Unfortunately, I just
> did my timings with a generic build without realising that. In BSDNT I
> use some GCC extensions to avoid using inline assembly, so it is
> supposed to be roughly equivalent to the old longlong.h in MPIR. It's
> not in practice, since the C compiler actually still does a poor job
> of this sort of stuff. So I really fell right into this trap.
>
> I should have waited until I had fast assembly optimisation for
> everything. It will be faster, of that I am sure. Probably a factor of
> about 2 overall, I predict. But it is just a lot of hard work before I
> am there.
>
>>
>>> When I replace it with the former fast version, mul_basecase is about
>>> the same speed as BSDNT and unfortunately BSDNT divide and conquer
>>> division does not catch MPIR.
>>>
>>> However, there is still hope for BSDNT as switching longlong.h in MPIR
>>> makes MPIR's division basecase nearly a factor of 3 faster than
>>> BSDNT's.
>>
>> Do you have any idea which parts of longlong.h are involved here?
>>
>
> Only generic C builds.
>
> Bill.
