I have made a number of speedups to the precomputed inverse code I
announced earlier, and removed two unneeeded functions. It is now up to 20%
faster than ordinary 2n x n division for n = 1 limb and from n = 3-15 limbs
and up to 2.2 times as fast above n = 120 limbs.

There's nothing more I can do for it now. The gaps that remain are
essential gaps, due to the raw speed of the division code in the mpir trunk
(the code in mpir-2.6.0 is substantially slower).

Unfortunately, in those gaps, the ordinary division code is up to 20%
faster than using a precomputed inverse. We could eventually close the gaps
by rewriting mullow and mulhigh in mpir, but this is a lot of work,
including lots of assembly code and much careful thought about algorithms.

As it is, it is possible to switch on mpir's mullow code for one of the
multiplications and it slows it down, again due to the raw speed of mpir's
ordinary multiplication code.

Bill.


On 18 October 2013 23:36, Bill Hart <goodwillh...@googlemail.com> wrote:

> I rewrote the code by adding some mpz_fdiv_qr_preinvn and
> mpz_tdiv_qr_preinvn functions.
>
> It is now faster than fmpz_fdiv_qr all the way down to about 100 limb
> divisors.
>
> This is about the best one can do without a very large amount of work on
> the mpir mullow and mulhigh functions (which has to be done one day).
>
> The peak speed seems to be some thousands of limbs, where it is 80% faster
> than fmpz_fdiv_qr. There seems to be a very large range over which it is
> faster. In fact, I didn't find an upper limit.
>
> I'm happy with the code now, so go ahead and use it at will. I added a
> define in mpn_extras called PREINVN_CUTOFF which is currently set to 100.
> There's no point in using the preinvn functions for divisors below this
> cutoff. I won't code that into the preinvn functions though because
> otherwise the test code will never reach the range where the function is
> actually used.
>
> Bill.
>
>
> On 18 October 2013 19:32, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>> Actually it is slightly faster, but only for quite large divisions.
>>
>> The underlying mpn_extras code is faster from about 100-1000 limbs.
>>
>> The fmpz code has some overheads that make it more like 500-1000 limbs.
>>
>> Bill.
>>
>>
>> On 18 October 2013 18:17, Bill Hart <goodwillh...@googlemail.com> wrote:
>>
>>> So far the function with precomputed inverse seems to be a lot slower
>>> than the one without. So not very useful.
>>>
>>> I don't know why this is. Overheads maybe.
>>>
>>> Bill.
>>>
>>>
>>> On 18 October 2013 17:58, Bill Hart <goodwillh...@googlemail.com> wrote:
>>>
>>>> I've now added an fmpz_fdiv_qr_preinvn function in the fmpz module.
>>>>
>>>> It's very easy to use, just see the test code.
>>>>
>>>> The function itself is quite a case bash. This seems unavoidable unless
>>>> we implemented mpz functions which take a precomputed inverse and another
>>>> function for single limbs using a length 1 precomputed inverse of this 
>>>> type.
>>>>
>>>> Bill.
>>>>
>>>>
>>>> On 16 October 2013 17:14, Bill Hart <goodwillh...@googlemail.com>wrote:
>>>>
>>>>> I've now changed the interface for the mpn_extras functions as
>>>>> described and adjusted all the test code and docs. So these are now safe 
>>>>> to
>>>>> use.
>>>>>
>>>>> I've sped the functions up a bit too, especially for small lengths.
>>>>>
>>>>> I'll start writing fmpz_fdiv_qr_preinvn now.
>>>>>
>>>>> Bill.
>>>>>
>>>>>
>>>>> On 16 October 2013 00:00, Sebastian Pancratz <
>>>>> sebastian.pancr...@gmail.com> wrote:
>>>>>
>>>>>> Hi Bill,
>>>>>>
>>>>>> That's great news, I am really looking forward to using them in my
>>>>>> implementation of the deformation method once you have a stable interface
>>>>>> from the fmpz module!
>>>>>>
>>>>>> Sebastian
>>>>>>
>>>>>>
>>>>>> On Tuesday, October 15, 2013 10:12:41 PM UTC+1, Bill Hart wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've added some new functions to mpn_extras:
>>>>>>>
>>>>>>> * mod_n_preinvn
>>>>>>> * mod_preinvn
>>>>>>> * divrem_n_preinvn
>>>>>>> * divrem_preinvn
>>>>>>>
>>>>>>> These compute remainder and divrem respectively of integers mod n
>>>>>>> given a precomputed inverse supplied by the preinvn function.
>>>>>>>
>>>>>>> However, I might change the interface slightly.
>>>>>>>
>>>>>>> * the blah_n_preinvn functions will take variable sized dividend
>>>>>>> between n and 2n limbs
>>>>>>> * the blah_preinvn functions will require m >= n
>>>>>>>
>>>>>>> This should make things slightly more efficient and possibly
>>>>>>> slightly easier to use in the final analysis. I should for example be 
>>>>>>> able
>>>>>>> to drop the restrictions such as "r" or "a" requiring space 2*n even if 
>>>>>>> the
>>>>>>> remainder doesn't need it. The remainder will take up precisely n limbs.
>>>>>>>
>>>>>>> Anyway, when I've finalised the interface, these should be very
>>>>>>> useful for everything from p-adics to fmpz_mod_poly.
>>>>>>>
>>>>>>> There is also already the mulmod_preinvn function whose interface I
>>>>>>> am already pretty happy with. It works using the same precomputed 
>>>>>>> inverse.
>>>>>>>
>>>>>>> I also plan to add fmpz_preinvn_init/clear and fmpz_fdiv_qr_preinvn
>>>>>>> to make these easy to use in the fmpz module.
>>>>>>>
>>>>>>> Bill.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mpir-devel+unsubscr...@googlegroups.com.
To post to this group, send email to mpir-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/mpir-devel.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to