[mpir-devel] Re: FFT timing problem

2012-10-21 Thread Bill Hart
I couldn't resist adding a speedup to the fft. It is a patch of only
about 3-5 lines and I have now tested it for a few hours until my
machine ran out of memory.

I think this is the last thing to go in before we start tuning/testing
in earnest.

I'll roll a new alpha4 now.

Bill.

On 21 October 2012 00:59, Bill Hart goodwillh...@googlemail.com wrote:
 Hi all,

 I looked into the timing jumps in the FFT in MPIR 2.6 and in flint2.

 About 2/3 of the difference in timings is due to the fact that the FFT
 uses coefficients of twice the size (and convolution length just over
 half the original length) once you go over certain boundaries. The
 pointwise multiplications are thus far more expensive due to the
 quadratic nature of their complexity. (A small fraction of the time is
 due to zeroing coefficients, some of which may not actually need to be
 zeroed -- a target for future optimisation.)

 There is an optimisation that can be made which will help. Namely, one
 can use fft coefficients slightly bigger than the originals (after
 adjusting the convolution length and truncation so that this is
 possible). At some point I may implement this. It shouldn't be hard to
 do, but we probably need to wait until we have really efficient
 pointwise multiplications.

 The effect does become proportionally less as the integers multiplied
 get larger, as the fft coefficients move out of the quadratic range
 for multiplication. But there are clearly many opportunities for
 improvement here.

 Bill.

 On 17 October 2012 23:38, Bill Hart goodwillh...@googlemail.com wrote:
 Hi all,

 In discussing another issue with Paul Zimmermann, he noted the
 following ugly jump in performance in the flint/mpir fft for
 multiplying two integers of the given number of limbs together:

 mpn_mul_fft_main
  limbs   time
 10467800.556109766
 10467810.556993896
 10467820.556378786
 10467830.556399360
 10467840.559116916
 10467850.783891259
 10467860.788226240
 10467870.792805531
 10467880.784128436
 10467890.786670290
 10467900.795639613

 Obviously this is supposed to increase smoothly. So it suggests a
 problem in the fft.

 This could possibly be a few things I can think of:

 * incorrect tuning values (not so much the values in fft-tuning.h as
 the way the functions pick appropriate n, w parameters, etc)

 * I am not doing the truncation correctly

 * there is a big jump overall because of a big jump in timing of
 individual pointwise multiplications

 I haven't found the problem yet, but will endeavour to find it soon.
 There is some limited hope that the fft times will improve if I can
 find the problem (and it isn't too hard to fix).

 Bill.

-- 
You received this message because you are subscribed to the Google Groups 
mpir-devel group.
To post to this group, send email to mpir-devel@googlegroups.com.
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.



[mpir-devel] Re: FFT timing problem

2012-10-20 Thread Bill Hart
Hi all,

I looked into the timing jumps in the FFT in MPIR 2.6 and in flint2.

About 2/3 of the difference in timings is due to the fact that the FFT
uses coefficients of twice the size (and convolution length just over
half the original length) once you go over certain boundaries. The
pointwise multiplications are thus far more expensive due to the
quadratic nature of their complexity. (A small fraction of the time is
due to zeroing coefficients, some of which may not actually need to be
zeroed -- a target for future optimisation.)

There is an optimisation that can be made which will help. Namely, one
can use fft coefficients slightly bigger than the originals (after
adjusting the convolution length and truncation so that this is
possible). At some point I may implement this. It shouldn't be hard to
do, but we probably need to wait until we have really efficient
pointwise multiplications.

The effect does become proportionally less as the integers multiplied
get larger, as the fft coefficients move out of the quadratic range
for multiplication. But there are clearly many opportunities for
improvement here.

Bill.

On 17 October 2012 23:38, Bill Hart goodwillh...@googlemail.com wrote:
 Hi all,

 In discussing another issue with Paul Zimmermann, he noted the
 following ugly jump in performance in the flint/mpir fft for
 multiplying two integers of the given number of limbs together:

 mpn_mul_fft_main
  limbs   time
 10467800.556109766
 10467810.556993896
 10467820.556378786
 10467830.556399360
 10467840.559116916
 10467850.783891259
 10467860.788226240
 10467870.792805531
 10467880.784128436
 10467890.786670290
 10467900.795639613

 Obviously this is supposed to increase smoothly. So it suggests a
 problem in the fft.

 This could possibly be a few things I can think of:

 * incorrect tuning values (not so much the values in fft-tuning.h as
 the way the functions pick appropriate n, w parameters, etc)

 * I am not doing the truncation correctly

 * there is a big jump overall because of a big jump in timing of
 individual pointwise multiplications

 I haven't found the problem yet, but will endeavour to find it soon.
 There is some limited hope that the fft times will improve if I can
 find the problem (and it isn't too hard to fix).

 Bill.

-- 
You received this message because you are subscribed to the Google Groups 
mpir-devel group.
To post to this group, send email to mpir-devel@googlegroups.com.
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.