> b1) Using "properly sized" FFT: Yes, I believe using properly sized FFT's
> saves some time. How much depends on the programmer, the data, the
> language/compiler, and the hardware. Non-power-of-2 FFT's can even be
> useful, but it seems that coding up all (or most) of the possible sizes may
> be more trouble than it's worth.
Well, George went to the trouble of adding 3*2^n, 5*2^n and 7*2^n
run length code into Prime95 - even though these are less efficient than
plain powers of two, they're still well worthwhile for the exponents
that can benefit from them.
I'd be _delighted_ if someone took it upon themselves to add code for
intermediate run lengths to MacLucas/MacLucasUNIX - even just
3*2^n would be very worthwhile. If I had time, I'd do it myself!
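To see why the intermediate run lengths pay off, here's a toy sketch of
choosing the smallest usable FFT length for a given exponent. The bits-per-
element figure is purely illustrative (in real code it depends on the
exponent, the FFT length and the rounding-error budget), and the candidate
set {1,3,5,7}*2^n is just the Prime95 family mentioned above:

```python
# Illustrative only: assume each FFT element carries a fixed number of
# bits of the number being squared. Real implementations derive this
# limit from rounding-error analysis; 16 is a made-up round figure.
BITS_PER_ELEMENT = 16

def candidate_lengths(max_n=25):
    """Run lengths of the form m * 2^n for m in {1, 3, 5, 7}."""
    lengths = {m << n for m in (1, 3, 5, 7) for n in range(max_n)}
    return sorted(lengths)

def best_run_length(p, multipliers=(1, 3, 5, 7)):
    """Smallest candidate length whose capacity covers exponent p."""
    lengths = sorted({m << n for m in multipliers for n in range(25)})
    for length in lengths:
        if length * BITS_PER_ELEMENT >= p:
            return length
    raise ValueError("exponent too large for candidate table")
```

For example, under these toy assumptions an exponent of 800000 fits in a
7*2^13 = 57344-element transform, whereas a powers-of-two-only code would
have to jump to 65536 - the kind of saving being described.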
>
> b2) Recurring pattern in "N=N^2-2 % (2^P-1)": I don't recall anyone here
> mentioning any interesting repeats, even for composite P. I'm sure many have
> looked, and were disappointed.
If you get to residual 0, then you do get a most interesting repeat:
the next iteration gives -2 (i.e. 2^p-3) and all subsequent ones give +2.
This proves that 0 cannot occur as an "interim" residual (at any
iteration before the last of the p-2) if 2^p-1 is going to turn out
to be prime.
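The claim is easy to check numerically. A minimal sketch that runs the
Lucas-Lehmer recurrence a few iterations past the usual stopping point,
shown here for p = 5 (M5 = 31 is prime, so the residual hits 0 on the
final scheduled iteration):

```python
def ll_sequence(p, extra=3):
    """Residuals of s -> s^2 - 2 (mod 2^p - 1), starting from s = 4,
    continued `extra` iterations past the usual p-2 stopping point
    to show what follows a zero residual."""
    m = (1 << p) - 1
    s = 4
    out = []
    for _ in range(p - 2 + extra):
        s = (s * s - 2) % m
        out.append(s)
    return out
```

For p = 5 this gives [14, 8, 0, 29, 2, 2]: the 0 at the final scheduled
iteration, then -2 mod 31 = 29, then +2 forever after.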
Although testing for "early zero" would be very quick, it's
nevertheless probably not worth doing: an early zero would force the
final residual to be 2, and we see no res64=2 values in the
completed results for composite exponents.
Since there are "only" 2^p-1 available values that the residual can
take, if you iterate long enough, the residual is _certain_ to go into
a repeating loop eventually. However, the chance of this
happening in the first p-2 iterations would seem to be
vanishingly small. If it _does_ start to repeat "early" then 2^p-1
_must_ be composite - because the sequence can't reach zero without
sticking at 2, see above - but is it really worth adding the extra
code to check?
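For what it's worth, both checks are only a couple of lines. This is a
hypothetical sketch (the set-based repeat check costs memory and time that,
as argued above, real code rightly doesn't spend):

```python
def ll_test_with_early_checks(p):
    """Lucas-Lehmer test for 2^p - 1 that also watches for an early
    zero or an early repeat. Both extra checks are hypothetical: in
    practice neither is ever seen to fire, which is why production
    code omits them."""
    m = (1 << p) - 1
    s = 4
    seen = {s}
    for i in range(p - 2):
        s = (s * s - 2) % m
        if s == 0 and i < p - 3:
            # Zero before the final iteration: forces res = 2 at the end.
            return "composite (early zero)"
        if s in seen:
            # An early repeat means the final residual can never be 0.
            return "composite (early repeat)"
        seen.add(s)
    return "prime" if s == 0 else "composite"
```

E.g. p = 7 (M7 = 127) comes back "prime" and p = 11 (M11 = 2047 = 23*89)
comes back "composite", with neither extra check ever triggering.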
>
> c2) The LL test does not adapt well to parallel efforts. I think it will
> always be the case that interprocess communication will cause the
> calculations to run slower than if just one processor crunched on the whole
> thing. [Array or vector processor machines are another matter!] But if the
> number of significant bits required were orders of magnitude larger than the
> hardware, then it might make sense to parallel the squarings. We're not
> there yet.
This seems self-evident; however, it is certainly true that an FFT
can be coded efficiently for a vector processor.
>
> d) NTT's (Number Theoretic Transforms): There has been some talk of that
> here before. However, until the ratio of integer to floating point
> throughput improves, NTT's will not be very popular. [Note that the FFT
> "automagically" computes the %2^P-1, using the Crandall-Fagin procedure, so
> NTT's have a lot of ground to make up first.]
>
If you have a processor like StrongArm, which has efficient integer
hardware but no floating-point hardware at all - forcing you to resort
to software emulation for floating-point - then NTT starts to make a
lot of sense. You get _some_ ground made up automatically
because you have no rounding errors, so you can use a smaller
transform size than experience with real/complex FFT would
suggest. I'm still "rather hazy" (read: totally baffled) on exactly how
the Crandall-Fagin procedure works, so I don't know if you could
adapt NTT to do the mod 2^p-1 automatically. But my gut feeling is
that it ought to be possible.
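One point in favour of that gut feeling: even without any Crandall-Fagin
cleverness, reducing mod 2^p-1 is cheap in pure integer arithmetic, because
2^p = 1 (mod 2^p-1) lets you fold the high bits onto the low bits with
shifts and adds. A sketch of that elementary identity (not of Crandall-Fagin
itself):

```python
def mod_mersenne(x, p):
    """Reduce a non-negative x mod 2^p - 1 without division, using
    x = (x >> p)*2^p + (x & mask)  and  2^p = 1 (mod 2^p - 1),
    so x is congruent to (x >> p) + (x & mask). Iterate until x fits
    in p bits, then map the one leftover case x == 2^p - 1 to 0."""
    mask = (1 << p) - 1
    while x > mask:
        x = (x >> p) + (x & mask)
    return 0 if x == mask else x
```

So an integer NTT-based squaring would at least get its final modular
reduction almost for free; the open question is whether the reduction can
be woven into the transform itself, as Crandall-Fagin does for the
floating-point FFT.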
Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm