I've not run the full QA suite, as I have more changes planned. The Windows version
is available at ftp://mersenne.org/gimps/p95v231.zip. If there is interest, I'll make a Linux version available too.
Only P4 users need to download it. This release contains several small speed improvements to the SSE2 code. The gains are greatest for P4 Celeron and P4 Northwood CPUs.
The full text from whatsnew.txt:
1) Big SSE2 FFTs now take the L2 cache size into account. P4 Celeron (128KB
L2 cache) is faster for FFTs between 512K and 2M. P4 Northwood (512KB
L2 cache) is faster for FFTs larger than 1M.
2) Benchmark no longer times 256K and 320K FFTs, but does time 2048K FFT.
3) Support for torture testing FFT sizes from 1280K to 4096K added.
4) A 900 MHz P-III is now required to get first time LL tests by default.
5) Slightly faster SSE2 FFTs for lengths of 5*2^N and 7*2^N (e.g. 640K, 896K).
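Item 5's naming can be checked with a little arithmetic, assuming (as the sizes suggest) that "K" here means 1024: 640K = 640*1024 = 5*2^17 and 896K = 896*1024 = 7*2^17. The sketch below uses a hypothetical helper, split_fft_length, to split an FFT length into its odd factor and its power of two. It is only an illustration, not code from prime95.

```python
def split_fft_length(n: int) -> tuple[int, int]:
    """Return (odd, exp) such that n == odd * 2**exp and odd is odd."""
    if n <= 0:
        raise ValueError("FFT length must be positive")
    exp = 0
    while n % 2 == 0:   # strip factors of two
        n //= 2
        exp += 1
    return n, exp


K = 1024  # assumption: sizes such as "640K" count units of 1024

for label in (640, 896, 1280, 2048):
    odd, exp = split_fft_length(label * K)
    print(f"{label}K = {odd}*2^{exp}")
```

With these inputs it prints 640K = 5*2^17, 896K = 7*2^17, 1280K = 5*2^18 and 2048K = 1*2^21, so 2048K is a pure power-of-two length while the other three fall under item 5.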
One item missing from whatsnew.txt is that prime95 now pages better, resulting in
improved times for several FFT sizes not listed above. This came about from
noticing that the debug version was faster than the release version. The debug
malloc filled the memory with a constant, which made it more likely to be allocated
in contiguous physical memory. Because the FFTs are designed to fill the L2
cache evenly when the FFT data is in contiguous memory, the speedup is significant
when the FFT code uses nearly all of the L2 cache.
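The effect described above is commonly called page coloring. The minimal sketch below models it under stated assumptions: a physically indexed 512KB, 8-way L2 with 64-byte lines and 4KB pages (numbers picked for illustration, not measured on a real CPU). Each physical page then maps to one of 16 "colors", i.e. one fixed group of 64 cache sets, and once more than 8 pages of a buffer share a color their lines must evict each other. The names page_color and overflow_pages are hypothetical; this is not prime95's code.

```python
import random
from collections import Counter

# Assumed cache geometry, roughly Northwood-sized (illustrative only).
LINE = 64                 # bytes per cache line
WAYS = 8                  # associativity
L2_BYTES = 512 * 1024     # total L2 size
PAGE = 4096               # bytes per page

SETS = L2_BYTES // (LINE * WAYS)   # 1024 sets
COLORS = (SETS * LINE) // PAGE     # 16 page colors


def page_color(phys_page: int) -> int:
    """Set-index bits above the page offset: which group of sets a page uses."""
    return phys_page % COLORS


def overflow_pages(phys_pages: list[int]) -> int:
    """Pages beyond WAYS per color; their lines cannot all stay resident."""
    per_color = Counter(page_color(p) for p in phys_pages)
    return sum(max(0, n - WAYS) for n in per_color.values())


n_pages = 112  # 448KB buffer, 7/8 of the assumed L2
contiguous = list(range(5000, 5000 + n_pages))   # consecutive physical pages
rng = random.Random(1)
scattered = rng.sample(range(1_000_000), n_pages)  # randomly placed pages

print("contiguous overflow:", overflow_pages(contiguous))  # 7 pages per color, so 0
print("scattered overflow: ", overflow_pages(scattered))
```

Raising n_pages toward 128 (the whole assumed cache) keeps the contiguous case at zero overflow, while the scattered case tends to get worse.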