On Wed, Dec 10, 2003 at 06:26:54PM +0000, Nick Craig-Wood wrote:
> I'm in the process of writing (not quite finished or working ;-) some
> code which you load as an LD_PRELOAD library under linux.  This gets
> its fingers into the memory allocation, and makes all malloc space
> come from hugetlbfs (how you get large pages under linux).
>
> My primary user for this was to be mprime of course!

Well, I finished the code and here are the results on my lowly laptop
running 2.6.0.

Intel(R) Pentium(R) III processor
CPU speed: 550.78 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 16 KB
L2 cache size: 256 KB
L1 cache line size: 32 bytes
L2 cache line size: 32 bytes
TLBS: 64
Prime95 version 22.12, RdtscTiming=1

Normal
------

Best time for 256K FFT length: 80.256 ms.
Best time for 320K FFT length: 101.820 ms.
Best time for 384K FFT length: 125.191 ms.
Best time for 448K FFT length: 145.505 ms.
Best time for 512K FFT length: 161.178 ms.
Best time for 640K FFT length: 215.113 ms.
Best time for 768K FFT length: 258.055 ms.
Best time for 896K FFT length: 304.786 ms.
Best time for 1024K FFT length: 345.747 ms.
Best time for 1280K FFT length: 449.540 ms.
Best time for 1536K FFT length: 541.963 ms.
Best time for 1792K FFT length: 661.651 ms.

With all memory allocations coming from 4 MB pages
--------------------------------------------------

Best time for 256K FFT length: 79.293 ms.    1.2%
Best time for 320K FFT length: 102.032 ms.  -0.2%
Best time for 384K FFT length: 124.022 ms.   0.9%
Best time for 448K FFT length: 145.492 ms.   0.0%
Best time for 512K FFT length: 161.568 ms.  -0.2%
Best time for 640K FFT length: 213.311 ms.   0.8%
Best time for 768K FFT length: 254.609 ms.   1.3%
Best time for 896K FFT length: 301.911 ms.   0.9%
Best time for 1024K FFT length: 339.203 ms.  1.9%
Best time for 1280K FFT length: 439.119 ms.  2.3%
Best time for 1536K FFT length: 531.422 ms.  1.9%
Best time for 1792K FFT length: 645.350 ms.  2.5%

So: consistent but small improvements in the larger FFTs (about 2%
from 1024K upwards), and nothing measurable either way at most of the
smaller sizes.

This just goes to show what a good job George has done in not
thrashing the TLB!

I wonder if Prime95 could be made more efficient if it didn't have to
worry about the TLB?  It's obviously detecting the TLB slots for this
computer, which is wrong in this case - perhaps there is a way of
overriding this?
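
For anyone who wants to check the percentage column: it is the
relative reduction in time, (normal - large page) / normal.  A small
Python sketch that recomputes it from the timings above follows; the
names NORMAL_MS, HUGE_MS and improvement_percent are just my own
labels for illustration, not anything from Prime95 or the preload
library.

```python
# Best times in ms, copied from the two result tables, keyed by FFT length in K.
NORMAL_MS = {
    256: 80.256, 320: 101.820, 384: 125.191, 448: 145.505,
    512: 161.178, 640: 215.113, 768: 258.055, 896: 304.786,
    1024: 345.747, 1280: 449.540, 1536: 541.963, 1792: 661.651,
}
HUGE_MS = {
    256: 79.293, 320: 102.032, 384: 124.022, 448: 145.492,
    512: 161.568, 640: 213.311, 768: 254.609, 896: 301.911,
    1024: 339.203, 1280: 439.119, 1536: 531.422, 1792: 645.350,
}


def improvement_percent(normal_ms, huge_ms):
    """Relative reduction in time, as a percentage of the normal time.

    Positive means the large-page run was faster.
    """
    return (normal_ms - huge_ms) / normal_ms * 100.0


if __name__ == "__main__":
    # Print one line per FFT length, smallest first.
    for fft_k in sorted(NORMAL_MS):
        pct = improvement_percent(NORMAL_MS[fft_k], HUGE_MS[fft_k])
        print(f"{fft_k:5d}K FFT: {pct:+.1f}%")
```

A negative value means the large-page run was slightly slower, which
at these magnitudes is probably just timing noise.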

Please email me if you'd like to experiment with the code - it's quite
simple (it just took rather a lot of different approaches to get
right!).  You'll need to be running 2.6.0 with HUGETLB support if you
want to play (see hugetlbpage.txt in Documentation in the kernel
source for more info).

-- 
Nick Craig-Wood
[EMAIL PROTECTED]