On Tuesday 23 December 2003 20:15, Matthias Waldhauer wrote:
>
> Last Friday I read some messages about recent kernel modifications and
> patches for version 2.6.0. There is an "implicit_large_page" patch,
> allowing applications to use large pages without modifications. I don't
> have the time to dig into it :(
Sure. This is a much better approach than mucking about with application-specific modifications, which would likely involve serious security hazards (leaking kernel privileges to the application) and/or clash with other applications' private large-page code and/or with large-page-enabled kernels in the future.

The "bad news" with kernel 2.6 is that the default jiffy timer resolution has changed from 10ms to 1ms (HZ raised from 100 to 1000), so the timer interrupt and task scheduler run ten times as often and steal ten times as many cycles. This will likely cause a small but noticeable drop in the performance of mprime, probably ~1% on fast systems. In other words, the cycles gained by large-page efficiency could easily be swallowed up by a task scheduler tuned to improve interactive responsiveness (and to cope with more processors in an SMP setup).

I suppose you could retrofit a 10ms jiffy timer to the 2.6 kernel, but then you could just as easily patch large-page support into a 2.4 kernel and (hopefully) keep the stability of a tried, tested and trusted kernel.

Finally, the "good news". Crandall & Pomerance (p. 441) describe the "ping pong" variant of the Stockham FFT, in which an extra copy of the data is used but the innermost loop runs essentially consecutively through data memory. C&P note that contiguous memory access is "important" on vector processors, but similar memory access techniques are surely the key to avoiding problems with TLB architectures _and small processor caches_; the largest caches on commercial x86 processors are indeed small compared with the size of the work vectors we use for LL testing.

Perhaps an implementation along these lines could reduce the cache-size dependency which seems to affect Prime95/mprime. That said, paying a very large premium for the "extreme" version of the Intel Pentium 4 is certainly not cost-effective, given the small performance benefit the extra cache brings, most probably because the Prime95/mprime code appears not to be tuned for the P4 Extreme Edition.
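To put rough numbers on the timer-tick argument: the direct cost of periodic ticks scales linearly with HZ, so going from a 10ms to a 1ms jiffy multiplies it by ten whatever the per-tick cost is. The sketch below assumes a hypothetical per-tick cost (CYCLES_PER_TICK) and clock rate (CPU_HZ); neither is a measured figure, and indirect effects such as cache and TLB pollution after each interrupt are not modelled.

```python
def tick_overhead(hz: float, cycles_per_tick: float, cpu_hz: float) -> float:
    """Fraction of CPU time spent servicing periodic timer ticks.

    hz              -- timer interrupts per second (100 => 10 ms jiffy, 1000 => 1 ms jiffy)
    cycles_per_tick -- ASSUMED direct cost of one tick (interrupt entry/exit plus
                       scheduler bookkeeping), in CPU cycles
    cpu_hz          -- processor clock rate in cycles per second
    """
    return hz * cycles_per_tick / cpu_hz


if __name__ == "__main__":
    CPU_HZ = 3.0e9            # hypothetical 3 GHz processor
    CYCLES_PER_TICK = 10_000  # hypothetical per-tick cost; not a measured figure

    for hz in (100, 1000):
        pct = 100.0 * tick_overhead(hz, CYCLES_PER_TICK, CPU_HZ)
        print(f"HZ={hz:4d} ({1000 // hz:2d} ms jiffy): {pct:.3f}% of CPU")
```

Only the ratio between the two HZ settings is meaningful here; to estimate the real overhead you would have to measure the per-tick cost on the machine in question.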
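For anyone unfamiliar with the Stockham scheme, here is a minimal sketch of one common radix-2 formulation, written in Python for readability; it is not necessarily identical to the pseudocode in Crandall & Pomerance. Each stage reads one buffer and writes the other (the "ping pong"), the output emerges in natural order without a bit-reversal pass, and the innermost loop over q touches consecutive elements of both buffers. Python will not show any cache or TLB benefit; the point is only the memory access pattern.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (forward DFT, exp(-2*pi*i*j*k/n) convention).

    Two buffers are used alternately ("ping pong"): each stage reads one
    buffer and writes the other, so no bit-reversal permutation is needed and
    the output comes out in natural order.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    src = [complex(v) for v in x]  # buffer read by the current stage
    dst = [0j] * n                 # extra copy of the data, written by the stage

    m, s = n, 1  # m: current sub-transform length, s: stride between sub-transforms
    while m > 1:
        half = m // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / m)  # twiddle factor for this butterfly
            # Innermost loop: q walks consecutive addresses in both buffers.
            for q in range(s):
                u = src[q + s * p]
                v = src[q + s * (p + half)]
                dst[q + s * (2 * p)] = u + v
                dst[q + s * (2 * p + 1)] = (u - v) * w
        src, dst = dst, src  # swap the roles of the two buffers
        m, s = half, 2 * s
    return src
```

Note that in the early stages the stride s is small, so the contiguous runs in the inner loop are short; a tuned implementation would have to deal with that (and with radix and precision), none of which this sketch attempts.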
Seasonal felicitations
Brian Beesley