On Tuesday 23 December 2003 20:15, Matthias Waldhauer wrote:
>
> Last Friday I read some messages about recent kernel modifications and
> patches for version 2.6.0. There is an "implicit_large_page" patch,
> allowing applications to use large pages without modifications. I don't
> have the time to dig into it :(

Sure. This is a much better approach than mucking about with 
application-specific modifications, which would likely involve serious 
security hazards (leaking kernel privileges to the application) and/or clash 
with other applications' private large-page code and/or large-page-enabled 
kernels in the future.

The "bad news" with kernel 2.6 is that the (default) jiffy timer resolution 
has changed from 10ms to 1ms, so the timer tick fires 10 times as often and 
the task scheduler steals 10 times as many cycles. This will likely cause a 
small but noticeable drop in the performance of mprime, probably ~1% on fast 
systems. In other words, the cycles gained by large page efficiency could 
easily be swallowed up by the task scheduler being tuned to improve 
interactive responsiveness (and to cope with more processors in an SMP 
setup). I suppose you could retrofit a 10ms jiffy timer to the 2.6 kernel, 
but then you could just as easily patch large page support into a 2.4 kernel 
& (hopefully) keep the stability of a tried, tested & trusted kernel.
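
As a rough back-of-envelope check on the tick arithmetic above, here is a 
minimal Python sketch. It assumes the cost of servicing one tick stays the 
same when the tick rate changes. The per-tick cost used here is purely 
hypothetical: it is not a measurement, it is simply the value at which 1000 
ticks per second come to the ~1% figure guessed at above.

```python
def tick_overhead(ticks_per_second, seconds_per_tick):
    """Fraction of CPU time spent servicing timer ticks."""
    return ticks_per_second * seconds_per_tick


# Hypothetical per-tick cost (NOT measured): chosen only so that the
# 1 ms jiffy case works out to the ~1% estimate quoted in the text.
ASSUMED_TICK_COST = 10e-6  # seconds

old_overhead = tick_overhead(100, ASSUMED_TICK_COST)    # 10 ms jiffy (2.4 default)
new_overhead = tick_overhead(1000, ASSUMED_TICK_COST)   # 1 ms jiffy (2.6 default)

print(f"10 ms jiffy: {old_overhead:.2%} of CPU time")
print(f" 1 ms jiffy: {new_overhead:.2%} of CPU time")
print(f"ratio: {new_overhead / old_overhead:.0f}x")
```

Whatever the real per-tick cost is, the ratio between the two settings is 
10 under this model; only the absolute percentages depend on the assumed 
figure.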

Finally, the "good news". Crandall & Pomerance p441 describes the "ping pong" 
variant of the Stockham FFT, in which an extra copy of the data is used but 
the innermost loop runs essentially consecutively through data memory. C&P 
note that contiguous memory access is "important" on vector processors, but 
similar memory access techniques are surely the key to avoiding problems with 
TLB architectures _and small processor caches_ - and the largest caches 
present on commercial x86 architecture are indeed small compared with the 
size of the work vectors we use for LL testing. Perhaps an implementation 
along these lines could reduce the cache size dependency which seems to 
affect Prime95/mprime. That said, paying a very large premium for the 
"extreme" version of the Intel Pentium 4 is most certainly not cost effective 
in view of the small performance benefit the extra cache generates, most 
probably because the Prime95/mprime code appears not to be tuned for the P4 
Extreme Edition.
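
To make the "ping pong" idea concrete, here is a minimal sketch of a 
textbook radix-2 Stockham autosort FFT in Python. It is not the C&P listing 
and has nothing to do with the actual Prime95/mprime code; the function name 
stockham_fft and the variable names are my own. It only shows the two 
buffers swapping roles at each stage, with the innermost loop stepping 
through adjacent elements and no bit-reversal pass.

```python
import cmath


def stockham_fft(data):
    """Forward DFT of a power-of-two-length sequence (radix-2 Stockham)."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    x = [complex(v) for v in data]  # "ping" buffer: source for this stage
    y = [0j] * n                    # "pong" buffer: destination (the extra copy)

    span = n     # length of each sub-transform at the current stage
    stride = 1   # number of interleaved sub-transforms
    while span > 1:
        half = span // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / span)  # twiddle factor
            # Innermost loop: q indexes adjacent elements in both buffers.
            for q in range(stride):
                a = x[q + stride * p]
                b = x[q + stride * (p + half)]
                y[q + stride * (2 * p)] = a + b
                y[q + stride * (2 * p + 1)] = (a - b) * w
        x, y = y, x  # swap roles: this stage's output feeds the next stage
        span = half
        stride *= 2
    return x  # results are already in natural order


print(stockham_fft([1, 2, 3, 4]))
```

The price is the second buffer of n complex values, which is the "extra 
copy of the data" mentioned above. Whether this access pattern actually 
helps with TLB and cache misses on a given x86 part would have to be 
measured with a compiled implementation; the Python version is only meant 
to show the indexing.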

Seasonal felicitations
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
