Hi, 

On Tue,Oct 23 2001 00:15, [EMAIL PROTECTED] wrote:


> (snip)
> Nice work on the IA64 optimizations, Guillermo! Actually, the Alpha
> is another platform where preload rather than true prefetch seems
> the best way to go, although the performance boost you're getting
> on the Itanium is much larger than one sees on the Alpha.
>

The problem with preload, IMHO, is that it needs a lot of registers to be
efficient. Have you tried it in the new Mlucas version?
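
To make the register point concrete, here is a minimal C sketch of what I mean
by preload: the operands of the next block are loaded into temporaries with
ordinary loads before the arithmetic on the current block is done. The function
names and the toy "sum of pair products" kernel are invented for illustration;
this is not code from Glucas or Mlucas.

```c
#include <stddef.h>

/* Reference version: for each block of 4, add a[0]*a[1] + a[2]*a[3]. */
double pair_products_plain(const double *a, size_t n)
{
    size_t n4 = n - (n % 4);          /* ignore a trailing partial block */
    double s = 0.0;
    for (size_t i = 0; i < n4; i += 4)
        s += a[i] * a[i + 1] + a[i + 2] * a[i + 3];
    return s;
}

/* Preload version: the next block is loaded (real loads, not prefetch
 * hints) while the current block is still being used. Note that eight
 * temporaries are live at once instead of four. */
double pair_products_preload(const double *a, size_t n)
{
    size_t n4 = n - (n % 4);
    double s = 0.0;
    if (n4 == 0)
        return s;

    double c0 = a[0], c1 = a[1], c2 = a[2], c3 = a[3];
    for (size_t i = 4; i < n4; i += 4) {
        /* Loads for the NEXT block are issued first... */
        double n0 = a[i], n1 = a[i + 1], n2 = a[i + 2], n3 = a[i + 3];
        /* ...then the arithmetic on the CURRENT block. */
        s += c0 * c1 + c2 * c3;
        c0 = n0; c1 = n1; c2 = n2; c3 = n3;
    }
    s += c0 * c1 + c2 * c3;           /* last block */
    return s;
}
```

With a real radix-8 or radix-16 pass the number of live temporaries doubles in
the same way, which is where the register pressure comes from.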

The performance boost I get on Itanium is not only because of preload. See
the yia64.h code: I've rewritten the macros with the IA64 execution scheme in
mind. The aim was to make the compiler's task easier.

> > Here are the timings for an Itanium @ 800 MHz (Compaq Blazer Itanium):
>
> {snip}
>
> > 1024 K  0.134/0.130
>
> {snip}
>
> > 4096 K  0.588/0.574
>
> The timings you sent me some time ago for Glucas 2.7b on a 667MHz Alpha
> 21264 with a similar 4MB L2 cache size at these runlengths are as follows:
>
> 1024 K  0.180/0.171
>
> 4096 K  0.831/0.787
>
> which indicates about 10-15% better per-cycle performance for the IA64
> relative to the 21264. This is good, but (being greedy :) I think the
> IA64 may be able to achieve even better performance with further tuning
> (of both source code and compiler), because the IA64 has such great
> FPU capabilities. If the 21264 could do just 2 adds per cycle (to say
> nothing of multiplies) I estimate the performance on the instruction
> mix typical of this kind of large-FFT code would increase by 20-30%.
>

I think we will see better compiler versions soon, and the advantage will
increase (the code should also be improved).
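
The per-cycle comparison quoted above can be checked with a few lines of C.
This is only a sketch: it assumes the timings are seconds per iteration, that
the two numbers in each row are independent measurements to be compared
pairwise, and the function names are made up for the example.

```c
#include <stdio.h>

/* Clock cycles spent per iteration: seconds * clock frequency. */
double cycles_per_iter(double sec_per_iter, double clock_mhz)
{
    return sec_per_iter * clock_mhz * 1.0e6;
}

/* Fractional per-cycle advantage of machine A over machine B:
 * 0.12 means B needs 12% more cycles than A for the same work. */
double per_cycle_advantage(double sec_a, double mhz_a,
                           double sec_b, double mhz_b)
{
    return cycles_per_iter(sec_b, mhz_b) / cycles_per_iter(sec_a, mhz_a) - 1.0;
}

/* Print the advantage of the 800 MHz Itanium over the 667 MHz 21264
 * for one pair of timings. */
void print_advantage(const char *label, double sec_ia64, double sec_alpha)
{
    double adv = per_cycle_advantage(sec_ia64, 800.0, sec_alpha, 667.0);
    printf("%s: %.1f%% fewer-cycle advantage for IA64\n", label, 100.0 * adv);
}
```

For example, `print_advantage("1024 K", 0.134, 0.180)` compares the first
figures of the 1024 K rows. Run over the four pairs above, the ratio falls
between roughly 10% and 18%.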

> > The IA64 architecture has a very nice feature: predication. In the DWT
> > used in most GIMPS clients, the normalization and carry phase has a
> > significant cost in terms of performance. There are some branches that
> > are hard to predict, and here predication replaces these branches with
> > great success.
>
> Of course it is possible (and in many cases desirable) to do the normalize
> and carry sans branches
>
> (snip)
>
> Did you ever
> try such a branchless version on the IA64?
>

Of course. I first used the branchless code, and the resulting timings were
dreadful.
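
For readers who have not seen the two styles, here is a simplified C sketch of
a normalize-and-carry loop written both ways. It is illustrative only and not
the Glucas or Mlucas code: it assumes a single fixed power-of-two base (a real
DWT uses variable word sizes and weights), and all names are hypothetical. The
rounding uses the well-known "add and subtract 1.5*2^52" trick, which assumes
IEEE double arithmetic in round-to-nearest mode and no -ffast-math.

```c
#include <stddef.h>

#define RND_CONST 6755399441055744.0   /* 1.5 * 2^52 */

/* Round to nearest integer without a branch (valid for |x| < 2^51). */
double round_nearest(double x)
{
    return (x + RND_CONST) - RND_CONST;
}

/* Conditional version: the carry is found by comparing against +-base/2.
 * A compiler may implement these tests as branches or, on IA64, as
 * predicated instructions. */
double carry_branchy(double *x, size_t n, double base)
{
    double half = 0.5 * base, carry = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = round_nearest(x[i]) + carry;
        carry = 0.0;
        while (t >= half) { t -= base; carry += 1.0; }
        while (t < -half) { t += base; carry -= 1.0; }
        x[i] = t;                       /* balanced digit */
    }
    return carry;                       /* carry out of the top digit */
}

/* Branchless version: the carry comes from a multiply and a rounding. */
double carry_branchless(double *x, size_t n, double base)
{
    double inv_base = 1.0 / base, carry = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = round_nearest(x[i]) + carry;
        carry = round_nearest(t * inv_base);
        x[i] = t - carry * base;        /* balanced digit */
    }
    return carry;
}

/* Helper for checking: value represented by the digits plus carry-out. */
double digits_value(const double *x, size_t n, double base, double carry)
{
    double v = carry;
    for (size_t i = n; i-- > 0; )
        v = v * base + x[i];
    return v;
}
```

Both routines leave digits in the balanced range and represent the same
integer; they can differ only in how an exact tie at +base/2 is assigned.
Which form is faster depends on the machine and compiler.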


> > We still have not made a good timing page; we will send it to E. Mayer
> > and to sourceforge when possible.
>
> Yes, I've been tardy in updating the timings page - been too busy with
> work and Mlucas 2.7c to spend as much time as I should on it. I should
> have it somewhat up-to-date at the same time I release the new version
> of Mlucas, perhaps in a couple of weeks. In any event, it's not like
> Itanium users have a plethora of codes amongst which they must decide. :)
>

I hope the nice competition in the non_x86_GIMPS_clients arena will continue ;-)


Guillermo.

-- 
Guillermo Ballester Valor
[EMAIL PROTECTED]
Granada (Spain)

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
