On Monday 19 January 2009 19:32:39 ja...@njkfrudils.plus.com wrote:
> Presently for a 20x20 mul_basecase we have cycle counts of
> 2082 gmp-4.2.4
> 1461 mpir trunk
> 1153 mpir k8-branch
> 1103 pipeline
> 1033 perfect
>
> "pipeline" is my fully pipelined version of mpir k8-branch, and "perfect"
> assumes that every slot can be filled with a macro-op. I can perhaps trim
> another 10 cycles off the time, but it seems we have to have some unfilled
> slots,
> i.e. 13 cycles from the branch mispredict, 27 cycles from a suboptimal outer
> loop schedule, and 20 from first-iteration startup. The branch
> misprediction is unavoidable, the suboptimal outer loop schedule is the
> best I can get, and the first-iteration startup is a mystery!
>
> Note: the percentage speedup from k8-branch to "pipeline" is larger for
> smaller n; we get about 10% for an 8x8 mul_basecase.
>
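As a quick sanity check on the quoted figures, the sketch below simply redoes the arithmetic: it adds the three listed sources of unfilled slots to the "perfect" count and compares the result with "pipeline". The listed overheads sum to 60 cycles, giving 1093, which leaves a 10-cycle gap that is consistent with the "perhaps trim another 10 cycles" estimate. The dictionary keys are just labels chosen here; the numbers are the ones quoted.

```python
# Cycle counts for a 20x20 mul_basecase, as quoted above.
cycles = {
    "gmp-4.2.4": 2082,
    "mpir trunk": 1461,
    "mpir k8-branch": 1153,
    "pipeline": 1103,
    "perfect": 1033,
}

# Unfilled issue slots attributed to each cause in the quoted message.
overheads = {
    "branch mispredict": 13,
    "outer loop schedule": 27,
    "first iteration startup": 20,
}

# Best case plus the known overheads, and what is still unaccounted for.
accounted = cycles["perfect"] + sum(overheads.values())
unexplained = cycles["pipeline"] - accounted

print(f"perfect + listed overheads = {accounted} cycles")
print(f"remaining gap to 'pipeline' = {unexplained} cycles")

# Relative speed of 'pipeline' against each earlier version (old time / new time).
for name in ("gmp-4.2.4", "mpir trunk", "mpir k8-branch"):
    print(f"{name:15s} -> pipeline: {cycles[name] / cycles['pipeline']:.3f}x")
```

At 20x20 the k8-branch to "pipeline" gain works out to roughly 4.5%, which fits the remark that the relative gain is larger (about 10%) at 8x8.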

Playing around with the alignment of instructions etc., I have managed to
squeeze some more computational juice out of the current addmul. As this only
reduced the overheads, I didn't try it inside mul_basecase until now. It offers
a nice speedup without having to pipeline the whole function. I don't
understand why it's faster than the previous function, or more to the point,
why the previous function was slower.
I'm just trying to tweak it some more, and then I'll post it.
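For context on why a faster addmul helps mul_basecase directly: the schoolbook basecase multiply can be written as one mul_1 pass followed by an outer loop of addmul_1 calls, one per limb of the second operand, so any saving in the addmul inner loop is paid back n-1 times. The sketch below is a plain Python model under that assumption, with 64-bit limbs stored least-significant first. The function names mirror the mpn routines for illustration only (the `offset` argument stands in for pointer arithmetic on `rp`); this is not MPIR's assembly code.

```python
MASK = (1 << 64) - 1  # 64-bit limbs, as on x86-64


def mul_1(rp, up, v):
    """rp[0:n] = up * v; return the carry-out limb."""
    carry = 0
    for i, u in enumerate(up):
        t = u * v + carry
        rp[i] = t & MASK
        carry = t >> 64
    return carry


def addmul_1(rp, up, v, offset=0):
    """rp[offset:offset+n] += up * v; return the carry-out limb."""
    carry = 0
    for i, u in enumerate(up):
        # At most (2^64-1) + (2^64-1)^2 + (2^64-1) = 2^128 - 1, so carry fits in a limb.
        t = rp[offset + i] + u * v + carry
        rp[offset + i] = t & MASK
        carry = t >> 64
    return carry


def mul_basecase(up, vp):
    """Schoolbook product of limb lists up (m limbs) and vp (n limbs), m >= n."""
    m, n = len(up), len(vp)
    rp = [0] * (m + n)
    rp[m] = mul_1(rp, up, vp[0])          # first row: plain multiply
    for j in range(1, n):                 # outer loop: one addmul_1 row per limb of vp
        rp[m + j] = addmul_1(rp, up, vp[j], offset=j)
    return rp


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble little-endian 64-bit limbs into an integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


# Usage: a 20x20 limb product checked against Python's own bignum multiply.
a = 3 ** 800   # about 1268 bits, fits in 20 limbs
b = 7 ** 450   # about 1264 bits, fits in 20 limbs
assert from_limbs(mul_basecase(to_limbs(a, 20), to_limbs(b, 20))) == a * b
```

In this model the per-row work outside `addmul_1` (loading `vp[j]`, storing the carry limb) is where, presumably, the outer-loop scheduling and first-iteration startup costs mentioned in the quote show up in the assembly versions.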

Jason
