On Monday 19 January 2009 19:32:39 ja...@njkfrudils.plus.com wrote:
> Presently for a 20x20 mul_basecase we have cycle counts of
>
>   2082  gmp-4.2.4
>   1461  mpir trunk
>   1153  mpir k8-branch
>   1103  pipeline
>   1033  perfect
>
> "pipeline" is my fully pipelined version of mpir k8-branch, and "perfect"
> assumes that every slot can be filled with a macro op. I can perhaps trim
> another 10 cycles off the time, but it seems we have to have some unfilled
> slots, i.e. 13 cycles from branch mispredicts, 27 cycles from a suboptimal
> outer loop schedule and 20 from first iteration startup. The branch
> misprediction is unavoidable, the suboptimal outer loop schedule is the
> best I can get, and the first iteration startup is a mystery!
>
> Note: the % speedup from k8-branch to "pipeline" is better for smaller n;
> we get about 10% for an 8x8 mul_basecase.
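For reference, here is a minimal portable C sketch of what a schoolbook basecase multiply computes, assuming the usual layout in which the first row is a plain multiply (mul_1) and an outer loop then calls an addmul_1-style inner loop once per remaining limb of the second operand. The names `ref_mul_1`, `ref_addmul_1` and `ref_mul_basecase`, and the choice of 32-bit limbs (so a limb product fits in `uint64_t`), are hypothetical and for illustration only; this is not the MPIR code or the hand-scheduled assembly whose cycle counts are quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: 32-bit limbs so that a full limb product,
   plus two limb-sized addends, always fits in a 64-bit value. */
typedef uint32_t limb_t;
typedef uint64_t dlimb_t;

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        dlimb_t t = (dlimb_t)up[i] * v + carry;
        rp[i] = (limb_t)t;          /* low half goes to the result */
        carry = (limb_t)(t >> 32);  /* high half carries into the next limb */
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   This is the inner loop that dominates the basecase cost. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this cannot overflow. */
        dlimb_t t = (dlimb_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1] (schoolbook method).
   Assumes un >= vn >= 1 and that rp overlaps neither up nor vp. */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    /* First row: nothing to accumulate into yet, so a plain multiply. */
    rp[un] = ref_mul_1(rp, up, un, vp[0]);

    /* Outer loop: each further limb of vp adds a row shifted by j limbs. */
    for (size_t j = 1; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

In this layout a 20x20 product is one mul_1 row followed by 19 addmul_1 rows of 20 limbs each, so the per-limb cost of the inner loop, the outer-loop overhead between rows and the startup of the first row are the separate costs the quoted breakdown distinguishes.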
Playing around with the alignment of instructions etc., I have managed to squeeze some more computational juice out of the current addmul. As this only reduced the overheads, I didn't try it inside mul_basecase until now. It offers a nice speedup without having to pipeline the whole function. I don't understand why it's faster than the previous function, or, more to the point, why the previous function was slower. I'm just trying to tweak it some more and then I'll post it.

Jason