On Nov 26, 6:18 am, Bill Hart <[EMAIL PROTECTED]> wrote:
> Some other things I forgot to mention:
>
> 1) It probably wouldn't have been possible for me to get 2.5c/l
> without Jason's code, in both the mul_1 and addmul_1 cases.

:)

> 2) You can often insert nops with lone or pair instructions which are
> not 3 macro ops together, further proving that the above analysis is
> correct.
>
> 3) The addmul_1 code I get is very close to the code obtained by
> someone else through independent means, so I won't post it here. Once
> the above tricks have been validated on other code, I'll commit the
> addmul_1 code I have to the repo. Or perhaps someone else will
> rediscover it from what I have written above.
>
> In fact, I was only able to find about 16 different versions of
> addmul_1 that run in 2.5c/l, all of which look very much like the
> solution obtained independently. The order and location of most
> instructions is fixed by the dual requirements of having triplets of
> macro-ops and having almost nothing run in ALU0 other than muls. There
> are very few degrees of freedom.
>
> Bill.

This is very, very cool, and I am happy that it is being discussed in
public. Any chance of seeing some performance numbers before and after
the check-in?
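
For anyone following along, here is a minimal C sketch of what addmul_1
computes and of the cycles-per-limb (c/l) arithmetic behind a figure like
2.5c/l. It is a plain reference loop, not the assembly discussed above. The
names ref_addmul_1, limb_t and cycles_per_limb are made up for illustration,
and it assumes 64-bit limbs and a compiler that supports unsigned __int128
(GCC or Clang).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for this sketch: one 64-bit word. */
typedef uint64_t limb_t;

/* Reference (unoptimized) addmul_1:
 *   {rp, n} += {up, n} * v
 * Returns the carry-out limb. The 128-bit intermediate cannot overflow:
 * (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;          /* low 64 bits stay in place      */
        carry = (limb_t)(t >> 64);  /* high 64 bits go to next limb   */
    }
    return carry;
}

/* Convert a wall-clock measurement into cycles per limb.
 * seconds: total time for `reps` calls on n-limb operands
 * hz:      assumed core clock frequency in Hz (supplied by the caller) */
double cycles_per_limb(double seconds, double hz, size_t n, size_t reps)
{
    return (seconds * hz) / ((double)n * (double)reps);
}
```

Timing the same operand size with the old and new routines and feeding both
measurements to cycles_per_limb would give directly comparable before/after
numbers, provided the clock frequency is fixed during the run.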

<SNIP>

Cheers,

Michael
