I have just added the Windows versions of Jason's code to the K8-experimental SVN branch. I have also added an interim version of sqr_basecase.asm that Jason has provided. This does not improve on the performance of the earlier code (in fact it is a bit slower for medium-sized operands), but it removes the upper limit on the Karatsuba threshold that the earlier code imposed, and hence makes tuning easier.
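For anyone unfamiliar with why the threshold matters for tuning, here is a rough Python sketch of how a squaring routine might choose between the basecase and Karatsuba. It is only an illustration under my own assumptions: the names `sqr`, `sqr_basecase`, `limb_count` and the `threshold` parameter are hypothetical and are not taken from Jason's assembler or from the MPIR sources, and Python integers stand in for limb arrays.

```python
LIMB_BITS = 64  # assumed limb size on x86_64


def limb_count(x):
    """Number of 64-bit limbs needed to hold the non-negative integer x."""
    return max(1, -(-x.bit_length() // LIMB_BITS))


def sqr_basecase(x, n):
    """Schoolbook squaring of an n-limb operand: O(n^2) limb products."""
    mask = (1 << LIMB_BITS) - 1
    limbs = [(x >> (LIMB_BITS * i)) & mask for i in range(n)]
    result = 0
    for i, a in enumerate(limbs):
        for j, b in enumerate(limbs):
            result += (a * b) << (LIMB_BITS * (i + j))
    return result


def sqr(x, threshold):
    """Square x, using the basecase below `threshold` limbs, Karatsuba above.

    `threshold` is the hypothetical tuning parameter; it is clamped to at
    least 2 so that the recursion always reaches the basecase.
    """
    n = limb_count(x)
    if n < max(threshold, 2):
        return sqr_basecase(x, n)
    # Split x = hi * 2^shift + lo and use three recursive squarings.
    shift = LIMB_BITS * (n // 2)
    lo = x & ((1 << shift) - 1)
    hi = x >> shift
    lo2 = sqr(lo, threshold)
    hi2 = sqr(hi, threshold)
    mid = sqr(hi + lo, threshold) - hi2 - lo2  # equals 2 * hi * lo
    return (hi2 << (2 * shift)) + (mid << shift) + lo2
```

In this sketch the basecase accepts any operand size, so `threshold` can be set to whatever value a tuning run finds fastest. If the basecase only handled operands up to some fixed size, the threshold could not be raised past that size, which is the restriction the interim sqr_basecase.asm removes.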
I have also added a Python program, g2y.py, that converts JM's code to YASM format. A few lines need manual intervention (PROLOGUE and EPILOGUE in particular, as I am unclear how to translate these). I don't change the GCC calling conventions, so the resulting code should not require much work to be used with YASM on Linux. g2y.py runs in the build.vc9 directory and translates assembler code in x86_64 into x86_64w if it is not already there.

The benchmark on Windows on my AMD Athlon X2 (2.4GHz) is now 10,800 with this code (up from 8,700 for the original code). I have also tidied up the Windows AMD64 build in both the trunk and the experimental branches.

Happy New Year to all

Brian