An mpirbench of 9000 might be achievable with assembly work alone. Of
course, if the FFT and division code were improved, this could be
substantially higher. It's hard to say by how much, because the way the
test calculates the score is somewhat involved, and I don't have
comparative timings for the change a division improvement would make,
so I wouldn't like to put a figure on it. I think the FFT could
improve the score by up to 40%, but of course this would only
affect very large multiplications and divisions, not the rsa score.
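
As a rough sanity check on the 9000 figure, one can normalise the scores
quoted below by clock speed. This is only a sketch: it assumes the score
scales roughly linearly with clock speed, which is not strictly true, and
the helper names (score_per_ghz, speedup) are made up for illustration.

```python
# Scores and clock speeds are copied from the tables and text quoted below.
# ASSUMPTION: dividing by clock speed is only a crude normalisation, since
# mpirbench does not scale exactly linearly with frequency.

def score_per_ghz(score, clock_ghz):
    """Benchmark score divided by clock speed in GHz."""
    return score / clock_ghz

def speedup(new_score, old_score):
    """Ratio of two scores measured on the same machine."""
    return new_score / old_score

machines = {
    "box1 (k8, r1614-k8-branch)": (10118, 1.8),
    "cuda1 (k10, r1614-k8-branch)": (15514, 2.6),
    "Itanium (guessed target)": (9000, 1.66),
}

for name, (score, ghz) in machines.items():
    print(f"{name:30s} {score_per_ghz(score, ghz):7.0f} per GHz")

# Gains on a single machine across library versions.
print(f"eno (core2), mpir-0.9.0 vs gmp-4.2.4: {speedup(7437.8, 6024.5):.2f}x")
print(f"box1 (k8), k8-branch vs mpir-0.9.0:   {speedup(10118, 7379):.2f}x")
```

On these numbers the guessed Itanium target works out to roughly 5400 per
GHz, against roughly 5600 for box1 and 6000 for cuda1, so it is in the
same ballpark as the K8/K10 results.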

Bill.

2009/2/19 Jason Martin <jason.worth.mar...@gmail.com>:
>
> On Wed, Feb 18, 2009 at 7:51 PM, William Stein <wst...@gmail.com> wrote:
>>
>> On Wed, Feb 18, 2009 at 4:49 PM, Jason Martin
>> <jason.worth.mar...@gmail.com> wrote:
>>>
>>> On Wed, Feb 18, 2009 at 7:13 PM,  <ja...@njkfrudils.plus.com> wrote:
>>>>
>>>> On Wednesday 18 February 2009 22:03:43 Mariah wrote:
>>>>> gmp-4.2.4   mpir-0.9.0
>>>>>
>>>>> 2241.9      2251        cicero (pentium4-pc-linux-gnu)
>>>>> 3371.5      3369.3      cleo (ia64-unknown-linux-gnu)
>>>>> 6024.5      7437.8      eno (core2-unknown-linux-gnu)
>>>>> 6022.2      7387.1      fulvia (core2-pc-solaris2.10)
>>>>> 3367.8      3369.5      iras (ia64-unknown-linux-gnu)
>>>>> 1341.3      1343.6      mark (ultrasparc3-sun-solaris2.10)
>>>>> 6100        7421.1      menas (core2-unknown-linux-gnu)
>>>>>
>>>>> Mariah
>>>>
>>>> K10 crushes core-2 (intel fanbois hide their heads in shame :)
>>>>
>>>> gmp-4.2.4       mpir-0.9.0      r1614-k8-branch
>>>> 6014            7379            10118            box1 (k8-unknown-linux-gnu) 1.8GHz
>>>> 9301            11659           15514            cuda1 (k10-unknown-linux-gnu) 2.6GHz
>>>>
>>>
>>> I don't think that the core2 can get much faster... the addmul (and
>>> friends) are running just shy of 4 cycles/limb which is the max
>>> throughput rate for the 64-bit multiply instruction on core2.  I'm
>>> appropriately hiding in shame :-)
>>>
>>
>> Do you have any remarks to make about where you think Itanium could go
>> (compared to where it is now) if it were given suitable attention?
>>
>
> IF Intel is being honest in their documentation, then I believe that
> addmul and friends can run at 2 cycles/limb or better, while add_n
> should be able to run under 1 cycle/limb.  This would make IA64
> architecturally comparable with K10.  Of course the highest clock
> speed of Itaniums (that I know of) is 1.66 GHz, which is well below
> the fastest K10 chips.  My *complete guess* at this time is that it
> should be reasonable to get IA64 mpirbench results around 9000 on the
> 1.66GHz Itaniums.
>
> However, there is no equivalent of Agner Fog for Itanium, so much of
> the work in tweaking for IA64 will be writing lots of timing routines
> to find out exactly what latency and throughput rates are for the
> relevant instructions.
>
> --jason
>
> >
>
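
To make the cycles/limb figures above a bit more concrete, here is a small
sketch that converts them into limbs per second at a given clock speed. It
ignores loop overhead, cache and memory effects, and the function names are
hypothetical, so treat the results as upper bounds rather than predictions.

```python
# ASSUMPTION: throughput is simply clock rate divided by cycles per limb,
# with no loop overhead or memory stalls. Real routines will be slower.

def limbs_per_second(clock_ghz, cycles_per_limb):
    """Peak limbs processed per second at the given clock speed."""
    return clock_ghz * 1e9 / cycles_per_limb

def pass_time_us(n_limbs, clock_ghz, cycles_per_limb):
    """Microseconds for one pass over n_limbs limbs at the given rate."""
    return n_limbs * cycles_per_limb / (clock_ghz * 1e3)

# The Itanium figures mentioned above: 1.66 GHz, addmul at 2 cycles/limb
# and add_n at 1 cycle/limb.
print(f"addmul: {limbs_per_second(1.66, 2):.3g} limbs/s")
print(f"add_n:  {limbs_per_second(1.66, 1):.3g} limbs/s")
print(f"addmul over 1000 limbs: {pass_time_us(1000, 1.66, 2):.2f} us")
```

At 1.66GHz, 2 cycles/limb corresponds to about 830 million limbs per second
for addmul, and 1 cycle/limb to about 1.66 billion for add_n.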
