Re: [mpir-devel] Re: Future MPIR compatibility with GMP?

Bill Hart Sun, 10 Jan 2010 15:22:54 -0800

I had an even better idea. From what I can tell, your p6/mmx CPU
supports sse2, so you can probably try:


./configure --build=pentium4-unknown-linux-gnu ABI=32

This may actually speed things up for you! (No guarantees, but there
is a good chance it will work).
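
As a rough sketch (not something taken from the build system), on Linux
you could confirm the sse2 flag before rebuilding by looking at
/proc/cpuinfo. The helper name has_sse2 is made up for illustration, and
the check assumes the usual "flags" line format:

```shell
# Hypothetical helper: succeeds if the given cpuinfo file lists the sse2 flag.
# Defaults to /proc/cpuinfo, which is a Linux-only assumption.
has_sse2() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw sse2
}

if has_sse2; then
    echo "sse2 reported: the pentium4 build is worth trying"
else
    echo "no sse2 flag found: keep the current build"
fi
```

If the flag is missing, the pentium4 code paths would most likely not run
on that CPU, so it is safer to stay with the current build.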

If you are not using Linux, just replace p6mmx with pentium4 in
whatever build triplet you currently have (run ./config.guess to see
what that is).
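
A minimal sketch of that substitution, assuming your config.guess output
starts with a p6mmx- CPU field. The example triplet and the helper name
swap_cpu are hypothetical; use whatever ./config.guess actually prints:

```shell
# Hypothetical helper: swap the CPU field of a config.guess-style triplet.
# Triplets that do not start with "p6mmx-" are passed through unchanged.
swap_cpu() {
    printf '%s\n' "$1" | sed 's/^p6mmx-/pentium4-/'
}

# Example with an assumed triplet (not taken from a real config.guess run):
swap_cpu p6mmx-unknown-linux-gnu

# In the source tree one would feed it the real guess instead, e.g.:
#   ./configure --build="$(swap_cpu "$(./config.guess)")" ABI=32
```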

Bill.

2010/1/10 Bill Hart <goodwillh...@googlemail.com>:
> Thanks! I think it is more likely to be the tuning than the compiler
> in this case. But that is only a guess, based on what I know of the
> way the FFT code works.
>
> I see that all the new FFT TABLE2 tuning values are missing entirely
> (not your fault, but ours). What you might like to try, when you find
> some time, is to take all the tuning values from the last lines of
> mpn/x86/pentium4/sse2/gmp-mparam.h, including all the TABLE2 values,
> and insert them into your gmp-mparam.h file. That *should* make the
> assert problem go away.
>
> It might be a good idea for us to put these values into all the
> gmp-mparam.h files. Even if they are not completely optimal, they will
> be better than wrong values, which seems to be what we have at the
> moment!
>
> Bill.
>
> 2010/1/10 Gianrico Fini <gianrico.f...@gmail.com>:
>> Yes, of course.
>>
>> $ ls -lrt gmp-mparam.h
>> lrwxrwxrwx 1 gian    gian    27 2010-01-10 17:19 gmp-mparam.h -> mpn/x86/p6/mmx/gmp-mparam.h
>> $ cat gmp-mparam.h
>> /* Intel P6/mmx gmp-mparam.h -- Compiler/machine parameter header file.
>>
>> Copyright 1991, 1993, 1994, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006
>> Free Software Foundation, Inc.
>>
>> This file is part of the GNU MP Library.
>>
>> The GNU MP Library is free software; you can redistribute it and/or modify
>> it under the terms of the GNU Lesser General Public License as published by
>> the Free Software Foundation; either version 2.1 of the License, or (at your
>> option) any later version.
>>
>> The GNU MP Library is distributed in the hope that it will be useful, but
>> WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>> or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
>> License for more details.
>>
>> You should have received a copy of the GNU Lesser General Public License
>> along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
>> the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
>> MA 02110-1301, USA. */
>>
>>
>> #define BITS_PER_MP_LIMB 32
>> #define BYTES_PER_MP_LIMB 4
>>
>>
>> /* NOTE: In a fat binary build SQR_KARATSUBA_THRESHOLD here cannot be more
>>    than the value in mpn/x86/p6/gmp-mparam.h.  The latter is used as a hard
>>    limit in mpn/x86/p6/sqr_basecase.asm.  */
>>
>>
>> /* 1867 MHz Pentium 3/M */
>>
>> /* Generated by tuneup.c, 2006-03-21, gcc 3.4 */
>>
>> #define MUL_KARATSUBA_THRESHOLD          22
>> #define MUL_TOOM3_THRESHOLD              73
>> #define MUL_TOOM4_THRESHOLD             202
>> #define MUL_TOOM7_THRESHOLD             298
>>
>> #define SQR_BASECASE_THRESHOLD            0  /* always (native) */
>> #define SQR_KARATSUBA_THRESHOLD          40
>> #define SQR_TOOM3_THRESHOLD             101
>> #define SQR_TOOM4_THRESHOLD             224
>> #define SQR_TOOM7_THRESHOLD             450
>>
>> #define MULLOW_BASECASE_THRESHOLD         7
>> #define MULLOW_DC_THRESHOLD              48
>> #define MULLOW_MUL_THRESHOLD           6203
>>
>> #define MULHIGH_BASECASE_THRESHOLD        9
>> #define MULHIGH_DC_THRESHOLD             51
>> #define MULHIGH_MUL_THRESHOLD          6142
>>
>> #define MULMOD_2EXPM1_THRESHOLD          18
>>
>> #define DIV_SB_PREINV_THRESHOLD           0  /* always */
>> #define DIV_DC_THRESHOLD                 54
>> #define POWM_THRESHOLD                   93
>> #define FAC_UI_THRESHOLD               2437
>>
>> #define GCD_ACCEL_THRESHOLD              52
>> #define GCDEXT_THRESHOLD                 45
>> #define JACOBI_BASE_METHOD                1
>>
>> #define USE_PREINV_DIVREM_1               1  /* native */
>> #define USE_PREINV_MOD_1                  1  /* native */
>> #define DIVREM_2_THRESHOLD                0  /* always */
>> #define DIVEXACT_1_THRESHOLD              0  /* always (native) */
>> #define MODEXACT_1_ODD_THRESHOLD          0  /* always (native) */
>> #define MOD_1_1_THRESHOLD                32
>> #define MOD_1_2_THRESHOLD                54
>> #define MOD_1_3_THRESHOLD                57
>> #define DIVREM_HENSEL_QR_1_THRESHOLD      5
>> #define RSH_DIVREM_HENSEL_QR_1_THRESHOLD      4
>> #define DIVREM_EUCLID_HENSEL_THRESHOLD     23
>>
>> #define ROOTREM_THRESHOLD                 6
>>
>> #define GET_STR_DC_THRESHOLD             19
>> #define GET_STR_PRECOMPUTE_THRESHOLD     25
>> #define SET_STR_THRESHOLD              3296
>>
>> #define MUL_FFT_TABLE  { 464, 928, 1920, 4608, 10240, 24576, 98304, 393216, 1572864, 6291456, 0 }
>> #define MUL_FFT_MODF_THRESHOLD          480
>> #define MUL_FFT_THRESHOLD              3328
>>
>> #define SQR_FFT_TABLE  { 464, 928, 1920, 5632, 14336, 40960, 98304, 393216, 1572864, 6291456, 0 }
>> #define SQR_FFT_MODF_THRESHOLD          480
>> #define SQR_FFT_THRESHOLD              3840
>>
>> I left the initial comment and the last lines untouched.
>> By the way, the problem may be triggered by the compiler: at first I
>> compiled with gcc 4.3 and everything worked, then I both retuned _and_
>> changed compiler (to gcc 4.4), and after recompiling I got the error.
>>
>> On 10 Jan, 23:26, Bill Hart <goodwillh...@googlemail.com> wrote:
>>> Sure, but as I mentioned, right at the start, the speed of
>>> multiplication is critical for almost everything. That is why it is
>>> the most important benchmark.
>>>
>>> And the speed of multiplication is critically dependent on the speed
>>> of the basecase assembly code, which, on your machine, is slower in
>>> MPIR, by about a third. So definitely, you should expect to see a
>>> performance improvement for many things by using GMP (whatever
>>> version) just on this basis. For your machine, they had previously
>>> sped up this assembly function and we had not, so even GMP 4.3.2 will
>>> be faster.
>>>
>>> Anyhow, while you are still there, can you please post the contents of
>>> the gmp-mparam.h file that you used when the fac_ui test failed the
>>> assert. That would be a great help to us!
>>>
>>> Bill.
>>>
>>> 2010/1/10 Gianrico Fini <gianrico.f...@gmail.com>:
>>>
>>> > Let's leave GMP 5 aside for a while.
>>>
>>> > On my CPU, GMP432 gets these values:
>>> > Category base=>  1546, 1104
>>> >  Program rsa (weight 1.00) =>   398,  284
>>> >  Program pi (weight 1.00) =>  4.46, 3.18
>>> >  Program bpsw (weight 1.00)  =>  2.31, 1.65
>>> >  Program wagstaff (weight 1.00) =>  10.7, 7.62
>>> >  Program mersenne (weight 1.00) => 0.959,0.685
>>> >  Program fermat (weight 1.00)  =>  70.4, 50.3 =>  12.0, 8.56
>>>
>>> > MPIR-130rc2 gets these values:
>>> > Category base=>  1394,  996
>>> >  Program rsa (weight 1.00) =>   316,  226
>>> >  Program pi (weight 1.00) =>  4.03, 2.88
>>> >  Program bpsw (weight 1.00) =>  2.16, 1.54
>>> >  Program wagstaff (weight 1.00) =>  7.69, 5.49
>>> >  Program mersenne (weight 1.00)=>  2.43, 1.74
>>> >  Program fermat (weight 1.00) =>   234,  167 =>  15.1, 10.8
>>>
>>> > They are strange... GMP432 gets slightly better values almost
>>> > everywhere: overall for basic operations, for rsa, for pi, for bpsw,
>>> > for wagstaff... and gets far worse values only for mersenne and
>>> > fermat.
>>> > You speak about applications; well, on my old 32-bit machine GMP seems
>>> > faster for almost all applications except two... I'll investigate.
>>>
>>> > Gian.
>>>
>>> > On 10 Jan, 22:18, Bill Hart <goodwillh...@googlemail.com> wrote:
>>> >> You are of course welcome to choose whichever package best meets your
>>> >> needs. And indeed on your particular system, it seems GMP may well do
>>> >> that for you at present.
>>>
>>> >> One thing you should bear in mind however. Here are some times as they
>>> >> have changed over the past year and a half:
>>>
>>> >> K8
>>>
>>> >> Multiplication:       GMP 4.3.0  MPIR 1.2.0  MPIR 1.1.2  MPIR 1.0.0  MPIR 0.9.0   GMP 4.2.1
>>> >> ==========            =========  ==========  ==========  ==========  ==========   =========
>>> >> 128 x 128 :            52766506    53794646    51802252    35856598    37299412    25896136
>>> >> 512 x 512 :            10879150    12488043    11802334    10928085     8122452     6383542
>>> >> 8192 x 8192 :            114927      117404      111772      111641       86301       60819
>>> >> 131072 x 131072 :          1757        2062        1873        1650        1165         885
>>> >> 2097152 x 2097152 :        52.5        63.4        44.1        44.1        36.8        30.1
>>>
>>> >> So as you can see, the times have changed *MUCH* more for *both*
>>> >> projects than the current difference between them. In fact
>>> >> multiplication speed (the most important speed by far) has nearly
>>> >> doubled in the past year, right across the board. I think with GMP 5
>>> >> and MPIR 1.3 it really has doubled.
>>>
>>> >> So it is the improvement *over time* which is the important thing.
>>> >> You'll also note that the projects have leapfrogged each other, MPIR
>>> >> 0.9.0 beating GMP 4.2.1, GMP 4.3.0 beating MPIR 1.1.2, MPIR 1.2.0
>>> >> beating GMP 4.3.0 and so on. So it does depend at what time you do the
>>> >> comparison whether one or the other is better.
>>>
>>> >> Also, if you look at the times Case provided, what you said about only
>>> >> multiplication above 100000 bits being faster is not really true.
>>> >> There are other places where MPIR beats GMP, even on that system. Also
>>> >> Case's benchmark only tests certain functionality. The full benchmark
>>> >> that we were running earlier shows plenty of other improvements of
>>> >> MPIR over GMP and is intended to give a much better overall guide.
>>> >> Case is showing us benchmarks that he is personally very interested
>>> >> in, and so that will be important for us to look at improving.
>>>
>>> >> Some of the program benchmarks that we have in our full benchmark
>>> >> suite tell a completely different story, putting MPIR well ahead for
>>> >> those sorts of things. They show that in an overall program, we do
>>> >> quite well.
>>>
>>> >> As I said, it is a mixed bag. Neither is showing clear superiority at
>>> >> this point. However, I will accept that on your 32 bit system, the
>>> >> assembly code is better optimised in GMP. That is definitely something
>>> >> we should look at improving further.
>>>
>>> >> Of course that is not completely trivial to do though. You are welcome
>>> >> to give it a go. I believe you will very quickly find that just about
>>> >> everything you try will make it slower. The assembly optimisation has
>>> >> got to such an art these days it cannot be done by hand. We have
>>> >> special optimisation tools for doing it, and it takes large amounts of
>>> >> CPU time, and human hours, to do the optimisation work. Progressively,
>>> >> over time, all the code will get optimised, but it is a long process!
>>>
>>> >> Bill.
>>>
>>> >> 2010/1/10 Gianrico Fini <gianrico.f...@gmail.com>:
>>>
>>> >> > It seems that on your platform too (also 32 bits?) MPIR is faster
>>> >> > for only one thing: multiplication (or squaring) above 100000 digits,
>>> >> > by up to 30%.
>>> >> > And it is slower almost everywhere else, in places by 100% or more...
>>>
>>> >> > This strengthens my decision...
>>>
>>> >> > Gian.
>>>
>>> >> > On 10 Jan, 18:47, Case Vanhorsen <cas...@gmail.com> wrote:
>>> >> >> I'll toss in my benchmark results. :-)
>>>
>>> >> >>                            GMPY performance benchmark
>>>
>>> >> >> Decimal string to mpz:      MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000021 sec      0.00000022 sec
>>> >> >>       100 digits:      0.00000063 sec      0.00000066 sec
>>> >> >>       500 digits:      0.00000318 sec      0.00000302 sec
>>> >> >>      1000 digits:      0.00000716 sec      0.00000693 sec
>>> >> >>      5000 digits:      0.00008661 sec      0.00006298 sec
>>> >> >>     10000 digits:      0.00026616 sec      0.00016775 sec
>>> >> >>     50000 digits:      0.00265514 sec      0.00168555 sec
>>> >> >>    100000 digits:      0.00651324 sec      0.00444604 sec
>>> >> >>    500000 digits:      0.04866513 sec      0.03830050 sec
>>> >> >>   1000000 digits:      0.11429363 sec      0.09162606 sec
>>> >> >>  10000000 digits:      2.31600404 sec      1.59257817 sec
>>>
>>> >> >> Mpz to decimal string:      MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000034 sec      0.00000035 sec
>>> >> >>       100 digits:      0.00000105 sec      0.00000101 sec
>>> >> >>       500 digits:      0.00000717 sec      0.00000589 sec
>>> >> >>      1000 digits:      0.00001586 sec      0.00001262 sec
>>> >> >>      5000 digits:      0.00014800 sec      0.00010783 sec
>>> >> >>     10000 digits:      0.00041150 sec      0.00029588 sec
>>> >> >>     50000 digits:      0.00420932 sec      0.00338085 sec
>>> >> >>    100000 digits:      0.01185473 sec      0.00920948 sec
>>> >> >>    500000 digits:      0.12125288 sec      0.08355007 sec
>>> >> >>   1000000 digits:      0.31727976 sec      0.20738387 sec
>>> >> >>  10000000 digits:      7.70821309 sec      3.94376493 sec
>>>
>>> >> >> Mpz addition:               MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000010 sec      0.00000009 sec
>>> >> >>       100 digits:      0.00000010 sec      0.00000010 sec
>>> >> >>       500 digits:      0.00000012 sec      0.00000011 sec
>>> >> >>      1000 digits:      0.00000014 sec      0.00000013 sec
>>> >> >>      5000 digits:      0.00000051 sec      0.00000050 sec
>>> >> >>     10000 digits:      0.00000073 sec      0.00000073 sec
>>> >> >>     50000 digits:      0.00000430 sec      0.00000429 sec
>>> >> >>    100000 digits:      0.00000822 sec      0.00000818 sec
>>> >> >>    500000 digits:      0.00003971 sec      0.00003959 sec
>>> >> >>   1000000 digits:      0.00007838 sec      0.00007884 sec
>>> >> >>  10000000 digits:      0.00357354 sec      0.00354370 sec
>>> >> >> 100000000 digits:      0.05413541 sec      0.05324940 sec
>>>
>>> >> >> 1NxN mpz multiplication:    MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000009 sec      0.00000009 sec
>>> >> >>       100 digits:      0.00000017 sec      0.00000018 sec
>>> >> >>       500 digits:      0.00000124 sec      0.00000126 sec
>>> >> >>      1000 digits:      0.00000414 sec      0.00000378 sec
>>> >> >>      5000 digits:      0.00004730 sec      0.00004805 sec
>>> >> >>     10000 digits:      0.00012850 sec      0.00012088 sec
>>> >> >>     50000 digits:      0.00123085 sec      0.00109137 sec
>>> >> >>    100000 digits:      0.00290135 sec      0.00280582 sec
>>> >> >>    500000 digits:      0.01663006 sec      0.01763764 sec
>>> >> >>   1000000 digits:      0.03379822 sec      0.03994881 sec
>>> >> >>  10000000 digits:      0.68572044 sec      0.61115754 sec
>>> >> >> 100000000 digits:      6.44622898 sec      7.93841791 sec
>>>
>>> >> >> 5NxN mpz multiplication:    MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000011 sec      0.00000010 sec
>>> >> >>       100 digits:      0.00000038 sec      0.00000040 sec
>>> >> >>       500 digits:      0.00000604 sec      0.00000652 sec
>>> >> >>      1000 digits:      0.00002064 sec      0.00001863 sec
>>> >> >>      5000 digits:      0.00023417 sec      0.00021708 sec
>>> >> >>     10000 digits:      0.00064239 sec      0.00058681 sec
>>> >> >>     50000 digits:      0.00608666 sec      0.00436574 sec
>>> >> >>    100000 digits:      0.00847080 sec      0.00917852 sec
>>> >> >>    500000 digits:      0.05356821 sec      0.06811212 sec
>>> >> >>   1000000 digits:      0.12863311 sec      0.14648414 sec
>>> >> >>  10000000 digits:      2.27829909 sec      2.17810798 sec
>>> >> >> 100000000 digits:     21.30186605 sec     27.38823199 sec
>>>
>>> >> >> 17NxN mpz multiplication:   MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000010 sec      0.00000011 sec
>>> >> >>       100 digits:      0.00000113 sec      0.00000108 sec
>>> >> >>       500 digits:      0.00002057 sec      0.00002183 sec
>>> >> >>      1000 digits:      0.00007094 sec      0.00006423 sec
>>> >> >>      5000 digits:      0.00081254 sec      0.00071725 sec
>>> >> >>     10000 digits:      0.00217992 sec      0.00197989 sec
>>> >> >>     50000 digits:      0.02072028 sec      0.01620061 sec
>>> >> >>    100000 digits:      0.02676870 sec      0.03553003 sec
>>> >> >>    500000 digits:      0.20828125 sec      0.23191699 sec
>>> >> >>   1000000 digits:      0.42618978 sec      0.52746260 sec
>>> >> >>  10000000 digits:      5.84609008 sec      7.77125812 sec
>>> >> >> 100000000 digits:     74.05822110 sec    100.53587508 sec
>>>
>>> >> >> 2N/N mpz quotient:          MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000018 sec      0.00000018 sec
>>> >> >>       100 digits:      0.00000041 sec      0.00000037 sec
>>> >> >>       500 digits:      0.00000234 sec      0.00000203 sec
>>> >> >>      1000 digits:      0.00000729 sec      0.00000638 sec
>>> >> >>      5000 digits:      0.00009662 sec      0.00009747 sec
>>> >> >>     10000 digits:      0.00029030 sec      0.00029359 sec
>>> >> >>     50000 digits:      0.00329851 sec      0.00279975 sec
>>> >> >>    100000 digits:      0.00912671 sec      0.00663861 sec
>>> >> >>    500000 digits:      0.07756643 sec      0.04376046 sec
>>> >> >>   1000000 digits:      0.18805614 sec      0.10166769 sec
>>> >> >>  10000000 digits:      3.46835899 sec      1.65955496 sec
>>> >> >> 100000000 digits:     57.28032804 sec     21.36209702 sec
>>>
>>> >> >> 5N/N mpz quotient:          MPIR 1.3.0           GMP 5.0.0
>>> >> >>        10 digits:      0.00000021 sec      0.00000020 sec
>>> >> >>       100 digits:      0.00000095 sec      0.00000085 sec
>>> >> >>       500 digits:      0.00000846 sec      0.00000747 sec
>>> >> >>      1000 digits:      0.00002843 sec
>>>
>>> >> ...
>>>
>>> > --
>>> > You received this message because you are subscribed to the Google Groups 
>>> > "mpir-devel" group.
>>> > To post to this group, send email to mpir-de...@googlegroups.com.
>>> > To unsubscribe from this group, send email to 
>>> > mpir-devel+unsubscr...@googlegroups.com.
>>> > For more options, visit this group at
>>> > http://groups.google.com/group/mpir-devel?hl=en.
>>
>

