I had an even better idea. Your p6/mmx supports, from what I can tell, sse2. Therefore you can probably try:

./configure --build=pentium4-unknown-linux-gnu ABI=32

This may actually speed things up for you! (No guarantees, but there
is a good chance it will work).

If you are not using linux, just replace the p6mmx with pentium4 in
whatever you currently have (run ./config.guess to see what you
currently have).

Bill.

2010/1/10 Bill Hart <goodwillh...@googlemail.com>:
> Thanks! I think it is more likely to be the tuning than the compiler
> in this case. But that is only a guess, based on what I know of the
> way the FFT code works.
>
> I see that all the new FFT TABLE2 tuning values are missing entirely
> (not your fault, but ours). What you might like to try, when you find
> some time is take all the tuning values from the last lines of
> mpn/x86/pentium4/sse2/gmp-mparam.h, including all the TABLE2 values
> and inserting them in your gmp-mparam.h file. That *should* make the
> assert problem go away.
>
> It might be a good idea for us to put these values into all the
> gmp-mparam.h files. Even if they are not completely optimal, they will
> be better than wrong values, which seems to be what we have at the
> moment!
>
> Bill.
>
> 2010/1/10 Gianrico Fini <gianrico.f...@gmail.com>:
>> Yes, of course.
>>
>> $ ls -lrt gmp-mparam.h
>> lrwxrwxrwx 1 gian gian 27 2010-01-10 17:19 gmp-mparam.h -> mpn/x86/p6/mmx/gmp-mparam.h
>> $ cat gmp-mparam.h
>> /* Intel P6/mmx gmp-mparam.h -- Compiler/machine parameter header file.
>>
>> Copyright 1991, 1993, 1994, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006
>> Free Software Foundation, Inc.
>>
>> This file is part of the GNU MP Library.
>>
>> The GNU MP Library is free software; you can redistribute it and/or modify
>> it under the terms of the GNU Lesser General Public License as published by
>> the Free Software Foundation; either version 2.1 of the License, or (at your
>> option) any later version.
>>
>> The GNU MP Library is distributed in the hope that it will be useful, but
>> WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>> or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
>> License for more details.
>>
>> You should have received a copy of the GNU Lesser General Public License
>> along with the GNU MP Library; see the file COPYING.LIB. If not, write to
>> the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
>> MA 02110-1301, USA. */
>>
>>
>> #define BITS_PER_MP_LIMB 32
>> #define BYTES_PER_MP_LIMB 4
>>
>>
>> /* NOTE: In a fat binary build SQR_KARATSUBA_THRESHOLD here cannot be more
>> than the value in mpn/x86/p6/gmp-mparam.h. The latter is used as a hard
>> limit in mpn/x86/p6/sqr_basecase.asm. */
>>
>>
>> /* 1867 MHz Pentium 3/M */
>>
>> /* Generated by tuneup.c, 2006-03-21, gcc 3.4 */
>>
>> #define MUL_KARATSUBA_THRESHOLD 22
>> #define MUL_TOOM3_THRESHOLD 73
>> #define MUL_TOOM4_THRESHOLD 202
>> #define MUL_TOOM7_THRESHOLD 298
>>
>> #define SQR_BASECASE_THRESHOLD 0 /* always (native) */
>> #define SQR_KARATSUBA_THRESHOLD 40
>> #define SQR_TOOM3_THRESHOLD 101
>> #define SQR_TOOM4_THRESHOLD 224
>> #define SQR_TOOM7_THRESHOLD 450
>>
>> #define MULLOW_BASECASE_THRESHOLD 7
>> #define MULLOW_DC_THRESHOLD 48
>> #define MULLOW_MUL_THRESHOLD 6203
>>
>> #define MULHIGH_BASECASE_THRESHOLD 9
>> #define MULHIGH_DC_THRESHOLD 51
>> #define MULHIGH_MUL_THRESHOLD 6142
>>
>> #define MULMOD_2EXPM1_THRESHOLD 18
>>
>> #define DIV_SB_PREINV_THRESHOLD 0 /* always */
>> #define DIV_DC_THRESHOLD 54
>> #define POWM_THRESHOLD 93
>> #define FAC_UI_THRESHOLD 2437
>>
>> #define GCD_ACCEL_THRESHOLD 52
>> #define GCDEXT_THRESHOLD 45
>> #define JACOBI_BASE_METHOD 1
>>
>> #define USE_PREINV_DIVREM_1 1 /* native */
>> #define USE_PREINV_MOD_1 1 /* native */
>> #define DIVREM_2_THRESHOLD 0 /* always */
>> #define DIVEXACT_1_THRESHOLD 0 /* always (native) */
>> #define MODEXACT_1_ODD_THRESHOLD 0 /* always (native) */
>> #define MOD_1_1_THRESHOLD 32
>> #define MOD_1_2_THRESHOLD 54
>> #define MOD_1_3_THRESHOLD 57
>> #define DIVREM_HENSEL_QR_1_THRESHOLD 5
>> #define RSH_DIVREM_HENSEL_QR_1_THRESHOLD 4
>> #define DIVREM_EUCLID_HENSEL_THRESHOLD 23
>>
>> #define ROOTREM_THRESHOLD 6
>>
>> #define GET_STR_DC_THRESHOLD 19
>> #define GET_STR_PRECOMPUTE_THRESHOLD 25
>> #define SET_STR_THRESHOLD 3296
>>
>> #define MUL_FFT_TABLE { 464, 928, 1920, 4608, 10240, 24576, 98304, 393216, 1572864, 6291456, 0 }
>> #define MUL_FFT_MODF_THRESHOLD 480
>> #define MUL_FFT_THRESHOLD 3328
>>
>> #define SQR_FFT_TABLE { 464, 928, 1920, 5632, 14336, 40960, 98304, 393216, 1572864, 6291456, 0 }
>> #define SQR_FFT_MODF_THRESHOLD 480
>> #define SQR_FFT_THRESHOLD 3840
>>
>> I left the initial comment and the last lines untouched.
>> By the way... the problem can be triggered by the compiler ... because
>> at first I compiled (gcc 4.3) and everything worked, then I both
>> retuned _and_ changed compiler (to gcc 4.4), after recompilation I got
>> the error.
>>
>> On 10 Jan, 23:26, Bill Hart <goodwillh...@googlemail.com> wrote:
>>> Sure, but as I mentioned, right at the start, the speed of
>>> multiplication is critical for almost everything. That is why it is
>>> the most important benchmark.
>>>
>>> And the speed of multiplication is critically dependent on the speed
>>> of the basecase assembly case, which, on your machine, is slower in
>>> MPIR, by about a third. So definitely, you should expect to see a
>>> performance improvement for many things by using GMP (whatever
>>> version) just on this basis. For your machine, they had previously
>>> sped up this assembly function and we had not, so even GMP 4.3.2 will
>>> be faster.
>>>
>>> Anyhow, while you are still there, can you please post the contents of
>>> the gmp-mparam.h file that you used when the fac_ui test failed the
>>> assert. That would be a great help to us!
>>>
>>> Bill.
>>>
>>> 2010/1/10 Gianrico Fini <gianrico.f...@gmail.com>:
>>>
>>> > Let's abandon GMP5 alone for a while.
>>>
>>> > On my CPU, GMP432 get this values:
>>> > Category base=> 1546, 1104
>>> > Program rsa (weight 1.00) => 398, 284
>>> > Program pi (weight 1.00) => 4.46, 3.18
>>> > Program bpsw (weight 1.00) => 2.31, 1.65
>>> > Program wagstaff (weight 1.00) => 10.7, 7.62
>>> > Program mersenne (weight 1.00) => 0.959,0.685
>>> > Program fermat (weight 1.00) => 70.4, 50.3 => 12.0, 8.56
>>>
>>> > MPIR-130rc2 get this values:
>>> > Category base=> 1394, 996
>>> > Program rsa (weight 1.00) => 316, 226
>>> > Program pi (weight 1.00) => 4.03, 2.88
>>> > Program bpsw (weight 1.00) => 2.16, 1.54
>>> > Program wagstaff (weight 1.00) => 7.69, 5.49
>>> > Program mersenne (weight 1.00)=> 2.43, 1.74
>>> > Program fermat (weight 1.00) => 234, 167 => 15.1, 10.8
>>>
>>> > They are strange... GMP432 get slightly better values almost
>>> > everywhere: overall for basic operations, for rsa, for pi, for bpsw,
>>> > for wagstaff... and get terribly worst value for mersenne and fermat
>>> > only.
>>> > You speak about applications, well on my old 32-bits machine GMP seems
>>> > faster for almost all applications except two... I'll investigate.
>>>
>>> > Gian.
>>>
>>> > On 10 Jan, 22:18, Bill Hart <goodwillh...@googlemail.com> wrote:
>>> >> You are of course welcome to choose whichever package best meets your
>>> >> needs. And indeed on your particular system, it seems GMP may well do
>>> >> that for you at present.
>>>
>>> >> One thing you should bear in mind however. Here are some times as they
>>> >> have changed over the past year and a half:
>>>
>>> >> K8
>>>
>>> >> Multiplication:      GMP 4.3.0  MPIR 1.2.0  MPIR 1.1.2  MPIR 1.0.0  MPIR 0.9.0  GMP 4.2.1
>>> >> ==================   =========  ==========  ==========  ==========  ==========  =========
>>> >> 128 x 128         :   52766506    53794646    51802252    35856598    37299412   25896136
>>> >> 512 x 512         :   10879150    12488043    11802334    10928085     8122452    6383542
>>> >> 8192 x 8192       :     114927      117404      111772      111641       86301      60819
>>> >> 131072 x 131072   :       1757        2062        1873        1650        1165        885
>>> >> 2097152 x 2097152 :       52.5        63.4        44.1        44.1        36.8       30.1
>>>
>>> >> So as you can see, the times have changed *MUCH* more for *both*
>>> >> projects than the current difference between them. In fact
>>> >> multiplication speed (the most important speed by far) has nearly
>>> >> doubled in the past year, right across the board. I think with GMP 5
>>> >> and MPIR 1.3 it really has doubled.
>>>
>>> >> So it is the improvement *over time* which is the important thing.
>>> >> You'll also note that the projects have leapfrogged each other, MPIR
>>> >> 0.9.0 beating GMP 4.2.1, GMP 4.3.0 beating MPIR 1.1.2, MPIR 1.2.0
>>> >> beating GMP 4.3.0 and so on. So it does depend at what time you do the
>>> >> comparison whether one or the other is better.
>>>
>>> >> Also, if you look at the times Case provided, what you said about only
>>> >> multiplication above 100000 bits being faster is not really true.
>>> >> There are other places where MPIR beats GMP, even on that system. Also
>>> >> Case's benchmark only tests certain functionality. The full benchmark
>>> >> that we were running earlier shows plenty of other improvements of
>>> >> MPIR over GMP and is intended to give a much better overall guide.
>>> >> Case is showing us benchmarks that he is personally very interested
>>> >> in, and so that will be important for us to look at improving.
>>>
>>> >> Some of the program benchmarks that we have in our full benchmark
>>> >> suite tell a completely different story, putting MPIR well ahead for
>>> >> those sorts of things. They show that in an overall program, we do
>>> >> quite well.
>>>
>>> >> As I said, it is a mixed bag. Neither is showing clear superiority at
>>> >> this point. However, I will accept that on your 32 bit system, the
>>> >> assembly code is better optimised in GMP. That is definitely something
>>> >> we should look at improving further.
>>>
>>> >> Of course that is not completely trivial to do though. You are welcome
>>> >> to give it a go. I believe you will very quickly find that just about
>>> >> everything you try will make it slower. The assembly optimisation has
>>> >> got to such an art these days it cannot be done by hand. We have
>>> >> special optimisation tools for doing it, and it takes large amounts of
>>> >> CPU time, and human hours, to do the optimisation work. Progressively,
>>> >> over time, all the code will get optimised, but it is a long process!
>>>
>>> >> Bill.
>>>
>>> >> 2010/1/10 Gianrico Fini <gianrico.f...@gmail.com>:
>>>
>>> >> > It seems that also on your platform (32 bits you too?) MPIR is faster
>>> >> > only for one thing: multiplication (or squaring) above 100000 digits,
>>> >> > up to 30%.
>>> >> > And slower almost everywhere... somewhere +100% or more...
>>>
>>> >> > This strengthen my decision...
>>>
>>> >> > Gian.
>>>
>>> >> > On 10 Jan, 18:47, Case Vanhorsen <cas...@gmail.com> wrote:
>>> >> >> I'll toss in my benchmark results. :-)
>>>
>>> >> >> GMPY performance benchmark
>>>
>>> >> >> Decimal string to mpz:      MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000021 sec    0.00000022 sec
>>> >> >>       100 digits:       0.00000063 sec    0.00000066 sec
>>> >> >>       500 digits:       0.00000318 sec    0.00000302 sec
>>> >> >>      1000 digits:       0.00000716 sec    0.00000693 sec
>>> >> >>      5000 digits:       0.00008661 sec    0.00006298 sec
>>> >> >>     10000 digits:       0.00026616 sec    0.00016775 sec
>>> >> >>     50000 digits:       0.00265514 sec    0.00168555 sec
>>> >> >>    100000 digits:       0.00651324 sec    0.00444604 sec
>>> >> >>    500000 digits:       0.04866513 sec    0.03830050 sec
>>> >> >>   1000000 digits:       0.11429363 sec    0.09162606 sec
>>> >> >>  10000000 digits:       2.31600404 sec    1.59257817 sec
>>>
>>> >> >> Mpz to decimal string:      MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000034 sec    0.00000035 sec
>>> >> >>       100 digits:       0.00000105 sec    0.00000101 sec
>>> >> >>       500 digits:       0.00000717 sec    0.00000589 sec
>>> >> >>      1000 digits:       0.00001586 sec    0.00001262 sec
>>> >> >>      5000 digits:       0.00014800 sec    0.00010783 sec
>>> >> >>     10000 digits:       0.00041150 sec    0.00029588 sec
>>> >> >>     50000 digits:       0.00420932 sec    0.00338085 sec
>>> >> >>    100000 digits:       0.01185473 sec    0.00920948 sec
>>> >> >>    500000 digits:       0.12125288 sec    0.08355007 sec
>>> >> >>   1000000 digits:       0.31727976 sec    0.20738387 sec
>>> >> >>  10000000 digits:       7.70821309 sec    3.94376493 sec
>>>
>>> >> >> Mpz addition:               MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000010 sec    0.00000009 sec
>>> >> >>       100 digits:       0.00000010 sec    0.00000010 sec
>>> >> >>       500 digits:       0.00000012 sec    0.00000011 sec
>>> >> >>      1000 digits:       0.00000014 sec    0.00000013 sec
>>> >> >>      5000 digits:       0.00000051 sec    0.00000050 sec
>>> >> >>     10000 digits:       0.00000073 sec    0.00000073 sec
>>> >> >>     50000 digits:       0.00000430 sec    0.00000429 sec
>>> >> >>    100000 digits:       0.00000822 sec    0.00000818 sec
>>> >> >>    500000 digits:       0.00003971 sec    0.00003959 sec
>>> >> >>   1000000 digits:       0.00007838 sec    0.00007884 sec
>>> >> >>  10000000 digits:       0.00357354 sec    0.00354370 sec
>>> >> >> 100000000 digits:       0.05413541 sec    0.05324940 sec
>>>
>>> >> >> 1NxN mpz multiplication:    MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000009 sec    0.00000009 sec
>>> >> >>       100 digits:       0.00000017 sec    0.00000018 sec
>>> >> >>       500 digits:       0.00000124 sec    0.00000126 sec
>>> >> >>      1000 digits:       0.00000414 sec    0.00000378 sec
>>> >> >>      5000 digits:       0.00004730 sec    0.00004805 sec
>>> >> >>     10000 digits:       0.00012850 sec    0.00012088 sec
>>> >> >>     50000 digits:       0.00123085 sec    0.00109137 sec
>>> >> >>    100000 digits:       0.00290135 sec    0.00280582 sec
>>> >> >>    500000 digits:       0.01663006 sec    0.01763764 sec
>>> >> >>   1000000 digits:       0.03379822 sec    0.03994881 sec
>>> >> >>  10000000 digits:       0.68572044 sec    0.61115754 sec
>>> >> >> 100000000 digits:       6.44622898 sec    7.93841791 sec
>>>
>>> >> >> 5NxN mpz multiplication:    MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000011 sec    0.00000010 sec
>>> >> >>       100 digits:       0.00000038 sec    0.00000040 sec
>>> >> >>       500 digits:       0.00000604 sec    0.00000652 sec
>>> >> >>      1000 digits:       0.00002064 sec    0.00001863 sec
>>> >> >>      5000 digits:       0.00023417 sec    0.00021708 sec
>>> >> >>     10000 digits:       0.00064239 sec    0.00058681 sec
>>> >> >>     50000 digits:       0.00608666 sec    0.00436574 sec
>>> >> >>    100000 digits:       0.00847080 sec    0.00917852 sec
>>> >> >>    500000 digits:       0.05356821 sec    0.06811212 sec
>>> >> >>   1000000 digits:       0.12863311 sec    0.14648414 sec
>>> >> >>  10000000 digits:       2.27829909 sec    2.17810798 sec
>>> >> >> 100000000 digits:      21.30186605 sec   27.38823199 sec
>>>
>>> >> >> 17NxN mpz multiplication:   MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000010 sec    0.00000011 sec
>>> >> >>       100 digits:       0.00000113 sec    0.00000108 sec
>>> >> >>       500 digits:       0.00002057 sec    0.00002183 sec
>>> >> >>      1000 digits:       0.00007094 sec    0.00006423 sec
>>> >> >>      5000 digits:       0.00081254 sec    0.00071725 sec
>>> >> >>     10000 digits:       0.00217992 sec    0.00197989 sec
>>> >> >>     50000 digits:       0.02072028 sec    0.01620061 sec
>>> >> >>    100000 digits:       0.02676870 sec    0.03553003 sec
>>> >> >>    500000 digits:       0.20828125 sec    0.23191699 sec
>>> >> >>   1000000 digits:       0.42618978 sec    0.52746260 sec
>>> >> >>  10000000 digits:       5.84609008 sec    7.77125812 sec
>>> >> >> 100000000 digits:      74.05822110 sec  100.53587508 sec
>>>
>>> >> >> 2N/N mpz quotient:          MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000018 sec    0.00000018 sec
>>> >> >>       100 digits:       0.00000041 sec    0.00000037 sec
>>> >> >>       500 digits:       0.00000234 sec    0.00000203 sec
>>> >> >>      1000 digits:       0.00000729 sec    0.00000638 sec
>>> >> >>      5000 digits:       0.00009662 sec    0.00009747 sec
>>> >> >>     10000 digits:       0.00029030 sec    0.00029359 sec
>>> >> >>     50000 digits:       0.00329851 sec    0.00279975 sec
>>> >> >>    100000 digits:       0.00912671 sec    0.00663861 sec
>>> >> >>    500000 digits:       0.07756643 sec    0.04376046 sec
>>> >> >>   1000000 digits:       0.18805614 sec    0.10166769 sec
>>> >> >>  10000000 digits:       3.46835899 sec    1.65955496 sec
>>> >> >> 100000000 digits:      57.28032804 sec   21.36209702 sec
>>>
>>> >> >> 5N/N mpz quotient:          MPIR 1.3.0        GMP 5.0.0
>>> >> >>        10 digits:       0.00000021 sec    0.00000020 sec
>>> >> >>       100 digits:       0.00000095 sec    0.00000085 sec
>>> >> >>       500 digits:       0.00000846 sec    0.00000747 sec
>>> >> >>      1000 digits:       0.00002843 sec
>>>
>>> >> ...
>>>
>>> >> read more
>>>
>>> > --
>>> > You received this message because you are subscribed to the Google Groups
>>> > "mpir-devel" group.
>>> > To post to this group, send email to mpir-de...@googlegroups.com.
>>> > To unsubscribe from this group, send email to
>>> > mpir-devel+unsubscr...@googlegroups.com.
>>> > For more options, visit this group
>>> > at http://groups.google.com/group/mpir-devel?hl=en.