On Friday 04 March 2011 10:56:08 Jeff Gilchrist wrote:
> On Thu, Mar 3, 2011 at 5:04 AM, Jason <ja...@njkfrudils.plus.com> wrote:
> > OK , but how about tuning and timings?
> > Does anyone here have any experience in this ? , or any recommendations
>
> I don't think I have ever actually tested things in virtual machines
> such as tuning/timings. Since you still have your 32bit "native"
> install, I would suggest doing some tuning, benchmarking and record
> the timings. Set up a 32bit VM in your 64bit Linux environment and
> then re-run the tests on the same machine to see if there is a drastic
> change.
>
> Jeff.
Trying VirtualBox on my trusty old K8, which has NO specific virtualization enhancements. The host system is 64-bit Linux, the guest is 32-bit Linux, and I did 10 tries.

For native 32-bit Linux the mpn cycle-count benchmark results are in the attached "real", and for the virtual Linux we get "virtual". The real system gives a very consistent set of values, whereas the virtual system's timings vary a lot, sometimes even coming out "faster", which means we can't even take the smallest value :(

For make tune, the parameters for the real system are in "tune_real" and for the virtual Linux in "tune_virtual". Again the real system is fairly consistent (the ones that aren't either have very similar slopes at the crossover or our tuning is slightly wrong), and the virtual system's tuning is not very useful.

I don't plan to write any 32-bit code, so I don't care if I can't get reliable timings there. The only things my K8 is going to do are 32/64-bit testing, and timings and tuning for 64-bit on its native 64-bit OS. I suppose I could install a native Linux distro which has both 32- and 64-bit in the FULL toolchain (which ones have this?). I think I may try a virtual Solaris on the K8 for testing purposes (I can't test the BSDs on it as they require the virtualization extensions). The K8 is not "made" anymore and the more modern CPUs have virtualization extensions (so I can test the BSDs as well), so I'll give the Nehalem a go and see if it makes the timings better.

I also want to change the way we do our fake CPU testing. At the moment the fake CPU testing is exactly like the proper test except that:
1) CPU detection is overridden
2) it doesn't work for fat builds
3) it doesn't test for instruction extensions from asm or the compiler
4) a subtle autotools bug could hide other differences, as we override the autotools build mechanism

We can't trap on the cpuid instruction, but we can replace it with a macro to simulate it. That would leave item 3 above as the only difference from a real chip.
Jason

--
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
real (native 32-bit Linux):
cpu                          k8    k8    k8    k8    k8    k8    k8    k8    k8
add_n                      2030  2030  2030  2030  2030  2030  2030  2030  2030
sub_n                      2030  2030  2030  2030  2030  2030  2030  2030  2030
mul_1                      3092  3092  3092  3092  3092  3092  3092  3092  3092
addmul_1                   3040  3039  3040  3041  3039  3040  3039  3040  3039
submul_1                   3039  3040  3039  3039  3040  3041  3040  3040  3044
mul_2
addmul_2
submul_2
addadd_n
addsub_n
subadd_n
lshift                     1254  1253  1253  1253  1254  1253  1253  1254  1253
rshift                     1251  1250  1250  1250  1250  1250  1250  1251  1250
lshift2                    1255  1255  1252  1252  1252  1252  1252  1254  1252
rshift2                    1250  1250  1250  1250  1250  1250  1250  1250  1250
lshift1                    1255  1254  1252  1252  1252  1252  1254  1252  1253
rshift1                    1250  1250  1250  1250  1250  1250  1250  1250  1250
addlsh1_n
sublsh1_n
addlsh_n
sublsh_n
inclsh_n
declsh_n
rsh1add_n
rsh1sub_n
sumdiff_n
store                      2014  2014  2014  2014  2014  2014  2014  2014  2014
copyi                       781   781   781   781   781   781   781   781   781
copyd                       781   781   781   781   781   781   781   781   781
rsblsh1_n
addlsh2_n
rsblsh2_n
popcount                   5034  5037  5034  5037  5037  5037  5034  5037  5037
hamdist                    5866  6031  5866  6031  5865  5952  5964  5865  5951
com                        1025  1026  1025  1025  1026  1025  1025  1025  1026
not                        1026  1026  1026  1026  1026  1026  1026  1026  1026
and_n                      2221  2221  2221  2221  2221  2221  2221  2221  2221
xor_n                      2221  2221  2221  2221  2221  2221  2221  2221  2221
ior_n                      2221  2221  2222  2221  2221  2221  2221  2221  2221
nand_n                     3014  3014  3014  3014  3014  3014  3014  3014  3014
nior_n                     3014  3014  3014  3014  3014  3014  3014  3014  3014
xnor_n                     3014  3014  3014  3014  3014  3014  3014  3014  3014
andn_n                     3014  3014  3014  3014  3014  3014  3014  3014  3014
iorn_n                     3014  3014  3014  3014  3014  3014  3014  3014  3014
lshiftc
divexact_byff              3361  3361  3361  3361  3361  3361  3361  3361  3361
divexact_byfobm1           5379  5379  5379  5379  5380  5379  5379  5380  5379
divexact_by3               6025  6025  6025  6025  6025  6025  6025  6025  6025
divexact_1                 9098  9098  9098  9098  9098  9098  9098  9098  9096
modexact_1c_odd            7038  7038  7038  7038  7038  7038  7038  7038  7038
add_err1_n                13603 13570 13578 13569 13569 13602 13602 13603 13579
sub_err1_n                14086 13910 13909 14072 14086 14070 13892 14070 13892
divrem_euclidean_qr_1     19792 19815 19792 19792 19792 19792 19815 19792 19785
divrem_euclidean_qr_2     44376 44365 44383 44364 44364 44399 44383 44375 44384
divrem_euclidean_r_1       4908  4908  4911  4908  4908  4908  4911  4906  4906
divrem_1                  15654 15666 15666 15666 15671 15671 15671 15666 15666
divrem_2                  45115 45099 45099 45115 45098 45104 45099 45100 45110
divrem_hensel_qr_1        10066 10066 10066 10066 10066 10066 10066 10066 10066
divrem_hensel_qr_1_1       8040  8040  8040  8040  8040  8040  8040  8040  8040
divrem_hensel_qr_1_2      10052 10052 10052 10052 10052 10052 10052 10052 10052
divrem_hensel_r_1          8041  8041  8041  8041  8041  8041  8041  8041  8041
divrem_hensel_rsh_qr_1    11046 11046 11046 11045 11046 11046 11045 11046 11046
rsh_divrem_hensel_qr_1    12070 12070 12070 12075 12074 12074 12070 12069 12074
rsh_divrem_hensel_qr_1_1  10054 10057 10054 10054 10054 10054 10054 10057 10054
rsh_divrem_hensel_qr_1_2  12058 12060 12064 12060 12064 12060 12060 12064 12058
mod_1_1                    7028  7028  7027  7027  7028  7027  7027  7027  7027
mod_1_2                    7728  7728  7728  7727  7727  7725  7728  7727  7728
mod_1_3                    4761  4762  4760  4762  4761  4760  4760  4760  4761
mod_1_4
mod_34lsub1                1035  1035  1034  1034  1034  1034  1034  1034  1034
virtual (32-bit Linux guest in VirtualBox):
cpu                          k8    k8    k8    k8    k8    k8    k8    k8    k8
add_n                      2482  2114  2292  2299  2044  2280  2092  2231  2149
sub_n                      2401  2168  2259  2342  2052  2691  2225  2027  2725
mul_1                      3570  3618  2720  3472  3718  3057  3674  3558  3052
addmul_1                   3235  3182  3703  3076  2567  3404  3254  3439  3544
submul_1                   3406  3547  3423  3065  3173  3425  3971  3168  3556
mul_2
addmul_2
submul_2
addadd_n
addsub_n
subadd_n
lshift                     1511  1539  1397  1462  1392  1420  1385  1464  1515
rshift                     1561  1335  1391  1394  1674  1370  1208  1429  1377
lshift2                    1474  1364  1298  1345  1394  1466  1573  1379  1541
rshift2                    1397  1197  1375  1360  1298  1475  1273  1507  1018
lshift1                    1385  1448  1289  1472  1368  1236  1234  1332  1589
rshift1                    1260  1419  1514  1380  1427  1275  1428  1359  1417
addlsh1_n
sublsh1_n
addlsh_n
sublsh_n
inclsh_n
declsh_n
rsh1add_n
rsh1sub_n
sumdiff_n
store                      2287  2132  2645  2063  2142  2198  2040  2378  2043
copyi                       869   870   842   588   911   972   981   960   784
copyd                       898   930   891   851   896   873   986   840   858
rsblsh1_n
addlsh2_n
rsblsh2_n
popcount                   5521  5728  4866  5957  6162  5747  5649  5869  6411
hamdist                    6949  6323  6860  6696  6756  6650  6372  5071  6051
com                        1263  1058  1315  1156  1102  1044   935  1160  1086
not                        1242  1170  1136  1243  1155  1104  1194  1254  1199
and_n                      2412  2311  2253  2835  2519  2341  2572  2524  2427
xor_n                      2485  2348  2656  2478  2388  2555  2513  2588  2480
ior_n                      2534  2481  2710  2735  2232  2596  2595  2559  2456
nand_n                     3904  3521  3378  3941  3535  3405  3203  3807  3327
nior_n                     3495  3432  3395  3232  3394  3637  3419  3265  3341
xnor_n                     3700  3630  3094  3126  3384  3200  3306  3378  3690
andn_n                     3512  3431  3192  3027  3722  3657  3470  3134  3389
iorn_n                     3297  2962  3022  2968  4106  3282  2624  3295  3738
lshiftc
divexact_byff              3791  4061  3743  3633  3833  3448  3870  3398  3692
divexact_byfobm1           7011  5935  5404  6970  6562  6108  6245  6470  5690
divexact_by3               6688  6801  6790  6834  6721  6774  7213  6248  6971
divexact_1                10040 10515 10860  9667  9180  9340 10282  8327 10128
modexact_1c_odd            7666  7681  7500  7421  7723  7144  8000  7932  7533
add_err1_n                13238 15623 13962 15885 16311 15802 10313 15338 15449
sub_err1_n                15385 13936 13041 15750 17489 15984 12583 16250 16013
divrem_euclidean_qr_1     22612 22908 15251 18005 22265 23139 23696 22830 20883
divrem_euclidean_qr_2     45640 51005 52155 53907 53798 51676 50391 45438 51260
divrem_euclidean_r_1       4857  5551  5481  5418  5595  5452  5326  5807  5907
divrem_1                  18381 17490 17063 17010 16760 15168 16881 17814 18286
divrem_2                  52993 51151 52877 50495 50141 58222 53430 50303 51313
divrem_hensel_qr_1        11809 10592 11959 11243  8775 11669 10140 10147  9442
divrem_hensel_qr_1_1       9380  8792  9594  9350  8781 10458  9284  8733  8719
divrem_hensel_qr_1_2      10611 10527 10877 12202 11589 13080 13503 11031  7745
divrem_hensel_r_1          8962  9527  8895  9211  9145  9084  9004  8441  9638
divrem_hensel_rsh_qr_1    16488 11708 12658 11535 11900 13847 12636 12495 11230
rsh_divrem_hensel_qr_1    13878 14182 13056 12817 11278 14023 12248 14690 12855
rsh_divrem_hensel_qr_1_1  10579 10733 10269  9911 12034 11214 10127  9637 11406
rsh_divrem_hensel_qr_1_2  11935 14033 12105 15604 10113 13502 14378 13289 11672
mod_1_1                    8260  7590  8148  9793  8312  6946  5753  8121  7585
mod_1_2                    8950  7892  8236  7388  8569  9376  9454  8673  8844
mod_1_3                    4706  4863  5348  4763  5486  5909  4972  5407  5558
mod_1_4
mod_34lsub1                1176  1209  1188  1138  1128  1140  1040   964  1218
tune_real (native 32-bit Linux):
/* Generated by tuneup.c, 2011-03-17, gcc 4.4 */
run                                   1     2     3     4     5     6     7     8     9
MUL_KARATSUBA_THRESHOLD              28    28    28    28    28    28    28    28    28
MUL_TOOM3_THRESHOLD                  97    97    97    97    97    97    97    97    97
MUL_TOOM4_THRESHOLD                 214   214   214   214   214   214   214   214   214
MUL_TOOM8H_THRESHOLD                303   303   303   303   303   303   303   303   303
SQR_BASECASE_THRESHOLD                0     0     0     0     0     0     0     0     0  /* always (native) */
SQR_KARATSUBA_THRESHOLD              46    46    46    46    46    46    46    46    46
SQR_TOOM3_THRESHOLD                  90    90    90    90    90    89    89    89    89
SQR_TOOM4_THRESHOLD                 232   240   232   240   240   236   228   236   244
SQR_TOOM8_THRESHOLD                 286   254   278   254   270   262   278   278   254
POWM_THRESHOLD                      175   180   170   195   200   175   190   180   206
GCD_THRESHOLD                       498   498   498   498   482   502   486   502   502
GCDEXT_THRESHOLD                    996   996   996   996   996   996   996   978   996
JACOBI_BASE_METHOD                    1     1     1     1     1     1     1     1     1
USE_PREINV_DIVREM_1                   1     1     1     1     1     1     1     1     1  /* native */
USE_PREINV_MOD_1                      1     1     1     1     1     1     1     1     1  /* native */
DIVREM_2_THRESHOLD                    0     0     0     0     0     0     0     0     0  /* always */
DIVEXACT_1_THRESHOLD                  0     0     0     0     0     0     0     0     0  /* always (native) */
MODEXACT_1_ODD_THRESHOLD              0     0     0     0     0     0     0     0     0  /* always (native) */
MOD_1_1_THRESHOLD                     4     3     4    83     7   230     3   127   132
MOD_1_2_THRESHOLD                   183   126   135   180    20   732    22   482   172
MOD_1_3_THRESHOLD                   996   126   987   180    22   739   155   522   173
DIVREM_HENSEL_QR_1_THRESHOLD         22    27    17    24    23    14    18    17    32
RSH_DIVREM_HENSEL_QR_1_THRESHOLD     13    12     7    10    14    14     9    10    12
DIVREM_EUCLID_HENSEL_THRESHOLD       66     8   196    10    17    11     9    26    22
ROOTREM_THRESHOLD                     7     6    11     6     7     6     6     9     6
GET_STR_DC_THRESHOLD                 15    15    15    14    15    14    15    15    14
GET_STR_PRECOMPUTE_THRESHOLD         25    25    25    26    26    27    26    25    27
SET_STR_DC_THRESHOLD                324   345   345   345   327   345   327   333   327
SET_STR_PRECOMPUTE_THRESHOLD        330   379   345  5805   416   511   642   618   465
MUL_FFT_TABLE                    { 400, 1184, 1408, 3584, 14336, 57344, 0 }  (first run only)
MUL_FFT_MODF_THRESHOLD              416   416   416   448   416   416   416   416   416
MUL_FFT_FULL_THRESHOLD             1664  1664  1664  1664  1664  1664  1664  1664  1664
SQR_FFT_TABLE                    { 368, 992, 1408, 3584, 10240, 40960, 0 }  (first run only)
SQR_FFT_MODF_THRESHOLD              384   384   384   384   384   384   384   384   384
SQR_FFT_FULL_THRESHOLD             1664  1664  1664  1664  1664  1664  1664  1664  1664
MULLOW_BASECASE_THRESHOLD            13    13    13    13    13    13    13    13    13
MULLOW_DC_THRESHOLD                  15    15    16    15    15    16    17    16    14
MULLOW_MUL_THRESHOLD               2852  2852  2852  2852  2852  2852  2852  2852  2852
MULHIGH_BASECASE_THRESHOLD           25    28    27    25    25    27    25    27    27
MULHIGH_DC_THRESHOLD                 25    28    27    25    25    27    25    27    27
MULHIGH_MUL_THRESHOLD              2852  2852  2852  2852  2852  2852  2852  2852  2852
MULMOD_2EXPM1_THRESHOLD              24    22    24    24    24    24    24    24    22
FAC_UI_THRESHOLD                  32756 32756 32756 32756 32756 32756 32756 32756 32756
DC_DIV_QR_THRESHOLD                  92   100    94    96   102   100    94   104    94
DC_DIVAPPR_Q_N_THRESHOLD            748   748   748   748   748   748   748   748   748
INV_DIV_QR_THRESHOLD               3344  3344  3344  3344  3344  3344  3344  3344  3344
INV_DIVAPPR_Q_N_THRESHOLD           748   748   748   748   748   748   748   748   748
DC_DIV_Q_THRESHOLD                  777   777   777   777   777   777   777   777   777
INV_DIV_Q_THRESHOLD                1187  1187  1187  1187  1187  1187  1187  1187  1187
DC_DIVAPPR_Q_THRESHOLD              720   720   720   720   720   720   720   734   734
INV_DIVAPPR_Q_THRESHOLD            4823  4823  4823  4823  4823  4823  4823  3690  3690
DC_BDIV_QR_THRESHOLD                 92    92    90    92    92    92    92    92    92
DC_BDIV_Q_THRESHOLD                 706   706   680   706   706   680   706   706   680
/* Tuneup completed successfully, took 70 seconds */
tune_virtual (32-bit Linux guest in VirtualBox):
/* Generated by tuneup.c, 2011-03-17, gcc 4.4 */
run                                   1     2     3     4     5     6     7     8     9
MUL_KARATSUBA_THRESHOLD              26    28    28    28    24    26    28    24    28
MUL_TOOM3_THRESHOLD                  59    85    43    83    86   106   125    87    86
MUL_TOOM4_THRESHOLD                 128   133   124   131    92   116   184   124   154
MUL_TOOM8H_THRESHOLD                248   214   183   204   260   246   248   272   188
SQR_BASECASE_THRESHOLD                0     0     0     0     0     0     0     0     0  /* always (native) */
SQR_KARATSUBA_THRESHOLD              28    50    43    42    53    42    48    39    47
SQR_TOOM3_THRESHOLD                  58    91    93    77    71    92    90   126    71
SQR_TOOM4_THRESHOLD                 123   113   125   151   140   155    92   238   102
SQR_TOOM8_THRESHOLD                 128   234   218   254   188   175   193   246   189
POWM_THRESHOLD                       35    18    51    39    71    46    44    75    34
GCD_THRESHOLD                        60    28    25    17   414    33    22    38    55
GCDEXT_THRESHOLD                    996    58   163   996   996    35   996   996   996
JACOBI_BASE_METHOD                    1     1     1     1     1     1     1     1     1
USE_PREINV_DIVREM_1                   1     1     1     1     1     1     1     1     1  /* native */
USE_PREINV_MOD_1                      1     1     1     1     1     1     1     1     1  /* native */
DIVREM_2_THRESHOLD                    5     0     0     0     0     0     0     0     0
DIVEXACT_1_THRESHOLD                  0     0     0     0     0     0     0     0     0  /* always (native) */
MODEXACT_1_ODD_THRESHOLD              0     0     0     0     0     0     0     0     0  /* always (native) */
MOD_1_1_THRESHOLD                    30    10     3    34    16    39     8    24     6
MOD_1_2_THRESHOLD                    47    22    83    38    16    58     8    34    33
MOD_1_3_THRESHOLD                    66    38    84    54    51    91     8    48    49
DIVREM_HENSEL_QR_1_THRESHOLD         13    27    14    18    20    25    24    16    26
RSH_DIVREM_HENSEL_QR_1_THRESHOLD     10     3     8     8    10    12    11    16     9
DIVREM_EUCLID_HENSEL_THRESHOLD        8    24    46     8    30     8    20    20     9
ROOTREM_THRESHOLD                     7     6    11     7     6     6     7     6     8
GET_STR_DC_THRESHOLD                 14     7    14     7    10    21    16    12    15
GET_STR_PRECOMPUTE_THRESHOLD         32    34    29    37    27    23    28    28    29
SET_STR_DC_THRESHOLD                117   107   100   224   100   151   100   135   109
SET_STR_PRECOMPUTE_THRESHOLD        117   210   106   297   104   193   186   137   157
MUL_FFT_TABLE                    { 272, 1120, 1920, 3584, 10240, 57344, 0 }  (first run only)
MUL_FFT_MODF_THRESHOLD              336   416   432   336   368   464   312   264   368
MUL_FFT_FULL_THRESHOLD             2176  1664  1920  1664  1408  1664  1920  1664  1408
SQR_FFT_TABLE                    { 240, 992, 1152, 2560, 10240, 0 }  (first run only)
SQR_FFT_MODF_THRESHOLD              152   216   344   216   216   216   296   280   264
SQR_FFT_FULL_THRESHOLD             1408  1920  1664  1664  1664  1152  1664  1152  1920
MULLOW_BASECASE_THRESHOLD             9    13    10     9    18    15    10    11     9
MULLOW_DC_THRESHOLD                  13    16    17    15    18    17    15    11    17
MULLOW_MUL_THRESHOLD                 36   450   246   366    23   442   498   517   760
MULHIGH_BASECASE_THRESHOLD           24    25    25    27    26    27    25    21    18
MULHIGH_DC_THRESHOLD                 24    25    25    27    26    27    25    21    18
MULHIGH_MUL_THRESHOLD               327    37    57   232   438    49   375   375   104
MULMOD_2EXPM1_THRESHOLD              24    24    22    22    24    28    28    24    20
FAC_UI_THRESHOLD                   1769  4432  1044  2767  4303  3900  2767  1718  2877
DC_DIV_QR_THRESHOLD                  23    68    34    40    53    41    33    80    93
DC_DIVAPPR_Q_N_THRESHOLD            748   839   205   807   998   906    27   998   432
INV_DIV_QR_THRESHOLD               1414  3478   511  3344   551  3410   573   379   483
INV_DIVAPPR_Q_N_THRESHOLD           748   839   205   807   998   906    27   998   432
DC_DIV_Q_THRESHOLD                  889   998    49   998    73   562   720   116   924
INV_DIV_Q_THRESHOLD                3152  3410  3547  2747   456  2642  2747  3547  2541
DC_DIVAPPR_Q_THRESHOLD              229    60    47    72   114   116    19    98    44
INV_DIVAPPR_Q_THRESHOLD            7629  6514  7350  8669  6576  6686  6514  8192  7354
DC_BDIV_QR_THRESHOLD                130    30    88    64    51    47    60    46    17
DC_BDIV_Q_THRESHOLD                  69    29   654   136   209    24   217   483   278
/* Tuneup completed successfully, took 145 seconds */