On Friday 04 March 2011 10:56:08 Jeff Gilchrist wrote:
> On Thu, Mar 3, 2011 at 5:04 AM, Jason <ja...@njkfrudils.plus.com> wrote:
> > OK, but how about tuning and timings?
> > Does anyone here have any experience with this, or any recommendations?
> 
> I don't think I have ever actually tested things such as tuning/timings
> in virtual machines.  Since you still have your 32-bit "native"
> install, I would suggest doing some tuning and benchmarking and recording
> the timings. Then set up a 32-bit VM in your 64-bit Linux environment and
> re-run the tests on the same machine to see if there is a drastic
> change.
> 
> Jeff.

I tried VirtualBox on my trusty old K8, which has NO hardware virtualization
extensions. The host is 64-bit Linux, the guest is 32-bit Linux, and I did 10
runs.

For native 32-bit Linux, the mpn cycle count benchmark is the attached "real",
and for the virtual Linux it is the attached "virtual".
The real system gives a very consistent set of values, whereas the virtual
system's timings vary a lot and are sometimes even "faster" than native, which
means we can't even take the smallest value :(
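
To put a number on "vary a lot", here is a minimal sketch that computes the
relative spread of one benchmark row across runs. The function names and the
idea of a fixed tolerance are my own assumptions for illustration; nothing
like this exists in the MPIR benchmark scripts as far as this mail shows.

```c
#include <assert.h>
#include <stddef.h>

/* Relative spread of n cycle counts, (max - min) / min, in percent. */
static double spread_percent(const unsigned long *cycles, size_t n)
{
    unsigned long lo, hi;
    size_t i;

    assert(cycles != NULL && n > 0);
    lo = hi = cycles[0];
    for (i = 1; i < n; i++) {
        if (cycles[i] < lo) lo = cycles[i];
        if (cycles[i] > hi) hi = cycles[i];
    }
    assert(lo > 0);  /* a zero cycle count would make the ratio meaningless */
    return 100.0 * (double)(hi - lo) / (double)lo;
}

/* Nonzero if all runs agree to within tol_percent (a hypothetical cutoff). */
static int timings_stable(const unsigned long *cycles, size_t n,
                          double tol_percent)
{
    return spread_percent(cycles, n) <= tol_percent;
}
```

Fed the add_n rows from the two attachments, this gives a spread of 0% for the
native system and roughly 21% for the VirtualBox guest.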

What parameters does make tune give us? For the real system see "tune_real",
and for the virtual Linux see "tune_virtual".
Again the real system is fairly consistent (the parameters that aren't either
have very similar slopes at the crossover, or our tuning is slightly wrong),
whereas the virtual system's tuning is not very useful.

I don't plan to write any 32-bit code, so I don't care if I can't get reliable
32-bit timings. The only things my K8 is going to do are 32/64-bit testing, and
timing and tuning for 64-bit on its native 64-bit OS. I suppose I could
install a native Linux distro that supports both 32-bit and 64-bit in the FULL
toolchain (which ones have this?). I think I may try a virtual Solaris on the
K8 for testing purposes (I can't test the BSDs on it, as they require the
virtualization extensions).

The K8 is not made anymore, and more modern CPUs have virtualization
extensions (so I can test the BSDs as well), so I'll give the Nehalem a go and
see if this makes the timings better.

I also want to change the way we do our fake CPU testing. At the moment the
fake CPU testing is exactly like the proper test except:
1) CPU detection is overridden
2) it doesn't work for fat builds
3) it doesn't test for instruction extensions from the asm or compiler
4) a subtle autotools bug could hide other differences, as we override the
build mechanism of autotools

We can't trap on the cpuid instruction, but we can replace it with a macro that
simulates it. That would leave item 3 above as the only difference from
a real chip.
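
A rough sketch of the macro idea follows, for detection code written in C.
Everything named here (CPUID, USE_REAL_CPUID, fake_cpu and the helper
functions) is made up for illustration and is not MPIR's actual detection
code. The leaf 1 value is also only illustrative: it sets the family field
and leaves everything else zero. The real path assumes GCC or Clang on x86,
where <cpuid.h> provides __get_cpuid().

```c
#include <assert.h>
#include <string.h>

#ifdef USE_REAL_CPUID
/* Real hardware: execute cpuid through the compiler's helper. */
#include <cpuid.h>
#define CPUID(leaf, a, b, c, d) \
    ((void)__get_cpuid((leaf), &(a), &(b), &(c), &(d)))
#else
/* Fake CPU: answer from a table instead of executing cpuid. */
struct fake_leaf { unsigned int eax, ebx, ecx, edx; };

static const struct fake_leaf fake_cpu[] = {
    /* leaf 0: highest leaf = 1, vendor "AuthenticAMD" in EBX, EDX, ECX */
    { 1u, 0x68747541u, 0x444d4163u, 0x69746e65u },
    /* leaf 1: base family 0xF in EAX bits 8-11; other fields left zero */
    { 0x00000f00u, 0u, 0u, 0u },
};

#define CPUID(leaf, a, b, c, d)                                 \
    do {                                                        \
        size_t i_ = (size_t)(leaf);                             \
        assert(i_ < sizeof fake_cpu / sizeof fake_cpu[0]);      \
        (a) = fake_cpu[i_].eax; (b) = fake_cpu[i_].ebx;         \
        (c) = fake_cpu[i_].ecx; (d) = fake_cpu[i_].edx;         \
    } while (0)
#endif

/* Store the four bytes of v, least significant first. */
static void put_le32(char *p, unsigned int v)
{
    int i;
    for (i = 0; i < 4; i++)
        p[i] = (char)((v >> (8 * i)) & 0xffu);
}

/* Write the 12-character vendor string from leaf 0 into out (13 bytes). */
static void cpu_vendor(char out[13])
{
    unsigned int a = 0, b = 0, c = 0, d = 0;
    CPUID(0, a, b, c, d);
    (void)a;
    put_le32(out, b);
    put_le32(out + 4, d);
    put_le32(out + 8, c);
    out[12] = '\0';
}

static int cpu_is_amd(void)
{
    char v[13];
    cpu_vendor(v);
    return strcmp(v, "AuthenticAMD") == 0;
}

/* Base family field: EAX bits 8-11 of leaf 1. */
static unsigned int cpu_base_family(void)
{
    unsigned int a = 0, b = 0, c = 0, d = 0;
    CPUID(1, a, b, c, d);
    (void)b; (void)c; (void)d;
    return (a >> 8) & 0xfu;
}
```

The detection functions are identical in both builds; only the CPUID macro
changes, so the normal configure and build path is exercised rather than
overridden.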

Jason


Attachment "real":
                     cpu        k8      k8      k8      k8      k8      k8      k8      k8      k8
                   add_n        2030    2030    2030    2030    2030    2030    2030    2030    2030
                   sub_n        2030    2030    2030    2030    2030    2030    2030    2030    2030
                   mul_1        3092    3092    3092    3092    3092    3092    3092    3092    3092
                addmul_1        3040    3039    3040    3041    3039    3040    3039    3040    3039
                submul_1        3039    3040    3039    3039    3040    3041    3040    3040    3044
                   mul_2
                addmul_2
                submul_2
                addadd_n
                addsub_n
                subadd_n
                  lshift        1254    1253    1253    1253    1254    1253    1253    1254    1253
                  rshift        1251    1250    1250    1250    1250    1250    1250    1251    1250
                 lshift2        1255    1255    1252    1252    1252    1252    1252    1254    1252
                 rshift2        1250    1250    1250    1250    1250    1250    1250    1250    1250
                 lshift1        1255    1254    1252    1252    1252    1252    1254    1252    1253
                 rshift1        1250    1250    1250    1250    1250    1250    1250    1250    1250
               addlsh1_n
               sublsh1_n
                addlsh_n
                sublsh_n
                inclsh_n
                declsh_n
               rsh1add_n
               rsh1sub_n
               sumdiff_n
                   store        2014    2014    2014    2014    2014    2014    2014    2014    2014
                   copyi        781     781     781     781     781     781     781     781     781
                   copyd        781     781     781     781     781     781     781     781     781
               rsblsh1_n
               addlsh2_n
               rsblsh2_n
                popcount        5034    5037    5034    5037    5037    5037    5034    5037    5037
                 hamdist        5866    6031    5866    6031    5865    5952    5964    5865    5951
                     com        1025    1026    1025    1025    1026    1025    1025    1025    1026
                     not        1026    1026    1026    1026    1026    1026    1026    1026    1026
                   and_n        2221    2221    2221    2221    2221    2221    2221    2221    2221
                   xor_n        2221    2221    2221    2221    2221    2221    2221    2221    2221
                   ior_n        2221    2221    2222    2221    2221    2221    2221    2221    2221
                  nand_n        3014    3014    3014    3014    3014    3014    3014    3014    3014
                  nior_n        3014    3014    3014    3014    3014    3014    3014    3014    3014
                  xnor_n        3014    3014    3014    3014    3014    3014    3014    3014    3014
                  andn_n        3014    3014    3014    3014    3014    3014    3014    3014    3014
                  iorn_n        3014    3014    3014    3014    3014    3014    3014    3014    3014
                 lshiftc
           divexact_byff        3361    3361    3361    3361    3361    3361    3361    3361    3361
        divexact_byfobm1        5379    5379    5379    5379    5380    5379    5379    5380    5379
            divexact_by3        6025    6025    6025    6025    6025    6025    6025    6025    6025
              divexact_1        9098    9098    9098    9098    9098    9098    9098    9098    9096
         modexact_1c_odd        7038    7038    7038    7038    7038    7038    7038    7038    7038
              add_err1_n        13603   13570   13578   13569   13569   13602   13602   13603   13579
              sub_err1_n        14086   13910   13909   14072   14086   14070   13892   14070   13892
   divrem_euclidean_qr_1        19792   19815   19792   19792   19792   19792   19815   19792   19785
   divrem_euclidean_qr_2        44376   44365   44383   44364   44364   44399   44383   44375   44384
    divrem_euclidean_r_1        4908    4908    4911    4908    4908    4908    4911    4906    4906
                divrem_1        15654   15666   15666   15666   15671   15671   15671   15666   15666
                divrem_2        45115   45099   45099   45115   45098   45104   45099   45100   45110
      divrem_hensel_qr_1        10066   10066   10066   10066   10066   10066   10066   10066   10066
    divrem_hensel_qr_1_1        8040    8040    8040    8040    8040    8040    8040    8040    8040
    divrem_hensel_qr_1_2        10052   10052   10052   10052   10052   10052   10052   10052   10052
       divrem_hensel_r_1        8041    8041    8041    8041    8041    8041    8041    8041    8041
  divrem_hensel_rsh_qr_1        11046   11046   11046   11045   11046   11046   11045   11046   11046
  rsh_divrem_hensel_qr_1        12070   12070   12070   12075   12074   12074   12070   12069   12074
rsh_divrem_hensel_qr_1_1        10054   10057   10054   10054   10054   10054   10054   10057   10054
rsh_divrem_hensel_qr_1_2        12058   12060   12064   12060   12064   12060   12060   12064   12058
                 mod_1_1        7028    7028    7027    7027    7028    7027    7027    7027    7027
                 mod_1_2        7728    7728    7728    7727    7727    7725    7728    7727    7728
                 mod_1_3        4761    4762    4760    4762    4761    4760    4760    4760    4761
                 mod_1_4
             mod_34lsub1        1035    1035    1034    1034    1034    1034    1034    1034    1034

Attachment "virtual":
                     cpu        k8      k8      k8      k8      k8      k8      k8      k8      k8
                   add_n        2482    2114    2292    2299    2044    2280    2092    2231    2149
                   sub_n        2401    2168    2259    2342    2052    2691    2225    2027    2725
                   mul_1        3570    3618    2720    3472    3718    3057    3674    3558    3052
                addmul_1        3235    3182    3703    3076    2567    3404    3254    3439    3544
                submul_1        3406    3547    3423    3065    3173    3425    3971    3168    3556
                   mul_2
                addmul_2
                submul_2
                addadd_n
                addsub_n
                subadd_n
                  lshift        1511    1539    1397    1462    1392    1420    1385    1464    1515
                  rshift        1561    1335    1391    1394    1674    1370    1208    1429    1377
                 lshift2        1474    1364    1298    1345    1394    1466    1573    1379    1541
                 rshift2        1397    1197    1375    1360    1298    1475    1273    1507    1018
                 lshift1        1385    1448    1289    1472    1368    1236    1234    1332    1589
                 rshift1        1260    1419    1514    1380    1427    1275    1428    1359    1417
               addlsh1_n
               sublsh1_n
                addlsh_n
                sublsh_n
                inclsh_n
                declsh_n
               rsh1add_n
               rsh1sub_n
               sumdiff_n
                   store        2287    2132    2645    2063    2142    2198    2040    2378    2043
                   copyi        869     870     842     588     911     972     981     960     784
                   copyd        898     930     891     851     896     873     986     840     858
               rsblsh1_n
               addlsh2_n
               rsblsh2_n
                popcount        5521    5728    4866    5957    6162    5747    5649    5869    6411
                 hamdist        6949    6323    6860    6696    6756    6650    6372    5071    6051
                     com        1263    1058    1315    1156    1102    1044    935     1160    1086
                     not        1242    1170    1136    1243    1155    1104    1194    1254    1199
                   and_n        2412    2311    2253    2835    2519    2341    2572    2524    2427
                   xor_n        2485    2348    2656    2478    2388    2555    2513    2588    2480
                   ior_n        2534    2481    2710    2735    2232    2596    2595    2559    2456
                  nand_n        3904    3521    3378    3941    3535    3405    3203    3807    3327
                  nior_n        3495    3432    3395    3232    3394    3637    3419    3265    3341
                  xnor_n        3700    3630    3094    3126    3384    3200    3306    3378    3690
                  andn_n        3512    3431    3192    3027    3722    3657    3470    3134    3389
                  iorn_n        3297    2962    3022    2968    4106    3282    2624    3295    3738
                 lshiftc
           divexact_byff        3791    4061    3743    3633    3833    3448    3870    3398    3692
        divexact_byfobm1        7011    5935    5404    6970    6562    6108    6245    6470    5690
            divexact_by3        6688    6801    6790    6834    6721    6774    7213    6248    6971
              divexact_1        10040   10515   10860   9667    9180    9340    10282   8327    10128
         modexact_1c_odd        7666    7681    7500    7421    7723    7144    8000    7932    7533
              add_err1_n        13238   15623   13962   15885   16311   15802   10313   15338   15449
              sub_err1_n        15385   13936   13041   15750   17489   15984   12583   16250   16013
   divrem_euclidean_qr_1        22612   22908   15251   18005   22265   23139   23696   22830   20883
   divrem_euclidean_qr_2        45640   51005   52155   53907   53798   51676   50391   45438   51260
    divrem_euclidean_r_1        4857    5551    5481    5418    5595    5452    5326    5807    5907
                divrem_1        18381   17490   17063   17010   16760   15168   16881   17814   18286
                divrem_2        52993   51151   52877   50495   50141   58222   53430   50303   51313
      divrem_hensel_qr_1        11809   10592   11959   11243   8775    11669   10140   10147   9442
    divrem_hensel_qr_1_1        9380    8792    9594    9350    8781    10458   9284    8733    8719
    divrem_hensel_qr_1_2        10611   10527   10877   12202   11589   13080   13503   11031   7745
       divrem_hensel_r_1        8962    9527    8895    9211    9145    9084    9004    8441    9638
  divrem_hensel_rsh_qr_1        16488   11708   12658   11535   11900   13847   12636   12495   11230
  rsh_divrem_hensel_qr_1        13878   14182   13056   12817   11278   14023   12248   14690   12855
rsh_divrem_hensel_qr_1_1        10579   10733   10269   9911    12034   11214   10127   9637    11406
rsh_divrem_hensel_qr_1_2        11935   14033   12105   15604   10113   13502   14378   13289   11672
                 mod_1_1        8260    7590    8148    9793    8312    6946    5753    8121    7585
                 mod_1_2        8950    7892    8236    7388    8569    9376    9454    8673    8844
                 mod_1_3        4706    4863    5348    4763    5486    5909    4972    5407    5558
                 mod_1_4
             mod_34lsub1        1176    1209    1188    1138    1128    1140    1040    964     1218

Attachment "tune_real":
/* Generated by tuneup.c, 2011-03-17, gcc 4.4 */
								
#define MUL_KARATSUBA_THRESHOLD          28	28	28	28	28	28	28	28	28
#define MUL_TOOM3_THRESHOLD              97	97	97	97	97	97	97	97	97
#define MUL_TOOM4_THRESHOLD             214	214	214	214	214	214	214	214	214
#define MUL_TOOM8H_THRESHOLD            303	303	303	303	303	303	303	303	303
								
#define SQR_BASECASE_THRESHOLD            0  /* always (native) */	0	0	0	0	0	0	0	0
#define SQR_KARATSUBA_THRESHOLD          46	46	46	46	46	46	46	46	46
#define SQR_TOOM3_THRESHOLD              90	90	90	90	90	89	89	89	89
#define SQR_TOOM4_THRESHOLD             232	240	232	240	240	236	228	236	244
#define SQR_TOOM8_THRESHOLD             286	254	278	254	270	262	278	278	254
								
#define POWM_THRESHOLD                  175	180	170	195	200	175	190	180	206
								
#define GCD_THRESHOLD                   498	498	498	498	482	502	486	502	502
#define GCDEXT_THRESHOLD                996	996	996	996	996	996	996	978	996
#define JACOBI_BASE_METHOD                1	1	1	1	1	1	1	1	1
								
#define USE_PREINV_DIVREM_1               1  /* native */	1	1	1	1	1	1	1	1
#define USE_PREINV_MOD_1                  1  /* native */	1	1	1	1	1	1	1	1
#define DIVREM_2_THRESHOLD                0  /* always */	0	0	0	0	0	0	0	0
#define DIVEXACT_1_THRESHOLD              0  /* always (native) */	0	0	0	0	0	0	0	0
#define MODEXACT_1_ODD_THRESHOLD          0  /* always (native) */	0	0	0	0	0	0	0	0
#define MOD_1_1_THRESHOLD                 4	3	4	83	7	230	3	127	132
#define MOD_1_2_THRESHOLD               183	126	135	180	20	732	22	482	172
#define MOD_1_3_THRESHOLD               996	126	987	180	22	739	155	522	173
#define DIVREM_HENSEL_QR_1_THRESHOLD     22	27	17	24	23	14	18	17	32
#define RSH_DIVREM_HENSEL_QR_1_THRESHOLD     13	12	7	10	14	14	9	10	12
#define DIVREM_EUCLID_HENSEL_THRESHOLD     66	8	196	10	17	11	9	26	22
								
#define ROOTREM_THRESHOLD                 7	6	11	6	7	6	6	9	6
								
#define GET_STR_DC_THRESHOLD             15	15	15	14	15	14	15	15	14
#define GET_STR_PRECOMPUTE_THRESHOLD     25	25	25	26	26	27	26	25	27
#define SET_STR_DC_THRESHOLD            324	345	345	345	327	345	327	333	327
#define SET_STR_PRECOMPUTE_THRESHOLD    330	379	345	5805	416	511	642	618	465
								
#define MUL_FFT_TABLE  { 400, 1184, 1408, 3584, 14336, 57344, 0 }
#define MUL_FFT_MODF_THRESHOLD          416	416	416	448	416	416	416	416	416
#define MUL_FFT_FULL_THRESHOLD         1664	1664	1664	1664	1664	1664	1664	1664	1664
								
#define SQR_FFT_TABLE  { 368, 992, 1408, 3584, 10240, 40960, 0 }
#define SQR_FFT_MODF_THRESHOLD          384	384	384	384	384	384	384	384	384
#define SQR_FFT_FULL_THRESHOLD         1664	1664	1664	1664	1664	1664	1664	1664	1664
								
#define MULLOW_BASECASE_THRESHOLD        13	13	13	13	13	13	13	13	13
#define MULLOW_DC_THRESHOLD              15	15	16	15	15	16	17	16	14
#define MULLOW_MUL_THRESHOLD           2852	2852	2852	2852	2852	2852	2852	2852	2852
								
#define MULHIGH_BASECASE_THRESHOLD       25	28	27	25	25	27	25	27	27
#define MULHIGH_DC_THRESHOLD             25	28	27	25	25	27	25	27	27
#define MULHIGH_MUL_THRESHOLD          2852	2852	2852	2852	2852	2852	2852	2852	2852
								
#define MULMOD_2EXPM1_THRESHOLD          24	22	24	24	24	24	24	24	22
								
#define FAC_UI_THRESHOLD              32756	32756	32756	32756	32756	32756	32756	32756	32756
#define DC_DIV_QR_THRESHOLD              92	100	94	96	102	100	94	104	94
#define DC_DIVAPPR_Q_N_THRESHOLD        748	748	748	748	748	748	748	748	748
#define INV_DIV_QR_THRESHOLD           3344	3344	3344	3344	3344	3344	3344	3344	3344
#define INV_DIVAPPR_Q_N_THRESHOLD       748	748	748	748	748	748	748	748	748
#define DC_DIV_Q_THRESHOLD              777	777	777	777	777	777	777	777	777
#define INV_DIV_Q_THRESHOLD            1187	1187	1187	1187	1187	1187	1187	1187	1187
#define DC_DIVAPPR_Q_THRESHOLD          720	720	720	720	720	720	720	734	734
#define INV_DIVAPPR_Q_THRESHOLD        4823	4823	4823	4823	4823	4823	4823	3690	3690
#define DC_BDIV_QR_THRESHOLD             92	92	90	92	92	92	92	92	92
#define DC_BDIV_Q_THRESHOLD             706	706	680	706	706	680	706	706	680
/* Tuneup completed successfully, took 70 seconds */

Attachment "tune_virtual":
/* Generated by tuneup.c, 2011-03-17, gcc 4.4 */
								
#define MUL_KARATSUBA_THRESHOLD          26	28	28	28	24	26	28	24	28
#define MUL_TOOM3_THRESHOLD              59	85	43	83	86	106	125	87	86
#define MUL_TOOM4_THRESHOLD             128	133	124	131	92	116	184	124	154
#define MUL_TOOM8H_THRESHOLD            248	214	183	204	260	246	248	272	188
								
#define SQR_BASECASE_THRESHOLD            0  /* always (native) */	0	0	0	0	0	0	0	0
#define SQR_KARATSUBA_THRESHOLD          28	50	43	42	53	42	48	39	47
#define SQR_TOOM3_THRESHOLD              58	91	93	77	71	92	90	126	71
#define SQR_TOOM4_THRESHOLD             123	113	125	151	140	155	92	238	102
#define SQR_TOOM8_THRESHOLD             128	234	218	254	188	175	193	246	189
								
#define POWM_THRESHOLD                   35	18	51	39	71	46	44	75	34
								
#define GCD_THRESHOLD                    60	28	25	17	414	33	22	38	55
#define GCDEXT_THRESHOLD                996	58	163	996	996	35	996	996	996
#define JACOBI_BASE_METHOD                1	1	1	1	1	1	1	1	1
								
#define USE_PREINV_DIVREM_1               1  /* native */	1	1	1	1	1	1	1	1
#define USE_PREINV_MOD_1                  1  /* native */	1	1	1	1	1	1	1	1
#define DIVREM_2_THRESHOLD                5	0	0	0	0	0	0	0	0
#define DIVEXACT_1_THRESHOLD              0  /* always (native) */	0	0	0	0	0	0	0	0
#define MODEXACT_1_ODD_THRESHOLD          0  /* always (native) */	0	0	0	0	0	0	0	0
#define MOD_1_1_THRESHOLD                30	10	3	34	16	39	8	24	6
#define MOD_1_2_THRESHOLD                47	22	83	38	16	58	8	34	33
#define MOD_1_3_THRESHOLD                66	38	84	54	51	91	8	48	49
#define DIVREM_HENSEL_QR_1_THRESHOLD     13	27	14	18	20	25	24	16	26
#define RSH_DIVREM_HENSEL_QR_1_THRESHOLD     10	3	8	8	10	12	11	16	9
#define DIVREM_EUCLID_HENSEL_THRESHOLD      8	24	46	8	30	8	20	20	9
								
#define ROOTREM_THRESHOLD                 7	6	11	7	6	6	7	6	8
								
#define GET_STR_DC_THRESHOLD             14	7	14	7	10	21	16	12	15
#define GET_STR_PRECOMPUTE_THRESHOLD     32	34	29	37	27	23	28	28	29
#define SET_STR_DC_THRESHOLD            117	107	100	224	100	151	100	135	109
#define SET_STR_PRECOMPUTE_THRESHOLD    117	210	106	297	104	193	186	137	157
								
#define MUL_FFT_TABLE  { 272, 1120, 1920, 3584, 10240, 57344, 0 }
#define MUL_FFT_MODF_THRESHOLD          336	416	432	336	368	464	312	264	368
#define MUL_FFT_FULL_THRESHOLD         2176	1664	1920	1664	1408	1664	1920	1664	1408
								
#define SQR_FFT_TABLE  { 240, 992, 1152, 2560, 10240, 0 }
#define SQR_FFT_MODF_THRESHOLD          152	216	344	216	216	216	296	280	264
#define SQR_FFT_FULL_THRESHOLD         1408	1920	1664	1664	1664	1152	1664	1152	1920
								
#define MULLOW_BASECASE_THRESHOLD         9	13	10	9	18	15	10	11	9
#define MULLOW_DC_THRESHOLD              13	16	17	15	18	17	15	11	17
#define MULLOW_MUL_THRESHOLD             36	450	246	366	23	442	498	517	760
								
#define MULHIGH_BASECASE_THRESHOLD       24	25	25	27	26	27	25	21	18
#define MULHIGH_DC_THRESHOLD             24	25	25	27	26	27	25	21	18
#define MULHIGH_MUL_THRESHOLD           327	37	57	232	438	49	375	375	104
								
#define MULMOD_2EXPM1_THRESHOLD          24	24	22	22	24	28	28	24	20
								
#define FAC_UI_THRESHOLD               1769	4432	1044	2767	4303	3900	2767	1718	2877
#define DC_DIV_QR_THRESHOLD              23	68	34	40	53	41	33	80	93
#define DC_DIVAPPR_Q_N_THRESHOLD        748	839	205	807	998	906	27	998	432
#define INV_DIV_QR_THRESHOLD           1414	3478	511	3344	551	3410	573	379	483
#define INV_DIVAPPR_Q_N_THRESHOLD       748	839	205	807	998	906	27	998	432
#define DC_DIV_Q_THRESHOLD              889	998	49	998	73	562	720	116	924
#define INV_DIV_Q_THRESHOLD            3152	3410	3547	2747	456	2642	2747	3547	2541
#define DC_DIVAPPR_Q_THRESHOLD          229	60	47	72	114	116	19	98	44
#define INV_DIVAPPR_Q_THRESHOLD        7629	6514	7350	8669	6576	6686	6514	8192	7354
#define DC_BDIV_QR_THRESHOLD            130	30	88	64	51	47	60	46	17
#define DC_BDIV_Q_THRESHOLD              69	29	654	136	209	24	217	483	278
/* Tuneup completed successfully, took 145 seconds */
