On Wednesday 13 July 2011 15:50:10 Jason wrote:
> On Wednesday 13 July 2011 14:26:02 Jason wrote:
> > On Wednesday 13 July 2011 14:01:39 Cactus wrote:
> > > Darn, I already did the conversion :-(
> > > 
> > > I don't have enough registers to use only the 32-bit registers, so I
> > > have to put stuff in r8 and r9 instead. Given that this involves prefix
> > > opcodes, I am wondering what I should do with your coded nops, since any
> > > alignment you are seeking won't be the same with r8 and r9. Any
> > > ideas on how to optimise this when r8 and r9 are used instead of rsi
> > > and rdx?
> > 
> > The nops in the loop are for the schedulers/pick, I believe, not for
> > code alignment, so it shouldn't matter. If it does, you could try
> > moving the nops around or, as only two registers are used in the loop,
> > just swap them around in the feed-in code. See the next post for an
> > alternative.
> > 
> > >    Brian
> 
> Compare the results of running our benchmpn on the current svn on a K8,
> with gas as the assembler as opposed to yasm:
> 
> 8c8
> <                 addmul_2      4783
> ---
> >                 addmul_2      4782
> 11c11
> <                 addsub_n      2194
> ---
> >                 addsub_n      2193
> 14,16c14,16
> <                   rshift      2195
> <                  lshift2      1524
> <                  rshift2      1857
> ---
> >                   rshift      2197
> >                  lshift2      1525
> >                  rshift2      1862
> 23,26c23,26
> <                 addlsh_n      3032
> <                 sublsh_n      3282
> <                 inclsh_n      3031
> <                 declsh_n      3283
> ---
> >                 addlsh_n      3035
> >                 sublsh_n      3284
> >                 inclsh_n      3034
> >                 declsh_n      3284
> 36c36
> <                 popcount      4701
> ---
> >                 popcount      4704
> 40,42c40,42
> <                    and_n      1522
> <                    xor_n      1522
> <                    ior_n      1522
> ---
> >                    and_n      1525
> >                    xor_n      1525
> >                    ior_n      1525
> 48c48
> <                  lshiftc      2654
> ---
> >                  lshiftc      2657
> 54,55c54,55
> <               add_err1_n      2798
> <               sub_err1_n      2797
> ---
> >               add_err1_n      2796
> >               sub_err1_n      2796
> 58,60c58,60
> <     divrem_euclidean_r_1      3212
> <                 divrem_1      10821
> <                 divrem_2      21069
> ---
> >     divrem_euclidean_r_1      3215
> >                 divrem_1      10824
> >                 divrem_2      21070
> 67c67
> < rsh_divrem_hensel_qr_1_1      10065
> ---
> > rsh_divrem_hensel_qr_1_1      10066
> 70,71c70,71
> <                  mod_1_2      3527
> <                  mod_1_3      3039
> ---
> >                  mod_1_2      4022
> >                  mod_1_3      3042
> 
> The only significant difference is mod_1_2, where gas runs at 3.5 and yasm
> at 4.0 cycles per word. The listings are attached, and you can see that the
> first difference is at "jc skiplp", a forward jump to skip the loop, where
> the displacements are different, and the next difference is at "jnc lp",
> where again the displacements are different. But note that the addresses
> are all the same, so something is wrong here. I'll post to yasm's list.
> 
> Jason
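
A minimal sketch of the comparison described above, for anyone who wants to check the numbers: it takes a handful of the timings from the diff, treats the "<" column as gas and the ">" column as yasm (inferred from the 3.5 versus 4.0 cycles per word remark), and flags any routine whose relative change exceeds a cut-off. The 1% threshold and the names `timings`, `relative_change` and `significant` are assumptions made for illustration; they are not part of benchmpn.

```python
# Timings copied from the diff above as (gas, yasm) pairs.
# Only a sample of the routines is listed here.
timings = {
    "addmul_2": (4783, 4782),
    "rshift2": (1857, 1862),
    "divrem_2": (21069, 21070),
    "mod_1_2": (3527, 4022),
    "mod_1_3": (3039, 3042),
}

THRESHOLD = 0.01  # assumed cut-off: report changes larger than 1%


def relative_change(gas, yasm):
    """Fractional slowdown of the yasm build relative to the gas build."""
    return (yasm - gas) / gas


def significant(timings, threshold=THRESHOLD):
    """Return the routines whose relative change exceeds the threshold."""
    result = {}
    for name, (gas, yasm) in timings.items():
        change = relative_change(gas, yasm)
        if abs(change) > threshold:
            result[name] = change
    return result


for name, change in significant(timings).items():
    print(f"{name}: {change:+.1%}")  # prints "mod_1_2: +14.0%"
```

With these figures only mod_1_2 is reported, and the 14% it shows agrees with the number quoted in the message to the yasm list; every other routine in the sample differs by well under 1%.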

The reply from Peter of Yasm

On Wed, 13 Jul 2011, Jason wrote:
> Assembling the attached file in Gas and Yasm results in the gas code running
> faster by 14%. Taking a listing (gas.asm, yasm.asm), you can see the
> differences in the first two jumps, where the offsets are different.
> The code appears to run correctly in both cases, which is impossible, so
> perhaps it's the listings that are wrong. I will do some more investigations.

The list file is indeed incorrect (bug).  But the actual output generated 
by yasm (confirmed via objdump) exactly matches gas, so it's impossible 
for it to be 14% slower.  The only thing I can think of is that the code 
is being relocated/aligned differently by the linker.  Only a more 
complete testcase might be able to identify the actual issue.

-Peter
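
One way to follow up on the relocation/alignment hypothesis is to compare where the same label ends up in each linked binary. The sketch below is only an illustration: the two addresses are hypothetical placeholders (real ones would have to come from the linked executables), and the 16-byte fetch block and 64-byte cache line sizes are assumptions passed as parameters, not figures taken from this thread.

```python
def placement(address, fetch_block=16, cache_line=64):
    """Return the offset of an address within its fetch block and cache line.

    Both sizes are assumed defaults; adjust them for the CPU under test.
    """
    return address % fetch_block, address % cache_line


# Hypothetical addresses of the same loop label in the two builds.
gas_lp = 0x401A40
yasm_lp = 0x401A48

print(placement(gas_lp))   # (0, 0): starts on a block and line boundary
print(placement(yasm_lp))  # (8, 8): starts 8 bytes into both

if placement(gas_lp) != placement(yasm_lp):
    print("the loop is placed differently in the two builds")
```

If the offsets differ between the two builds while the assembled bytes are identical, that would be consistent with the linker placing the code differently, which is the only explanation offered above.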

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com.
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.
