On Wednesday 13 July 2011 15:50:10 Jason wrote:
> On Wednesday 13 July 2011 14:26:02 Jason wrote:
> > On Wednesday 13 July 2011 14:01:39 Cactus wrote:
> > > Darn, I already did the conversion :-(
> > >
> > > I don't have enough registers to use only the 32-bit registers so I
> > > have to put stuff in r8 and r9 instead. Given this involves prefix
> > > opcodes, I am wondering what I should do with your coded nops since any
> > > alignment you are seeking won't be the same with r8 and r9? Any
> > > ideas on how to optimise this when r8 and r9 are used instead of rsi
> > > and rdx?
> >
> > The nops in the loop are for the schedulers/pick, I believe, not for
> > code alignment, so it shouldn't matter. If it does, then you could try
> > moving the nops around or, as only two registers are used in the loop,
> > just swap them around in the feed-in code. See next post on alternative.
> >
> > > Brian
>
> Compare the difference of running our benchmpn on the current svn, with
> the assembler as gas as opposed to yasm, on a K8:
>
> 8c8
> < addmul_2 4783
> ---
> > addmul_2 4782
>
> 11c11
> < addsub_n 2194
> ---
> > addsub_n 2193
>
> 14,16c14,16
> < rshift 2195
> < lshift2 1524
> < rshift2 1857
> ---
> > rshift 2197
> > lshift2 1525
> > rshift2 1862
>
> 23,26c23,26
> < addlsh_n 3032
> < sublsh_n 3282
> < inclsh_n 3031
> < declsh_n 3283
> ---
> > addlsh_n 3035
> > sublsh_n 3284
> > inclsh_n 3034
> > declsh_n 3284
>
> 36c36
> < popcount 4701
> ---
> > popcount 4704
>
> 40,42c40,42
> < and_n 1522
> < xor_n 1522
> < ior_n 1522
> ---
> > and_n 1525
> > xor_n 1525
> > ior_n 1525
>
> 48c48
> < lshiftc 2654
> ---
> > lshiftc 2657
>
> 54,55c54,55
> < add_err1_n 2798
> < sub_err1_n 2797
> ---
> > add_err1_n 2796
> > sub_err1_n 2796
>
> 58,60c58,60
> < divrem_euclidean_r_1 3212
> < divrem_1 10821
> < divrem_2 21069
> ---
> > divrem_euclidean_r_1 3215
> > divrem_1 10824
> > divrem_2 21070
>
> 67c67
> < rsh_divrem_hensel_qr_1_1 10065
> ---
> > rsh_divrem_hensel_qr_1_1 10066
>
> 70,71c70,71
> < mod_1_2 3527
> < mod_1_3 3039
> ---
> > mod_1_2 4022
> > mod_1_3 3042
>
> The only significant difference is mod_1_2, where gas runs at 3.5 and yasm
> at 4.0 cycles per word. The listings are attached, and you can see that the
> first difference is at "jc skiplp", a forward jump to skip the loop, where
> the displacements are different. The next difference is at "jnc lp", where
> again the displacements are different. But note the addresses are all the
> same, so something is wrong here. I'll post to yasm's list.
>
> Jason
> >
> > Jason
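To make "the only significant difference" concrete, here is a minimal Python sketch that computes the relative change for each benchmark in the diff above. It assumes the `<` side is gas and the `>` side is yasm (consistent with the 3.5 vs 4.0 figures quoted for mod_1_2). The 1% threshold and the names `timings`, `relative_change` and `significant` are illustrative choices, not part of benchmpn.

```python
# (gas, yasm) timings copied from the diff; units are as printed by the benchmark.
timings = {
    "addmul_2": (4783, 4782),
    "addsub_n": (2194, 2193),
    "rshift": (2195, 2197),
    "lshift2": (1524, 1525),
    "rshift2": (1857, 1862),
    "addlsh_n": (3032, 3035),
    "sublsh_n": (3282, 3284),
    "inclsh_n": (3031, 3034),
    "declsh_n": (3283, 3284),
    "popcount": (4701, 4704),
    "and_n": (1522, 1525),
    "xor_n": (1522, 1525),
    "ior_n": (1522, 1525),
    "lshiftc": (2654, 2657),
    "add_err1_n": (2798, 2796),
    "sub_err1_n": (2797, 2796),
    "divrem_euclidean_r_1": (3212, 3215),
    "divrem_1": (10821, 10824),
    "divrem_2": (21069, 21070),
    "rsh_divrem_hensel_qr_1_1": (10065, 10066),
    "mod_1_2": (3527, 4022),
    "mod_1_3": (3039, 3042),
}


def relative_change(gas, yasm):
    """Percentage by which the yasm timing exceeds the gas timing."""
    return (yasm - gas) / gas * 100.0


def significant(results, threshold_pct=1.0):
    """Return {name: percent change} for entries at or above the threshold."""
    flagged = {}
    for name, (gas, yasm) in results.items():
        change = relative_change(gas, yasm)
        if abs(change) >= threshold_pct:
            flagged[name] = round(change, 1)
    return flagged


print(significant(timings))  # {'mod_1_2': 14.0}
```

Every other entry moves by well under 1%, which is within what one would expect from run-to-run noise; mod_1_2 is the only one that stands out, at roughly 14%.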
The reply from Peter of Yasm:

On Wed, 13 Jul 2011, Jason wrote:
> Assembling the attached file in Gas and Yasm results in the gas code running
> faster by 14%. Taking a listing (gas.asm, yasm.asm), you can see the
> differences in the first two jumps, where the offsets are different.
> The code appears to run correctly in both cases, which is impossible, so
> perhaps it's the listings that are wrong. I will do some more investigations.

The list file is indeed incorrect (bug). But the actual output generated by
yasm (confirmed via objdump) exactly matches gas, so it's impossible for it
to be 14% slower. The only thing I can think of is that the code is being
relocated/aligned differently by the linker. Only a more complete testcase
might be able to identify the actual issue.

-Peter
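The objdump check Peter describes can be sketched roughly as follows. This is an illustration, not the procedure he actually ran: it assumes GNU objdump is on the PATH and that its `-d` output uses the usual tab-separated `address:`, raw bytes, mnemonic layout. The object file names in the usage comment are hypothetical. Comparing only the address and byte columns sidesteps any differences in how the instructions are printed.

```python
import difflib
import subprocess


def disassemble(obj_path):
    """Return the text printed by `objdump -d` for one object file."""
    result = subprocess.run(
        ["objdump", "-d", obj_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def encoded_bytes(listing):
    """Keep 'address: bytes' from each instruction line, dropping mnemonics."""
    rows = []
    for line in listing.splitlines():
        parts = line.split("\t")
        # Instruction lines have the form '  1a:<TAB>72 0c<TAB>jb ...'.
        # Headers and symbol labels contain no tab and are skipped.
        if len(parts) >= 2 and parts[0].strip().endswith(":"):
            rows.append(parts[0].strip() + " " + parts[1].strip())
    return rows


def byte_diff(listing_a, listing_b):
    """Unified diff of the encoded bytes; an empty list means identical code."""
    return list(difflib.unified_diff(
        encoded_bytes(listing_a), encoded_bytes(listing_b),
        fromfile="gas", tofile="yasm", lineterm="",
    ))


# Usage (file names are hypothetical):
#   diff = byte_diff(disassemble("mod_1_2.gas.o"), disassemble("mod_1_2.yasm.o"))
#   print("identical" if not diff else "\n".join(diff))
```

An empty diff here only shows that the two object files encode the same instructions. It says nothing about where the linker finally places them, which is the remaining suspect Peter points to.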