Hi Uros,

Here is an initial patch to improve the performance of 64-bit integer
arithmetic in 32-bit mode. We discovered that gcc is significantly behind
icc and clang on the rsa benchmark from the eembc2.0 suite.
The problem function looks like this:

typedef unsigned long long ull;
typedef unsigned long ul;

ul mul_add(ul *rp, ul *ap, int num, ul w)
{
  ul c1=0;
  ull t;
  for (;;)
    {
      { t=(ull)w * ap[0] + rp[0] + c1;
        rp[0]= ((ul)t)&0xffffffffL; c1= ((ul)((t)>>32))&(0xffffffffL); };
      if (--num == 0) break;
      { t=(ull)w * ap[1] + rp[1] + c1;
        rp[1]= ((ul)(t))&(0xffffffffL); c1= (((ul)((t)>>32))&(0xffffffffL)); };
      if (--num == 0) break;
      { t=(ull)w * ap[2] + rp[2] + c1;
        rp[2]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
      if (--num == 0) break;
      { t=(ull)w * ap[3] + rp[3] + c1;
        rp[3]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
      if (--num == 0) break;
      ap+=4;
      rp+=4;
    }
  return(c1);
}
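
For reference, each step of the loop above amounts, on a 32-bit target, to one widening multiply followed by two double-word additions, each of which is a low-half add plus a high-half add-with-carry. The sketch below spells that out in portable C. It is only an illustration of the arithmetic, not the code gcc emits; the names u64_parts, add64_split and mul_add_step are hypothetical, and uint32_t stands in for the 32-bit unsigned long of the original.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit halves,
   the way it lives in a register pair in 32-bit mode.  */
typedef struct { uint32_t lo, hi; } u64_parts;

/* Double-word addition done in halves: add the low words, then add
   the high words plus the carry out of the low addition (add/adc).  */
static u64_parts add64_split(u64_parts a, u64_parts b)
{
  u64_parts r;
  r.lo = a.lo + b.lo;
  uint32_t carry = r.lo < a.lo;   /* 1 if the low addition wrapped */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* One step of mul_add: t = w * a + *rp + c1; store the low half of t
   in *rp and return the high half as the new carry word.  */
static uint32_t mul_add_step(uint32_t *rp, uint32_t a, uint32_t w, uint32_t c1)
{
  uint64_t prod = (uint64_t)w * a;            /* 32x32 -> 64 multiply */
  u64_parts t = { (uint32_t)prod, (uint32_t)(prod >> 32) };
  t = add64_split(t, (u64_parts){ *rp, 0 });  /* + rp[i] */
  t = add64_split(t, (u64_parts){ c1, 0 });   /* + c1 */
  *rp = t.lo;
  return t.hi;
}

/* Small self-check against plain 64-bit arithmetic.  */
static void self_check(void)
{
  uint32_t r = 7;
  uint64_t ref = (uint64_t)0x12345678u * 0x9abcdef0u + 7 + 3;
  uint32_t c = mul_add_step(&r, 0x9abcdef0u, 0x12345678u, 3);
  assert(r == (uint32_t)ref);
  assert(c == (uint32_t)(ref >> 32));
}
```

The sum cannot overflow 64 bits even when all operands are 0xffffffff, so the high half returned here is always the complete carry word.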

If we apply the patch below, we get a +6% speed-up for rsa on Silvermont.

The patch looks like this (it is not complete, since there are other
64-bit instructions, e.g. subtraction):

Index: i386.md
===================================================================
--- i386.md (revision 236181)
+++ i386.md (working copy)
@@ -5439,7 +5439,7 @@
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
   "#"
-  "reload_completed"
+  "1"
   [(parallel [(set (reg:CCC FLAGS_REG)
                    (compare:CCC
                      (plus:DWIH (match_dup 1) (match_dup 2))

What is your opinion?