On Tue, May 31, 2016 at 5:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi Uros,
>
> Here is an initial patch to improve performance of 64-bit integer
> arithmetic in 32-bit mode. We discovered that gcc is significantly
> behind icc and clang on the rsa benchmark from the eembc2.0 suite.
> The problem function looks like
> typedef unsigned long long ull;
> typedef unsigned long ul;
> ul mul_add(ul *rp, ul *ap, int num, ul w)
>  {
>  ul c1=0;
>  ull t;
>  for (;;)
>   {
>   { t=(ull)w * ap[0] + rp[0] + c1;
>    rp[0]= ((ul)t)&0xffffffffL; c1= ((ul)((t)>>32))&(0xffffffffL); };
>   if (--num == 0) break;
>   { t=(ull)w * ap[1] + rp[1] + c1;
>    rp[1]= ((ul)(t))&(0xffffffffL); c1= (((ul)((t)>>32))&(0xffffffffL)); };
>   if (--num == 0) break;
>   { t=(ull)w * ap[2] + rp[2] + c1;
>    rp[2]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
>   if (--num == 0) break;
>   { t=(ull)w * ap[3] + rp[3] + c1;
>    rp[3]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
>   if (--num == 0) break;
>   ap+=4;
>   rp+=4;
>   }
>  return(c1);
>  }
>
> If we apply the patch below, we get a 6% speed-up for rsa on Silvermont.
>
> The patch looks like (not complete, since there are other 64-bit
> instructions e.g. subtraction):
>
> Index: i386.md
> ===================================================================
> --- i386.md     (revision 236181)
> +++ i386.md     (working copy)
> @@ -5439,7 +5439,7 @@
>     (clobber (reg:CC FLAGS_REG))]
>    "ix86_binary_operator_ok (PLUS, <DWI>mode, operands)"
>    "#"
> -  "reload_completed"
> +  "1"
>    [(parallel [(set (reg:CCC FLAGS_REG)
>                    (compare:CCC
>                      (plus:DWIH (match_dup 1) (match_dup 2))
>
> What is your opinion?

This splitter doesn't depend on hard registers, so there is no
technical obstacle to the proposed patch. OTOH, this is a very old
splitter; it is possible that it was introduced to handle some of
reload's deficiencies. Maybe Jeff knows something about this approach.
We have LRA now, so perhaps we have to rethink the purpose of these
DImode splitters.
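
For reference, the quoted pattern (a CCC compare of a DWIH-mode plus)
suggests that this splitter breaks a double-word add into a low-word add
that produces a carry and a high-word add that consumes it. Below is a
minimal C sketch of that arithmetic, assuming 32-bit words. The names
u64_pair, add64_split, split64 and join64 are hypothetical and used only
for illustration; they are not GCC internals.

```c
#include <stdint.h>

/* Hypothetical illustration type: a 64-bit value held as two 32-bit
   halves, roughly how a DImode value occupies two registers in 32-bit
   mode. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* Split a 64-bit value into its low and high 32-bit words. */
static u64_pair split64(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* Reassemble the two 32-bit words into one 64-bit value. */
static uint64_t join64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Double-word add expressed as two word-sized adds. */
static u64_pair add64_split(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;            /* low-word add, wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;  /* 1 if the low-word add wrapped */
    r.hi = a.hi + b.hi + carry;    /* high-word add plus carry (adc) */
    return r;
}
```

Whether exposing the two halves before reload actually helps is the open
question here; the sketch only shows what the split form computes.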

A pragmatic approach would be - if the patch shows measurable benefit,
and doesn't introduce regressions, then Stage 1 is the time to try it.

BTW: Use "&& 1" in the split condition of the combined insn_and_split
pattern to copy the enable condition from the insn part. If there is
no condition, you should just use "".

Uros.
