------- Additional Comments From uros at kss-loka dot si  2004-11-05 08:00 -------
Another comment on the code in comment #3:

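With a scaled index and no base register, the LEA instruction can only encode a
full 32-bit constant as its displacement, so the zero displacement in the
gcc-4.0 listing below costs four bytes. As a result, the inner loop is
considerably bigger in gcc-4.0 compiled code (21 bytes versus 13 bytes for
gcc-3.2). (Note that LEA with a scale factor should be replaced by a shift in
the P4 case...)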

Another improvement would be to use %ecx as the count register in inner loops.
In this case, TARGET_USE_LOOP architectures (such as K6) could use a loop insn
for inner loops.
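
To illustrate the %ecx suggestion, here is a rough C sketch of the two loop
shapes. The function names and signatures are hypothetical and are not taken
from comment #3; copy_row_up is only assumed to resemble the inner loop being
compiled. In copy_row_down the trip count runs down to zero, so in principle
it could be kept in %ecx and decremented and tested by a single loop insn.
Whether gcc actually does so depends on the target and the optimizer.

```c
#include <stddef.h>

/* Count-up form: the index doubles as the loop counter and is
   compared against n on every iteration. */
void copy_row_up(double *dst, const double *src, int n)
{
    for (int j = 0; j < n; j++)
        dst[j] = src[j];
}

/* Count-down form: pointers are advanced separately, and the counter
   only tracks the remaining iterations, ending at zero. */
void copy_row_down(double *dst, const double *src, int n)
{
    for (int count = n; count > 0; count--)
        *dst++ = *src++;
}
```

Both functions copy the same n elements in the same order; only the way the
loop exit is tested differs.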

gcc-4.0 (21 bytes):
  23:   8d 04 d5 00 00 00 00    lea    0x0(,%edx,8),%eax
  2a:   dd 04 01                fldl   (%ecx,%eax,1)
  2d:   dd 1c 03                fstpl  (%ebx,%eax,1)
  30:   83 c2 01                add    $0x1,%edx
  33:   39 55 0c                cmp    %edx,0xc(%ebp)
  36:   7f eb                   jg     23 <LU_copy_matrix+0x23>

gcc-3.2 (13 bytes):
  22:   dd 04 c2                fldl   (%edx,%eax,8)
  25:   dd 1c c1                fstpl  (%ecx,%eax,8)
  28:   83 c0 01                add    $0x1,%eax
  2b:   39 f0                   cmp    %esi,%eax
  2d:   7c f3                   jl     22 <LU_copy_matrix+0x22>


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17647