> Hello!
> 
> > Well you really want both the fpcr and the mxcsr registers, since the fpcr
> > only controls the x87 and the mxcsr controls the xmm registers.  Note, in
> > adding these registers, you are going to have to go through all of the
> > floating point patterns to add (use:HI FPCR_REG) and (use:SI MXCSR_REG) to
> > each and every pattern so that the optimizer can be told not to move a
> > floating point operation past the setting of the control word.
> 
>   I think that (use:...) clauses are needed only for (float)->(int) patterns

If you make FPCR/MXCSR real registers, you will need to add a use to all
the arithmetic and move patterns, which would consume quite a lot of memory
and confuse the optimizers.  I think you can get around this more easily by
simply using volatile unspecs inserted by the LCM pass (this would limit
scheduling, but I don't think that is a big deal).

> (fix_trunc.. & co.). For i386, we could calculate new mode word in advance
> (this calculation is inserted by LCM), and fldcw insn is inserted just before
> fist/frndint.
> 
> (define_insn_and_split "fix_trunc<mode>_i387_2"
>   [(set (match_operand:X87MODEI12 0 "memory_operand" "=m")
>       (fix:X87MODEI12 (match_operand 1 "register_operand" "f")))
>    (use (match_operand:HI 2 "memory_operand" "m"))
>    (use (match_operand:HI 3 "memory_operand" "m"))]
>   "TARGET_80387 && !TARGET_FISTTP
>    && FLOAT_MODE_P (GET_MODE (operands[1]))
>    && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
>   "#"
>   "reload_completed"
>   [(set (reg:HI FPCR_REG)
>       (unspec:HI [(match_dup 3)] UNSPEC_FLDCW))
>    (parallel [(set (match_dup 0) (fix:X87MODEI12 (match_dup 1)))
>             (use (reg:HI FPCR_REG))])]
>   ""
>   [(set_attr "type" "fistp")
>    (set_attr "i387_cw" "trunc")
>    (set_attr "mode" "<MODE>")])
> 
> 
> (define_insn "*fix_trunc<mode>_i387"
>   [(set (match_operand:X87MODEI12 0 "memory_operand" "=m")
>       (fix:X87MODEI12 (match_operand 1 "register_operand" "f")))
>    (use (reg:HI FPCR_REG))]
>   "TARGET_80387 && !TARGET_FISTTP
>    && FLOAT_MODE_P (GET_MODE (operands[1]))
>    && !SSE_FLOAT_MODE_P (GET_MODE (operands[1]))"
>   "* return output_fix_trunc (insn, operands, 0);"
>   [(set_attr "type" "fistp")
>    (set_attr "i387_cw" "trunc")
>    (set_attr "mode" "<MODE>")])
> 
> I'm trying to use MODE_ENTRY and MODE_EXIT macros to insert mode calculations in

My main motivation for stopping at this point was that reload might
insert new fld/fst instructions in places where the control word is
changed, resulting in wrong rounding.  It seems to me that we would have
to make the second LCM pass happen after reload; that is definitely
doable, I just never got around to doing it.
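
To make the rounding concern concrete, here is a minimal sketch in C of
what the "orw $3072" in your assembly does to the saved control word.
The helper and macro names are mine, made up for illustration; only the
bit layout (rounding-control field in bits 10-11, 11b = truncate) is the
x87 definition.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names, not GCC identifiers. */
#define X87_CW_RC_MASK  0x0C00u  /* rounding-control field, bits 10-11 */
#define X87_CW_RC_TRUNC 0x0C00u  /* RC = 11b: round toward zero */

/* The immediate in "orw $3072, %ax" is exactly the RC field set to 11b. */
static_assert(X87_CW_RC_TRUNC == 3072, "3072 == 0x0C00");

/* Build the truncating control word from the one saved by fnstcw. */
uint16_t x87_cw_trunc(uint16_t saved_cw)
{
    return (uint16_t)(saved_cw | X87_CW_RC_TRUNC);
}

/* Nonzero if this control word makes x87 stores round toward zero. */
int x87_cw_is_trunc(uint16_t cw)
{
    return (cw & X87_CW_RC_MASK) == X87_CW_RC_TRUNC;
}
```

For the default control word 0x037F this gives 0x0F7F.  Any fst that
reload places while such a word is loaded would round toward zero
instead of to nearest, which is the problem I mean.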

> proper places. Currently, I have a somehow working prototype that switches
> between 2 modes: MODE_UNINITIALIZED, MODE_TRUNC (and MODE_ANY). The trick here
> is, that MODE_ENTRY and MODE_EXIT are defined to MODE_UNINITIALIZED. Secondly,
> every asm statement and call insn switches to MODE_UNINITIALIZED, and when mode
> is switched _from_ MODE_TRUNC _to_ MODE_UNINITIALIZED before these two
> statements (or in exit BBs), an UNSPEC_VOLATILE type fldcw is emitted (again via
> LCM) that switches fpu to saved mode. [UNSPEC_VOLATILE is needed to prevent
> optimizers to remove this pattern]. So, 2 fldcw patterns are defined:

If we use the second LCM pass and make it insert code as late as
possible, it seems safe to me to just have MODE_<possible values
of CW> and MODE_UNINITIALIZED and insert loads accordingly, believing that
the first LCM pass already inserted the computations at the correct points.
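
As a rough illustration of the load placement I have in mind, here is a
toy model in C for a single straight-line block.  It is not how the mode
switching code in GCC is written; count_fldcw and the array
representation are assumptions made up for this sketch, and only the
mode names come from your prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Mode names follow the prototype; the rest of this sketch is hypothetical. */
enum cw_mode { MODE_UNINITIALIZED, MODE_TRUNC, MODE_ANY };

/* Count the fldcw insns needed for one straight-line block.
   needs[i] is the mode insn i requires: MODE_TRUNC for fist,
   MODE_UNINITIALIZED for calls and asm statements, MODE_ANY for
   insns that do not care.  Entry and exit are MODE_UNINITIALIZED. */
int count_fldcw(const enum cw_mode *needs, size_t n)
{
    enum cw_mode cur = MODE_UNINITIALIZED;  /* mode at block entry */
    int loads = 0;

    assert(needs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (needs[i] == MODE_ANY || needs[i] == cur)
            continue;           /* no switch needed */
        loads++;                /* one fldcw per mode change */
        cur = needs[i];
    }
    if (cur != MODE_UNINITIALIZED)
        loads++;                /* restore the saved cw at exit */
    return loads;
}
```

Feeding it the sequence fist, call, fist gives four loads, which matches
the four fldcw insns in your testcase with the call.  Two adjacent fist
insns share one load plus the restore at exit.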

> 
> (define_insn "x86_fldcw_1"
>   [(set (reg:HI FPCR_REG)
>       (unspec:HI [(match_operand:HI 0 "memory_operand" "m")]
>                    UNSPEC_FLDCW))]
>   "TARGET_80387"
>   "fldcw\t%0"
>   [(set_attr "length" "2")
>    (set_attr "mode" "HI")
>    (set_attr "unit" "i387")
>    (set_attr "athlon_decode" "vector")])
> 
> (define_insn "x86_fldcw_2"
>   [(set (reg:HI FPCR_REG)
>       (unspec_volatile:HI [(match_operand:HI 0 "memory_operand" "m")]
>                             UNSPECV_FLDCW))]
>   "TARGET_80387"
>   "fldcw\t%0"
>   [(set_attr "length" "2")
>    (set_attr "mode" "HI")
>    (set_attr "unit" "i387")
>    (set_attr "athlon_decode" "vector")])
> 
> By using this approach, testcase:
> 
> int test (int *a, double *x) {
>         int i;
> 
>         for (i = 10; i; i--) {
>              a[i] = x[i];
>         }
> 
>         return 0;
> }
> 
> is compiled (with -O2 -fomit-frame-pointer -fgcse-after-reload) into:
> 
> test:
>         pushl  %ebx
>         xorl %edx, %edx
>         subl $4, %esp
>         fnstcw 2(%esp)         <- store current cw
>         movl 12(%esp), %ebx
>         movl 16(%esp), %ecx
>         movzwl 2(%esp), %eax
>         orw  $3072, %ax
>         movw %ax, (%esp)       <- store new cw
>         .p2align 4,,15
> .L2:
>         fldcw  (%esp)          <- hello? gcse-after-reload?
>         fldl 80(%ecx,%edx,8)
>         fistpl 40(%ebx,%edx,4)
>         decl %edx
>         cmpl $-10, %edx
>         jne  .L2
>         fldcw  2(%esp)         <- volatile fldcw in exit block (load stored cw)
>         xorl %eax, %eax
>         popl %edx
>         popl %ebx
>         ret
> 
> Another testcase, involving call:
> 
> extern double xxxx(int a);
> 
> int test (double a) {
>         return xxxx (a);
> }
> 
> is compiled into:
> 
> test:
>         subl $12, %esp
>         fnstcw 10(%esp)        <- store current control word
>         fldl 16(%esp)
>         movzwl 10(%esp), %eax
>         orw  $3072, %ax
>         movw %ax, 8(%esp)
>         fldcw  8(%esp)         <- switch fpu to new mode
>         fistpl (%esp)          <- make conversion
>         fldcw  10(%esp)        <- volatile fldcw before call (load stored cw)
>         call xxxx
>         fnstcw 10(%esp)        <- rewrite stored control word after call
>         movzwl 10(%esp), %eax
>         orw  $3072, %ax
>         movw %ax, 8(%esp)
>         fldcw  8(%esp)         <- load new
>         fistpl 4(%esp)         <- make conversion
>         movl 4(%esp), %eax
>         fldcw  10(%esp)        <- volatile fldcw in exit block (load stored cw)
>         addl $12, %esp
>         ret
> 
> Because ABI specifies that control word should be restored to saved mode, we
> restore saved cw before call. After call, new control word is saved again -
> because xxxx could be cw-setting function and new cw shouldn't be rewritten by
> saved cw at the beginning of the function.
> 
> Unfortunately, in first testcase, fldcw is not moved out of the loop, because
> fix_trunc<mode>_i387_2 is split after gcse-after-reload pass (Is this
> intentional for gcse-after-reload pass?)

It is intentional for the reload pass.  I guess gcse could be run after
splitting, but I am not sure what the interactions would be.

Honza
> 
> Uros.
