On 08/26/2014 04:57 AM, Ilya Enkovich wrote:
> 2014-08-26 11:49 GMT+04:00 Ilya Enkovich <enkovich....@gmail.com>:
>> 2014-08-25 19:08 GMT+04:00 Vladimir Makarov <vmaka...@redhat.com>:
>>> On 2014-08-22 8:21 AM, Ilya Enkovich wrote:
>>>> Hi,
>>>>
>>>> On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in
>>>> 32bit PIC mode.  It was decided that the best approach would be to not fix
>>>> ebx register, use speudo register for GOT base address and let allocator do
>>>> the rest.  This should be similar to how clang and icc work with GOT base
>>>> address.  I've been working for some time on such patch and now want to
>>>> share my results.
>>>>
>>>> The idea of the patch was very simple and included few things;
>>>>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>>>> not have any hard reg fixed for PIC.
>>>>   2.  Initialize pic_offset_table_rtx with a new pseudo register in the
>>>> begining of a function expand.
>>>>   3.  Change ABI so that there is a possible implicit PIC argument for
>>>> calls; pic_offset_table_rtx is used as an arg value if such implicit arg
>>>> exist.
>>>>
>>>> Such approach worked well on small tests but trying to run some benchmarks
>>>> we faced a problem with reload of address constants.  The problem is that
>>>> when we try to rematerialize address constant or some constant memory
>>>> reference, we have to use pic_offset_table_rtx.  It means we insert new
>>>> usages of a speudo register and alocator cannot handle it correctly.  Same
>>>> problem also applies for float and vector constants.
>>>>
>>>> Rematerialization is not the only case causing new pic_offset_table_rtx
>>>> usage.  Another case is a split of some instructions using constant but not
>>>> having proper constraints.  E.g. pushtf pattern allows push of constant but
>>>> it has to be replaced with push of memory in reload pass causing additional
>>>> usage of pic_offset_table_rtx.
>>>>
>>>> There are two ways to fix it.  The first one is to support modifications
>>>> of pseudo register live range during reload and correctly allocate hard 
>>>> regs
>>>> for its new usages (currently we have some hard reg allocated for new usage
>>>> of pseudo reg but it may contain value of some other pseudo reg; thus we
>>>> reveal the problem at runtime only).
>>>>
>>> I believe there is already code to deal with this situation.  It is code for
>>> risky transformations (please check flag lra_risky_transformation_p).  If
>>> this flag is set, next lra assign subpass is running and checking
>>> correctness of assignments (e.g. checking situation when two different
>>> pseudos have intersected live ranges and the same assigned hard reg.  If
>>> such dangerous situation is found, it is fixed).
>> I tried to remove my restrictions from setup_reg_equiv and initialize
>> lra_risky_transformation_p with 'true' in lra_constraints instead.  I
>> got only 50% pass rate for SPEC2000 on Ofast with LTO.  Will search
>> for fail reason.
> I've looked into one of fails.  There is still a problem with
> allocation in reload. Here is a piece of code which uses float
> constant:
>
> (insn 1199 1198 1200 96 (set (reg:SI 3 bx)
>         (reg:SI 1301 [528])) /usr/include/bits/stdlib-float.h:28 90
> {*movsi_internal}
>      (nil))
> (call_insn 1200 1199 1201 96 (set (reg:DF 8 st)
>         (call (mem:QI (symbol_ref:SI ("strtod") [flags 0x41]
> <function_decl 0x2b29b8ea8900 strtod>) [0 strtod S1 A8])
>             (const_int 8 [0x8]))) /usr/include/bits/stdlib-float.h:28
> 661 {*call_value}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (expr_list:REG_CALL_DECL (symbol_ref:SI ("strtod") [flags
> 0x41]  <function_decl 0x2b29b8ea8900 strtod>)
>             (expr_list:REG_EH_REGION (const_int 0 [0])
>                 (nil))))
>     (expr_list (use (reg:SI 3 bx))
>         (expr_list:SI (use (reg:SI 3 bx))
>             (expr_list:SI (use (mem/f:SI (reg/f:SI 7 sp) [0  S4 A32]))
>                 (expr_list:SI (use (mem/f:SI (plus:SI (reg/f:SI 7 sp)
>                                 (const_int 4 [0x4])) [0  S4 A32]))
>                     (nil))))))
> (insn 1201 1200 1202 96 (set (reg:DF 321 [ D.7817 ])
>         (reg:DF 8 st)) /usr/include/bits/stdlib-float.h:28 128 
> {*movdf_internal}
>      (expr_list:REG_DEAD (reg:DF 8 st)
>         (nil)))
> (insn 1202 1201 1203 96 (set (reg:SF 322 [ D.7804 ])
>         (float_truncate:SF (reg:DF 321 [ D.7817 ]))) read_arch.c:700
> 157 {*truncdfsf_fast_sse}
>      (expr_list:REG_DEAD (reg:DF 321 [ D.7817 ])
>         (nil)))
> (insn 1203 1202 1204 96 (set (mem:SF (reg/f:SI 198 [ D.7812 ]) [4
> _130->frequency+0 S4 A32])
>         (reg:SF 322 [ D.7804 ])) read_arch.c:700 129 {*movsf_internal}
>      (nil))
> (insn 1204 1203 1205 96 (set (reg:SF 1209)
>         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC12") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> read_arch.c:701 129 {*movsf_internal}
>      (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0])
>         (nil)))
> (note 1205 1204 1206 96 NOTE_INSN_DELETED)
> (note 1206 1205 1207 96 NOTE_INSN_DELETED)
> (insn 1207 1206 1208 96 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 1209)
>             (reg:SF 322 [ D.7804 ]))) read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
> (jump_insn 1208 1207 3075 96 (set (pc)
>         (if_then_else (ge (reg:CCFP 17 flags)
>                 (const_int 0 [0]))
>             (label_ref:SI 3114)
>             (pc))) read_arch.c:701 606 {*jcc_1}
>      (expr_list:REG_DEAD (reg:CCFP 17 flags)
>         (int_list:REG_BR_PROB 2 (nil)))
>  -> 3114)
> (note 3075 1208 1209 97 [bb 97] NOTE_INSN_BASIC_BLOCK)
> (insn 1209 3075 1210 97 (set (reg:SF 1208)
>         (mem/u/c:SF (plus:SI (reg:SI 1301 [528])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [4  S4 A32]))
> read_arch.c:701 129 {*movsf_internal}
>      (expr_list:REG_EQUIV (const_double:SF 1.0e+0 [0x0.8p+1])
>         (nil)))
> (note 1210 1209 1211 97 NOTE_INSN_DELETED)
> (note 1211 1210 1212 97 NOTE_INSN_DELETED)
> (insn 1212 1211 1213 97 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 322 [ D.7804 ])
>             (reg:SF 1208))) read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
>
> We have PIC register r1301 (former r528) used for constant load (insn
> 1209).  This register was actually loaded to bx (insn 1199) and this
> hard reg may be used by insn 1209.  During reload we have insn 1209
> removed and a new one created instead:
>
> (insn 3864 1211 1212 104 (set (reg:SI 0 ax [1468])
>         (plus:SI (reg:SI 6 bp [528])
>             (const:SI (unspec:SI [
>                         (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>                     ] UNSPEC_GOTOFF)))) read_arch.c:701 213 {*leasi}
>      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC11") [flags 0x2])
>         (nil)))
> (insn 1212 3864 1213 104 (set (reg:CCFP 17 flags)
>         (compare:CCFP (reg:SF 21 xmm0 [orig:322 D.7804 ] [322])
>             (mem/u/c:SF (reg:SI 0 ax [1468]) [4  S4 A32])))
> read_arch.c:701 53 {*cmpisf_sse}
>      (nil))
>
> In this new instruction bp is used which is wrong. We actually have
> required value in bx. In debugger I also checked that bp doesn't have
> required value.  I suppose I enabled flag correctly because found this
> in the log: "Spill r1301 after risky transformations".  Is it possible
> we are still not allowed to use the original PIC register (r528) and
> should use a reg copy created for particular region (in this case
> r1301)?
>
It is hard for me to say without the full patch and the test.  I can
only guess that 1301 gets a wrong class and therefore assigned to the
wrong hard ref.

Could you send me the patch and the test.  I'll look at this and inform
you what is going on.



Reply via email to