Hi Richard, Jan and H.J.,

Thanks for all the quick responses and suggestions.

I had tested my patch when tuning for an arch without the LCP stalls,
but it didn't hit an issue in reload because it didn't require
rematerialization. Thanks for pointing out this issue.

Regarding the penalty, it can be >=6 cycles for core2/corei7 so I
thought it would be best to force the splitting even when that would
force the use of a new register, but it is possible that the peephole2
approach will work just fine in the majority of the cases. Thanks for
the peephole2 patch, H.J., I will test that solution out for the case
I was trying to solve.

Regarding the penalty on AMD, reading Agner's guide suggested that
this could be a problem on Bulldozer, but only if there are >3
prefixes, and I'm not sure how often that will occur for this type of
instruction in practice. I will look into removing AMD from the
handled cases.

Will respond later after trying the peephole2 approach.

Teresa

On Fri, Mar 30, 2012 at 9:23 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Fri, Mar 30, 2012 at 8:19 AM, Richard Henderson <r...@redhat.com> wrote:
>> On 03/30/2012 11:11 AM, Richard Henderson wrote:
>>> On 03/30/2012 11:03 AM, Teresa Johnson wrote:
>>>> +(define_insn "*movhi_imm_internal"
>>>> +  [(set (match_operand:HI 0 "memory_operand" "=m")
>>>> +        (match_operand:HI 1 "immediate_operand" "n"))]
>>>> +  "!TARGET_LCP_STALL"
>>>> +{
>>>> +  return "mov{w}\t{%1, %0|%0, %1}";
>>>> +}
>>>> +  [(set (attr "type") (const_string "imov"))
>>>> +   (set (attr "mode") (const_string "HI"))])
>>>> +
>>>>  (define_insn "*movhi_internal"
>>>>    [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m")
>>>> -    (match_operand:HI 1 "general_operand" "r,rn,rm,rn"))]
>>>> +    (match_operand:HI 1 "general_operand" "r,rn,rm,r"))]
>>>>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
>>>
>>> For reload to work correctly, all alternatives must remain part of the same 
>>> pattern.
>>> This issue should be handled with the ISA and ENABLED attributes.
>>
>> I'll also ask if this should better be handled with a peephole2.
>>
>> While movw $1234,(%eax) might be expensive, is it so expensive that we
>> *must* force the use of a free register?  Might it be better only to
>> split the insn in two if and only if a free register exists?
>>
>> That can easily be done with a peephole2 pattern...
>>
>
> Here is a very old LCP patch with peephole2.  It may need some
> updates.
>
>
> --
> H.J.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Reply via email to