On Mon, Aug 8, 2011 at 10:11 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Mon, Aug 8, 2011 at 5:30 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote:
>> Uros Bizjak wrote:
>>
>>> Although it would be nice for reload to subsequently fix CSE'd
>>> non-offsettable memory by copying the address to a temporary reg
>>> (*as said in the documentation*), we could simply require an XMM
>>> temporary for TImode reloads to/from integer registers, and this
>>> fixes the ICE for x32.
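>>>
>>> A minimal sketch of that XMM-temporary idea, as a condition one
>>> might add to ix86_secondary_reload (the exact test is an
>>> assumption for illustration, not a finished patch):
>>>
>>>   static reg_class_t
>>>   ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
>>>                          enum machine_mode mode,
>>>                          secondary_reload_info *sri)
>>>   {
>>>     /* TImode moves between integer registers and non-offsettable
>>>        memory need a secondary reload; route them through an SSE
>>>        (XMM) register by naming it as the intermediate class.  */
>>>     if (mode == TImode
>>>         && INTEGER_CLASS_P (rclass)
>>>         && MEM_P (x)
>>>         && !offsettable_memref_p (x))
>>>       return SSE_REGS;
>>>
>>>     return NO_REGS;
>>>   }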
>>
>> Moves are special as far as reload is concerned.  If there is already
>> a move instruction present *before* reload, it will get fixed up
>> according to its constraints like any other instruction.
>>
>> However, reload will *introduce* new moves as part of its operation,
>> and those will *not* themselves get reloaded.  Instead, reload simply
>> assumes that every plain move will just succeed without requiring
>> any reload; if this is not true, the target *must* provide a
>> secondary reload for this move.
>>
>> (Note that the secondary reload could also work by reloading the
>> target address into a temporary; that's up to the target to
>> implement.)
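>>
>> (A sketch of that address-reload flavour, inside the same hook:
>> return NO_REGS but point sri->icode at a dedicated reload pattern
>> that receives a scratch register.  The reload_noff_load/store
>> pattern names below are illustrative assumptions.)
>>
>>   if (mode == TImode
>>       && INTEGER_CLASS_P (rclass)
>>       && MEM_P (x)
>>       && !offsettable_memref_p (x))
>>     {
>>       /* Hand the move to an expander that copies the address
>>          into the scratch register it is given.  */
>>       sri->icode = (in_p ? CODE_FOR_reload_noff_load
>>                          : CODE_FOR_reload_noff_store);
>>       /* Account for the extra insn that moves the address.  */
>>       sri->extra_cost = 1;
>>       return NO_REGS;
>>     }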
>
> Whoa, indeed.
>
> Using the attached patch, which reloads the memory address instead
> of going through an XMM register, the code for the testcase improves
> from:
>
> test:
> .LFB0:
>        .cfi_startproc
>        pushq   %rbx
>        .cfi_def_cfa_offset 16
>        .cfi_offset 3, -16
>        sall    $4, %esi
>        addl    %edi, %esi
>        movdqa  (%esi), %xmm0
>        movdqa  %xmm0, -16(%rsp)
>        movq    -16(%rsp), %rcx
>        movq    -8(%rsp), %rbx
>        addq    $1, %rcx
>        adcq    $0, %rbx
>        movq    %rcx, -16(%rsp)
>        sall    $4, %edx
>        movq    %rbx, -8(%rsp)
>        movdqa  -16(%rsp), %xmm0
>        movdqa  %xmm0, (%esi)
>        pxor    %xmm0, %xmm0
>        movdqa  %xmm0, (%edx,%esi)
>        popq    %rbx
>        .cfi_def_cfa_offset 8
>        ret
>
> to:
>
> test:
> .LFB0:
>        .cfi_startproc
>        sall    $4, %esi
>        pushq   %rbx
>        .cfi_def_cfa_offset 16
>        .cfi_offset 3, -16
>        addl    %edi, %esi
>        pxor    %xmm0, %xmm0
>        mov     %esi, %eax
>        movq    (%rax), %rcx
>        movq    8(%rax), %rbx
>        addq    $1, %rcx
>        adcq    $0, %rbx
>        sall    $4, %edx
>        movq    %rcx, (%rax)
>        movq    %rbx, 8(%rax)
>        movdqa  %xmm0, (%edx,%esi)
>        popq    %rbx
>        .cfi_def_cfa_offset 8
>        ret
>
> H.J., can you please test the attached patch? This optimization won't
> trigger on x86_64 anymore.
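>
> For reference, the machine-description half of such a fix could look
> roughly like the sketch below; the reload_noff_load name, the
> TARGET_64BIT condition and the DImode scratch are illustrative
> assumptions, not necessarily what the attached patch does:
>
>   ;; Secondary-reload pattern: reload supplies a DImode scratch in
>   ;; operand 2; copy the non-offsettable address into it and load
>   ;; through the scratch instead.
>   (define_expand "reload_noff_load"
>     [(parallel [(match_operand:TI 0 "register_operand" "=r")
>                 (match_operand:TI 1 "memory_operand" "m")
>                 (match_operand:DI 2 "register_operand" "=&r")])]
>     "TARGET_64BIT"
>   {
>     /* Move the address into the scratch, rewrite the MEM to use
>        it, then emit the now-offsettable load.  */
>     emit_move_insn (operands[2], XEXP (operands[1], 0));
>     operands[1] = replace_equiv_address (operands[1], operands[2]);
>     emit_move_insn (operands[0], operands[1]);
>     DONE;
>   })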
>

I will test it.

Thanks.


-- 
H.J.
