Re: ICE when compiling SPEC 526.blender_r benchmark (profiling)

2019-09-23 Thread Martin Liška

On 9/23/19 8:19 PM, Steve Ellcey wrote:


Before I submit a Bugzilla report or try to cut down a test case, has any
one seen this problem when compiling the 526.blender_r benchmark from
SPEC 2017:

Compiling with '-Ofast -flto -march=native -fprofile-generate' on Aarch64:

during GIMPLE pass: vect
blender/source/blender/imbuf/intern/indexer.c: In function 'IMB_indexer_open':
blender/source/blender/imbuf/intern/indexer.c:157:20: internal compiler error: 
in execute_todo, at passes.c:2012
   157 | struct anim_index *IMB_indexer_open(const char *name)
   |^
0xa5ee2b execute_todo
 ../../gcc/gcc/passes.c:2012
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.



I don't see it on my testers. Please create a PR for it.

Thanks,
Martin


Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Segher Boessenkool
On Mon, Sep 23, 2019 at 06:56:12PM +0100, Jozef Lawrynowicz wrote:
> > It could have just done that as
> > 
> > (set (reg:PSI 28)
> >  (zero_extend:PSI (reg:QI 12)))
> > 
> > as far as I can see?  Do you already have a machine pattern that matches
> > that?
> 
> Yes that combination is what I was expecting to see. I guess since char is
> unsigned, a zero extend is needed to widen it, and then for the following 
> insn a
> sign extend is added to widen the HImode integer.

Yeah, but a sign_extend:M1 of a zero_extend:M2 of an M3 (with M2>M3) is
just a zero_extend:M1 of that M3.  Somehow combine (or simplify-rtx)
missed that; or come to think of it, it probably does that for MODE_INT
things, but this is a MODE_PARTIAL_INT.  Hrm, would it be correct then.
I think it is?  M2 is a normal MODE_INT, all bits in M2 are defined, the
M2 value always is non-negative, the sign_extend:M1 always is the same
as the zero_extend:M1 would be.

> There isn't currently a pattern which matches that, but adding it
> doesn't fix the issue which is why I thought it might need to be fixed earlier
> in RTL generation.
> Here is the pattern that is missing:
> 
> +(define_insn "movqipsi"
> +  [(set (match_operand:PSI 0 "register_operand" "=r,r")
> +   (zero_extend:PSI (match_operand:QI 1 "general_operand" "rYs,m")))]
> +  "msp430x"
> +  "@
> +   MOV.B\t%1, %0
> +   MOV%X1.B\t%1, %0"
> +)
> +
> 
> So will combine potentially be able to do away with (sign_extend 
> (zero_extend))
> in certain situations?

Yes.

> I filed PR/91865 which has all the relevant details from this thread.

Thanks!

> I can add the following nameless insn and combine will catch this case and
> generate the better, smaller code:
> 
> +(define_insn "*movqihipsi"
> +  [(set (match_operand:PSI 0 "register_operand" "=r,r")
> +   (sign_extend:PSI (zero_extend:HI (match_operand:QI 1 "general_operand"
> "rYs,m"]
> +  "msp430x"
> +  "@
> +   MOV.B\t%1, %0
> +   MOV%X1.B\t%1, %0"
> +)

Good to hear that already works!  combine should come up with the simpler
formulation though, let me see what I can do.


Segher


Re: RTL alternative selection question

2019-09-23 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 23/09/2019 15:15, Segher Boessenkool wrote:
>> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>>[(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>>  (ashift:DI
>>>(match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>>(match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>> (clobber (match_scratch:BI 3 "=cs,X"))]
>> 
>>> Unfortunately, the compiler (almost?) exclusively selects the second
>>> alternative, even when this means moving the values from one register
>>> file to the other, and then back again.
>>>
>>> The problem is that the scalar instruction clobbers the CC register,
>>> which results in a "reject++" for that alternative in the LRA dump.
>> 
>> What kind of reject?  It prints a reason, too.
>
>   0 Non input pseudo reload: reject++

It looks like that comes from:

  /* Input reloads can be inherited more often than
 output reloads can be removed, so penalize output
 reloads.  */
  if (!REG_P (op) || curr_static_id->operand[nop].type != OP_IN)
{
  if (lra_dump_file != NULL)
fprintf
  (lra_dump_file,
   "%d Non input pseudo reload: reject++\n",
   nop);
  reject++;
}

That's a good reason for increasing the cost of output reloads of values
vs. input reloads of values, but I don't think it covers your case of
reloading a SCRATCH, which can never be inherited anyway.  So if the
reason in the comment is the real reason (definitely sounds plausible)
then perhaps we should just exclude SCRATCHes?

I don't think this is really target-specific.  MIPS has a very similar
situation with multiplication, and again the SCRATCH reloads shouldn't
count.  Same for some SVE patterns.

Richard


ICE when compiling SPEC 526.blender_r benchmark (profiling)

2019-09-23 Thread Steve Ellcey


Before I submit a Bugzilla report or try to cut down a test case, has any
one seen this problem when compiling the 526.blender_r benchmark from
SPEC 2017:

Compiling with '-Ofast -flto -march=native -fprofile-generate' on Aarch64:

during GIMPLE pass: vect
blender/source/blender/imbuf/intern/indexer.c: In function 'IMB_indexer_open':
blender/source/blender/imbuf/intern/indexer.c:157:20: internal compiler error: 
in execute_todo, at passes.c:2012
  157 | struct anim_index *IMB_indexer_open(const char *name)
  |^
0xa5ee2b execute_todo
../../gcc/gcc/passes.c:2012
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Jozef Lawrynowicz
On Mon, 23 Sep 2019 10:56:55 -0500
Segher Boessenkool  wrote:

> On Mon, Sep 23, 2019 at 12:56:20PM +0100, Jozef Lawrynowicz wrote:
> > (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> > (zero_extend:HI (reg:QI 12 R12 [ i ])))
> >  (nil))
> > .
> > (insn 7 6 8 2 (set (reg:PSI 28)
> > (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
> >  (nil))
> > 
> > All we really need is:
> > 
> > (insn (set (reg:PSI 28 [ i ])
> > (zero_extend:PSI (reg:QI 12 R12 [ i ])))
> >  (nil))
> > 
> > The zero extend is implicit with byte sized register operations, so this 
> > would
> > result in assembly such as:
> >   MOV.B R12, R12
> > 
> > My question is whether this is the type of thing that should be handled 
> > with a
> > peephole optimization or if it is worth trying to fix the initial RTL 
> > generated
> > by expand (or in a later RTL pass e.g. combine)?  
> 
> combine does (well, it did in June, I don't think things changed since then)
> 
> Trying 2 -> 7:
> 2: r25:HI=zero_extend(R12:QI)
>   REG_DEAD R12:QI
> 7: r28:PSI=sign_extend(r25:HI)#0
>   REG_DEAD r25:HI
> Failed to match this instruction:
> (set (reg:PSI 28 [ i ])
> (sign_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ i ]
> Failed to match this instruction:
> (set (reg:PSI 28 [ i ])
> (sign_extend:PSI (and:HI (reg:HI 12 R12)
> (const_int 255 [0xff]
> 
> It could have just done that as
> 
> (set (reg:PSI 28)
>  (zero_extend:PSI (reg:QI 12)))
> 
> as far as I can see?  Do you already have a machine pattern that matches
> that?

Yes that combination is what I was expecting to see. I guess since char is
unsigned, a zero extend is needed to widen it, and then for the following insn a
sign extend is added to widen the HImode integer.

There isn't currently a pattern which matches that, but adding it
doesn't fix the issue which is why I thought it might need to be fixed earlier
in RTL generation.
Here is the pattern that is missing:

+(define_insn "movqipsi"
+  [(set (match_operand:PSI 0 "register_operand" "=r,r")
+   (zero_extend:PSI (match_operand:QI 1 "general_operand" "rYs,m")))]
+  "msp430x"
+  "@
+   MOV.B\t%1, %0
+   MOV%X1.B\t%1, %0"
+)
+

So will combine potentially be able to do away with (sign_extend (zero_extend))
in certain situations? I filed PR/91865 which has all the relevant details
from this thread.

I can add the following nameless insn and combine will catch this case and
generate the better, smaller code:

+(define_insn "*movqihipsi"
+  [(set (match_operand:PSI 0 "register_operand" "=r,r")
+   (sign_extend:PSI (zero_extend:HI (match_operand:QI 1 "general_operand"
"rYs,m"]
+  "msp430x"
+  "@
+   MOV.B\t%1, %0
+   MOV%X1.B\t%1, %0"
+)
+

Thanks,
Jozef

> 
> Please file a PR for this.  Thanks!
> 
> 
> Segher



Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Jozef Lawrynowicz
On Mon, 23 Sep 2019 07:30:21 -0600
Jeff Law  wrote:

> On 9/23/19 5:56 AM, Jozef Lawrynowicz wrote:
> > For msp430-elf in the large memory model there are PSImode (20-bit) 
> > pointers.
> > POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.
> > 
> > We get poor code generated for the following code snippet:
> > 
> > const int table[2] = {1, 2};
> > 
> > int
> > foo (char i)
> > {
> >   return table[i];
> > }
> > 
> > The RTL generated by expand uses two insns to convert "i" to a register's
> > natural mode; there is a sign extension which would be unnecessary if the 
> > first
> > instruction had a PSImode register as the lvalue:
> > 
> > (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> > (zero_extend:HI (reg:QI 12 R12 [ i ])))
> >  (nil))
> > .
> > (insn 7 6 8 2 (set (reg:PSI 28)
> > (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
> >  (nil))
> > 
> > All we really need is:
> > 
> > (insn (set (reg:PSI 28 [ i ])
> > (zero_extend:PSI (reg:QI 12 R12 [ i ])))
> >  (nil))
> > 
> > The zero extend is implicit with byte sized register operations, so this 
> > would
> > result in assembly such as:
> >   MOV.B R12, R12
> > 
> > My question is whether this is the type of thing that should be handled 
> > with a
> > peephole optimization or if it is worth trying to fix the initial RTL 
> > generated
> > by expand (or in a later RTL pass e.g. combine)?  
> Ideally it'd be fixed earlier, but that can be painful due to the way
> promotions work.
> 
> I certainly recall doing some combiner patterns to catch the more
> egregious codegen issues on the mn102 port (similar, but with 24bit
> pointers).  I think I mostly focused on stuff that showed up with
> integer loop indices and their annoying promotion to ptrdiff_t when used
> for array indexing.
> 
> I'd bet if you extracted mn10200.md out of the archive the patterns
> would be obvious ;-)

Ok thanks for the tip, I'll take a look. I'm sure there's likely to be other
stuff that could be applicable to msp430 as well.

Jozef
> 
> In this specific case I'd think a combiner pattern ought to do the right
> thing except for the fact that you're probably dealing with hard
> registers which combine no longer handles as aggressively.
> 
> jeff



Re: RTL alternative selection question

2019-09-23 Thread Jeff Law
On 9/23/19 9:26 AM, Andrew Stubbs wrote:
> On 23/09/2019 16:21, Segher Boessenkool wrote:
>> Pass the register class or constraint or something like that to the hook,
>> then based on what the hook returns, either or not do the reject?  So
>> your
>> hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar
>> ones
>> (those are confusing names btw, _REG but they are register classes).
> 
> It's a class of one register!
So maybe use CLASS_LIKELY_SPILLED_P as the trigger?

jeff


Seg Fault in GCC When Building

2019-09-23 Thread Nicholas Krause

Greetings,

For the last several days the branch for multithreading in GSOC does not 
build and


crashes like so:

make[4]: Leaving directory 
`/home/xerofoify/GCC/powerpc64le-unknown-linux-gnu/libquadmath'
make[3]: Leaving directory 
`/home/xerofoify/GCC/powerpc64le-unknown-linux-gnu/libquadmath'
make[2]: Leaving directory 
`/home/xerofoify/GCC/powerpc64le-unknown-linux-gnu/libquadmath'

during RTL pass: expand
../../../../../threads/libstdc++-v3/src/c++98/istream-string.cc: In 
function ‘std::basic_istream<_CharT, _Traits>& 
std::getline(std::basic_istream<_CharT, _Traits>&, 
std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&, _CharT) [with 
_CharT = char; _Traits = std::char_traits; _Alloc = 
std::allocator]’:
../../../../../threads/libstdc++-v3/src/c++98/istream-string.cc:122:5: 
internal compiler error: Segmentation fault

  122 | getline(basic_istream& __in, basic_string& __str,
  | ^~~
during RTL pass: expand
monetary_members.cc: In member function ‘void 
std::__cxx11::moneypunct<_CharT, 
_Intl>::_M_initialize_moneypunct(std::__c_locale, const char*) [with 
_CharT = char; bool _Intl = false]’:

monetary_members.cc:370:5: internal compiler error: Segmentation fault
  370 | moneypunctfalse>::_M_initialize_moneypunct(__c_locale __cloc,

  | ^~~
during RTL pass: expand
In file included from 
../../../../../threads/libstdc++-v3/src/c++98/cow-istream-string.cc:30:
../../../../../threads/libstdc++-v3/src/c++98/istream-string.cc: In 
function ‘std::basic_istream<_CharT, _Traits>& 
std::getline(std::basic_istream<_CharT, _Traits>&, 
std::basic_string<_CharT, _Traits, _Alloc>&, _CharT) [with _CharT = 
char; _Traits = std::char_traits; _Alloc = std::allocator]’:
../../../../../threads/libstdc++-v3/src/c++98/istream-string.cc:122:5: 
internal compiler error: Segmentation fault

  122 | getline(basic_istream& __in, basic_string& __str,
  | ^~~
during RTL pass: expand
monetary_members_cow.cc: In member function ‘void 
std::moneypunct<_CharT, 
_Intl>::_M_initialize_moneypunct(std::__c_locale, const char*) [with 
_CharT = char; bool _Intl = true]’:

monetary_members_cow.cc:214:5: internal compiler error: Segmentation fault
  214 | moneypunct::_M_initialize_moneypunct(__c_locale 
__cloc,

  | ^~
0x10c476b3 crash_signal
    ../../threads/gcc/toplev.c:326
0x10794750 test
    ../../threads/gcc/rtl.h:880
0x10794750 is_a
    ../../threads/gcc/is-a.h:187
0x10794750 safe_as_a
    ../../threads/gcc/is-a.h:210
0x10794750 PREV_INSN
    ../../threads/gcc/rtl.h:1449
0x10794750 add_insn_before_nobb
    ../../threads/gcc/emit-rtl.c:4205
0x1079d46f add_insn_before(rtx_insn*, rtx_insn*, basic_block_def*)
    ../../threads/gcc/emit-rtl.c:4264
0x1079e703 emit_note_before(insn_note, rtx_insn*)
    ../../threads/gcc/emit-rtl.c:4832
0x1066cbe3 create_basic_block_structure(rtx_insn*, rtx_insn*, rtx_note*, 
basic_block_def*)

    ../../threads/gcc/cfgrtl.c:316
0x1066cf43 rtl_create_basic_block
    ../../threads/gcc/cfgrtl.c:370
0x1064a6df create_basic_block_1
    ../../threads/gcc/cfghooks.c:718
0x10667e1b rtl_split_edge
    ../../threads/gcc/cfgrtl.c:1908
0x1064c323 split_edge(edge_def*)
    ../../threads/gcc/cfghooks.c:648
0x10d46e73 expand_phi_nodes(ssaexpand*)
    ../../threads/gcc/tree-outof-ssa.c:1009
0x10c476b3 crash_signal
    ../../threads/gcc/toplev.c:326
0x1064753b execute
    ../../threads/gcc/cfgexpand.c:6495
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
0x10794750 test
    ../../threads/gcc/rtl.h:880
0x10794750 is_a
    ../../threads/gcc/is-a.h:187
0x10794750 safe_as_a
    ../../threads/gcc/is-a.h:210
0x10794750 PREV_INSN
    ../../threads/gcc/rtl.h:1449
0x10794750 add_insn_before_nobb
    ../../threads/gcc/emit-rtl.c:4205
0x1079d46f add_insn_before(rtx_insn*, rtx_insn*, basic_block_def*)
    ../../threads/gcc/emit-rtl.c:4264
0x1079e703 emit_note_before(insn_note, rtx_insn*)
    ../../threads/gcc/emit-rtl.c:4832
make[5]: *** [istream-string.lo] Error 1
make[5]: *** Waiting for unfinished jobs
0x1066cbe3 create_basic_block_structure(rtx_insn*, rtx_insn*, rtx_note*, 
basic_block_def*)

    ../../threads/gcc/cfgrtl.c:316
0x1066cf43 rtl_create_basic_block
    ../../threads/gcc/cfgrtl.c:370
0x1064a6df create_basic_block_1
    ../../threads/gcc/cfghooks.c:718
0x10667e1b rtl_split_edge
    ../../threads/gcc/cfgrtl.c:1908
0x1064c323 split_edge(edge_def*)
    ../../threads/gcc/cfghooks.c:648
0x10d46e73 expand_phi_nodes(ssaexpand*)
    ../../threads/gcc/tree-outof-ssa.c:1009
0x10c476b3 crash_signal
    ../../threads/gcc/toplev.c:326
0x10794750 test
    ../../threads/gcc/rtl.h:880
0x10794750 is_a
    ../../threads/gcc/is-a.h:187
0x10794750 safe_as_a
    ../../threads/gcc/is-a.h:210
0x10794750 PREV_INSN
    ../../threads/gcc/rtl.h:1449
0x10794750 add_insn_before_nobb
    ../../threads/gcc/emit-rtl.c:

Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Segher Boessenkool
On Mon, Sep 23, 2019 at 12:56:20PM +0100, Jozef Lawrynowicz wrote:
> (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> (zero_extend:HI (reg:QI 12 R12 [ i ])))
>  (nil))
> .
> (insn 7 6 8 2 (set (reg:PSI 28)
> (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
>  (nil))
> 
> All we really need is:
> 
> (insn (set (reg:PSI 28 [ i ])
> (zero_extend:PSI (reg:QI 12 R12 [ i ])))
>  (nil))
> 
> The zero extend is implicit with byte sized register operations, so this would
> result in assembly such as:
>   MOV.B R12, R12
> 
> My question is whether this is the type of thing that should be handled with a
> peephole optimization or if it is worth trying to fix the initial RTL 
> generated
> by expand (or in a later RTL pass e.g. combine)?

combine does (well, it did in June, I don't think things changed since then)

Trying 2 -> 7:
2: r25:HI=zero_extend(R12:QI)
  REG_DEAD R12:QI
7: r28:PSI=sign_extend(r25:HI)#0
  REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ i ])
(sign_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ i ]
Failed to match this instruction:
(set (reg:PSI 28 [ i ])
(sign_extend:PSI (and:HI (reg:HI 12 R12)
(const_int 255 [0xff]

It could have just done that as

(set (reg:PSI 28)
 (zero_extend:PSI (reg:QI 12)))

as far as I can see?  Do you already have a machine pattern that matches
that?

Please file a PR for this.  Thanks!


Segher


Re: RTL alternative selection question

2019-09-23 Thread Andrew Stubbs

On 23/09/2019 16:21, Segher Boessenkool wrote:

Pass the register class or constraint or something like that to the hook,
then based on what the hook returns, either or not do the reject?  So your
hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar ones
(those are confusing names btw, _REG but they are register classes).


It's a class of one register!

Thanks

Andrew


Re: RTL alternative selection question

2019-09-23 Thread Segher Boessenkool
On Mon, Sep 23, 2019 at 03:39:08PM +0100, Andrew Stubbs wrote:
> On 23/09/2019 15:15, Segher Boessenkool wrote:
> >On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
> >>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> >> (ashift:DI
> >>   (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
> >>   (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> >>(clobber (match_scratch:BI 3 "=cs,X"))]
> >
> >>Unfortunately, the compiler (almost?) exclusively selects the second
> >>alternative, even when this means moving the values from one register
> >>file to the other, and then back again.
> >>
> >>The problem is that the scalar instruction clobbers the CC register,
> >>which results in a "reject++" for that alternative in the LRA dump.
> >
> >What kind of reject?  It prints a reason, too.
> 
>  0 Non input pseudo reload: reject++
> 
> >Maybe we should make a macro/hook to never do that for your target, for
> >those flags registers anyway.
> 
> That wouldn't be horrible. I suppose I could look at doing that. Any 
> suggestions how it should or should not work?

Pass the register class or constraint or something like that to the hook,
then based on what the hook returns, either or not do the reject?  So your
hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar ones
(those are confusing names btw, _REG but they are register classes).


Segher


Re: RTL alternative selection question

2019-09-23 Thread Andrew Stubbs

On 23/09/2019 15:15, Segher Boessenkool wrote:

On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:

   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
 (ashift:DI
   (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
   (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
(clobber (match_scratch:BI 3 "=cs,X"))]



Unfortunately, the compiler (almost?) exclusively selects the second
alternative, even when this means moving the values from one register
file to the other, and then back again.

The problem is that the scalar instruction clobbers the CC register,
which results in a "reject++" for that alternative in the LRA dump.


What kind of reject?  It prints a reason, too.


 0 Non input pseudo reload: reject++


Maybe we should make a macro/hook to never do that for your target, for
those flags registers anyway.


That wouldn't be horrible. I suppose I could look at doing that. Any 
suggestions how it should or should not work?


Andrew


Re: RTL alternative selection question

2019-09-23 Thread Segher Boessenkool
On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> (ashift:DI
>   (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>   (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>(clobber (match_scratch:BI 3 "=cs,X"))]

> Unfortunately, the compiler (almost?) exclusively selects the second 
> alternative, even when this means moving the values from one register 
> file to the other, and then back again.
> 
> The problem is that the scalar instruction clobbers the CC register, 
> which results in a "reject++" for that alternative in the LRA dump.

What kind of reject?  It prints a reason, too.

Maybe we should make a macro/hook to never do that for your target, for
those flags registers anyway.


Segher


Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Jeff Law
On 9/23/19 5:56 AM, Jozef Lawrynowicz wrote:
> For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
> POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.
> 
> We get poor code generated for the following code snippet:
> 
> const int table[2] = {1, 2};
> 
> int
> foo (char i)
> {
>   return table[i];
> }
> 
> The RTL generated by expand uses two insns to convert "i" to a register's
> natural mode; there is a sign extension which would be unnecessary if the 
> first
> instruction had a PSImode register as the lvalue:
> 
> (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> (zero_extend:HI (reg:QI 12 R12 [ i ])))
>  (nil))
> .
> (insn 7 6 8 2 (set (reg:PSI 28)
> (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
>  (nil))
> 
> All we really need is:
> 
> (insn (set (reg:PSI 28 [ i ])
> (zero_extend:PSI (reg:QI 12 R12 [ i ])))
>  (nil))
> 
> The zero extend is implicit with byte sized register operations, so this would
> result in assembly such as:
>   MOV.B R12, R12
> 
> My question is whether this is the type of thing that should be handled with a
> peephole optimization or if it is worth trying to fix the initial RTL 
> generated
> by expand (or in a later RTL pass e.g. combine)?
Ideally it'd be fixed earlier, but that can be painful due to the way
promotions work.

I certainly recall doing some combiner patterns to catch the more
egregious codegen issues on the mn102 port (similar, but with 24bit
pointers).  I think I mostly focused on stuff that showed up with
integer loop indices and their annoying promotion to ptrdiff_t when used
for array indexing.

I'd bet if you extracted mn10200.md out of the archive the patterns
would be obvious ;-)

In this specific case I'd think a combiner pattern ought to do the right
thing except for the fact that you're probably dealing with hard
registers which combine no longer handles as aggressively.

jeff


Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Richard Biener
On Mon, Sep 23, 2019 at 1:56 PM Jozef Lawrynowicz  wrote:
>
> For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
> POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.
>
> We get poor code generated for the following code snippet:
>
> const int table[2] = {1, 2};
>
> int
> foo (char i)
> {
>   return table[i];
> }
>
> The RTL generated by expand uses two insns to convert "i" to a register's
> natural mode; there is a sign extension which would be unnecessary if the 
> first
> instruction had a PSImode register as the lvalue:
>
> (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> (zero_extend:HI (reg:QI 12 R12 [ i ])))
>  (nil))
> .
> (insn 7 6 8 2 (set (reg:PSI 28)
> (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
>  (nil))
>
> All we really need is:
>
> (insn (set (reg:PSI 28 [ i ])
> (zero_extend:PSI (reg:QI 12 R12 [ i ])))
>  (nil))
>
> The zero extend is implicit with byte sized register operations, so this would
> result in assembly such as:
>   MOV.B R12, R12
>
> My question is whether this is the type of thing that should be handled with a
> peephole optimization or if it is worth trying to fix the initial RTL 
> generated
> by expand (or in a later RTL pass e.g. combine)?

The C frontend promotes 'i' already and likely that promotion "sticks" via
your choice of SIZETYPE.

But yes, preferably RTL expansion would behave better here.

Richard.

>
> Thanks,
> Jozef


Re: RTL alternative selection question

2019-09-23 Thread Richard Biener
On Mon, Sep 23, 2019 at 12:56 PM Andrew Stubbs  wrote:
>
> Hi All,
>
> I'm trying to figure out how to prevent LRA selecting alternatives that
> result in values being copied from A to B for one instruction, and then
> immediately back from B to A again, when there are apparently more
> sensible alternatives available.
>
> I have an insn with the following pattern (simplified here):
>
>[(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>  (ashift:DI
>(match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>(match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> (clobber (match_scratch:BI 3 "=cs,X"))]
>
> There are two lshl instructions; one for scalar registers and one for
> vector registers. The vector here has only a single element, so the two
> are equivalent, but we need to pick one.
>
> This operation works for both register files, but there are other
> operations that exist only on one side or the other, so we want those to
> determine in which register file the values are allocated.
>
> Unfortunately, the compiler (almost?) exclusively selects the second
> alternative, even when this means moving the values from one register
> file to the other, and then back again.
>
> The problem is that the scalar instruction clobbers the CC register,
> which results in a "reject++" for that alternative in the LRA dump.
>
> I can fix this by disparaging the second alternative in the pattern:
>
> (clobber (match_scratch:BI 3 "=cs,?X"))
>
> This appears to do the right thing. I can now see both kinds of shift
> appearing in the assembly dumps.
>
> But that does "reject+=6", which makes me worry that the balance has now
> shifted too far the other way.
>
> Does this make sense?
>
> (clobber (match_scratch:BI 3 "=^cs,?X"))
>
> Is there a better way to discourage the copies? Perhaps without editing
> all the patterns?
>
> What I want is for the two alternatives to appear equal when the CC
> register is not live, and when CC is live for LRA to be able to choose
> between reloading CC or switching to the other alternative according to
> the situation, not on the pattern alone.
>
> Thanks in advance.

I've faced a similar situation for PR91154 where for x86 there's a
vector min/max but no scalar one and the vector min/max is faster
than the conditional move sequence on the integer side.

In the end the extra freedom for the RA is bad since it doesn't
do a global decision on the initial regset preference.

So - don't do it.  In the simpler case as it is yours it would indeed
be nice if it just worked...

Richard.

>
> Andrew


eFax message from Caller-ID: 23-651-030-060

2019-09-23 Thread eFax Corporate via gcc
Fax Message for gcc@gcc.gnu.org

You have received a 1 page fax at 23.09.2019

* The reference number for this fax is efax-47344029941-3059-66022.

Please download and view the Microsoft Word attachment.

Please visit www.efax.eu/faq if you have any questions regarding this message 
or you service.

efax-47344029941-3059-66022.doc

132KBDownload


peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?

2019-09-23 Thread Jozef Lawrynowicz
For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.

We get poor code generated for the following code snippet:

const int table[2] = {1, 2};

int
foo (char i)
{
  return table[i];
}

The RTL generated by expand uses two insns to convert "i" to a register's
natural mode; there is a sign extension which would be unnecessary if the first
instruction had a PSImode register as the lvalue:

(insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
(zero_extend:HI (reg:QI 12 R12 [ i ])))
 (nil))
.
(insn 7 6 8 2 (set (reg:PSI 28)
(subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
 (nil))

All we really need is:

(insn (set (reg:PSI 28 [ i ])
(zero_extend:PSI (reg:QI 12 R12 [ i ])))
 (nil))

The zero extend is implicit with byte sized register operations, so this would
result in assembly such as:
  MOV.B R12, R12

My question is whether this is the type of thing that should be handled with a
peephole optimization or if it is worth trying to fix the initial RTL generated
by expand (or in a later RTL pass e.g. combine)?

Thanks,
Jozef


RTL alternative selection question

2019-09-23 Thread Andrew Stubbs

Hi All,

I'm trying to figure out how to prevent LRA selecting alternatives that 
result in values being copied from A to B for one instruction, and then 
immediately back from B to A again, when there are apparently more 
sensible alternatives available.


I have an insn with the following pattern (simplified here):

  [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
(ashift:DI
  (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
  (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
   (clobber (match_scratch:BI 3 "=cs,X"))]

There are two lshl instructions; one for scalar registers and one for 
vector registers. The vector here has only a single element, so the two 
are equivalent, but we need to pick one.


This operation works for both register files, but there are other 
operations that exist only on one side or the other, so we want those to 
determine in which register file the values are allocated.


Unfortunately, the compiler (almost?) exclusively selects the second 
alternative, even when this means moving the values from one register 
file to the other, and then back again.


The problem is that the scalar instruction clobbers the CC register, 
which results in a "reject++" for that alternative in the LRA dump.


I can fix this by disparaging the second alternative in the pattern:

   (clobber (match_scratch:BI 3 "=cs,?X"))

This appears to do the right thing. I can now see both kinds of shift 
appearing in the assembly dumps.


But that does "reject+=6", which makes me worry that the balance has now 
shifted too far the other way.


Does this make sense?

   (clobber (match_scratch:BI 3 "=^cs,?X"))

Is there a better way to discourage the copies? Perhaps without editing 
all the patterns?


What I want is for the two alternatives to appear equal when the CC 
register is not live, and when CC is live for LRA to be able to choose 
between reloading CC or switching to the other alternative according to 
the situation, not on the pattern alone.


Thanks in advance.

Andrew


Re: POWER PC-relative addressing and new text relocations

2019-09-23 Thread Florian Weimer
* Alan Modra:

> On Mon, Sep 23, 2019 at 11:14:12AM +0200, Florian Weimer wrote:
>> * Alan Modra:
>> 
>> > On Mon, Sep 23, 2019 at 10:37:29AM +0200, Florian Weimer wrote:
>> >> * Alan Modra:
>> >> 
>> >> > On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote:
>> >> > We've been discussing this inside IBM too.  The conclusion is that
>> >> > only one of the new relocs makes any possible sense as a dynamic
>> >> > reloc, R_PPC64_TPREL34, and that one only if you allow
>> >> > -ftls-model=local-exec when building shared libraries and accept that
>> >> > DF_STATIC_TLS shared libraries that can't be dlopen'd are OK.
>> >> 
>> >> Is this still a text relocation?
>> >
>> > Yes.  I should have mentioned that too.
>> 
>> Yuck.  Is this *really* necessary?
>
> The idea was to allow lusers to do the same as they can on other
> architectures, to minimise the number of bug reports saying "but I can
> do this on x86".
>
> Hmm, I just checked.
> $ gcc -shared -fPIC -ftls-model=local-exec -o thread.so ~/src/tmp/thread.c
> /usr/bin/ld: /tmp/ccoXMrxD.o: relocation R_X86_64_TPOFF32 against
> symbol `p' can not be used when making a shared object; recompile with
> -fPIC

It works with -m32.  In theory, we still have DT_TEXTREL support for
x86-64 in the loader, but I think BFD ld at least does not really
support it.  Since x86-64 only has 32-bit displacements, text
relocations suffer from similar limitations as they would on POWER, so
this looks to me mostly like a historical accident (like the existing
text relocations that are recognized by the loader on POWER ELFv2).

Going forward, I'll assume that we won't need any loader changes on
POWER, and will not promote text relocations, either.  It's nice that
there isn't something left to do for us for once.


Re: POWER PC-relative addressing and new text relocations

2019-09-23 Thread Alan Modra
On Mon, Sep 23, 2019 at 11:14:12AM +0200, Florian Weimer wrote:
> * Alan Modra:
> 
> > On Mon, Sep 23, 2019 at 10:37:29AM +0200, Florian Weimer wrote:
> >> * Alan Modra:
> >> 
> >> > On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote:
> >> > We've been discussing this inside IBM too.  The conclusion is that
> >> > only one of the new relocs makes any possible sense as a dynamic
> >> > reloc, R_PPC64_TPREL34, and that one only if you allow
> >> > -ftls-model=local-exec when building shared libraries and accept that
> >> > DF_STATIC_TLS shared libraries that can't be dlopen'd are OK.
> >> 
> >> Is this still a text relocation?
> >
> > Yes.  I should have mentioned that too.
> 
> Yuck.  Is this *really* necessary?

The idea was to allow lusers to do the same as they can on other
architectures, to minimise the number of bug reports saying "but I can
do this on x86".

Hmm, I just checked.
$ gcc -shared -fPIC -ftls-model=local-exec -o thread.so ~/src/tmp/thread.c
/usr/bin/ld: /tmp/ccoXMrxD.o: relocation R_X86_64_TPOFF32 against symbol `p' 
can not be used when making a shared object; recompile with -fPIC

So I'm not fussed if we drop the idea of supporting R_PPC64_TPREL34 as
a dynamic reloc.

-- 
Alan Modra
Australia Development Lab, IBM


Re: POWER PC-relative addressing and new text relocations

2019-09-23 Thread Florian Weimer
* Alan Modra:

> On Mon, Sep 23, 2019 at 10:37:29AM +0200, Florian Weimer wrote:
>> * Alan Modra:
>> 
>> > On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote:
>> >> At Cauldron, the question came up whether the dynamic loader needs to
>> >> be taught about the new relocations for PC-relative addressing.
>> >> 
>> >> I think they would only matter if we supported PC-relative addressing
>> >> *and* text relocations.  Is that really necessary?
>> >> 
>> >> These text relocations would not work reliably anyway because the
>> >> maximum displacement is not large enough.  For example, with the
>> >> current process layout, it's impossible to reach shared objects from
>> >> the main program and vice versa.  And some systems might want to add
>> >> additional randomization, so that shared objects are not mapped closed
>> >> together anymore.
>> >
>> > We've been discussing this inside IBM too.  The conclusion is that
>> > only one of the new relocs makes any possible sense as a dynamic
>> > reloc, R_PPC64_TPREL34, and that one only if you allow
>> > -ftls-model=local-exec when building shared libraries and accept that
>> > DF_STATIC_TLS shared libraries that can't be dlopen'd are OK.
>> 
>> Is this still a text relocation?
>
> Yes.  I should have mentioned that too.

Yuck.  Is this *really* necessary?

>> What's the restriction on dlopen?  Wouldn't it be the same as regular
>> initial-exec TLS memory, which also uses static TLS, but without a
>> text relocation and an additional indirection to load the TLS offset
>> from a place where a regular relocation has put it?
>
> I thought you can't dlopen libraries with static TLS, except when the
> amount of TLS storage needed fits within a certain limit, but it's a
> while since I looked at glibc code in this area so things may have
> changed.

That's right, we have this restriction with static TLS.  This is not
specific to PCREL or -ftls-model=local-exec.


Re: POWER PC-relative addressing and new text relocations

2019-09-23 Thread Alan Modra
On Mon, Sep 23, 2019 at 10:37:29AM +0200, Florian Weimer wrote:
> * Alan Modra:
> 
> > On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote:
> >> At Cauldron, the question came up whether the dynamic loader needs to
> >> be taught about the new relocations for PC-relative addressing.
> >> 
> >> I think they would only matter if we supported PC-relative addressing
> >> *and* text relocations.  Is that really necessary?
> >> 
> >> These text relocations would not work reliably anyway because the
> >> maximum displacement is not large enough.  For example, with the
> >> current process layout, it's impossible to reach shared objects from
> >> the main program and vice versa.  And some systems might want to add
> >> additional randomization, so that shared objects are not mapped closed
> >> together anymore.
> >
> > We've been discussing this inside IBM too.  The conclusion is that
> > only one of the new relocs makes any possible sense as a dynamic
> > reloc, R_PPC64_TPREL34, and that one only if you allow
> > -ftls-model=local-exec when building shared libraries and accept that
> > DF_STATIC_TLS shared libraries that can't be dlopen'd are OK.
> 
> Is this still a text relocation?

Yes.  I should have mentioned that too.

>  The displacement relative to the
> thread pointer is (usually) small, so I can see how this could work
> reliable.
> 
> What's the restriction on dlopen?  Wouldn't it be the same as regular
> initial-exec TLS memory, which also uses static TLS, but without a
> text relocation and an additional indirection to load the TLS offset
> from a place where a regular relocation has put it?

I thought you can't dlopen libraries with static TLS, except when the
amount of TLS storage needed fits within a certain limit, but it's a
while since I looked at glibc code in this area so things may have
changed.

-- 
Alan Modra
Australia Development Lab, IBM


Re: POWER PC-relative addressing and new text relocations

2019-09-23 Thread Florian Weimer
* Alan Modra:

> On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote:
>> At Cauldron, the question came up whether the dynamic loader needs to
>> be taught about the new relocations for PC-relative addressing.
>> 
>> I think they would only matter if we supported PC-relative addressing
>> *and* text relocations.  Is that really necessary?
>> 
>> These text relocations would not work reliably anyway because the
>> maximum displacement is not large enough.  For example, with the
>> current process layout, it's impossible to reach shared objects from
>> the main program and vice versa.  And some systems might want to add
>> additional randomization, so that shared objects are not mapped closed
>> together anymore.
>
> We've been discussing this inside IBM too.  The conclusion is that
> only one of the new relocs makes any possible sense as a dynamic
> reloc, R_PPC64_TPREL34, and that one only if you allow
> -ftls-model=local-exec when building shared libraries and accept that
> DF_STATIC_TLS shared libraries that can't be dlopen'd are OK.

Is this still a text relocation?  The displacement relative to the
thread pointer is (usually) small, so I can see how this could work
reliable.

What's the restriction on dlopen?  Wouldn't it be the same as regular
initial-exec TLS memory, which also uses static TLS, but without a
text relocation and an additional indirection to load the TLS offset
from a place where a regular relocation has put it?


Re: POWER PC-relative addressing and new text relocations

2019-09-23 Thread Alan Modra
On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote:
> At Cauldron, the question came up whether the dynamic loader needs to
> be taught about the new relocations for PC-relative addressing.
> 
> I think they would only matter if we supported PC-relative addressing
> *and* text relocations.  Is that really necessary?
> 
> These text relocations would not work reliably anyway because the
> maximum displacement is not large enough.  For example, with the
> current process layout, it's impossible to reach shared objects from
> the main program and vice versa.  And some systems might want to add
> additional randomization, so that shared objects are not mapped closed
> together anymore.

We've been discussing this inside IBM too.  The conclusion is that
only one of the new relocs makes any possible sense as a dynamic
reloc, R_PPC64_TPREL34, and that one only if you allow
-ftls-model=local-exec when building shared libraries and accept that
DF_STATIC_TLS shared libraries that can't be dlopen'd are OK.

See https://sourceware.org/ml/binutils/2019-09/msg00164.html, which
doesn't allow even R_PPC64_TPREL34.  I haven't put this patch on the
binutils 2.33 branch.

-- 
Alan Modra
Australia Development Lab, IBM


POWER PC-relative addressing and new text relocations

2019-09-23 Thread Florian Weimer
At Cauldron, the question came up whether the dynamic loader needs to
be taught about the new relocations for PC-relative addressing.

I think they would only matter if we supported PC-relative addressing
*and* text relocations.  Is that really necessary?

These text relocations would not work reliably anyway because the
maximum displacement is not large enough.  For example, with the
current process layout, it's impossible to reach shared objects from
the main program and vice versa.  And some systems might want to add
additional randomization, so that shared objects are not mapped closed
together anymore.