Re: Proposed new pass to optimise mode register assignments

2024-09-12 Thread Richard Sandiford via Gcc
Jeff Law  writes:
> On 9/7/24 1:09 AM, Richard Biener wrote:
>> 
>> 
>>> On 06.09.2024 at 17:38, Andrew Carlotti wrote:
>>>
>>> Hi,
>>>
>>> I'm working on optimising assignments to the AArch64 Floating-point Mode
>>> Register (FPMR), as part of our FP8 enablement work.  Claudio has already
>>> implemented FPMR as a hard register, with the intention that FP8 intrinsic
>>> functions will compile to a combination of an fpmr register set, followed
>>> by an FP8 operation that takes fpmr as an input operand.
>>>
>>> It would clearly be inefficient to retain an explicit FPMR assignment prior
>>> to each FP8 instruction (especially in the common case where every
>>> assignment uses the same FPMR value).  I think the best way to optimise
>>> this would be to implement a new pass that can optimise assignments to
>>> individual hard registers.
>>>
>>> There are a number of existing passes that do similar optimisations, but
>>> which I believe are unsuitable for this scenario for various reasons.  For
>>> example:
>>>
>>> - cse1 can already optimise FPMR assignments within an extended basic block,
>>>   but can't handle broader optimisations.
>>> - pre (in gcse.c) doesn't work with assigning constant values, which would
>>>   miss many potential usages.  It also has limits on how far code can be
>>>   moved, based around ideas of register pressure that don't apply to the
>>>   context of a single hard register that shouldn't be used by the register
>>>   allocator for anything else.  Additionally, it doesn't run at -Os.
>>> - hoist (also using gcse.c) only handles constant values, and only runs when
>>>   optimising for size.  It also has the rest of the issues that pre does.
>>> - mode_sw only handles a small finite set of modes.  The mode requirements
>>>   are determined solely by the instructions that require the specific mode,
>>>   so mode switches don't depend on the output of previous instructions.
>>>
>>>
>>> My intention would be for the new pass to reuse ideas, and hopefully some of
>>> the existing code, from the mode-switching and gcse passes.  In particular,
>>> gcse.c (or its dependencies) has code that could identify when values
>>> assigned to the FPMR are known to be the same (although we may not need the
>>> full CSE capabilities of gcse.c), and mode-switching.cc knows how to
>>> globally optimise mode assignments (and unlike gcse.c, doesn't use cautious
>>> heuristics to avoid excessively increasing register pressure).
>>>
>>> Initially the new pass would only apply to the AArch64 FPMR register, but in
>>> future it could also be used for other hard registers with similar
>>> properties.
>>>
>>> Does anyone have any comments on this approach, before I start writing any
>>> code?
>> 
>> Can you explain in more detail why the mode-switching pass
>> infrastructure isn’t a good fit?  ISTR it already is customizable via
>> target hooks.
>
> Agreed.  Mode switching seems to be the right pass to look at.
>
> It probably is worth pointing out that mode switching is LCM based and 
> as such never speculates.  Given the potential cost of a mode switch, 
> failure to speculate may be a notable limitation (though the same would 
> apply to the ideas Andrew floated above).
>
> This has recently come up in the RISC-V space due to needing VXRM 
> assignments so that we can utilize the vaaddu add-with-averaging 
> instructions.  Placement of VXRM mode switches looks optimal from an 
> LCM standpoint, but speculation can measurably improve performance.  It 
> was something like 2% on the BPI for x264.  The k1/m1 chip in the BPI is 
> almost certainly flushing its pipelines on the VXRM assignment.

Ah yeah, good point.  I expect speculation would be best for FPMR as well.
I imagine most use cases will be well-structured in practice, but for
those that aren't...

> I've got a hack here that I'll submit upstream at some point.  Just not 
> at the top of my list yet -- especially now that our uarch has been 
> fixed to not flush its pipelines at VXRM assignments ;-)

Is that handled by mode-switching, or is it a separate thing?

Richard


Re: Late combine & mode switch

2024-09-12 Thread Richard Sandiford via Gcc
Sorry for the slow response.

Xi Ruoyao  writes:
> Hi Richard,
>
> When I hack the LoongArch backend I notice something like
>
> slli.d $r4, $r4, 2
> add.w $r4, $r4, $r5
>
> Or
>
> (set (reg:DI 4) (ashift:DI (reg:DI 4) (const_int 2)))
> (set (reg:DI 4)
>  (sign_extend:DI (add:SI (reg:SI 4) (reg:SI 5))))
>
> can appear after split.  On LoongArch it can be done via an alsl.w
> instruction, so I attempted to combine them in late combine with:
>
> (define_insn
>   [(set (match_operand:DI 0 "register_operand" "=r")
> (sign_extend:DI
>   (add:SI
> (subreg:SI
>   (ashift:DI (match_operand:DI 1 "register_operand" "r")
>  (match_operand:SI 2 "const_immalsl_operand" ""))
>   0)
> (match_operand:SI 3 "register_operand" "r"))))]
>   "TARGET_64BIT"
>   "alsl.w\t%0,%1,%3,%2")
>
> But this does not work and I get "RTL substitution failed" with
> -fdump-rtl-late_combine2-details.
>
> I want to open an RFE in Bugzilla.  But before that I'm wondering: maybe
> I'm just too stupid to figure out the correct way for this?

At the moment, insn_propagation::apply_to_rvalue_1 chickens out of
a change in hard register mode that would involve an explicit subreg.
In this particular case, I'd hope that the subreg:SI would be pushed
into the ashift to give an ashift:SI of a reg:SI, but it seems that
that isn't happening for some reason.  It probably has something to do
with WORD_REGISTER_OPERATIONS (which LoongArch defines, but aarch64 doesn't).

That answer only applies to some codes though.  Shifts right would still
hit the problem you mention.  I think it'd be ok to try to relax the
condition if a port needs it -- the current code is (supposed to be)
deliberately conservative.

Thanks,
Richard



Re: insn attributes: Support blocks of C-code?

2024-07-17 Thread Richard Sandiford via Gcc
Georg-Johann Lay  writes:
> [...]
> On 13.07.24 at 13:44, Richard Sandiford wrote:
>> Georg-Johann Lay  writes:
>>> diff --git a/gcc/read-md.h b/gcc/read-md.h
>>> index 9703551a8fd..ae10b651de1 100644
>>> --- a/gcc/read-md.h
>>> +++ b/gcc/read-md.h
>>> @@ -132,6 +132,38 @@ struct overloaded_name {
>>> overloaded_instance **next_instance_ptr;
>>>   };
>>>   
>>> +/* Structure for each attribute.  */
>>> +
>>> +struct attr_value;
>>> +
>>> +class attr_desc
>>> +{
>>> +public:
>>> +  char *name;  /* Name of attribute.  */
>>> +  const char *enum_name;   /* Enum name for DEFINE_ENUM_NAME.  */
>>> +  class attr_desc *next;   /* Next attribute.  */
>>> +  struct attr_value *first_value; /* First value of this attribute.  */
>>> +  struct attr_value *default_val; /* Default value for this attribute.  */
>>> +  file_location loc;   /* Where in the .md files it occurs.  */
>>> +  unsigned is_numeric  : 1;/* Values of this attribute are numeric.  */
>>> +  unsigned is_const: 1;/* Attribute value constant for each run.  */
>>> +  unsigned is_special  : 1;/* Don't call `write_attr_set'.  */
>>> +
>>> +  // Print the return type for functions like get_attr_
>>> +  // to stream OUTF, followed by SUFFIX which should be white-space(s).
>>> +  void fprint_type (FILE *outf, const char *suffix) const
>>> +  {
>>> +if (enum_name)
>>> +  fprintf (outf, "enum %s", enum_name);
>>> +else if (! is_numeric)
>>> +  fprintf (outf, "enum attr_%s", name);
>>> +else
>>> +  fprintf (outf, "int");
>>> +
>>> +fprintf (outf, "%s", suffix);
>> 
>> It shouldn't be necessary to emit the enum tag these days.  If removing
>
> Hi Richard,
>
> I am not familiar with the gensupport policies, which is the reason why
> the feature is just a suggestion / proposal and not a patch.
> IMO patches should not come from someone like me who has no experience
> in that area; better someone more experienced would take it over.
>
>> it causes anything to break, I think we should fix whatever that breaking
>> thing is.  Could you try doing that, as a pre-patch?  Or I can give it a
>> go, if you'd rather not.
>
> Yes please.

OK, I pushed b19906a029a to remove the enum tags.  The type name is
now stored as a const char * in attr_desc::cxx_type.

>> If we do that, then we can just return a const char * for the type.
>
> Yes, const char* would be easier. I just didn't know how to alloc one,
> and where.  A new const char* property in class attr_desc_would solve
> it.
>
>> And then in turn we can pass a const char * to (f)print_c_condition.
>> The MD reader then wouldn't need to know about attributes.
>> 
>> Thanks,
>> Richard
>
> When this feature makes it into GCC, then match_test should behave
> similar, I guess?  I.e. support function bodies that return bool.
> I just wasn't sure which caller of fprint_c_condition runs with
> match_test resp. symbol_ref from which context (insn attribute or
> predicate, etc).

Yeah, might be useful for match_test too.

> Thanks for looking into this and for considering it as an extension.
>
> The shortcomings like non-support of pathological comments like
> /* } */ is probably not such a big issue. And fixing it would have
> to touch the md scanner / lexer and have side effects I don't know,
> like on build performance and stability of course.  That part could
> be fixed when someone actually needs it.

It looks like we don't support \{ and \}, but that's probably an oversight.

Thanks,
Richard


Re: Insn combine trying (ior:HI (clobber:HI (const_int 0)))

2024-07-15 Thread Richard Sandiford via Gcc
Georg-Johann Lay  writes:
> In a test case I see insn combine trying to match such
> expressions, which do not make any sense to me, like:
>
> Trying 2 -> 7:
>  2: r45:HI=r48:HI
>REG_DEAD r48:HI
>  7: {r47:HI=r45:HI|r46:PSI#0;clobber scratch;}
>REG_DEAD r46:PSI
>REG_DEAD r45:HI
> Failed to match this instruction:
> (parallel [
>  (set (reg:HI 47 [ _4 ])
>  (ior:HI (clobber:HI (const_int 0 [0]))
>  (reg:HI 48)))
>  (clobber (scratch:QI))
>  ])
>
> and many other occasions like that.
>
> Is this just insn combine doing its business?
>
> Or should this be some sensible RTL instead?
>
> Seen on target avr with v14 and trunk,
> attached test case and dump compiled with

(clobber:M (const_int 0)) is combine's way of representing
"something went wrong here".  And yeah, recog acts as an error
detection mechanism in these cases.  In other words, the idea
is that recog should eventually fail on nonsense rtl like that,
so earlier code doesn't need to check explicitly.

Richard

>
> $ avr-gcc-14 strange.c -S -Os -dp -da
>
> Johann


Re: insn attributes: Support blocks of C-code?

2024-07-13 Thread Richard Sandiford via Gcc
Georg-Johann Lay  writes:
> So I had that situation where in an insn attribute, providing
> a block of code (rather than just an expression) would be
> useful.
>
> Expressions can be provided by means of symbol_ref, like in
>
> (set (attr "length")
>   (symbol_ref ("1 + GET_MODE_SIZE (mode)")))
>
> However providing a block of code gives a syntax error from
> the compiler, *NOT* from md_reader:
>
> (set (attr "length")
>   (symbol_ref
>{
>  int len = 1;
>  return len;
>}))
>
> This means such syntax is already supported to some degree,
> there's just no semantics assigned to such code.
>
> Blocks of code are already supported in insn predicates,
> like in
>
> (define_predicate "my_operand"
>(match_code "code_label,label_ref,symbol_ref,plus,const")
> {
>some code...
>return true-or-false;
> })
>
> In the insn attribute case, I hacked a bit and supported
> blocks of code like in the example above.  The biggest change
> is that class attr_desc has to be moved from genattrtab.cc to
> read-md.h so that it is a complete type as required by
> md_reader::fprint_c_condition().
>
> That method prints to code for symbol_ref and some others, and
> it has to know the type of the attribute, like "int" for the
> "length" attribute.  The implementation in fprint_c_condition()
> is straight forward:
>
> When cond (which is the payload string of symbol_ref, including the
> '{}'s) starts with '{', then print a lambda that's called in place,
> like in
>
> print "( [&]() ->   () )"
>
> The "&" capture is required so that variables like "insn" are
> accessible. "operands[]" and "which_alternative" are global,
> thus also accessible.
>
> Attached is the code I have so far (which is by no means a
> proposed patch, so I am posting here on gcc@).
>
> As far as I can tell, there is no performance penalty, e.g.
> in build times, when the feature is not used.  Of course instead
> of such syntax, a custom function could be used, or the
> braces-brackets-parentheses-gibberish could be written out
> in the symbol_ref as an expression.  Though I think this
> could be a nice addition, in particular because the scanning
> side in md_reader already supports the syntax.

Looks good to me.  I know you said it wasn't a patch submission,
but it looks mostly ready to go.  Some comments below:

> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 7f4335e0aac..3e46693e8c2 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -10265,6 +10265,56 @@ so there is no need to explicitly convert the expression into a boolean
>  (match_test "(x & 2) != 0")
>  @end smallexample
>  
> +@cindex @code{symbol_ref} and attributes
> +@item (symbol_ref "@var{quoted-c-expr}")
> +
> +Specifies the value of the attribute sub-expression as a C expression,
> +where the surrounding quotes are not part of the expression.
> +Similar to @code{match_test}, variables @var{insn}, @var{operands[]}
> +and @var{which_alternative} are available.  Moreover, code and mode
> +attributes can be used to compose the resulting C expression, like in
> +
> +@smallexample
> +(set (attr "length")
> + (symbol_ref ("1 + GET_MODE_SIZE (mode)")))
> +@end smallexample
> +
> +where the according insn has exactly one mode iterator.
> +See @ref{Mode Iterators} and @ref{Code Iterators}.

I got the impression s/See @ref/@xref/ was recommended for sentence
references.

> +
> +@item  (symbol_ref "@{ @var{quoted-c-code} @}")
> +@itemx (symbol_ref @{ @var{c-code} @})
> +
> +The value of this subexpression is determined by running a block
> +of C code which returns the desired value.
> +The braces are part of the code, whereas the quotes in the quoted form are not.
> +
> +This variant of @code{symbol_ref} allows for more complex code than
> +just a single C expression, like for example:
> +
> +@smallexample
> +(set (attr "length")
> + (symbol_ref
> +  @{
> +int len;
> +some_function (insn, , mode, & len);
> +return len;
> +  @}))
> +@end smallexample
> +
> +for an insn that has one code iterator and one mode iterator.
> +Again, variables @var{insn}, @var{operands[]} and @var{which_alternative}
> +can be used.  The unquoted form only supports a subset of C,
> +for example no C comments are supported, and strings that contain
> +characters like @samp{@}} are problematic and may need to be escaped
> +as @samp{\@}}.

By unquoted form, do you mean (symbol_ref { ... })?  I'd have expected
that to be better than "{ ... }" (or at least, I thought that was the
intention when { ... } was added).  I was going to suggest not documenting
the "{ ... }" form until I saw this.

> +
> +The return type is @code{int} for the @var{length} attribute, and
> +@code{enum attr_@var{name}} for an insn attribute named @var{name}.
> +The types and available enum values can be looked up in
> +@file{$builddir/gcc/insn-attr-common.h}.
> +
> +
>  @cindex @code{le} and attributes
>  @cindex @code{leu} and attributes
>  @cindex @code

Re: Help with vector cost model

2024-07-11 Thread Richard Sandiford via Gcc
Andrew Pinski  writes:
> I need some help with the vector cost model for aarch64.
> I am adding V2HI and V4QI mode support by emulating it using the
> native V4HI/V8QI instructions (similarly to mmx as SSE is done). The
> problem is I am running into a cost model issue with
> gcc.target/aarch64/pr98772.c (wminus is similar to
> gcc.dg/vect/slp-gap-1.c, just slightly different offsets for the
> address).
> It seems like the cost mode is overestimating the number of loads for
> V8QI case .
> With the new cost model usage (-march=armv9-a+nosve), I get:
> ```
> t.c:7:21: note:  * Analysis succeeded with vector mode V4QI
> t.c:7:21: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
> t.c:7:21: note:  Issue info for V4QI loop:
> t.c:7:21: note:load operations = 2
> t.c:7:21: note:store operations = 1
> t.c:7:21: note:general operations = 4
> t.c:7:21: note:reduction latency = 0
> t.c:7:21: note:estimated min cycles per iteration = 2.00
> t.c:7:21: note:  Issue info for V8QI loop:
> t.c:7:21: note:load operations = 12
> t.c:7:21: note:store operations = 1
> t.c:7:21: note:general operations = 6
> t.c:7:21: note:reduction latency = 0
> t.c:7:21: note:estimated min cycles per iteration = 4.33
> t.c:7:21: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
> t.c:7:21: note:  Weighted cycles per iteration of V8QI loop ~= 4.33
> t.c:7:21: note:  Preferring loop with lower cycles per iteration
> t.c:7:21: note:  * Preferring vector mode V4QI to vector mode V8QI
> ```
>
> That is totally wrong and instead of vectorizing using V8QI we
> vectorize using V4QI and the resulting code is worse.
>
> Attached is my current patch for adding V4QI/V2HI to the aarch64
> backend (Note I have not finished up the changelog nor the testcases;
> I have secondary patches that add the testcases already).
> Is there something I am missing here or are we just over estimating
> V8QI cost and is something easy to fix?

Trying it locally, I get:

foo.c:15:23: note:  * Analysis succeeded with vector mode V4QI
foo.c:15:23: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
foo.c:15:23: note:  Issue info for V4QI loop:
foo.c:15:23: note:load operations = 2
foo.c:15:23: note:store operations = 1
foo.c:15:23: note:general operations = 4
foo.c:15:23: note:reduction latency = 0
foo.c:15:23: note:estimated min cycles per iteration = 2.00
foo.c:15:23: note:  Issue info for V8QI loop:
foo.c:15:23: note:load operations = 8
foo.c:15:23: note:store operations = 1
foo.c:15:23: note:general operations = 6
foo.c:15:23: note:reduction latency = 0
foo.c:15:23: note:estimated min cycles per iteration = 3.00
foo.c:15:23: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
foo.c:15:23: note:  Weighted cycles per iteration of V8QI loop ~= 3.00
foo.c:15:23: note:  Preferring loop with lower cycles per iteration

The function is:

extern void
wplus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2)
{
  for (int y = 0; y < 4; y++)
    {
      for (int x = 0; x < 4; x++)
	d[x + y*4] = pix1[x] + pix2[x];
      pix1 += 16;
      pix2 += 16;
    }
}

For V8QI we need a VF of 2, so that there are 8 elements to store to d.
Conceptually, we handle those two iterations by loading 4 V8QIs from
pix1 and pix2 (32 bytes each), with mitigations against overrun,
and then permute the result to single V8QIs.

vectorize_load doesn't seem to be smart enough to realise that only 2
of those 4 loads are actually used in the permutation, and so only 2
loads should be costed for each of pix1 and pix2.

Thanks,
Richard


Re: md: define_code_attr / define_mode_attr: Default value?

2024-07-09 Thread Richard Sandiford via Gcc
Georg-Johann Lay  writes:
> Is it possible to specify a default value in
> define_code_attr resp. define_mode_attr ?
>
> I had a quick look at read-rtl, and it seems not to be the case.

Yeah, that's right.  I'd assumed the attributes would be used
in cases where an active choice has to be made for each code/mode,
with missing codes/modes being a noisy failure.

Adding a default value sounds ok though, and would be consistent
with insn attributes.

Richard


> Or am I missing something?
>
> Johann


Re: [RFC] MAINTAINERS: require a BZ account field

2024-07-07 Thread Richard Sandiford via Gcc
Sam James  writes:
> Richard Sandiford  writes:
>
>> Sam James via Gcc  writes:
>>> Hi!
>>>
>>> This comes up in #gcc on IRC every so often, so finally
>>> writing an RFC.
>>>
>> [...]
>>> TL;DR: The proposal is:
>>>
>>> 1) MAINTAINERS should list a field containing either the gcc.gnu.org
>>> email in full, or their gcc username (bikeshedding semi-welcome);
>>>
>>> 2) It should become a requirement that to be in MAINTAINERS, one must
>>> possess a Bugzilla account (ideally using their gcc.gnu.org email).
>>
>> How about the attached as a compromise?  (gzipped as a poor protection
>> against scraping.)
>>
>
> Thanks! This would work for me. A note on BZ below.
>
>> It adds the gcc.gnu.org/bugzilla account name, without the @gcc.gnu.org,
>> as a middle column to the Write After Approval section.  I think this
>> makes it clear that the email specified in the last column should be
>> used for communication.
>>
>> [..]
>>
>> If this is OK, I'll need to update check-MAINTAINERS.py.
>
> For Bugzilla, there's two issues:
> 1) If someone uses an alternative (n...@gcc.gnu.org) email on Bugzilla,
> unless an exception is made (and Jakub indicated he didn't want to add
> more - there's very few right now), they do not have editbugs and cannot
> assign bugs to themselves or edit fields, etc.
>
> This leads to bugs being open when they don't need to be anymore, etc,
> and pinskia and I often have to clean that up.
>
> People with commit access are usually very happy to switch to
> @gcc.gnu.org when I let them know it grants powers!
>
> 2) CCing someone using a n...@gcc.gnu.org email is a pain, but *if* they
> have to use a n...@gcc.gnu.org email, it might be OK if they use the
> email that is listed in MAINTAINERS otherwise. If they use a third email
> then it becomes a pain though, but your proposal helps if it's just two
> emails in use.
>
> (But I'd still really encourage them to not do that, given the lack of
> perms.)
>
> I care about both but 1) > 2) for me, some others here care a lot about 2)
> if they're the ones doing triage and bisecting.

Ah, yeah, I agree with all of the above.  By "communication" I meant
"normal email" -- sorry for the bad choice of words.

For me, the point of the new middle column is to answer "which gcc.gnu.org
account should I use in bugzilla PRs?".  But adding "@gcc.gnu.org" to each
entry might encourage people to use it for normal email too.

After:

  To report problems in GCC, please visit:

http://gcc.gnu.org/bugs/

how about adding something like:

  If you wish to CC a maintainer in bugzilla, please add @gcc.gnu.org
  to the account name given in the Write After Approval section below.
  Please use the email address given in <...> for direct email communication.

Richard


Re: [RFC] MAINTAINERS: require a BZ account field

2024-07-04 Thread Richard Sandiford via Gcc
Sam James via Gcc  writes:
> Hi!
>
> This comes up in #gcc on IRC every so often, so finally
> writing an RFC.
>
> What?
> ---
>
> I propose that MAINTAINERS be modified to be of the form,
> adding an extra field for their GCC/sourceware account:
>
>   Joe Bloggs  joeblo...@example.com  jblo...@gcc.gnu.org
>
> Further, that the field must not be blank (-> must have a BZ account;
> there were/are some without at all)!
>
> Why?
> ---
>
> 1) This is tied to whether or not people should use their committer email
> on Bugzilla or a personal email. A lot of people don't seem to use their
> committer email (-> no permissions) and end up not closing bugs, so
> pinskia (and often myself these days) end up doing it for them.
>
> 2) It's standard practice to wish to CC the committer of a bisect result
> - or to CC someone who you know wrote patches on a subject area. Doing
> this on Bugzilla is challenging when there's no map between committer
> <-> BZ account.
>
> Specifically, there are folks who have git committer+author as
> joeblo...@example.com (or maybe even coold...@example.com) where the
> local part of the address has *no relation* to their GCC/sw account,
> so finding who to CC is difficult without e.g. trawling through gcc-cvs
> mails or asking overseers for help.
>
> Summary
> ---
>
> TL;DR: The proposal is:
>
> 1) MAINTAINERS should list a field containing either the gcc.gnu.org
> email in full, or their gcc username (bikeshedding semi-welcome);
>
> 2) It should become a requirement that to be in MAINTAINERS, one must
> possess a Bugzilla account (ideally using their gcc.gnu.org email).

How about the attached as a compromise?  (gzipped as a poor protection
against scraping.)

It adds the gcc.gnu.org/bugzilla account name, without the @gcc.gnu.org,
as a middle column to the Write After Approval section.  I think this
makes it clear that the email specified in the last column should be
used for communication.

It's awkward to add a new column to the area maintainer section, so this
version also reverses the policy of removing entries from Write After
Approval if they appear in a more specific section.

I've also committed heresy and replaced the tabs with spaces.

The account names are taken from the gcc-cvs archives (thanks to
Andrew for the hint to look there).  I've tried to make the process
relatively conservative, in the hope of avoiding false positives or
collisions.  A handful of entries were derived manually.  There were
four that I couldn't find easily (search for " - ").

James Norris had an entry without an email address.  I've left that
line alone.

If this is OK, I'll need to update check-MAINTAINERS.py.

Thanks,
Richard



MAINTAINERS.gz
Description: application/gzip


Re: [RFC] Merge strategy for all-SLP vectorizer

2024-05-17 Thread Richard Sandiford via Gcc
Richard Biener via Gcc  writes:
> Hi,
>
> I'd like to discuss how to go forward with getting the vectorizer to
> all-SLP for this stage1.  While there is a personal branch with my
> ongoing work (users/rguenth/vect-force-slp) branches haven't proved
> themselves working well for collaboration.

Speaking for myself, the problem hasn't been so much the branch as
lack of time.  I've been pretty swamped the last eight months or so
(except for the time that I took off, which admittedly was quite a
bit!), and so I never even got around to properly reading and replying
to your message after the Cauldron.  It's been on the "this is important,
I should make time to read and understand it properly" list all this time.
Sorry about that. :(

I'm hoping to have time to work/help out on SLP stuff soon.

> The branch isn't ready to be merged in full but I have been picking
> improvements to trunk last stage1 and some remaining bits in the past
> weeks.  I have refrained from merging code paths that cannot be
> exercised on trunk.
>
> There are two important set of changes on the branch, both critical
> to get more testing on non-x86 targets.
>
>  1. enable single-lane SLP discovery
>  2. avoid splitting store groups (9315bfc661432c3 and 4336060fe2db8ec
> if you fetch the branch)
>
> The first point is also most annoying on the testsuite since doing
> SLP instead of interleaving changes what we dump and thus tests
> start to fail in random ways when you switch between both modes.
> On the branch single-lane SLP discovery is gated with
> --param vect-single-lane-slp.
>
> The branch has numerous changes to enable single-lane SLP for some
> code paths that have SLP not implemented and where I did not bother
> to try supporting multi-lane SLP at this point.  It also adds more
> SLP discovery entry points.
>
> I'm not sure how to try merging these pieces to allow others to
> more easily help out.  One possibility is to merge
> --param vect-single-lane-slp defaulted off and pick dependent
> changes even when they cause testsuite regressions with
> vect-single-lane-slp=1.  Alternatively adjust the testsuite by
> adding --param vect-single-lane-slp=0 and default to 1
> (or keep the default).

FWIW, this one sounds good to me (the default to 1 version).
I.e. mechanically add --param vect-single-lane-slp=0 to any tests
that fail with the new default.  That means that the test that need
fixing are easily greppable for anyone who wants to help.  Sometimes
it'll just be a test update.  Sometimes it will be new vectoriser code.

Thanks,
Richard

> Or require a clean testsuite with
> --param vect-single-lane-slp defaulted to 1 but keep the --param
> for debugging (and allow FAILs with 0).
>
> For fun I merged just single-lane discovery of non-grouped stores
> and have that enabled by default.  On x86_64 this results in the
> set of FAILs below.
>
> Any suggestions?
>
> Thanks,
> Richard.
>
> FAIL: gcc.dg/vect/O3-pr39675-2.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-19b.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg

Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

2023-11-28 Thread Richard Sandiford via Gcc
Richard Earnshaw  writes:
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>> 
>> Another failure reported by the Linaro CI is as follows :
>> (Note: I am planning to send a separate mail for each failure, as this will
>> make the discussion easy to track)
>> 
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none  check-function-bodies dup_x0_m
>> 
>> Expected code:
>> 
>>...
>>add (x[0-9]+), x0, #?1
>>mov (p[0-7])\.b, p15\.b
>>mov z0\.d, \2/m, \1
>>...
>>ret
>> 
>> 
>> Code obtained w/o patch:
>>  addvl   sp, sp, #-1
>>  str p15, [sp]
>>  add x0, x0, 1
>>  mov p3.b, p15.b
>>  mov z0.d, p3/m, x0
>>  ldr p15, [sp]
>>  addvl   sp, sp, #1
>>  ret
>> 
>> Code obtained w/ patch:
>>  addvl   sp, sp, #-1
>>  str p15, [sp]
>>  mov p3.b, p15.b
>>  add x0, x0, 1
>>  mov z0.d, p3/m, x0
>>  ldr p15, [sp]
>>  addvl   sp, sp, #1
>>  ret
>> 
>> As we can see, with the patch, the following two instructions are 
>> interchanged:
>>  add x0, x0, 1
>>  mov p3.b, p15.b
>
> Indeed, both look acceptable results to me, especially given that we 
> don't schedule results at -O1.
>
> There's two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates (which 
> is a little fragile, since it might flip back someday).
> 2) Write the test as
>
>
> ** (
> **   add (x[0-9]+), x0, #?1
> **   mov (p[0-7])\.b, p15\.b
> **   mov z0\.d, \2/m, \1
> ** |
> **   mov (p[0-7])\.b, p15\.b
> **   add (x[0-9]+), x0, #?1
> **   mov z0\.d, \1/m, \2
> ** )
>
> Note, we need to swap the match names in the third insn to account for 
> the different order of the earlier instructions.
>
> Neither is ideal, but the second is perhaps a little more bomb proof.
>
> I don't really have a strong feeling either way, but perhaps the second 
> is slightly preferable.
>
> Richard S: thoughts?

Yeah, I agree the second is probably better.  The | doesn't reset the
capture numbers, so I think the final instruction needs to be:

**   mov z0\.d, \3/m, \4

Thanks,
Richard
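
The behaviour Richard describes is standard regex semantics: alternation does not restart capture-group numbering, so the groups in the second branch continue counting from where the first branch stopped.  A quick illustration with an ordinary regex engine (Python here, not the actual DejaGnu matcher, and with the instruction patterns simplified):

```python
import re

# Two alternatives, each capturing two operands.  Group numbers run
# 1-4 across the whole pattern; they do not reset after the `|`.
pattern = re.compile(
    r"(add x[0-9]+)\n(mov p[0-7])"    # groups 1 and 2
    r"|"
    r"(mov p[0-7])\n(add x[0-9]+)"    # groups 3 and 4
)

def match_groups(text):
    """Return the tuple of all four groups for the first match, or None."""
    m = pattern.search(text)
    return m.groups() if m else None
```

When the swapped instruction order matches, the captured text lands in groups 3 and 4, which is why the back-references in the final `mov z0.d` line have to be \3 and \4 rather than \1 and \2.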

>
> R.
>
>> I believe that this is fine and the test can be modified to allow it to pass 
>> on
>> aarch64. Please let me know what you think.
>> 
>> Regards,
>> Surya
>> 
>> 
>> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>>>
>>>
>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
 Hi Richard,
 Ping. Please let me know if the test failure that I mentioned in the mail 
 below can be handled by changing the expected generated code. I am not 
 conversant with arm, and hence would appreciate your help.

 Regards,
 Surya

 On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
> Hi Richard,
> I had submitted a patch for review 
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
> regarding scaling save/restore costs of callee save registers with block
> frequency in the IRA pass (PR111673).
>
> This patch has been approved by VMakarov
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>
> With this patch, we are seeing performance improvements with spec on x86
> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>
> I received a mail from Linaro about some failures seen in the CI pipeline 
> with
> this patch. I have analyzed the failures and I wish to discuss the 
> analysis with you.
>
> One failure reported by the Linaro CI is:
>
> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, 
> r[0-9]+, \\[r[0-9]+\\] 2
>
> The diff in the assembly between trunk and patch is:
>
> 93c93
> <   push    {r4, r5}
> ---
>>     push    {fp}
> 95c95
> <   ldrexd  r4, r5, [r0]
> ---
>>     ldrexd  fp, ip, [r0]
> 99c99
> <   pop {r4, r5}
> ---
>>     ldr fp, [sp], #4
>
>
> The test fails with patch because the ldrexd insn uses fp & ip registers 
> instead
> of r[0-9]+
>
> But the code produced by patch is better because it is pushing and 
> restoring only
> one register (fp) instead of two registers (r4, r5). Hence, this test can 
> be
> modified to allow it to pass on arm. Please let me know what you think.
>
> If you need more information, please let me know. I will be sending 
> separate mails
> for the other test failures.
>
>>>
>>> Thanks for looking at this.
>>>
>>>
>>> The key part of this test is that the compiler generates LDREXD.  The 
>>> registers used for that are pretty much irrelevant as we don't match them 
>>> to any other operation
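
The scan-assembler-times failure is purely a register-spelling issue: `r[0-9]+` matches the numbered core registers but not the aliases `fp` (r11) and `ip` (r12) that the patched compiler picks.  A small sketch of the mismatch, and of one way the testcase regex could be relaxed (illustrative only; the real fix belongs in pr111235.c):

```python
import re

# Pattern currently used by the testcase: numbered registers only.
strict = re.compile(r"ldrexd\tr[0-9]+, r[0-9]+, \[r[0-9]+\]")

# Relaxed pattern that also accepts the fp/ip aliases for r11/r12.
reg = r"(?:r[0-9]+|fp|ip)"
relaxed = re.compile(r"ldrexd\t%s, %s, \[%s\]" % (reg, reg, reg))

old_asm = "ldrexd\tr4, r5, [r0]"   # trunk output
new_asm = "ldrexd\tfp, ip, [r0]"   # output with the IRA patch
```

The strict pattern matches only the trunk output, while the relaxed one accepts both, so the test would keep passing regardless of which registers the allocator chooses.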

Re: Arm assembler crc issue

2023-10-19 Thread Richard Sandiford via Gcc
Iain Sandoe  writes:
> Hi Richard,
>
>
> I am being bitten by a problem that falls out from the code that emits
>
>   .arch Armv8.n-a+crc
>
> when the arch is less than Armv8-r.
> The code that does this,  in gcc/common/config/aarch64 is quite recent 
> (2022-09).

Heh.  A workaround for one assembler bug triggers another assembler bug.

The special treatment of CRC is much older than 2022-09 though.  I think
it dates back to 04a99ebecee885e42e56b6e0c832570e2a91c196 (2016-04),
with 4ca82fc9f86fc1187ee112e3a637cb3ca5d2ef2a providing the more
complete explanation.

>
> --
>
> (I admit the permutations are complex and I might have mis-analyzed) - but 
> it appears that the llvm assembler (for mach-o, at least) sees an explicit mention 
> of an attribute for a feature which is mandatory at a specified arch level as 
> demoting that arch to the minimum that made the explicit feature mandatory.  
> Of course, it could just be a bug in the handling of transitive feature 
> enables...
>
> the problem is that, for example:
>
>   .arch Armv8.4-a+crc
>
> no longer recognises fp16 insns. (and appending +fp16 does not fix this).
>
> 
>
> Even if upstream LLVM is deemed to be buggy (it does not do what I would 
> expect, at least), and fixed - I will still have a bunch of assembler 
> versions that are broken (before the fix percolates through to downstream 
> xcode) - and the LLVM assembler is the only current option for Darwin.
>
> So, it seems that this ought to be a reasonable configure test:
>
>   .arch armv8.2-a
>   .text
> m:
>   crc32b w0, w1, w2 
>
> and then emit HAS_GAS_AARCH64_CRC_BUG (for example) if that fails to assemble 
> which can be used to make the +crc emit conditional on a broken assembler.

AIUI the problem was in the CPU descriptions, so I don't think this
would test for the old gas bug that is being worked around.

Perhaps instead we could have a configure test for the bug that you've
found, and disable the crc workaround if so?

Thanks,
Richard

>
> - I am asking here before constructing the patch, in case there’s some reason 
> that doing this at configure time is not acceptable.
>
> thanks
> Iain


Re: ipa-inline & what TARGET_CAN_INLINE_P can assume

2023-09-25 Thread Richard Sandiford via Gcc
Andrew Pinski  writes:
> On Mon, Sep 25, 2023 at 10:16 AM Richard Sandiford via Gcc
>  wrote:
>>
>> Hi,
>>
>> I have a couple of questions about what TARGET_CAN_INLINE_P is
>> allowed to assume when called from ipa-inline.  (Callers from the
>> front-end don't matter for the moment.)
>>
>> I'm working on an extension where a function F1 without attribute A
>> can't be inlined into a function F2 with attribute A.  That part is
>> easy and standard.
>>
>> But it's expected that many functions won't have attribute A,
>> even if they could.  So we'd like to detect automatically whether
>> F1's implementation is compatible with attribute A.  This is something
>> we can do by scanning the gimple code.
>>
>> However, even if we detect that F1's code is compatible with attribute A,
>> we don't want to add attribute A to F1 itself because (a) it would change
>> F1's ABI and (b) it would restrict the optimisation of any non-inlined
>> copy of F1.  So this is a test for inlining only.
>>
>> TARGET_CAN_INLINE_P (F2, F1) can check whether F1's current code
>> is compatible with attribute A.  But:
>>
>> (a) Is it safe to assume (going forward) that F1 won't change before
>> it is inlined into F2?  Specifically, is it safe to assume that
>> nothing will be inlined into F1 between the call to TARGET_CAN_INLINE_P
>> and the inlining of F1 into F2?
>>
>> (b) For compile-time reasons, I'd like to cache the result in
>> machine_function.  The cache would be a three-state:
>>
>> - not tested
>> - compatible with A
>> - incompatible with A
>>
>> The cache would be reset to "not tested" whenever TARGET_CAN_INLINE_P
>> is called with F1 as the *caller* rather than the callee.  The idea
>> is to handle cases where something is inlined into F1 after F1 has
>> been inlined into F2.  (This would include calls from the main
>> inlining pass, after the early pass has finished.)
>>
>> Is resetting the cache in this way sufficient?  Or should we have a
>> new interface for this?
>>
>> Sorry for the long question :)  I have something that seems to work,
>> but I'm not sure whether it's misusing the interface.
>
>
> The rs6000 backend has a similar issue and defined the following
> target hooks which seems exactly what you need in this case
> TARGET_NEED_IPA_FN_TARGET_INFO
> TARGET_UPDATE_IPA_FN_TARGET_INFO
>
> And then use that information in can_inline_p target hook to mask off
> the ISA bits:
>   unsigned int info = ipa_fn_summaries->get (callee_node)->target_info;
>   if ((info & RS6000_FN_TARGET_INFO_HTM) == 0)
> {
>   callee_isa &= ~OPTION_MASK_HTM;
>   explicit_isa &= ~OPTION_MASK_HTM;
> }

Thanks!  Like you say, it looks like a perfect fit.

The optimisation of having TARGET_UPDATE_IPA_FN_TARGET_INFO return false
to stop further analysis probably won't trigger for this use case.
I need to track two conditions and the second one is very rare.
But that's still going to be much better than potentially scanning
the same (inlined) stmts multiple times.

Richard


ipa-inline & what TARGET_CAN_INLINE_P can assume

2023-09-25 Thread Richard Sandiford via Gcc
Hi,

I have a couple of questions about what TARGET_CAN_INLINE_P is
allowed to assume when called from ipa-inline.  (Callers from the
front-end don't matter for the moment.)

I'm working on an extension where a function F1 without attribute A
can't be inlined into a function F2 with attribute A.  That part is
easy and standard.

But it's expected that many functions won't have attribute A,
even if they could.  So we'd like to detect automatically whether
F1's implementation is compatible with attribute A.  This is something
we can do by scanning the gimple code.

However, even if we detect that F1's code is compatible with attribute A,
we don't want to add attribute A to F1 itself because (a) it would change
F1's ABI and (b) it would restrict the optimisation of any non-inlined
copy of F1.  So this is a test for inlining only.

TARGET_CAN_INLINE_P (F2, F1) can check whether F1's current code
is compatible with attribute A.  But:

(a) Is it safe to assume (going forward) that F1 won't change before
it is inlined into F2?  Specifically, is it safe to assume that
nothing will be inlined into F1 between the call to TARGET_CAN_INLINE_P
and the inlining of F1 into F2?

(b) For compile-time reasons, I'd like to cache the result in
machine_function.  The cache would be a three-state:

- not tested
- compatible with A
- incompatible with A

The cache would be reset to "not tested" whenever TARGET_CAN_INLINE_P
is called with F1 as the *caller* rather than the callee.  The idea
is to handle cases where something is inlined into F1 after F1 has
been inlined into F2.  (This would include calls from the main
inlining pass, after the early pass has finished.)

Is resetting the cache in this way sufficient?  Or should we have a
new interface for this?

Sorry for the long question :)  I have something that seems to work,
but I'm not sure whether it's misusing the interface.

Thanks,
Richard
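
The reset-on-caller protocol described above can be sketched as follows.  This is only a Python model of the caching logic; in GCC the cache would live in the target's machine_function and be consulted from the TARGET_CAN_INLINE_P hook, and all names here are invented for illustration:

```python
from enum import Enum

class FnStatus(Enum):
    NOT_TESTED = 0
    COMPATIBLE = 1
    INCOMPATIBLE = 2

# Stand-in for the per-function machine_function field.
cache = {}

def scan_body(fn, bodies):
    # Stand-in for scanning fn's gimple statements for compatibility
    # with attribute A.
    return (FnStatus.COMPATIBLE if "A-safe" in bodies[fn]
            else FnStatus.INCOMPATIBLE)

def can_inline_p(caller, callee, bodies):
    # Inlining *into* `caller` may change its body, so any cached
    # verdict about the caller is stale: reset it to NOT_TESTED.
    cache[caller] = FnStatus.NOT_TESTED
    status = cache.get(callee, FnStatus.NOT_TESTED)
    if status is FnStatus.NOT_TESTED:
        status = scan_body(callee, bodies)
        cache[callee] = status
    return status is FnStatus.COMPATIBLE
```

The key property is that a function's cached verdict survives only as long as nothing is inlined into it; as soon as it shows up on the caller side, the next query re-scans it.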


Re: Question on aarch64 prologue code.

2023-09-06 Thread Richard Sandiford via Gcc
Iain Sandoe  writes:
> Hi Folks,
>
> On the Darwin aarch64 port, we have a number of cleanup test fails (pretty 
> much corresponding to the [still open] 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39244).  However, let’s assume 
> that bug could be a red herring..
>
> the underlying reason is missing CFI for the set of the FP which [with 
> Darwin’s LLVM libunwind impl.] breaks the unwind through the function that 
> triggers a signal.

Just curious, do you have more details about why that is?  If the unwinder
is sophisticated enough to process CFI, it seems odd that it requires the
CFA to be defined in terms of the frame pointer.
>
> ———
>
> taking one of the functions in cleanup-8.C (say fn1) which contains calls.
>
> what I am seeing is something like:
>
> __ZL3fn1v:
> LFB28:
> ; BLOCK 2, count:1073741824 (estimated locally) seq:0
> ; PRED: ENTRY [always]  count:1073741824 (estimated locally, freq 1.) 
> (FALLTHRU)
>   stp x29, x30, [sp, -32]!
> // LCFI; or .cfi_xxx is present
>   mov x29, sp
> // *** NO  LCFI (or .cfi_cfa_ when that is enabled)
>   str x19, [sp, 16]
> // LCFI / .cfi_ is present.
>   adrp    x19, __ZL3fn4i@PAGE
>   add x19, x19, __ZL3fn4i@PAGEOFF;momd
>   mov x1, x19
>   mov w0, 11
>   bl  _signal
> 
>
> ———
>
> The reason seems to be that, in expand_prolog, emit_frame_chain is true (as 
> we would expect, given that this function makes calls).  However 
> ‘frame_pointer_needed' is false, so that the call to aarch64_add_offset() 
> [line aarch64.cc:10405] does not add CFA adjustments to the load of x29.

Right.

> ———
>
> I have currently worked around this by defining a 
> TARGET_FRAME_POINTER_REQUIRED which returns true unless the function is a 
> leaf (if that’s the correct solution, then all is fine).

I suppose it depends on why the frame-pointer-based CFA is important
for Darwin.  If it's due to a more general requirement for a frame
pointer to be used, then yeah, that's probably the right fix.  If it's
more a quirk of the unwinder, then we could probably expose whatever
that quirk is as a new status bit.  Target-independent code in
dwarf2cfi.cc would then need to be aware as well.

> ———
>
> However, it does seem odd that the existing code sets up the FP, but never 
> produces any CFA for it.
>
> So is this a possible bug, or just that I misunderstand the relevant set of 
> circumstances?

emit_frame_chain fulfills an ABI requirement that every non-leaf function
set up a frame-chain record.  When emit_frame_chain && !frame_pointer_needed,
we set up the FP for ABI purposes only.  GCC can still access everything
relative to the stack pointer, and it can still describe the CFI based
purely on the stack pointer.

glibc-based systems only need the CFA to be based on the frame pointer
if the stack pointer moves during the body of the function (usually due
to alloca or VLAs).

Thanks,
Richard


Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Sandiford via Gcc
"juzhe.zh...@rivai.ai"  writes:
> Thanks Richi.
>
> I am trying to figure out how to adjust finish_cost to lower the LMUL
>
> For example:
>
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
>   for (int i = 0; i < n; i++)
> a[i] = a[i] + b[i];
> }
>
> preferred_simd_mode pick LMUL = 8 (RVVM8SImode)

But is the LMUL decided by the mode?  Like Richard says, the vectoriser
already provides a way of trying vectorisation with different modes and
picking the best one, via autovectorize_vector_modes, VECT_COMPARE_COST,
and the cost structures.  preferred_simd_mode then just picks the first
mode to try -- the choice isn't final.

The idea is that you get to see what vectorisation looks like with
multiple mode choices, and can pick the best one.

It's not clear from your reply whether you've tried that or not.

Thanks,
Richard


Re: Hundreds of gcc.dg/guality failures on both 14 and 13.1 branches

2023-07-16 Thread Richard Sandiford via Gcc
Martin Jambor  writes:
> Hi,
>
> On Sat, Jul 15 2023, FX Coudert via Gcc wrote:
>> Hi,
>>
>> I am finding it very hard to reliably compare test results and regressions 
>> with the very large number of gcc.dg/guality test failures that are 
>> apparently the new norm on x86_64-linux: more than one hundred on 13.1, and 
>> several hundreds on 14. Is there any on-going discussion about this?
>>
>> I mean, from an almost-external point of view, these tests should probably 
>> be xfail'ed and a PR opened against them to reenable them.
>>
>
> As far as I understand it, the main problem is that it is not really
> possible to XFAIL a test for one combination of options (say "-O2
> -flto") and not others.

FWIW, we do do that for aarch64*.  The principle is that if anyone
wants to work on improving debug quality, the XFAILs in the testsuite
are just as good a reference as bugzilla (unless the bugzilla PR has
specific analysis).

But it is still dependent on the default configuration and on the version
of GDB used.  It gives zero failures on my set-up, and seems to do the
same for some other gcc-testresults@ emails I've seen.  But it probably
won't for everyone.

I tend to update the XFAIL lists in stage 3/4, when no major changes
are expected.

Having the XFAILs doesn't really change the workflow.  It's still
necessary to do a before-after comparison.  But having fewer XPASSes &
FAILs makes it less likely that something gets missed.  It might even
encourage people to take guality regressions more seriously :)  Before,
there were so many XPASSes and FAILs in the "before" log that it was
easy to think "well, what's one more?" or "this thing looks generally
broken, so I'll just ignore it".

Richard


Re: Taking Over MIPS Maintenance

2023-05-18 Thread Richard Sandiford via Gcc
YunQiang Su  writes:
> Greetings all,
>
> I would like to self-nominate as the new GCC MIPS maintainer. Matthew Fortune 
> is listed in MAINTAINERS as the current maintainer of GCC's MIPS Port. 
> However, it has been years since he left MIPS Technologies and had since been 
> inactive.
>
> I currently work for CIP United Inc., which is the exclusive licensor and 
> operator of MIPS IPs in China, Hong Kong, and Macau. Part of our operations 
> include maintaining open source software solutions for MIPS and we are 
> looking to continue maintaining GCC's MIPS port. As the director of the 
> company's software ecosystem department, I have been working with GCC and 
> contributed code to the upstream repository since 2021. In September 2021, I 
> was given write access to the repository:
>
> https://gcc.gnu.org/git/?p=gcc.git&a=search&h=HEAD&st=author&s=YunQiang+Su
>
> Please let me know about your thoughts on this proposal.

FWIW, I'd support this.  The MIPS port has been unmaintained for
many years now.  As the previous maintainer before Matthew, I've
tried to cover the area a bit.  But

(a) It's now close to 15 years since I did any meaningful MIPS work,
so I've forgotten a great deal.

(b) Most new work will be specific to MIPSr6, which I have never used.

(c) It's been very difficult to find the time.

It would be more usual to wait a bit longer until someone becomes
maintainer.  But IMO that's only sensible when there's an existing
maintainer to cover the interim.

Thanks,
Richard


Re: Default initialization of poly-ints

2023-01-12 Thread Richard Sandiford via Gcc
Jeff Law via Gcc  writes:
> On 1/3/23 04:16, Florian Weimer via Gcc wrote:
>> It seems that the default constructor of the non-POD poly-ints does
>> nothing.  Is this intentional?  I expected zero initialization, to match
>> regular ints.
> I think it was intentional.  Richard Sandiford would know for sure.  But 
> Martin Sebor might know as well since I think they discussed it at 
> length a little while back.

Yeah, it was intentional, to try to get -Wuninitialized warnings.
But it predates C++11 being required, so = default would probably
be a better choice now.

Thanks,
Richard


Re: RFC: Make builtin types only valid for some target features

2022-12-04 Thread Richard Sandiford via Gcc
"Kewen.Lin"  writes:
> Hi,
>
> I'm working to find one solution for PR106736, which requires us to
> make some built-in types only valid for some target features, and
> emit error messages for the types when the condition isn't satisfied.  
> A straightforward idea is to guard the registry of built-in type under
> the corresponding target feature.  But as Peter pointed out in the
> PR, it doesn't work, as these built-in types are used by some built-in
> functions which are required to be initialized always regardless of
> target features, in order to support target pragma/attribute.  For
> the validity checking on the built-in functions, it happens during
> expanding the built-in functions calls, since till then we already
> know the exact information on specific target feature.
>
> One idea is to support built-in type checking in a similar way, to
> check if all used type_decl (built-in type) are valid or not somewhere.
> I hacked it to simply check whether the currently expanding gimple stmt
> is a gassign and further check the types of its operands; that did work,
> but checking gassign isn't enough.  Maybe I missed something, but there
> seems to be no efficient way to do a full check IMHO.
>
> So I tried another direction, which was inspired by the existing
> attribute altivec, to introduce an artificial type attribute and the
> corresponding macro definition, during attribute handling it can check
> target option node about target feature for validity.  The advantage
> is that the checking happens in FE, so it reports error early, and it
> doesn't need a later full checking on types.  But with some prototyping
> work, I found one issue that it doesn't support param decl well, since
> the handling on attributes of function decl happens after that on
> attributes of param decl, so we aren't able to get exact target feature
> information when handling the attributes on param decl.  It would require
> the front-end to change the parsing order, which I guess isn't acceptable?
> So I planed to give up and return to the previous direction.
>
> Does the former idea sound good?  Any comments/suggestions, and other
> ideas?
>
> Thanks a lot in advance!

FWIW, the aarch64 fp move patterns emit the error directly.  They then
expand an integer-mode move, to provide some error recovery.  (The
alternative would be to make the error fatal.)

(define_expand "mov<mode>"
  [(set (match_operand:GPF_TF_F16_MOV 0 "nonimmediate_operand")
        (match_operand:GPF_TF_F16_MOV 1 "general_operand"))]
  ""
  {
    if (!TARGET_FLOAT)
      {
        aarch64_err_no_fpadvsimd (<MODE>mode);
        machine_mode intmode
          = int_mode_for_size (GET_MODE_BITSIZE (<MODE>mode), 0).require ();
        emit_move_insn (gen_lowpart (intmode, operands[0]),
                        gen_lowpart (intmode, operands[1]));
        DONE;
      }

This isn't as user-friendly as catching the error directly in the FE,
but I think in practice it's going to be very hard to trap all invalid
uses of a type there.  Also, the user error in these situations is likely
to be forgetting to enable the right architecture feature, rather than
accidentally using the wrong type.  So an error about missing architecture
features is probably good enough in most cases.

Thanks,
Richard


Re: [RFC] database with API information

2022-09-07 Thread Richard Sandiford via Gcc
Ulrich Drepper via Gcc  writes:
> I talked to Jonathan the other day about adding all the C++ library APIs to
> the name hint file now that the size of the table is not really a concern
> anymore.
>
> Jonathan mentioned that he has to create and maintain a similar file for
> the module support.  It needs to list all the exported interfaces and this
> is mostly a superset of the entries in the hint table.
>
> Instead of duplicating the information it should be kept in one place.
> Neither file itself is a natural fit because the additional information
> needed  (e.g., the standard version information for the name hint table) is
> not needed in the other location.
>
> Hence, let's use a simple database, a CSV file for simplicity, and generate
> both files from this.  Easily done, I have an appropriate script and a CSV
> file with the information of both Jonathan's current export file and the
> current state of the name hint table.
>
> The only detail that keeps me from submitting this right now is the way the
> script is implemented.  This is just a natural fit for a Python script.
> The default installation comes with a csv module and there are nice ways to
> adjust and output boilerplate headers like those needed in those files.
>
> It would be possible to create separate awk scripts (there is only one
> Python script) but it'll be rather ugly and harder to maintain than the
> Python version.
>
> Of course the problem is: I don't think that there is yet any maintainer
> tool written in Python (except some release engineering tools).  The
> question is therefore: is it time to lift this restriction?  I cannot today
> imagine any machine capable of serving a gcc developer which doesn't also
> have a Python implementation.  As long as there is no dependency on exotic
> modules I doubt that anything will break.

FWIW, I agree it's past time to lift the no-Python restriction,
and that Python is a natural fit for stuff like this.

Thanks,
Richard


Re: [PATCH 0/3] picolibc: Add picolibc linking help

2022-09-02 Thread Richard Sandiford via Gcc
Keith Packard via Gcc-patches  writes:
> Picolibc is a C library for embedded systems based on code from newlib
> and avr libc. To connect some system-dependent picolibc functions
> (like stdio) to an underlying platform, the platform may provide an OS
> library.
>
> This OS library must follow the C library in the link command line. In
> current picolibc, that is done by providing an alternate .specs file
> which can rewrite the *lib spec to insert the OS library in the right
> spot.
>
> This patch series adds the ability to specify the OS library on the
> gcc command line when GCC is configured to us picolibc as the default
> C library, and then hooks that up for arm, nds32, riscv and sh targets.

Not really my area, but the approach LGTM FWIW.  Main question/points:

- In:

  +case "${with_default_libc}" in
  +glibc)
  +default_libc=LIBC_GLIBC
  +;;

  should there be a default case that raises an error for unrecognised
  libcs?  Command-line checking for configure isn't very tight, but we
  do raise similar errors for things like invalid --enable-threads values.

- I'm not sure either way about adding LIBC_NEWLIB.  On the one hand
  it makes sense for completeness, but on the other it's write-only.
  Adding it means that --with-default-libc=newlib toolchains have a
  different macro configuration from a default toolchain even in cases
  where newlib is the default.

  On balance I think it would be better to accept
  --with-default-libc=newlib but set default_libc to the empty string.

- Should we raise an error for toolchains that don't support the given
  C library?  It feels like we should, but I realise that could be
  difficult to do.

- Very minor, but in lines like:

  +#if defined(DEFAULT_LIBC) && defined(LIBC_PICOLIBC) && DEFAULT_LIBC == 
LIBC_PICOLIBC

  is LIBC_PICOLIBC ever undefined?  It looks like config.gcc provides
  an unconditional definition.  If it is always defined then:

  #if DEFAULT_LIBC == LIBC_PICOLIBC

  would be clearer.

Thanks,
Richard

>
> Keith Packard (3):
>   Allow default libc to be specified to configure
>   Add newlib and picolibc as default C library choices
>   Add '--oslib=' option when default C library is picolibc
>
>  gcc/config.gcc| 56 ---
>  gcc/config/arm/elf.h  |  5 
>  gcc/config/nds32/elf.h|  4 +++
>  gcc/config/picolibc.opt   | 26 ++
>  gcc/config/riscv/elf.h|  4 +++
>  gcc/config/sh/embed-elf.h |  5 
>  gcc/configure.ac  |  4 +++
>  7 files changed, 95 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/config/picolibc.opt


Re: try_finally_expr in must_not_throw_expr

2022-08-02 Thread Richard Sandiford via Gcc
Philipp Rimmele via Gcc  writes:
> Hi,
>
> I'm developing a GCC plugin, and I don't understand why there is a 
> "try_finally_expr" in a must_not_throw area of my AST. It happens in the 
> destructors.
> Here is my AST:
> function_decl Exception::__dt_base
>   1: must_not_throw_expr(->void_type{void})[42]
> 0: statement_list(->void_type{void})
>   0: bind_expr(->void_type{void})[42]
> 1: statement_list(->void_type{void})
>   0: cleanup_point_expr(->void_type{void})[42]
> 0: expr_stmt(->void_type{void})
>   0: convert_expr(->void_type{void})
> 0: 
> modify_expr(->pointer_type->pointer_type{__vtbl_ptr_type}->function_type->integer_type{int})
>   0: 
> component_ref(->pointer_type->pointer_type{__vtbl_ptr_type}->function_type->integer_type{int})
> 0: indirect_ref(->record_type{Exception})
>   0: nop_expr(->pointer_type->record_type{Exception})
> 0: parm_decl(->pointer_type->record_type{Exception}) 
> : this
> 1: 
> field_decl(->pointer_type->pointer_type{__vtbl_ptr_type}->function_type->integer_type{int})
>   1: 
> pointer_plus_expr(->pointer_type->pointer_type{__vtbl_ptr_type}->function_type->integer_type{int})
> 0: 
> addr_expr(->pointer_type->pointer_type{__vtbl_ptr_type}->function_type->integer_type{int})
>   0: 
> var_decl(->array_type->pointer_type{__vtbl_ptr_type}->function_type->integer_type{int})
>  : _ZTV9Exception
> 1: integer_cst : 16 : 1
>   0: try_finally(->void_type{void})[42]
> 0: statement_list(->void_type{void})
> 1: modify_expr(->void_type{void})
>   0: indirect_ref(->record_type)
> 0: nop_expr(->reference_type->record_type)
>   0: parm_decl(->pointer_type->record_type{Exception}) : this
>   1: constructor(->record_type)
> 2: block
>   0: label_expr(->void_type{void})[42]
> 0: label_decl(->void_type{void}) : 
>
> What is the reason for this? No exception should be thrown here, so why 
> handle it with a try_finally expression? I'm currently using GCC 8.2.0.
> I would be really glad if you could answer this question, and if you can 
> give me some examples of where the try_finally expression is also used, 
> that would be really helpful.

I'm not an expert on this by any means, but since no-one else has
replied yet... I suspect it's simpler to use try_finally whenever
something needs to be run at the end of a scope, regardless of whether
the scope ends through fallthrough, breaking, continuing, or exceptions.

To put it another way: try_finally at this stage doesn't guarantee
that exception handling will actually be needed.  For example:

  try
{
  int i = 1;
  int j = 2;
  if (i == j)
foo ();
}
  finally ...

starts out with a potentially-throwing call to foo, but it (and the
possibility of an exception) will get optimised away later.  It probably
didn't seem worthwhile having the frontend do a similar but separate
analysis of whether statements might throw.

Thanks,
Richard
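
Richard's point is that a single cleanup construct covers every way control can leave a scope, whether or not an exception actually occurs.  That is easy to see in any language with try/finally semantics (a Python analogy, not GIMPLE):

```python
def leave_scope(how, log):
    """Exercise three different exit paths; `finally` runs on all of them."""
    try:
        if how == "throw":
            raise ValueError("simulated throw")
        if how == "return":
            return log
        log.append("fallthrough")
    except ValueError:
        log.append("caught")
    finally:
        # The try_finally body: runs unconditionally, whether the scope
        # ends by fallthrough, early return, or an exception.
        log.append("cleanup")
    return log
```

Whether the exception path is ever reachable is a separate question, answered later by optimisation, exactly as with the call to foo above.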




Re: Help with an ABI peculiarity

2022-01-21 Thread Richard Sandiford via Gcc
Iain Sandoe  writes:
> Hi Richard,
>> On 20 Jan 2022, at 22:32, Richard Sandiford  wrote:
>> Iain Sandoe  writes:
 On 10 Jan 2022, at 10:46, Richard Sandiford  wrote:
 An alternative might be to make promote_function_arg a “proper”
 ABI hook, taking a cumulative_args_t and a function_arg_info.
 Perhaps the return case should become a separate hook at the
 same time.
 
 That would probably require more extensive changes than just
 updating the call sites, and I haven't really checked how much
 work it would be, but hopefully it wouldn't be too bad.
 
 The new hook would still be called before function_arg, but that
 should no longer be a problem, since the new hook arguments would
 give the target the information it needs to decide whether the
 argument is passed in registers.
>>> 
>>> Yeah, this was my next port of call (I have looked at it ~10 times and then
>>> decided “not today, maybe there’s a simpler way”).
>
> … and I did not have a chance to look at this in the meantime …
>
>> BTW, finally catching up on old email, I see this is essentially also
>> the approach that Maxim was taking with the TARGET_FUNCTION_ARG_BOUNDARY
>> patches.  What's the situation with those? 
>
> I have the patches plus amendments to make use of their new functionality on 
> the
> development branch, which is actually in pretty good shape (not much 
> difference
> in testsuite results from other Darwin sub-ports).
>
> Maxim and I need to discuss amending the TARGET_FUNCTION_ARG_BOUNDARY
> changes to account for Richard (B)’s comments.
>
> Likewise, I need to tweak the support for heap allocation of nested function 
> trampolines
> to account for review comments.

Sounds great.

> As always, it’s a question of fitting everything in…

Yeah :-)  The question probably sounded pushier than it was meant to,
sorry.  I just wanted to check that you or Maxim weren't still waiting
on reviews.

Richard


Re: Help with an ABI peculiarity

2022-01-20 Thread Richard Sandiford via Gcc
Iain Sandoe  writes:
>> On 10 Jan 2022, at 10:46, Richard Sandiford  wrote:
>> An alternative might be to make promote_function_arg a “proper”
>> ABI hook, taking a cumulative_args_t and a function_arg_info.
>> Perhaps the return case should become a separate hook at the
>> same time.
>> 
>> That would probably require more extensive changes than just
>> updating the call sites, and I haven't really checked how much
>> work it would be, but hopefully it wouldn't be too bad.
>> 
>> The new hook would still be called before function_arg, but that
>> should no longer be a problem, since the new hook arguments would
>> give the target the information it needs to decide whether the
>> argument is passed in registers.
>
> Yeah, this was my next port of call (I have looked at it ~10 times and then
> decided “not today, maybe there’s a simpler way”).

BTW, finally catching up on old email, I see this is essentially also
the approach that Maxim was taking with the TARGET_FUNCTION_ARG_BOUNDARY
patches.  What's the situation with those?  Sorry for not responding
to them earlier.

Thanks,
Richard


Re: How to generate a call inst. sequence?

2022-01-19 Thread Richard Sandiford via Gcc
Andras Tantos  writes:
> All,
>
> I'm working on porting GCC to a processor architecture that doesn't have 
> a (HW) stack nor a call instruction. This means that for calls, I need 
> to generate the following instruction sequence:
>
>      // move stack-pointer:
>      $sp <- $sp-4
>      // load return address:
>      $r3 <- return_label
>      // store return address on stack:
>      mem[$sp] <- $r3
>      // jump to callee:
>      $pc <- 

Even though this is internally a jump, it still needs to be represented
as a (call …) rtx in rtl, and emitted using emit_call_insn.

In other words, the "call" expander must always emit a call_insn
of some kind.  (But it can emit other instructions too, such as the
ones you describe above.)

Richard
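
[For reference, one possible shape for such an expander is sketched below.
This is only a sketch of the convention Richard describes (emit the setup
instructions, then end with a call_insn); the exact pattern and helper names
are assumptions, not a tested port.]

```
(define_expand "call"
  [(call (match_operand:QI 0 "memory_operand" "")
         (match_operand 1 "" ""))]
  ""
{
  /* Emit the stack adjustment and the store of the return-address label
     as before, but make the transfer of control itself a call_insn
     rather than a jump_insn, e.g. ending with something like:

       emit_call_insn (gen_rtx_CALL (VOIDmode, operands[0], operands[1]));

     DONE stops the template pattern above from being emitted a second
     time on top of the hand-emitted sequence.  */
  brew_expand_call (Pmode, operands);
  DONE;
})
```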

>    return_label:
>
> Now, I can do all of that as a multi-instruction string sequence in my 
> .md file (which is what I'm doing right now), but there are two problems 
> with that approach. First, it hard-codes the temp register ($r3 above) 
> and requires me to reserve it even though it could be used between calls 
> by the register allocator. Second this approach (I think at least) 
> prevents any passes from merging stack-frame preparation for the call 
> arguments, such as eliminating the stack-pointer update above.
>
> I thought I could circumvent these problems by emitting a piece of RTL 
> in the 'call' pattern:
>
>    (define_expand "call"
>      [(call
>    (match_operand:QI 0 "memory_operand" "")
>    (match_operand 1 "" "")
>      )]
>      ""
>    {
>      brew_expand_call(Pmode, operands);
>    })
>
> where brew_expand_call is:
>
>    void brew_expand_call(machine_mode mode, rtx *operands)
>    {
>      gcc_assert (MEM_P(operands[0]));
>
>      rtx_code_label *label = gen_label_rtx();
>      rtx label_ref = gen_rtx_LABEL_REF(SImode, label);
>      rtx temp_reg = gen_reg_rtx(mode);
>
>      // $sp <- $sp - 4
>      emit_insn(gen_subsi3(
>    stack_pointer_rtx,
>    stack_pointer_rtx,
>    GEN_INT(4)
>      ));
>      // $r3 <- 
>      emit_insn(gen_move_insn(
>    temp_reg,
>    label_ref
>      ));
>      // mem[$sp] <- $r3
>      emit_insn(gen_move_insn(
>    gen_rtx_MEM(Pmode, stack_pointer_rtx),
>    temp_reg
>      ));
>      emit_jump_insn(gen_jump(operands[0]));
>      emit_label(label);
>    }
>
> If I try to compile the following test:
>
>    void x(void)
>    {
>    }
>
>    int main(void)
>    {
>      x();
>      return 0;
>    }
>
> I get an assert:
>
>    during RTL pass: expand
>    dump file: call.c.252r.expand
>    call.c: In function ‘main’:
>    call.c:9:1: internal compiler error: in as_a, at is-a.h:242
>    9 | }
>      | ^
>    0x6999b7 rtx_insn* as_a(rtx_def*)
>    ../../brew-gcc/gcc/is-a.h:242
>    0x6999b7 rtx_sequence::insn(int) const
>    ../../brew-gcc/gcc/rtl.h:1439
>    0x6999b7 mark_jump_label_1
>    ../../brew-gcc/gcc/jump.cc:1077
>    0xcfc31f mark_jump_label_1
>    ../../brew-gcc/gcc/jump.cc:1171
>    0xcfc73d mark_all_labels
>    ../../brew-gcc/gcc/jump.cc:332
>    0xcfc73d rebuild_jump_labels_1
>    ../../brew-gcc/gcc/jump.cc:74
>    0x9e8e62 execute
>    ../../brew-gcc/gcc/cfgexpand.cc:6845
>
> The reference dump file:
>
>    ;; Function x (x, funcdef_no=0, decl_uid=1383, cgraph_uid=1, 
> symbol_order=0)
>
>
>    ;; Generating RTL for gimple basic block 2
>
>
>    try_optimize_cfg iteration 1
>
>    Merging block 3 into block 2...
>    Merged blocks 2 and 3.
>    Merged 2 and 3 without moving.
>    Merging block 4 into block 2...
>    Merged blocks 2 and 4.
>    Merged 2 and 4 without moving.
>
>
>    try_optimize_cfg iteration 2
>
>
>
>    ;;
>    ;; Full RTL generated for this function:
>    ;;
>    (note 1 0 3 NOTE_INSN_DELETED)
>    (note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>    (note 2 3 0 2 NOTE_INSN_FUNCTION_BEG)
>
>    ;; Function main (main, funcdef_no=1, decl_uid=1386, cgraph_uid=2, 
> symbol_order=1)
>
>
>    ;; Generating RTL for gimple basic block 2
>
>    ;; Generating RTL for gimple basic block 3
>
>
>
>    EMERGENCY DUMP:
>
>    int main ()
>    {
>    (note 3 1 2 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>    (note 2 3 4 4 NOTE_INSN_FUNCTION_BEG)
>
>    (note 4 2 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>    (insn 5 4 6 2 (set (reg/f:SI 1 $sp)
>    (minus:SI (reg/f:SI 1 $sp)
>    (const_int 4 [0x4]))) "call.c":7:5 -1
>    (nil))
>    (insn 6 5 7 2 (set (reg:SI 25)
>    (label_ref:SI 9)) "call.c":7:5 -1
>    (insn_list:REG_LABEL_OPERAND 9 (nil)))
>    (insn 7 6 8 2 (set (mem:SI (reg/f:SI 1 $sp) [0  S4 A32])
>    (reg:SI 25)) "call.c":7:5 -1
>    (nil))
>    (jump_insn 8 7 9 2 (set (pc)
>    (label_ref (mem:QI (symbol_ref:SI ("x") [flags 0x3] 
> ) [0 x S1 A8]))) "call.c":7:5 -1
>    (nil))
>    (code_label 9 8 10 2 3 (nil) [1 uses])
>    (call_insn 10 9 11 2 (call (mem:QI (symbol_ref:SI ("x") [flags 0x3]  
> ) [0 x S1 A8])
>    (const_int 16 [0x10])) "call.c":7:5 -1
>    (nil)
>

Re: Help with an ABI peculiarity

2022-01-10 Thread Richard Sandiford via Gcc
Iain Sandoe  writes:
> Hi Folks,
>
> In the aarch64 Darwin ABI we have an unusual (OK, several unusual) feature of 
> the calling convention.
>
> When an argument is passed *in a register* and it is integral and less than 
> SI it is promoted (with appropriate signedness) to SI.  This applies when the 
> function parm is named only.
>
> When the same argument would be placed on the stack (i.e. we ran out of 
> registers) - it occupies its natural size, and is naturally aligned (so, for 
> instance, 3 QI values could be passed as 3 registers - promoted to SI .. or 
> packed into three adjacent bytes on the stack)..
>
> The key is that we need to know that the argument will be placed in a 
> register before we decide whether to promote it.
> (similarly, the promotion is not done in the callee for the in-register case).
>
> I am trying to figure out where to implement this.
>
> * the code that (in regular cases) decides on such promotions is called 
> _before_ we call target’s function_arg.
>
> * OVERRIDE_ABI_FORMAT seems to be called too early (we don’t have enough 
> information on the function - to decide to set the PARM passed-as type).
>
> I’ve experimented with various schemes - specifically that  tm.function_arg 
> can alter the mode of the register in the appropriate cases, and then calls.c 
> can act on the case that the mode has been changed by that callback.
>
> It seems probable that this approach can be made non-invasive - but...
> ... if someone can point me at a better solution - I’m interested.

I agree there doesn't seem to be an out-of-the-box way of doing this.
I'm not sure about having two different ways of specifying promotion
though.  (For one thing, it should be possible to query promotion
without generating “garbage” rtl.)

An alternative might be to make promote_function_arg a “proper”
ABI hook, taking a cumulative_args_t and a function_arg_info.
Perhaps the return case should become a separate hook at the
same time.

That would probably require more extensive changes than just
updating the call sites, and I haven't really checked how much
work it would be, but hopefully it wouldn't be too bad.

The new hook would still be called before function_arg, but that
should no longer be a problem, since the new hook arguments would
give the target the information it needs to decide whether the
argument is passed in registers.

Thanks,
Richard


Re: Mass rename of C++ .c files to .cc suffix?

2022-01-07 Thread Richard Sandiford via Gcc
Martin Jambor  writes:
> Hi,
>
> Would anyone be terribly against mass renaming all *.c files (that are
> actually C++ files) within the gcc subdirectory to ones with .cc suffix?
>
> We already have 47 files with suffix .cc directly in the gcc
> subdirectory and 160 if we also count those in (non-testsuite)
> subdirectories, while the majority of our non-header C++ files still has
> the .c suffix.
>
> I have already missed stuff when grepping because I did not include *.cc
> files and the inconsistency is also just ugly and must be very confusing
> to anyone who encounters it for the first time.
>
> Since we have switched to git, this should have quite small effect on
> anyone who does their development on branches.  With Martin Liška we did
> a few experiments and git blame, git rebase and even git gcc-backport
> worked seamlessly across a rename.
>
> I would be fine waiting with it until GCC 12 gets released but see
> little value in doing so.
>
> What do others think?  (Any important caveats I might have missed?)

+1 in favour FWIW.  And I agree we might as well do it now.  It seems
likely to be less disruptive than waiting until GCC 12, since at that point
there's going to be more bug fixes that need to be applied to both trunk
and the new branch, as well as the unleashed stage 1 patches.

Thanks,
Richard


Re: Some libgcc headers are missing the runtime exception

2021-07-09 Thread Richard Sandiford via Gcc
David Edelsohn  writes:
> On Fri, Jul 9, 2021 at 12:53 PM Richard Sandiford via Gcc wrote:
>>
>> Hi,
>>
>> It was pointed out to me off-list that config/aarch64/value-unwind.h
>> is missing the runtime exception.  It looks like a few other files
>> are too; a fuller list is:
>>
>> libgcc/config/aarch64/value-unwind.h
>> libgcc/config/frv/frv-abi.h
>> libgcc/config/i386/value-unwind.h
>> libgcc/config/pa/pa64-hpux-lib.h
>>
>> Certainly for the aarch64 file this was simply a mistake;
>> it seems to have been copied from the i386 version, both of which
>> reference the runtime exception but don't actually include it.
>>
>> What's the procedure for fixing this?  Can we treat it as a textual
>> error or do the files need to be formally relicensed?
>
> I'm unsure what you mean by "formally relicensed".

It seemed like there were two possibilities: the licence of the files
is actually GPL + exception despite what the text says (the textual
error case), or the licence of the files is plain GPL because the text
has said so since the introduction of the files.  In the latter case
I'd have imagined that someone would need to relicense the code so
that it is GPL + exception.

> It generally is considered a textual omission.  The runtime library
> components of GCC are intended to be licensed under the runtime
> exception, which was granted and approved at the time of introduction.

OK, thanks.  So would a patch to fix at least the i386 and aarch64 header
files be acceptable?  (I'm happy to fix the other two as well if that's
definitely the right thing to do.  It's just that there's more history
involved there…)

Richard


Some libgcc headers are missing the runtime exception

2021-07-09 Thread Richard Sandiford via Gcc
Hi,

It was pointed out to me off-list that config/aarch64/value-unwind.h
is missing the runtime exception.  It looks like a few other files
are too; a fuller list is:

libgcc/config/aarch64/value-unwind.h
libgcc/config/frv/frv-abi.h
libgcc/config/i386/value-unwind.h
libgcc/config/pa/pa64-hpux-lib.h

Certainly for the aarch64 file this was simply a mistake;
it seems to have been copied from the i386 version, both of which
reference the runtime exception but don't actually include it.

What's the procedure for fixing this?  Can we treat it as a textual
error or do the files need to be formally relicensed?

Thanks,
Richard


Re: [PATCH] Port GCC documentation to Sphinx

2021-07-05 Thread Richard Sandiford via Gcc
Eli Zaretskii  writes:
>> Hans-Peter Nilsson  writes:
>> > I've read the discussion downthread, but I seem to miss (a recap
>> > of) the benefits of moving to Sphinx.  Maybe others have too and
>> > it'd be a good idea to repeat them?  Otherwise, the impression
>> > is not so good, as all I see is bits here and there getting lost
>> > in translation.
>> 
>> Better cross-referencing is one big feature.
>
> See below: the Info format has some features in addition to
> cross-references that can make this a much smaller issue.  HTML has
> just the cross-references, so "when you are a hammer, every problem
> looks like a nail".
>
>> IMO this subthread has demonstrated why the limitations of info
>> formatting have held back the amount of cross-referencing in the
>> online html.
>
> I disagree with this conclusion, see below.
>
>> (And based on empirical evidence, I get the impression that far more
>> people use the online html docs than the info docs.)
>
> HTML browsers currently lack some features that make Info the format
> of choice for me when I need to use the documentation efficiently.
> The most important feature I miss in HTML browsers is the index
> search.  A good manual usually has extensive index (or indices) which
> make it very easy to find a specific topic one is looking for,
> i.e. use the manual as a reference (as opposed as a first-time
> reading, when you read large portions of the manual in sequence).
>
> Another important feature is regexp search across multiple sections
> (with HTML you'd be forced to download the manual as a single large
> file for that, and then you'll probably miss regexps).
>
> Yet another feature which, when needed, is something to kill for, is
> the "info apropos" command, which can search all the manuals on your
> system and build a menu from the matching sections found in different
> manuals.  And there are a few more.
>
> (Texinfo folks are working on JavaScript code to add some missing
> capabilities to Web browsers, but that effort is not yet complete.)

Whether info or HTML is the better format isn't the issue though.  The
point is that we do have HTML output that is (empirically) widely used.
And at the moment it isn't as good as it could be.

The question that I was replying to was: what is the benefit of moving
to Sphinx?  And one of the answers is that it improves the HTML output.

>> E.g. quoting from Richard's recent patch:
>> 
>>   @item -fmove-loop-stores
>>   @opindex fmove-loop-stores
>>   Enables the loop store motion pass in the GIMPLE loop optimizer.  This
>>   moves invariant stores to after the end of the loop in exchange for
>>   carrying the stored value in a register across the iteration.
>>   Note for this option to have an effect @option{-ftree-loop-im} has to 
>>   be enabled as well.  Enabled at level @option{-O1} and higher, except 
>>   for @option{-Og}.
>> 
>> In the online docs, this will just be plain text.  Anyone who doesn't
>> know what -ftree-loop-im is will have to search for it manually.
>
> First, even if there are no cross-references, manual search is not the
> best way.  It is much easier to use index-search:
>
>   i ftree TAB
>
> will display a list of options that you could be after, and you can
> simply choose from the list, or type a bit more until you have a
> single match.

Here too I was talking about this being plain text in the online docs,
i.e. in the HTML.

In HTML the user-friendly way of letting users answer the question
“what on earth is -ftree-loop-im” is to have “-ftree-loop-im” be a
hyperlink that goes straight to the documentation of the option.
Same for PDF when viewed digitally.

One of the things that the move to Sphinx does is give us those
hyperlinks.

> Moreover, adding cross-references is easy:
>
>   @item -fmove-loop-stores
>   @opindex fmove-loop-stores
>   Enables the loop store motion pass in the GIMPLE loop optimizer.  This
>   moves invariant stores to after the end of the loop in exchange for
>   carrying the stored value in a register across the iteration.
>   Note for this option to have an effect @option{-ftree-loop-im}
>   (@pxref{Optimize Options, -ftree-loop-im}) 
>   ^^
>   has to be enabled as well.  Enabled at level @option{-O1} and higher,
>   except for @option{-Og}.
>
> If this looks like too much work, a simple Texinfo macro (two, if you
> want an anchor where you point) will do.

But this would be redundant in HTML: “foo (see foo)”.

Also, the benefit of hyperlinks in HTML (not info) is that they can be
used outside of prose, such as in lists, without interrupting the flow.

>> Adding the extra references to the html (and pdf) output but dropping
>> them from the info sounds like a good compromise.
>
> But that's not what happens.

Not in the original patch, sure, but it's what I think Martin was
suggesting as a compromise (maybe I misunderstood).  The comment above
was supposed to be in support of doing that.

It sounds like those who use the i

Re: [PATCH] Port GCC documentation to Sphinx

2021-07-05 Thread Richard Sandiford via Gcc
Hans-Peter Nilsson  writes:
> I've read the discussion downthread, but I seem to miss (a recap
> of) the benefits of moving to Sphinx.  Maybe others have too and
> it'd be a good idea to repeat them?  Otherwise, the impression
> is not so good, as all I see is bits here and there getting lost
> in translation.

Better cross-referencing is one big feature.  IMO this subthread has
demonstrated why the limitations of info formatting have held back
the amount of cross-referencing in the online html.  (And based on
empirical evidence, I get the impression that far more people use
the online html docs than the info docs.)

E.g. quoting from Richard's recent patch:

  @item -fmove-loop-stores
  @opindex fmove-loop-stores
  Enables the loop store motion pass in the GIMPLE loop optimizer.  This
  moves invariant stores to after the end of the loop in exchange for
  carrying the stored value in a register across the iteration.
  Note for this option to have an effect @option{-ftree-loop-im} has to 
  be enabled as well.  Enabled at level @option{-O1} and higher, except 
  for @option{-Og}.

In the online docs, this will just be plain text.  Anyone who doesn't
know what -ftree-loop-im is will have to search for it manually.

Adding the extra references to the html (and pdf) output but dropping
them from the info sounds like a good compromise.

Thanks,
Richard


Re: RFC: New mechanism for hard reg operands to inline asm

2021-06-07 Thread Richard Sandiford via Gcc
Paul Koning via Gcc  writes:
>> On Jun 4, 2021, at 2:02 PM, Andreas Krebbel via Gcc  wrote:
>> 
>> Hi,
>> 
>> I wonder if we could replace the register asm construct for
>> inline assemblies with something a bit nicer and more obvious.
>> E.g. turning this (real world example from IBM Z kernel code):
>> 
>> int diag8_response(int cmdlen, char *response, int *rlen)
>> {
>>register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
>>register unsigned long reg3 asm ("3") = (addr_t) response;
>>register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
>>register unsigned long reg5 asm ("5") = *rlen; /* <-- */
>>asm volatile(
>>"   diag%2,%0,0x8\n"
>>"   brc 8,1f\n"
>>"   agr %1,%4\n"
>>"1:\n"
>>: "+d" (reg4), "+d" (reg5)
>>: "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
>>*rlen = reg5;
>>return reg4;
>> }
>> 
>> into this:
>> 
>> int diag8_response(int cmdlen, char *response, int *rlen)
>> {
>>unsigned long len = cmdlen | 0x4000L;
>> 
>>asm volatile(
>>"   diag%2,%0,0x8\n"
>>"   brc 8,1f\n"
>>"   agr %1,%4\n"
>>"1:\n"
>>: "+{r4}" (len), "+{r5}" (*rlen)
>    : "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" (*rlen): "cc");
>>return len;
>> }
>> 
>> Apart from being much easier to read because the hard regs become part
>> of the inline assembly it solves also a couple of other issues:
>> 
>> - function calls might clobber register asm variables see BZ100908
>> - the constraints for the register asm operands are superfluous
>> - one register asm variable cannot be used for 2 different inline
>>  assemblies if the value is expected in different hard regs
>> 
>> I've started with a hackish implementation for IBM Z using the
>> TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
>> skip over the {} parts.  But perhaps it would be useful to make this a
>> generic mechanism for all targets?!
>> 
>> Andreas
>
> Yes, I would think this should be made a general mechanism that any target
> could use.

+1 FWIW.  I think this would also avoid the common confusion around
the semantics of register asms.

We would need to check whether { is used as a constraint string by
any target (hopefully not).

Richard


Re: [RFC] Implementing detection of saturation and rounding arithmetic

2021-05-12 Thread Richard Sandiford via Gcc
Tamar Christina  writes:
> Hi All,
>
> We are looking to implement saturation support in the compiler.  The aim is to
> recognize both Scalar and Vector variant of typical saturating expressions.
>
> As an example:
>
> 1. Saturating addition:
>char sat (char a, char b)
>{
>   int tmp = a + b;
>   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>}
>
> 2. Saturating abs:
>char sat (char a)
>{
>   int tmp = abs (a);
>   return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>}
>
> 3. Rounding shifts
>char rndshift (char dc)
>{
>   int round_const = 1 << (shift - 1);
>   return (dc + round_const) >> shift;
>}
>
> etc.
>
> Of course the first issue is that C does not really have a single idiom for
> expressing this.
>
> At the RTL level we have ss_truncate and us_truncate and float_truncate for
> truncation.
>
> At the Tree level we have nothing for truncation (I believe) for scalars. For
> Vector code there already seems to be VEC_PACK_SAT_EXPR but it looks like
> nothing actually generates this at the moment. it's just an unused tree code.
>
> For rounding there doesn't seem to be any existing infrastructure.
>
> The proposal to handle these is as follows; keep in mind that all of these
> also exist in their scalar form, so detecting them in the vectorizer would
> be the wrong place.
>
> 1. Rounding:
>a) Use match.pd to rewrite various rounding idioms to shifts.
>b) Use backwards or forward prop to rewrite these to internal functions
>   where even if the target does not support these rounding instructions,
>   it still has a chance to provide a more efficient implementation than
>   what would be generated normally.
>
> 2. Saturation:
>a) Use match.pd to rewrite the various saturation expressions into min/max
>   operations which opens up the expressions to further optimizations.
>b) Use backwards or forward prop to convert to internal functions if the
>   resulting min/max expression still meet the criteria for being a
>   saturating expression.  This follows the algorithm as outlined in "The
>   Software Vectorization handbook" by Aart J.C. Bik.
>
>   We could get the right instructions by using combine if we don't rewrite
>   the instructions to an internal function; however, then during
>   vectorization we would overestimate the cost of performing the
>   saturation.  The constants would also be loaded into registers, which
>   becomes a lot more difficult to clean up solely in the backend.
>
> The one thing I am wondering about is whether we would need an internal
> function for all supported operations, or whether it should be modelled as
> an internal FN which just "marks" the operation as rounding/saturating.
> After all, the only difference between a normal and a saturating expression
> in RTL is the xx_truncate RTL surrounding the expression.  Doing so would
> also mean that all targets which have saturating instructions would
> automatically benefit from this.

I might have misunderstood what you meant here, but the *_truncate
RTL codes are true truncations: the operand has to be wider than the
result.  Using this representation for general arithmetic is a problem
if you're operating at the maximum size that the target supports natively.
E.g. representing a 64-bit saturating addition as:

  - extend to 128 bits
  - do a 128-bit addition
  - truncate to 64 bits

is going to be hard to cost and code-generate on targets that don't support
native 128-bit operations (or at least, don't support them cheaply).
This might not be a problem when recognising C idioms, since the C source
code has to be able do the wider operation before truncating the result,
but it could be a problem if we provide built-in functions or if we want
to introduce compiler-generated saturating operations.

RTL already has per-operation saturation such as ss_plus/us_plus,
ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
ss_ashift/us_ashift and ss_abs.  I think we should do the same
in gimple, using internal functions like you say.

Thanks,
Richard
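
[As an editorial aside, the equivalence between the branchy idioms quoted
above and the min/max rewrite in step 2a is easy to check with a small
standalone program.  This is only an illustration; the function names and
the fixed shift amount are invented for the example.]

```cpp
#include <algorithm>
#include <cassert>

// Branchy saturating-add idiom, as in the quoted example 1.
signed char sat_add_branchy (signed char a, signed char b)
{
  int tmp = a + b;
  return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
}

// The min/max rewrite from step 2a of the proposal.
signed char sat_add_minmax (signed char a, signed char b)
{
  int tmp = a + b;
  return (signed char) std::max (-128, std::min (127, tmp));
}

// Rounding-shift idiom, as in the quoted example 3 (shift fixed at 4
// here so the function is self-contained).
int rndshift (int dc)
{
  const int shift = 4;
  int round_const = 1 << (shift - 1);
  return (dc + round_const) >> shift;
}

// Exhaustive check that the branchy and min/max forms agree on all
// signed char inputs.
bool sat_forms_agree ()
{
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b)
      if (sat_add_branchy (a, b) != sat_add_minmax (a, b))
        return false;
  return true;
}
```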



Re: Could vector type of poly_int and compile-time constants be enabled at the same time ?

2021-04-29 Thread Richard Sandiford via Gcc
JojoR via Gcc  writes:
> Hi,
>
>   I know a little about 'Sizes and offsets as runtime invariants',
>
>   and I need to add vector types like V2SImode as compile-time constants
>   while vector types whose sizes are runtime invariants are also enabled.
>
>   Could I enable both kinds of vector type at the same time?
>   I guess it’s not allowed :(
>
>   Could anyone give me some hints ?

Vector modes like V2SI always represent fixed-length vectors.
Variable-length vector modes have to be defined explicitly in the
.def file.

A target can have both variable-length and fixed-length vectors.

Probably the best place to look for examples is the aarch64 backend.
There we have:

- V2SI (64-bit Advanced SIMD)
- V4SI (128-bit Advanced SIMD)
- VNx2SI (variable-length SVE vector with 2 32-bit elements per
  128-bit granule)
- VNx4SI (variable-length SVE vector with 4 32-bit elements per
  128-bit granule)
etc.

In target-independent code, things like GET_MODE_NUNITS are always
poly_ints.  In other words, the number of elements might be constant
(is_constant()) or might vary at runtime (!is_constant()).

On targets like x86 that don't define variable-length vectors,
things like GET_MODE_NUNITS are always compile-time constants
*within config/i386*.  This removes the need to use is_constant()
everywhere.

On targets like aarch64 that do define variable-length vectors,
things like GET_MODE_NUNITS are poly_ints even in config/aarch64.
However, GET_MODE_NUNITS is still constant for V2SI and V4SI.
We just have to use to_constant() to get the constant 2 or 4.

Thanks,
Richard
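
[A toy model may make the distinction concrete.  The method names below
mirror the shape of GCC's poly_int API (is_constant/to_constant), but this
is an illustration written for this reply, not GCC's implementation.]

```cpp
#include <cassert>

// Toy degree-1 polynomial integer: value = coeff0 + coeff1 * N, where N
// is a runtime invariant (e.g. how many extra 128-bit granules an SVE
// vector holds beyond the minimum).
struct toy_poly_int
{
  long coeff0, coeff1;

  // The value is a compile-time constant iff it does not depend on N.
  bool is_constant () const { return coeff1 == 0; }

  long to_constant () const
  {
    // Only valid for values that do not depend on N.
    assert (is_constant ());
    return coeff0;
  }
};

// Fixed-length Advanced SIMD mode: V2SI always has 2 elements.
const toy_poly_int v2si_nunits = { 2, 0 };

// Variable-length SVE mode: VNx4SI has 4 elements per 128-bit granule,
// i.e. 4 + 4*N elements in total.
const toy_poly_int vnx4si_nunits = { 4, 4 };
```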


Re: GCC association with the FSF

2021-04-11 Thread Richard Sandiford via Gcc
[ Like many others who have posted to this thread, I've contributed
  to GCC at various times as a hobby and at other times as a paid
  employee.  Here I'm speaking as a personal contributor, not on
  behalf of my current employer. ]

I think it's misleading to talk about GCC “leaving” or “disassociating
itself” from the FSF or from the GNU project.  No-one can force the FSF or
the GNU project to drop GCC (and I don't think anyone's trying to make it
do that).  I think what we're really talking about is whether there should
be a fork.  In some ways that would be like egcs, although perhaps this time
it would be a clean break, without intending the fork to “rejoin” GNU later.

If the fork takes the current gcc.gnu.org infrastructure with it,
the GNU project would have to maintain its version of GCC elsewhere.
But that would be a minor barrier at most.  The likelihood is that there
would be two versions of GCC, which for want of better terms I'll call
“FSF GCC” and “new GCC”.  If FSF GCC does continue as a project in any
meaningful sense, new GCC would be able to cherry-pick useful contributions
from FSF GCC.  Cherry-picking in the opposite direction would also be
technically and legally possible, but would presumably be rejected on
principle by whoever the new FSF GCC maintainers turn out to be (not least
because “new GCC” commits would not be FSF copyright).

This should also satisfy those who believe that only an FSF-copyright
GCC is trustworthy.  People who only trust the FSF can contribute to
and use “FSF GCC” and ignore “new GCC”.

So I think the question becomes: how many GCC developers and organisations
are willing to agree to follow the fork rather than stick with FSF GCC?
Does anyone have any suggestions for a good procedure for testing the
level of support?  I don't think this mailing list is it.

(It's ironic that the process of answering that question shows how
misplaced a lot of the comments about the SC were.  GCC is fundamentally
a developer/contributor-led project, so even an important decision like
this will be made by developers/contributors rather than the SC.)

FWIW, again speaking personally, I would be in favour of joining a fork.[*]

Mark Wielaard  writes:
> On Wed, Apr 07, 2021 at 10:04:21AM -0400, David Malcolm wrote:
> > Another, transitional approach might be to find another Free Software
> > non-profit and for contributors to start assigning copyright on ongoing
> > work to that other non-profit.  That way there would be only two
> > copyright holders on the code; if the FSF somehow survives its current
> > death-spiral then the other nonprofit could assign copyright back to
> > the FSF;  if it doesn't, well, we've already got bigger problems.
>
> Yes, having all new copyrights pooled together so we have just two
> copyright holders would provide most of the same benefits. And makes
> it easier to deal with the legacy FSF copyrights since there would be
> just one legal entity having to deal with them instead of each
> individual copyright holder on their own.

It sounds like it could be the worst of both worlds in some ways though.
There would no longer be a single entity that could relicense the code,
if that became necessary for any reason, yet we would still require all
contributors to go through the off-putting process of assigning copyright.

I think it would be better to have voluntary aggregation of copyright
(for those in a position to offer it) while also allowing contributors
to retain copyright if they prefer.  If enough regular contributors
agree to pool copyright then that should be enough.

> If it has to come to this then we could take a look at what the
> Conservancy already does for aggregating copyright for their member
> projects, the Linux kernel and Debian project:
> https://sfconservancy.org/copyleft-compliance/
>
> I like their idea of having a counsel of developers that gets involved
> in any action taken on behave of the collective:
> https://sfconservancy.org/docs/blank_linux-enforcement-agreement.pdf

I'm not familiar with this system, but yeah, I agree that it looks on the
face of it like a good approach, provided that it's strictly voluntary.

Thanks,
Richard


[*] Not that anyone should care or is likely to care, but for the record,
my reasons are:

The FSF and the GNU project have had a key historical role in developing
and promoting free software as a concept.  But history is one thing and
the future is another.  Due to the success of the early advocacy work,
free software now exists as a principle independently of the FSF and
the GNU project.  And the FSF has provided copyleft licenses that have
stood the test of time.  So like others have said, the question for
the future is: do we as contributors gain anything by having any new
work we do be owned by the FSF and associated with the GNU project?

I think the recent developments, as well as the messages in this
email thread that supposedly give reasons 

Re: Current limitations of define_subst

2021-03-26 Thread Richard Sandiford via Gcc
Colin McEwan via Gcc  writes:
> Hi all,
>
> I was wondering if anyone understands the rationale behind the current
> limitations on (define_subst), ie. working only on (define_insn) and
> (define_expand).
>
> A lot of md cleanup, as well as extra patterns for combiner use, could be
> enabled by supporting (define_split) / (define_insn_and_split) /
> (define_peephole2) so I've had a very brief look at adding support for
> these constructs, and from what I see there are only a few barriers
> standing in the way.
>
> 1. attributes are used to store iterator state. This is fine for insns and
> expands, but splits and peephole2s do not have attributes. Some alternate
> storage could be used, or attributes could be added for splits and peephole2s.
>
> 2. splits and peephole2s will have two RTL patterns which the transform
> might be intended to apply to. Adding an attribute to (define_subst) to
> flag whether it applies to 'old', 'new' or both RTL patterns in the
> definition seems a reasonable change to support this.
>
> Does anyone have any insight into anything else that would stop this from
> working?

Thanks for looking at this.  Extending define_subst to support other
constructs sounds useful.

Normally the define_subst is triggered by the use of special string
attributes, but define_splits don't have the names or (like you say)
attributes in which these special attributes would typically be used.
Is the idea to apply the define_subst to all define_splits whose
patterns match?  Or would there be some other trigger for using
the define_subst?

Apart from that, I don't think there's any fundamental reason why
define_subst hasn't been made to work with other constructs.
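For anyone following along, the existing trigger mechanism looks roughly like this (a sketch only; the names here are illustrative, not taken from a real backend):

```
;; Sketch: a subst that appends a CC-setting SET to a move pattern.
;; All names are illustrative.
(define_subst "add_setcc_subst"
  [(set (match_operand:SI 0 "" "")
        (match_operand:SI 1 "" ""))]
  ""
  [(set (match_dup 0) (match_dup 1))
   (set (reg:CC CC_REGNUM)
        (compare:CC (match_dup 1) (const_int 0)))])

;; The subst attribute expands to "" in the original insn and to
;; "_setcc" in the transformed copy; mentioning "<setcc>" in a
;; define_insn is what triggers the substitution.
(define_subst_attr "setcc" "add_setcc_subst" "" "_setcc")
```

Since define_split and define_peephole2 have neither names nor attributes in which something like `<setcc>` could appear, some other trigger (for example matching on the pattern itself, as suggested above) would be needed.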

Thanks,
Richard


Re: C++11 code in the gcc 10 branch

2020-12-31 Thread Richard Sandiford via Gcc
FX  writes:
>> When are you going to apply your fix that Richard S. approved on the
>> 21st?
>
> When I remember how to set up gcc’s git with write access, and remember how 
> the new ChangeLog entries work. The times where I was a regular contributor 
> were the CVS and SVN times.
>
> I also wanted to ask approval to commit this diff below, fixing 
> aarch64_get_extension_string_for_isa_flags()’s prototype to align it with the 
> actual function definition:
>
> diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
> index 8840a2d9486c..d99834c99896 100644
> --- a/gcc/config/aarch64/driver-aarch64.c
> +++ b/gcc/config/aarch64/driver-aarch64.c
> @@ -27,8 +27,7 @@
>  #include "tm.h"
>  
>  /* Defined in common/config/aarch64/aarch64-common.c.  */
> -std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
> - unsigned long);
> +std::string aarch64_get_extension_string_for_isa_flags (uint64_t, uint64_t);
>  
>  struct aarch64_arch_extension
>  {

Yeah, that's OK everywhere it's needed, thanks.  (The other patch
was only needed on release branches.)

Richard


What is the type of vector signed + vector unsigned?

2020-12-29 Thread Richard Sandiford via Gcc
Any thoughts on what f should return in the following testcase, given the
usual GNU behaviour of treating signed >> as arithmetic shift right?

typedef int vs4 __attribute__((vector_size(16)));
typedef unsigned int vu4 __attribute__((vector_size(16)));
int
f (void)
{
  vs4 x = { -1, -1, -1, -1 };
  vu4 y = { 0, 0, 0, 0 };
  return ((x + y) >> 1)[0];
}

The C frontend takes the type of x+y from the first operand, so x+y
is signed and f returns -1.

The C++ frontend applies similar rules to x+y as it would to scalars,
with unsigned T having a higher rank than signed T, so x+y is unsigned
and f returns 0x7fff.

FWIW, Clang treats x+y as signed, so f returns -1 for both C and C++.

[Was looking at this in the context of PR96377.]

Thanks,
Richard


Re: C++11 code in the gcc 10 branch

2020-12-21 Thread Richard Sandiford via Gcc
FX via Gcc  writes:
> I’m trying to bootstrap a GCC 10 compiler on macOS with clang, and I am 
> getting errors in stage 1, because there is C++11 code that is rejected by 
> clang (because the bootstrap involves compiling stage 1 with -std=gnu++98, 
> unlike on master (see top-level configure.ac). These errors are not seen, I 
> believe, when GCC is the bootstrap compiler, because GCC will issue a warning 
> instead of an error (as clang does).
>
> One place with such an issue is in aarch64-builtins.c, which contains a C++11 
> constructor. I can fix it with this:
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
> index cba596765..184e9095d 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -1225,8 +1225,9 @@ aarch64_init_memtag_builtins (void)
>  = aarch64_general_add_builtin ("__builtin_aarch64_memtag_"#N, \
>T, AARCH64_MEMTAG_BUILTIN_##F); \
>aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
> - AARCH64_MEMTAG_BUILTIN_START - 1] = \
> -   {T, CODE_FOR_##I};
> + AARCH64_MEMTAG_BUILTIN_START - 1].ftype = T; \
> +  aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
> + AARCH64_MEMTAG_BUILTIN_START - 1].icode = CODE_FOR_##I;
>  
>fntype = build_function_type_list (ptr_type_node, ptr_type_node,
>  uint64_type_node, NULL);
>
> […stuff that Iain has already answered…]
>
> I would welcome:
>
> 1. confirmation that the C++11 code in aarch64-builtins.c is indeed a bug, 
> and that a patch for it would be welcome

Yeah, it's definitely a bug, thanks for catching it.  The patch above
is OK.

Thanks,
Richard


Re: wide int knowledge/bug?

2020-10-27 Thread Richard Sandiford via Gcc
Jakub Jelinek via Gcc  writes:
> On Tue, Oct 27, 2020 at 01:18:03PM -0400, Andrew MacLeod via Gcc wrote:
>> I was looking at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97596
>> 
>> and the ranger is constructing some 128 bit constants and calling
>> wide_int_to_tree to turn them into trees.
>> 
>> In particular, it starts with the value
>> 
>> p r.lower_bound(0)
>> { = {val = {-65535, 9223372036854775807, 140737488257608},
>> len = 2, precision = 128}, static is_sign_extended = true}
>> 
>> p r.lower_bound(0).dump()
>> [0x7fff,0x0001], precision = 128
>> 
>> 
>>  and proceeds to call
>> 
>> wide_int new_lb = wi::set_bit (r.lower_bound (0), 127)
>> 
>> and creates the value:
>> 
>> p new_lb
>> { = {val = {-65535, -1, 0}, len = 2, precision = 128},
>> static is_sign_extended = true}
>
> This is non-canonical and so invalid, if the low HWI has the MSB set
> and the high HWI is -1, it should have been just
> val = {-65535}, len = 1, precision = 128}
>
> I guess the bug is that wi::set_bit_large doesn't call canonize.

Yeah, looks like a micro-optimisation gone wrong.

> So perhaps totally untested:
>
> --- gcc/wide-int.cc.jj	2020-10-19 18:42:41.134426398 +0200
> +++ gcc/wide-int.cc   2020-10-27 18:33:38.546703763 +0100
> @@ -702,8 +702,11 @@ wi::set_bit_large (HOST_WIDE_INT *val, c
>/* If the bit we just set is at the msb of the block, make sure
>that any higher bits are zeros.  */
>if (bit + 1 < precision && subbit == HOST_BITS_PER_WIDE_INT - 1)
> - val[len++] = 0;
> -  return len;
> + {
> +   val[len++] = 0;
> +   return len;
> + }
> +  return canonize (val, len, precision);
>  }
>else
>  {

LGTM, thanks.

Richard


Re: modified_between_p does not check for volatile memory

2020-10-13 Thread Richard Sandiford via Gcc
Tucker Kern via Gcc  writes:
> TL;DR
>
> In GCC 9.3, I believe modified_between_p should return 1 if the memory
> reference is volatile. My layman's understanding of volatile memory is that
> it could change at any time, and thus could be modified between any two
> instructions.

That's true, but in RTL, volatile accesses provide no timing guarantee,
even in an abstract sense.  E.g. suppose we traced the execution of a
function call based on the instructions that exist immediately after
the expand pass.  If a volatile access occurs in the Nth instruction
of this trace, there's no guarantee that the access will still occur
in the Nth instruction of traces for later passes.  Unrelated
instructions can be added and removed at any time.

For example, a volatile access should not stop us from swapping:

  (set (reg:SI R0) (const_int 0))

and:

  (set (mem/v:SI (reg:SI R1)) (const_int 0))

So…

> Possible patch:
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 01af063a222..a395df0627b 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1308,6 +1308,8 @@ modified_between_p (const_rtx x, const rtx_insn
> *start, const rtx_insn *end)
>return 1;
>
>  case MEM:
> +  if (MEM_VOLATILE_P(x))
> +return 1;
>if (modified_between_p (XEXP (x, 0), start, end))
>   return 1;
>if (MEM_READONLY_P (x))

…I think the current code handles volatile correctly.  The issue
is whether any instruction in (START, END) might modify the same
memory as X.  Most of the real work is done by the alias machinery,
which understands (or is supposed to understand) the special rules
around volatile memory and how it interacts with other memory accesses:

  for (insn = NEXT_INSN (start); insn != end; insn = NEXT_INSN (insn))
if (memory_modified_in_insn_p (x, insn))
  return 1;

> Details
> I'm writing a splitter to optimize bit field conditionals (e.g. testing a
> bit of memory mapped I/O). The non-optimized pattern consists of a SET,
> AND,  LSHIFTRT + CMP, JMP pattern which I know can be simplified to an
> AND + JMP as the AND operation will set the necessary CC bits. To achieve
> this optimization in the combine pass a custom general_operand has been
> defined which allows volatile memory.
>
> However in certain situations I've found the combine pass is generating
> invalid patterns when it merges a SET and IF_THEN_ELSE (CMP + JMP)
> patterns. One such situation is when GCC decides to reuse the result of the
> comparison for the return value.
>
> This issue seems intertwined with the combine + volatile_ok issue that has
> been discussed a number of times in the past. The backend in question is
> highly embedded and significant performance gains are made by enabling
> volatile references during combine.
>
> Optimized tree of the test program. In this program the bit field value is
> saved in _1 to be reused in the return value. Why it makes this choice is
> beyond me.
>_1;
>   _Bool _3;
>
>[local count: 1044213930]:
>   _1 ={v} MEM[(volatile struct register_t *)5B].done;
>   if (_1 != 0)
> goto ; [11.00%]
>   else
> goto ; [89.00%]
>
>[local count: 1073741824]:
>   // Additional code trimmed for brevity...
>
>[local count: 114863532]:
>   # _3 = PHI <0(4), _1(5)>
>   return _3;
>
> This produces the following pre-combine RTL.
> (insn 9 6 10 3 (parallel [
> (set (reg:QI 17 [  ])
> (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile
> struct register_t *)5B]+0 S1 A16])
> (const_int 3 [0x3])))
> (clobber (reg:CC 7 FLAG))
> ]) "fail.c":22:26 19 {*lshrqi3_im}
>  (expr_list:REG_DEAD (reg/f:QI 18)
> (expr_list:REG_UNUSED (reg:CC 7 FLAG)
> (nil
> (insn 10 9 12 3 (parallel [
> (set (reg:QI 17 [  ])
> (and:QI (reg:QI 17 [  ])
> (const_int 1 [0x1])))
> (clobber (reg:CC 7 FLAG))
> ]) "fail.c":22:26 10 {andqi3}
>  (expr_list:REG_UNUSED (reg:CC 7 FLAG)
> (nil)))
> (jump_insn 12 10 20 3 (parallel [
> (set (pc)
> (if_then_else (eq (reg:QI 17 [  ])
> (const_int 0 [0]))
> (label_ref:QI 11)
> (pc)))
> (clobber (scratch:QI))
> ]) "fail.c":22:9 41 {*cbranchqi4_im}
>  (int_list:REG_BR_PROB 955630228 (nil))
>  -> 11)
>
> Which when combined generates patterns that will produce invalid code. GCC
> has assumed the memory value will not change between insn 10 & 12.
> (insn 10 9 12 3 (parallel [
> (set (reg:QI 17 [  ])
> (and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1
> MEM[(volatile struct register_t *)5B]+0 S1 A16])
> (const_int 3 [0x3]))
> (const_int 1 [0x1])))
> (clobber (reg:CC 7 FLAG))
> ]) "fail.c":22:18 10 {andqi3}
>  (expr_list:REG_UNUSED (reg:CC 7 FLAG)
> (nil)))
> (jump_