Re: How to avoid some built-in expansions in gcc?

2024-06-05 Thread Michael Matz via Gcc
Hey,

On Wed, 5 Jun 2024, David Brown wrote:

> The ideal here would be to have some way to tell gcc that a given 
> function has the semantics of a different function.  For example, a 
> programmer might have several implementations of "memcpy" that are 
> optimised for different purposes based on the size or alignment of the 
> arguments.  Maybe some of these are written with inline assembly or work 
> in a completely different way (I've used DMA on a microcontroller for 
> the purpose).  If you could tell the compiler that the semantic 
> behaviour and results were the same as standard memcpy(), that could 
> lead to optimisations.
> 
> Then you could declare your "isinf" function with 
> __attribute__((semantics_of(__builtin_isinf))).
> 
> And the feature could be used in any situation where you can write a 
> function in a simple, easy-to-analyse version and a more efficient but 
> opaque version.

Hmm, that actually sounds like a useful feature.  There are some details 
to care for, like what to do with arguments: e.g. do they need to have the 
same types as the referred builtin, only compatible ones, or even just 
convertible ones, and suchlike, but yeah, that sounds nice.


Ciao,
Michael.


Re: How to avoid some built-in expansions in gcc?

2024-06-05 Thread Michael Matz via Gcc
Hello,

On Tue, 4 Jun 2024, Jakub Jelinek wrote:

> On Tue, Jun 04, 2024 at 07:43:40PM +0200, Michael Matz via Gcc wrote:
> > (Well, and without reverse-recognition of isfinite-like idioms in the 
> > sources.  That's orthogonal as well.)
> 
> Why?  If isfinite is better done by a libcall, why isn't isfinite-like
> idiom also better done as a libcall?

It is.  I was just trying to avoid derailing the discussion away from 
finding an immediately good solution by searching for the perfect one.  
Idiom finding simply is completely independent of the posed problem that 
Georg-Johann has, which remains unsolved AFAICS, as using 
-fno-builtin-foobar has its own (perhaps merely theoretical for AVR) 
problems.


Ciao,
Michael.


Re: How to avoid some built-in expansions in gcc?

2024-06-04 Thread Michael Matz via Gcc
Hello,

On Sat, 1 Jun 2024, Richard Biener via Gcc wrote:

> >>> You have a pointer on how to define a target optab? I looked into optabs 
> >>> code but found no appropriate hook.  For isinf it seems it is 
> >>> enough to provide a failing expander, but other functions like isnan 
> >>> don't have an optab entry, so is there a hook mechanism to extend optabs?
> >> Just add corresponding optabs for the missing cases (some additions are 
> >> pending, like isnormal).  There’s no hook to prevent folding to FP 
> >> compares nor is that guarded by other means (like availability of native 
> >> FP ops).  Changing the guards would be another reasonable option.
> >> Richard
> > 
> > There are many other such folds, e.g. for isdigit().  The AVR libraries 
> > have all this in hand-optimized assembly, and all these built-in expansions 
> > are bypassing that.  Open-coded C will never beat that assembly code, at 
> > least not with the current GCC capabilities.
> 
> The idea is that isdigit() || isalpha() or similar optimize without 
> special casing the builtins.
> 
> > How would I go about disabling / bypassing non-const folds from ctype.h and 
> > the many others?
> 
> I think this mostly shows we lack late recognition of open-coded isdigit 
> and friends, at least for the targets where inlining them is not 
> profitable.
> 
> A pragmatic solution might be a new target hook, indicating a specified 
> builtin is not to be folded into an open-coded form.

Well, that's what the mechanism behind -fno-builtin-foobar is supposed to 
be IMHO.  Hopefully the newly added mechanism using optabs and ifns 
(instead of builtins) heeds it.

> A good solution would base this on (size) costs, the perfect solution 
> would re-discover the builtins late and undo inlining that didn’t turn 
> out to enable further simplification.
> 
> How is inlined isdigit bad on AVR?  Is a call really that cheap 
> considering possible register spilling around it?

On AVR, where everything needs to be done in 8-bit registers?  I'm pretty 
sure the call is cheaper, yeah :)


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-04 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Jonathon Anderson wrote:

> Of course, this doesn't make the build system any less complex, but 
> projects using newer build systems seem easier to secure and audit than 
> those using overly flexible build systems like Autotools and maybe even 
> CMake. IMHO using a late-model build system is a relatively low 
> technical hurdle to overcome for the benefits noted above, switching 
> should be considered and in a positive light.

Note that we're talking not (only) about the build system itself, i.e. how 
to declare dependencies within the sources, and how to declare how to 
build them.  make is just fine for that (as are many others).  (In a way 
I think we meanwhile wouldn't really need automake and autogen, but 
rewriting all that in pure GNUmake would be a major undertaking.)

But Martin also specifically asked about alternatives for feature tests, 
i.e. autoconf's purpose.  I simply don't see how any alternative to it 
could be majorly "easier" or "less complex" at its core.  Going with the 
examples given upthread there is usually only one major solution: to check 
if a given system supports FOOBAR you need to bite the bullet and compile 
(and potentially run!) a small program using FOOBAR.  A configuration 
system that can do that (and I don't see any real alternative to that), no 
matter in which language it's written and how traditional or modern it is, 
also gives you enough rope to hang yourself, if you so choose.

If you get away without many configuration tests in your project then this 
is because what (e.g.) the compiler gives you, in the form of libstdc++ 
for example, abstracts away many of the peculiarities of a system.  But 
in order to be able to do that something (namely the config system of 
libstdc++) needs to determine what is or isn't supported by the system in 
order to correctly implement these abstractions.  I.e. the things you 
depend on did the heavy lifting of hiding system divergence.

(Well, that, or you are very limited in the number of systems you support, 
which can be the right thing as well!)


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Martin Uecker wrote:

> The backdoor was hidden in a complicated autoconf script...

Which itself had multiple layers and could just as well have been a 
complicated cmake function.

> > (And, FWIW, testing for features isn't "complex".  And have you looked at 
> > other build systems?  I have, and none of them are less complex, just 
> > opaque in different ways from make+autotools).
> 
> I asked a very specific question: To what extent is testing 
> for features instead of semantic versions and/or supported
> standards still necessary?

I can't answer this with absolute certainty, but points to consider: the 
semantic versions need to be maintained just as well, in some magic way.  
Because ultimately software depends on features of its dependencies, not 
on arbitrary numbers given to them.  The numbers encode these features, in 
the best case, when there are no errors.  So, no, version numbers are not 
a replacement for feature tests, they are a proxy.  One that is manually 
maintained, and hence prone to errors.

Now, supported standards: which one? ;-)  Or more in earnest: while on 
this mailing list here we could choose a certain set: POSIX, some 
languages, Windows, MacOS (versions so-and-so).  What about other 
software relying on other third-party feature providers (libraries or 
system services)?  Ones without standards?

So, without absolute certainty, but with a little bit of it: yes, feature 
tests are required in general.  That doesn't mean that we could not 
do away with quite a few of them for (e.g.) GCC, those that hold true on 
any platform we support.  But we can't get rid of the infrastructure for 
that, and can't get rid of certain classes of tests.

> This seems like a problematic approach that may have been necessary 
> decades ago, but it seems it may be time to move on.

I don't see that.  Many aspects of systems remain non-standardized.


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Martin Uecker via Gcc wrote:

> > > Seems reasonable, but note that it wouldn't make any difference to
> > > this attack.  The liblzma library was modified to corrupt the sshd
> > > binary, when sshd was linked against liblzma.  The actual attack
> > > occurred via a connection to a corrupt sshd.  If sshd was running as
> > > root, as is normal, the attacker had root access to the machine.  None
> > > of the attacking steps had anything to do with having root access
> > > while building or installing the program.
> 
> There does not seem a single good solution against something like this.
> 
> My take a way is that software needs to become less complex. Do 
> we really still need complex build systems such as autoconf?

Do we really need complex languages like C++ to write our software in?  
SCNR :)  Complexity lies in the eye of the beholder, but to be honest in 
the software that we're dealing with here, the build system or autoconf 
does _not_ come to mind first when thinking about complexity.

(And, FWIW, testing for features isn't "complex".  And have you looked at 
other build systems?  I have, and none of them are less complex, just 
opaque in different ways from make+autotools).


Ciao,
Michael.


Re: Improvement of CLOBBER descriptions

2024-02-21 Thread Michael Matz via Gcc
Hello,

On Wed, 21 Feb 2024, Daniil Frolov wrote:

> >> Following the recent introduction of more detailed CLOBBER types in GCC, a
> >> minor
> >> inconsistency has been identified in the description of
> >> CLOBBER_OBJECT_BEGIN:
> >> 
> >>   /* Beginning of object lifetime, e.g. C++ constructor.  */
> >>   CLOBBER_OBJECT_BEGIN

The "e.g." comment mixes concepts of the C++ language with a 
middle-end/GIMPLE concept, and hence is a bit misleading.  What the 
clobbers are intended to convey to the middle-end is the low-level notion 
of "storage associated with this and that object is now accessible 
(BEGIN)/not accessible anymore (END), for purposes related to that very 
object".  The underlying motivation, _why_ that knowledge is interesting 
to the middle-end, is to be able to share storage between different 
objects.

"purposes related to that object" are any operations on the object: 
construction, destruction, access, frobnification.  It's not tied to a 
particular frontend language (although it's the language semantics that 
dictate when emitting the begin/end clobbers is appropriate).  For the 
middle-end the C++ concepts of construction/destruction are simply 
modifications (conceptual or real) of the storage associated with an 
object ("object" not in the C++ sense, but from a low-level perspective; 
i.e. an "object" doesn't only exist after C++ construction, it comes into 
existence before that, even if perhaps in an indeterminate/invalid state 
from the frontend's perspective).

Initially these clobbers were only emitted when decls went out of 
scope, and so did have some relation to frontend language semantics 
(although a fairly universal one, namely "scope").  The 
C++ frontend then found further uses (e.g. after a dtor for an 
object _ends_, its storage can also be reused), and eventually we also 
needed the BEGIN clobbers to convey the meaning of "storage use for this 
object potentially starts now, don't share it with any other object".

If certain frontends find use for more fine-grained definitions of 
life-times, then further clobber kinds need to be invented for those 
frontends' use.  They likely won't have influence on the middle-end though 
(perhaps for some sanitizers such new kinds might be useful?).  But the 
current BEGIN/END clobbers need to continue to mark the outermost borders 
of storage validity for an object.
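In a GIMPLE dump the pair looks roughly like this (pseudocode; the dump 
syntax and the exact spelling of the clobber kinds vary across GCC 
versions):

```
{
  struct S s;

  s = {CLOBBER(begin)};   // storage for s is now in use for s
  ...                     // construction, accesses, destruction of s
  s = {CLOBBER(end)};     // storage for s may now be reused/shared
}
```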


Ciao,
Michael.


Re: Calling convention for Intel APX extension

2023-07-31 Thread Michael Matz via Gcc
Hello,

On Sun, 30 Jul 2023, Thomas Koenig wrote:

> > I've recently submitted a patch that adds some attributes that basically
> > say "these-and-those regs aren't clobbered by this function" (I did them
> > for not clobbered xmm8-15).  Something similar could be used for the new
> > GPRs as well.  Then it would be a matter of ensuring that the interesting
> > functions are marked with that attributes (and then of course do the
> > necessary call-save/restore).
> 
> Interesting.
> 
> Taking this a bit further: The compiler knows which registers it used
> (and which ones might get clobbered by called functions) and could
> generate such information automatically and embed it in the assembly
> file, and the assembler could, in turn, put it into the object file.
> 
> A linker (or LTO) could then check this and elide save/restore pairs
> where they are not needed.

LTO with interprocedural register allocation (-fipa-ra) already does this.  

Doing it without LTO is possible to implement in the way you suggest, but 
is very hard to get effective: the problem is that saving/restoring of 
registers might be scheduled in non-trivial ways and getting rid of 
instruction bytes within function bodies at link time is fairly 
non-trivial: it needs excessive meta-information to be effective (e.g. all 
jumps that potentially cross the removed bytes must get relocations).

So you either limit the ways prologues and epilogues are emitted to 
help the linker (thereby limiting the effectiveness of the unchanged 
xlogues) or you emit more meta-info than there are instruction bytes 
themselves, bloating object files for dubious outcomes.

> It would probably be impossible for calls into shared libraries, since
> the saved registers might change from version to version.

The above scheme could be extended to also allow introducing stubs 
(wrappers) for shared lib functions, handled by the dynamic loader.  But 
then you would get hard problems to solve related to function addresses 
and their uniqueness.

> Still, potential gains could be substantial, and it could have an
> effect which could come close to inlining, while actually saving space
> instead of using extra.
> 
> Comments?

I think it would be an interesting experiment to implement such scheme 
fully just to see how effective it would be in practice.  But it's very 
non-trivial to do, and my guess is that it won't be super effective.  So, 
could be a typical research paper topic :-)

At least outside of extreme cases like the SSE regs, where none are 
callee-saved, and which can be handled in a different way like the 
explicit attributes.


Ciao,
Michael.


Re: Calling convention for Intel APX extension

2023-07-27 Thread Michael Matz via Gcc
Hey,

On Thu, 27 Jul 2023, Thomas Koenig via Gcc wrote:

> Intel recommends to have the new registers as caller-saved for
> compatibility with current calling conventions.  If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

That's not the full truth.  It's not (only) exception handling but also 
context switching via setjmp/longjmp and make/get/setcontext.

The data structures for that are part of the ABI unfortunately, and can't 
be assumed to be extensible (as Florian says, for glibc there may be 
hacks (or maybe not) on x86-64; some other archs implemented 
extensibility from the outset).  So all registers (and register parts!) 
added after the initial psABI is defined usually _have_ to be 
call-clobbered.

> Since Fortran tends to use a lot of registers for its array descriptors,
> and also tends to call nothrow functions (all Fortran functions, and
> all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
> making some of the new registers callee-saved, to save some spills
> at function calls.

I've recently submitted a patch that adds some attributes that basically 
say "these-and-those regs aren't clobbered by this function" (I did them 
for not clobbered xmm8-15).  Something similar could be used for the new 
GPRs as well.  Then it would be a matter of ensuring that the interesting 
functions are marked with that attributes (and then of course do the 
necessary call-save/restore).


Ciao,
Michael.


Re: [PATCH] Basic asm blocks should always be volatile

2023-06-29 Thread Michael Matz via Gcc
Hello,

On Thu, 29 Jun 2023, Julian Waters via Gcc wrote:

> int main() {
> asm ("nop" "\n"
>  "\t" "nop" "\n"
>  "\t" "nop");
> 
> asm volatile ("nop" "\n"
>   "\t" "nop" "\n"
>   "\t" "nop");
> }
> 
> objdump --disassemble-all -M intel -M intel-mnemonic a.exe > disassembly.txt
> 
> 00000001400028c0 <main>:
>    1400028c0: 48 83 ec 28          sub    rsp,0x28
>    1400028c4: e8 37 ec ff ff       call   140001500 <__main>
>    1400028c9: 90                   nop
>    1400028ca: 90                   nop
>    1400028cb: 90                   nop
>    1400028cc: 31 c0                xor    eax,eax
>    1400028cd: 48 83 c4 28          add    rsp,0x28
>    1400028ce: c3                   ret
> 
> Note how there are only 3 nops above when there should be 6, as the first 3
> have been deleted by the compiler. With the patch, the correct 6 nops
> instead of 3 are compiled into the final code.
> 
> Of course, the above was compiled with the most extreme optimizations
> available to stress test the possible bug, -O3, -ffunction-sections
> -fdata-sections -Wl,--gc-sections -flto=auto. Compiled as C++17 and intel
> assembly syntax

Works just fine here:

% cat xx.c
int main() {
asm ("nop" "\n"
 "\t" "nop" "\n"
 "\t" "nop");

asm volatile ("nop" "\n"
  "\t" "nop" "\n"
  "\t" "nop");
}

% g++ -v
...
gcc version 12.2.1 20230124 [revision 
193f7e62815b4089dfaed4c2bd34fd4f10209e27] (SUSE Linux)

% g++ -std=c++17 -flto=auto -O3 -ffunction-sections -fdata-sections xx.c
% objdump --disassemble-all -M intel -M intel-mnemonic a.out
...
0000000000401020 <main>:
  401020:   90                      nop
  401021:   90                      nop
  401022:   90                      nop
  401023:   90                      nop
  401024:   90                      nop
  401025:   90                      nop
  401026:   31 c0                   xor    eax,eax
  401028:   c3                      ret
  401029:   0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]
...

Testing with recent trunk works as well with no differences in output.
This is for x86_64-linux.

So, as suspected, something else is broken for you.  Which compiler 
version are you using?  (And we need to check if it's something in the 
mingw target)


Ciao,
Michael.


Re: types in GIMPLE IR

2023-06-29 Thread Michael Matz via Gcc
Hello,

On Thu, 29 Jun 2023, Krister Walfridsson wrote:

> > The thing with signed bools is that the two relevant values are -1 (true)
> > and 0 (false), those are used for vector bool components where we also
> > need them to be of wider type (32bits in this case).
> 
> My main confusion comes from seeing IR doing arithmetic such as
> 
>   <signed-boolean:32> _127;
>   <signed-boolean:32> _169;
>   ...
>   _169 = _127 + -1;
> 
> or
> 
>   <signed-boolean:32> _127;
>   <signed-boolean:32> _169;
>   ...
>   _169 = -_127;
> 
> and it was unclear to me what kind of arithmetic is allowed.
> 
> I have now verified that all cases seem to be just one operation of this form
> (where _127 has the value 0 or 1), so it cannot construct values such as 42.
> But the wide signed Boolean can have the three different values 1, 0, and -1,
> which I still think is at least one too many. :)

It definitely is.  For signed bools it should be -1 and 0, for unsigned 
bools 1 and 0.  And of course, arithmetic on bools is always dubious; that 
should all be logical operations.  Modulo arithmetic (mod 2) could be 
made to work, but then we would have to give up the idea of signed bools 
and always use conversions to signed int to get a bitmask of all-ones.  
And as mod-2 arithmetic is equivalent to logical ops it seems a bit futile 
to go that way.

Of course, enforcing this all might lead to a surprising heap of errors, 
but one has to start somewhere, so ...

> I'll update my tool to complain if the value is outside the range [-1, 
> 1].

... maybe don't do that, or at least make it optional, so that someday 
someone can look into fixing all that up? :-)  -fdubious-bools?


Ciao,
Michael.


Re: types in GIMPLE IR

2023-06-28 Thread Michael Matz via Gcc
Hello,

On Wed, 28 Jun 2023, Krister Walfridsson via Gcc wrote:

> Type safety
> ---
> Some transformations treat 1-bit types as a synonym of _Bool and mix the types
> in expressions, such as:
> 
>_2;
>   _Bool _3;
>   _Bool _4;
>   ...
>   _4 = _2 ^ _3;
> 
> and similarly mixing _Bool and enum
> 
>   enum E:bool { E0, E1 };
> 
> in one operation.
> 
> I had expected this to be invalid... What are the type safety rules in the
> GIMPLE IR?

Type safety in gimple is defined in terms of type compatibility, with 
_many_ exceptions for specific types of statements.  Generally stuff is 
verified in verify_gimple_seq, in this case of a binary assign statement 
in verify_gimple_assign_binary.  As you can see there, the normal rule for 
bread-and-butter binary assigns is simply that the LHS and the two RHS 
operands must all be type-compatible.

T1 and T2 are compatible if conversions from T1 to T2 are useless and 
conversion from T2 to T1 are also useless.  (types_compatible_p)  The meat 
for that is all in gimple-expr.cc:useless_type_conversion_p.  For this 
specific case again we have:

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
 of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
   != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
  && TYPE_PRECISION (outer_type) != 1)
return false;

So, yes, booleans and 1-bit types can be compatible (under certain other 
conditions, see the function).

> Somewhat related, gcc.c-torture/compile/pr96796.c performs a VIEW_CONVERT_EXPR
> from
> 
>   struct S1 {
> long f3;
> char f4;
>   } g_3_4;
> 
> to an int
> 
>   p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);
> 
> That must be wrong?

VIEW_CONVERT_EXPR is _very_ generous.  See 
verify_types_in_gimple_reference: 

  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
{
  /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check
 that their operand is not a register an invariant when
 requiring an lvalue (this usually means there is a SRA or IPA-SRA
 bug).  Otherwise there is nothing to verify, gross mismatches at
 most invoke undefined behavior.  */
  if (require_lvalue
  && (is_gimple_reg (op) || is_gimple_min_invariant (op)))
{
  error ("conversion of %qs on the left hand side of %qs",
 get_tree_code_name (TREE_CODE (op)), code_name);
  debug_generic_stmt (expr);
  return true;
}
  else if (is_gimple_reg (op)
   && TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE 
(op)))
{
  error ("conversion of register to a different size in %qs",
 code_name);
  debug_generic_stmt (expr);
  return true;
}
}

Here the operand is not a register (but a global memory object), so 
everything goes.

It should be said that over the years gimple's type system became stricter 
and stricter, but it started as mostly everything-goes, so making it 
stricter is a bumpy road that isn't fully travelled yet, because checking 
types often results in testcase regressions :-)

> Semantics of <signed-boolean:32>
> 
> "Wide" Booleans, such as <signed-boolean:32>, seem to allow more values than
> 0 and 1. For example, I've seen some IR operations like:
> 
>   _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;
> 
> But I guess there must be some semantic difference between 
> <signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type 
> would not be needed... So what are the differences?

See above, normally conversions to booleans that are wider than 1 bit are 
_not_ useless (because they require booleanization to true/false).  In the 
above case the not-useless cast is within a COND_EXPR, so it's quite 
possible that the gimplifier didn't look hard enough to split this out 
into a proper conversion statement.  (The verifier doesn't look inside 
the expressions of the COND_EXPR, so it also doesn't catch this one.)

If that turns out to be true and the above still happens when -1 is 
replaced by (say) 42, then it might be possible to construct a 
wrong-code testcase based on the fact that _66 as boolean should contain 
only two observable values (true/false), but could then contain 42.  OTOH, 
it might also not be possible to create such testcase, namely when GCC is 
internally too conservative in handling wide bools :-)  In that case we 
probably have a missed optimization somewhere, which when implemented 
would enable construction of such wrong-code testcase ;)

(I'm saying that -1 should be replaced by something else for a wrong-code 
testcase, because -1 is special and could justifiably be special-cased in 
GCC: -1 is the proper arithmetic value for a signed boolean that is 
"true").


Ciao,
Michael.


Re: [PATCH] Basic asm blocks should always be volatile

2023-06-28 Thread Michael Matz via Gcc
Hello,

On Wed, 28 Jun 2023, Julian Waters via Gcc wrote:

> On the contrary, code compiled with gcc with or without the applied patch
> operates very differently, with only gcc with the applied patch producing a
> fully correctly operating program as expected. Even if the above inline
> assembly blocks never worked due to other optimizations however, the
> failure mode of the program would be very different from how it fails now:
> It should fail noisily with an access violation exception due to
> the malformed assembly, but instead all the assembly which installs an
> exception handler never makes it into the final program because with
> anything higher than -O1 gcc deletes all of it (I have verified this with
> objdump too),

Can you please provide a _full_ compilable testcase (preprocessed)?  What 
Andrew says is (supposed to be) correct: ignoring the other 
problems you're going to see with your asms (even if you make them 
volatile), GCC should not remove any of those asm statements.

If something changes when you add 'volatile' by hand then we have another 
problem lurking somewhere, and adding the parser patch might not fully 
solve it (even if it changes behaviour for you).


Ciao,
Michael.


Re: [gimple-ssa] Get the gimple corresponding to the definition of a VAR_DECL

2023-06-27 Thread Michael Matz via Gcc
Hello,

On Tue, 27 Jun 2023, Pierrick Philippe wrote:

> My main problem is regarding uninitialized definition, but still not being
> considered undefined behavior.
> For example, the following code:
> 
>int x;
>int *y = &x;
>*y = 6;
> 
> What I'm doing is basically looking at each gimple statement if the lhs has a
> given attribute for the purpose of the analysis I'm trying to perform.
> To precise, I'm plugged into the analyzer, so an out-of-tree plugin.
> 
> But in the example above, the definition of x is not really within the
> gimple_seq of the function as it is never directly assigned.

Then you need to be a bit more precise in what exactly you want.  There 
are multiple ways to interpret "definition of a variable".

a) assigning a value to it: that's what Richard alluded to, you need to 
   iterate all gimple statements to see if any assign to variables you're 
   interested in (in SSA form there will only be one, outside SSA there 
   may be many).  As you notice there also may be none at all that 
   directly assign a value.  You need to solve the associated data-flow 
   problem in order to (conservatively) know the answer to this question.
   In particular you need points-to sets (above for instance, that 'y' 
   points to 'x' so that when you modify '*y' that you can note down that 
   "whatever y points to (i.e. at least x) is modified").

   There is no ready-made list of statements that potentially modify a 
   local variable in question.  You need to do that yourself, but GCC 
   contains many helper routines for parts of this problem (as it needs to 
   answer these questions itself as well, for optimization purposes).

b) allocating storage for the variable in question (and possibly giving it 
   an initial value): there are _no_ gimple instruction that express this 
   idea.  The very fact that variables are mentioned in local_decls (and 
   are used in a meaningful way in the instruction stream) leads to them
   being allocated on the stack during function expansion (see 
   expand_used_vars).

Non-local variables are similarly handled; they are placed in various 
lists that lead to appropriate assembler statements allocating static 
storage for them (in the data or bss segment, or whatever other segment 
is appropriate).  They aren't defined (in the allocate-it sense) by a 
gimple statement either.
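For interpretation (a), the statement walk looks roughly like this inside 
a plugin pass.  This is a sketch from memory of the GCC-internal API, not 
self-contained code; `interesting_p` and `note_direct_assignment` are 
hypothetical helpers you would supply:

```c
/* Sketch only: assumes GCC plugin headers and a registered pass;
   API names (FOR_EACH_BB_FN, gsi_*, gimple_get_lhs) as in current
   GCC internals, modulo version drift. */
basic_block bb;
FOR_EACH_BB_FN (bb, cfun)
  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
       !gsi_end_p (gsi); gsi_next (&gsi))
    {
      gimple *stmt = gsi_stmt (gsi);
      tree lhs = gimple_get_lhs (stmt);
      if (lhs && VAR_P (lhs) && interesting_p (lhs))
        note_direct_assignment (stmt, lhs);
      /* Indirect stores like '*y = 6' are NOT caught here: for those
         you need points-to information on top of this walk.  */
    }
```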


Ciao,
Michael.


Re: Different ASM for ReLU function between GCC11 and GCC12

2023-06-20 Thread Michael Matz via Gcc
Hello,

On Tue, 20 Jun 2023, Jakub Jelinek via Gcc wrote:

> ce1 pass results in emit_conditional_move with
> (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (reg:SF 84)
> operands in the GCC 11 case and so is successfully matched by
> ix86_expand_fp_movcc as ix86_expand_sse_fp_minmax.
> But, in GCC 12+, emit_conditional_move is called with
> (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (const_double:SF 
> 0.0 [0x0.0p+0])
> instead (reg:SF 84 in both cases contains the (const_double:SF 0.0 [0x0.0p+0])
> value, in the GCC 11 case loaded from memory, in the GCC 12+ case set
> directly in a move.  The reason for the difference is exactly that
> because cheap SSE constant can be moved directly into a reg, it is done so
> instead of reusing a pseudo that contains that value already.

But reg84 is _also_ used as operand of emit_conditional_move, so there's 
no reason to not also use it as third operand.  It seems more canonical to 
call a function with

  X-containing-B, A, B

than with

  X-containing-B, A, something-equal-to-B-but-not-B

so either the (const_double) RTL should be used in both, or reg84 should, 
but not a mix.  Exactly so to ...

> actually a minmax.  Even if it allowed the cheap SSE constants,
> it wouldn't know that r84 is also zero (unless the expander checks that
> it is a pseudo with a single setter and verifies it or something similar).

... not have this problem.


Ciao,
Michael.


Re: gcc tricore porting

2023-06-19 Thread Michael Matz via Gcc
Hello,

note that I know next to nothing about Tricore in particular, so take 
everything with grains of salt.  Anyway:

On Mon, 19 Jun 2023, Claudio Eterno wrote:

> in your reply you mentioned "DSP". Do you want to use the DSP instructions
> for final assembly?

It's not a matter of me wanting or not wanting, I have no stake in 
tricore.  From a 20-second look at the Infineon architecture overview I've 
linked it looked like that some DSP instructions could be used for 
implementing normal floating point support, which of course would be 
desirable in a compiler supporting all of C (otherwise you'd have to 
resort to softfloat emulation).  But I have no idea if the CPU and the DSP 
parts are interconnected enough (hardware wise) to make that feasible (or 
even required, maybe the CPU supports floating point itself already?).

> Michael, based on your experience, how much time is necessary to release
> this porting?

Depending on experience in compilers in general and GCC in particular: 
anything between a couple weeks (fulltime) and a year.

> And.. have you any idea about where to start?

If you don't have an assembler and linker already, then with that.  An 
assembler/linker is not part of GCC, but it relies on one.  So look at 
binutils for this.

Once binutils is taken care of, Richi's suggestion is a good one: start 
with an existing port of a target with similar features as you intend to 
implement, and modify it according to your needs.  After that works (say, 
you can compile a hello-world successfully): throw it away and restart a 
completely new target from scratch with everything you learned until then.  
(This is so that you don't start with all the cruft that the target you 
used as baseline comes with).

It helps if you already have a toolchain that you can work against, but 
it's not required.

You need to be familiar with some GCC internals, and the documentation 
coming with GCC is a good starting point: 
  https://gcc.gnu.org/onlinedocs/gccint/
(the "Machine Description" chapter will be the important one, but for that 
you need to read a couple other chapters as well)

There are a couple online resources about writing new targets for GCC.  
Stackoverflow refers to some.  E.g. 
  
https://stackoverflow.com/questions/44904644/gcc-how-to-add-support-to-a-new-architecture
refers to https://kristerw.blogspot.com/2017/08/writing-gcc-backend_4.html 
which is something not too old.  For concrete questions this mailing list 
is a good place to ask.


Good luck,
Michael.

> 
> Ciao
> Claudio
> 
> On Mon, 19 Jun 2023 at 16:16, Michael Matz wrote:
> 
> > Hello,
> >
> > On Mon, 19 Jun 2023, Richard Biener via Gcc wrote:
> >
> > > On Sun, Jun 18, 2023 at 12:00 PM Claudio Eterno via Gcc 
> > wrote:
> > > >
> > > > Hi, this is my first time with open source development. I worked in
> > > > automotive for 22 years and we (generally) were using tricore series
> > for
> > > > these products. GCC doesn't compile on that platform. I left my work
> > some
> > > > days ago and so I'll have some spare time in the next few months. I
> > would
> > > > like to know how difficult it is to port the tricore platform on gcc
> > and if
> > > > during this process somebody can support me as tutor and... also if
> > the gcc
> > > > team is interested in this item...
> > >
> > > We welcome ports to new architectures.  Quick googling doesn't find me
> > > something like an ISA specification though so it's difficult to assess
> > the
> > > complexity of porting to that architecture.
> >
> > https://en.wikipedia.org/wiki/Infineon_TriCore
> >
> > https://www.infineon.com/dgdl/TC1_3_ArchOverview_1.pdf?fileId=db3a304312bae05f0112be86204c0111
> >
> > CPU part looks like fairly regular 32bit RISC.  DSP part seems quite
> > normal as well.  There even was once a GCC port to Tricore, version 3.3
> > from HighTec (now part of Infineon itself), but not even the wayback
> > machine has the files for that anymore:
> >
> >
> > https://web.archive.org/web/20150205040416/http://www.hightec-rt.com:80/en/downloads/sources.html
> >
> > Given the age of that port it's probably better to start from scratch
> > anyway :)  (the current stuff from them/Infineon doesn't seem to be
> > GCC-based anymore?)
> >
> >
> > Ciao,
> > Michael.
> 
> 
> 
> 


Re: gcc tricore porting

2023-06-19 Thread Michael Matz via Gcc
Hello,

On Mon, 19 Jun 2023, Richard Biener via Gcc wrote:

> On Sun, Jun 18, 2023 at 12:00 PM Claudio Eterno via Gcc  
> wrote:
> >
> > Hi, this is my first time with open source development. I worked in
> > automotive for 22 years and we (generally) were using tricore series for
> > these products. GCC doesn't compile on that platform. I left my work some
> > days ago and so I'll have some spare time in the next few months. I would
> > like to know how difficult it is to port the tricore platform on gcc and if
> > during this process somebody can support me as tutor and... also if the gcc
> > team is interested in this item...
> 
> We welcome ports to new architectures.  Quick googling doesn't find me
> something like an ISA specification though so it's difficult to assess the
> complexity of porting to that architecture.

https://en.wikipedia.org/wiki/Infineon_TriCore
https://www.infineon.com/dgdl/TC1_3_ArchOverview_1.pdf?fileId=db3a304312bae05f0112be86204c0111

CPU part looks like fairly regular 32bit RISC.  DSP part seems quite 
normal as well.  There even was once a GCC port to Tricore, version 3.3 
from HighTec (now part of Infineon itself), but not even the wayback 
machine has the files for that anymore:

https://web.archive.org/web/20150205040416/http://www.hightec-rt.com:80/en/downloads/sources.html

Given the age of that port it's probably better to start from scratch 
anyway :)  (the current stuff from them/Infineon doesn't seem to be 
GCC-based anymore?)


Ciao,
Michael.


Re: Passing the complex args in the GPR's

2023-06-07 Thread Michael Matz via Gcc
Hey,

On Tue, 6 Jun 2023, Umesh Kalappa via Gcc wrote:

> Question is : Why does GCC choose to use GPR's here and have any
> reference to support this decision  ?

You explicitly used -m32 ppc, so 
https://www.polyomino.org.uk/publications/2011/Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf
 
applies.  It explicitly states in "B.1 ATR-Linux Inclusion and 
Conformance" that it is "ATR-PASS-COMPLEX-IN-GPRS", and other sections 
detail what that means (namely passing complex args in r3 .. r10, whatever 
fits).  GCC adheres to that, and has to.

The history how that came to be was explained in the thread.


Ciao,
Michael.

 > 
> Thank you
> ~Umesh
> 
> 
> 
> On Tue, Jun 6, 2023 at 10:16 PM Segher Boessenkool
>  wrote:
> >
> > Hi!
> >
> > On Tue, Jun 06, 2023 at 08:35:22PM +0530, Umesh Kalappa wrote:
> > > Hi Adnrew,
> > > Thank you for the quick response and for PPC64 too ,we do have
> > > mismatches in ABI b/w complex operations like
> > > https://godbolt.org/z/bjsYovx4c .
> > >
> > > Any reason why GCC chose to use GPR 's here ?
> >
> > What did you expect, what happened instead?  Why did you expect that,
> > and why then is it an error what did happen?
> >
> > You used -O0.  As long as the code works, all is fine.  But unoptimised
> > code frequently is hard to read, please use -O2 instead?
> >
> > As Andrew says, why did you use -m32 for GCC but -m64 for LLVM?  It is
> > hard to compare those at all!  32-bit PowerPC Linux ABI (based on 32-bit
> > PowerPC ELF ABI from 1995, BE version) vs. 64-bit ELFv2 ABI from 2015
> > (LE version).
> >
> >
> > Segher
> 


Re: More C type errors by default for GCC 14

2023-05-15 Thread Michael Matz via Gcc
Hello,

On Fri, 12 May 2023, Florian Weimer wrote:

> * Alexander Monakov:
> 
> > This is not valid (constraint violation):
> >
> >   unsigned x;
> >
> >   int *p = &x;
> >
> > In GCC this is diagnosed under -Wpointer-sign:
> >
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25892
> 
> Thanks for the reference.  I filed:
> 
>   -Wpointer-sign must be enabled by default
>   

Can you please not extend the scope of your proposals in this thread but 
create a new one?

(FWIW: no, this should not be an error, a warning is fine, and I actually 
think the current state of it not being in -Wall is the right thing as 
well)


Ciao,
Michael.


Re: More C type errors by default for GCC 14

2023-05-15 Thread Michael Matz via Gcc
Hello,

On Fri, 12 May 2023, Jakub Jelinek via Gcc wrote:

> On Fri, May 12, 2023 at 11:33:01AM +0200, Martin Jambor wrote:
> > > One fairly big GCC-internal task is to clear up the C test suite so that
> > > it passes with the new compiler defaults.  I already have an offer of
> > > help for that, so I think we can complete this work in a reasonable time
> > > frame.
> 
> I'd prefer to keep at least significant portion of those tests as is with
> -fpermissive added (plus of course we need new tests that verify the errors
> are emitted), so that we have some testsuite coverage for those.

Yes, this!  Try to (!) never change committed testcase sources, however 
invalid they may be (changing how they are compiled, including by 
introducing new dg-directives and using them in comments, is of course 
okay).

(And FWIW: I'm fine with Florian's proposal.  I personally think the 
arguments for upgrading the warnings to errors are _not_ strong enough, 
but I don't think so very strongly :) )


Ciao,
Michael.


Re: MIN/MAX and trapping math and NANs

2023-04-11 Thread Michael Matz via Gcc
Hello,

On Tue, 11 Apr 2023, Richard Biener via Gcc wrote:

> In the case we ever implement conforming FP exception support
> either targets would need to be fixed to mask unexpected exceptions
> or we have to refrain from moving instructions where the target
> implementation may rise exceptions across operations that might
> raise exceptions as originally written in source (and across
> points of FP exception state inspection).
> 
> That said, the effect to the FP exception state according to IEEE
> is still unanswered.

The IEEE 754-2008 predicate here is minNum/maxNum, and those are 
general-computational with floating point result.  That means any sNaN 
input raises-invalid (and delivers-qNaN if masked).  For qNaN input 
there's a special case: the result is the non-qNaN input (normal handling 
would usually require the qNaN to be returned).

Note that this makes minNum/maxNum (and friends) not associative.  Also, 
different languages and different hardware implement fmin/fmax different 
and sometimes in conflict with 754-2008 (e.g. on SSE2 maxsd isn't 
commutative but maxNum is!).  This can be considered a defect in 754-2008.  
As a result, these operations were demoted in 754-2019 and new functions 
minimumNumber (and friends) recommended (those propagate a qNaN).

Of course IEEE standards aren't publicly available and I don't have 
754-2019 (but -2008), so I can't be sure about the _exact_ wording 
regarding minimumNumber, but for background of the min/max woes: 

  https://754r.ucbtest.org/background/minNum_maxNum_Removal_Demotion_v3.pdf

In short: it's not so easy :-)  (it may not be advisable to slavishly 
follow 754-2008 for min/max)

> The NaN handling then possibly allows
> implementation with unordered compare + mask ops.


Ciao,
Michael.


Re: [RFC PATCH] driver: unfilter default library path [PR 104707]

2023-04-06 Thread Michael Matz via Gcc
Hello,

On Thu, 6 Apr 2023, Shiqi Zhang wrote:

> Currently, gcc deliberately filters out default library paths "/lib/" and
> "/usr/lib/", causing some linkers like mold fails to find libraries.

If linkers claim to be a compatible replacement for other linkers then 
they certainly should behave in a similar way.  In this case: look into 
/lib and /usr/lib when something isn't found till then.

> This behavior was introduced at least 31 years ago in the initial
> revision of the git repo, personally I think it's obsolete because:
>  1. The less than 20 bytes of saving is negligible compares to the command
> line argument space of most hosts today.

That's not the issue that is solved by ignoring these paths in the driver 
for %D/%I directives.  The issue is (traditionally) that even if the 
startfiles sit in /usr/lib (say), you don't want to add -L/usr/lib to the 
linker command line because the user might have added -L/usr/local/lib 
explicitly into her link command and depending on order of spec file 
entries the -L/usr/lib would be added in front interacting with the 
expectations of where libraries are found.

Hence: never add something in (essentially) random places that is default 
fallback anyway.  (Obviously the above problem could be solved in a 
different, more complicated, way.  But this is the way it was solved since 
about forever).

If mold doesn't look into {,/usr}/lib{,64} (as appropriate) by default 
then that's the problem of mold.


Ciao,
Michael.


Re: [Tree-SSA] Question from observation, bogus SSA form?

2023-03-17 Thread Michael Matz via Gcc
Hello,

On Fri, 17 Mar 2023, Pierrick Philippe wrote:

> > This means that global variables, volatile variables, aggregates,
> > variables which are not considered aggregates but are nevertheless
> > partially modified (think insertion into a vector) or variables which
> > need to live in memory (most probably because their address was taken)
> > are not put into an SSA form.  It may not be easily possible.
> 
> Alright, I understand, but I honestly find it confusing.

You can write something only into SSA form if you see _all_ assignments to 
the entity in question.  That's not necessarily the case for stuff sitting 
in memory.  Code you may not readily see (or might not be able to 
statically know the behaviour of) might be able to get ahold of it and 
hence change it behind your back or in unknown ways.  Not in your simple 
example (and if you look at it during some later passes in the compiler 
you will see that 'x' will indeed be written into SSA form), but in some 
that are only a little more complex:

int foo (int i) {
  int x, *y=&x;
  x = i;  // #1
  bar(y); // #2
  return x;
}

or

int foo (int i) {
  int x, z, *y = i ? &x : &z;
  x = z = 1;  // #1
  *y = 42;// #2
  return x;
}

here point #1 is very obviously a definition of x (and z) in both 
examples.  And point #2?  Is it a definition or not?  And if it is, then 
what entity is assigned to?  Think about that for a while and what that 
means for SSA form.

> I mean, aren't they any passes relying on the pure SSA form properties 
> to analyze code? For example to DSE or DCE.

Of course.  They all have to deal with memory in a special way (many by 
not doing things on memory).  Because of the above problems they would 
need to special-case memory no matter what.  (E.g. in GCC memory is dealt 
with via the virtual operands, the '.MEM_x = VDEF<.MEM_y>' and VUSE 
constructs you saw in the dumps, to make dealing with memory in an 
SSA-based compiler at least somewhat natural).


Ciao,
Michael.


Re: access to include path in front end

2022-12-05 Thread Michael Matz via Gcc
Hey,

On Fri, 2 Dec 2022, James K. Lowden wrote:

> > > 3.  Correct the entries in the default_compilers array.  Currently I
> > > have in cobol/lang-specs.h:
> > > 
> > > {".cob", "@cobol", 0, 0, 0},
> > > {".COB", "@cobol", 0, 0, 0},
> > > {".cbl", "@cobol", 0, 0, 0},
> > > {".CBL", "@cobol", 0, 0, 0},
> > > {"@cobol", 
> > >   "cobol1 %i %(cc1_options) %{!fsyntax-only:%(invoke_as)}", 
> > >   0, 0, 0}, 
> > 
> > It misses %(cpp_unique_options) which was the reason why your -I
> > arguments weren't passed to cobol1.  
> 
> If I understood you correctly, I don't need to modify gcc.cc.  I only
> need to modify cobol/lang-specs.h, which I've done.  But that's
> evidently not all I need to do, because it doesn't seem to work.  
> 
> The last element in the fragment in cobol/lang-specs.h is now: 
> 
> {"@cobol",
>   "cobol1 %i %(cc1_options) %{!fsyntax-only:%(invoke_as)} "
>   "%(cpp_unique_options) ",

%(invoke_as) needs to be last.  What it does is effectively add this to 
the command line (under certain conditions): "-somemoreoptions | as".  
Note the pipe symbol.  Like in normal shell commands also the gcc driver 
interprets this as "and now start the following command as well connection 
stdout of the first to stdin of the second".  So all in all the generated 
cmdline will be somewhat like:

  cobol1 input.cbl -stuff-from-cc1-options | as - -stuff-from-cpp-options

Your cpp_unique_options addition will effectively be options to that 'as' 
command, but you wanted it to be options for cobol1.  So, just switch 
order of elements.

> I see the -B and -I options, and others, with their arguments, contained
> in COLLECT_GCC_OPTIONS on lines 9 and 11.  I guess that represents an
> environment string?

Yes.  It's our roundabout way of passing the gcc options as the user gave 
them downwards in case collect2 (a wrapper for the linking step for, gah, 
don't ask) needs to call gcc itself recursively.  But in the -### (or -v) 
output, if the assembler is invoked in your example (i.e. cobol1 doesn't 
fail for some reason) you should see your -I options being passed to that 
one (wrongly so, of course :) ).


Ciao,
Michael.


Re: access to include path in front end

2022-12-01 Thread Michael Matz via Gcc
Hey,

On Thu, 1 Dec 2022, James K. Lowden wrote:

> > E.g. look in gcc.cc for '@c' (matching the file extension) how that
> > entry uses '%(cpp_unique_options)', and how cpp_unique_options is
> > defined for the specs language:
> > 
> >   INIT_STATIC_SPEC ("cpp_unique_options",   &cpp_unique_options),
> > 
> > and
> > 
> > static const char *cpp_unique_options =
> >   "%{!Q:-quiet} %{nostdinc*} %{C} %{CC} %{v} %@{I*&F*} %{P} %I\  
> 
> Please tell me if this looks right and complete to you:
> 
> 1.  Add an element to the static_specs array: 
> 
> INIT_STATIC_SPEC ("cobol_options", &cobol_options),

That, or expand its contents where you'd use '%(cobol_options)' in the 
strings.

> 
> 2.  Define the referenced structure: 
> 
>   static const char *cobol_options =  "%{v} %@{I*&F*}"
>   or just
>   static const char *cobol_options =  "%{v} %@{I*}"
> 
>   because I don't know what -F does, or if I need it.

I.e. as long as it's that short expanding inline would work nicely.

> I'm using "cobol_options" instead of "cobol_unique_options" because the
> options aren't unique to Cobol, and because only cpp seems to have
> unique options.  
> 
> I'm including %{v} against the future, when the cobol1 compiler
> supports a -v option. 

Makes sense.

> 3.  Correct the entries in the default_compilers array.  Currently I
> have in cobol/lang-specs.h:
> 
> {".cob", "@cobol", 0, 0, 0},
> {".COB", "@cobol", 0, 0, 0},
> {".cbl", "@cobol", 0, 0, 0},
> {".CBL", "@cobol", 0, 0, 0},
> {"@cobol", 
>   "cobol1 %i %(cc1_options) %{!fsyntax-only:%(invoke_as)}", 
>   0, 0, 0}, 
> 
> That last one is a doozy.  Is it even slightly right?

It misses %(cpp_unique_options) which was the reason why your -I arguments 
weren't passed to cobol1.  You would just use your new %(cobol_options), or 
simply '%{v} %{I*}' directly in addition to cc1_options.

> IIUC, I should at
> least remove 
> 
>   %{!fsyntax-only:%(invoke_as)}
> 
> because I don't need the options from the invoke_as string in gcc.cc. 

I think you do, as cobol1 will write out assembler code (it does, right?), 
so to get an object file the driver needs to invoke 'as' as well.  
Basically invoke_as tacks another command to run at the end of the already 
build command line (the one that above would start with 'cobol1 
inputfile.cob ... all the options ...'.  It will basically tack the 
equivalent of '| as tempfile.s -o realoutput.o' to the end (which 
eventually will make the driver execute that command as well).

> That would still leave me with too much, because cobol1 ignores most of
> the options cc1 accepts.  What would you do?

I would go through all cc1_options and see if they _really_ shouldn't be 
interpreted by cobol1.  I guess for instance '-Ddefine' really doesn't 
make sense, but e.g. '-m...' and '-f...' do, and maybe -quiet as well, and 
suchlike.  In that case I'd just use cc1_options (in addition to your new 
%{I*} snippet).

If you then really determine, that no, most options do not make sense you 
need to extract a subset of cc1_options that do, and write them into the 
@cobol entry.  Look e.g. what the fortran frontend does (in 
fortran/lang-specs.h) it simply attaches more things to cc1_options.

> I don't understand the relationship between default_compliers and
> static_specs.  

static_specs lists the names of 'variables' you can use within the specs 
strings, and to what they should expand.  E.g. when I would want to use 
'%(foobar)' in any of my specs strings that needs to be registered in 
static_spec[]:

  INIT_STATIC_SPEC ("foobar", &a_variable_containing_a_string)

The specs parser would then replace '%(foobar)' in specs strings with 
whatever that variable contains.

Using such variables mostly makes sense if you want to enable users (who 
can provide their own specs file) to refer to well-known snippets 
maintained by GCC itself.  For most such strings it's not necessary, and 
you'd be fine with the approach of fortran:

 #define MY_FOOBAR_STRING "%{v} ... this and that ..."

...

  {@mylang, ... "lang1 %i " MY_FOOBAR_STRING "" ... }

> I have made no special provision for "compiler can deal with
> multiple source files", except that cobol1 accepts multiple source
> files on the command line, and iterates over them.  If that's enough,
> then I'll set compiler::combinable to 1.

No good advice here for combinable.  Try it :)

> As I mentioned, for a year I've been able to avoid the Specs Language,
> apparently because some things happen by default.  The options defined
> in cobol/lang.opt are passed from gcobol to cobol1.  The value of the
> -findicator-column option is available (but only if the option starts
> with "f"; -indicator-column doesn't work).  cobol1 sees the value of
> -fmax-errors. 

That's because "%{f*}" is contained in %(cc1_options): 'pass on all 
options starting with "f"', and because you listed cc1_options in your 
cobol1 command line.


Ciao,
Michael.


Re: access to include path in front end

2022-11-30 Thread Michael Matz via Gcc
Hello,

On Tue, 29 Nov 2022, James K. Lowden wrote:

> I don't understand how to access in a front end the arguments to the -I
> option on the command line.  
> 
> Cobol has a feature similar to the C preprocessor, known as the
> Compiler Directing Facility (CDF).  The CDF has a COPY statement that
> resembles an #include directive in C, and shares the property that COPY
> names a file that is normally found in a "copybook" which, for our
> purposes, is a directory of such files.  The name of that directory is
> defined outside the Cobol program.  
> 
> I would like to use the -I option to pass the names of copybook
> directories to the cobol front end.  A bit of exploration yesterday left
> me with the sense that the -I argument, in C at least, is not passed to
> the compiler, but to the preprocessor. Access to -fmax-errors I think
> I've figured out, but -I is a mystery. 
> 
> I'm a little puzzled by the status quo as I understand it.  Unless I
> missed it, it's not discussed in gccint.  ISTM ideally there would be
> some kind of getopt(3) processing, and the whole set of command-line
> options captured in an array of structures accessible to any front
> end.

There is, it's just much more complicated than getopt :)

If you're looking at the C frontends for inspiration, then:

c-family/c.opt defines which options are recognized and several specifics 
about them, e.g. for -I it has:


I
C ObjC C++ ObjC++ Joined Separate MissingArgError(missing path after %qs)
-I Add  to the end of the main include path.


(look at some other examples therein, also in common.opt to get a feel).

Then code in c-family/c-opts.c:c_common_handle_option actually handles the 
option:

case OPT_I:
  if (strcmp (arg, "-"))
add_path (xstrdup (arg), INC_BRACKET, 0, true);
  else
  ...

That function is made a langhook for option processing so that it's 
actually called via c/c-objc-common.h:

  #define LANG_HOOKS_HANDLE_OPTION c_common_handle_option

If you're also using the model of a compiler driver (i.e. the gcc program, 
source in gcc.cc) that actually calls compiler (cc1), assembler and 
linker, then you also need to arrange for that program to pass all -I 
options to the compiler proper.  That is done with the spec language, by 
somewhere having '{I*}' in the specs for invoking the cobol compiler.  
E.g. look in gcc.cc for '@c' (matching the file extension) how that entry 
uses '%(cpp_unique_options)', and how cpp_unique_options is defined for 
the specs language:

  INIT_STATIC_SPEC ("cpp_unique_options",   &cpp_unique_options),

and

static const char *cpp_unique_options =
  "%{!Q:-quiet} %{nostdinc*} %{C} %{CC} %{v} %@{I*&F*} %{P} %I\  

(the specs language used here is documented in a lengthy comment early in 
gcc.cc, "The Specs Language")

The "%@{I*F*}" is the one that makes gcc pass -Iwhatever to cc1 (and 
ensures relative order with -F options is retained and puts all these into 
an @file if one is given on the cmdline, otherwise leaves it on cmdline).  
If you use the compiler driver then using '-v' when invoking it will 
quickly tell you if that options passing worked, as it will show the 
concrete command it exec's for the compiler proper.

Hope this helps.


Ciao,
Michael.


Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Michael Matz via Gcc
Hey,

On Tue, 29 Nov 2022, Uecker, Martin wrote:

> It does not require any changes on how arrays are represented.
> 
> As part of VM-types the size becomes part of the type and this
> can be used for static or dynamic analysis, e.g. you can 
> - today - get a run-time bounds violation with the sanitizer:
> 
> void foo(int n, char (*buf)[n])
> {
>   (*buf)[n] = 1;
> }

This can already be statically analyzed as being wrong, no need for dynamic 
checking.  What I mean is the checking of the claimed contract.  Above you 
assure for the function body that buf has n elements.  This is also a 
pre-condition for calling this function and _that_ can't be checked in all 
cases because:

  void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
  void callfoo(char * buf) { foo(10, buf); }

buf doesn't have a known size.  And a pre-condition that can't be checked 
is no pre-condition at all, as only then it can become a guarantee for the 
body.

The compiler has no choice than to trust the user that the pre-condition 
for calling foo is fulfilled.  I can see how being able to just check half 
of the contract might be useful, but if it doesn't give full checking then 
any proposal for syntax should be even more obviously orthogonal than the 
current one.

> For
> 
> void foo(int n, char buf[n]);
> 
> it semantically has no meaning according to the C standard,
> but a compiler could still warn. 

Hmm?  Warn about what in this decl?

> It could also warn for
> 
> void foo(int n, char buf[n]);
> 
> int main()
> {
> char buf[9];
> foo(buf);
> }

You mean if you write 'foo(10,buf)' (the above, as is, is simply a syntax 
error for non-matching number of args).  Or was it a mispaste and you mean 
the one from the godbolt link, i.e.:

void foo(char buf[10]){ buf[9] = 1; }
int main()
{
char buf[9];
foo(buf);
}

?  If so, yeah, we warn already.  I don't think this is an argument for 
(or against) introducing new syntax.

...

> But in general: This feature is useful not only for documentation
> but also for analysis.

Which feature are we talking about now?  The ones you used all work today, 
as you demonstrated.  I thought we would be talking about that ".whatever" 
syntax to refer to arbitrary parameters, even following ones?  I think a 
disrupting syntax change like that should have a higher bar than "in some 
cases, depending on circumstance, we might even be able to warn".


Ciao,
Michael.


Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Michael Matz via Gcc
Hey,

On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:

> How about the compiler parsing the parameter list twice?

This _is_ unbounded look-ahead.  You could avoid this by using "." for 
your new syntax.  Use something unambiguous that can't be confused with 
other syntactic elements, e.g. with a different punctuator like '@' or the 
like.  But I'm generally doubtful of this whole feature within C itself.  
It serves a purpose in documentation, so in man-pages it seems fine enough 
(but then still could use a different punctuator to not be confusable with 
C syntax).

But within C it still can only serve a documentation purpose as no 
checking could be performed without also changes in how e.g. arrays are 
represented (they always would need to come with a size).  It seems 
doubtful to introduce completely new and ambiguous syntax with all the 
problems Joseph lists just in order to be able to write documentation when 
there's a perfectly fine method to do so: comments.


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-17 Thread Michael Matz via Gcc
Hello,

On Wed, 16 Nov 2022, Paul Eggert wrote:

> On 2022-11-16 06:26, Michael Matz wrote:
> > char foobar(void);
> > int main(void) {
> >return &foobar != 0;
> > }
> 
> That still has undefined behavior according to draft C23,

This is correct (and also holds for the actually working variant later, 
with a volatile variable).  If your argument is then that as both 
solutions for the link-test problem are relying on undefined behaviour 
they are equivalent and hence no change is needed, you have a point, but I 
disagree.  In practice one (with the call) will cause more problems than 
the other (with address taking).

> If Clang's threatened pickiness were of some real use elsewhere, it 
> might be justifiable for default Clang to break Autoconf. But so far we 
> haven't seen real-world uses that would justify this pickiness for 
> Autoconf's use of 'char memset_explicit(void);'.

Note that both GCC and clang already warn (not error out!) about the 
mismatching decl, even without any headers.  So we are in the pickiness 
era already.

I.e. a C file containing just a single line "char printf(void);" will be 
warned about, by default.  There is about nothing that autoconf could do 
to rectify this, except containing a long list of prototypes for 
well-known functions, with the associated maintenance hassle.  But 
autoconf _can_ do something about how the decls are used in the 
link-tests.


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hello,

On Wed, 16 Nov 2022, Jonathan Wakely wrote:

> > > Unrelated but I was a bit tempted to ask for throwing in
> > > -Wbuiltin-declaration-mismatch to default -Werror while Clang 16 was at
> > > it, but I suppose we don't want the world to burn too much,
> >
> > :-)  It's IMHO a bug in the standard that it misses "if any of its
> > associated headers are included" in the item for reservation of external
> > linkage identifiers; it has that for all other items about reserved
> > identifiers in the Library clause.  If that restriction were added you
> > couldn't justify erroring on the example at hand (because it doesn't
> > include e.g. <stdio.h> and then printf wouldn't be reserved).  A warning
> > is of course always okay and reasonable.  As is, you could justify
> > erroring out, but I too think that would be overzealous.
> 
> 
> I think that's very intentional and not a defect in the standard.
> 
> If one TU was allowed to define:
> 
> void printf() { }
> 
> and have that compiled into the program, then that would cause
> unexpected behaviour for every other TU which includes  and
> calls printf. They would get the non-standard rogue printf.

True.  But suppose the restriction would be added.  I could argue that 
then your problem program (in some other TU) _does_ include the header, 
hence the identifier would have been reserved and so the above definition 
would have been wrong.  I.e. I think adding the restriction wouldn't allow 
the problematic situation either.

I'm aware that the argument would then invoke all the usual problems of 
what constitutes a full program, and if that includes the library even 
when not including headers and so on.  And in any case currently the 
standard does say they're reserved so it's idle speculation anyway :)


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hello,

On Wed, 16 Nov 2022, Sam James wrote:

> Unrelated but I was a bit tempted to ask for throwing in 
> -Wbuiltin-declaration-mismatch to default -Werror while Clang 16 was at 
> it, but I suppose we don't want the world to burn too much,

:-)  It's IMHO a bug in the standard that it misses "if any of its 
associated headers are included" in the item for reservation of external 
linkage identifiers; it has that for all other items about reserved 
identifiers in the Library clause.  If that restriction were added you 
couldn't justify erroring on the example at hand (because it doesn't 
include e.g. <stdio.h> and then printf wouldn't be reserved).  A warning 
is of course always okay and reasonable.  As is, you could justify 
erroring out, but I too think that would be overzealous.


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hey,

On Wed, 16 Nov 2022, Alexander Monakov wrote:

> > The idea is so obvious that I'm probably missing something, why autoconf 
> > can't use that idiom instead.  But perhaps the (historic?) reasons why it 
> > couldn't be used are gone now?
> 
> Ironically, modern GCC and LLVM optimize '&foobar != 0' to '1' even at -O0,
> and thus no symbol reference remains in the resulting assembly.

Err, right, *head-->table*.
Playing with volatile should help:

char foobar(void);
char (* volatile ptr)(void);
int main(void) {
  ptr = foobar;
  return ptr != 0;
}


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hi,

On Tue, 15 Nov 2022, Paul Eggert wrote:

> On 2022-11-15 11:27, Jonathan Wakely wrote:
> > Another perspective is that autoconf shouldn't get in the way of
> > making the C and C++ toolchain more secure by default.
> 
> Can you cite any examples of a real-world security flaw what would be 
> found by Clang erroring out because 'char foo(void);' is the wrong 
> prototype? Is it plausible that any such security flaw exists?
> 
> On the contrary, it's more likely that Clang's erroring out here would 
> *introduce* a security flaw, because it would cause 'configure' to 
> incorrectly infer that an important security-relevant function is 
> missing and that a flawed substitute needs to be used.
> 
> Let's focus on real problems rather than worrying about imaginary ones.

I sympathize, and I would think a compiler emitting an error (not a 
warning) in the situation at hand (in absence of -Werror) is overly 
pedantic.  But, could autoconf perhaps avoid the problem?  AFAICS the 
ac_fn_c_check_func really does only a link test to check for symbol 
existence, and the perceived problem is that the call statement in main() 
invokes UB.  So, let's avoid the call then while retaining the access to 
the symbol?  Like:

-
char foobar(void);
int main(void) {
  return &foobar != 0;
}
-

No call involved: no reason for compiler to complain.  The prototype decl 
itself will still be "wrong", but compilers complaining about that (in 
absence of a pre-existing different prototype, which is avoided by 
autoconf) seem unlikely.

Obviously this program will also say "foobar exists" if it's a data 
symbol, but that's the same with the variant using the call on most 
platforms (after all it's not run).

The idea is so obvious that I'm probably missing something, why autoconf 
can't use that idiom instead.  But perhaps the (historic?) reasons why it 
couldn't be used are gone now?


Ciao,
Michael.


Re: Local type inference with auto is in C2X

2022-11-03 Thread Michael Matz via Gcc
Hello,

On Thu, 3 Nov 2022, Florian Weimer via Gcc wrote:

> will not have propagated widely once GCC 13 releases, so rejecting
> implicit ints in GCC 13 might be too early.  GCC 14 might want to switch
> to C23/C24 mode by default, activating auto support, if the standard
> comes out in 2023 (which apparently is the plan).
> 
> Then we would go from
> warning to changed semantics in a single release.
> 
> Comments?

I would argue that changing the default C mode to c23 in the year the 
standard comes out (or even a year later) is too aggressive and early.  Existing 
sources are often compiled with defaults, and hence would change 
semantics, which seems unattractive.  New code can instead easily use 
-std=c23 for a time.

E.g. c99/gnu99 (a largish deviation from gnu90) was never default and 
gnu11 was made default only in 2014.


Ciao,
Michael.


Re: Counting static __cxa_atexit calls

2022-08-24 Thread Michael Matz via Gcc
Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > On Wed, 24 Aug 2022, Florian Weimer wrote:
> >
> >> > Isn't this merely moving the failure point from exception-at-ctor to 
> >> > dlopen-fails?
> >> 
> >> Yes, and that is a soft error that can be handled (likewise for
> >> pthread_create).
> >
> > Makes sense.  Though that actually hints at a design problem with ELF 
> > static ctors/dtors: they should be able to soft-fail (leading to dlopen or 
> > pthread_create error returns).  So, maybe the _best_ way to deal with this 
> > is to extend the definition of the various object-initialization means 
> > in ELF to allow propagating failure.
> 
> We could enable unwinding through the dynamic linker perhaps.  But as I
> said, those Itanium ABI functions tend to be noexcept, so there's work
> on that front as well.

Yeah, my idea would have been slightly less ambitious: redefine the ABI of 
.init_array functions to be able to return an int.  The loader would abort 
loading if any of them return non-zero.  Now change GCC code emission of 
those helper functions placed in .init_array to catch all exceptions and 
(in case an exception happened) return non-zero.  Or, even easier, don't 
deal with exceptions, but rather just check if __cxa_atexit worked, and if 
not return non-zero right away.  That way all the exception propagation 
(or cxa_atexit error handling) stays purely within the GCC generated code 
and the dynamic loader only needs to deal with return values, not 
exceptions and unwinding.

For backward compat we can't just change the ABI of .init_array, but we 
can devise an alternative: .init_array_mayfail and the associated DT tags.

> For thread-local storage, it's even more difficult because any first
> access can throw even if the constructor is noexcept.

That's extending the scope somewhat, pre-counting cxa_atexit wouldn't 
solve this problem either, right?

> >> I think we need some level of link editor support to avoid drastically
> >> over-counting multiple static calls that get merged into one
> >> implementation as the result of vague linkage.  Not sure how to express
> >> that at the ELF level?
> >
> > Hmm.  The __cxa_atexit calls are coming from the per-file local static 
> > initialization_and_destruction routine which doesn't have vague linkage, 
> > so its contribution to the overall number of cxa_atexit calls doesn't 
> > change from .o to final-exe.  Can you show an example of what you're 
> > worried about?
> 
> Sorry if I didn't use the correct terminology.
> 
> I was thinking about this:
> 
> #include <vector>
> 
> template <int i>
> struct S {
>   static std::vector<int> vec;
> };
> 
> template <int i> std::vector<int> S<i>::vec(i);
> 
> std::vector<int> &
> f()
> {
>   return S<1009>::vec;
> }
> 
> The initialization is deduplicated with the help of a guard variable,
> and that also bounds to number of __cxa_atexit invocations to at most
> one per type.

Ah, right, thanks.  The guard variable for class-local statics, I was 
thinking file-scope globals.  Double-hmm.  I don't readily see a nice way 
to correctly precalculate the number of cxa_atexit calls here.  A simple 
problem is the following: assume a couple files each defining such class 
templates, that ultimately define and initialize static members A<1>::a 
and B<1>::b (assume vague linkage).  Assume we have four files:

a:  defines A::a
b:  defines B::b
ab: defines A::a and B::b
ba: defines B::b and A::a

Now link order influences which file gets to actually initialize the 
members and which ones skip it due to guard variables.  But the object 
files themselves don't know enough context of which will be which.  Not even 
the link editor knows that, because the non-taken cxa_atexit calls aren't in 
linkonce/group sections; they are all there in 
object.o:.text:_Z41__static_initialization_and_destruction_0ii .

So, what would need to be emitted is for instance a list of cxa_atexit 
calls plus guard variable; the link editor could then count all unguarded 
cxa_atexit calls plus all guarded ones, but the latter only once per 
guard.  The key would be the identity of the guard variable.

That seems like an awful lot of complexity at the wrong level for a very 
specific usecase when we could also make .init_array failable, which then 
even might have more usecases.

> > A completely different way would be to not use cxa_atexit at all: 
> > allocate memory statically for the object and dtor addresses in 
> > .rodata (instead of in .text right now), and iterate over those at 
> > static_destruction time.  (For the thread-local ones it would need to 
> > store arguments to __tls_get_addr).
> 
> That only works if the compiler and linker can figure out the
> construction order.  In general, that is not possible, and that case
> seems even quite common with C++.  If the construction order is not
> known ahead of time, it is necessary to record it somewhere, so that
> destruction can happen in reverse.  So I think storing things in .rodata
> is out.

Hmm, right.  The ba

Re: Counting static __cxa_atexit calls

2022-08-24 Thread Michael Matz via Gcc
Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > Isn't this merely moving the failure point from exception-at-ctor to 
> > dlopen-fails?
> 
> Yes, and that is a soft error that can be handled (likewise for
> pthread_create).

Makes sense.  Though that actually hints at a design problem with ELF 
static ctors/dtors: they should be able to soft-fail (leading to dlopen or 
pthread_create error returns).  So, maybe the _best_ way to deal with this 
is to extend the definition of the various object-initialization means 
in ELF to allow propagating failure.

> > Probably a note section, which the link editor could either transform into 
> > a dynamic tag or leave as note(s) in the PT_NOTE segment.  The latter 
> > wouldn't require any specific tooling support in the link editor.  But the 
> > consumer would have to iterate through all the notes to add the 
> > individual counts together.  Might be acceptable, though.
> 
> I think we need some level of link editor support to avoid drastically
> over-counting multiple static calls that get merged into one
> implementation as the result of vague linkage.  Not sure how to express
> that at the ELF level?

Hmm.  The __cxa_atexit calls are coming from the per-file local static 
initialization_and_destruction routine which doesn't have vague linkage, 
so its contribution to the overall number of cxa_atexit calls doesn't 
change from .o to final-exe.  Can you show an example of what you're 
worried about?

A completely different way would be to not use cxa_atexit at all: allocate 
memory statically for the object and dtor addresses in .rodata (instead of 
in .text right now), and iterate over those at static_destruction time.  
(For the thread-local ones it would need to store arguments to 
__tls_get_addr).

Doing that or defining failure modes for ELF init/fini seems a better 
design than hacking around the current limitation via counting static 
cxa_atexit calls.


Ciao,
Michael.


Re: Counting static __cxa_atexit calls

2022-08-23 Thread Michael Matz via Gcc
Hello,

On Tue, 23 Aug 2022, Florian Weimer via Gcc wrote:

> We currently have a latent bug in glibc where C++ constructor calls can
> fail if they have static or thread storage duration and a non-trivial
> destructor.  The reason is that __cxa_atexit (and
> __cxa_thread_atexit_impl) may have to allocate memory.  We can avoid
> that if we know how many such static calls exist in an object (for C++,
> the compiler will never emit these calls repeatedly in a loop).  Then we
> can allocate the resources beforehand, either during process and thread
> start, or when dlopen is called and new objects are loaded.

Isn't this merely moving the failure point from exception-at-ctor to 
dlopen-fails?  If an individual __cxa_atexit can't allocate memory anymore 
for its list structure, why should pre-allocation (which is still dynamic, 
based on the number of actual atexit calls) have any more luck?

> What would be the most ELF-flavored way to implement this?  After the
> final link, I expect that the count (or counts, we need a separate
> counter for thread-local storage) would show up under a new dynamic tag
> in the dynamic segment.  This is actually a very good fit because older
> loaders will just ignore it.  But the question remains what GCC should
> emit into assembler & object files, so that the link editor can compute
> the total count from that.

Probably a note section, which the link editor could either transform into 
a dynamic tag or leave as note(s) in the PT_NOTE segment.  The latter 
wouldn't require any specific tooling support in the link editor.  But the 
consumer would have to iterate through all the notes to add the 
individual counts together.  Might be acceptable, though.


Ciao,
Michael.


Re: DWARF question about size of a variable

2022-06-09 Thread Michael Matz via Gcc
Hello,

On Wed, 8 Jun 2022, Carl Love via Gcc wrote:

> Is there dwarf information that gives the size of a variable?

Yes, it's in the type description.  For array types, the children of it 
give the index types and ranges.  If a range is 
computed at runtime, DWARF will (try to) express it as an expression in 
terms of other available values (like registers, constants, or memory), 
and as such can also change depending on where (at which PC) you evaluate 
that expression (and the expression itself can also change per PC).  For 
instance, in your example, on x86 with -O3 we have these relevant DWARF 
snippets (readelf -wi):

 <2>: Abbrev Number: 12 (DW_TAG_variable)
   DW_AT_name: a
   DW_AT_type: <0xa29>

So, 'a' is a variable of type 0xa29, which is:

 <1>: Abbrev Number: 13 (DW_TAG_array_type)
   DW_AT_type: <0xa4a>
   DW_AT_sibling : <0xa43>
 <2>: Abbrev Number: 14 (DW_TAG_subrange_type)
   DW_AT_type: <0xa43>
   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31 1c   
(DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl; DW_OP_const1u: 32; 
DW_OP_shra; DW_OP_lit1; DW_OP_minus)
 <2>: Abbrev Number: 0

So, type 0xa29 is an array type, whose element type is 0xa4a (which will 
turn out to be a signed char), and whose (single) dimension type is 0xa43 
(unsigned long) with an upper bound that is runtime computed, see below.
The referenced types from that are:

 <1>: Abbrev Number: 1 (DW_TAG_base_type)
   DW_AT_byte_size   : 8
   DW_AT_encoding: 7   (unsigned)
   DW_AT_name: (indirect string, offset: 0x13b): long unsigned int

 <1>: Abbrev Number: 1 (DW_TAG_base_type)
   DW_AT_byte_size   : 1
   DW_AT_encoding: 6   (signed char)
   DW_AT_name: (indirect string, offset: 0x1ce): char

With that gdb has all information to compute the size of this array 
variable in its scope ((upper-bound + 1 minus lower-bound (default 0)) 
times sizeof(basetype)).  Compare the above for instance with the 
debuginfo generated at -O0, only the upper-range expression changes:

 <2>: Abbrev Number: 10 (DW_TAG_subrange_type)
   DW_AT_type: <0xa29>
   DW_AT_upper_bound : 3 byte block: 91 68 6   (DW_OP_fbreg: -24; 
DW_OP_deref)

Keep in mind that DWARF expressions are based on a simple stack machine.
So, for instance, the computation for the upper bound in the O3 case is:
 ((register %rdi + 1) << 32 >> 32) - 1
(i.e. basically the 32-to-64 signextension of %rdi).

On ppc I assume that either the upper_bound attribute isn't there or 
contains an uninformative expression (or one that isn't valid at the 
program-counter gdb stops at), in which case you would want to look at 
dwarf2out.cc:subrange_type_die or add_subscript_info (look for 
TYPE_MAX_VALUE of the subscripts domain type).  Hope this helps.


Ciao,
Michael.


Re: reordering of trapping operations and volatile

2022-01-17 Thread Michael Matz via Gcc
Hello,

On Sat, 15 Jan 2022, Martin Uecker wrote:

> > Because it interferes with existing optimisations. An explicit 
> > checkpoint has a clear meaning. Using every volatile access that way 
> > will hurt performance of code that doesn't require that behaviour for 
> > correctness.
> 
> This is why I would like to understand better what real use cases of 
> performance sensitive code actually make use of volatile and are 
> negatively affected. Then one could discuss the tradeoffs.

But you seem to ignore whatever we say in this thread.  There are now 
multiple examples that demonstrate problems with your proposal as imagined 
(for lack of a _concrete_ proposal with wording from you), problems that 
don't involve volatile at all.  They all stem from the fact that you order 
UB with respect to all side effects (because you haven't said how you want 
to avoid such total ordering with all side effects).

As I said upthread: you need to define a concept of time at whose 
granularity you want to limit the effects of UB, and the borders of each 
time step can't simply be (all) the existing side effects.  Then you need 
to have wording of what it means for UB to occur within such time step, in 
particular if multiple UB happens within one (for best results it should 
simply be UB, not individual instances of different UBs).

If you look at the C++ proposal (thanks Jonathan) I think you will find 
that if you replace 'std::observable' with 'sequence point containing a 
volatile access' that you basically end up with what you wanted.  The 
crucial point being that the time steps (epochs in that proposal) aren't 
defined by all side effects but by a specific and explicit thing only (new 
function in the proposal, volatile accesses in an alternative).

FWIW: I think for a new language feature reusing volatile accesses as the 
clock ticks are the worse choice: if you intend that feature to be used 
for writing safer programs (a reasonable thing) I think being explicit and 
at the same time null-overhead is better (i.e. a new internal 
function/keyword/builtin, specified to have no effects except moving the 
clock forward).  volatile accesses obviously already exist and hence are 
easier to integrate into the standard, but in a given new/safe program, 
whenever you see a volatile access you would always need to ask 'is this 
for clock ticks, or is it a "real" volatile access for memmap IO'.


Ciao,
Michael.


Re: git hooks: too strict check

2022-01-14 Thread Michael Matz via Gcc
Hello,

On Fri, 14 Jan 2022, Martin Liška wrote:

> Hello.
> 
> I'm working on a testsuite clean-up where some of the files are wrongly named.
> More precisely, so files have .cc extension and should use .C. However there's
> existing C test-case and it leads to:
> 
> marxin@marxinbox:~/Programming/gcc/gcc/testsuite> find . -name test-asm.*
> ./jit.dg/test-asm.C
> ./jit.dg/test-asm.c

You can't have that, the check is correct.  There are filesystems (NTFS 
for instance) that are case-preserving but case-insensitive, on those you 
really can't have two files that differ only in casing.  You need to find 
a different solution, either consistently use .cc instead of .C, live with 
the inconsistency or rename the base name of these files.


Ciao,
Michael.

> 
> test-kunlun me/rename-testsuite-files
> Enumerating objects: 804, done.
> Counting objects: 100% (804/804), done.
> Delta compression using up to 16 threads
> Compressing objects: 100% (242/242), done.
> Writing objects: 100% (564/564), 142.13 KiB | 7.48 MiB/s, done.
> Total 564 (delta 424), reused 417 (delta 295), pack-reused 0
> remote: Resolving deltas: 100% (424/424), completed with 222 local objects.
> remote: *** The following filename collisions have been detected.
> remote: *** These collisions happen when the name of two or more files
> remote: *** differ in casing only (Eg: "hello.txt" and "Hello.txt").
> remote: *** Please re-do your commit, chosing names that do not collide.
> remote: ***
> remote: *** Commit: 7297e1de9bed96821d2bcfd034bad604ce035afb
> remote: *** Subject: Rename tests in jit sub-folder.
> remote: ***
> remote: *** The matching files are:
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-quadratic.C
> remote: *** gcc/testsuite/jit.dg/test-quadratic.c
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-switch.C
> remote: *** gcc/testsuite/jit.dg/test-switch.c
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-asm.C
> remote: *** gcc/testsuite/jit.dg/test-asm.c
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-alignment.C
> remote: *** gcc/testsuite/jit.dg/test-alignment.c
> 
> Can we please do something about it?
> 
> Thanks,
> Martin
> 


Re: reordering of trapping operations and volatile

2022-01-14 Thread Michael Matz via Gcc
Hello,

On Thu, 13 Jan 2022, Martin Uecker wrote:

> > > >  Handling all volatile accesses in the very same way would be 
> > > > possible but quite some work I don't see much value in.
> > > 
> > > I see some value. 
> > > 
> > > But an alternative could be to remove volatile
> > > from the observable behavior in the standard
> > > or make it implementation-defined whether it
> > > is observable or not.
> > 
> > But you are actually arguing for making UB be observable
> 
> No, I am arguing for UB not to have the power
> to go back in time and change previous defined
> observable behavior.

But right now that's equivalent to making it observable,
because we don't have any other terms than observable or
undefined.  As alluded to later you would have to
introduce a new concept, something pseudo-observable,
which you then started doing.  So, see below.
 
> > That's 
> > much different from making volatile not be
> > observable anymore (which  obviously would
> > be a bad idea), and is also much harder to
> 
> I tend to agree that volatile should be
> considered observable. But volatile is
> a bit implementation-defined anyway, so this
> would be a compromise so that implementations
> do not have to make all the implied changes
> if we revise the meaning of UB.

Using volatile accesses for memory mapped IO is a much stronger use-case 
than your wish of using volatile accesses to block moving of UB as a 
debugging aid, and the former absolutely needs some guarantees, so I don't 
think it would be a compromise at all.  Making volatile not be observable 
would break the C language.

> > Well, what you _actually_ want is an implied
> > dependency between some UB and volatile accesses
> > (and _only_ those, not e.g. with other UB), and the 
> > difficulty now is to define "some" and to create
> > the dependency without making that specific UB
> > to be properly observable. 
> 
> Yes, this is what I actually want.
> 
> >  I think to define this 
> > all rigorously seems futile (you need a new
> > category between observable  and UB), so it comes
> > down to compiler QoI on a case by case basis.
> 
> We would simply change UB to mean "arbitrary
> behavior at the point of time the erraneous
> construct is encountered at run-time"  and 
> not "the complete program is invalid all
> together". I see no problem in specifying this
> (even in a formally precise way)

First you need to define "point in time", a concept which doesn't exist 
yet in C.  The obvious choice is of course observable behaviour in the 
execution environment and its specified ordering from the abstract 
machine, as clarified via sequence points.  With that your "at the point 
in time" becomes something like "after all side effects of previous 
sequence point, but strictly before all side effects of next sequence 
point".

But doing that would have very far reaching consequences, as already
stated in this thread.  The above would basically make undefined behaviour 
be reliably countable, and all implementations would need to produce the 
same counts of UB.  That in turn disables many code movement and 
commonization transformations, e.g. this:

int a = ..., b = ...;
int x = a + b;
int y = a + b;

can't be transformed into "y = x = a + b" anymore, because the addition 
_might_ overflow, and if it does you have two UBs originally but would 
have one UB after.  I know that you don't want to inhibit this or similar 
transformations, but that would be the result of making UB countable, 
which is the result of forcing UB to happen at specific points in time.  
So, I continue to see problems in precisely specifying what you want, _but 
not more_.

I think all models in which you order the happening of UB with respect to 
existing side effects (per abstract machine, so it includes modification 
of objects!) have this same problem, it always becomes a side effect 
itself (one where you don't specify what actually happens, but a side 
effect nonetheless) and hence becomes observable.


Ciao,
Michael.


Re: reordering of trapping operations and volatile

2022-01-13 Thread Michael Matz via Gcc
Hello,

On Tue, 11 Jan 2022, Martin Uecker via Gcc wrote:

> >  Handling all volatile accesses in the
> > very same way would be possible but quite some work I don't
> > see much value in.
> 
> I see some value. 
> 
> But an alternative could be to remove volatile
> from the observable behavior in the standard
> or make it implementation-defined whether it
> is observable or not.

But you are actually arguing for making UB be observable (which then 
directly implies an ordering with respect to volatile accesses).  That's 
much different from making volatile not be observable anymore (which 
obviously would be a bad idea), and is also much harder to do; it's 
the nature of undefined behaviour to be hard to define :)

Well, what you _actually_ want is an implied dependency between some UB 
and volatile accesses (and _only_ those, not e.g. with other UB), and the 
difficulty now is to define "some" and to create the dependency without 
making that specific UB to be properly observable.  I think to define this 
all rigorously seems futile (you need a new category between observable 
and UB), so it comes down to compiler QoI on a case by case basis.


Ciao,
Michael.


Re: environment register / ABI

2021-10-14 Thread Michael Matz via Gcc
Hello,

On Wed, 13 Oct 2021, Martin Uecker wrote:

> > [... static chain ...]
> > If you mean that, then it's indeed psABI specific, and possibly not
> > all ABIs specify it (in which case GCC will probably have set a de-
> > facto standard at least for unixy systems).  The x86-64 psABI for
> > instance does specify a  register for this, which is separate from
> > the normal argument passing registers.  Other psABIs could say that
> > it's passed like a hidden  argument prepended to the formal list of
> > args.
> > 
> 
> Yes, most architecture seem to define a register. I am wondering
> if there is a table or summary somewhere.

Not that I know of, and I doubt it exists.  The most comprehensive is 
probably the result of (from within gcc sources):

% grep 'define.*STATIC_CHAIN_REG' config/*/*.[ch]

(that gets you all archs of GCC that support a static chain in registers, 
and it's often very obvious from above result which one it is), plus the 
result of

% grep TARGET_STATIC_CHAIN config/*/*.[ch]

(that gets you the few targets that don't necessarily use a reg for the 
static chain, but e.g. a stack slot or a different register depending on 
circumstances.  These are only i386, moxie and xtensa currently, but you 
need to inspect the target hook function to determine when which place is 
used, i.e. not as obvious as above).

> > Or do you mean something else entirely?  It might also help to know 
> > the purpose of your question :)
> 
> There is currently no standard way to set or query
> the static chain from C although this is used by
> many other languages. Also function pointers in C
> usually can not store the static chain. I am going
> to propose to WG14 to add some kind of wide function
> pointer to address this.  I am doing back ground
> research to understand whether this exists everywhere.

I see.  Is that sensible without C itself having the possibility to write 
nested functions?  There are other, more obvious (for C!) reasons to have 
wide function pointers: shared libs often are implemented such that the 
static data of a library is reachable by a pointer (often called GOT 
pointer, or PIC register or similar terms), so calling an indirect 
function needs to setup that GOT pointer plus contain the function address 
itself.  This is often implemented either by setup code in the function 
prologue or by using function descriptors, or by an extra entry point 
containing that setup.  Fat function pointers (which effectively are 
then function descriptors) could contain that as well. (it will be very 
target dependend how that would be filled, but that is the case with 
static chains as well).

There's another case for fat function pointers: C++ virtual methods: 
unbound pointers to them will just be type-id plus vtable index (in the 
usual implementations of C++), bound pointers will be a function address 
plus this pointer.

There may be more items that can be imagined to be stuffed into a fat 
function pointer.

So, I'm wondering what you are pondering about, to which extend you want 
to go with fat function pointers, what the usecases will be, i.e. which 
problem you want to solve :)


Ciao,
Michael.


Re: environment register / ABI

2021-10-13 Thread Michael Matz via Gcc
Hello,

On Wed, 13 Oct 2021, Martin Uecker wrote:

> does anybody know if all architectures support passing
> an environment pointer in their function call ABI? 

Define "environment pointer".  I can imagine two things you could mean: 
the custom to pass envp as third argument to main() in hosted C programs:

  int main (int argc, char *argv[], char *envp[]);

but then this is specific to main and more a question of process 
initialization than function call ABI.  If you mean this, the answer will 
most often be: if envp is passed to main (a question of the operating 
system or runtime environment, e.g. if there's something like an 
environment in the getenv() sense to start with), then it is passed like 
any other third argument of pointer type on the psABI in question, and 
that definition would be independent of the psABI.

Or you could mean what normally would be called 'static chain', i.e. a 
pointer to the stack frame of outer functions for languages supporting 
nested functions.  I could imagine this also be called environment.  If 
you mean that, then it's indeed psABI specific, and possibly not all ABIs 
specify it (in which case GCC will probably have set a de-facto standard 
at least for unixy systems).  The x86-64 psABI for instance does specify a 
register for this, which is separate from the normal argument passing 
registers.  Other psABIs could say that it's passed like a hidden 
argument prepended to the formal list of args.

Or do you mean something else entirely?  It might also help to know the 
purpose of your question :)


Ciao,
Michael.


Re: S390 should change the meaning of -m31

2021-09-30 Thread Michael Matz via Gcc
Hello,

On Wed, 29 Sep 2021, Jesus Antonio via Gcc wrote:

> m31 is semantically the same as the m32 option.
> 
> 
> The m31 option allows for 32 bit addressing and that is confusing since 
> the m31 option in S390 would mean 2 GiB space addressing

Indeed that's exactly what it means, and what it's supposed to mean.  On 
s390, in AMODE(31) the topmost bit of a (32-bit) address is either 
ignored or an indicator to switch back to 24-bit addresses from the s360 
times.  Either way that leaves 31 bits to generate the virtual address.  
On s390 you indeed have a 2GB address space, not more.

> Code used:
> 
>     volatile uint64_t *gib_test = (volatile uint64_t *)0x7FFF;
>     memset(gib_test, 1, 4096);
> 
> 
> Hercules dump:
> 
> r 0x7FFF-0x81FF
> R:7FFF:K:06=01 .

I'm not sure what you believe to have demonstrated here.  The (virtual or 
physical) address 0x7FFF is either (in AMODE(24)) equivalent to 
0x00ff or to 0x (in AMODE(31)), either way, the top byte of 
the addressable range ...

> R:800F:K:06=01 01010101 01010101 01010101 010101 

... while address 0x8001 is equivalent to address 0x1 (in AMODE(24) 
and AMODE(31)).  Again, the top bit (or bits in AMODE(24)) are ignored.  
So, you've built a memset that wraps around the line (AMODE(24)) or the 
bar (AMODE(31)).  Perhaps unusual and not what you expected, but as 
designed by IBM.

> The option i used was m31 of course, however this option is misleading 
> since it allows 32 bit mode, and there is no m32 so you have to use m31 
> - just lots of misleading options.

The -mXX options are supposed to reflect the address space's size, not the 
size of the general purpose registers.  An option that reflect AMODE(24) 
would also be called -m24, despite the registers still being 32bit in 
size.


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-09 Thread Michael Matz via Gcc
Hello,

On Thu, 9 Sep 2021, Aldy Hernandez wrote:

> > Here there's no simple latch block to start with (the backedge comes
> > directly out of the loop exit block).  So my suggested improvement
> > (testing if the latch was empty and only then reject the thread), would
> > solve this.
> 
> Well, there's the thing.  Loop discovery is marking BB5 as the latch, so 
> it's not getting threaded:

Yes, it's a latch, but not an empty one.  So the thread would make it just 
even more non-empty, which might not be a problem anymore.  So amending my 
patch somewhere with a strategic

  && empty_block_p (latch)

and only then rejecting the thread should make this testcase work again.

(There's still a catch, though: if this non-empty latch, which is also the 
exit test block, is threaded through and is followed by actual code, then 
that code will be inserted onto the back edge, not into the latch block 
before the exit test, and so also create a (new) non-empty latch.  That 
might or might not create problems downstreams, but not as many as 
transformaing an empty into a non-empty latch would do; but it could 
create for instance a second back edge (not in this testcase) and 
suchlike)

> BTW, I'm not sure your check for the non-last position makes a difference:
> 
> > diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
> > index 449232c7715..528a753b886 100644
> > --- a/gcc/tree-ssa-threadbackward.c
> > +++ b/gcc/tree-ssa-threadbackward.c
> > -   threaded_through_latch = true;
> > +   {
> > + threaded_through_latch = true;
> > + if (j != 0)
> > +   latch_within_path = true;
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, " (latch)");
> > +   }
> >  }
> 
> If the last position being considered is a simple latch, it only has a simple
> outgoing jump.  There's no need to thread that.  You need a block with >= 2
> succ edges to thread anything.

So, you are saying that any candidate thread path wouldn't have the latch 
in the last position if it were just an empty forwarder?  I simply wasn't 
sure about that, so was conservative and only wanted to reject things I 
knew were positively bad (namely code in the path following the latch 
that is in danger of being moved into the latch).


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-09 Thread Michael Matz via Gcc
Hello,

On Thu, 9 Sep 2021, Aldy Hernandez wrote:

> The ldist-22 regression is interesting though:
> 
> void foo ()
> {
>   int i;
> 
>    <bb 2> :
>   goto <bb 6>; [INV]
> 
>    <bb 3> :
>   a[i_1] = 0;
>   if (i_1 > 100)
>     goto <bb 4>; [INV]
>   else
>     goto <bb 5>; [INV]
> 
>    <bb 4> :
>   b[i_1] = i_1;
> 
>    <bb 5> :
>   i_8 = i_1 + 1;
> 
>    <bb 6> :
>   # i_1 = PHI <0(2), i_8(5)>
>   if (i_1 <= 1023)
>     goto <bb 3>; [INV]
>   else
>     goto <bb 7>; [INV]

Here there's no simple latch block to start with (the backedge comes 
directly out of the loop exit block).  So my suggested improvement 
(testing if the latch was empty and only then reject the thread), would 
solve this.

> Would it be crazy to suggest that we disable threading through latches 
> altogether,

I think it wouldn't be crazy, but we can do a bit better as suggested 
above (only reject empty latches, and reject it only for the threaders 
coming before the loop optims).


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-09 Thread Michael Matz via Gcc
Hello,

On Thu, 9 Sep 2021, Richard Biener wrote:

> > I believe something like below would be appropriate, it disables 
> > threading if the path contains a latch at the non-last position (due 
> > to being backwards on the non-first position in the array).  I.e. it 
> > disables rotating the loop if there's danger of polluting the back 
> > edge.  It might be improved if the blocks following (preceding!) the 
latch are themselves empty because then no code is duplicated.  It might 
> > also be improved if the latch is already non-empty.  That code should 
> > probably only be active before the loop optimizers, but currently the 
> > backward threader isn't differentiating between before/after 
> > loop-optims.
> >
> > I haven't tested this patch at all, except that it fixes the testcase 
> > :)
> 
> Lame comment at the current end of the thread - it's not threading 
> through the latch but threading through the loop header that's 
> problematic,

I beg to differ, but that's probably because of the ambiguity of the word 
"through" (does it or does it not include the ends of the path :) ).  If 
you thread through the loop header from the entry block (i.e. duplicate 
code from header to entry) all would be fine (or not, in case you created 
multiple entries from outside).  If you thread through the latch, then 
through an empty header and then through another non-empty basic block 
within the loop, you still wouldn't be fine: you've just created code on 
the back edge (and hence into the new latch block).  If you thread through 
the latch and through an empty header (and stop there), you also are fine.

Also note that in this situation you do _not_ create new entries into the 
loop, not even intermediately.  The old back edge is the one that goes 
away due to the threading, the old entry edge is moved completely out of 
the loop, the edge header->thread-dest becomes the new entry edge, and the 
edge new-latch->thread-dest becomes the back edge.  No additional entries.

So, no, it's not the threading through the loop header that is problematic 
but creating a situation that fills the (new) latch with code, and that 
can only happen if the candidate thread contains the latch block.

(Of course it's somewhat related to the header block as well, because that 
is simply the only destination the latch has, and hence is usually 
included in any thread that also include the latch; but it's not the 
header that indicates the situation).

> See tree-ssa-threadupdate.c:thread_block_1
> 
>   e2 = path->last ()->e;
>   if (!e2 || noloop_only)
> {
>   /* If NOLOOP_ONLY is true, we only allow threading through the
>  header of a loop to exit edges.  */
> 
>   /* One case occurs when there was loop header buried in a jump
>  threading path that crosses loop boundaries.  We do not try
>  and thread this elsewhere, so just cancel the jump threading
>  request by clearing the AUX field now.  */
>   if (bb->loop_father != e2->src->loop_father
>   && (!loop_exit_edge_p (e2->src->loop_father, e2)
>   || flow_loop_nested_p (bb->loop_father,
>  e2->dest->loop_father)))
> {
>   /* Since this case is not handled by our special code
>  to thread through a loop header, we must explicitly
>  cancel the threading request here.  */
>   delete_jump_thread_path (path);
>   e->aux = NULL;
>   continue;
> }

Yeah, sure, but I'm not sure if the above check is _really_ testing what it 
wants to test or if the effects it achieves are side effects.  Like in my 
proposed patch: I could also test for existence of loop header in the 
thread and reject that; it would work as well, except that it works 
because any useful thread including a latch (which is the problematic one) 
also includes the header.  I'm not sure if the above check is in the same 
line, or tests for some still another situation.


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-08 Thread Michael Matz via Gcc
Hello,

[lame answer to self]

On Wed, 8 Sep 2021, Michael Matz wrote:

> > > The forward threader guards against this by simply disallowing 
> > > threadings that involve different loops.  As I see
> > 
> > The thread in question (5->9->3) is all within the same outer loop, 
> > though. BTW, the backward threader also disallows threading across 
> > different loops (see path_crosses_loops variable).
...
> Maybe it's possible to not disable threading over latches altogether in 
> the backward threader (like it's tried now), but I haven't looked at the 
> specific situation here in depth, so take my view only as opinion from a 
> large distance :-)

I've now looked at the concrete situation.  So yeah, the whole path is in 
the same loop, crosses the latch, _and there's code following the latch 
on that path_.  (I.e. the latch isn't the last block in the path).  In 
particular, after loop_optimizer_init() (before any threading) we have:

   <bb 3> [local count: 118111600]:
  # j_19 = PHI <0(7), j_13(9)>
  sum_11 = c[j_19];
  if (n_10(D) > 0)
    goto <bb 8>; [89.00%]
  else
    goto <bb 5>; [11.00%]

   <bb 8> [local count: 105119324]:
...

   <bb 5> [local count: 118111600]:
  # sum_21 = PHI 
  c[j_19] = sum_21;
  j_13 = j_19 + 1;
  if (n_10(D) > j_13)
    goto <bb 9>; [89.00%]
  else
    goto ; [11.00%]

   <bb 9> [local count: 105119324]:
  goto <bb 3>; [100.00%]

With bb9 the outer (empty) latch, bb3 the outer header, and bb8 the 
pre-header of inner loop, but more importantly something that's not at the 
start of the outer loop.

Now, any thread that includes the backedge 9->3 _including_ its 
destination (i.e. where the backedge isn't the last to-be-redirected edge) 
necessarily duplicates all code from that destination onto the back edge.  
Here it's the load from c[j] into sum_11.

The important part is the code is emitted onto the back edge, 
conceptually; in reality it's simply included into the (new) latch block 
(the duplicate of bb9, which is bb12 intermediately, then named bb7 after 
cfg_cleanup).

That's what we can't have for some of our structural loop optimizers: 
there must be no code executed after the exit test (e.g. in the latch 
block).  (This requirement makes reasoning about which code is or isn't 
executed completely for an iteration trivial; simply everything in the 
body is always executed; e.g. loop interchange uses this to check that 
there are no memory references after the exit test, because those would 
then be only conditional and hence make loop interchange very awkward).

Note that this situation can't be later rectified anymore: the duplicated 
instructions (because they are memory refs) must remain after the exit 
test.  Only by rerolling/unrotating the loop (i.e. noticing that the 
memory refs on the loop-entry path and on the back edge are equivalent) 
would that be possible, but that's something we aren't capable of.  Even 
if we were that would simply just revert the whole work that the threader 
did, so it's better to not even do that to start with.

I believe something like below would be appropriate, it disables threading 
if the path contains a latch at the non-last position (due to being 
backwards on the non-first position in the array).  I.e. it disables 
rotating the loop if there's danger of polluting the back edge.  It might 
be improved if the blocks following (preceding!) the latch are themselves 
empty because then no code is duplicated.  It might also be improved if 
the latch is already non-empty.  That code should probably only be active 
before the loop optimizers, but currently the backward threader isn't 
differentiating between before/after loop-optims.

I haven't tested this patch at all, except that it fixes the testcase :)


Ciao,
Michael.

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..528a753b886 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -600,6 +600,7 @@ back_threader_profitability::profitable_path_p (const 
vec<basic_block> &m_path,
   loop_p loop = m_path[0]->loop_father;
   bool path_crosses_loops = false;
   bool threaded_through_latch = false;
+  bool latch_within_path = false;
   bool multiway_branch_in_path = false;
   bool threaded_multiway_branch = false;
   bool contains_hot_bb = false;
@@ -725,7 +726,13 @@ back_threader_profitability::profitable_path_p (const 
vec<basic_block> &m_path,
 the last entry in the array when determining if we thread
 through the loop latch.  */
   if (loop->latch == bb)
-   threaded_through_latch = true;
+   {
+ threaded_through_latch = true;
+ if (j != 0)
+   latch_within_path = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, " (latch)");
+   }
 }
 
   gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +852,15 @@ back_threader_profitability::profitable_path_p (const 
vec<basic_block> &m_path,
 "a multiway branch.\n");
   return false;
 }
+
+  if (latch_within_path)
+{
+  if 

Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-08 Thread Michael Matz via Gcc
Hello,

On Wed, 8 Sep 2021, Aldy Hernandez wrote:

> > The forward threader guards against this by simply disallowing 
> > threadings that involve different loops.  As I see
> 
> The thread in question (5->9->3) is all within the same outer loop, 
> though. BTW, the backward threader also disallows threading across 
> different loops (see path_crosses_loops variable).
> 
> > the threading done here should be 7->3 (outer loop entry) to bb 8 
> > rather than one involving the backedge.  Also note the condition is 
> > invariant in the loop and in fact subsumed by the condition outside of 
> > the loop and it should have been simplified by VRP after pass_ch but I 
> > see there's a jump threading pass inbetween pass_ch and the next VRP 
> > which is likely the problem.
> 
> A 7->3->8 thread would cross loops though, because 7 is outside the 
> outer loop.

...
 
> However, even if there are alternate ways of threading this IL, 
> something like 5->9->3 could still happen.  We need a way to disallow 
> this.  I'm having a hard time determining the hammer for this.  I would 
> vote for disabling threading through latches, but it seems the backward 
> threader is aware of this scenario and allows it anyhow (see 
> threaded_through_latch variable).  Ughh.

The backward threader seems to want to be careful with latches, but still 
allow it in some situations, in particular when doing so doesn't create a 
loop with non-simple latches (which is basically a single and empty latch 
block).  If this improvement under discussion leads to creating a 
non-empty latch then those checks aren't restrictive enough (anymore).

I think threading through a latch is always dubious regarding the loop 
structure, it's basically either loop rotation or iteration peeling, even 
if it doesn't cause non-simple latches.  Those transformations should 
probably be left to a loop optimizer, or be only done when destroying loop 
structure is fine (i.e. late).

Maybe it's possible to not disable threading over latches altogether in 
the backward threader (like it's tried now), but I haven't looked at the 
specific situation here in depth, so take my view only as opinion from a 
large distance :-)

Does anything break if you brutally disable any backward threading when 
any of the involved blocks is a latch when current_loops is set?  (I guess 
for that latter test to be effective you want to disable the 
loop_optimizer_init() for the "late" jump thread passes)


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-07 Thread Michael Matz via Gcc
Hello,

On Tue, 7 Sep 2021, Aldy Hernandez via Gcc wrote:

> The regression comes from the simple_reduc_1() function in
> tree-ssa/loop-interchange-9.c, and it involves the following path:
> 
> === BB 5 
> Imports: n_10(D)  j_19
> Exports: n_10(D)  j_13  j_19
>  j_13 : j_19(I)
> n_10(D)   int [1, 257]
> j_19  int [0, 256]
> Relational : (j_13 > j_19)
>  <bb 5> [local count: 118111600]:
> # sum_21 = PHI 
> c[j_19] = sum_21;
> j_13 = j_19 + 1;
> if (n_10(D) > j_13)
>   goto <bb 9>; [89.00%]
> else
>   goto ; [11.00%]

So, this is the outer loops exit block ...

> === BB 9 
> n_10(D)   int [2, 257]
> j_13  int [1, 256]
> Relational : (n_10(D) > j_19)
> Relational : (n_10(D) > j_13)
>  <bb 9> [local count: 105119324]:
> goto <bb 3>; [100.00%]

... this the outer loops latch block ...

> === BB 3 
> Imports: n_10(D)
> Exports: n_10(D)
> n_10(D)   int [1, +INF]
>  <bb 3> [local count: 118111600]:
> # j_19 = PHI 
> sum_11 = c[j_19];
> if (n_10(D) > 0)
>   goto <bb 8>; [89.00%]
> else
>   goto ; [11.00%]

... and this the outer loops header, as well as inner loops pre-header.
The attempted thread hence involves a back-edge (of the outer loop) and a 
loop-entry edge (bb3->bb8) of the inner loop.  Doing that threading would 
destroy some properties that our loop optimizers rely on, e.g. that the 
loop header of a natural loop is only reached by two edges: entry edge and 
back edge, and that the latch blocks are empty and that there's only a 
single latch.  (What exactly we require depends on several flags in 
loops_state_satisfies_p).

> With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that 
> the loop cannot legally iterate outside the size of c[256].  So, j_13 
> lies within [1, 257] and n_10 is [2, +INF] at the end of the path.  
> This allows us to thread BB3 to BB8.

So, IIUC doing this threading would create a new entry to BB8: it would 
then be entered by its natural entry (bb3), by its natural back edge 
(whatever bb that is now) and the new entry from the threaded outer back 
edge (bb9 or bb5?).

The inner loop wouldn't then be recognized as natural anymore and the 
whole nest not as perfect, and hence loop interchange can't easily happen 
anymore.  Most other structural loop optimizers of us would have problems 
with that situation as well.

> All the blocks lie within the same loop, and the path passes all the 
> tests in path_profitable_p().
> 
> Is there something about the shape of this path that should make it 
> invalid in the backward threader, or should the test be marked with 
> -fdisable-tree-threadN (or something else entirely)?

This is a functionality test checking that the very necessary interchange 
in this test does happen with default plus -floop-interchange (with the 
intention of it being enabled by O3 or with profile info).  So no 
additional flags can be added without changing the meaning of this test.

> Note that the 
> backward threader is allowed to peel through loop headers.

Something needs to give way in the path threaders before loop 
optimizations: either threading through back edges, through loop latches 
or through loop headers needs to be disabled.  I think traditionally at 
least threading through latches should be disabled, because doing so 
usually destroys simple loop structure.  I see that profitable_path_p() of 
the backward threader wants to be careful in some situations involving 
loops and latches; possibly it's not careful enough yet for the additional 
power brought by ranger.

See also the additional tests tree-cfgcleanup.c:tree_forwarder_block_p is 
doing when loops are active.

After the structural loop optimizers the threader can go wild and thread 
everything it finds.


Ciao,
Michael.


Re: post-commit hook failure

2021-08-25 Thread Michael Matz via Gcc
Hello,

On Wed, 25 Aug 2021, Martin Liška wrote:

> > remote:   File "hooks/post_receive.py", line 47, in post_receive_one
> > remote: update.send_email_notifications()
> > remote:   File
> > "/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py",
...
> > remote: UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in
> > position 14638: invalid start byte
...
> I believe ChangeLog will be updated correctly as we don't read content 
> of the changes:

But the email notifications (and bugzilla updating) isn't done if that 
place throws, so that should eventually be made more robust in the future.


Ciao,
Michael.


Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1

2021-08-06 Thread Michael Matz via Gcc
Hello,

On Fri, 6 Aug 2021, Stefan Kanthak wrote:

> >> For -ffast-math, where the sign of -0.0 is not handled and the 
> >> spurios invalid floating-point exception for |argument| >= 2**63 is 
> >> acceptable,
> > 
> > This claim would need to be proven in the wild.
> 
> I should have left the "when" after the "and" which I originally had 
> written...
> 
> > |argument| > 2**52 are already integer, and shouldn't generate a 
> > spurious exception from the various to-int conversions, not even in 
> > fast-math mode for some relevant set of applications (at least 
> > SPECcpu).
> > 
> > Btw, have you made speed measurements with your improvements?
> 
> No.
> 
> > The size improvements are obvious, but speed changes can be fairly 
> > unintuitive, e.g. there were old K8 CPUs where the memory loads for 
> > constants are actually faster than the equivalent sequence of shifting 
> > and masking for the >= compares.  That's an irrelevant CPU now, but it 
> > shows that intuition about speed consequences can be wrong.
> 
> I know. I also know of CPUs that can't load a 16-byte wide XMM register 
> in one go, but had to split the load into 2 8-byte loads.
> 
> If the constant happens to be present in L1 cache, it MAY load as fast
> as an immediate.
> BUT: on current CPUs, the code GCC generates
> 
> movsd  .LC1(%rip), %xmm2
> movsd  .LC0(%rip), %xmm4
> movapd %xmm0, %xmm3
> movapd %xmm0, %xmm1
> andpd  %xmm2, %xmm3
> ucomisd %xmm3, %xmm4
> jbe    38 <_trunc+0x38>
>  
> needs
> - 4 cycles if the movsd are executed in parallel and the movapd are
>   handled by the register renamer,
> - 5 cycles if the movsd and the movapd are executed in parallel,
> - 7 cycles else,
> plus an unknown number of cycles if the constants are not in L1.

You also need to consider the case that the to-int converters are called 
in a loop (which ultimately are the only interesting cases for 
performance), where it's possible to load the constants before the loop 
and keep them in registers (at the expense of two registers of pressure, of 
course) effectively removing the loads from cost considerations.  It's all 
tough choices, which is why stuff needs to be measured in some contexts 
:-)

(I do like your sequences btw, it's just not 100% clearcut that they are 
always a speed improvement).


Ciao,
Michael.

> The proposed
> 
> movq   rax, xmm0
> add    rax, rax
> shr    rax, 53
> cmp    eax, 53+1023
> jae    return
> 
> needs 5 cycles (moves from XMM to GPR are AFAIK not handled by the
> register renamer).
> 
> Stefan
> 


Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1

2021-08-06 Thread Michael Matz via Gcc
Hello,

On Fri, 6 Aug 2021, Stefan Kanthak wrote:

> For -ffast-math, where the sign of -0.0 is not handled and the spurios
> invalid floating-point exception for |argument| >= 2**63 is acceptable,

This claim would need to be proven in the wild.  |argument| > 2**52 are 
already integer, and shouldn't generate a spurious exception from the 
various to-int conversions, not even in fast-math mode for some relevant 
set of applications (at least SPECcpu).

Btw, have you made speed measurements with your improvements?  The size 
improvements are obvious, but speed changes can be fairly unintuitive, 
e.g. there were old K8 CPUs where the memory loads for constants are 
actually faster than the equivalent sequence of shifting and masking for 
the >= compares.  That's an irrelevant CPU now, but it shows that 
intuition about speed consequences can be wrong.


Ciao,
Michael.


Re: Optional machine prefix for programs in -B dirs, matching Clang

2021-08-05 Thread Michael Matz via Gcc
Hello,

On Wed, 4 Aug 2021, John Ericson wrote:

> On Wed, Aug 4, 2021, at 10:48 AM, Michael Matz wrote:
> > ... the 'as' and 'ld' executables should be simply found within the 
> > version and target specific GCC libexecsubdir, possibly by being symlinks 
> > to whatever you want.  That's at least how my crosss are configured and 
> > installed, without any --with-{as,ld} options.
> 
> Yes that does work, and that's probably the best option today. I'm just 
> a little wary of unprefixing things programmatically.

The libexecsubdir _is_ the prefix in above case :)

> For some context, this is NixOS where we assemble a ton of cross 
> compilers automatically and each package gets its own isolated many FHS. 
> For that reason I would like to eventually avoid the target-specific 
> subdirs entirely, as I have the separate package trees to disambiguate 
> things. Now, I know that exact same argument could also be used to say 
> target prefixing is also superfluous, but eventually things on the PATH 
> need to be disambiguated.

Sure, which is why (e.g.) cross binutils do install with an arch prefix 
into ${bindir}.  But as GCC has the capability to look into libexecsubdir 
for binaries as well (which quite surely should never be in $PATH on any 
system), I don't see the conflict.

> There is no requirement that the libexec things be named like the bin 
> things, but I sort of feel it's one less thing to remember and makes 
> debugging easier.

Well, the naming scheme of binaries in libexecsubdir reflects the scheme 
that the compilers are using: cc1, cc1plus etc.  Not 
aarch64-unknown-linux-cc1.

> I am sympathetic to the issue that if GCC accepts everything Clang does 
> and vice-versa, we'll Postel's-law ourselves ourselves over time into 
> madness as mistakes are accumulated rather than weeded out.

Right.  I suppose it wouldn't hurt to also look for "${targettriple}-as" 
in $PATH before looking for 'as' (in $PATH).  But I don't think we can (or 
should) switch off looking for 'as' in libexecsubdir.  I don't even see 
why that behaviour should depend on an option, it could just be added by 
default.

> I now have some patches for this change I suppose I could also submit.

Even better :)


Ciao,
Michael.