Re: typeof and operands in named address spaces

2020-11-05 Thread Richard Biener via Gcc
On Thu, Nov 5, 2020 at 9:56 AM Uros Bizjak  wrote:
>
> On Thu, Nov 5, 2020 at 8:26 AM Richard Biener
>  wrote:
> >
> > On Wed, Nov 4, 2020 at 7:33 PM Uros Bizjak via Gcc  wrote:
> > >
> > > Hello!
> > >
> > > I was looking at the recent linux patch series [1] where segment
> > > qualifiers (named address spaces) were introduced to handle percpu
> > > variables. In the patch [2], the author mentions that:
> > >
> > > --q--
> > > Unfortunately, gcc does not provide a way to remove segment
> > > qualifiers, which is needed to use typeof() to create local instances
> > > of the per-cpu variable. For this reason, do not use the segment
> > > qualifier for per-cpu variables, and do casting using the segment
> > > qualifier instead.
> > > --/q--
> > >
> > > The core of the problem can be seen with the following testcase:
> > >
> > > --cut here--
> > > #define foo(_var)\
> > >   ({\
> > > typeof(_var) tmp__;\
> >
> > Looks like writing
> >
> > typeof((typeof(_var))0) tmp__;
> >
> > makes it work.  That assumes there's a literal zero for the type, of course.
>
> This is a very limiting assumption, which already breaks for the following test:
>
> --cut here--
> typedef struct { short a; short b; } pair_t;
>
> #define foo(_var) \
>   ({ \
> typeof((typeof(_var))0) tmp__; \
> asm ("mov %1, %0" : "=r"(tmp__) : "m"(_var));\
> tmp__; \
>   })
>
> __seg_fs pair_t x;
>
> pair_t
> test (void)
> {
>   pair_t y;
>
>   y = foo (x);
>   return y;
> }
> --cut here--
>
> So, what about introducing e.g. typeof_noas (not sure about the name)
> that would simply strip the address space from typeof?

Well, I think we should fix typeof to not retain the address space.  It's
probably our implementation detail of having those in TYPE_QUALS
that exposes the issue; it is not something the standard mandates.

The rvalue trick is to avoid depending on a "fixed" GCC.

Joseph should know how typeof should behave here.

Richard.

> > Basically I try to get at an rvalue for the typeof.
> >
> > Is there a way to query the address space of an object so I can
> > put another variable in the same address space?
>
> I think that would go hand in hand with the above typeof_noas. Perhaps
> typeof_as, that would return the address space of the variable?
>
> > > asm ("mov %1, %0" : "=r"(tmp__) : "m"(_var));\
> > > tmp__;\
> > >   })
> > >
> > > __seg_fs int x;
> > >
> > > int test (void)
> > > {
> > >   int y;
> > >
> > >   y = foo (x);
> > >   return y;
> > > }
> > > --cut here--
> > >
> > > when compiled with -O2 for x86 target, the compiler reports:
> > >
> > > pcpu.c: In function ‘test’:
> > > pcpu.c:14:3: error: ‘__seg_fs’ specified for auto variable ‘tmp__’
> > >
> > > It looks to me that the compiler should remove address space
> > > information when typeof is used; otherwise, there is no way to use
> > > typeof as intended in the above example.
> > >
> > > A related problem is exposed when we want to cast an address from a
> > > named address space to the generic address space (e.g. to use it with
> > > LEA):
> > >
> > > --cut here--
> > > typedef __UINTPTR_TYPE__ uintptr_t;
> > >
> > > __seg_fs int x;
> > >
> > > uintptr_t test (void)
> > > {
> > >   uintptr_t *p = (uintptr_t *) &x;
> >
> >uintptr_t *p = (uintptr_t *)(uintptr_t) &x;
>
> Indeed, this works as expected.
>
> > works around the warning.  I think the wording you cite
> > suggests (uintptr_t) &x here, not sure if there's a reliable
> > way to get the lea with just a uintptr_t operand though.
>
> No, because we have to use the "m" constraint for the LEA. We get the
> following error:
>
> as1.c:10:49: error: memory input 1 is not directly addressable
>
> Uros.


Re: typeof and operands in named address spaces

2020-11-04 Thread Richard Biener via Gcc
On Wed, Nov 4, 2020 at 7:33 PM Uros Bizjak via Gcc  wrote:
>
> Hello!
>
> I was looking at the recent linux patch series [1] where segment
> qualifiers (named address spaces) were introduced to handle percpu
> variables. In the patch [2], the author mentions that:
>
> --q--
> Unfortunately, gcc does not provide a way to remove segment
> qualifiers, which is needed to use typeof() to create local instances
> of the per-cpu variable. For this reason, do not use the segment
> qualifier for per-cpu variables, and do casting using the segment
> qualifier instead.
> --/q--
>
> The core of the problem can be seen with the following testcase:
>
> --cut here--
> #define foo(_var)\
>   ({\
> typeof(_var) tmp__;\

Looks like writing

typeof((typeof(_var))0) tmp__;

makes it work.  That assumes there's a literal zero for the type, of course.
Basically I try to get at an rvalue for the typeof.

Is there a way to query the address space of an object so I can
put another variable in the same address space?

> asm ("mov %1, %0" : "=r"(tmp__) : "m"(_var));\
> tmp__;\
>   })
>
> __seg_fs int x;
>
> int test (void)
> {
>   int y;
>
>   y = foo (x);
>   return y;
> }
> --cut here--
>
> when compiled with -O2 for x86 target, the compiler reports:
>
> pcpu.c: In function ‘test’:
> pcpu.c:14:3: error: ‘__seg_fs’ specified for auto variable ‘tmp__’
>
> It looks to me that the compiler should remove address space
> information when typeof is used; otherwise, there is no way to use
> typeof as intended in the above example.
>
> A related problem is exposed when we want to cast an address from a
> named address space to the generic address space (e.g. to use it with
> LEA):
>
> --cut here--
> typedef __UINTPTR_TYPE__ uintptr_t;
>
> __seg_fs int x;
>
> uintptr_t test (void)
> {
>   uintptr_t *p = (uintptr_t *) &x;

   uintptr_t *p = (uintptr_t *)(uintptr_t) &x;

works around the warning.  I think the wording you cite
suggests (uintptr_t) &x here, not sure if there's a reliable
way to get the lea with just a uintptr_t operand though.

>   uintptr_t addr;
>
>   asm volatile ("lea %1, %0" : "=r"(addr) : "m"(*p));
>
>   return addr;
> }
> --cut here--
>
> The gcc documentation advises explicit casts:
>
> --q--
> This means that explicit casts are required to convert pointers
> between these address spaces and the generic address space.  In
> practice the application should cast to 'uintptr_t' and apply the
> segment base offset that it installed previously.
> --/q--
>
> However, a warning is emitted when compiling the above example:
>
> pcpu1.c: In function ‘test’:
> pcpu1.c:7:18: warning: cast to generic address space pointer from
> disjoint __seg_fs address space pointer
>
> but the desired result is obtained nevertheless.
>
>lea x(%rip), %rax
>
> As shown in the referred patchset, named address spaces have quite
> some optimization potential, please see [1] for the list.
>
> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2053461.html
> [2] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2053462.html
>
> Uros.


Re: Dead Field Elimination and Field Reordering

2020-11-03 Thread Richard Biener via Gcc
On Fri, Oct 30, 2020 at 6:44 PM Erick Ochoa
 wrote:
>
> Hello again,
>
> I've been working on several implementations of data layout
> optimizations for GCC, and I am again kindly requesting for a review of
> the type escape based dead field elimination and field reorg.
>
> Thanks to everyone that has helped me. The main differences from the
> previous commits are fixed style, added comments explaining
> classes and families of functions, graceful exits when unknown
> gimple syntax is encountered, and a heuristic to handle void* casts.
>
> This patchset is organized in the following way:
>
> * Adds a link-time warning if dead fields are detected
> * Allows for the dead-field elimination transformation to be applied
> * Reorganizes fields in structures.
> * Adds some documentation
> * Gracefully does not apply transformation if unknown syntax is detected.
> * Adds a heuristic to handle void* casts
>
> I have tested these transformations as extensively as I can. The way to
> trigger these transformations are:
>
> -fipa-field-reorder and -fipa-type-escape-analysis
>
> Having said that, I welcome all criticisms and will try to address those
> criticisms which I can. Please let me know if you have any questions or
> comments, I will try to answer in a timely manner.
>
> The code is in:
>
>refs/vendors/ARM/heads/arm-struct-reorg-wip
>
> Future work includes extending the current heuristic with ipa-modref and
> extending the analysis to use IPA-PTA as discussed previously.
>
> Few notes:
>
> * Currently it is not safe to use -fipa-sra.
> * I added some tests which are now failing by default. This is because
> there is no way to safely determine within the test case that a layout
> has been transformed. I used to determine that a field was eliminated by
> doing pointer arithmetic on the fields. And since that is not safe, the
> analysis decides not to apply the transformation. There is a way to deal
> with this (add a flag to allow the address of a field to be taken) but I
> wanted to hear other possibilities to see if there is a better option.
> * At this point we’d like to again thank the GCC community for their
> patient help so far on the mailing list and in other channels. And we
> ask for your support in terms of feedback, comments and testing.

I've only had a brief look at the branch - if you want to even have a
remote chance of making this stage1 you should break the branch
up into a proper patch series and post it with appropriate ChangeLogs
and descriptions.

First, standard includes may _not_ be included after including system.h;
in fact, they _need_ to be included from system.h - that includes
standard headers like <set>.  There are "convenient" defines you
can use like

#define INCLUDE_SET
#include "system.h"

and system.h will do what you want.  Needless to say, you should use
GCC's containers and not the standard library ones.

You expose way too many user-visible command-line options.

All the stmt / tree walking "meta" code should be avoided - it
would need to be touched each time we change GIMPLE or
GENERIC.  Instead use available walkers if you really need
it in such a DFS-ish way.

That "IPA SRA is not safe" is of course not an option but hints
at a shortcoming in your safety analysis.

In DFE in handle_pointer_arithmetic_constants you
look at the type of an operand - that's not safe since
this type doesn't carry any semantics.

The DFE code is really hard to follow since it diverges
from GCC style which would do sth like the following
to iterate over all stmt [operands]:

FOR_EACH_BB_FN (fun, bb)
  {
 for (auto gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
walk PHIs
 for (auto gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
walk stmts, for example via walk_gimple_stmt ()
  }

and I'd expect a single visitor with a switch () over the gimple/operation
kind rather than a gazillion overloads where I have no idea what exactly
they visit and how.

In a later change on the branch I see sth like ABORT_IF_NOT_C
where I'm not sure what this is after - you certainly can handle
IL constructs you do not handle conservatively (REFERENCE_TYPE
is the same as POINTER_TYPE - they are exchangeable for the
middle-end; METHOD_TYPE is the same as FUNCTION_TYPE;
a QUAL_UNION_TYPE is not semantically different from
a UNION_TYPE for the middle-end - it only differs in layout handling).

I see you only want to replace the void * "coloring" with modref
so you'll keep using "IPA type escape analysis".  I don't think
that's good to go.

Richard.


Re: Incremental updating of SSA_NAMEs that are passed to b_c_p

2020-10-30 Thread Richard Biener via Gcc
On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich  wrote:
>
> On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
> >  wrote:
> > > Hi,
> > >
> > > I'd like to revive the old discussion regarding the interaction of
> > > jump threading and b_c_p causing the latter to incorrectly return 1
> > > in
> > > certain cases:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > >
> > > The conclusion was that this happening during threading is just a
> > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > > should
> > > not be registered for incremental updating.
> > >
> > > I performed a little experiment and added an assertion to
> > > create_new_def_for:
> > >
> > > --- a/gcc/tree-into-ssa.c
> > > +++ b/gcc/tree-into-ssa.c
> > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple
> > > *stmt,
> > > def_operand_p def)
> > >  {
> > >tree new_name;
> > >
> > > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > > +
> > >timevar_push (TV_TREE_SSA_INCREMENTAL);
> > >
> > >if (!update_ssa_initialized_fn)
> > >
> > > This has of course fired when performing basic block duplication
> > > during
> > > threading, which can be fixed by avoiding duplication of basic
> > > blocks
> > > with b_c_p calls:
> > >
> > > --- a/gcc/tree-cfg.c
> > > +++ b/gcc/tree-cfg.c
> > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block
> > > bb)
> > >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> > >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
> > >   || gimple_call_internal_p (g,
> > > IFN_GOMP_SIMT_XCHG_BFLY)
> > > - || gimple_call_internal_p (g,
> > > IFN_GOMP_SIMT_XCHG_IDX)))
> > > + || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
> > > + || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
> > > return false;
> > >  }
> > >
> > > The second occurrence is a bit more interesting:
> > >
> > > gimple *
> > > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > > {
> > >   ...
> > >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> > >   assertion = gimple_build_assign (NULL_TREE, a);
> > >   ...
> > >   tree new_def = create_new_def_for (v, assertion, NULL);
> > >
> > > The fix is also simple though:
> > >
> > > --- a/gcc/tree-vrp.c
> > > +++ b/gcc/tree-vrp.c
> > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for
> > > (tree
> > > name, assert_locus *loc)
> > >if (loc->expr == loc->val)
> > >  return false;
> > >
> > > +  if (used_by_bcp_p (name))
> > > +return false;
> > > +
> > >cond = build2 (loc->comp_code, boolean_type_node, loc->expr,
> > > loc-
> > > > val);
> > >assert_stmt = build_assert_expr_for (cond, name);
> > >if (loc->e)
> > >
> > > My original testcase did not trigger anything else.  I'm planning
> > > to
> > > check how this affects the testsuite, but before going further I
> > > would
> > > like to ask: is this the right direction now?  To me it looks a
> > > little bit more heavy-handed than the original approach, but maybe
> > > it's
> > > worth it.
> >
> > Disabling threading looks reasonable but I'd rather not disallow BB
> > duplication
> > or renaming.  For VRP I guess we want to instead change
> > register_edge_assert_for* to not register assertions for the result
> > of
> > __builtin_constant_p rather than just not allowing VRP to process it
> > (there are other consumers still).
>
> If I understood Jeff correctly, we should disable incremental updates
> for absolutely all b_c_p arguments, so affecting as many consumers as
> possible must actually be a good thing when this approach is
> considered?
>
> That said, regtest has revealed one more place where this is happening:
> rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi ->
> create_new_def_for.  The reduced code is:

Re: Incremental updating of SSA_NAMEs that are passed to b_c_p

2020-10-28 Thread Richard Biener via Gcc
On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
 wrote:
>
> Hi,
>
> I'd like to revive the old discussion regarding the interaction of
> jump threading and b_c_p causing the latter to incorrectly return 1 in
> certain cases:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
>
> The conclusion was that this happening during threading is just a
> symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p should
> not be registered for incremental updating.
>
> I performed a little experiment and added an assertion to
> create_new_def_for:
>
> --- a/gcc/tree-into-ssa.c
> +++ b/gcc/tree-into-ssa.c
> @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple *stmt,
> def_operand_p def)
>  {
>tree new_name;
>
> +  gcc_checking_assert (!used_by_bcp_p (old_name));
> +
>timevar_push (TV_TREE_SSA_INCREMENTAL);
>
>if (!update_ssa_initialized_fn)
>
> This has of course fired when performing basic block duplication during
> threading, which can be fixed by avoiding duplication of basic blocks
> with b_c_p calls:
>
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
>   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
>   || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
>   || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
> - || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
> + || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
> + || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
> return false;
>  }
>
> The second occurrence is a bit more interesting:
>
> gimple *
> vrp_insert::build_assert_expr_for (tree cond, tree v)
> {
>   ...
>   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
>   assertion = gimple_build_assign (NULL_TREE, a);
>   ...
>   tree new_def = create_new_def_for (v, assertion, NULL);
>
> The fix is also simple though:
>
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for (tree
> name, assert_locus *loc)
>if (loc->expr == loc->val)
>  return false;
>
> +  if (used_by_bcp_p (name))
> +return false;
> +
>cond = build2 (loc->comp_code, boolean_type_node, loc->expr, loc-
> >val);
>assert_stmt = build_assert_expr_for (cond, name);
>if (loc->e)
>
> My original testcase did not trigger anything else.  I'm planning to
> check how this affects the testsuite, but before going further I would
> like to ask: is this the right direction now?  To me it looks a
> little bit more heavy-handed than the original approach, but maybe it's
> worth it.

Disabling threading looks reasonable but I'd rather not disallow BB duplication
or renaming.  For VRP I guess we want to instead change
register_edge_assert_for* to not register assertions for the result of
__builtin_constant_p rather than just not allowing VRP to process it
(there are other consumers still).

Richard.

> Best regards,
> Ilya
>


Re: [__mulvti3] register allocator plays shell game

2020-10-27 Thread Richard Biener via Gcc
On Tue, Oct 27, 2020 at 12:01 AM Stefan Kanthak  wrote:
>
> Richard Biener  wrote:
>
> > On Sun, Oct 25, 2020 at 8:37 PM Stefan Kanthak  
> > wrote:
> >>
> >> Hi,
> >>
> >> for the AMD64 alias x86_64 platform and the __int128_t [DW]type,
> >> the first few lines of the __mulvDI3() function from libgcc2.c
> >>
> >> | DWtype
> >> | __mulvDI3 (DWtype u, DWtype v)
> >> | {
> >> |   /* The unchecked multiplication needs 3 Wtype x Wtype multiplications,
> >> |  but the checked multiplication needs only two.  */
> >> |   const DWunion uu = {.ll = u};
> >> |   const DWunion vv = {.ll = v};
> >> |
> >> |   if (__builtin_expect (uu.s.high == uu.s.low >> (W_TYPE_SIZE - 1), 1))
> >> | {
> >> |   /* u fits in a single Wtype.  */
> >> |   if (__builtin_expect (vv.s.high == vv.s.low >> (W_TYPE_SIZE - 1), 1))
> >> |  {
> >> |/* v fits in a single Wtype as well.  */
> >> |/* A single multiplication.  No overflow risk.  */
> >> |return (DWtype) uu.s.low * (DWtype) vv.s.low;
> >> |  }
> >>
> >> are compiled to this braindead code (obtained from libgcc.a of
> >> GCC 10.2.0 installed on Debian):
> >>
> >>  <__mulvti3>:
> >>    0: 41 55           push   %r13
> >>    2: 49 89 cb        mov    %rcx,%r11
> >>    5: 48 89 d0        mov    %rdx,%rax
> >>    8: 49 89 d2        mov    %rdx,%r10
> >>    b: 41 54           push   %r12
> >>    d: 49 89 fc        mov    %rdi,%r12
> >>   10: 48 89 d1        mov    %rdx,%rcx
> >>   13: 49 89 f0        mov    %rsi,%r8
> >>   16: 4c 89 e2        mov    %r12,%rdx
> >>   19: 49 89 f5        mov    %rsi,%r13
> >>   1c: 53              push   %rbx
> >>   1d: 48 89 fe        mov    %rdi,%rsi
> >>   20: 48 c1 fa 3f     sar    $0x3f,%rdx
> >>   24: 48 c1 f8 3f     sar    $0x3f,%rax
> >>   28: 4c 89 df        mov    %r11,%rdi
> >>   2b: 4c 39 c2        cmp    %r8,%rdx
> >>   2e: 75 18           jne    48 <__mulvti3+0x48>
> >>   30: 4c 39 d8        cmp    %r11,%rax
> >>   33: 75 6b           jne    a0 <__mulvti3+0xa0>
> >>   35: 4c 89 e0        mov    %r12,%rax
> >>   38: 49 f7 ea        imul   %r10
> >>   3b: 5b              pop    %rbx
> >>   3c: 41 5c           pop    %r12
> >>   3e: 41 5d           pop    %r13
> >>   40: c3              retq
> >> ...
> >> ...
> >>
> >> There are EIGHT superfluous MOV instructions here, clobbering the
> >> non-volatile registers RBX, R12 and R13, plus THREE superfluous
> >> PUSH/POP pairs.
> >>
> >> What stops GCC from generating the following straightforward code
> >> (11 instructions in 31 bytes instead of 25 instructions in 65 bytes)?
> >>
> >> .intel_syntax noprefix
> >> __mulvti3:
> >> mov   r8, rdi
> >> mov   r9, rdx
> >> sar   r8, 63
> >> sar   r9, 63
> >> cmp   r8, rsi
> >> jne   __mulvti3+0x48+65-31
> >> cmp   r9, rcx
> >> jne   __mulvti3+0xa0+65-31
> >> mov   rax, rdi
> >> imul  rdx
> >> ret
> >> ...
> >>
> >>
> >> not amused
> >
> > can you open a bugreport please?
>
>
> I'd like to discuss alternative implementations of __mulvti3() first:
> the very first lines of libgcc2.c reads
>
> | /* More subroutines needed by GCC output code on some machines.  */
> | /* Compile this one with gcc.  */
>
>
> Why don't you take advantage of that and implement __mulvDI3() as
> follows?
>
> DWtype
> __mulvDI3 (DWtype multiplicand, DWtype multiplier)
> {
> DWtype product;
>
> if (__builtin_mul_overflow(multiplicand, multiplier, &product))
> abort();
>
> return product;
> }

Sure, that's possible - but the main concern is that such fancy builtins
can end up calling libgcc.  Which isn't really relevant for the
trap-on-overflow variants though.

I guess such a change is desirable (also for the other arithmetic ops)
and code generation improvements should be done for the
__builtin_mul_overflow expansion, outside of libgcc.

Note the __OPvMODE routines are considered legacy and -ftrapv is
dysfunctional in general ...

>
> For the AMD64 platform, instead of the 131 i

Re: Recognizing loop pattern

2020-10-26 Thread Richard Biener via Gcc
On Mon, Oct 26, 2020 at 10:59 AM Stefan Schulze Frielinghaus via Gcc
 wrote:
>
> I'm trying to detect loops of the form
>
>   while (*x != y)
> ++x;
>
> which mimic the behaviour of function rawmemchr.  Note, the size of *x is not
> necessarily one byte.  Thus ultimately I would like to detect such loops and
> replace them with calls to builtins rawmemchr8, rawmemchr16, rawmemchr32 if
> they are implemented in the backend:
>
>   x = __builtin_rawmemchr16(x, y);
>
> I'm wondering whether there is a particular place to detect such loop
> patterns.  For example, in the loop distribution pass GCC recognizes loops
> which mimic the behavior of memset, memcpy, memmove and replaces them with
> calls to their corresponding builtins, respectively.  The pass and in
> particular partitioning of statements depends on whether a statement is used
> outside of a partition or not.  This works perfectly fine for loops which
> implement the mentioned mem* operations since their result is typically
> ignored.  However, the result of a rawmemchr function is/should never be
> ignored.  Therefore, such loops are currently recognized as having a reduction,
> which makes an implementation in the loop distribution pass not
> straightforward to me.
>
> Are there other places where you would detect such loops?  Any comments?

loop distribution is the correct pass to look at.  You're simply the first to
recognize a reduction pattern.  And yes, you'll likely need some generic
adjustments to the code to handle this.

Richard.

> Cheers,
> Stefan


Re: [__mulvti3] register allocator plays shell game

2020-10-26 Thread Richard Biener via Gcc
On Sun, Oct 25, 2020 at 8:37 PM Stefan Kanthak  wrote:
>
> Hi,
>
> for the AMD64 alias x86_64 platform and the __int128_t [DW]type,
> the first few lines of the __mulvDI3() function from libgcc2.c
>
> | DWtype
> | __mulvDI3 (DWtype u, DWtype v)
> | {
> |   /* The unchecked multiplication needs 3 Wtype x Wtype multiplications,
> |  but the checked multiplication needs only two.  */
> |   const DWunion uu = {.ll = u};
> |   const DWunion vv = {.ll = v};
> |
> |   if (__builtin_expect (uu.s.high == uu.s.low >> (W_TYPE_SIZE - 1), 1))
> | {
> |   /* u fits in a single Wtype.  */
> |   if (__builtin_expect (vv.s.high == vv.s.low >> (W_TYPE_SIZE - 1), 1))
> |  {
> |/* v fits in a single Wtype as well.  */
> |/* A single multiplication.  No overflow risk.  */
> |return (DWtype) uu.s.low * (DWtype) vv.s.low;
> |  }
>
> are compiled to this braindead code (obtained from libgcc.a of
> GCC 10.2.0 installed on Debian):
>
>  <__mulvti3>:
>    0: 41 55           push   %r13
>    2: 49 89 cb        mov    %rcx,%r11
>    5: 48 89 d0        mov    %rdx,%rax
>    8: 49 89 d2        mov    %rdx,%r10
>    b: 41 54           push   %r12
>    d: 49 89 fc        mov    %rdi,%r12
>   10: 48 89 d1        mov    %rdx,%rcx
>   13: 49 89 f0        mov    %rsi,%r8
>   16: 4c 89 e2        mov    %r12,%rdx
>   19: 49 89 f5        mov    %rsi,%r13
>   1c: 53              push   %rbx
>   1d: 48 89 fe        mov    %rdi,%rsi
>   20: 48 c1 fa 3f     sar    $0x3f,%rdx
>   24: 48 c1 f8 3f     sar    $0x3f,%rax
>   28: 4c 89 df        mov    %r11,%rdi
>   2b: 4c 39 c2        cmp    %r8,%rdx
>   2e: 75 18           jne    48 <__mulvti3+0x48>
>   30: 4c 39 d8        cmp    %r11,%rax
>   33: 75 6b           jne    a0 <__mulvti3+0xa0>
>   35: 4c 89 e0        mov    %r12,%rax
>   38: 49 f7 ea        imul   %r10
>   3b: 5b              pop    %rbx
>   3c: 41 5c           pop    %r12
>   3e: 41 5d           pop    %r13
>   40: c3              retq
> ...
> ...
>
> There are EIGHT superfluous MOV instructions here, clobbering the
> non-volatile registers RBX, R12 and R13, plus THREE superfluous
> PUSH/POP pairs.
>
> What stops GCC from generating the following straightforward code
> (11 instructions in 31 bytes instead of 25 instructions in 65 bytes)?
>
> .intel_syntax noprefix
> __mulvti3:
> mov   r8, rdi
> mov   r9, rdx
> sar   r8, 63
> sar   r9, 63
> cmp   r8, rsi
> jne   __mulvti3+0x48+65-31
> cmp   r9, rcx
> jne   __mulvti3+0xa0+65-31
> mov   rax, rdi
> imul  rdx
> ret
> ...
>
>
> not amused

can you open a bugreport please?

Richard.

> Stefan Kanthak


Re: Fortran Shared Coarrays for GCC 11

2020-10-23 Thread Richard Biener via Gcc
On October 23, 2020 7:49:04 PM GMT+02:00, "Nicolas König" 
 wrote:
>Hello everyone,
>
>I'm hoping to get shared coarrays for fortran (the devel/coarray_native
>branch) merged for GCC 11 as an experimental feature, but, since the
>library uses a lot of low-level routines, I'm a bit scared of breaking
>bootstrap. It would be great if some people with more unusual setup-ups
>could try building with the branch (at the moment, I have tested it on
>Linux both on Power and x86_64). The focus at the moment is just on
>bootstrap.
>
>Thanks in advance!

The library part could be made opt-in for known working platforms, like we do
for others (through configure.tgt).

Richard. 

>   Nicolas König



Re: Missing functionality

2020-10-22 Thread Richard Biener via Gcc
On Fri, Oct 23, 2020 at 5:10 AM Gary Oblock via Gcc  wrote:
>
> I'm finishing up coding my patterns for the structure reorganization
> optimization. They recognize certain instructions and replace them with
> other instructions. I've got some code that generates gimple which is
> inserted as it's created with gsi_insert_before.  This code is
> something I'd like to use at other points in my optimization so I'd
> like to create a function to do this.
>
> Now comes the interesting bit. I'd like to use gimple sequences and
> after reading the internals documentation I put together something
> using gimple_seq_add_stmt to add the generated gimple to a new
> sequence. After this I tried inserting the sequence into the basic
> block's sequence with gsi_link_seq_before. It turns out there is no
> gsi_link_seq_before! I could probably write one myself, but that raises
> the question: why is there no gsi_link_seq_before, and what should I
> use instead?

gsi_insert_seq_before

> Thanks,
>
> Gary
>
>


Re: LTO slows down calculix by more than 10% on aarch64

2020-10-21 Thread Richard Biener via Gcc
On Wed, Oct 21, 2020 at 12:04 PM Prathamesh Kulkarni
 wrote:
>
> On Thu, 24 Sep 2020 at 16:44, Richard Biener  
> wrote:
> >
> > On Thu, Sep 24, 2020 at 12:36 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Wed, 23 Sep 2020 at 16:40, Richard Biener  
> > > wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 12:11 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Wed, 23 Sep 2020 at 13:22, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Sep 22, 2020 at 6:25 PM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 22 Sep 2020 at 16:36, Richard Biener 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Sep 22, 2020 at 11:37 AM Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 22 Sep 2020 at 12:56, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I obtained perf stat results for following benchmark runs:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -O2:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 7856832.692380  task-clock (msec)  #  1.000 CPUs utilized
> > > > > > > > > > > > > > > >           3758  context-switches   #  0.000 K/sec
> > > > > > > > > > > > > > > >             40  cpu-migrations     #  0.000 K/sec
> > > > > > > > > > > > > > > >          40847  page-faults        #  0.005 K/sec
> > > > > > > > > > > > > > > >  7856782413676  cycles             #  1.000 GHz
> > > > > > > > > > > > > > > >  6034510093417  instructions       #  0.77  insn per cycle
> > > > > > > > > > > > > > > >   363937274287  branches           #  46.321 M/sec
> > > > > > > > > > > > > > > >    48557110132  branch-misses      #  13.34% of all branches
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (ouch, 2+ hours per run is a lot, collecting a profile over a minute
> > > > > > > > > > > > > > > should be enough for this kind of code)
> > > > > > > > > > > > > > >
>

Re: Where did my function go?

2020-10-20 Thread Richard Biener via Gcc
On Wed, Oct 21, 2020 at 5:21 AM Gary Oblock  wrote:
>
> >IPA transforms happens when get_body is called.  With LTO this also
> >trigger reading the body from disk.  So if you want to see all bodies
> >and work on them, you can simply call get_body on everything but it will
> >result in increased memory use since everything will be loaded form disk
> >and expanded (by inlining) at once instead of doing it on per-function
> >basis.
> Jan,
>
> Doing
>
> FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node) node->get_body ();
>
> instead of
>
> FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node) node->get_untransformed_body ();
>
> instantaneously breaks everything...

I think during WPA you cannot do ->get_body (), only
->get_untransformed_body ().  But
we don't know yet where in the IPA process you're experiencing the issue.

Richard.

> Am I missing something?
>
> Gary
> ________
> From: Jan Hubicka 
> Sent: Tuesday, October 20, 2020 4:34 AM
> To: Richard Biener 
> Cc: GCC Development ; Gary Oblock 
> Subject: Re: Where did my function go?
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> > > On Tue, Oct 20, 2020 at 1:02 PM Martin Jambor  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Oct 20 2020, Richard Biener wrote:
> > > > > On Mon, Oct 19, 2020 at 7:52 PM Gary Oblock 
> > > > >  wrote:
> > > > >>
> > > > >> Richard,
> > > > >>
> > > > >> I guess that will work for me. However, since it
> > > > >> was decided to remove an identical function,
> > > > >> why weren't the calls to it adjusted to reflect it?
> > > > >> If the call wasn't transformed that means it will
> > > > >> be mapped at some later time. Is that mapping
> > > > >> available to look at? Because using that would
> > > > >> also be a potential solution (assuming call
> > > > >> graph information exists for the deleted function.)
> > > > >
> > > > > I'm not sure what the transitional cgraph looks like
> > > > > during WPA analysis (which is what we're talking about?),
> > > > > but definitely the IL is unmodified in that state.
> > > > >
> > > > > Maybe Martin has an idea.
> > > > >
> > > >
> > > > Exactly, the cgraph edges are where the correct call information is
> > > > stored until the inlining transformation phase calls
> > > > cgraph_edge::redirect_call_stmt_to_callee on it - inlining is
> > > > a special pass in this regard that performs this IPA-infrastructure
> > > > function in addition to actual inlining.
> > > >
> > > > In the cgraph this means the callee itself but also the information in
> > > > e->callee->clone.param_adjustments, which might be interesting for any
> > > > struct-reorg-like optimizations (...and in future possibly in other
> > > > transformation summaries).
> > > >
> > > > The late IPA passes are in a very unfortunate spot here since they run
> > > > before the real-IPA transformation phases but after unreachable node
> > > > removals and after clone materializations and so can see some but not
> > > > all of the changes performed by real IPA passes.  The reason for that is
> > > > good cache locality when late IPA passes are either not run at all or
> > > > only look at a small portion of the compilation unit.  In such a case IPA
> > > > transformations of a function are followed by all the late passes
> > > > working on the same function.
> > > >
> > > > Late IPA passes are unfortunately second class citizens and I would
> > > > strongly recommend not to use them since they do not fit into our
> > > > otherwise robust IPA framework very well.  We could probably provide a
> > > > mechanism that would allow late IPA passes to run all normal IPA
> > > > transformations on a function so they could clearly see what they are
> > > > looking at, but extensive use would slow compilation down so its use
> > > > would be frowned upon at the very least.
> > >
> > > So IPA PTA does get_body () on the nodes it wants to analyze and I
> > > thought that triggers any pending IPA transforms?
> >

Re: Where did my function go?

2020-10-20 Thread Richard Biener via Gcc
On Tue, Oct 20, 2020 at 1:02 PM Martin Jambor  wrote:
>
> Hi,
>
> On Tue, Oct 20 2020, Richard Biener wrote:
> > On Mon, Oct 19, 2020 at 7:52 PM Gary Oblock  
> > wrote:
> >>
> >> Richard,
> >>
> >> I guess that will work for me. However, since it
> >> was decided to remove an identical function,
> >> why weren't the calls to it adjusted to reflect it?
> >> If the call wasn't transformed that means it will
> >> be mapped at some later time. Is that mapping
> >> available to look at? Because using that would
> >> also be a potential solution (assuming call
> >> graph information exists for the deleted function.)
> >
> > I'm not sure what the transitional cgraph looks like
> > during WPA analysis (which is what we're talking about?),
> > but definitely the IL is unmodified in that state.
> >
> > Maybe Martin has an idea.
> >
>
> Exactly, the cgraph edges are where the correct call information is
> stored until the inlining transformation phase calls
> cgraph_edge::redirect_call_stmt_to_callee on it - inlining is
> a special pass in this regard that performs this IPA-infrastructure
> function in addition to actual inlining.
>
> In the cgraph this means the callee itself but also the information in
> e->callee->clone.param_adjustments, which might be interesting for any
> struct-reorg-like optimizations (...and in future possibly in other
> transformation summaries).
>
> The late IPA passes are in a very unfortunate spot here since they run
> before the real-IPA transformation phases but after unreachable node
> removals and after clone materializations and so can see some but not
> all of the changes performed by real IPA passes.  The reason for that is
> good cache locality when late IPA passes are either not run at all or
> only look at a small portion of the compilation unit.  In such a case IPA
> transformations of a function are followed by all the late passes
> working on the same function.
>
> Late IPA passes are unfortunately second class citizens and I would
> strongly recommend not to use them since they do not fit into our
> otherwise robust IPA framework very well.  We could probably provide a
> mechanism that would allow late IPA passes to run all normal IPA
> transformations on a function so they could clearly see what they are
> looking at, but extensive use would slow compilation down so its use
> would be frowned upon at the very least.

So IPA PTA does get_body () on the nodes it wants to analyze and I
thought that triggers any pending IPA transforms?

Richard.

> Martin
>


Re: Where did my function go?

2020-10-20 Thread Richard Biener via Gcc
On Mon, Oct 19, 2020 at 7:52 PM Gary Oblock  wrote:
>
> Richard,
>
> I guess that will work for me. However, since it
> was decided to remove an identical function,
> why weren't the calls to it adjusted to reflect it?
> If the call wasn't transformed that means it will
> be mapped at some later time. Is that mapping
> available to look at? Because using that would
> also be a potential solution (assuming call
> graph information exists for the deleted function.)

I'm not sure what the transitional cgraph looks like
during WPA analysis (which is what we're talking about?),
but definitely the IL is unmodified in that state.

Maybe Martin has an idea.

Richard.

> Gary
>
>
>
> 
> From: Richard Biener 
> Sent: Sunday, October 18, 2020 11:28 PM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Where did my function go?
>
>
> On Fri, Oct 16, 2020 at 9:59 PM Gary Oblock via Gcc  wrote:
> >
> > I have a tiny program composed of a few functions
> > and one of those functions (setupB) has gone missing.
> > Since I need to walk its GIMPLE, this is a problem.
> >
> > The program:
> >
> > -- aux.h -
> > #include "stdlib.h"
> > typedef struct A A_t;
> > typedef struct A B_t;
> > struct A {
> >   int i;
> >   double x;
> > };
> >
> > #define MAX(x,y) ((x)>(y) ? (x) : (y))
> >
> > extern int max1( A_t *, size_t);
> > extern double max2( B_t *, size_t);
> > extern A_t *setupA( size_t);
> > extern B_t *setupB( size_t);
> > -- aux.c 
> > #include "aux.h"
> > #include "stdlib.h"
> >
> > A_t *
> > setupA( size_t size)
> > {
> >   A_t *data = (A_t *)malloc( size * sizeof(A_t));
> >   size_t i;
> >   for( i = 0; i < size; i++ ) {
> > data[i].i = rand();
> > data[i].x = drand48();
> >   }
> >   return data;
> > }
> >
> > B_t *
> > setupB( size_t size)
> > {
> >   B_t *data = (B_t *)malloc( size * sizeof(B_t));
> >   size_t i;
> >   for( i = 0; i < size; i++ ) {
> > data[i].i = rand();
> > data[i].x = drand48();
> >   }
> >   return data;
> > }
> >
> > int
> > max1( A_t *array, size_t len)
> > {
> >   size_t i;
> >   int result = array[0].i;
> >   for( i = 1; i < len; i++  ) {
> > result = MAX( array[i].i, result);
> >   }
> >   return result;
> > }
> >
> > double
> > max2( B_t *array, size_t len)
> > {
> >   size_t i;
> >   double result = array[0].x;
> >   for( i = 1; i < len; i++  ) {
> > result = MAX( array[i].x, result);
> >   }
> >   return result;
> > }
> > -- main.c -
> > #include "stdio.h"
> >
> > A_t *data1;
> >
> > int
> > main(void)
> > {
> >   B_t *data2 = setupB(200);
> >   data1 = setupA(100);
> >
> >   printf("First %d\n" , max1(data1,100));
> >   printf("Second %e\n", max2(data2,200));
> > }
> > 
> >
> > Here is its GIMPLE dump:
> > (for the sole purpose of letting you see
> > with your own eyes that setupB is indeed missing)
> > 
> > Program:
> >   static struct A_t * data1;
> > struct A_t *  (size_t)
> >
> > ;; Function setupA (setupA, funcdef_no=4, decl_uid=4398, cgraph_uid=6, 
> > symbol_order=48) (executed once)
> >
> > setupA (size_t size)
> > {
> >   size_t i;
> >   struct A_t * data;
> >
> >   <bb 2> [local count: 118111600]:
> >   _1 = size_8(D) * 16;
> >   data_11 = malloc (_1);
> >   goto <bb 4>; [100.00%]
> >
> >   <bb 3> [local count: 955630225]:
> >   _2 = i_6 * 16;
> >   _3 = data_11 + _2;
> >   _4 = rand ();
> >   _3->i = _4;
> >   _5 = drand48 ();
> >   _3->x = _5;
> >   i_16 = i_6 + 1;
> >
> >   <bb 4> [local count: 1073741824]:
> >   # i_6 = PHI <0(2), i_16(3)>
> >   if (i_6 < size_8(D))
> >     goto <bb 3>; [89.00%]
> >   else
> >     goto <bb 5>; [11.00%]
> >
> >   <bb 5> [local count: 118111600]:
> >   return data_11;
> >
> > }
> >

Re: Where did my function go?

2020-10-18 Thread Richard Biener via Gcc
On Fri, Oct 16, 2020 at 9:59 PM Gary Oblock via Gcc  wrote:
>
> I have a tiny program composed of a few functions
> and one of those functions (setupB) has gone missing.
> Since I need to walk its GIMPLE, this is a problem.
>
> The program:
>
> -- aux.h -
> #include "stdlib.h"
> typedef struct A A_t;
> typedef struct A B_t;
> struct A {
>   int i;
>   double x;
> };
>
> #define MAX(x,y) ((x)>(y) ? (x) : (y))
>
> extern int max1( A_t *, size_t);
> extern double max2( B_t *, size_t);
> extern A_t *setupA( size_t);
> extern B_t *setupB( size_t);
> -- aux.c 
> #include "aux.h"
> #include "stdlib.h"
>
> A_t *
> setupA( size_t size)
> {
>   A_t *data = (A_t *)malloc( size * sizeof(A_t));
>   size_t i;
>   for( i = 0; i < size; i++ ) {
> data[i].i = rand();
> data[i].x = drand48();
>   }
>   return data;
> }
>
> B_t *
> setupB( size_t size)
> {
>   B_t *data = (B_t *)malloc( size * sizeof(B_t));
>   size_t i;
>   for( i = 0; i < size; i++ ) {
> data[i].i = rand();
> data[i].x = drand48();
>   }
>   return data;
> }
>
> int
> max1( A_t *array, size_t len)
> {
>   size_t i;
>   int result = array[0].i;
>   for( i = 1; i < len; i++  ) {
> result = MAX( array[i].i, result);
>   }
>   return result;
> }
>
> double
> max2( B_t *array, size_t len)
> {
>   size_t i;
>   double result = array[0].x;
>   for( i = 1; i < len; i++  ) {
> result = MAX( array[i].x, result);
>   }
>   return result;
> }
> -- main.c -
> #include "stdio.h"
>
> A_t *data1;
>
> int
> main(void)
> {
>   B_t *data2 = setupB(200);
>   data1 = setupA(100);
>
>   printf("First %d\n" , max1(data1,100));
>   printf("Second %e\n", max2(data2,200));
> }
> 
>
> Here is its GIMPLE dump:
> (for the sole purpose of letting you see
> with your own eyes that setupB is indeed missing)
> 
> Program:
>   static struct A_t * data1;
> struct A_t *  (size_t)
>
> ;; Function setupA (setupA, funcdef_no=4, decl_uid=4398, cgraph_uid=6, 
> symbol_order=48) (executed once)
>
> setupA (size_t size)
> {
>   size_t i;
>   struct A_t * data;
>
>   <bb 2> [local count: 118111600]:
>   _1 = size_8(D) * 16;
>   data_11 = malloc (_1);
>   goto <bb 4>; [100.00%]
>
>   <bb 3> [local count: 955630225]:
>   _2 = i_6 * 16;
>   _3 = data_11 + _2;
>   _4 = rand ();
>   _3->i = _4;
>   _5 = drand48 ();
>   _3->x = _5;
>   i_16 = i_6 + 1;
>
>   <bb 4> [local count: 1073741824]:
>   # i_6 = PHI <0(2), i_16(3)>
>   if (i_6 < size_8(D))
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 5>; [11.00%]
>
>   <bb 5> [local count: 118111600]:
>   return data_11;
>
> }
>
>
> int  (struct A_t *)
>
> ;; Function max1.constprop (max1.constprop.0, funcdef_no=1, decl_uid=4397, 
> cgraph_uid=5, symbol_order=58) (executed once)
>
> max1.constprop (struct A_t * array)
> {
>   size_t i;
>   int result;
>   size_t len;
>
>[local count: 118111600]:
>
>[local count: 118111600]:
>   result_2 = array_1(D)->i;
>   goto ; [100.00%]
>
>[local count: 955630225]:
>   _4 = i_3 * 16;
>   _5 = array_1(D) + _4;
>   _6 = _5->i;
>   result_8 = MAX_EXPR <_6, result_7>;
>   i_9 = i_3 + 1;
>
>[local count: 1073741824]:
>   # i_3 = PHI <1(2), i_9(3)>
>   # result_7 = PHI 
>   if (i_3 <= 99)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   # result_10 = PHI 
>   return result_10;
>
> }
>
>
> double  (struct B_t *)
>
> ;; Function max2.constprop (max2.constprop.0, funcdef_no=3, decl_uid=4395, 
> cgraph_uid=3, symbol_order=59) (executed once)
>
> max2.constprop (struct B_t * array)
> {
>   size_t i;
>   double result;
>   size_t len;
>
>[local count: 118111600]:
>
>[local count: 118111600]:
>   result_2 = array_1(D)->x;
>   goto ; [100.00%]
>
>[local count: 955630225]:
>   _4 = i_3 * 16;
>   _5 = array_1(D) + _4;
>   _6 = _5->x;
>   if (_6 > result_7)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 477815112]:
>
>[local count: 955630225]:
>   # _10 = PHI 
>   i_8 = i_3 + 1;
>
>[local count: 1073741824]:
>   # i_3 = PHI <1(2), i_8(5)>
>   # result_7 = PHI 
>   if (i_3 <= 199)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   # result_9 = PHI 
>   return result_9;
>
> }
>
>
> int  (void)
>
> ;; Function main (main, funcdef_no=5, decl_uid=4392, cgraph_uid=1, 
> symbol_order=25) (executed once)
>
> main ()
> {
>   struct B_t * data2;
>
>[local count: 1073741824]:
>   data2_6 = setupB (200);
>   _1 = setupA (100);
>   data1 = _1;
>   _2 = max1 (_1, 100);
>   printf ("First %d\n", _2);
>   _3 = max2 (data2_6, 200);
>   printf ("Second %e\n", _3);
>   return 0;
>
> }
> 
> The pass is invoked at this location in passes.def
>
>   /* Simple IPA passes executed after the regular passes.  In WHOPR mode the
>  passes are executed after partitioning and thus s

GCC 11.0.0 Status Report (2020-10-16), Stage 1 ends Nov 15th

2020-10-16 Thread Richard Biener


Status
==

GCC trunk which eventually will become GCC 11 is still open for general
development.  Stage 1 will end on the end of Sunday, Nov 15th 2020
at which point we will transition into Stage 3 which allows for general
bugfixing.

We have accumulated quite a number of regressions, a lot of the
untriaged and eventually stale.  Please help in cleaning up.


Quality Data


Priority  #   Change from last report
---   ---
P1   33   +  33
P2  256   +  35
P3   74   +  47
P4  185   +  12
P5   24   +   2
---   ---
Total P1-P3 363   + 121
Total   572   + 135


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2020-April/000505.html


Re: Question about callgraph and call_stmt

2020-10-13 Thread Richard Biener via Gcc
On Tue, Oct 13, 2020 at 2:40 PM Erick Ochoa wrote:
>
>
>
> On 13/10/2020 13:37, Richard Biener wrote:
> > On Tue, Oct 13, 2020 at 1:17 PM Erick Ochoa
> >  wrote:
> >>
> >> Hi,
> >>
> >> I am analyzing gimple calls during LTO.
> >
> > What's symtab->state at this point?
>
> The state is IPA_SSA_AFTER_INLINING.
>
> >
> >> I found a gimple call statement
> >> s that has the following properties:
> >>
> >> ```
> >> tree fndecl = gimple_call_fndecl(s)
> >> gcc_assert(fndecl)
> >> // That is, the gimple call returns a non-null fndecl.
> >> cgraph_node *n = cgraph_node::get(fndecl);
> >> gcc_assert(!n)
> >> // That is, there is no cgraph_node for this fndecl
> >> ```
> >>
> >> Does anyone know how to obtain the cgraph_node in this case? Or
> >> alternatively, why is there no cgraph_node associated with this call
> >> statement? I have already tried adding TODO_rebuild_cgraph_edges
> >> before running my pass, but I suspect this has more to do with gimple
> >> than with the call graph.
> >
> > what's the particular fndecl?
>
> The callsite is in a function that has been inlined and specialized. The
> callsite calls a function that is not inlined but has also been
> specialized. I'll try reducing my test case a bit.

So you eventually shouldn't look at bodies saved just for inlining
(node->inlined_to)

Richard.

> >
> >>
> >> Thanks!


Re: Question about callgraph and call_stmt

2020-10-13 Thread Richard Biener via Gcc
On Tue, Oct 13, 2020 at 1:17 PM Erick Ochoa wrote:
>
> Hi,
>
> I am analyzing gimple calls during LTO.

What's symtab->state at this point?

>I found a gimple call statement
> s that has the following properties:
>
> ```
> tree fndecl = gimple_call_fndecl(s)
> gcc_assert(fndecl)
> // That is, the gimple call returns a non-null fndecl.
> cgraph_node *n = cgraph_node::get(fndecl);
> gcc_assert(!n)
> // That is, there is no cgraph_node for this fndecl
> ```
>
> Does anyone know how to obtain the cgraph_node in this case? Or
> alternatively, why is there no cgraph_node associated with this call
> statement? I have already tried adding TODO_rebuild_cgraph_edges
> before running my pass, but I suspect this has more to do with gimple
> than with the call graph.

what's the particular fndecl?

>
> Thanks!


Re: GCC DWARF Issue - Frame Pointer Dependency

2020-10-12 Thread Richard Biener via Gcc
On Tue, Oct 13, 2020 at 5:10 AM AJ D via Gcc  wrote:
>
> Hi,
>
>
>
> I have a function for which GCC is generating the following code (just
> showing the relevant snippet here).
>
>
>
> 5a70 :
>
> 5a70:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
>
> 5a75:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
>
> 5a79:   41 ff 72 f8             pushq  -0x8(%r10)
>
> 5a7d:   55                      push   %rbp
>
> 5a7e:   48 89 e5                mov    %rsp,%rbp
>
> 5a81:   41 57                   push   %r15
>
> 5a83:   41 56                   push   %r14
>
> 5a85:   41 55                   push   %r13
>
> 5a87:   41 54                   push   %r12
>
> :
>
> 5b08:   5b                      pop    %rbx
>
> 5b09:   41 5a                   pop    %r10
>
> 5b0b:   41 5c                   pop    %r12
>
> 5b0d:   41 5d                   pop    %r13
>
> 5b0f:   41 5e                   pop    %r14
>
> 5b11:   41 5f                   pop    %r15
>
> 5b13:   5d                      pop    %rbp
>
> => 5b14:   49 8d 62 f8          lea    -0x8(%r10),%rsp
>
> 5b18:   c3                      retq
>
>
>
> I am using a SIGPROF-based CPU profiler (Google CPU Profiler) to profile my
> code. The SIGPROF handler (of the Google CPU Profiler) tries to unwind the
> stack (using libunwind) every time it gets a SIGPROF. And libunwind (used
> for unwinding the stack) uses the DWARF unwind table (emitted by gcc -O3
> -mstackrealign -fomit-frame-pointer).
>
>
> And I noticed that I get a crash every time my code gets interrupted by
> SIGPROF while my program is in the middle of setting / resetting frame
> pointer and the frame pointer %rbp happens to point to the parent/previous
> frame at that point, for example, in instruction *5b14* (shown above with
> => and red).
>
>
>
> => 5b14:   49 8d 62 f8          lea    -0x8(%r10),%rsp
>
>
> DWARF dumped by GCC for the snippet shown above is the following:
>
>
>
> 02f4 0044 02f8 FDE cie=
> pc=5a70..5d7c
>
>   DW_CFA_advance_loc: 5 to 5a75
>
>   DW_CFA_def_cfa: r10 (r10) ofs 0
>
>   DW_CFA_advance_loc: 9 to 5a7e
>
>   DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
>
>   DW_CFA_advance_loc: 11 to 5a89
>
>   DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
>
>   DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
>
>   DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
>
>   DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
>
>   DW_CFA_advance_loc: 5 to 5a8e
>
>   DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
>
>   DW_CFA_advance_loc: 4 to 5a92
>
> >>  DW_CFA_expression: r3 (rbx) (DW_OP_breg6 (rbp): -48)
>
>   DW_CFA_advance_loc1: 121 to 5b0b
>
>   DW_CFA_remember_state
>
> >>  DW_CFA_def_cfa: r10 (r10) ofs 0
>
>   DW_CFA_advance_loc: 13 to 5b18
>
>   DW_CFA_def_cfa: r7 (rsp) ofs 8
>
>   DW_CFA_advance_loc: 8 to 5b20
>
>   DW_CFA_restore_state
>
>
>
> 02f4 0044 02f8 FDE cie=
> pc=5a70..5d7c
>
> LOC   CFA    rbx   rbp   r12   r13   r14   r15   ra
>
> 5a70  rsp+8  u     u     u     u     u     u     c-8
>
> 5a75  r10+0  u     u     u     u     u     u     c-8
>
> 5a7e  r10+0  u     exp   u     u     u     u     c-8
>
> 5a89  r10+0  u     exp   exp   exp   exp   exp   c-8
>
> 5a8e  exp    u     exp   exp   exp   exp   exp   c-8
>
> 5a92  exp    exp   exp   exp   exp   exp   exp   c-8
>
> 5b0b  r10+0  exp   exp   exp   exp   exp   exp   c-8
>
> 5b18  rsp+8  exp   exp   exp   exp   exp   exp   c-8
>
> 5b20  exp    exp   exp   exp   exp   exp   exp   c-8
>
>
>
> And if you see here, the DWARF expression for fetching the CFA is correct,
> but what about the DWARF expression for fetching the value of %rbx?
>
>
>
> >>  DW_CFA_expression: r3 (rbx) (DW_OP_breg6 (rbp): -48)
>
>
>
> The value of %rbx (in DWARF) is "%rbp relative", and since %rbp is pointing
> to the wrong (parent/previous) frame, we will obviously get garbage for the
> value of %rbx.
>
>
>
> If you look at the generated DWARF carefully, pretty much everything is
> ‘%rbp relative’, so values for each of these registers cannot be restored
> in this scenario.
>
>
>
>   DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
>
>   DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
>
>   DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
>
>   DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
>
>
>
> I was just wondering, instead of making these %rbp-relative, could we have
> made them CFA-relative? That would have taken care of this particular
> issue, since the CFA is correctly maintained/restored in this example.
>
>
>
> Another question, is there a known work around for this issue

Re: Loop question

2020-10-05 Thread Richard Biener
On Mon, 5 Oct 2020, Jakub Jelinek wrote:

> Hi!
> 
> Compiling the following testcase with -O2 -fopenmp:
> int a[1][128];
> 
> __attribute__((noipa)) void
> foo (void)
> {
>   #pragma omp for simd schedule (simd: dynamic, 32) collapse(2)
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   a[i][j] += 3;
> }
> 
> int
> main ()
> {
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   {
>   asm volatile ("" : : "r" (&a[0][0]) : "memory");
>   a[i][j] = i + j;
>   }
>   foo ();
>   for (int i = 0; i < 1; i++)
> for (int j = 0; j < 128; j++)
>   if (a[i][j] != i + j + 3)
>   __builtin_abort ();
>   return 0;
> }
> doesn't seem to result in the vectorization I was hoping to see.
> As has been changed recently, I'm only trying to vectorize now the
> innermost loop of the collapse with outer loops around it being normal
> scalar loops like those written in the source and with only omp simd
> it works fine, but for the combined constructs the current thread gets
> assigned some range of logical iterations, therefore I get a pair of
> in this case i and j starting values.
> 
> At the end of ompexp I have:
> ...
>   D.2106 = (unsigned int) D.2105;
>   D.2107 = MIN_EXPR ;
>   D.2103 = D.2107 + .iter.4;
>   goto ; [INV]
> ;;succ:   5
> 
> ;;   basic block 4, loop depth 2
> ;;pred:   5
>   i = i.0;
>   j = j.1;
>   _1 = a[i][j];
>   _2 = _1 + 3;
>   a[i][j] = _2;
>   .iter.4 = .iter.4 + 1;
>   j.1 = j.1 + 1;
> ;;succ:   5
> 
> ;;   basic block 5, loop depth 2
> ;;pred:   4
> ;;3
> ;;7
>   if (.iter.4 < D.2103)
> goto ; [87.50%]
>   else
> goto ; [12.50%]
> ;;succ:   4
> ;;6
> 
> ;;   basic block 6, loop depth 2
> ;;pred:   5
>   i.0 = i.0 + 1;
>   if (i.0 < 1)
> goto ; [87.50%]
>   else
> goto ; [12.50%]
> ;;succ:   8
> ;;7
> 
> ;;   basic block 7, loop depth 2
> ;;pred:   6
>   j.1 = 0;
>   D.2108 = D.2099 - .iter.4;
>   D.2109 = MIN_EXPR ;
>   D.2103 = D.2109 + .iter.4;
>   goto ; [INV]
> 
> I was really hoping bbs 4 and 5 would be one loop (the one I set safelen
> and force_vectorize etc. for) and that basic blocks 6 and 7 would be
> together with that inner loop another loop, but apparently loop discovery
> thinks it is just one loop.
> Any ideas what I'm doing wrong or is there any way how to make it two loops
> (that would also survive all the cfg cleanups until vectorization)?

The early CFG looks like we have a common header with two latches
so it boils down to how we disambiguate those in the end (we seem
to unify the latches via a forwarder).  IIRC OMP lowering builds
loops itself, could it not do the appropriate disambiguation itself?

Richard.

> Essentially, in C I'm trying to have:
> int a[1][128];
> void get_me_start_end (int *, int *);
> void
> foo (void)
> {
>   int start, end, curend, i, j;
>   get_me_start_end (&start, &end);
>   i = start / 128;
>   j = start % 128;
>   curend = start + (end - start > 128 - j ? 128 - j : end - start);
>   goto doit;
>   for (i = 0; i < 1; i++)
> {
>   j = 0;
>   curend = start + (end - start > 128 ? 128 : end - start);
>   doit:;
>   /* I'd use start < curend && j < 128 as condition here, but
>the vectorizer doesn't like that either.  So I went to
>using a single IV.  */
>   for (; start < curend; start++, j++)
> a[i][j] += 3;
> }
> }
> 
> This isn't vectorized with -O3 either for the same reason.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-30 Thread Richard Biener via Gcc
On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson  wrote:
>
> On Tue, Sep 29, 2020 at 11:40 PM Richard Biener
>  wrote:
> > But this also doesn't work on GIMPLE.  On GIMPLE riscv_vlen would
> > be a barrier for code motion if you make it __attribute__((returns_twice))
> > since then abnormal edges distort the CFG in a way preventing such motion.
>
> At the gimple level, all vector operations have an implicit vsetvl, so
> it doesn't matter much how they are sorted.  As long as they don't get
> sorted across an explicit vsetvl that they depend on.  But the normal
> way to use explicit vsetvl is to control a loop, and you can't move
> dependent operations out of the loop, so it tends to work.  Setting
> vsetvl in the middle of a basic block is less useful and less common,
> and very unlikely to work unless you really know what you are doing.
> Basically, RISC-V wasn't designed to work this way, and so you
> probably shouldn't be writing your code this way.  There might be edge
> cases where we aren't handling this right, as we aren't writing code
> this way, and hence we aren't testing this support.  This is still a
> work in progress.
>
> Good RVV code should look more like this:
>
> #include 
> #include 
>
> void saxpy(size_t n, const float a, const float *x, float *y) {
>   size_t l;
>
>   vfloat32m8_t vx, vy;
>
>   for (; (l = vsetvl_e32m8(n)) > 0; n -= l) {
> vx = vle32_v_f32m8(x);
> x += l;
> vy = vle32_v_f32m8(y);
> // vfmacc
> vy = a * vx + vy;
> vse32_v_f32m8(y, vy);
> y += l;
>   }
> }

Ah, ok - that makes sense.

> We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> we are using for testing the vector support.

That doesn't seem to exist (but maybe it's just not on trunk yet).

Richard.

> Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-29 Thread Richard Biener via Gcc
On Tue, Sep 29, 2020 at 9:46 PM Jim Wilson  wrote:
>
> On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc  wrote:
> > I tried to set the "vlen" after the add & multi, as shown in the following 
> > code:
>
> > vf32 x3,x4;
> > void foo1(float16_t* input, float16_t* output, int vlen){
> > vf32 add = x3 + x4;
> > vf32 mul = x3 * x4;
> > __builtin_riscv_vlen(vlen);  //<
> > storevf(&output[0], add);
> > storevf(&output[4], mul);
> > }
>
> Not clear what __builtin_riscv_vlen is doing, or what exactly your
> target is, but the gcc port I did for the RISC-V draft V extension
> creates new fake vector type and vector length registers, like the
> existing fake fp and arg pointer registers, and the vsetvl{i}
> instruction sets the fake vector type and vector length registers, and
> all vector instructions read the fake vector type and vector length
> registers.  That creates the dependence between the instructions that
> prevents reordering.  It is a little more complicated than that, as
> you can have more than one vsetvl{i} instruction setting different
> vector type and/or vector length values, so we have to match on the
> expected values to make sure that vector instructions are tied to the
> right vsetvl{i} instruction.  This is a work in progress, but overall
> it is working pretty well.  This requires changes to the gcc port, as
> you have to add the new fake registers in gcc/config/riscv/riscv.h.
> This isn't something you can do with macros and extended asms.

But this also doesn't work on GIMPLE.  On GIMPLE riscv_vlen would
be a barrier for code motion if you make it __attribute__((returns_twice))
since then abnormal edges distort the CFG in a way preventing such motion.

> See for instance
> 
> https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ
>
> Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-29 Thread Richard Biener via Gcc
On Tue, Sep 29, 2020 at 12:55 PM 夏 晋 via Gcc  wrote:
>
> Hi everyone,
> I tried to set the "vlen" after the add & multi, as shown in the following 
> code:
> ➜
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
> vf32 add = x3 + x4;
> vf32 mul = x3 * x4;
> __builtin_riscv_vlen(vlen);  //<
> storevf(&output[0], add);
> storevf(&output[4], mul);
> }
> but after compilation, the "vlen" is reordered:
> ➜
> foo1:
> lui     a5,%hi(.LANCHOR0)
> addi    a5,a5,%lo(.LANCHOR0)
> addi    a4,a5,64
> vfld    v0,a5
> vfld    v1,a4
> csrw    vlen,a2  //<
> vfadd   v2,v0,v1
> addi    a5,a1,8
> vfmul   v0,v0,v1
> vfst    v2,a1
> vfst    v0,a5
> ret
> And I've tried to add some barrier code shown as the following:
> ➜
> #define barrier() __asm__ __volatile__("": : :"memory")
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
> vf32 add = x3 + x4;
> vf32 mul = x3 * x4;
> barrier();
> __builtin_riscv_vlen(vlen);
> barrier();
> storevf(&output[0], add);
> storevf(&output[4], mul);
> }
> ➜
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
> vf32 add = x3 + x4;
> vf32 mul = x3 * x4;
> __asm__ __volatile__ ("csrw\tvlen,%0" : : "rJ"(vlen) : "memory");
> storevf(&output[0], add);
> storevf(&output[4], mul);
> }
> Both methods produce the same incorrect assembly.
> ===
> But if I tried the code like: (add & multi are using different operands)
> ➜
> vf32 x1,x2;
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
> vf32 add = x3 + x4;
> vf32 mul = x1 * x2;
> __builtin_riscv_vlen(vlen);
> storevf(&output[0], add);
> storevf(&output[4], mul);
> }
> the assembly will be right:
> ➜
> foo1:
> lui     a5,%hi(.LANCHOR0)
> addi    a5,a5,%lo(.LANCHOR0)
> addi    a0,a5,64
> addi    a3,a5,128
> addi    a4,a5,192
> vfld    v1,a5
> vfld    v3,a0
> vfld    v0,a3
> vfld    v2,a4
> vfadd   v1,v1,v3
> vfmul   v0,v0,v2
> csrw    vlen,a2  <
> addi    a5,a1,8
> vfst    v1,a1
> vfst    v0,a5
> ret
>
> Is there any other way of coding this, or a GCC option, that deals
> with this issue?
> Any suggestion would be appreciated. Thank you very much!

You need to present GCC with a data dependence that prevents the re-ordering
for example by adding input/outputs for add/mul like

asm volatile ("csrw\tvlen,%4" : "=r" (add), "=r" (mul) : "0" (add),
"1" (mul), "rJ" (vlen));

Richard.

> Best,
> Jin


Re: Git rejecting branch merge

2020-09-29 Thread Richard Biener via Gcc
On Tue, Sep 29, 2020 at 11:11 AM Jan Hubicka  wrote:
>
> > On Tue, Sep 29, 2020 at 9:17 AM Jan Hubicka  wrote:
> > >
> > > Hello,
> > > I am trying to update me/honza-gcc-benchmark-branch to current trunk
> > > which I do by deleting it, recreating locally and pushing out.
> > >
> > > The problem is that the push fails with:
> > >
> > > remote: *** The following commit was rejected by your 
> > > hooks.commit-extra-checker script (status: 1)
> > > remote: *** commit: 03e87724864a17e22c9b692cc0caa014e9dba6b1
> > > remote: *** The first line of a commit message should be a short
> > > description of the change, not a single word.
> > > remote: error: hook declined to update
> > >
> > > The broken commit is:
> > >
> > > $ git show 03e87724864a17e22c9b692cc0caa014e9dba6b1
> > >
> > > commit 03e87724864a17e22c9b692cc0caa014e9dba6b1
> > > Author: Georg-Johann Lay 
> > > Date:   Tue Jan 14 11:09:38 2020 +0100
> >
> > It's odd that your branch contains a change as old as this?
>
> I use it for benchmarking on LNT and it has indeed been in use for some time.
> >
> > Maybe you can rebase onto sth more recent.
>
> That is what I am trying to do. But my understanding is that in the meantime
> the hook checking for more than one word in the description was implemented,
> and thus I can't.
>
> Perhaps I can just remove the remote branch and create again?

That should work as well.

> Honza
> >
> > >
> > > Typo.
> > > libgcc/
> > > * config/avr/lib1funcs.S (skip): Simplify.
> > >
> > >
> > > I wonder is there a way to get this done?
> > >
> > > Thanks,
> > > Honza


Re: Git rejecting branch merge

2020-09-29 Thread Richard Biener via Gcc
On Tue, Sep 29, 2020 at 9:17 AM Jan Hubicka  wrote:
>
> Hello,
> I am trying to update me/honza-gcc-benchmark-branch to current trunk
> which I do by deleting it, recreating locally and pushing out.
>
> The problem is that the push fails with:
>
> remote: *** The following commit was rejected by your 
> hooks.commit-extra-checker script (status: 1)
> remote: *** commit: 03e87724864a17e22c9b692cc0caa014e9dba6b1
> remote: *** The first line of a commit message should be a short
> description of the change, not a single word.
> remote: error: hook declined to update
>
> The broken commit is:
>
> $ git show 03e87724864a17e22c9b692cc0caa014e9dba6b1
>
> commit 03e87724864a17e22c9b692cc0caa014e9dba6b1
> Author: Georg-Johann Lay 
> Date:   Tue Jan 14 11:09:38 2020 +0100

It's odd that your branch contains a change as old as this?

Maybe you can rebase onto sth more recent.

>
> Typo.
> libgcc/
> * config/avr/lib1funcs.S (skip): Simplify.
>
>
> I wonder is there a way to get this done?
>
> Thanks,
> Honza


Re: On IPA-PTA field sensitivity and pointer expressions

2020-09-25 Thread Richard Biener via Gcc
On Fri, Sep 25, 2020 at 2:27 PM Erick Ochoa
 wrote:
>
>
>
> On 25/09/2020 13:30, Richard Biener wrote:
> > On Fri, Sep 25, 2020 at 9:05 AM Erick Ochoa
> >  wrote:
> >>
> >> Hi,
> >>
> >> I am working on an alias analysis using the points-to information
> >> generated during IPA-PTA. If we look at the varmap varinfo_t array in
> >> gcc/tree-ssa-struct.c, most of the constraint variable info structs
> >> contain a non-null decl field which points to a valid tree in gimple
> >> (which is an SSA variable and a pointer). I am trying to find out a way
> >> to obtain points-to information for pointer expressions. By this, the
> >> concrete example I have in mind is answering the following question:
> >>
> >> What does `astruct->aptrfield` point to?
> >>
> >> Here I have a concrete example:
> >>
> >>
> >> #include 
> >>
> >> struct A { char *f1; struct A *f2;};
> >>
> >> int __GIMPLE(startwith("ipa-pta"))
> >> main (int argc, char * * argv)
> >> {
> >> struct A * p1;
> >> char * pc;
> >> int i;
> >> int _27;
> >>
> >> i_15 = 1;
> >> pc = malloc(100); // HEAP(1)
> >> p1 = malloc (16); // HEAP(2)
> >> p1->f1 = pc;
> >> p1->f2 = p1;
> >> _27 = (int) 0;
> >> return _27;
> >> }
> >>
> >>
> >> Will give the following correct points-to information:
> >>
> >> HEAP(1) = { }
> >> HEAP(2) = { HEAP(1) HEAP(2) }
> >> pc_30 = { HEAP(1) }
> >> p1_32 = { HEAP(2) }
> >>
> >> However, there does not seem to be information printed for either:
> >>
> >> p1->f1
> >> p1->f2
> >>
> >> which I would expect (or like) something like:
> >>
> >> p1_32->0 = { HEAP(1) }
> >> p1_32->64 = { HEAP(2) }
> >>
> >> Looking more closely at the problem, I found that some varinfo_t have a
> >> non-null "complex" field, which holds an array of "complex" constraints
> >> used to handle offsets and dereferences in gimple. For this same gimple
> >> code, we have the following complex constraints for the variable p1_32:
> >>
> >> main.clobber = p1_32 + 64
> >> *p1_32 = pc_30
> >> *p1_32 + 64 = p1_32
> >
> > The issue is that allocated storage is not tracked field-sensitively, since
> > we do not know its layout at the point of allocation (where we allocate
> > the HEAP variable).  There are some exceptions, see what we do
> > for by-reference parameters in create_variable_info_for_1:
> >
> >if (vi->only_restrict_pointers
> >&& !type_contains_placeholder_p (TREE_TYPE (decl_type))
> >&& handle_param
> >&& !bitmap_bit_p (handled_struct_type,
> >  TYPE_UID (TREE_TYPE (decl_type
> >  {
> >varinfo_t rvi;
> >tree heapvar = build_fake_var_decl (TREE_TYPE (decl_type));
> >DECL_EXTERNAL (heapvar) = 1;
> >if (var_can_have_subvars (heapvar))
> >  bitmap_set_bit (handled_struct_type,
> >  TYPE_UID (TREE_TYPE (decl_type)));
> >rvi = create_variable_info_for_1 (heapvar, "PARM_NOALIAS", true,
> >  true, handled_struct_type);
> >if (var_can_have_subvars (heapvar))
> >  bitmap_clear_bit (handled_struct_type,
> >TYPE_UID (TREE_TYPE (decl_type)));
> >rvi->is_restrict_var = 1;
> >insert_vi_for_tree (heapvar, rvi);
> >make_constraint_from (vi, rvi->id);
> >make_param_constraints (rvi);
> >
> > where we create a heapvar with a specific aggregate type.  Generally
> > make_heapvar (for the allocation case) allocates a variable without
> > subfields:
> >
> > static varinfo_t
> > make_heapvar (const char *name, bool add_id)
> > {
> >varinfo_t vi;
> >tree heapvar;
> >
> >heapvar = build_fake_var_decl (ptr_type_node);
> >DECL_EXTERNAL (heapvar) = 1;
> >
> >vi = new_var_info (heapvar, name, add_id);
> >vi->is_heap_var = true;
> >vi->is_unknown_size_var = true;
> >vi->offset = 0;
> >vi->fullsize = ~0;

Re: On IPA-PTA field sensitivity and pointer expressions

2020-09-25 Thread Richard Biener via Gcc
On Fri, Sep 25, 2020 at 9:05 AM Erick Ochoa
 wrote:
>
> Hi,
>
> I am working on an alias analysis using the points-to information
> generated during IPA-PTA. If we look at the varmap varinfo_t array in
> gcc/tree-ssa-struct.c, most of the constraint variable info structs
> contain a non-null decl field which points to a valid tree in gimple
> (which is an SSA variable and a pointer). I am trying to find out a way
> to obtain points-to information for pointer expressions. By this, the
> concrete example I have in mind is answering the following question:
>
> What does `astruct->aptrfield` point to?
>
> Here I have a concrete example:
>
>
> #include 
>
> struct A { char *f1; struct A *f2;};
>
> int __GIMPLE(startwith("ipa-pta"))
> main (int argc, char * * argv)
> {
>struct A * p1;
>char * pc;
>int i;
>int _27;
>
>i_15 = 1;
>pc = malloc(100); // HEAP(1)
>p1 = malloc (16); // HEAP(2)
>p1->f1 = pc;
>p1->f2 = p1;
>_27 = (int) 0;
>return _27;
> }
>
>
> Will give the following correct points-to information:
>
> HEAP(1) = { }
> HEAP(2) = { HEAP(1) HEAP(2) }
> pc_30 = { HEAP(1) }
> p1_32 = { HEAP(2) }
>
> However, there does not seem to be information printed for either:
>
>p1->f1
>p1->f2
>
> which I would expect (or like) something like:
>
>p1_32->0 = { HEAP(1) }
>p1_32->64 = { HEAP(2) }
>
> Looking more closely at the problem, I found that some varinfo_t have a
> non-null "complex" field, which holds an array of "complex" constraints
> used to handle offsets and dereferences in gimple. For this same gimple
> code, we have the following complex constraints for the variable p1_32:
>
> main.clobber = p1_32 + 64
> *p1_32 = pc_30
> *p1_32 + 64 = p1_32

The issue is that allocated storage is not tracked field-sensitively, since
we do not know its layout at the point of allocation (where we allocate
the HEAP variable).  There are some exceptions, see what we do
for by-reference parameters in create_variable_info_for_1:

  if (vi->only_restrict_pointers
  && !type_contains_placeholder_p (TREE_TYPE (decl_type))
  && handle_param
  && !bitmap_bit_p (handled_struct_type,
TYPE_UID (TREE_TYPE (decl_type
{
  varinfo_t rvi;
  tree heapvar = build_fake_var_decl (TREE_TYPE (decl_type));
  DECL_EXTERNAL (heapvar) = 1;
  if (var_can_have_subvars (heapvar))
bitmap_set_bit (handled_struct_type,
TYPE_UID (TREE_TYPE (decl_type)));
  rvi = create_variable_info_for_1 (heapvar, "PARM_NOALIAS", true,
true, handled_struct_type);
  if (var_can_have_subvars (heapvar))
bitmap_clear_bit (handled_struct_type,
  TYPE_UID (TREE_TYPE (decl_type)));
  rvi->is_restrict_var = 1;
  insert_vi_for_tree (heapvar, rvi);
  make_constraint_from (vi, rvi->id);
  make_param_constraints (rvi);

where we create a heapvar with a specific aggregate type.  Generally
make_heapvar (for the allocation case) allocates a variable without
subfields:

static varinfo_t
make_heapvar (const char *name, bool add_id)
{
  varinfo_t vi;
  tree heapvar;

  heapvar = build_fake_var_decl (ptr_type_node);
  DECL_EXTERNAL (heapvar) = 1;

  vi = new_var_info (heapvar, name, add_id);
  vi->is_heap_var = true;
  vi->is_unknown_size_var = true;
  vi->offset = 0;
  vi->fullsize = ~0;
  vi->size = ~0;
  vi->is_full_var = true;

I once attempted to split (i.e., generate subfields for) a variable
on demand during solving, but that never worked well.

So for specific cases like C++ new T we could create heapvars
appropriately typed.  But you have to double-check for correctness
because of may_have_pointers and so on.

> It seems to me that I can probably parse these complex constraints to
> generate the answers which I want. Is this the way this is currently
> being handled in GCC or is there some other standard mechanism for this?

GCC is in the end only interested in points-to sets for SSA names
which never have subfields.  The missing subfields for aggregates
simply make the points-to solution less precise.

Richard.

> Thanks!


Re: LTO slows down calculix by more than 10% on aarch64

2020-09-24 Thread Richard Biener via Gcc
On Thu, Sep 24, 2020 at 12:36 PM Prathamesh Kulkarni
 wrote:
>
> On Wed, 23 Sep 2020 at 16:40, Richard Biener  
> wrote:
> >
> > On Wed, Sep 23, 2020 at 12:11 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Wed, 23 Sep 2020 at 13:22, Richard Biener  
> > > wrote:
> > > >
> > > > On Tue, Sep 22, 2020 at 6:25 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Tue, 22 Sep 2020 at 16:36, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Sep 22, 2020 at 11:37 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 22 Sep 2020 at 12:56, Richard Biener 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I obtained perf stat results for following benchmark 
> > > > > > > > > > > > > runs:
> > > > > > > > > > > > >
> > > > > > > > > > > > > -O2:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 7856832.692380  task-clock (msec) #   
> > > > > > > > > > > > >  1.000 CPUs utilized
> > > > > > > > > > > > >   3758   context-switches 
> > > > > > > > > > > > >  #0.000 K/sec
> > > > > > > > > > > > > 40 cpu-migrations 
> > > > > > > > > > > > > #0.000 K/sec
> > > > > > > > > > > > >  40847  page-faults   
> > > > > > > > > > > > > #0.005 K/sec
> > > > > > > > > > > > >  7856782413676  cycles
> > > > > > > > > > > > >#1.000 GHz
> > > > > > > > > > > > >  6034510093417  instructions  
> > > > > > > > > > > > >  #0.77  insn per cycle
> > > > > > > > > > > > >   363937274287   branches 
> > > > > > > > > > > > >   #   46.321 M/sec
> > > > > > > > > > > > >48557110132   branch-misses
> > > > > > > > > > > > > #   13.34% of all branches
> > > > > > > > > > > >
> > > > > > > > > > > > (ouch, 2+ hours per run is a lot, collecting a profile 
> > > > > > > > > > > > over a minute should be
> > > > > > > > > > > > enough for this kind of code)
> > > > > > > > > > > >
> > > > > > > > > > > > > -O2 with orthonl inlined:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 8319643.114380  task-clock (msec)   #
> > > > > > > > > > > > > 1.000 CPUs utilized
> > > > > > > > > > > > >   4285   context-switches 
> > > > > > > > > > > > > #0.001 K/sec
> > > > > > > > > > > > > 28 cpu-migrations 
> > > > > > > > > > > > >#0.000 K/sec
> > > >

Re: LTO slows down calculix by more than 10% on aarch64

2020-09-23 Thread Richard Biener via Gcc
On Wed, Sep 23, 2020 at 12:11 PM Prathamesh Kulkarni
 wrote:
>
> On Wed, 23 Sep 2020 at 13:22, Richard Biener  
> wrote:
> >
> > On Tue, Sep 22, 2020 at 6:25 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Tue, 22 Sep 2020 at 16:36, Richard Biener  
> > > wrote:
> > > >
> > > > On Tue, Sep 22, 2020 at 11:37 AM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Tue, 22 Sep 2020 at 12:56, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > > I obtained perf stat results for following benchmark runs:
> > > > > > > > > > >
> > > > > > > > > > > -O2:
> > > > > > > > > > >
> > > > > > > > > > > 7856832.692380  task-clock (msec) #
> > > > > > > > > > > 1.000 CPUs utilized
> > > > > > > > > > >   3758   context-switches 
> > > > > > > > > > >  #0.000 K/sec
> > > > > > > > > > > 40 cpu-migrations 
> > > > > > > > > > > #0.000 K/sec
> > > > > > > > > > >  40847  page-faults   
> > > > > > > > > > > #0.005 K/sec
> > > > > > > > > > >  7856782413676  cycles   
> > > > > > > > > > > #1.000 GHz
> > > > > > > > > > >  6034510093417  instructions   #  
> > > > > > > > > > >   0.77  insn per cycle
> > > > > > > > > > >   363937274287   branches   # 
> > > > > > > > > > >   46.321 M/sec
> > > > > > > > > > >48557110132   branch-misses#   
> > > > > > > > > > > 13.34% of all branches
> > > > > > > > > >
> > > > > > > > > > (ouch, 2+ hours per run is a lot, collecting a profile over 
> > > > > > > > > > a minute should be
> > > > > > > > > > enough for this kind of code)
> > > > > > > > > >
> > > > > > > > > > > -O2 with orthonl inlined:
> > > > > > > > > > >
> > > > > > > > > > > 8319643.114380  task-clock (msec)   #
> > > > > > > > > > > 1.000 CPUs utilized
> > > > > > > > > > >   4285   context-switches 
> > > > > > > > > > > #0.001 K/sec
> > > > > > > > > > > 28 cpu-migrations 
> > > > > > > > > > >#0.000 K/sec
> > > > > > > > > > >  40843  page-faults   
> > > > > > > > > > >#0.005 K/sec
> > > > > > > > > > >  8319591038295  cycles  # 
> > > > > > > > > > >1.000 GHz
> > > > > > > > > > >  6276338800377  instructions  #   
> > > > > > > > > > >  0.75  insn per cycle
> > > > > > > > > > >   467400726106   branches  #  
> > > > > > > > > > >  56.180 M/sec
> > > > > > > > > > >45986364011branch-misses   

Re: LTO slows down calculix by more than 10% on aarch64

2020-09-23 Thread Richard Biener via Gcc
On Tue, Sep 22, 2020 at 6:25 PM Prathamesh Kulkarni
 wrote:
>
> On Tue, 22 Sep 2020 at 16:36, Richard Biener  
> wrote:
> >
> > On Tue, Sep 22, 2020 at 11:37 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Tue, 22 Sep 2020 at 12:56, Richard Biener  
> > > wrote:
> > > >
> > > > On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > > I obtained perf stat results for following benchmark runs:
> > > > > > > > >
> > > > > > > > > -O2:
> > > > > > > > >
> > > > > > > > > 7856832.692380  task-clock (msec) #1.000 
> > > > > > > > > CPUs utilized
> > > > > > > > >   3758   context-switches  #  
> > > > > > > > >   0.000 K/sec
> > > > > > > > > 40 cpu-migrations 
> > > > > > > > > #0.000 K/sec
> > > > > > > > >  40847  page-faults   
> > > > > > > > > #0.005 K/sec
> > > > > > > > >  7856782413676  cycles   #
> > > > > > > > > 1.000 GHz
> > > > > > > > >  6034510093417  instructions   #
> > > > > > > > > 0.77  insn per cycle
> > > > > > > > >   363937274287   branches   #   
> > > > > > > > > 46.321 M/sec
> > > > > > > > >48557110132   branch-misses#   
> > > > > > > > > 13.34% of all branches
> > > > > > > >
> > > > > > > > (ouch, 2+ hours per run is a lot, collecting a profile over a 
> > > > > > > > minute should be
> > > > > > > > enough for this kind of code)
> > > > > > > >
> > > > > > > > > -O2 with orthonl inlined:
> > > > > > > > >
> > > > > > > > > 8319643.114380  task-clock (msec)   #1.000 
> > > > > > > > > CPUs utilized
> > > > > > > > >   4285   context-switches #   
> > > > > > > > >  0.001 K/sec
> > > > > > > > > 28 cpu-migrations
> > > > > > > > > #0.000 K/sec
> > > > > > > > >  40843  page-faults  
> > > > > > > > > #0.005 K/sec
> > > > > > > > >  8319591038295  cycles  #
> > > > > > > > > 1.000 GHz
> > > > > > > > >  6276338800377  instructions  #
> > > > > > > > > 0.75  insn per cycle
> > > > > > > > >   467400726106   branches  #   
> > > > > > > > > 56.180 M/sec
> > > > > > > > >45986364011branch-misses  #
> > > > > > > > > 9.84% of all branches
> > > > > > > >
> > > > > > > > So +100e9 branches, but +240e9 instructions and +480e9 cycles, 
> > > > > > > > probably implying
> > > > > > > > that extra instructions are appearing in this loop nest, but 
> > > > > > > > not in the innermost
> > > > > > > > loop. As a reminder for others, the innermost loop has only 3 
> > > > > > > > iterations.
> > > > > > > >
> > > > > > > > > -O2 with orthonl inlined and PRE disabled (this removes the 
> > > > > > > > > extra b

Re: LTO slows down calculix by more than 10% on aarch64

2020-09-22 Thread Richard Biener via Gcc
On Tue, Sep 22, 2020 at 11:37 AM Prathamesh Kulkarni
 wrote:
>
> On Tue, 22 Sep 2020 at 12:56, Richard Biener  
> wrote:
> >
> > On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov  
> > > > > wrote:
> > > > > >
> > > > > > > I obtained perf stat results for following benchmark runs:
> > > > > > >
> > > > > > > -O2:
> > > > > > >
> > > > > > > 7856832.692380  task-clock (msec) #1.000 CPUs 
> > > > > > > utilized
> > > > > > >   3758   context-switches  #
> > > > > > > 0.000 K/sec
> > > > > > > 40 cpu-migrations #   
> > > > > > >  0.000 K/sec
> > > > > > >  40847  page-faults   #   
> > > > > > >  0.005 K/sec
> > > > > > >  7856782413676  cycles   #
> > > > > > > 1.000 GHz
> > > > > > >  6034510093417  instructions   #0.77  
> > > > > > > insn per cycle
> > > > > > >   363937274287   branches   #   
> > > > > > > 46.321 M/sec
> > > > > > >48557110132   branch-misses#   13.34% 
> > > > > > > of all branches
> > > > > >
> > > > > > (ouch, 2+ hours per run is a lot, collecting a profile over a 
> > > > > > minute should be
> > > > > > enough for this kind of code)
> > > > > >
> > > > > > > -O2 with orthonl inlined:
> > > > > > >
> > > > > > > 8319643.114380  task-clock (msec)   #1.000 CPUs 
> > > > > > > utilized
> > > > > > >   4285   context-switches #
> > > > > > > 0.001 K/sec
> > > > > > > 28 cpu-migrations#
> > > > > > > 0.000 K/sec
> > > > > > >  40843  page-faults  #
> > > > > > > 0.005 K/sec
> > > > > > >  8319591038295  cycles  #
> > > > > > > 1.000 GHz
> > > > > > >  6276338800377  instructions  #0.75  
> > > > > > > insn per cycle
> > > > > > >   467400726106   branches  #   56.180 
> > > > > > > M/sec
> > > > > > >45986364011branch-misses  #9.84% 
> > > > > > > of all branches
> > > > > >
> > > > > > So +100e9 branches, but +240e9 instructions and +480e9 cycles, 
> > > > > > probably implying
> > > > > > that extra instructions are appearing in this loop nest, but not in 
> > > > > > the innermost
> > > > > > loop. As a reminder for others, the innermost loop has only 3 
> > > > > > iterations.
> > > > > >
> > > > > > > -O2 with orthonl inlined and PRE disabled (this removes the extra 
> > > > > > > branches):
> > > > > > >
> > > > > > >8207331.088040  task-clock (msec)   #1.000 CPUs 
> > > > > > > utilized
> > > > > > >   2266   context-switches#0.000 
> > > > > > > K/sec
> > > > > > > 32 cpu-migrations   #
> > > > > > > 0.000 K/sec
> > > > > > >  40846  page-faults #
> > > > > > > 0.005 K/sec
> > > > > > >  8207292032467  cycles #   1.000 GHz
> > > > > > >  6035724436440  instructions #0.74  insn 
> > > > > > > p

Re: LTO slows down calculix by more than 10% on aarch64

2020-09-22 Thread Richard Biener via Gcc
On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni
 wrote:
>
> On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov  wrote:
> > > >
> > > > > I obtained perf stat results for following benchmark runs:
> > > > >
> > > > > -O2:
> > > > >
> > > > > 7856832.692380  task-clock (msec)   #  1.000 CPUs utilized
> > > > >           3758  context-switches    #  0.000 K/sec
> > > > >             40  cpu-migrations      #  0.000 K/sec
> > > > >          40847  page-faults         #  0.005 K/sec
> > > > >  7856782413676  cycles              #  1.000 GHz
> > > > >  6034510093417  instructions        #  0.77  insn per cycle
> > > > >   363937274287  branches            #  46.321 M/sec
> > > > >    48557110132  branch-misses       #  13.34% of all branches
> > > >
> > > > (ouch, 2+ hours per run is a lot, collecting a profile over a minute 
> > > > should be
> > > > enough for this kind of code)
> > > >
> > > > > -O2 with orthonl inlined:
> > > > >
> > > > > 8319643.114380  task-clock (msec)   #  1.000 CPUs utilized
> > > > >           4285  context-switches    #  0.001 K/sec
> > > > >             28  cpu-migrations      #  0.000 K/sec
> > > > >          40843  page-faults         #  0.005 K/sec
> > > > >  8319591038295  cycles              #  1.000 GHz
> > > > >  6276338800377  instructions        #  0.75  insn per cycle
> > > > >   467400726106  branches            #  56.180 M/sec
> > > > >    45986364011  branch-misses       #  9.84% of all branches
> > > >
> > > > So +100e9 branches, but +240e9 instructions and +480e9 cycles, probably 
> > > > implying
> > > > that extra instructions are appearing in this loop nest, but not in the 
> > > > innermost
> > > > loop. As a reminder for others, the innermost loop has only 3 
> > > > iterations.
> > > >
> > > > > -O2 with orthonl inlined and PRE disabled (this removes the extra 
> > > > > branches):
> > > > >
> > > > > 8207331.088040  task-clock (msec)   #  1.000 CPUs utilized
> > > > >           2266  context-switches    #  0.000 K/sec
> > > > >             32  cpu-migrations      #  0.000 K/sec
> > > > >          40846  page-faults         #  0.005 K/sec
> > > > >  8207292032467  cycles              #  1.000 GHz
> > > > >  6035724436440  instructions        #  0.74  insn per cycle
> > > > >   364415440156  branches            #  44.401 M/sec
> > > > >    53138327276  branch-misses       #  14.58% of all branches
> > > >
> > > > This seems to match baseline in terms of instruction count, but without 
> > > > PRE
> > > > the loop nest may be carrying some dependencies over memory. I would 
> > > > simply
> > > > check the assembly for the entire 6-level loop nest in question, I hope 
> > > > it's
> > > > not very complicated (though Fortran array addressing...).
> > > >
> > > > > -O2 with orthonl inlined and hoisting disabled:
> > > > >
> > > > > 7797265.206850  task-clock (msec)   #  1.000 CPUs utilized
> > > > >           3139  context-switches    #  0.000 K/sec
> > > > >             20  cpu-migrations      #  0.000 K/sec
> > > > >          40846  page-faults         #  0.005 K/sec
> > > > >  7797221351467  cycles              #  1.000 GHz
> > > > >  6187348757324  instructions        #  0.79  insn per cycle
> > > > >   461840800061  branches            #  59.231 M/sec
> > > > >    26920311761  branch-misses       #  5.83% of all branches
> > > >
> > > > There's a 20e9 reduction in branch misses and a 500e9 reduction in 
> > > > cycle count.
> > > > I don't think the former fully covers the latter (there's also a 90e9 
> > > > reduction
> > > > in insn count).
> > > >
> > > > Given that the inner loop iterates only 3 times, my main suggestion is 
> > > > to
> > > > consider how the profile for the entire loop nest looks like (it's 6 
> > > > loops deep,
> > > > each iterating only 3 times).
> > > >
> > > > > Perf profiles for
> > > > > -O2 -fno-code-hoisting and inlined orthonl:
> > > > > https://people.linaro.or

Re: Import license issue

2020-09-21 Thread Richard Biener via Gcc
On Mon, Sep 21, 2020 at 10:55 AM Andrew Stubbs  wrote:
>
> Ping.

Sorry, but you won't get any help resolving license issues from the
mailing list.
Instead you should eventually ask the SC to "resolve" this issue with the FSF.

Richard.

> On 14/09/2020 17:56, Andrew Stubbs wrote:
> > Hi All,
> >
> > I need to update include/hsa.h to access some newer APIs. The existing
> > file was created by copying from the user manual, thus side-stepping
> > licensing issues, but the updated user manual omits some important
> > details from the APIs I need (mostly the contents of structs and value
> > of enums). Of course, I can go see those details in the source, but
> > that's not the same thing.
> >
> > So, what I would like to do is import the header files I need into the
> > GCC sources; there's precedent for importing (unmodified) copyright
> > files for libffi etc., AFAICT, but of course the license needs to be
> > acceptable.
> >
> > The relevant files are here:
> >
> > https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa.h
> > https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h
> >
> > https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_image.h
> >
> >
> > When I previously enquired about this on IRC I was advised that the
> > Illinois license would be unacceptable because it contains an
> > attribution clause that would require all binary distributors to credit
> > AMD in their documentation, which seems like a reasonable position. I've
> > requested that AMD provide a copy of these specific files with a more
> > acceptable license, and I may yet be successful, but it's not that simple.
> >
> > The problem is that GCC already has this exact same license in
> > libsanitizer/LICENSE.TXT so, again reasonably, AMD want to know why that
> > licence is acceptable and their license is not.
> >
> > Looking at the files myself, there appears to be some kind of dual
> > license thing going on, and the word "Illinois" doesn't actually appear
> > in any libsanitizer source file (many of which contain an Apache license
> > header). Does this mean that the Illinois license is not actually active
> > here? Or is it that it is active and binary distributors really should
> > be obeying this attribution clause already?
> >
> > Can anybody help me untangle this, please?
> >
> > Are the files acceptable, and if not, how is this different from the
> > other cases?
> >
> > Thanks very much
> >
> > Andrew
>


Re: Question about instrumenting gimple

2020-09-16 Thread Richard Biener via Gcc
On Tue, Sep 15, 2020 at 5:14 PM Erick Ochoa
 wrote:
>
> Hi,
>
> I am trying to instrument gimple so that "hello world" is printed after
> each call to malloc. I've tried instrumenting using the following code
>
> static void
> // G points to the gcall which corresponds to malloc
> call_hello_world(gimple* g)
> {
>gimple_stmt_iterator gsi = gsi_start(g);
>
>// create string constant "hello world\n"
>const char* _string = "hello world\n";
>// plus 1 for the null char
>const unsigned _size = strlen(_string) + 1;
>tree _string_cst = build_string (_size, _string);
>
>// create char*
>tree _char_ptr = build_pointer_type(char_type_node);
>
>// create variable hello_string
>tree _var_decl = build_decl(UNKNOWN_LOCATION, VAR_DECL,
> get_identifier("hellostring"), _char_ptr);
>
>// char* hello_string = "hello world\n";
>gassign *assign_stmt = gimple_build_assign(_var_decl, _string_cst);
>gsi_insert_after(&gsi, assign_stmt, GSI_NEW_STMT);
>update_stmt(assign_stmt);

You don't need the statement above; the problem is that you forgot the
ADDR_EXPR around the "hello world\n" string.  You can pass
build_fold_addr_expr (_string_cst) directly as the call argument since
it is an invariant.

>gcall *call_stmt =
> gimple_build_call(builtin_decl_explicit(BUILT_IN_PRINTF), 1, _var_decl);
>gsi_insert_after(&gsi, call_stmt, GSI_NEW_STMT);
>
>update_stmt(call_stmt);
> }
>
> but when GCC is compiled with these changes it segfaults in the
> following place:
>
>
> 0xcca9ff crash_signal
>  /home/eochoa/code/ipa-dlo/gcc/gcc/toplev.c:327
> 0x9b99c0 useless_type_conversion_p(tree_node*, tree_node*)
>  /home/eochoa/code/ipa-dlo/gcc/gcc/gimple-expr.c:71
> 0xd1a5a7 verify_gimple_assign_single
>  /home/eochoa/code/ipa-dlo/gcc/gcc/tree-cfg.c:4440
> 0xd1a5a7 verify_gimple_assign
>  /home/eochoa/code/ipa-dlo/gcc/gcc/tree-cfg.c:4667
> 0xd1a5a7 verify_gimple_stmt
>  /home/eochoa/code/ipa-dlo/gcc/gcc/tree-cfg.c:4932
> 0xd2126b verify_gimple_in_cfg(function*, bool)
>  /home/eochoa/code/ipa-dlo/gcc/gcc/tree-cfg.c:5418
> 0xbd6ca3 execute_function_todo
>  /home/eochoa/code/ipa-dlo/gcc/gcc/passes.c:1992
> 0xbd7a63 do_per_function
>  /home/eochoa/code/ipa-dlo/gcc/gcc/passes.c:1647
> 0xbd7ae3 execute_todo
>  /home/eochoa/code/ipa-dlo/gcc/gcc/passes.c:2046
>
> This tells me that the gimple was ill-formed, likely because of a bad
> type conversion in the assign statement... but it's not immediately
> obvious why that assignment would be ill-formed. Do I have to update
> something, or push_cfun the function that I'm modifying?
>
> Thanks!
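Putting that fix together, the helper might end up looking like the sketch below. This is against GCC's internal API and only compiles inside GCC itself; `build_string_literal` is assumed here as a convenient way to obtain the ADDR_EXPR-wrapped string constant in one step.

```c
/* Sketch only -- requires GCC-internal headers (tree.h, gimple.h, ...).
   G points to the gcall which corresponds to malloc.  */
static void
call_hello_world (gimple *g)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (g);

  const char *str = "hello world\n";
  /* build_string_literal returns the string constant already wrapped in
     the ADDR_EXPR that was missing from the original code, so it can be
     passed directly as a call argument.  */
  tree arg = build_string_literal (strlen (str) + 1, str);

  gcall *call_stmt
    = gimple_build_call (builtin_decl_explicit (BUILT_IN_PRINTF), 1, arg);
  gsi_insert_after (&gsi, call_stmt, GSI_NEW_STMT);
  update_stmt (call_stmt);
}
```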


Re: A problem with one instruction multiple latencies and pipelines

2020-09-07 Thread Richard Biener via Gcc
On Mon, Sep 7, 2020 at 10:46 AM Qian, Jianhua  wrote:
>
> Hi Richard
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, September 7, 2020 3:41 PM
> > To: Qian, Jianhua/钱 建华 
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: A problem with one instruction multiple latencies and pipelines
> >
> > On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua  wrote:
> > >
> > > Hi
> > >
> > > I'm adding a new machine model. I have a problem when writing the
> > > "define_insn_reservation" for instruction scheduling.
> > > How do I write the "define_insn_reservation" for an instruction that
> > > has different latencies and pipelines depending on its operands?
> > >
> > > For example, the ADD (shifted register) instruction in a64fx
> > >
> > > Instruction             Option            Latency  Pipeline
> > > ADD (shifted register)  = 0               1        EX* | EAG*
> > >                         = [1-4] && = LSL  1+1      (EXA + EXA) | (EXB + EXB)
> > >                                           2+1      (EXA + EXA) | (EXB + EXB)
> > >
> > > In aarch64.md the ADD (shifted register) instruction is defined as follows.
> > >  (define_insn "*add_<shift>_<mode>"
> > >   [(set (match_operand:GPI 0 "register_operand" "=r")
> > > (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
> > >   (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
> > >   (match_operand:GPI 3 "register_operand" "r")))]
> > >   ""
> > >   "add\\t%<w>0, %<w>3, %<w>1, <shift> %2"
> > >   [(set_attr "type" "alu_shift_imm")]
> > > )
> > >
> > > It could not be distinguished by the type "alu_shift_imm" when writing
> > > the "define_insn_reservation" for ADD (shifted register).
> > > What should I do?
> >
> > Just a guess - I'm not very familiar with the pipeline modeling, you
> > probably need to expose two alternatives so you can assign a different
> > type to the second one.
> I'm considering such a method, but if I do that I'm afraid it has side
> effects on other machine models of the aarch64 series.
> Some instructions' definitions would be changed in the aarch64.md file.
>
> > Other than that modeling the more restrictive (or permissive?) variant might
> > work good enough in practice.
> Do you mean that an approximate modeling is good enough?

Yes.

> > a64fx is probably out-of-order anyway.
> Yes, a64fx is an out-of-order architecture.
>
> Regards
> Qian
>
> >
> > Richard.
> >
> > > Regards
> > > Qian
> > >
> > >
> > >
> >
>
>
>


Re: A problem with one instruction multiple latencies and pipelines

2020-09-07 Thread Richard Biener via Gcc
On Mon, Sep 7, 2020 at 8:10 AM Qian, Jianhua  wrote:
>
> Hi
>
> I'm adding a new machine model. I have a problem when writing the 
> "define_insn_reservation" for instruction scheduling.
> How to write the "define_insn_reservation" for one instruction that has
> different latencies and pipelines according to its parameters?
>
> For example, the ADD (shifted register) instruction in a64fx
>
> Instruction              Option            Latency  Pipeline
> ADD (shifted register)   = 0               1        EX* | EAG*
>                          = [1-4] && =LSL   1+1      (EXA + EXA) | (EXB + EXB)
>                                            2+1      (EXA + EXA) | (EXB + EXB)
>
> In aarch64.md the ADD (shifted register) instruction is defined as follows:
>  (define_insn "*add_<shift>_<mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>   (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>   (match_operand:GPI 3 "register_operand" "r")))]
>   ""
>   "add\\t%0, %3, %1, <shift> %2"
>   [(set_attr "type" "alu_shift_imm")]
> )
>
> It could not be distinguished by the type "alu_shift_imm" when writing 
> "define_insn_reservation" for ADD (shifted register).
> What should I do?

Just a guess - I'm not very familiar with the pipeline modeling, you
probably need to
expose two alternatives so you can assign a different type to the second one.

Other than that modeling the more restrictive (or permissive?) variant
might work good enough in practice.
a64fx is probably out-of-order anyway.
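Richard's two-alternative suggestion might be sketched roughly like this (a hypothetical illustration only: the unit names, reservation strings, and the extra "alu_shift_imm_slow" type value are invented here, not existing aarch64 or a64fx definitions):

```
;; Sketch: assumes the insn pattern grew a second alternative whose type
;; attribute is a new value "alu_shift_imm_slow" for shift amounts in [1-4].

(define_insn_reservation "a64fx_alu_shift_fast" 1
  (and (eq_attr "tune" "a64fx")
       (eq_attr "type" "alu_shift_imm"))
  "a64fx_ex | a64fx_eag")

(define_insn_reservation "a64fx_alu_shift_slow" 2
  (and (eq_attr "tune" "a64fx")
       (eq_attr "type" "alu_shift_imm_slow"))
  "a64fx_exa, a64fx_exa")
```

Note the cost of this approach: a new "type" value has to be added to the shared aarch64 type attribute, which is why it can have side effects on the other aarch64 machine models.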

Richard.

> Regards
> Qian
>
>
>


Re: Is there a way to look for a tree by its UID?

2020-09-07 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 4:36 PM Erick Ochoa
 wrote:
>
>
>
> On 04/09/2020 15:19, Richard Biener wrote:
> > On Fri, Sep 4, 2020 at 10:13 AM Erick Ochoa
> >  wrote:
> >>
> >>
> >>
> >> On 03/09/2020 12:19, Richard Biener wrote:
> >>> On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc  
> >>> wrote:
> >>>>
> >>>> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
> >>>>> So, I am just wondering is there an interface where I could do something
> >>>>> like:
> >>>>>
> >>>>> ```
> >>>>>   // vars is the field in pt_solution of type bitmap
> >>>>>   EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
> >>>>>   {
> >>>>>  // uid is set
> >>>>>  tree pointed_to = get_tree_with_uid(uid);
> >>>>>   }
> >>>>> ```
> >>>>
> >>>> There is not.
> >>>
> >>> And there cannot be since the solution includes UIDs of
> >>> decls that are "fake" and thus never really existed.
> >>
> >> Hi Richard and Jakub,
> >>
> >> thanks, I was looking for why get_tree_with_uid might be a somewhat bad
> >> idea.
> >>
> >> I am thinking about representing an alias set similarly to the
> >> pt_solution. Instead of having bits set in position of points-to
> >> variables UIDs, I was thinking about having bits set in position of
> >> may-alias variables' UIDs. I think an interface similar to the one I
> >> described can provide a good mechanism of jumping to different aliases,
> >> but I do agree that HEAP variables and shadow_variables (and perhaps
> >> other fake variables) might not be a good idea to keep in the interface
> >> to avoid jumping to trees which do not represent something in gimple.
> >>
> >> Richard, you mentioned in another e-mail that I might want to provide
> >> the alias-sets from IPA-PTA to another pass in a similar way to
> >> ipa_escape_pt. I think using a structure similar to pt_solution for
> >> may-alias solution is a good idea. Again, the bitmap to aliasing
> >> variables in UIDs. However, I think for this solution to be general I
> >> need several of these structs not just one. Ideally one per candidate
> >> alias-set, at most one per variable.
> >
> > Sure, you need one per alias-set.  Indeed you might want to work
> > with bitmaps of varinfo IDs first when computing alias-sets
> > since ...
>
> Yes, that's what I've been doing :)
>
> >>>
> >>> I think you need to first get a set of candidates you want to
> >>> transform (to limit the work done below), then use the
> >>> internal points-to solutions and compute alias sets for this
> >>> set plus the points-to solution this alias-set aliases. >
> >>> You can then keep the candidate -> alias-set ID -> points-to
> >>> relation (thus candidates should not be "all variables" for
> >>> efficiency reasons).
> >>
> >> I think I can use a relatively simple heuristic: start by looking at
> >> malloc statements and obtain the alias-sets for variables which hold
> >> malloc's return values. This should address most efficiency concerns.
> >>
> >> So, I'm thinking about the following:
> >>
> >> * obtain variables which hold the result of malloc. These are the
> >> initial candidates.
> >
> > ... those would be the is_heapvar ones.  Since you can probably
> > only handle the case where all pointers either only point to a
> > single allocation sites result and nothing else or not to it that case
> > looks special and thus easy anyway.
>
> I did a git grep and is_heapvar is gone. But, I believe that I still
> collect these variables as quickly as possible. I iterate over the call
> graph and if I find malloc, then I just look at the callers and collect
> the lhs. This lhs corresponds to the "decl" in the varinfo_t struct.
>
> I then just iterate over the variables in varmap to find matching lhs
> with the decl and computing alias sets by looking at the intersection of
> pt_solution. This seems to work well. I still need to find out whether
> they escape, but it should be simple to do so from here.
>
> >
> >> * for initial candidates compute alias-sets as bitmaps where only "real"
> >> decl UIDs are set. Compute this just before the end of IPA-PTA.

Re: A couple GIMPLE questions

2020-09-06 Thread Richard Biener via Gcc
On September 6, 2020 9:38:45 AM GMT+02:00, Gary Oblock via Gcc 
 wrote:
>>That's not a question? Are you asking why PHIs exist at all?
>>They are the standard way to represent merging in SSA
>>representations. You can iterate on the PHIs of a basic block, etc.
>
>Marc,
>
>I first worked with the SSA form twenty years ago so yes I am
>aware of what a phi is... I've just never seen a compiler eliminate
>an assignment of a variable to a constant and jam the constant into
>the phi where the SSA variable should be. What a phi is all about
>is representing data flow and a constant in the phi doesn't seem
>to be related to that. I can deal with this but it seems that having to
>crawl the phis looking for constants seems baroque. I would hope
>there is a control that can suppress this or a transformation
>that I can invoke to reverse it...

No, there isn't. We happily propagate constants into PHI nodes. 

Richard. 
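For what it's worth, crawling the PHIs for constant arguments is mechanical; a pseudocode sketch against GCC internals (untested):

```
/* Pseudocode: visit every PHI argument in basic block BB and spot
   constants such as the 0B pointer constant.  */
for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
  {
    gphi *phi = gsi.phi ();
    for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
      {
        tree arg = gimple_phi_arg_def (phi, i);
        if (CONSTANT_CLASS_P (arg))
          /* ... rewrite it, e.g. via SET_PHI_ARG_DEF (phi, i, new_val) ...  */;
      }
  }
```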

>
>Thanks,
>
>Gary
>
>From: Marc Glisse 
>Sent: Saturday, September 5, 2020 11:29 PM
>To: Gary Oblock 
>Cc: gcc@gcc.gnu.org 
>Subject: Re: A couple GIMPLE questions
>
>[EXTERNAL EMAIL NOTICE: This email originated from an external sender.
>Please be mindful of safe email handling and proprietary information
>protection practices.]
>
>
>On Sat, 5 Sep 2020, Gary Oblock via Gcc wrote:
>
>> First off one of the questions just me being curious but
>> second is quite serious. Note, this is GIMPLE coming
>> into my optimization and not something I've modified.
>>
>> Here's the C code:
>>
>> type_t *
>> do_comp( type_t *data, size_t len)
>> {
>>  type_t *res;
>>  type_t *x = min_of_x( data, len);
>>  type_t *y = max_of_y( data, len);
>>
>>  res = y;
>>  if ( x < y ) res = 0;
>>  return res;
>> }
>>
>> And here's the resulting GIMPLE:
>>
>> ;; Function do_comp.constprop (do_comp.constprop.0, funcdef_no=5,
>decl_uid=4392, cgraph_uid=3, symbol_order=68) (executed once)
>>
>> do_comp.constprop (struct type_t * data)
>> {
>>  struct type_t * res;
>>  struct type_t * x;
>>  struct type_t * y;
>>  size_t len;
>>
>>   [local count: 1073741824]:
>>
>>   [local count: 1073741824]:
>>  x_2 = min_of_x (data_1(D), 1);
>>  y_3 = max_of_y (data_1(D), 1);
>>  if (x_2 < y_3)
>>goto ; [29.00%]
>>  else
>>goto ; [71.00%]
>>
>>   [local count: 311385128]:
>>
>>   [local count: 1073741824]:
>>  # res_4 = PHI 
>>  return res_4;
>>
>> }
>>
>> The silly question first: in the "if" stmt, how does GCC
>> get those probabilities, which it shows as 29.00% and
>> 71.00%? I believe they should both be 50.00%.
>
>See the profile_estimate pass dump. One branch makes the function
>return
>NULL, which makes gcc guess that it may be a bit less likely than the
>other. Those are heuristics, which are tuned to help on average, but of
>course they are sometimes wrong.
>
>> The serious question is what is going on with this phi?
>>res_4 = PHI 
>>
>> This makes zero sense to me, practically speaking, and how is
>> it supposed to be recognized and used? Note, I really do
>> need to transform the "0B" into something else for my
>> structure reorganization optimization.
>
>That's not a question? Are you asking why PHIs exist at all? They are
>the
>standard way to represent merging in SSA representations. You can
>iterate
>on the PHIs of a basic block, etc.
>
>> CONFIDENTIALITY NOTICE: This e-mail message, including any
>attachments, is for the sole use of the intended recipient(s) and
>contains information that is confidential and proprietary to Ampere
>Computing or its subsidiaries. It is to be used solely for the purpose
>of furthering the parties' business relationship. Any unauthorized
>review, copying, or distribution of this email (or any attachments
>thereto) is strictly prohibited. If you are not the intended recipient,
>please contact the sender immediately and permanently delete the
>original and any copies of this email and any attachments thereto.
>
>Could you please get rid of this when posting on public mailing lists?
>
>--
>Marc Glisse



Re: Is there a way to look for a tree by its UID?

2020-09-04 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 10:13 AM Erick Ochoa
 wrote:
>
>
>
> On 03/09/2020 12:19, Richard Biener wrote:
> > On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc  
> > wrote:
> >>
> >> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
> >>> So, I am just wondering is there an interface where I could do something
> >>> like:
> >>>
> >>> ```
> >>>  // vars is the field in pt_solution of type bitmap
> >>>  EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
> >>>  {
> >>> // uid is set
> >>> tree pointed_to = get_tree_with_uid(uid);
> >>>  }
> >>> ```
> >>
> >> There is not.
> >
> > And there cannot be since the solution includes UIDs of
> > decls that are "fake" and thus never really existed.
>
> Hi Richard and Jakub,
>
> thanks, I was looking for why get_tree_with_uid might be a somewhat bad
> idea.
>
> I am thinking about representing an alias set similarly to the
> pt_solution. Instead of having bits set in position of points-to
> variables UIDs, I was thinking about having bits set in position of
> may-alias variables' UIDs. I think an interface similar to the one I
> described can provide a good mechanism of jumping to different aliases,
> but I do agree that HEAP variables and shadow_variables (and perhaps
> other fake variables) might not be a good idea to keep in the interface
> to avoid jumping to trees which do not represent something in gimple.
>
> Richard, you mentioned in another e-mail that I might want to provide
> the alias-sets from IPA-PTA to another pass in a similar way to
> ipa_escape_pt. I think using a structure similar to pt_solution for
> may-alias solution is a good idea. Again, the bitmap to aliasing
> variables in UIDs. However, I think for this solution to be general I
> need several of these structs not just one. Ideally one per candidate
> alias-set, at most one per variable.

Sure, you need one per alias-set.  Indeed you might want to work
with bitmaps of varinfo IDs first when computing alias-sets
since ...
> >
> > I think you need to first get a set of candidates you want to
> > transform (to limit the work done below), then use the
> > internal points-to solutions and compute alias sets for this
> > set plus the points-to solution this alias-set aliases. >
> > You can then keep the candidate -> alias-set ID -> points-to
> > relation (thus candidates should not be "all variables" for
> > efficiency reasons).
>
> I think I can use a relatively simple heuristic: start by looking at
> malloc statements and obtain the alias-sets for variables which hold
> malloc's return values. This should address most efficiency concerns.
>
> So, I'm thinking about the following:
>
> * obtain variables which hold the result of malloc. These are the
> initial candidates.

... those would be the is_heapvar ones.  Since you can probably
only handle the case where all pointers either only point to a
single allocation sites result and nothing else or not to it that case
looks special and thus easy anyway.

> * for initial candidates compute alias-sets as bitmaps where only "real"
> decl UIDs are set. Compute this just before the end of IPA-PTA.
> * for each alias_set:
>  for each alias:
>map[alias->decl] = alias_set
> * Use this map and the alias-sets bitmaps in pass just after IPA-PTA.
> * Potentially use something similar to get_tree_with_uid but that is
> only valid for trees which are keys in the map.

Hmm, isn't this more than you need?  Given a set of candidates C
you try to form alias-sets so that all members of an alias set A
are members of C, they are layout-compatible and not member
of another alias-set.  Plus no member escapes and all pointers
you can track may only point to a subset of a single alias-set.

From the above, C is what constrains the size of your sets and
the mapping.

> Does this sound reasonable?
>
> >
> > Richard.
> >
> >>  Jakub
> >>


Re: A silly question regarding function types

2020-09-04 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 4:39 AM Gary Oblock via Gcc  wrote:
>
> Note, this isn't a problem; rather, it's something that puzzles me.
>
> On walking a function types argument types this way
>
> for ( arg = TYPE_ARG_TYPES ( func_type);
>arg != NULL;
>arg = TREE_CHAIN ( arg))
> {
>.
>.
>  }
>
> I noticed an extra void argument that didn't exist
> tagged on the end.
>
> I then noticed other code doing this (which I copied:)
>
> for ( arg = TYPE_ARG_TYPES ( func_type);
> arg != NULL && arg != void_list_node;
> arg = TREE_CHAIN ( arg))
>  {
>  .
>  .
>   }
>
> What is going on here???

Without a void_list_node on the end, it's a variadic function.
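That is, for a prototyped `int f (int)` the argument list is `int -> void_list_node`, while `int f (int, ...)` simply ends at NULL, and an unprototyped K&R declaration has a NULL TYPE_ARG_TYPES altogether. A pseudocode sketch of the check:

```
/* Pseudocode: classify a FUNCTION_TYPE's argument list.  */
bool variadic = true;
for (tree arg = TYPE_ARG_TYPES (func_type); arg != NULL; arg = TREE_CHAIN (arg))
  {
    if (arg == void_list_node)
      {
        variadic = false;   /* the "extra void" terminates a fixed list */
        break;
      }
    /* TREE_VALUE (arg) is a real parameter type here.  */
  }
```

GCC's own `stdarg_p ()` performs essentially this test.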

> Thanks,
>
> Gary
>
>
>


Re: about source code location

2020-09-04 Thread Richard Biener via Gcc
On Fri, Sep 4, 2020 at 2:23 AM 易会战 via Gcc  wrote:
>
> I am working on an instrumentation tool, and need to get the location info
> for a gimple statement. I use the location structure to get the info, and it
> works when I use -O1. When I use -O2, sometimes the info seems to be lost and
> I get a line number of zero.  Can anyone tell me how to get the info?

Not all statements have a location; if you encounter such a statement you
need to look at the "surrounding context" to find one.
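One common fallback is to scan neighbouring statements for a usable location; a pseudocode sketch using GCC's iterator helpers (untested, and the helper name is made up):

```
/* Pseudocode: walk backwards from STMT until some statement carries a
   usable location; callers may further fall back to the function's
   DECL_SOURCE_LOCATION.  */
location_t
find_nearby_location (gimple *stmt)
{
  if (gimple_location (stmt) != UNKNOWN_LOCATION)
    return gimple_location (stmt);
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
    if (gimple_location (gsi_stmt (gsi)) != UNKNOWN_LOCATION)
      return gimple_location (gsi_stmt (gsi));
  return UNKNOWN_LOCATION;
}
```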


Re: Types are confused in inlining

2020-09-03 Thread Richard Biener via Gcc
On September 3, 2020 7:59:12 PM GMT+02:00, Gary Oblock 
 wrote:
>>This is absolutely not enough information to guess at the
>>issue ;)
>
>That's fair, I was hoping some mad genius out there would confess to a
>fubar_adjustment phase that was probably at fault. 😉

Ah, well. It's probably your own code that is at fault ;) 

>>I suggest you break at the return stmt of make_ssa_name_fn
>>looking for t->base.u.version == 101 to see where and with
>>which type _101 is created, from there watch *&t->typed.type
>>in case something adjusts the type.
>
>I did the former but I used ssa_name_nodes_created
>instead, which, though harder to get at, is unique.
>Regarding the latter... I guess... But, at various times (on certain
>OS versions of certain machines) watch points have been
>a bit dubious. I assume on a recent Ubuntu release
>on an Intel i7 core this wouldn't be the case???

I never had an issue with watch points. 

Richard. 

>Thanks,
>
>Gary
>
>From: Richard Biener 
>Sent: Wednesday, September 2, 2020 11:31 PM
>To: Gary Oblock 
>Cc: gcc@gcc.gnu.org 
>Subject: Re: Types are confused in inlining
>
>
>
>On Wed, Sep 2, 2020 at 10:19 PM Gary Oblock via Gcc 
>wrote:
>>
>> I'm not accusing inlining of having problems but I really
>> need to understand what's going on in this situation so I can
>> fix my optimization.
>>
>> The error given is:
>> main.c: In function ‘main’:
>> main.c:5:1: error: non-trivial conversion in ‘ssa_name’
>> 5 | main(void)
>>   | ^
>> struct type_t *
>> unsigned long
>> _101 = dedangled_97;
>> during GIMPLE pass: fixup_cfg
>> etc.
>> etc.
>>
>> I put a conditional breakpoint in gdb where both
>> _101 and dedangled_97 were created and low
>> and behold they were both set to "unsigned long".
>> Does anybody have a clue as to how "_101" got
>> changed from "unsigned long" to "struct type_t *"?
>> Note, the latter is a meaningful type in my program.
>> (I'm trying to replace all instances of the former as
>> part of a structure reorganization optimization.) I should
>> mention that this GIMPLE stmt is the one that moves
>> the value computed in an inlined function into the body
>> of code where the inlining took place.
>
>This is absolutely not enough information to guess at the
>issue ;)
>
>I suggest you break at the return stmt of make_ssa_name_fn
>looking for t->base.u.version == 101 to see where and with
>which type _101 is created, from there watch *&t->typed.type
>in case something adjusts the type.
>
>> Thanks,
>>
>> Gary Oblock
>>
>>
>>
>>
>>



Re: Is there a way to look for a tree by its UID?

2020-09-03 Thread Richard Biener via Gcc
On Thu, Sep 3, 2020 at 10:58 AM Jakub Jelinek via Gcc  wrote:
>
> On Thu, Sep 03, 2020 at 10:22:52AM +0200, Erick Ochoa wrote:
> > So, I am just wondering is there an interface where I could do something
> > like:
> >
> > ```
> > // vars is the field in pt_solution of type bitmap
> > EXECUTE_IF_SET_IN_BITMAP (vars, 0, uid, bi)
> > {
> >// uid is set
> >tree pointed_to = get_tree_with_uid(uid);
> > }
> > ```
>
> There is not.

And there cannot be since the solution includes UIDs of
decls that are "fake" and thus never really existed.

I think you need to first get a set of candidates you want to
transform (to limit the work done below), then use the
internal points-to solutions and compute alias sets for this
set plus the points-to solution this alias-set aliases.

You can then keep the candidate -> alias-set ID -> points-to
relation (thus candidates should not be "all variables" for
efficiency reasons).

Richard.

> Jakub
>


Re: Types are confused in inlining

2020-09-02 Thread Richard Biener via Gcc
On Wed, Sep 2, 2020 at 10:19 PM Gary Oblock via Gcc  wrote:
>
> I'm not accusing inlining of having problems but I really
> need to understand what's going on in this situation so I can
> fix my optimization.
>
> The error given is:
> main.c: In function ‘main’:
> main.c:5:1: error: non-trivial conversion in ‘ssa_name’
> 5 | main(void)
>   | ^
> struct type_t *
> unsigned long
> _101 = dedangled_97;
> during GIMPLE pass: fixup_cfg
> etc.
> etc.
>
> I put a conditional breakpoint in gdb where both
> _101 and dedangled_97 were created and low
> and behold they were both set to "unsigned long".
> Does anybody have a clue as to how "_101" got
> changed from "unsigned long" to "struct type_t *"?
> Note, the latter is a meaningful type in my program.
> (I'm trying to replace all instances of the former as
> part of a structure reorganization optimization.) I should
> mention that this GIMPLE stmt is the one that moves
> the value computed in an inlined function into the body
> of code where the inlining took place.

This is absolutely not enough information to guess at the
issue ;)

I suggest you break at the return stmt of make_ssa_name_fn
looking for t->base.u.version == 101 to see where and with
which type _101 is created, from there watch *&t->typed.type
in case something adjusts the type.
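As a concrete (hypothetical) gdb session for that recipe, where NNN is a hand-picked line at the return statement of make_ssa_name_fn in tree-ssanames.c and 101 is the SSA version from the error:

```
# NNN: line of the return stmt in make_ssa_name_fn, picked by hand
(gdb) break tree-ssanames.c:NNN if t->base.u.version == 101
(gdb) run
(gdb) watch -location t->typed.type
(gdb) continue
```

The `-location` flag makes the watchpoint track the object's address, so it keeps firing after `t` goes out of scope and stops wherever the type is overwritten.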

> Thanks,
>
> Gary Oblock
>
>
>
>
>


Re: Question about exporting computing alias sets

2020-09-02 Thread Richard Biener via Gcc
On Wed, Sep 2, 2020 at 10:04 AM Erick Ochoa
 wrote:
>
> Hello,
>
> I am trying to find out all pointers which alias a pointer and place
> them in a set.
>
> I am using `ptr_derefs_may_alias_p` to find out if two pointers may
> point to the same memory location. I think this yields conservative
> results (i.e., when it cannot be proven that two pointers do not alias, it
> will give the result "true"). This is what I need.
>
> To collect all pointers which alias pointer "p", this means is that I
> have to collect all pointer variables and all pointer expressions in the
> program and then call `ptr_derefs_may_alias_p` O(n) times to find the
> set of pointers which alias pointer "p". If I have to do this for all
> pointers in the program, that would mean O(n^2).
>
> This also implies that in order for my sets to be correct I need to
> collect all pointer variables and all pointer expressions. I think that
> a better idea would be to have a list of all bitmap solutions and, for
> every variable whose bitmap has position j set, assign that variable to
> alias set j. In pseudocode:
>
> compute_alias_sets ()
> {
>// create array of sets of same length as varmap
>// call it alias_map
>for (i = 0; i < varmap.length (); i++)
>{
>  bitmap sol = get_varinfo (i)->solution;
>  EXECUTE_IF_SET_IN_BITMAP (sol, 0, j, bi)
>  {
>// variable i may point to j, so i joins j's alias set
>alias_map[j].insert (get_varinfo (i));
>  }
>}
> }
>
> I think this is a better implementation. I don't need to worry about
> collecting all pointer expressions and I think varmap also contains
> declarations. (I think IPA-PTA already has "collected" all pointer
> expressions/declarations in varmap and created constraint variables for
> them.) *Would there be an issue with exporting varmap and delaying its
> deletion to this pass I'm working on?* This pass is intended to be run
> immediately after IPA-PTA. After this, I think I would still need to do
> some refining on the "alias_map" since (I think) there are more
> constraint variables than gimple variables in the partition, but the
> information I look for should be able to be derived from here.

I think you should add the code to compute your "alias sets" to
IPA-PTA if struct-reorg is enabled and export those like
we "export" the IPA escaped set (ipa_escaped_pt) to later
passes.

I agree that you want to work on the points-to solutions rather
than start from pointers and expressions in GIMPLE.

Richard.

> Thanks!


Re: [RFC] Add new flag to specify output constraint in match.pd

2020-09-02 Thread Richard Biener via Gcc
On Wed, Sep 2, 2020 at 9:35 AM Feng Xue OS  wrote:
>
> >
> >>
> >> >>   There is a match-folding issue derived from pr94234.  A piece of code 
> >> >> like:
> >> >>
> >> >>   int foo (int n)
> >> >>   {
> >> >>  int t1 = 8 * n;
> >> >>  int t2 = 8 * (n - 1);
> >> >>
> >> >>  return t1 - t2;
> >> >>   }
> >> >>
> >> >>  It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) 
> >> >> * C", and
> >> >>  be folded to constant "8". But this folding will fail if both v1 and 
> >> >> v2 have
> >> >>  multiple uses, as the following code.
> >> >>
> >> >>   int foo (int n)
> >> >>   {
> >> >>  int t1 = 8 * n;
> >> >>  int t2 = 8 * (n - 1);
> >> >>
> >> >>  use_fn (t1, t2);
> >> >>  return t1 - t2;
> >> >>   }
> >> >>
> >> >>  Given an expression with non-single-use operands, folding it will 
> >> >> introduce
> >> >>  duplicated computation in most situations, and is deemed to be 
> >> >> unprofitable.
> >> >>  But it is always beneficial if final result is a constant or existing 
> >> >> SSA value.
> >> >>
> >> >>  And the rule is:
> >> >>   (simplify
> >> >>(plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
> >> >>(if ((!ANY_INTEGRAL_TYPE_P (type)
> >> >> || TYPE_OVERFLOW_WRAPS (type)
> >> >> || (INTEGRAL_TYPE_P (type)
> >> >> && tree_expr_nonzero_p (@0)
> >> >> && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION 
> >> >> (type)
> >> >>/* If @1 +- @2 is constant require a hard single-use on either
> >> >>   original operand (but not on both).  */
> >> >>&& (single_use (@3) || single_use (@4)))   <- control 
> >> >> whether match or not
> >> >> (mult (plusminus @1 @2) @0)))
> >> >>
> >> >>  Current matcher only provides a way to check something before folding,
> >> >>  but no mechanism to affect decision after folding. If has, for the 
> >> >> above
> >> >>  case, we can let it go when we find result is a constant.
> >> >
> >> > :s already has a counter-measure where it still folds if the output is at
> >> > most one operation. So this transformation has a counter-counter-measure
> >> > of checking single_use explicitly. And now we want a counter^3-measure...
> >> >
> >> Counter-measure is key factor to matching-cost.  ":s" seems to be somewhat
> >> coarse-grained. And here we do need more control over it.
> >>
> >> But ideally, we could decouple these counter-measures from definitions of
> >> match-rule, and let gimple-matcher get a more reasonable match-or-not
> >> decision based on these counters. Anyway, it is another story.
> >>
> >> >>  Like the way to describe input operand using flags, we could also add
> >> >>  a new flag to specify this kind of constraint on output that we expect
> >> >>  it is a simple gimple value.
> >> >>
> >> >>  Proposed syntax is
> >> >>
> >> >>   (opcode:v{ condition } )
> >> >>
> >> >>  The char "v" stands for gimple value, if more descriptive, other char 
> >> >> is
> >> >>  preferred. "condition" enclosed by { } is an optional c-syntax 
> >> >> condition
> >> >>  expression. If present, only when "condition" is met, matcher will 
> >> >> check
> >> >>  whether folding result is a gimple value using
> >> >>  gimple_simplified_result_is_gimple_val ().
> >> >>
> >> >>  Since there is no SSA concept in GENERIC, this is only for 
> >> >> GIMPLE-match,
> >> >>  not GENERIC-match.
> >> >>
> >> >>  With this syntax, the rule is changed to
> >> >>
> >> >>  #Form 1:
> >> >>   (simplify
> >> >>(plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
> >> >>(if ((!ANY_INTEGRAL_TYPE_P (type)
> >> >> || TYPE_OVERFLOW_WRAPS (type)
> >> >> || (INTEGRAL_TYPE_P (type)
> >> >> && tree_expr_nonzero_p (@0)
> >> >> && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION 
> >> >> (type))
> >> >>( if (!single_use (@3) && !single_use (@4))
> >> >>   (mult:v (plusminus @1 @2) @0)))
> >> >>   (mult (plusminus @1 @2) @0)
> >> >
> >> > That seems to match what you can do with '!' now (that's very recent).
> >
> > It's also what :s does but a slight bit more "local".  When any operand is
> > marked :s and it has more than a single-use we only allow simplifications
> > that do not require insertion of extra stmts.  So basically the above 
> > pattern
> > doesn't behave any different than if you omit your :v.  Only if you'd
> > place :v on an inner expression there would be a difference.  Correlating
> > the inner expression we'd not want to insert new expressions for with
> > a specific :s (or multiple ones) would be a more natural extension of what
> > :s provides.
> >
> > Thus, for the above case (Form 1), you do not need :v at all and :s works.
>
> Between ":s" and ":v", there is a subtle difference. ":s" only ensures an
> interior transform does not insert any new stmts, but this is not true for
> the final one.
>
> Code snippet generated for (A * C) +- (B * C) -> (A+-B) * C:
>
>   gimple_seq *lseq = s
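For reference, the '!' marker mentioned above is spelled as a suffix on the result's outermost operation; a hypothetical adaptation of the rule (conditions abbreviated, not a tested pattern) might look like:

```
(simplify
 (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
 (if (!ANY_INTEGRAL_TYPE_P (type)
      || TYPE_OVERFLOW_WRAPS (type))
  /* '!' (GIMPLE only): apply the simplification only when the result
     simplifies to a single gimple value, e.g. a constant, so no hard
     single_use checks on @3/@4 are needed.  */
  (mult! (plusminus @1 @2) @0)))
```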

Re: [RFC] Add new flag to specify output constraint in match.pd

2020-09-02 Thread Richard Biener via Gcc
On Wed, Sep 2, 2020 at 9:27 AM Marc Glisse  wrote:
>
> On Wed, 2 Sep 2020, Richard Biener via Gcc wrote:
>
> > On Mon, Aug 24, 2020 at 8:20 AM Feng Xue OS via Gcc  wrote:
> >>
> >>>>   There is a match-folding issue derived from pr94234.  A piece of code 
> >>>> like:
> >>>>
> >>>>   int foo (int n)
> >>>>   {
> >>>>  int t1 = 8 * n;
> >>>>  int t2 = 8 * (n - 1);
> >>>>
> >>>>  return t1 - t2;
> >>>>   }
> >>>>
> >>>>  It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) 
> >>>> * C", and
> >>>>  be folded to constant "8". But this folding will fail if both v1 and v2 
> >>>> have
> >>>>  multiple uses, as the following code.
> >>>>
> >>>>   int foo (int n)
> >>>>   {
> >>>>  int t1 = 8 * n;
> >>>>  int t2 = 8 * (n - 1);
> >>>>
> >>>>  use_fn (t1, t2);
> >>>>  return t1 - t2;
> >>>>   }
> >>>>
> >>>>  Given an expression with non-single-use operands, folding it will 
> >>>> introduce
> >>>>  duplicated computation in most situations, and is deemed to be 
> >>>> unprofitable.
> >>>>  But it is always beneficial if final result is a constant or existing 
> >>>> SSA value.
> >>>>
> >>>>  And the rule is:
> >>>>   (simplify
> >>>>(plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
> >>>>(if ((!ANY_INTEGRAL_TYPE_P (type)
> >>>> || TYPE_OVERFLOW_WRAPS (type)
> >>>> || (INTEGRAL_TYPE_P (type)
> >>>> && tree_expr_nonzero_p (@0)
> >>>> && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION 
> >>>> (type)
> >>>>/* If @1 +- @2 is constant require a hard single-use on either
> >>>>   original operand (but not on both).  */
> >>>>&& (single_use (@3) || single_use (@4)))   <- control whether 
> >>>> match or not
> >>>> (mult (plusminus @1 @2) @0)))
> >>>>
> >>>>  The current matcher only provides a way to check something before folding,
> >>>>  but no mechanism to affect the decision after folding. If it had one, for
> >>>>  the above case, we could let the fold go through when the result is a
> >>>>  constant.
> >>>
> >>> :s already has a counter-measure where it still folds if the output is at
> >>> most one operation. So this transformation has a counter-counter-measure
> >>> of checking single_use explicitly. And now we want a counter^3-measure...
> >>>
> >> Counter-measures are a key factor in matching cost.  ":s" seems to be somewhat
> >> coarse-grained, and here we do need more control over it.
> >>
> >> But ideally, we could decouple these counter-measures from the definitions of
> >> match rules, and let the gimple matcher make a more reasonable match-or-not
> >> decision based on these counters. Anyway, that is another story.
> >>
> >>>>  Like the way input operands are described using flags, we could also add
> >>>>  a new flag to specify this kind of constraint on the output: that we
> >>>>  expect it to be a simple gimple value.
> >>>>
> >>>>  The proposed syntax is
> >>>>
> >>>>   (opcode:v{ condition } )
> >>>>
> >>>>  The char "v" stands for gimple value; if another char is more
> >>>>  descriptive, it would be preferred. "condition", enclosed by { }, is an
> >>>>  optional C-syntax condition expression. If present, the matcher will
> >>>>  check whether the folding result is a gimple value, using
> >>>>  gimple_simplified_result_is_gimple_val (), only when "condition" is met.
> >>>>
> >>>>  Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
> >>>>  not GENERIC-match.
> >>>>
> >>>>  With this syntax, the rule is changed to
> >>>>
> >>>>  #Form 1:
> >>>>   (simplify
> >>>>(plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
> >>>>(if ((!ANY_INTEGRAL_TYPE_P (type)
> >>>> || TYPE_OVERFLOW_WRAPS (t

Re: [RFC] Add new flag to specify output constraint in match.pd

2020-09-01 Thread Richard Biener via Gcc
On Mon, Aug 24, 2020 at 8:20 AM Feng Xue OS via Gcc  wrote:
>
> >>   There is a match-folding issue derived from pr94234.  A piece of code 
> >> like:
> >>
> >>   int foo (int n)
> >>   {
> >>  int t1 = 8 * n;
> >>  int t2 = 8 * (n - 1);
> >>
> >>  return t1 - t2;
> >>   }
> >>
> >>  It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * 
> >> C", and
> >>  be folded to constant "8". But this folding will fail if both t1 and t2
> >> have
> >>  multiple uses, as in the following code.
> >>
> >>   int foo (int n)
> >>   {
> >>  int t1 = 8 * n;
> >>  int t2 = 8 * (n - 1);
> >>
> >>  use_fn (t1, t2);
> >>  return t1 - t2;
> >>   }
> >>
> >>  Given an expression with non-single-use operands, folding it will
> >> introduce
> >>  duplicated computation in most situations, and is deemed
> >> unprofitable.
> >>  But it is always beneficial if the final result is a constant or an
> >> existing SSA value.
> >>
> >>  And the rule is:
> >>   (simplify
> >>(plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
> >>(if ((!ANY_INTEGRAL_TYPE_P (type)
> >> || TYPE_OVERFLOW_WRAPS (type)
> >> || (INTEGRAL_TYPE_P (type)
> >> && tree_expr_nonzero_p (@0)
> >> && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION 
> >> (type)
> >>/* If @1 +- @2 is constant require a hard single-use on either
> >>   original operand (but not on both).  */
> >>&& (single_use (@3) || single_use (@4)))   <- control whether 
> >> match or not
> >> (mult (plusminus @1 @2) @0)))
> >>
> >>  The current matcher only provides a way to check something before folding,
> >>  but no mechanism to affect the decision after folding. If it had one, for
> >>  the above case, we could let the fold go through when the result is a
> >>  constant.
> >
> > :s already has a counter-measure where it still folds if the output is at
> > most one operation. So this transformation has a counter-counter-measure
> > of checking single_use explicitly. And now we want a counter^3-measure...
> >
> Counter-measures are a key factor in matching cost.  ":s" seems to be somewhat
> coarse-grained, and here we do need more control over it.
>
> But ideally, we could decouple these counter-measures from the definitions of
> match rules, and let the gimple matcher make a more reasonable match-or-not
> decision based on these counters. Anyway, that is another story.
>
> >>  Like the way input operands are described using flags, we could also add
> >>  a new flag to specify this kind of constraint on the output: that we
> >>  expect it to be a simple gimple value.
> >>
> >>  The proposed syntax is
> >>
> >>   (opcode:v{ condition } )
> >>
> >>  The char "v" stands for gimple value; if another char is more
> >>  descriptive, it would be preferred. "condition", enclosed by { }, is an
> >>  optional C-syntax condition expression. If present, the matcher will
> >>  check whether the folding result is a gimple value, using
> >>  gimple_simplified_result_is_gimple_val (), only when "condition" is met.
> >>
> >>  Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
> >>  not GENERIC-match.
> >>
> >>  With this syntax, the rule is changed to
> >>
> >>  #Form 1:
> >>   (simplify
> >>(plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
> >>(if ((!ANY_INTEGRAL_TYPE_P (type)
> >> || TYPE_OVERFLOW_WRAPS (type)
> >> || (INTEGRAL_TYPE_P (type)
> >> && tree_expr_nonzero_p (@0)
> >> && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION 
> >> (type))
> >>( if (!single_use (@3) && !single_use (@4))
> >>   (mult:v (plusminus @1 @2) @0)))
> >>   (mult (plusminus @1 @2) @0)
> >
> > That seems to match what you can do with '!' now (that's very recent).

It's also what :s does, but a slight bit more "local".  When any operand is
marked :s and it has more than a single use, we only allow simplifications
that do not require insertion of extra stmts.  So basically the above pattern
doesn't behave any differently than if you omit your :v.  Only if you'd
place :v on an inner expression there would be a difference.  Correlating
the inner expression we'd not want to insert new expressions for with
a specific :s (or multiple ones) would be a more natural extension of what
:s provides.

Thus, for the above case (Form 1), you do not need :v at all and :s works.

Richard.

> Thanks,
> Feng


Re: [GSoC] Automatic Parallel Compilation Viability -- Final Report

2020-08-31 Thread Richard Biener via Gcc
On August 31, 2020 6:21:27 PM GMT+02:00, Giuliano Belinassi 
 wrote:
>Hi, Richi.
>
>On 08/31, Richard Biener wrote:
>> On Mon, Aug 31, 2020 at 1:15 PM Jan Hubicka  wrote:
>> >
>> > > On Fri, Aug 28, 2020 at 10:32 PM Giuliano Belinassi
>> > >  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > This is the final report of the "Automatic Parallel Compilation
>> > > > Viability" project.  Please notice that this report is pretty
>> > > > similar to the one delivered for the 2nd evaluation, as this phase
>> > > > consisted mostly of rebasing and bug fixing.
>> > > >
>> > > > Please reply to this message with any questions or suggestions.
>> > >
>> > > Thank you for your great work Giuliano!
>> >
>> > Indeed, it is quite amazing work :)
>> > >
>> > > It's odd that LTO emulated parallelism is winning here,
>> > > I'd have expected it to be slower.  One factor might
>> > > be different partitioning choices and the other might
>> > > be that the I/O required is faster than the GC induced
>> > > COW overhead after forking.  Note you can optimize
>> > > one COW operation by re-using the main process for
>> > > compiling the last partition.  I suppose you tested
>> > > this on a system with a fast SSD so I/O overhead is
>> > > small?
>> >
>> > At the time I implemented fork based parallelism for WPA (which I think
>> > we could recover by generalizing Giuliano's patches a bit), I had the
>> > same outcome: forked ltranses were simply running slower than those
>> > after streaming.  This was however tested on Firefox, in my estimate
>> > sometime around 2013. I never tried it on units comparable to insn-emit
>> > (which would be different at that time anyway). I was mostly aiming to
>> > get it fully transparent with streaming but never quite finished it
>> > since, at that time, I thought time was better spent on optimizing LTO
>> > data layout.
>> >
>> > I suppose we want to keep both mechanisms in both WPA and normal
>> > compilation and make the compiler choose the fitting one.
>> 
>> I repeated Giuliano's experiment on gimple-match.ii and
>> producing LTO bytecode takes 5.3s and the link step
>> 9.5s with two jobs, 6.6s with three and 5.0s with four
>> and 2.4s with 32.
>> 
>> With -fparallel-jobs=N and --param promote-statics=1 I
>> see 14.8s, 13.9s and 13.5s here.  With 8 jobs the reduction
>> is to 11s.
>> 
>> It looks like LTO much more aggressively partitions
>> this - I see 36 partitions generated for gimple-match.c
>> while -fparallel-jobs creates "only" 27.  -fparallel-jobs
>> doesn't seem to honor the various lto-partition
>> --params btw?  The relevant ones would be
>> --param lto-partitions (the max. number of partitions
>> to generate) and --param lto-min-partition
>> (the minimum size for a partition).  I always thought
>> that lto-min-partition should be higher for
>> -fparallel-jobs (which I envisioned to be enabled by
>> default).
>
>There is a partition balancing mechanism that can be disabled
>with --param=balance-partitions=0.

Ah, I used =1 for this... 

>Assuming that you used -fparallel-jobs=32, it may be possible
>that it merged small partitions until it reached the average
>size of max_size / 33, which resulted in 27 partitions.

Note that the partitioning shouldn't depend on the argument to -fparallel-jobs 
for the sake of reproducible builds. 

>The only lto parameter that I use is --param=lto-min-partition,
>controlling the minimum size at which it will run
>in parallel.
>
>> 
>> I guess before investigating the current state in detail
>> it might be worth exploring Honzas wish of sharing
>> the actual partitioning code between LTO and -fparallel-jobs.
>> 
>> Note that larger objects take a bigger hit from the GC COW
>> issue so at some point that becomes dominant because the
>> first GC walk for each partition is the same as doing a GC
>> walk for the whole object.  Eventually it makes sense to
>> turn off GC completely for smaller partitions.
>
>Just a side note, I added a ggc_collect () before starting to fork
>and it did not improve things.

You need to force collection; ggc_collect () is usually a no-op. Also see
Honza's response here. Some experiments are needed.

Richard. 

>Thank you,
>Giuliano.
>
>> 
>> Richard.
>> 

Re: [GSoC] Automatic Parallel Compilation Viability -- Final Report

2020-08-31 Thread Richard Biener via Gcc
On Mon, Aug 31, 2020 at 1:15 PM Jan Hubicka  wrote:
>
> > On Fri, Aug 28, 2020 at 10:32 PM Giuliano Belinassi
> >  wrote:
> > >
> > > Hi,
> > >
> > > This is the final report of the "Automatic Parallel Compilation
> > > Viability" project.  Please notice that this report is pretty
> > > similar to the one delivered for the 2nd evaluation, as this phase
> > > consisted mostly of rebasing and bug fixing.
> > >
> > > Please reply to this message with any questions or suggestions.
> >
> > Thank you for your great work Giuliano!
>
> Indeed, it is quite amazing work :)
> >
> > It's odd that LTO emulated parallelism is winning here,
> > I'd have expected it to be slower.  One factor might
> > be different partitioning choices and the other might
> > be that the I/O required is faster than the GC induced
> > COW overhead after forking.  Note you can optimize
> > one COW operation by re-using the main process for
> > compiling the last partition.  I suppose you tested
> > this on a system with a fast SSD so I/O overhead is
> > small?
>
> At the time I implemented fork based parallelism for WPA (which I think
> we could recover by generalizing Giuliano's patches a bit), I had the
> same outcome: forked ltranses were simply running slower than those
> after streaming.  This was however tested on Firefox, in my estimate
> sometime around 2013. I never tried it on units comparable to insn-emit
> (which would be different at that time anyway). I was mostly aiming to
> get it fully transparent with streaming but never quite finished it
> since, at that time, I thought time was better spent on optimizing LTO
> data layout.
>
> I suppose we want to keep both mechanisms in both WPA and normal
> compilation and make the compiler choose the fitting one.

I repeated Giuliano's experiment on gimple-match.ii and
producing LTO bytecode takes 5.3s and the link step
9.5s with two jobs, 6.6s with three and 5.0s with four
and 2.4s with 32.

With -fparallel-jobs=N and --param promote-statics=1 I
see 14.8s, 13.9s and 13.5s here.  With 8 jobs the reduction
is to 11s.

It looks like LTO much more aggressively partitions
this - I see 36 partitions generated for gimple-match.c
while -fparallel-jobs creates "only" 27.  -fparallel-jobs
doesn't seem to honor the various lto-partition
--params btw?  The relevant ones would be
--param lto-partitions (the max. number of partitions
to generate) and --param lto-min-partition
(the minimum size for a partition).  I always thought
that lto-min-partition should be higher for
-fparallel-jobs (which I envisioned to be enabled by
default).

I guess before investigating the current state in detail
it might be worth exploring Honzas wish of sharing
the actual partitioning code between LTO and -fparallel-jobs.

Note that larger objects take a bigger hit from the GC COW
issue so at some point that becomes dominant because the
first GC walk for each partition is the same as doing a GC
walk for the whole object.  Eventually it makes sense to
turn off GC completely for smaller partitions.

Richard.

> Honza
> >
> > Thanks again,
> > Richard.
> >
> > > Thank you,
> > > Giuliano.
> > >
> > > --- 8< ---
> > >
> > > # Automatic Parallel Compilation Viability: Final Report
> > >
> > > ## Complete Tasks
> > >
> > > For the third evaluation, we expected to deliver the product as a
> > > series of patches for trunk.  The patch series were in fact delivered
> > > [1], but several items must be fixed before merge.
> > >
> > >
> > > Overall, the project works and speedups range from 0.95x to 3.3x.
> > > Bootstrap is working, and therefore this can be used in an experimental
> > > state.
> > >
> > > ## How to use
> > >
> > > 1. Clone the autopar_devel branch:
> > > ```
> > > git clone --single-branch --branch devel/autopar_devel \
> > >   git://gcc.gnu.org/git/gcc.git gcc_autopar_devel
> > > ```
> > > 2. Follow the standard compilation options provided in the Compiling
> > > GCC page, and install it on some directory. For instance:
> > >
> > > ```
> > > cd gcc_autopar_devel
> > > mkdir build && cd build
> > > ../configure --disable-bootstrap --enable-languages=c,c++
> > > make -j 8
> > > make DESTDIR=/tmp/gcc11_autopar install
> > > ```
> > >
> > > 3. If you want to test whether your version is working, just launch
> > > GCC with `-fparallel-jobs=2` when compiling a file with -c.
> > >
> > > 4. If you want to compile a project that uses GNU Makefiles with this
> > > version, you must modify the compilation rule command and prepend a
> > > `+` token to it. For example, in Git's Makefile, change:
> > > ```
> > > $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs)
> > > $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(ALL_CFLAGS) 
> > > $(EXTRA_CPPFLAGS) $<
> > > ```
> > > to:
> > > ```
> > > $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs)
> > > +$(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(ALL_CFLAGS) 
> > > $(EXTRA_CPPFLAGS) $<
> > > ```
> > > as well as point the CC variable to the installed gcc, and
> > > append a `-fparallel-jobs

Re: [GSoC] Automatic Parallel Compilation Viability -- Final Report

2020-08-31 Thread Richard Biener via Gcc
On Fri, Aug 28, 2020 at 10:32 PM Giuliano Belinassi
 wrote:
>
> Hi,
>
> This is the final report of the "Automatic Parallel Compilation
> Viability" project.  Please notice that this report is pretty
> similar to the one delivered for the 2nd evaluation, as this phase
> consisted mostly of rebasing and bug fixing.
>
> Please reply to this message with any questions or suggestions.

Thank you for your great work Giuliano!

It's odd that LTO emulated parallelism is winning here,
I'd have expected it to be slower.  One factor might
be different partitioning choices and the other might
be that the I/O required is faster than the GC induced
COW overhead after forking.  Note you can optimize
one COW operation by re-using the main process for
compiling the last partition.  I suppose you tested
this on a system with a fast SSD so I/O overhead is
small?

Thanks again,
Richard.

> Thank you,
> Giuliano.
>
> --- 8< ---
>
> # Automatic Parallel Compilation Viability: Final Report
>
> ## Complete Tasks
>
> For the third evaluation, we expected to deliver the product as a
> series of patches for trunk.  The patch series were in fact delivered
> [1], but several items must be fixed before merge.
>
>
> Overall, the project works and speedups range from 0.95x to 3.3x.
> Bootstrap is working, and therefore this can be used in an experimental
> state.
>
> ## How to use
>
> 1. Clone the autopar_devel branch:
> ```
> git clone --single-branch --branch devel/autopar_devel \
>   git://gcc.gnu.org/git/gcc.git gcc_autopar_devel
> ```
> 2. Follow the standard compilation options provided in the Compiling
> GCC page, and install it on some directory. For instance:
>
> ```
> cd gcc_autopar_devel
> mkdir build && cd build
> ../configure --disable-bootstrap --enable-languages=c,c++
> make -j 8
> make DESTDIR=/tmp/gcc11_autopar install
> ```
>
> 3. If you want to test whether your version is working, just launch
> GCC with `-fparallel-jobs=2` when compiling a file with -c.
>
> 4. If you want to compile a project that uses GNU Makefiles with this
> version, you must modify the compilation rule command and prepend a
> `+` token to it. For example, in Git's Makefile, change:
> ```
> $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs)
> $(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(ALL_CFLAGS) 
> $(EXTRA_CPPFLAGS) $<
> ```
> to:
> ```
> $(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs)
> +$(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(ALL_CFLAGS) 
> $(EXTRA_CPPFLAGS) $<
> ```
> as well as point the CC variable to the installed gcc, and
> append a `-fparallel-jobs=jobserver` on your CFLAGS variable.
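[Editor's note: a minimal Makefile wired up as the steps above describe might look like the following sketch. The install path and flags are assumptions taken from the quoted instructions, not part of the original report.]

```make
# Hypothetical Makefile fragment for the autopar branch described above.
# CC points at the gcc installed under /tmp/gcc11_autopar in step 2;
# adjust the path for your install.
CC      := /tmp/gcc11_autopar/usr/local/bin/gcc
CFLAGS  += -O2 -fparallel-jobs=jobserver

# The leading '+' marks the recipe as recursive so GNU Make passes its
# jobserver file descriptors down, letting GCC request job tokens.
%.o: %.c
	+$(CC) $(CFLAGS) -c -o $@ $<
```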
>
> # How the parallelism works in this project
>
> In LTO, the Whole Program Analysis decides how to partition the
> callgraph for running the LTRANS stage in parallel.  This project
> works very similarly to this, though with some changes.
>
> The first was to modify the LTO structure so that it accepts
> the compilation without IR streaming to files.  This avoids I/O
> overhead when compiling in parallel.
>
> The second was to use a custom partitioner to find which nodes
> should be in the same partition.  This was mainly done to bring COMDAT
> together, as well as symbols that are part of other symbols, and even
> private symbols so that we do not output hidden global symbols.
>
> However, experiments showed that bringing private symbols together did
> not yield an interesting speedup on some large files, and therefore
> we implemented two modes of partitioning:
>
> 1. Partition without static promotion. This is the safer method to use,
> as we do not modify symbols in the Compilation Unit. This may lead to
> speedups in files that have multiple entry points with low
> connectivity between them (such as insn-emit.c), however this will not
> provide speedups when this hypothesis is not true (gimple-match.c is an
> example of this). This is the default mode.
>
> 2. Partition with static promotion to global. This is a more aggressive
> method, as we can decide to promote some functions to global to increase
> parallelism opportunity. This also will change the final assembler name
> of the promoted function to avoid collision with functions of others
> Compilation Units. To use this mode, the user has to manually specify
> --param=promote-statics=1, as they must be aware of this.
>
> Currently, partitioning mode 2 does not account for the number of nodes
> to be promoted.  Implementing this will certainly reduce the impact on
> the produced code.
>
> ## Jobserver Integration
>
> We implemented an interface, thanks to Nathan Sidwell, to communicate
> with GNU Make's jobserver, which is able to detect when the jobserver
> is active. This works as follows:
>
> When -fparallel-jobs=jobserver is provided, GCC will try to detect
> whether there is a running jobserver we can communicate with. If so,
> we return the token that Make originally gave to us, then wait for
> Make for a new token that, when provided, will launch a forked child
> process with

Re: Test case for improving PTA precision

2020-08-28 Thread Richard Biener via Gcc
On Fri, Aug 28, 2020 at 1:24 PM Erick Ochoa
 wrote:
>
> Hi,
>
> I'm testing the precision of IPA-PTA when compiling with -flto. I found
> this case when a global variable is marked as escaping even if the
> variable is a primitive type and no address is taken.
>
> This is the result of IPA-PTA which I believe is wrong.
>
> buff2 = { ESCAPED NONLOCAL }
> buff1 = { }
> buff0 = { ESCAPED NONLOCAL }
>
> The variable must be assigned a value returned from a function from a
> library (i.e. I think the execution path in IPA-PTA is through
> handle_lhs_call).
>
> I later tested with local variables and those are correctly marked as
> not escaping. This might have to do just with the fact that these are
> global variables. I understand that there's also virtual memory operands
> which define that a function might modify a global variable... but I
> would suspect that ipa-visibility should have turned these variables as
> not externally visible. But then why is buff1 not escaping? (strlen is a
> builtin and there's a different execution path...)
>
> I talked about this before but I'm adding the test case here in case
> someone more knowledgeable can comment and guide me towards a more
> concrete reason and I could try to provide a patch that also includes
> the fix itself.
>
> This was compiled and tested with GCC-10.2.0
>
>
> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-20.c
> b/gcc/testsuite/gcc.dg/ipa/ipa-pta-20.c
> new file mode 100644
> index 000..c82d5205b78
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run } */
> +/* { dg-options "-flto -flto-partition=none -O2 -fipa-pta
> -fdump-ipa-pta2-details" } */
> +
> +#include 
> +#include 
> +#include 
> +
> +char buff0;
> +char buff1;
> +char buff2;
> +
> +int
> +main(int argc, char** argv)
> +{
> +  buff0 = argv[1][0]; // escapes?
> +  buff1 = strlen(argv[1]); // does not escape
> +  buff2 = rand(); // escapes?
> +  return &buff0 < &buff1 ? &buff2 < &buff1 : 0;
> +}
> +
> +/* { dg-final { scan-ipa-dump "buff0 = { }" "pta2" } } */
> +/* { dg-final { scan-ipa-dump "buff1 = { }" "pta2" } } */
> +/* { dg-final { scan-ipa-dump "buff2 = { }" "pta2" } } */

I think you're reading the dumps wrong.

ESCAPED = { }

nothing escapes

buff2 = { ESCAPED NONLOCAL }
buff1 = { }
buff0 = { ESCAPED NONLOCAL }

this means that buff2 and buff0 point to what escapes and other global memory.
This is because argv[1][0] points to global memory and rand () returns
a pointer to global memory (and everything that escapes is also global memory).

Note that we track pointers through integers, which means even a 'char' and an
'int' are considered pointer (parts).  strlen is handled explicitly to not
convey a pointer.

The above does not mean that buff2 or buff0 escape!

Richard.


Re: LTO slows down calculix by more than 10% on aarch64

2020-08-28 Thread Richard Biener via Gcc
On Fri, Aug 28, 2020 at 1:17 PM Prathamesh Kulkarni
 wrote:
>
> On Wed, 26 Aug 2020 at 16:50, Richard Biener  
> wrote:
> >
> > On Wed, Aug 26, 2020 at 12:34 PM Prathamesh Kulkarni via Gcc
> >  wrote:
> > >
> > > Hi,
> > > We're seeing a consistent regression >10% on calculix with -O2 -flto vs 
> > > -O2
> > > on aarch64 in our validation CI. I tried to investigate this issue a
> > > bit, and it seems the regression comes from inlining of orthonl into
> > > e_c3d. Disabling that brings back the performance. However, inlining
> > > orthonl into e_c3d, increases it's size from 3187 to 3837 by around
> > > 16.9% which isn't too large.
> > >
> > > I have attached two test-cases, e_c3d.f that has orthonl manually
> > > inlined into e_c3d to "simulate" LTO's inlining, and e_c3d-orig.f,
> > > which contains unmodified function.
> > > (gauss.f is included by e_c3d.f). For reproducing, just passing -O2 is
> > > sufficient.
> > >
> > > It seems that inlining orthonl causes 20 hoistings into block 181,
> > > which are then hoisted to block 173, in particular hoistings of w(1,
> > > 1) ... w(3, 3), which wasn't
> > > possible without inlining. The hoistings happen because of basic block
> > > that computes orthonl in line 672 has w(1, 1) ... w(3, 3) and the
> > > following block in line 1035 in e_c3d.f:
> > >
> > > senergy=
> > >  &(s11*w(1,1)+s12*(w(1,2)+w(2,1))
> > >  &+s13*(w(1,3)+w(3,1))+s22*w(2,2)
> > >  &+s23*(w(2,3)+w(3,2))+s33*w(3,3))*weight
> > >
> > > Disabling hoisting into blocks 173 (and 181), brings back most of the
> > > performance. I am not able to understand why (if?) these hoistings of
> > > w(1, 1) ...
> > > w(3, 3) are causing slowdown however. Looking at assembly, the hot
> > > code-path from perf in e_c3d shows following code-gen diff:
> > > For inlined version:
> > > .L122:
> > > ldr d15, [x1, -248]
> > > add w0, w0, 1
> > > add x2, x2, 24
> > > add x1, x1, 72
> > > fmul    d15, d17, d15
> > > fmul    d15, d15, d18
> > > fmul    d14, d15, d14
> > > fmadd   d16, d14, d31, d16
> > > cmp w0, 4
> > > beq .L121
> > > ldr d14, [x2, -8]
> > > b   .L122
> > >
> > > and for non-inlined version:
> > > .L118:
> > > ldr d0, [x1, -248]
> > > add w0, w0, 1
> > > ldr d2, [x2, -8]
> > > add x1, x1, 72
> > > add x2, x2, 24
> > > fmul    d0, d3, d0
> > > fmul    d0, d0, d5
> > > fmul    d0, d0, d2
> > > fmadd   d1, d4, d0, d1
> > > cmp w0, 4
> > > bne .L118
> >
> > I wonder if you have profiles.  The inlined version has a
> > non-empty latch block (looks like some PRE is happening
> > there?).  Eventually your uarch does not like the close
> > (does your assembly show the layout as it is?) branches?
> Hi Richard,
> I have uploaded profiles obtained by perf here:
> -O2: https://people.linaro.org/~prathamesh.kulkarni/o2_perf.data
> -O2 -flto: https://people.linaro.org/~prathamesh.kulkarni/o2_lto_perf.data
>
> For the above loop, it shows the following:
> -O2:
>   0.01 │ f1c:  ldur   d0, [x1, #-248]
>   3.53 │       add    w0, w0, #0x1
>        │       ldur   d2, [x2, #-8]
>   3.54 │       add    x1, x1, #0x48
>        │       add    x2, x2, #0x18
>   5.89 │       fmul   d0, d3, d0
>  14.12 │       fmul   d0, d0, d5
>  14.14 │       fmul   d0, d0, d2
>  14.13 │       fmadd  d1, d4, d0, d1
>   0.00 │       cmp    w0, #0x4
>   3.52 │     ↑ b.ne   f1c
>
> -O2 -flto:
>   5.47 │ 1124: ldur   d15, [x1, #-248]
>   2.19 │       add    w0, w0, #0x1
>   1.10 │       add    x2, x2, #0x18
>   2.18 │       add    x1, x1, #0x48
>   4.37 │       fmul   d15, d17, d15
>  13.13 │       fmul   d15, d15, d18
>  13.13 │       fmul   d14, d15, d14
>  13.14 │       fmadd  d16, d14, d31, d16
>        │       cmp    w0, #0x4
>   3.28 │     ↓ b.eq   1154
>   0.00 │       ldur   d14, [x2, #-8]
>   2.19 │     ↑ b      1124
>
> IIUC, the biggest relative difference comes fr

Re: Questions regarding update_stmt and release_ssa_name_fn.

2020-08-28 Thread Richard Biener via Gcc
On Fri, Aug 28, 2020 at 6:29 AM Gary Oblock  wrote:
>
> > If x_2 is a default def then the IL isn't correct in the first place.  I 
> > doubt
> > it is that way, btw. - we have verifiers that would blow up if it would.
>
> Richard,
>
> I'm just sharing this so you can tell me whether or not I'm going
> crazy. ;-)
>
> This little function is finding that arr_2 = PHI 
> is problematic.
>
> void
> wolf_fence (
> Info *info // Pass level gobal info (might not use it)
>   )
> {
>   struct cgraph_node *node;
>
>   fprintf( stderr,
>   "Wolf Fence: Find wolf for default defs with non nop defines\n");
>
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> {
>   struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
>   push_cfun ( func);
>
>   unsigned int len = SSANAMES ( func)->length ();
>   for ( unsigned int i = 0; i < len; i++)
>
> {
>
>  tree ssa_name = (*SSANAMES ( func))[i];
>  if ( ssa_name == NULL ) continue;
>  if ( ssa_defined_default_def_p ( ssa_name) )

That's because this function is supposed to be only called
on SSA default defs but you call it on all SSA names.  Add
a SSA_NAME_IS_DEFAULT_DEF (ssa_name) && before
and all will be well.
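Applied to the loop quoted above, the suggested guard would look roughly like this (a sketch against GCC's internal API, not a tested patch; it only adds the `SSA_NAME_IS_DEFAULT_DEF` filter in front of the existing check):

```c
/* Inside the SSA-name loop of wolf_fence: only default defs are
   expected to be defined by a GIMPLE_NOP, so skip every other
   SSA name before asking ssa_defined_default_def_p.  */
tree ssa_name = (*SSANAMES (func))[i];
if (ssa_name == NULL)
  continue;
if (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
    && ssa_defined_default_def_p (ssa_name))
  {
    gimple *def_stmt = SSA_NAME_DEF_STMT (ssa_name);
    if (!gimple_nop_p (def_stmt))
      {
        fprintf (stderr, "Wolf fence caught :");
        print_gimple_stmt (stderr, def_stmt, 0);
        gcc_assert (0);
      }
  }
```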

>{
>  gimple *def_stmt =
> SSA_NAME_DEF_STMT ( ssa_name);
>  if ( !gimple_nop_p ( def_stmt) )
>
> {
>  fprintf ( stderr, "Wolf fence caught :");
>  print_gimple_stmt ( stderr, def_stmt, 0);
>  gcc_assert (0);
> }
>
>}
>
>}
>
> pop_cfun ();
> }
> fprintf( stderr, "Wolf Fence: Didn't find wolf!\n");
> }
>
> This is run at the very start of the structure reorg pass
> before any of my code did anything at all (except initiate
> the structure info with a few flags and the like.)
>
> Here's C code:
>
> - aux.h ---
> #include "stdlib.h"
> typedef struct type type_t;
> struct type {
>   double x;
>   double y;
> };
>
> extern type_t *min_of_x( type_t *, size_t);
> - aux.c ---
> #include "aux.h"
> #include "stdlib.h"
>
> type_t *
> min_of_x( type_t *arr, size_t len)
> {
>   type_t *end_of = arr + len;
>   type_t *loc = arr;
>   double result = arr->x;
>   arr++;
>   for( ; arr < end_of ; arr++  ) {
> double value = arr->x;
> if (  value < result ) {
>   result = value;
>   loc = arr;
> }
>   }
>   return loc;
> }
> - main.c --
> #include "aux.h"
> #include "stdio.h"
>
> int
> main(void)
> {
>   size_t len = 1;
>   type_t *data = (type_t *)malloc( len * sizeof(type_t));
>   int i;
>   for( i = 0; i < len; i++ ) {
> data[i].x = drand48();
>   }
>
>   type_t *min_x;
>   min_x = min_of_x( data, len);
>
>   if ( min_x == 0 ) {
> printf("min_x error\n");
> exit(-1);
>   }
>
>   printf("min_x %e\n" , min_x->x);
> }
> ---
> Here's the GIMPLE comining into the structure reoganization pass:
>
> Program:
>
> ;; Function min_of_x (min_of_x, funcdef_no=0, decl_uid=4391, cgraph_uid=2, 
> symbol_order=23) (executed once)
>
> min_of_x (struct type_t * arr, size_t len)
> {
>   double value;
>   double result;
>   struct type_t * loc;
>   struct type_t * end_of;
>
>[local count: 118111600]:
>   _1 = len_7(D) * 16;
>   end_of_9 = arr_8(D) + _1;
>   result_11 = arr_8(D)->x;
>   arr_12 = arr_8(D) + 16;
>   goto ; [100.00%]
>
>[local count: 955630225]:
>   value_14 = arr_2->x;
>   if (result_6 > value_14)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 477815112]:
>
>[local count: 955630225]:
>   # loc_3 = PHI 
>   # result_5 = PHI 
>   arr_15 = arr_2 + 16;
>
>[local count: 1073741824]:
>   # arr_2 = PHI 
>   # loc_4 = PHI 
>   # result_6 = PHI 
>   if (arr_2 < end_of_9)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   # loc_13 = PHI 
>   return loc_13;
>
> }
>
>
>
> ;; Function main (main, funcdef_no=1, decl_uid=4389, cgraph_uid=1, 
> symbol_order=5) (executed once)
>
> main ()
> {
>   struct type_t * min_x;
>   int i;
>   struct type_t * data;
>
>[local count: 10737416]:
>   data_10 = malloc (16);
>   goto ; [100.00%]
>
>[local count: 1063004409]:
>   _1 =

Re: Questions regarding update_stmt and release_ssa_name_fn.

2020-08-27 Thread Richard Biener via Gcc
On August 27, 2020 7:45:15 PM GMT+02:00, Gary Oblock  
wrote:
>Richard,
>
>>You need to call update_stmt () if you change SSA operands to
>>sth else.
>
>I'm having trouble parsing the "sth else" above. Could you
>please rephrase this if it's important to your point. I take
>it that what you mean is: if you change any SSA operand of any
>statement, then update that statement.

If you change any SSA operand of any stmt to a different SSA name or a constant 
then you need to update the stmt containing the SSA operand. 

In _1 = _2 + 3; _2 is an SSA operand of that stmt. 

Richard. 
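As a concrete sketch of what "change the operand, then update the stmt" looks like against GCC's internal immediate-use iterators (hypothetical pass code, not from this thread; `old_name` and `new_name` are assumed to be existing SSA_NAME trees — this mirrors what replace_uses_by () does internally):

```c
/* Sketch only: walk every statement that uses OLD_NAME, rewrite each
   use to NEW_NAME, and re-scan the operands of every touched stmt.  */
static void
replace_ssa_uses (tree old_name, tree new_name)
{
  imm_use_iterator iter;
  use_operand_p use_p;
  gimple *use_stmt;

  FOR_EACH_IMM_USE_STMT (use_stmt, iter, old_name)
    {
      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
	SET_USE (use_p, new_name);
      /* Required after changing SSA operands of USE_STMT.  */
      update_stmt (use_stmt);
    }
}
```

Note this only fixes up uses; the defining statement of new_name must already be in place and in a position dominating all the rewritten uses.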

>Thanks,
>
>Gary
>
>From: Richard Biener 
>Sent: Thursday, August 27, 2020 2:04 AM
>To: Gary Oblock 
>Cc: gcc@gcc.gnu.org 
>Subject: Re: Questions regarding update_stmt and release_ssa_name_fn.
>
>[EXTERNAL EMAIL NOTICE: This email originated from an external sender.
>Please be mindful of safe email handling and proprietary information
>protection practices.]
>
>
>On Wed, Aug 26, 2020 at 11:32 PM Gary Oblock via Gcc 
>wrote:
>>
> >> I'm having some major grief with a few related things that I'm trying to
> >> do. They mostly revolve around trying to change the type of an SSA name
> >> (which I've given up on in favor of creating new SSA names and replacing
> >> the ones I wanted to change.) However, this too seems to have its own
> >> issues.
>>
>> In one problematic case in particular, I'm seeing a sequence like:
>>
>> foo_3 = mumble_1 op mumble_2
>>
>> bar_5 = foo_3 op baz_4
>>
> >> when replacing foo_3 with foo_4 (the latter having the needed new type).
>>
>> I'm seeing a later verification phase think
>>
>> bar_5 = foo_4 op baz_4
>>
> >> is still associated with foo_3.
>>
>> Should the transformation above be associated with update_stmt and/or
>> release_ssa_name_fn? And if they are both needed is there a proper
>> order required.  Note, when I try using them, I'm seeing some
>malformed
>> tree operands that die in horrible ways.
>>
>> By the way, I realize I can probably simply create a new GIMPLE stmt
>> from scratch to replace the ones I'm modifying but this will cause
>> some significant code bloat and I want to avoid that if at all
>> possible.
>
>You need to call update_stmt () if you change SSA operands to
>sth else.
>
> >> There is an additional wrinkle to this problem with C code like this
>>
>> void
>> whatever ( int x, .. )
>> {
>>   :
>>   x++;
>>   :
>> }
>>
> >> I'm seeing x_2 being thought of as a default definition in the
> >> following GIMPLE stmt when it's clearly not, since it's defined by the
> >> statement.
>>
>>   x_2 = X_1 + 4
>>
> >> My approach has been to simply make the SSA name to replace x_2 a
> >> normal SSA name and not a default def. Is this not reasonable and
>> correct?
>
>If x_2 is a default def then the IL isn't correct in the first place. 
>I doubt
>it is that way, btw. - we have verifiers that would blow up if it
>would.
>
>Richard.
>
>>
>> Thanks,
>>
>> Gary Oblock
>>
>> Gary
>>
>>
>>
>>
>>
>> CONFIDENTIALITY NOTICE: This e-mail message, including any
>attachments, is for the sole use of the intended recipient(s) and
>contains information that is confidential and proprietary to Ampere
>Computing or its subsidiaries. It is to be used solely for the purpose
>of furthering the parties' business relationship. Any unauthorized
>review, copying, or distribution of this email (or any attachments
>thereto) is strictly prohibited. If you are not the intended recipient,
>please contact the sender immediately and permanently delete the
>original and any copies of this email and any attachments thereto.



Re: Questions regarding update_stmt and release_ssa_name_fn.

2020-08-27 Thread Richard Biener via Gcc
On Wed, Aug 26, 2020 at 11:32 PM Gary Oblock via Gcc  wrote:
>
> I'm having some major grief with a few related things that I'm trying to
> do. They mostly revolve around trying to change the type of an SSA name
> (which I've given up on in favor of creating new SSA names and replacing
> the ones I wanted to change.) However, this too seems to have its own
> issues.
>
> In one problematic case in particular, I'm seeing a sequence like:
>
> foo_3 = mumble_1 op mumble_2
>
> bar_5 = foo_3 op baz_4
>
> when replacing foo_3 with foo_4 (the latter having the needed new type).
>
> I'm seeing a later verification phase think
>
> bar_5 = foo_4 op baz_4
>
> is still associated with foo_3.
>
> Should the transformation above be associated with update_stmt and/or
> release_ssa_name_fn? And if they are both needed is there a proper
> order required.  Note, when I try using them, I'm seeing some malformed
> tree operands that die in horrible ways.
>
> By the way, I realize I can probably simply create a new GIMPLE stmt
> from scratch to replace the ones I'm modifying but this will cause
> some significant code bloat and I want to avoid that if at all
> possible.

You need to call update_stmt () if you change SSA operands to
sth else.

> There is an additional wrinkle to this problem with C code like this
>
> void
> whatever ( int x, .. )
> {
>   :
>   x++;
>   :
> }
>
> I'm seeing x_2 being thought of as a default definition in the following
> GIMPLE stmt when it's clearly not, since it's defined by the statement.
>
>   x_2 = X_1 + 4
>
> My approach has been to simply make the SSA name to replace x_2 a
> normal SSA name and not a default def. Is this not reasonable and
> correct?

If x_2 is a default def then the IL isn't correct in the first place.  I doubt
it is that way, btw. - we have verifiers that would blow up if it would.

Richard.

>
> Thanks,
>
> Gary Oblock
>
> Gary
>
>
>
>
>


Re: LTO slows down calculix by more than 10% on aarch64

2020-08-26 Thread Richard Biener via Gcc
On Wed, Aug 26, 2020 at 12:34 PM Prathamesh Kulkarni via Gcc
 wrote:
>
> Hi,
> We're seeing a consistent regression >10% on calculix with -O2 -flto vs -O2
> on aarch64 in our validation CI. I tried to investigate this issue a
> bit, and it seems the regression comes from inlining of orthonl into
> e_c3d. Disabling that brings back the performance. However, inlining
> orthonl into e_c3d, increases its size from 3187 to 3837 by around
> 16.9% which isn't too large.
>
> I have attached two test-cases, e_c3d.f that has orthonl manually
> inlined into e_c3d to "simulate" LTO's inlining, and e_c3d-orig.f,
> which contains unmodified function.
> (gauss.f is included by e_c3d.f). For reproducing, just passing -O2 is
> sufficient.
>
> It seems that inlining orthonl, causes 20 hoistings into block 181,
> which are then hoisted to block 173, in particular hoistings of w(1,
> 1) ... w(3, 3), which wasn't
> possible without inlining. The hoistings happen because the basic block
> that computes orthonl at line 672 has w(1, 1) ... w(3, 3) and the
> following block at line 1035 in e_c3d.f:
>
> senergy=
>  &(s11*w(1,1)+s12*(w(1,2)+w(2,1))
>  &+s13*(w(1,3)+w(3,1))+s22*w(2,2)
>  &+s23*(w(2,3)+w(3,2))+s33*w(3,3))*weight
>
> Disabling hoisting into blocks 173 (and 181), brings back most of the
> performance. I am not able to understand why (if?) these hoistings of
> w(1, 1) ...
> w(3, 3) are causing slowdown however. Looking at assembly, the hot
> code-path from perf in e_c3d shows following code-gen diff:
> For inlined version:
> .L122:
> ldr d15, [x1, -248]
> add w0, w0, 1
> add x2, x2, 24
> add x1, x1, 72
> fmul    d15, d17, d15
> fmul    d15, d15, d18
> fmul    d14, d15, d14
> fmadd   d16, d14, d31, d16
> cmp w0, 4
> beq .L121
> ldr d14, [x2, -8]
> b   .L122
>
> and for non-inlined version:
> .L118:
> ldr d0, [x1, -248]
> add w0, w0, 1
> ldr d2, [x2, -8]
> add x1, x1, 72
> add x2, x2, 24
> fmul    d0, d3, d0
> fmul    d0, d0, d5
> fmul    d0, d0, d2
> fmadd   d1, d4, d0, d1
> cmp w0, 4
> bne .L118

I wonder if you have profiles.  The inlined version has a
non-empty latch block (looks like some PRE is happening
there?).  Eventually your uarch does not like the close
(does your assembly show the layout as it is?) branches?

> which corresponds to the following loop in line 1014.
> do n1=1,3
>   s(iii1,jjj1)=s(iii1,jjj1)
>  &  +anisox(m1,k1,n1,l1)
>  &  *w(k1,l1)*vo(i1,m1)*vo(j1,n1)
>  &  *weight
>
> I am not sure why hoisting would have any direct effect on this loop
> except perhaps that hoisting allocated more registers, and led to
> increased register pressure. Perhaps that's why it's using higher-numbered
> regs for code-gen in the inlined version? However disabling
> hoisting in blocks 173 and 181, also leads to overall 6 extra spills
> (by grepping for str to sp), so
> hoisting is also helping here ? I am not sure how to proceed further,
> and would be grateful for suggestions.
>
> Thanks,
> Prathamesh


Re: Do all global structure variables escape in IPA-PTA?

2020-08-26 Thread Richard Biener via Gcc
On Wed, Aug 26, 2020 at 11:45 AM Erick Ochoa
 wrote:
>
>
>
> On 26/08/2020 10:36, Erick Ochoa wrote:
> >
> >
> > On 25/08/2020 22:03, Richard Biener wrote:
> >> On August 25, 2020 6:36:19 PM GMT+02:00, Erick Ochoa
> >>  wrote:
> >>>
> >>>
> >>> On 25/08/2020 17:19, Erick Ochoa wrote:
> >>>>
> >>>>
> >>>> On 25/08/2020 17:10, Richard Biener wrote:
> >>>>> On August 25, 2020 3:09:13 PM GMT+02:00, Erick Ochoa
> >>>>>  wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm trying to understand how the escape analysis in IPA-PTA works.
> >>> I
> >>>>>> was
> >>>>>> testing a hypothesis where if a structure contains an array of
> >>>>>> characters and this array of characters is passed to fopen, the
> >>>>>> structure and all subfields will escape.
> >>>>>>
> >>>>>> To do this, I made a program that has a global structure variable
> >>> foo2
> >>>>>> that has a field passed as an argument to fopen. I also made
> >>> another
> >>>>>>
> >>>>>> variable foo whose array is initialized by the result of rand.
> >>>>>>
> >>>>>> However, after compiling this program with -flto
> >>> -flto-partition=none
> >>>>>> -fipa -fdump-ipa-pta -fdump-tree-all-all -Ofast (gcc --version
> >>> 10.2.0)
> >>>>>>
> >>>>>> E.g.
> >>>>>>
> >>>>>> #include 
> >>>>>> #include 
> >>>>>> #include 
> >>>>>>
> >>>>>> struct foo_t {
> >>>>>> char buffer1[100];
> >>>>>> char buffer2[100];
> >>>>>> };
> >>>>>>
> >>>>>> struct foo_t foo;
> >>>>>> struct foo_t foo2;
> >>>>>>
> >>>>>> int
> >>>>>> main(int argc, char** argv)
> >>>>>> {
> >>>>>>
> >>>>>> fopen(foo2.buffer1, "r");
> >>>>>> for (int i = 0; i < 100; i++)
> >>>>>> {
> >>>>>>   foo.buffer1[i] = rand();
> >>>>>> }
> >>>>>> int i = rand();
> >>>>>> int retval = foo.buffer1[i % 100];
> >>>>>> return retval;
> >>>>>> }
> >>>>>>
> >>>>>> I see the PTA dump state the following:
> >>>>>>
> >>>>>> ESCAPED = { STRING ESCAPED NONLOCAL foo2 }
> >>>>>> foo = { ESCAPED NONLOCAL }
> >>>>>> foo2 = { ESCAPED NONLOCAL }
> >>>>>>
> >>>>>> which I understand as
> >>>>>> * something externally visible might point to foo2
> >>>>>> * foo2 might point to something externally visible
> >>>>>> * foo might point to something externally visible
> >>>>>
> >>>>> Yes. So it's exactly as your hypothesis.
> >>>>>
> >>>>>> I have seen that global variables are stored in the .gnu.lto_.decls
> >>> LTO
> >>>>>>
> >>>>>> file section. In the passes I have worked on I have ignored global
> >>>>>> variables. But can foo and foo2 be marked as escaping because the
> >>>>>> declarations are not streamed in yet? Or is there another reason I
> >>> am
> >>>>>> not seeing? I am aware of aware of the several TODOs at the
> >>> beginning
> >>>>>> of
> >>>>>> gcc/tree-ssa-structalias.c but I am unsure if they contribute to
> >>> these
> >>>>>> variables being marked as escaping. (Maybe TODO 1 and TODO 2?)
> >>>>>
> >>>>> Not sure what the problem is. Foo2 escapes because its address is
> >>>>> passed to a function.
> >>>>>
> >>>>
> >>>> foo2 is not the problem, it is foo. foo is not passed to a function
> >>> and
> >>>> it is also escaping.
> >>>
> >>>
> >>> Sorry, I meant: foo might point to 

Re: Do all global structure variables escape in IPA-PTA?

2020-08-25 Thread Richard Biener via Gcc
On August 25, 2020 6:36:19 PM GMT+02:00, Erick Ochoa 
 wrote:
>
>
>On 25/08/2020 17:19, Erick Ochoa wrote:
>> 
>> 
>> On 25/08/2020 17:10, Richard Biener wrote:
>>> On August 25, 2020 3:09:13 PM GMT+02:00, Erick Ochoa 
>>>  wrote:
>>>> Hi,
>>>>
>>>> I'm trying to understand how the escape analysis in IPA-PTA works.
>I
>>>> was
>>>> testing a hypothesis where if a structure contains an array of
>>>> characters and this array of characters is passed to fopen, the
>>>> structure and all subfields will escape.
>>>>
>>>> To do this, I made a program that has a global structure variable
>foo2
> >>>> that has a field passed as an argument to fopen. I also made
>another
>>>>
>>>> variable foo whose array is initialized by the result of rand.
>>>>
>>>> However, after compiling this program with -flto
>-flto-partition=none
>>>> -fipa -fdump-ipa-pta -fdump-tree-all-all -Ofast (gcc --version
>10.2.0)
>>>>
>>>> E.g.
>>>>
>>>> #include 
>>>> #include 
>>>> #include 
>>>>
>>>> struct foo_t {
>>>>    char buffer1[100];
>>>>    char buffer2[100];
>>>> };
>>>>
>>>> struct foo_t foo;
>>>> struct foo_t foo2;
>>>>
>>>> int
>>>> main(int argc, char** argv)
>>>> {
>>>>
>>>>    fopen(foo2.buffer1, "r");
>>>>    for (int i = 0; i < 100; i++)
>>>>    {
>>>>  foo.buffer1[i] = rand();
>>>>    }
>>>>    int i = rand();
>>>>    int retval = foo.buffer1[i % 100];
>>>>    return retval;
>>>> }
>>>>
>>>> I see the PTA dump state the following:
>>>>
>>>> ESCAPED = { STRING ESCAPED NONLOCAL foo2 }
>>>> foo = { ESCAPED NONLOCAL }
>>>> foo2 = { ESCAPED NONLOCAL }
>>>>
>>>> which I understand as
>>>> * something externally visible might point to foo2
>>>> * foo2 might point to something externally visible
>>>> * foo might point to something externally visible
>>>
>>> Yes. So it's exactly as your hypothesis.
>>>
>>>> I have seen that global variables are stored in the .gnu.lto_.decls
>LTO
>>>>
>>>> file section. In the passes I have worked on I have ignored global
>>>> variables. But can foo and foo2 be marked as escaping because the
>>>> declarations are not streamed in yet? Or is there another reason I
>am
>>>> not seeing? I am aware of aware of the several TODOs at the
>beginning
>>>> of
>>>> gcc/tree-ssa-structalias.c but I am unsure if they contribute to
>these
>>>> variables being marked as escaping. (Maybe TODO 1 and TODO 2?)
>>>
> >>> Not sure what the problem is. Foo2 escapes because its address is 
>>> passed to a function.
>>>
>> 
>> foo2 is not the problem, it is foo. foo is not passed to a function
>and 
>> it is also escaping.
>
>
>Sorry, I meant: foo might point to something which is externally 
>visible. Which I don't think is the case in the program. I understand 
>this might be due to the imprecision in the escape-analysis and what
>I'm 
>trying to find out is the source of imprecision.

Foo is exported and thus all function calls can store to it, making it point to 
escaped and nonlocal variables. 

Richard. 

>> 
>>> ?
>>>
>>> Richard.
>>>
>>>> Just FYI, I've been reading:
>>>> * Structure Aliasing in GCC
>>>> * Gimple Alias Improvements for GCC 4.5
>>>> * Memory SSA - A Unified Approach for Sparsely Representing Memory
>>>> Operations
>>>>
>>>> Thanks, I appreciate all help!
>>>



Re: Question about IPA-PTA and build_alias

2020-08-25 Thread Richard Biener via Gcc
On August 24, 2020 10:00:44 AM GMT+02:00, Erick Ochoa 
 wrote:
>
>
>On 24/08/2020 09:40, Richard Biener wrote:
>> On Mon, Aug 17, 2020 at 3:22 PM Erick Ochoa
>>  wrote:
>>>
>>> Hello,
>>>
>>> I'm looking to understand better the points-to analysis (IPA-PTA)
>and
>>> the alias analysis (build_alias).
>>>
>>> How is the information produced by IPA-PTA consumed?
>>>
>>> Are alias sets in build_alias computed by the intersections of the
>>> points_to_set(s) (computed by IPA-PTA)?
>>>
>>> My intuition tells me that it could be relatively simple to move
> >>> build_alias to be a SIMPLE_IPA_PASS performed just after IPA-PTA,
>but I
>>> do not have enough experience in GCC to tell if this is correct.
>What
>>> could be some difficulties which I am not seeing? (Either move, or
>>> create a new IPA-ALIAS SIMPLE_IPA_PASS.) This pass would have the
>same
>>> sensitivity as IPA-PTA { flow-insensitive, context-insensitive,
>>> field-sensitive } because the alias sets could be computed by the
>>> intersection of points-to-sets.
>> 
>> Both IPA-PTA and build_alias do the same, they build PTA constraint
>> sets, solve them and attach points-to info to SSA names.  Just
>IPA-PTA
>> does this for the whole TU while build_alias does it for a function
>at a time.
>> 
>> So I guess I do not understand your question.
>
>Hi Richard,
>
>I'm just trying to imagine what a data-layout optimization would look 
>like if instead of using the type-escape analysis we used the points-to
>
>analysis to find out which variables/memory locations escape and what 
>that would mean for the transformation itself.

What I've said before is that for the object-based approach you need precise 
following of pointers, which covers escape analysis already. For non-allocated 
objects you need to find possible address-takings and accesses. 

I don't think the escape analysis included in IPA points-to analysis will help 
you in the end. The constraint solver does not do the precise analysis you need 
either, but the precise analysis will give you conservative escape results. 

>One of the things that I think would be needed is alias sets. I
>thought 
>that build_alias was building alias sets but I was mistaken. However, 
>computing the alias sets should not be too difficult.
>
>Also continuing imagining what a data-layout optimization would look 
>like in GCC, since IPA-PTA is a SIMPLE_IPA_PASS and if alias sets are 
>indeed needed, I was asking what would be the reception to a 
>SIMPLE_IPA_PASS that computes alias sets just after IPA-PTA. (As
>opposed 
>to a full ipa pass).

If you look, we skip the simple analysis if IPA analysis was done, to not 
overwrite its results. So forcing it (even earlier) would make the IPA analysis 
moot, which would of course not be welcome. 

Richard. 

>
>
>
>> 
>> Richard.
>> 
>>>
>>> Thanks!



Re: Do all global structure variables escape in IPA-PTA?

2020-08-25 Thread Richard Biener via Gcc
On August 25, 2020 3:09:13 PM GMT+02:00, Erick Ochoa 
 wrote:
>Hi,
>
>I'm trying to understand how the escape analysis in IPA-PTA works. I
>was 
>testing a hypothesis where if a structure contains an array of 
>characters and this array of characters is passed to fopen, the 
>structure and all subfields will escape.
>
>To do this, I made a program that has a global structure variable foo2 
>that has a field passed as an argument to fopen. I also made another
>
>variable foo whose array is initialized by the result of rand.
>
>However, after compiling this program with -flto -flto-partition=none 
>-fipa -fdump-ipa-pta -fdump-tree-all-all -Ofast (gcc --version 10.2.0)
>
>E.g.
>
>#include 
>#include 
>#include 
>
>struct foo_t {
>   char buffer1[100];
>   char buffer2[100];
>};
>
>struct foo_t foo;
>struct foo_t foo2;
>
>int
>main(int argc, char** argv)
>{
>
>   fopen(foo2.buffer1, "r");
>   for (int i = 0; i < 100; i++)
>   {
> foo.buffer1[i] = rand();
>   }
>   int i = rand();
>   int retval = foo.buffer1[i % 100];
>   return retval;
>}
>
>I see the PTA dump state the following:
>
>ESCAPED = { STRING ESCAPED NONLOCAL foo2 }
>foo = { ESCAPED NONLOCAL }
>foo2 = { ESCAPED NONLOCAL }
>
>which I understand as
>* something externally visible might point to foo2
>* foo2 might point to something externally visible
>* foo might point to something externally visible

Yes. So it's exactly as your hypothesis. 

>I have seen that global variables are stored in the .gnu.lto_.decls LTO
>
>file section. In the passes I have worked on I have ignored global 
>variables. But can foo and foo2 be marked as escaping because the 
>declarations are not streamed in yet? Or is there another reason I am 
>not seeing? I am aware of aware of the several TODOs at the beginning
>of 
>gcc/tree-ssa-structalias.c but I am unsure if they contribute to these 
>variables being marked as escaping. (Maybe TODO 1 and TODO 2?)

Not sure what the problem is. Foo2 escapes because its address is passed to a 
function. 

? 

Richard. 

>Just FYI, I've been reading:
>* Structure Aliasing in GCC
>* Gimple Alias Improvements for GCC 4.5
>* Memory SSA - A Unified Approach for Sparsely Representing Memory 
>Operations
>
>Thanks, I appreciate all help!



Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Richard Biener via Gcc
On Mon, Aug 24, 2020 at 1:22 PM Stefan Kanthak  wrote:
>
> "Richard Biener"  wrote:
>
> > On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak  
> > wrote:
> >>
> >> "Allan Sandfeld Jensen"  wrote:
> >>
> >> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> >> >> Hi @ll,
> >> >>
> >> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>,
> >> >> Matt Godbolt used the function
> >> >>
> >> >> | bool isWhitespace(char c)
> >> >> | {
> >> >> |
> >> >> | return c == ' '
> >> >> |
> >> >> |   || c == '\r'
> >> >> |   || c == '\n'
> >> >> |   || c == '\t';
> >> >> |
> >> >> | }
> >> >>
> >> >> as an example, for which GCC 9.1 emits the following assembly for AMD64
> >> >>
> >> >> processors (see <https://godbolt.org/z/acm19_conds>):
> >> >> |    xor    eax, eax          ; result = false
> >> >> |    cmp    dil, 32           ; is c > 32
> >> >> |    ja     .L4               ; if so, exit with false
> >> >> |    movabs rax, 4294977024   ; rax = 0x100002600
> >> >> |    shrx   rax, rax, rdi     ; rax >>= c
> >> >> |    and    eax, 1            ; result = rax & 1
> >> >> |
> >> >> |.L4:
> >> >> |    ret
>
> [...]
>
> > Whether or not the conditional branch sequence is faster depends on whether
> > the branch is well-predicted which very much depends on the data you
> > feed the isWhitespace function with
>
> Correct.
>
> > but I guess since this is the c == ' ' test it _will_ be a well-predicted 
> > branch
>
> Also correct, but you miss a point: the typical use case is
>
> while (isWhitespace(*ptr)) ptr++;
>
> > which means the conditional branch sequence will be usually faster.
>
> And this is wrong!
> The (well-predicted) branch is usually NOT taken, so both code variants
> usually execute (with one exception) the same 6 or 7 instructions.

Whether or not the branch is predicted taken does not matter, what
matters is that the continuation is not data dependent on the branch
target computation and thus can execute in parallel to it.

> > The proposed change turns the control into a data dependence which
> > constrains instruction scheduling and retirement.
>
> It doesn't matter: the branch has the same data dependency too!
>
> > Indeed a mispredicted branch will likely be more costly.
>
> And no branch is even better: the branch predictor has a limited capacity,
> so every removed branch instruction can help improve its efficiency.
>
> > x86 CPUs do not perform data speculation.
>
> >>  mov ecx, edi
> >>  movabs  rax, 4294977024
> >>  shr rax, cl
> >>  xor edi, edi
> >>  cmp ecx, 33
> >>  setb    dil
> >>  and eax, edi
>
> I already presented measured numbers: with random data, the branch-free
> code is faster, with ordered data the original code.
>
> Left column 1 billion sequential characters
> for (int i=10; i; --i) ...(i);
> right column 1 billion random characters, in cycles per character:

I guess feeding it Real Text (TM) is the only relevant benchmark,
doing sth like

  for (;;)
 cnt[isWhitespace(*ptr++)]++;

> GCC:          2.4    3.4
> branch-free:  3.0    2.5

I'd call that inconclusive data - you also failed to show your test data
is somehow relevant.  We do know that mispredicted branches are bad.
You show well-predicted branches are good.  By simple statistics
singling out 4 out of 255 values will make the branches well-predicted.

> Now perform a linear interpolation and find the break-even point at
> p=0.4, with p=0 for ordered data and p=1 for random data, or just use
> the average of these numbers: 2.9 cycles vs. 2.75 cycles.
> That's small, but measurable!


Re: Question about Gimple Variables named D.[0-9]*

2020-08-24 Thread Richard Biener via Gcc
On Thu, Aug 20, 2020 at 11:51 AM Erick Ochoa
 wrote:
>
> Hello,
>
> I am looking at the dump for the build_alias pass. I see a lot of
> variables with the naming convention D.[0-9]* in the points-to sets
> being printed.
>
> When I compile with
>
> -fdump-tree-all-all
>
> I can see that the suffix D.[0-9]* is appended to some gimple variables.
> I initially imagined that variables in the points-to variable set could
> map to a variable declaration in gimple, but this does not seem to be
> the case. I have confirmed this by searching for some known variable
> name in the points-to set and finding no matches in the gimple code, the
> other way around seems to also be true.
>
> Are these variables just constraint variables used to solve the
> points-to analysis? In other words, the variables in points-to sets
> printed out in build_alias do not have a simple map to variables in
> gimple. The only relation is that the intersection between to points-to
> set for variable A with the points-to set of variable B will yield an
> is_alias(A, B) relationship. Is the above true?

The points-to sets in SSA_NAME_POINTER_INFO record DECL_UIDs
which are those printed as D.[0-9]* which is appended to all variables
if you dump with -uid.  The points-to set "names" in the points-to dumps
are internal names created for the constraint variables - most of the time
based on the program variable names but only loosely coupled.

The translation between constraint variables and program variables is
done in set_uids_in_ptset.

Richard.

>
> Thanks!
>
>


Re: Peephole optimisation: isWhitespace()

2020-08-24 Thread Richard Biener via Gcc
On Mon, Aug 17, 2020 at 7:09 PM Stefan Kanthak  wrote:
>
> "Allan Sandfeld Jensen"  wrote:
>
> > On Freitag, 14. August 2020 18:43:12 CEST Stefan Kanthak wrote:
> >> Hi @ll,
> >>
> >> in his ACM queue article <https://queue.acm.org/detail.cfm?id=3372264>,
> >> Matt Godbolt used the function
> >>
> >> | bool isWhitespace(char c)
> >> | {
> >> |
> >> | return c == ' '
> >> |
> >> |   || c == '\r'
> >> |   || c == '\n'
> >> |   || c == '\t';
> >> |
> >> | }
> >>
> >> as an example, for which GCC 9.1 emits the following assembly for AMD64
> >>
> >> processors (see <https://godbolt.org/z/acm19_conds>):
> >> |    xor    eax, eax          ; result = false
> >> |    cmp    dil, 32           ; is c > 32
> >> |    ja     .L4               ; if so, exit with false
> >> |    movabs rax, 4294977024   ; rax = 0x100002600
> >> |    shrx   rax, rax, rdi     ; rax >>= c
> >> |    and    eax, 1            ; result = rax & 1
> >> |
> >> |.L4:
> >> |    ret
> >>
> > No it doesn't. As your example shows if you took the time to read it, it is
> > what gcc emits when generating code to run on a _haswell_ architecture.
>
> Matt's article does NOT specify the architecture for THIS example.
> He specified it for another example he named "(q)":
>
> | When targeting the Haswell microarchitecture, GCC 8.2 compiles this code
> | to the assembly in (q) (https://godbolt.org/z/acm19_bits):
>
> What about CAREFUL reading?
>
> > If you remove -march=haswell from the command line you get:
> >
> >xor eax, eax
> >cmp dil, 32
> >ja  .L1
> >movabs  rax, 4294977024
> >mov ecx, edi
> >shr rax, cl
> >and eax, 1
> >
> > It uses one mov more, but no shrx.
>
> The SHRX is NOT the point here; it's the avoidable conditional branch that
> matters!

Whether or not the conditional branch sequence is faster depends on whether
the branch is well-predicted which very much depends on the data you
feed the isWhitespace function with but I guess since this is the
c == ' ' test it _will_ be a well-predicted branch which means the
conditional branch sequence will be usually faster.  The proposed
change turns the control into a data dependence which constrains
instruction scheduling and retirement.  Indeed a mispredicted branch
will likely be more costly.

x86 CPUs do not perform data speculation.

Richard.

>  mov ecx, edi
>  movabs  rax, 4294977024
>  shr rax, cl
>  xor edi, edi
>  cmp ecx, 33
>  setb    dil
>  and eax, edi
>
> Stefan


Re: Question about IPA-PTA and build_alias

2020-08-24 Thread Richard Biener via Gcc
On Mon, Aug 17, 2020 at 3:22 PM Erick Ochoa
 wrote:
>
> Hello,
>
> I'm looking to understand better the points-to analysis (IPA-PTA) and
> the alias analysis (build_alias).
>
> How is the information produced by IPA-PTA consumed?
>
> Are alias sets in build_alias computed by the intersections of the
> points_to_set(s) (computed by IPA-PTA)?
>
> My intuition tells me that it could be relatively simple to move
> build_alias to be an SIMPLE_IPA_PASS performed just after IPA-PTA, but I
> do not have enough experience in GCC to tell if this is correct. What
> could be some difficulties which I am not seeing? (Either move, or
> create a new IPA-ALIAS SIMPLE_IPA_PASS.) This pass would have the same
> sensitivity as IPA-PTA { flow-insensitive, context-insensitive,
> field-sensitive } because the alias sets could be computed by the
> intersection of points-to-sets.

Both IPA-PTA and build_alias do the same, they build PTA constraint
sets, solve them and attach points-to info to SSA names.  Just IPA-PTA
does this for the whole TU while build_alias does it for a function at a time.

So I guess I do not understand your question.

Richard.

>
> Thanks!


Re: GCC Plugins and global_options

2020-08-24 Thread Richard Biener via Gcc
On Thu, Aug 13, 2020 at 10:39 AM Jakub Jelinek via Gcc  wrote:
>
> Hi!
>
> Any time somebody adds or removes an option in some *.opt file (which e.g.
> on the 10 branch after branching off 11 happened 5 times already), many
> offsets in global_options variable change.  It is true we don't guarantee
> ABI stability for plugins, but we change the most often used data structures
> on the release branches only very rarely and so the options changes are the
> most problematic for ABI stability of plugins.
>
> Annobin uses a way to remap accesses to some of the global_options.x_* by
> looking them up in the cl_options array where we have
> offsetof (struct gcc_options, x_flag_lto)
> etc. remembered, but sadly doesn't do it for all options (e.g. some flag_*
> etc. option accesses may be hidden in various macros like POINTER_SIZE),
> and more importantly some struct gcc_options offsets are not covered at all.
> E.g. there is no offsetof (struct gcc_options, x_optimize),
> offsetof (struct gcc_options, x_flag_sanitize) etc.  Those are usually:
> Variable
> int optimize
> in the *.opt files.
>
> So, couldn't our opt*.awk scripts generate another array that would either
> cover just the offsets not covered in struct cl_options that a plugin could
> use to remap struct global_options offsets at runtime, which would include
> e.g. the offsetof value and the name of the variable and perhaps sizeof for
> verification purposes?
> Or couldn't we in plugin/include/ install a modified version of options.h
> that instead of all the:
> #define flag_opts_finished global_options.x_flag_opts_finished
> will do:
> #define flag_opts_finished gcc_lookup_option (flag_opts_finished)
> where lookup_option would be a macro that does something like:
> __attribute__((__pure__))
> void *gcc_lookup_option_2 (unsigned short, const char *, unsigned short);
> template <typename T>
> T &gcc_lookup_option_1 (unsigned short offset, const char *name)
> {
>   T *ptr = static_cast <T *> (gcc_lookup_option_2 (offset, name, sizeof (T)));
>   return *ptr;
> }
> #define lookup_option(var) \
>   gcc_lookup_option_1  \
> (offsetof (struct gcc_options, x_##var), #var, \
>  sizeof (global_options.x_##var))
> where the gcc_lookup_option_2 function would lookup the variable in an
> opt*.awk generated table, containing entries like:
>   "ix86_stack_protector_guard_offset", NULL, NULL, NULL, NULL, NULL, NULL, 
> NULL,
>   "ix86_stack_protector_guard_reg", "", NULL, NULL,
>   "recip_mask", NULL, NULL, NULL,
> ...
> As struct gcc_options is around 5KB now, that table would need 5K entries,
> NULL and "" stand for no variable starts here, and "" additionally says that
> padding starts here.
> So, if no options have changed since the plugin has been built, it would be
> very cheap, it would just verify that at the given offset the table contains
> the corresponding string (i.e. non-NULL and strcmp == 0 and that the size
> matches (that size - 1 following entries are NULL and then there is
> non-NULL)). If not, it would keep looking around (one loop that looks in
> both directions in the table, so first it would check offsets -1 and +1 from
> the original, then -2 and +2, etc.
> And would gcc_unreachable () if it can't find it in the table, or can find
> it, but the size has changed.
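The bidirectional probe Jakub describes could be sketched in plain C roughly as follows. The table contents and function names here are illustrative only, not the actual opt*.awk output; the real table would have one entry per byte of struct gcc_options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative option-variable table: one entry per byte of struct
   gcc_options.  NULL means "no variable starts here"; "" would mark
   the start of padding.  */
static const char *const option_table[] = {
  "optimize", NULL, NULL, NULL,
  "flag_sanitize", NULL, NULL, NULL,
  "recip_mask", NULL, NULL, NULL,
};
static const size_t table_size
  = sizeof (option_table) / sizeof (option_table[0]);

/* Size of the variable starting at OFFSET: scan until the next
   non-NULL entry or the end of the table.  */
static size_t
entry_size (size_t offset)
{
  size_t n = 1;
  while (offset + n < table_size && option_table[offset + n] == NULL)
    n++;
  return n;
}

/* Map the plugin's build-time OFFSET of variable NAME (of size SIZE)
   to the run-time offset, first checking OFFSET itself, then probing
   -1/+1, -2/+2, ... as described.  Returns -1 if the variable is gone
   or its size changed (the real code would gcc_unreachable ()).  */
static long
remap_offset (size_t offset, const char *name, size_t size)
{
  for (size_t d = 0; d < table_size; d++)
    for (int dir = 0; dir < (d == 0 ? 1 : 2); dir++)
      {
        long c = dir ? (long) offset + (long) d : (long) offset - (long) d;
        if (c < 0 || (size_t) c >= table_size)
          continue;
        const char *s = option_table[c];
        if (s != NULL && s[0] != '\0' && strcmp (s, name) == 0
            && entry_size ((size_t) c) == size)
          return c;
      }
  return -1;
}
```

When nothing has moved, the very first probe hits, so the common case costs a single strcmp plus the size check.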
>
> If that is unacceptable, at least having a table with variables not covered
> in struct cl_options offsets would allow the plugin to do it itself
> (basically by constructing a remapping table: original offsetof (struct 
> gcc_options, XXX)
> remaps to offset ABC, with some value like (unsigned short) -1 to signal
> it is gone and there should be an assertion failure).
>
> Thoughts on this?

I'd say we ignore this since we do not provide any ABI stability guarantees.
Instead we maybe want to "export" the genchecksum result and embed
it into plugin objects and refuse to load plugins that were not built against
the very same ABI [unless --force is given]?

Richard.

> Jakub
>


Re: Problem cropping up in Value Range Propagation

2020-08-24 Thread Richard Biener via Gcc
On Tue, Aug 11, 2020 at 6:15 AM Gary Oblock via Gcc  wrote:
>
> I'm trying to debug a problem cropping up in value range propagation.
> Ironically I probably own an original 1995 copy of the paper it's
> based on, but that's not going to be much help since I'm lost in the
> weeds.  It's running on GIMPLE statements generated by an optimization
> of mine (my structure reorg optimization).
>
> Here's the GIMPLE dump:
>
> Function max_of_y (max_of_y, funcdef_no=1, decl_uid=4391, cgraph_uid=2, 
> symbol_order=20) (executed once)
>
> max_of_y (unsigned long data, size_t len)
> {
>   double value;
>   double result;
>   size_t i;
>
>   <bb 2> [local count: 118111600]:
>   field_arry_addr_14 = _reorg_base_var_type_t.y;
>   index_15 = (sizetype) data_27(D);
>   offset_16 = index_15 * 8;
>   field_addr_17 = field_arry_addr_14 + offset_16;
>   field_val_temp_13 = MEM <double> [(void *)field_addr_17];
>   result_8 = field_val_temp_13;
>   goto <bb 6>; [100.00%]
>
>   <bb 3> [local count: 955630225]:
>   _1 = i_3 * 16;
>   PPI_rhs1_cast_18 = (unsigned long) data_27(D);
>   PPI_rhs2_cast_19 = (unsigned long) _1;
>   PtrPlusInt_Adj_20 = PPI_rhs2_cast_19 / 16;
>   PtrPlusInt_21 = PPI_rhs1_cast_18 + PtrPlusInt_Adj_20;
>   dedangled_27 = (unsigned long) PtrPlusInt_21;
>   field_arry_addr_23 = _reorg_base_var_type_t.y;
>   index_24 = (sizetype) dedangled_27;
>   offset_25 = index_24 * 8;
>   field_addr_26 = field_arry_addr_23 + offset_25;
>   field_val_temp_22 = MEM <double> [(void *)field_addr_26];
>   value_11 = field_val_temp_22;
>   if (result_5 < value_11)
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
>
>   <bb 4> [local count: 477815112]:
>
>   <bb 5> [local count: 955630225]:
>   # result_4 = PHI <result_5(3), value_11(4)>
>   i_12 = i_3 + 1;
>
>   <bb 6> [local count: 1073741824]:
>   # i_3 = PHI <1(2), i_12(5)>
>   # result_5 = PHI <result_8(2), result_4(5)>
>   if (i_3 < len_9(D))
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 7>; [11.00%]
>
>   <bb 7> [local count: 118111600]:
>   # result_10 = PHI <result_5(6)>
>   return result_10;
> }
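For orientation, the dump above corresponds roughly to the following C, reconstructed by hand from the GIMPLE. The per-field array and its bound are assumptions; `data` is no longer a pointer but an index into the reorganized field array.

```c
#include <assert.h>
#include <stddef.h>

/* Hand reconstruction of the transformed max_of_y.  The struct-of-arrays
   base and its size are hypothetical stand-ins for the reorg pass's
   generated globals.  */
static struct { double y[16]; } _reorg_base_var_type_t;

static double
max_of_y (unsigned long data, size_t len)
{
  /* Entry block: load the first element.  */
  double result = _reorg_base_var_type_t.y[data];
  for (size_t i = 1; i < len; i++)
    {
      /* dedangled = data + (i * 16) / 16, i.e. data + i.  */
      double value = _reorg_base_var_type_t.y[data + i];
      if (result < value)
        result = value;
    }
  return result;
}
```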
>
> The failure in VRP is occurring on
>
> offset_16 = data_27(D) * 8;
>
> which comes from the two adjacent statements above
>
>   index_15 = (sizetype) data_27(D);
>   offset_16 = index_15 * 8;
>
> being merged together.
>
> Note, the types of index_15/16 are sizetype and data_27 is unsigned
> long.
> The error message is:
>
> internal compiler error: tree check: expected class ‘type’, have 
> ‘exceptional’ (error_mark) in to_wide,

This means the SSA name looked at has been released and should no longer be
referred to from the IL.

> Things only start to look broken in value_range::lower_bound in
> value-range.cc when
>
> return wi::to_wide (t);
>
> is passed error_mark_node in t. It's getting it from m_min just above.
> My observation is that m_min is not always error_mark_node. In fact, I
> seem to think you need to use set_varying to get this to even happen.
>
> Note, the ssa_propagation_engine processed the statement "offset_16 =
> data..."  multiple times before failing on it. What oh what is
> happening and how in the heck did I cause it???
>
> Please, somebody throw me a life preserver on this.
>
> Thanks,
>
> Gary
>
>
> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is 
> for the sole use of the intended recipient(s) and contains information that 
> is confidential and proprietary to Ampere Computing or its subsidiaries. It 
> is to be used solely for the purpose of furthering the parties' business 
> relationship. Any review, copying, or distribution of this email (or any 
> attachments thereto) is strictly prohibited. If you are not the intended 
> recipient, please contact the sender immediately and permanently delete the 
> original and any copies of this email and any attachments thereto.


Re: RFC: -fno-share-inlines

2020-08-23 Thread Richard Biener via Gcc
On Mon, Aug 10, 2020 at 9:36 AM Allan Sandfeld Jensen
 wrote:
>
> Following the previous discussion, this is a proposal for a patch that adds
> the flag -fno-share-inlines that can be used when compiling singular source
> files with a different set of flags than the rest of the project.
>
> It basically turns off comdat for inline functions, as if you compiled without
> support for 'weak' symbols. Turning them all into "static" functions, even if
> that wouldn't normally be possible for that type of function. Not sure if it
> breaks anything, which is why I am not sending it to the patch list.
>
> I also considered alternatively to turn the comdat generation off later during
> assembler production to ensure all processing and optimization of comdat
> functions would occur as normal.

We already have -fvisibility-inlines-hidden so maybe call it
-fvisibility-inlines-static?
Does this option also imply 'static' vtables?

Richard.

> Best regards
> Allan


Re: Silly question about pass numbers

2020-08-12 Thread Richard Biener via Gcc
On August 13, 2020 2:57:04 AM GMT+02:00, Gary Oblock via Gcc  
wrote:
>Segher,
>
>If this was on the mainline and not in the middle of a
>nontrivial optimization effort I would have filed a bug report
>and not asked a silly question. 😉
>
>I'm at a total lost as to how I could have caused the pass
>numbers to be backward... but at least have I confirmed that's
>what seems to be happening. It's not doing any harm to
>anything except the sanity of anybody looking at the pass
>dumps...

The inline dump is last written to during the transform phase, which is only
carried out when the body is further optimized (thus again function at a time,
not IPA).  That is why you see the interleaving of dump appends.

>Thanks,
>
>Gary
>
>From: Segher Boessenkool 
>Sent: Wednesday, August 12, 2020 5:45 PM
>To: Gary Oblock 
>Cc: gcc@gcc.gnu.org 
>Subject: Re: Silly question about pass numbers
>
>[EXTERNAL EMAIL NOTICE: This email originated from an external sender.
>Please be mindful of safe email handling and proprietary information
>protection practices.]
>
>
>Hi!
>
>On Wed, Aug 12, 2020 at 08:26:34PM +, Gary Oblock wrote:
>> The files are from the same run:
>> -rw-rw-r-- 1 gary gary  3855 Aug 12 12:49 exe.ltrans0.ltrans.074i.cp
>> -rw-rw-r-- 1 gary gary 16747 Aug 12 12:49
>exe.ltrans0.ltrans.087i.structure-reorg
>>
>> By the time .cp was created inlining results in only main existing.
>> In the .structure-reorg file there are three functions.
>
>It does not matter what time the dump files were last opened (or
>created
>or written to).
>
>> Not only am I seeing things in .cp (beyond a shadow of a doubt)
>> that were created in structure  reorganization, inlining has also
>> been done and its pass number of 79!
>>
>> Note, this is not hurting me in any way other than violating my
>> beliefs about pass numbering.
>
>I cannot check on any of that because this is not in mainline GCC?
>It is a lot easier if you ask us about problems we may be able to
>reproduce ;-)  Like maybe something with only cp and inline?
>
>
>Segher



Re: Has FSF stopped processing copyright paperwork?

2020-08-07 Thread Richard Biener via Gcc
On Fri, Aug 7, 2020 at 3:14 PM H.J. Lu via Gcc  wrote:
>
> On Tue, May 5, 2020 at 6:42 PM Kaylee Blake  wrote:
> >
> > On 2/5/20 11:49 pm, H.J. Lu wrote:
> > > On Wed, Mar 18, 2020 at 6:46 PM Kaylee Blake via Binutils
> > >  wrote:
> > >>
> > >> On 19/3/20 12:02 pm, H.J. Lu wrote:
> > >>> Kaylee, is your paper work with FSF in order? I will submit the updated
> > >>> patch set after your paper is on file with FSF.
> > >>
> > >> I'm waiting on a response from them at the moment.
> > >>
> > >
> > > Hi Kaylee,
> > >
> > > Any update on your paper work with FSF?
> > >
> >
> > Still waiting; apparently their work process has been dramatically
> > slowed by the whole COVID-19 situation.
> >
> > --
> > Kaylee Blake 
> > C is the worst language, except for all the others.
>
> Hi,
>
> I submitted a set of binutils patches:
>
> https://sourceware.org/pipermail/binutils/2020-March/13.html
>
> including contribution from Kaylee Blake .
> Can someone check if Kaylee's paperwork is on file with FSF?

Don't see her in the list.

Richard.

> Thanks.
>
> --
> H.J.


Re: Define __attribute__((no_instrument_function)) but still got instrumented

2020-08-06 Thread Richard Biener via Gcc
On Fri, Aug 7, 2020 at 8:35 AM Shuai Wang via Gcc  wrote:
>
> Hello!
>
> I am working on an ARM GCC plugin which instruments each GIMPLE function
> with some new function calls.
>
> Currently I want to skip certain functions by adding the
> no_instrument_function attribute. However, I do see that in the
> disassembled code, all functions are still instrumented.
>
> Have I missed anything here? From this page (
> https://www.keil.com/support/man/docs/armcc/armcc_chr1359124976163.htm), I
> do see that no_instrument_function is used to skip --gnu_instrument, but
> might not be applicable to my case where I use the following command to
> compile:
>
> arm-none-eabi-g++ -fplugin=my_plugin.so -mcpu=cortex-m4 -mthumb
> -mfloat-abi=soft -Og -fmessage-length=0 -fsigned-char -ffunction-sections
> -fdata-sections -fno-move-loop-invariants -Wall -Wextra  -g3 -DDEBUG
> -DUSE_FULL_ASSERT -DOS_USE_SEMIHOSTING -DTRACE -DOS_USE_TRACE_SEMIH
> OSTING_DEBUG -DSTM32F429xx -DUSE_HAL_DRIVER -DHSE_VALUE=800
> -DLOS_KERNEL_DEBUG_OUT
>
> But overall, could anyone shed some light on: 1) how to skip instrumenting
> certain functions with a GCC plugin? 2) is it possible to check the function
> attribute in GIMPLE code? If so, I can simply check whether certain functions
> are marked as "no_instrument_function" and skip them myself.

You can check lookup_attribute("no_instrument_function",
DECL_ATTRIBUTES (cfun->decl))

> Thank you!
> Shuai


Re: Gcc Digest, Vol 5, Issue 52

2020-07-29 Thread Richard Biener via Gcc
On Wed, Jul 29, 2020 at 9:39 PM Gary Oblock  wrote:
>
> Richard,
>
> Thanks, I had no idea about the immediate uses mechanism and
> using it will speed things up a bit and make them more reliable.
> However, I'll still have to scan the LHS of each assignment unless
> there's a mechanism to traverse all the SSAs for a function.

May I suggest that you give the GCC internals manual a read,
particularly the sections about GIMPLE, GENERIC and
'Analysis and Optimization of GIMPLE tuples'.  Most of the
info I provided is documented there.

> Note, I assume there is also a mechanism to add and remove
> immediate use instances. If I can find it I'll post a question to the list.
>
> I do the patching on a per-function basis immediately after
> applying the transforms. It was going to be a scan of all the
> GIMPLE. What you've told me might make it a bit of a misnomer
> to call what I intend to do now a scan. The default defs problem
> happened when the original scan tried to simply modify the type
> of a default def. There didn't seem to be a way of doing this, and I've
> since learned that default defs are in fact associated with
> declarations, not types. Note, just modifying the type of normal ssa names
> seemed to work, but I can't in fact know it actually would have.
>
> I'm not sure I can do justice to the other transformations but
> here is one larger example. Note, since I'm currently only
> dealing with dynamically allocated array I'll only see "a->f" and
> not "a[i].f" so you are getting the former.
>
>  _2 = _1->f
>
> turns into
>
> get_field_arry_addr: new_3 = array_base.f_array_field
> get_index   : new_4 = (sizetype)_1
> get_offset   : new_5  = new_4 * size_of_f_element
> get_field_addr: new_6 = new_3 + new_5   // uses pointer arith
> temp_set: new_7 = * new_6
> final_set           : _2    = new_7
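In plain C, the five steps Gary lists amount to replacing an array-of-structs field load with a struct-of-arrays load. A minimal sketch, with all names hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Struct-of-arrays base produced by instance interleaving: one array
   per original field f.  */
static struct { int f_array_field[N]; } array_base;

/* _2 = _1->f, where _1 has become an index, rewritten as the
   get_field_arry_addr / get_index / get_offset / get_field_addr /
   temp_set sequence from the mail.  */
static int
load_f (size_t _1)
{
  char *field_arry_addr = (char *) array_base.f_array_field; /* new_3 */
  size_t index = _1;                                         /* new_4 */
  size_t offset = index * sizeof (int);                      /* new_5 */
  char *field_addr = field_arry_addr + offset;               /* new_6 */
  return *(int *) field_addr;                                /* new_7 */
}
```

The pointer arithmetic on `char *` mirrors the GIMPLE form; idiomatic C would of course just write `array_base.f_array_field[_1]`.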
>
> I hope that's sufficient to satisfy your curiosity because the only other
> large transformation currently coded is that for the malloc which would
> take me quite a while to put together an example of. Note, these are
> shown in the HL design doc which I sent you. Though like battle plans,
> no design no matter how good survives coding intact.
>
> Thanks again,
>
> Gary
>
>
>
>
> 
> From: Richard Biener 
> Sent: Wednesday, July 29, 2020 5:42 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Gcc Digest, Vol 5, Issue 52
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> On Tue, Jul 28, 2020 at 11:02 PM Gary Oblock  wrote:
> >
> > Richard,
> >
> > I wasn't aware of release_defs so I'll add that for certain.
> >
> > When I do a single transformation as part of the transformation pass
> > each transformation uses the correct types internally but on the edges
> > emits glue code that will be transformed via a dangling type fixup pass.
> >
> > For example when adding something to a pointer:
> >
> > _2 = _1 + k
> >
> > Where _1 & _2 are the old pointer types, I'll emit:
> >
> > new_3 = (type_convert)_1
> > new_4 = (type_convert)k
> > new_5 = new_4 / struct_size // truncating divide
> > new_6 = new_3 + new_5
> > _2   = (type_convert)new_6
> >
> > Note, the casting is done with CONVERT_EXPR
> > which is harmless when I create new ssa names
> > and set the appropriate operands in
>
> OK, so you're funneling the new "index" values through
> the original pointer variable _1?  But then I don't see
> where the patching up of SSA names and the default
> def issue happens.
>
> > new_3 = (type_convert)_1
> > _2 = (type_convert)new_6
> >
> > to
> >
> > new_3 = new_7
> > new_8 = new_6
> >
> > Now I might actually find via a look up that
> > _1 and/or _2 were already mapped to
> > new_7 and/or new_8 but that's irrelevant.
> >
> > To intermix the applications of the transformations and
> > the patching of these dangling types seems like I'd
> > need to do an insanely ugly recursive walk of each functions
> > body.
> >
> > I'm curious when you mention def-use I'm not aware of
> > GCC using def-use chains except at the RTL level.
> > Is there a def-use mechanism in GIMPLE because
> > in SSA form it's trivial to find the definition of
> > a temp variable but no

Re: Gcc Digest, Vol 5, Issue 52

2020-07-29 Thread Richard Biener via Gcc
On Tue, Jul 28, 2020 at 11:02 PM Gary Oblock  wrote:
>
> Richard,
>
> I wasn't aware of release_defs so I'll add that for certain.
>
> When I do a single transformation as part of the transformation pass
> each transformation uses the correct types internally but on the edges
> emits glue code that will be transformed via a dangling type fixup pass.
>
> For example when adding something to a pointer:
>
> _2 = _1 + k
>
> Where _1 & _2 are the old pointer types, I'll emit:
>
> new_3 = (type_convert)_1
> new_4 = (type_convert)k
> new_5 = new_4 / struct_size // truncating divide
> new_6 = new_3 + new_5
> _2   = (type_convert)new_6
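As a plain-C illustration of the glue above: at the GIMPLE level the addend of a pointer-plus is a byte offset, so once the pointer has been turned into an element index, adding k bytes becomes adding k / sizeof (struct) elements, which is exactly the truncating divide by struct_size. Names below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char x; int y; double z; } fu_t;

/* After interleaving, a fu_t * is represented as an element index.  */
typedef size_t fu_index_t;

/* _2 = _1 + k with k a byte offset (k = n * sizeof (fu_t)) becomes an
   index addition after the truncating divide.  */
static fu_index_t
index_plus_bytes (fu_index_t idx, size_t byte_offset)
{
  return idx + byte_offset / sizeof (fu_t); /* new_5 = new_4 / struct_size */
}
```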
>
> Note, the casting is done with CONVERT_EXPR
> which is harmless when I create new ssa names
> and set the appropriate operands in

OK, so you're funneling the new "index" values through
the original pointer variable _1?  But then I don't see
where the patching up of SSA names and the default
def issue happens.

> new_3 = (type_convert)_1
> _2 = (type_convert)new_6
>
> to
>
> new_3 = new_7
> new_8 = new_6
>
> Now I might actually find via a look up that
> _1 and/or _2 were already mapped to
> new_7 and/or new_8 but that's irrelevant.
>
> To intermix the applications of the transformations and
> the patching of these dangling types seems like I'd
> need to do an insanely ugly recursive walk of each functions
> body.
>
> I'm curious when you mention def-use I'm not aware of
> GCC using def-use chains except at the RTL level.
> Is there a def-use mechanism in GIMPLE because
> in SSA form it's trivial to find the definition of
> a temp variable but non trivial to find the use of
> it. Which I think is a valid reason for fixing up the
> dangling types of temps in a scan.

In GIMPLE SSA we maintain a list of uses for each SSA
def, available via the so-called immediate uses.  You
can grep for uses of FOR_EACH_IMM_USE[_FAST].

>
> Note, I'll maintain a mapping like you suggest but not use
> it at transformation application time. Furthermore,
> I'll initialize the mapping with the default defs from
> the DECLs so I won't have to mess with them on the fly.
> Now at the time in the scan when I find uses and defs of
> a dangling type I'd like to simply modify the associated operands
> of the statement. What is the real advantage creating a new
> statement with the correct types? I'll be using SSA_NAME_DEF_STMT
> if the newly created ssa name is on the left hand side. Also, the
> ssa_name it replaces will no longer be referenced by the end of the
> scan pass.

Since you are replacing a[i].b with array_for_b[i] I am wondering
how you do the transform for non-pointer adjustments.

> Note, I do have a escape mechanism in a qualification
> pre-pass to the transformations. It's not intended as
> catch-all for things I don't understand rather it's an
> aid to find possible new cases. However, there are
> legitimate things at this point in time during development
> of this optimization that I need to spot things this way. Later,
> when points to analysis is integrated falling through to
> the default case behavior will likely cause an internal error.
>
> Thanks,
>
> Gary
>
> 
> From: Richard Biener 
> Sent: Tuesday, July 28, 2020 12:07 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Gcc Digest, Vol 5, Issue 52
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> On Tue, Jul 28, 2020 at 4:36 AM Gary Oblock via Gcc  wrote:
> >
> > Almost all of that makes sense to me.
> >
> > I'm not sure what a conditionally initialized pointer is.
>
> {
>   void *p;
>   if (condition)
> p = ...;
>   if (other condition)
>  ... use p;
>
> will end up with a PHI node after the conditional init with
> one PHI argument being the default definition SSA name
> for 'p'.
>
>
> > You mention VAR_DECL but I assume this is for
> > completeness and not something I'll run across
> > associated with a default def (but then again I don't
> > understand notion of a conditionally initialized
> > pointer.)
> >
> > I'm at the moment only dealing with a single malloced
> > array of structures of the given type (though multiple types could have 
> > this property.) I intend to extend this to cover multiple array and static 
> > allocations but I need to get the easiest case working first. This means no 
> > side pointers are needed and if and when I

Re: Gcc Digest, Vol 5, Issue 52

2020-07-28 Thread Richard Biener via Gcc
On Tue, Jul 28, 2020 at 4:36 AM Gary Oblock via Gcc  wrote:
>
> Almost all of that makes sense to me.
>
> I'm not sure what a conditionally initialized pointer is.

{
  void *p;
  if (condition)
p = ...;
  if (other condition)
 ... use p;

will end up with a PHI node after the conditional init with
one PHI argument being the default definition SSA name
for 'p'.


> You mention VAR_DECL but I assume this is for
> completeness and not something I'll run across
> associated with a default def (but then again I don't
> understand notion of a conditionally initialized
> pointer.)
>
> I'm at the moment only dealing with a single malloced
> array of structures of the given type (though multiple types could have this 
> property.) I intend to extend this to cover multiple array and static 
> allocations but I need to get the easiest case working first. This means no 
> side pointers are needed and if and when I need them pointer will get 
> transformed into a base and index pair.
>
> I intend to do the creation of new ssa names as a separate pass from the 
> gimple transformations. So I will technically be creating for the duration of 
> the pass possibly two defs associated with a single gimple statement. Do I 
> need to delete the old ssa names
> via some mechanism?

When you remove the old definition do

   gsi_remove (&gsi, true); // gsi points at stmt
   release_defs (stmt);

note that as far as I understand you need to modify the stmts using
the former pointer (since it's now an index), and I would not recommend
to make creation of new SSA names a separate pass, instead create
them when you alter the original definition and maintain a map
between old and new SSA name.

I haven't dug deep enough into your code to figure out how you identify things
to modify (well, I fear you're just scanning for "uses" of the changed
type ...), but in the scheme I think should be implemented you'd
follow the SSA def->use links for both tracking an objects life
as well as for modifying the accesses.

With just scanning for types I am quite sure you'll run into
cases where you discover SSA uses that you did not modify
because you thought that's not necessary (debug stmts!).  Of
course you'll simply make more things "type escape points" then.

> By the way this is really helpful information. The only
> other person on the project, Erick, is a continent away
> and has about as much experience with gimple as
> me but a whole heck lot less compiler experience.
>
> Thanks,
>
> Gary
>
> 
> From: Gcc  on behalf of gcc-requ...@gcc.gnu.org 
> 
> Sent: Monday, July 27, 2020 1:33 AM
> To: gcc@gcc.gnu.org 
> Subject: Gcc Digest, Vol 5, Issue 52
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> Send Gcc mailing list submissions to
> gcc@gcc.gnu.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://gcc.gnu.org/mailman/listinfo/gcc
> or, via email, send a message with subject or body 'help' to
> gcc-requ...@gcc.gnu.org
>
> You can reach the person managing the list at
> gcc-ow...@gcc.gnu.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Gcc digest..."


Re: LTO Dead Field Elimination

2020-07-27 Thread Richard Biener via Gcc
On Mon, Jul 27, 2020 at 2:59 PM Christoph Müllner
 wrote:
>
> Hi Richard,
>
> On 7/27/20 2:36 PM, Richard Biener wrote:
> > On Fri, Jul 24, 2020 at 5:43 PM Erick Ochoa
> >  wrote:
> >>
> >> This patchset brings back struct reorg to GCC.
> >>
> >> We’ve been working on improving cache utilization recently and would
> >> like to share our current implementation to receive some feedback on it.
> >>
> >> Essentially, we’ve implemented the following components:
> >>
> >>  Type-based escape analysis to determine if we can reorganize a type
> >> at link-time
> >>
> >>  Dead-field elimination to remove unused fields of a struct at
> >> link-time
> >>
> >> The type-based escape analysis provides a list of types, that are not
> >> visible outside of the current linking unit (e.g. parameter types of
> >> external functions).
> >>
> >> The dead-field elimination pass analyses non-escaping structs for fields
> >> that are not used in the linking unit and thus can be removed. The
> >> resulting struct has a smaller memory footprint, which allows for a
> >> higher cache utilization.
> >>
> >> As a side-effect a couple of new infrastructure code has been written
> >> (e.g. a type walker, which we were really missing in GCC), which can be
> >> of course reused for other passes as well.
> >>
> >> We’ve prepared a patchset in the following branch:
> >>
> >>refs/vendors/ARM/heads/arm-struct-reorg-wip
> >
> > Just had some time to peek into this.  Ugh.  The code doesn't look like
> > GCC code looks :/  It doesn't help to have one set of files per C++ class 
> > (25!).
>
> Any suggestions how to best structure these?

As "bad" as it sounds, put everything into one file (maybe separate out
type escape analysis from the actual transform).  Add a toplevel comment
per file explaining things.

> Are there some coding guidelines in the GCC project,
> which can help us to match the expectation?

Look at existing passes, otherwise there's mostly conventions on
formatting.

> > The code itself is undocumented - it's hard to understand what the purpose
> > of all the Walker stuff is.
> >
> > You also didn't seem to know walk_tree () nor walk_gimple* ().
>
> True, we were not aware of that code.
> Thanks for pointing to that code.
> We will have a look.
>
> > Take as example - I figured to look for IPA pass entries, then I see
> >
> > +
> > +static void
> > +collect_types ()
> > +{
> > +  GimpleTypeCollector collector;
> > +  collector.walk ();
> > +  collector.print_collected ();
> > +  ptrset_t types = collector.get_pointer_set ();
> > +  GimpleCaster caster (types);
> > +  caster.walk ();
> > +  if (flag_print_cast_analysis)
> > +caster.print_reasons ();
> > +  ptrset_t casting = caster.get_sets ();
> > +  fix_escaping_types_in_set (casting);
> > +  GimpleAccesser accesser;
> > +  accesser.walk ();
> > +  if (flag_print_access_analysis)
> > +accesser.print_accesses ();
> > +  record_field_map_t record_field_map = accesser.get_map ();
> > +  TypeIncompleteEquality equality;
> > +  bool has_fields_that_can_be_deleted = false;
> > +  typedef std::set field_offsets_t;
> >
> > there's no comments (not even file-level) that explains how type escape
> > is computed.
> >
> > Sorry, but this isn't even close to be coarsely reviewable.
>
> Sad to hear.
> We'll work on the input that you provided and provide a new version.
>
> Thanks,
> Christoph
>
> >
> >> We’ve also added a subsection in the GCC internals document to allow
> >> other compiler devs to better understand our design and implementation.
> >> A generated PDF can be found here:
> >>
> >> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F
> >> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F/download
> >>
> >> page: 719
> >>
> >> We’ve been testing the pass against a range of in-tree tests and
> >> real-life applications (e.g. all SPEC CPU2017 C benchmarks). For
> >> testing, please see testing subsection in the gcc internals we prepared.
> >>
> >> Currently we see the following limitations:
> >>
> >> * It is not a "true" ipa pass yet. That is, we can only succeed with
> >> -flto-partition=none.
> >> * Currently it is not safe to use -fipa-sra.
> >> * Brace constructors not supported now. We handle this gracefully.
> >> * Only C as of now.
> >> * Results of sizeof() and offsetof() are generated in the compiler
> >> frontend and thus can’t be changed later at link time. There are a
> >> couple of ideas to resolve this, but that’s currently unimplemented.
> >> * At this point we’d like to thank the GCC community for their patient
> >> help so far on the mailing list and in other channels. And we ask for
> >> your support in terms of feedback, comments and testing.
> >>
> >> Thanks!


Re: LTO Dead Field Elimination

2020-07-27 Thread Richard Biener via Gcc
On Fri, Jul 24, 2020 at 5:43 PM Erick Ochoa
 wrote:
>
> This patchset brings back struct reorg to GCC.
>
> We’ve been working on improving cache utilization recently and would
> like to share our current implementation to receive some feedback on it.
>
> Essentially, we’ve implemented the following components:
>
>  Type-based escape analysis to determine if we can reorganize a type
> at link-time
>
>  Dead-field elimination to remove unused fields of a struct at
> link-time
>
> The type-based escape analysis provides a list of types, that are not
> visible outside of the current linking unit (e.g. parameter types of
> external functions).
>
> The dead-field elimination pass analyses non-escaping structs for fields
> that are not used in the linking unit and thus can be removed. The
> resulting struct has a smaller memory footprint, which allows for a
> higher cache utilization.
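As a before/after illustration of what dead-field elimination does to a non-escaping type (a hypothetical example, not taken from the patchset):

```c
#include <assert.h>
#include <stddef.h>

/* Before: 'dead' is never read or written anywhere in the linking unit,
   and the type never escapes it (no external function sees it).  */
struct rec_before { double used_a; double used_b; double dead[6]; };

/* After: the link-time rewrite drops the dead field, shrinking each
   element from 8 doubles to 2, so an array of these packs four times
   as many useful elements into the same number of cache lines.  */
struct rec_after { double used_a; double used_b; };
```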
>
> As a side-effect a couple of new infrastructure code has been written
> (e.g. a type walker, which we were really missing in GCC), which can be
> of course reused for other passes as well.
>
> We’ve prepared a patchset in the following branch:
>
>refs/vendors/ARM/heads/arm-struct-reorg-wip

Just had some time to peek into this.  Ugh.  The code doesn't look like
GCC code looks :/  It doesn't help to have one set of files per C++ class (25!).
The code itself is undocumented - it's hard to understand what the purpose
of all the Walker stuff is.

You also didn't seem to know walk_tree () nor walk_gimple* ().

Take as example - I figured to look for IPA pass entries, then I see

+
+static void
+collect_types ()
+{
+  GimpleTypeCollector collector;
+  collector.walk ();
+  collector.print_collected ();
+  ptrset_t types = collector.get_pointer_set ();
+  GimpleCaster caster (types);
+  caster.walk ();
+  if (flag_print_cast_analysis)
+caster.print_reasons ();
+  ptrset_t casting = caster.get_sets ();
+  fix_escaping_types_in_set (casting);
+  GimpleAccesser accesser;
+  accesser.walk ();
+  if (flag_print_access_analysis)
+accesser.print_accesses ();
+  record_field_map_t record_field_map = accesser.get_map ();
+  TypeIncompleteEquality equality;
+  bool has_fields_that_can_be_deleted = false;
+  typedef std::set field_offsets_t;

there's no comments (not even file-level) that explains how type escape
is computed.

Sorry, but this isn't even close to be coarsely reviewable.

> We’ve also added a subsection in the GCC internals document to allow
> other compiler devs to better understand our design and implementation.
> A generated PDF can be found here:
>
> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F
> https://cloud.theobroma-systems.com/s/aWwxPiDJ3nCgc7F/download
>
> page: 719
>
> We’ve been testing the pass against a range of in-tree tests and
> real-life applications (e.g. all SPEC CPU2017 C benchmarks). For
> testing, please see testing subsection in the gcc internals we prepared.
>
> Currently we see the following limitations:
>
> * It is not a "true" ipa pass yet. That is, we can only succeed with
> -flto-partition=none.
> * Currently it is not safe to use -fipa-sra.
> * Brace constructors not supported now. We handle this gracefully.
> * Only C as of now.
> * Results of sizeof() and offsetof() are generated in the compiler
> frontend and thus can’t be changed later at link time. There are a
> couple of ideas to resolve this, but that’s currently unimplemented.
> * At this point we’d like to thank the GCC community for their patient
> help so far on the mailing list and in other channels. And we ask for
> your support in terms of feedback, comments and testing.
>
> Thanks!


Re: Tar version being used

2020-07-27 Thread Richard Biener via Gcc
On Mon, Jul 27, 2020 at 12:59 PM CHIGOT, CLEMENT via Gcc
 wrote:
>
> Hi everyone,
>
> I'm wondering if someone knows which tar version / configuration was being 
> used when creating gcc-10.2.0 tarballs ?
>
> I'm getting some directory checksum errors while trying to unpack it with the 
> AIX tar (which can be a bit old). But they are disappearing when I'm building 
> these tarballs on Ubuntu-18.04, even with the last tar version 1.32.
>
> Note that gcc-10.1.0 doesn't have these problems, so maybe something has
> changed since.

I have used tar 1.30 as shipped by openSUSE Leap 15.1
(tar-1.30-lp151.2.1.x86_64)

Richard.

> Sincerely
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France


Re: Problems with changing the type of an ssa name

2020-07-27 Thread Richard Biener via Gcc
On Sun, Jul 26, 2020 at 10:31 PM Gary Oblock  wrote:
>
> Richard,
>
> As you know I'm working on a structure reorganization optimization.
> The particular one I'm working on is called instance interleaving.
> For the particular case I'm working on now, there is a single array
> of structures being transformed; a pointer to an element of the
> array is transformed into an index into what is now a structure
> of arrays. Note, I did share my HL design document with you, so
> there are more details in there if you need them. So what all this
> means is that, for this example,
>
> typedef struct fu fu_t;
> struct fu {
>   char x;
>   int y;
>   double z;
> };
>   :
>   :
>   fu_t *fubar = (fu_t*)malloc(...);
>   fu_t *baz;
>
> fubar and baz are no longer pointer types and need to be
> transformed into some integer type (say _index_fu_t). Thus if
> I encounter an ssa_name of type "fu_t *", I'll need to modify its
> type to be _index_fu_t. This is of course equivalent to replacing
> that ssa name with a new one of type _index_fu_t.
>
> Now, how do I actually do either of these? My attempts at the
> former all failed, and the latter seems equally difficult for
> the default defs. Note, I prefer modifying them to replacing
> them because it seems more reasonable and it also seems
> to work except for the default defs.
>
> I really need some help with this Richard.

OK, so modifying the SSA name in-place is really bad here
since you _have_ to adjust all uses and defs anyway.  Thus
please create a new SSA name here.

The default-def case you run into is either an uninitialized
value which can easily appear with conditionally initialized
pointers or the SSA name associated with the value of
a function argument.

Once you have to deal with a default def you have to create
a new underlying VAR_DECL (or PARM_DECL if it was a
parameter) with the new type and for the SSA replacement
create its default def (get_or_create_ssa_default_def).

Now for parameters this of course means you have to
adjust function signatures and calls.  For the function
boundary case you'll likely need to pass a pointer to
the structure as well which means you'll have to add
parameters.

As for "replacing" uses you can use immediate uses
to walk them:

 FOR_EACH_IMM_USE_STMT (...)
FOR_EACH_IMM_USE_ON_STMT (..)
...

also the SSA definition statement after your transform
cannot be the same so you have to create another
stmt anyway, no?

Richard.

> Thanks,
>
> Gary
> 
> From: Richard Biener 
> Sent: Saturday, July 25, 2020 10:48 PM
> To: Gary Oblock ; gcc@gcc.gnu.org 
> Subject: Re: Problems with changing the type of an ssa name
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> On July 25, 2020 10:47:59 PM GMT+02:00, Gary Oblock 
>  wrote:
> >Richard,
> >
> >I suppose that might be doable but aren't there any ramifications
> >from the fact that the problematic ssa_names are the default defs?
> >I can imagine easily replacing all the ssa names except those that
> >are default defs.
>
> Well, just changing the SSA names doesn't make the ramifications go away. You
> have to know what you are doing.
>
> So - what's the reason you need to change those SSA name types?
>
> Richard.
>
> >Gary
> >
> >From: Richard Biener 
> >Sent: Friday, July 24, 2020 11:16 PM
> >To: Gary Oblock ; Gary Oblock via Gcc
> >; gcc@gcc.gnu.org 
> >Subject: Re: Problems with changing the type of an ssa name
> >
> >[EXTERNAL EMAIL NOTICE: This email originated from an external sender.
> >Please be mindful of safe email handling and proprietary information
> >protection practices.]
> >
> >
> >On July 25, 2020 7:30:48 AM GMT+02:00, Gary Oblock via Gcc
> > wrote:
> >>If you've followed what I've been up to via my questions
> >>on the mailing list, I finally traced my latest big problem
> >>back to my own code. In a nutshell here is what
> >>I'm doing.
> >>
> >>I'm creating a new type exactly like this:
> >>
> >>tree pointer_rep =
> >>  make_signed_type ( TYPE_PRECISION ( pointer_sized_int_node));
> >>TYPE_MAIN_VARIANT ( pointer_rep) =
> >>  TYPE_MAIN_VARIANT ( pointer_sized_int_node);
> >>const char *gcc_name =
> >>identifier_to_locale ( IDENTIFIER_POINTER ( TYPE_NAME (
> >>ri->gcc_type)));
> >>size_t len =
> >>  s

Re: TImode for BITS_PER_WORD=32 targets

2020-07-27 Thread Richard Biener via Gcc
On Fri, Jul 24, 2020 at 5:38 PM Andrew Stubbs  wrote:
>
> Hi all,
>
> I want amdgcn to be able to support int128 types, partly because they
> might come up in code offloaded from x86_64 code, and partly because
> libgomp now requires at least some support (amdgcn builds have been
> failing since yesterday).
>
> But, amdgcn has 32-bit registers, and therefore defines BITS_PER_WORD to
> 32, which means that TImode doesn't Just Work, at least not for all
> operators. It already has TImode moves, for internal uses, so I can
> enable TImode and fix the libgomp build, but now libgfortran tries to
> use operators that don't exist, so I'm no better off.
>
> The expand pass won't emit libgcc calls, like it does for DImode, and
> libgcc doesn't have the routines for it anyway. Neither does it support
> synthesized shifts or rotates for more than double-word types.
> (Multiple-word add and subtract appear to work fine, however.)
>
> What would be the best (least effort) way to implement this?
>
> I think I need shift, rotate, multiply, divide, and modulus, but there's
> probably more.

You've figured out that TImode support for SImode word_mode targets
is not implemented in generic code.  So what you need to do is either
provide patterns for all of the operations you need or implement
generic support for libgcc fallbacks (which isn't there).  Joseph might
have an idea what's missing and how difficult it would be (I suppose
we do not want divti3 to end up calling divdi3, thus "stage" TImode
support ontop of DImode ops eventually provided by libgcc only).
libgcc2.c uses LIBGCC2_UNITS_PER_WORD so it _might_ be
possible to somehow do this "staging" by providing two different
values here.  I guess you'd have to try.

Richard.

> Thanks, any advice will be appreciated.
>
> Andrew


Re: Problems with changing the type of an ssa name

2020-07-25 Thread Richard Biener via Gcc
On July 25, 2020 10:47:59 PM GMT+02:00, Gary Oblock  
wrote:
>Richard,
>
>I suppose that might be doable but aren't there any ramifications
>from the fact that the problematic ssa_names are the default defs?
>I can imagine easily replacing all the ssa names except those that
>are default defs.

Well, just changing the SSA names doesn't make the ramifications go away. You have
to know what you are doing.

So - what's the reason you need to change those SSA name types? 

Richard. 

>Gary
>________
>From: Richard Biener 
>Sent: Friday, July 24, 2020 11:16 PM
>To: Gary Oblock ; Gary Oblock via Gcc
>; gcc@gcc.gnu.org 
>Subject: Re: Problems with changing the type of an ssa name
>
>[EXTERNAL EMAIL NOTICE: This email originated from an external sender.
>Please be mindful of safe email handling and proprietary information
>protection practices.]
>
>
>On July 25, 2020 7:30:48 AM GMT+02:00, Gary Oblock via Gcc
> wrote:
>>If you've followed what I've been up to via my questions
>>on the mailing list, I finally traced my latest big problem
>>back to my own code. In a nutshell here is what
>>I'm doing.
>>
>>I'm creating a new type exactly like this:
>>
>>tree pointer_rep =
>>  make_signed_type ( TYPE_PRECISION ( pointer_sized_int_node));
>>TYPE_MAIN_VARIANT ( pointer_rep) =
>>  TYPE_MAIN_VARIANT ( pointer_sized_int_node);
>>const char *gcc_name =
>>identifier_to_locale ( IDENTIFIER_POINTER ( TYPE_NAME (
>>ri->gcc_type)));
>>size_t len =
>>  strlen ( REORG_SP_PTR_PREFIX) + strlen ( gcc_name);
>>char *name = ( char *)alloca(len + 1);
>>strcpy ( name, REORG_SP_PTR_PREFIX);
>>strcat ( name, gcc_name);
>>TYPE_NAME ( pointer_rep) = get_identifier ( name);
>>
>>I detect an ssa_name that I want to change to have this type
>>and change it thusly. Note, this particular ssa_name is a
>>default def which I seems to be very pertinent (since it's
>>the only case that fails.)
>>
>>modify_ssa_name_type ( an_ssa_name, pointer_rep);
>>
>>void
>>modify_ssa_name_type ( tree ssa_name, tree type)
>>{
>>  // This rips off the code in make_ssa_name_fn with a
>>  // modification or two.
>>
>>  if ( TYPE_P ( type) )
>>{
>>   TREE_TYPE ( ssa_name) = TYPE_MAIN_VARIANT ( type);
>>   if ( ssa_defined_default_def_p ( ssa_name) )
>>  {
>> // I'm guessing, which I know is a terrible thing to do...
>> SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, TYPE_MAIN_VARIANT (
>type));
>>   }
>> else
>>   {
>>   // The following breaks defaults defs hence the check
>above.
>> SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, NULL_TREE);
>>   }
>>}
>> else
>>{
>>  TREE_TYPE ( ssa_name) = TREE_TYPE ( type);
>>  SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, type);
>>}
>>}
>>
>>After this it dies when trying to call print_generic_expr with the ssa
>>name.
>>
>>Here's the bottom most complaint from the internal error:
>>
>>tree check: expected tree that contains ‘decl minimal’ structure, have
>>‘integer_type’ in dump_generic_node, at tree-pretty-print.c:3154
>>
>>Can anybody tell what I'm doing wrong?
>
>Do not modify existing SSA names, instead create a new one and replace
>uses of the old.
>
>Richard.
>
>>Thanks,
>>
>>Gary
>>
>>
>>
>>
>>CONFIDENTIALITY NOTICE: This e-mail message, including any
>attachments,
>>is for the sole use of the intended recipient(s) and contains
>>information that is confidential and proprietary to Ampere Computing
>or
>>its subsidiaries. It is to be used solely for the purpose of
>furthering
>>the parties' business relationship. Any review, copying, or
>>distribution of this email (or any attachments thereto) is strictly
>>prohibited. If you are not the intended recipient, please contact the
>>sender immediately and permanently delete the original and any copies
>>of this email and any attachments thereto.



Re: Problems with changing the type of an ssa name

2020-07-24 Thread Richard Biener via Gcc
On July 25, 2020 7:30:48 AM GMT+02:00, Gary Oblock via Gcc  
wrote:
>If you've followed what I've been up to via my questions
>on the mailing list, I finally traced my latest big problem
>back to my own code. In a nutshell here is what
>I'm doing.
>
>I'm creating a new type exactly like this:
>
>tree pointer_rep =
>  make_signed_type ( TYPE_PRECISION ( pointer_sized_int_node));
>TYPE_MAIN_VARIANT ( pointer_rep) =
>  TYPE_MAIN_VARIANT ( pointer_sized_int_node);
>const char *gcc_name =
>identifier_to_locale ( IDENTIFIER_POINTER ( TYPE_NAME (
>ri->gcc_type)));
>size_t len =
>  strlen ( REORG_SP_PTR_PREFIX) + strlen ( gcc_name);
>char *name = ( char *)alloca(len + 1);
>strcpy ( name, REORG_SP_PTR_PREFIX);
>strcat ( name, gcc_name);
>TYPE_NAME ( pointer_rep) = get_identifier ( name);
>
>I detect an ssa_name that I want to change to have this type
>and change it thusly. Note, this particular ssa_name is a
>default def which I seems to be very pertinent (since it's
>the only case that fails.)
>
>modify_ssa_name_type ( an_ssa_name, pointer_rep);
>
>void
>modify_ssa_name_type ( tree ssa_name, tree type)
>{
>  // This rips off the code in make_ssa_name_fn with a
>  // modification or two.
>
>  if ( TYPE_P ( type) )
>{
>   TREE_TYPE ( ssa_name) = TYPE_MAIN_VARIANT ( type);
>   if ( ssa_defined_default_def_p ( ssa_name) )
>  {
> // I'm guessing, which I know is a terrible thing to do...
> SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, TYPE_MAIN_VARIANT ( type));
>   }
> else
>   {
>   // The following breaks defaults defs hence the check above.
> SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, NULL_TREE);
>   }
>}
> else
>{
>  TREE_TYPE ( ssa_name) = TREE_TYPE ( type);
>  SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, type);
>}
>}
>
>After this it dies when trying to call print_generic_expr with the ssa
>name.
>
>Here's the bottom most complaint from the internal error:
>
>tree check: expected tree that contains ‘decl minimal’ structure, have
>‘integer_type’ in dump_generic_node, at tree-pretty-print.c:3154
>
>Can anybody tell what I'm doing wrong?

Do not modify existing SSA names, instead create a new one and replace uses of 
the old. 

Richard. 

>Thanks,
>
>Gary
>
>
>
>



GCC 10.2 Released

2020-07-23 Thread Richard Biener
The GNU Compiler Collection version 10.2 has been released.

GCC 10.2 is a bug-fix release from the GCC 10 branch
containing important fixes for regressions and serious bugs in
GCC 10.1 with more than 94 bugs fixed since the previous release.

This release is available from the FTP servers listed at:

  http://www.gnu.org/order/ftp.html

Please do not contact me directly regarding questions or comments
about this release.  Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release
-- far too many to thank them individually!


GCC 10.2.1 Status Report (2020-07-23)

2020-07-23 Thread Richard Biener


Status
==

The GCC 10.2 release process has been completed and the GCC 10 branch
is now again open for regression and documentation fixes.


Quality Data


Priority  #   Change from last report
---   ---
P1 
P2  219   +   1
P3   57   +   4
P4  176
P5   22
---   ---
Total P1-P3 276   +   5
Total   474   +   5


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2020-July/233135.html


Re: Three issues

2020-07-22 Thread Richard Biener via Gcc
On Thu, Jul 23, 2020 at 5:32 AM Gary Oblock  wrote:
>
> Richard,
>
> My wolf fence failed to detect an issue at the end of my pass
> so I'm now hunting for a problem I caused in a following pass.
>
> Your thoughts?

Sorry - I'd look at the IL after your pass for obvious mistakes.
All default defs need to have a VAR_DECL associated as
SSA_NAME_VAR.

> Gary
>
> - Wolf Fence Follows -
> int
> wf_func ( tree *slot, tree *dummy)
> {
>   tree t_val = *slot;
>   gcc_assert( t_val->ssa_name.var);
>   return 0;
> }
>
> void
> wolf_fence (
> Info *info // Pass-level global info (might not use it)
>   )
> {
>   struct cgraph_node *node;
>   fprintf( stderr,
>   "Wolf Fence: Find wolf via gcc_assert(t_val->ssa_name.var)\n");
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> {
>   struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
>   push_cfun ( func);
>   DEFAULT_DEFS ( func)->traverse_noresize < tree *, wf_func> ( NULL);
>   pop_cfun ();
> }
>   fprintf( stderr, "Wolf Fence: Didn't find wolf!\n");
> }
> 
> From: Richard Biener 
> Sent: Wednesday, July 22, 2020 2:32 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Three issues
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
> >
> > Some background:
> >
> > This is in the dreaded structure reorganization optimization that I'm
> > working on. It's running at LTRANS time with '-flto-partition=one'.
> >
> > My issues in order of importance are:
> >
> > 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> > has a segfault because the "var" field of "a" is (nil).
> >
> > struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> > {
> >   /* Hash a tree in a uid_decl_map.  */
> >
> >   static hashval_t
> >   hash (tree item)
> >   {
> > return item->ssa_name.var->decl_minimal.uid;
> >   }
> >
> >   /* Return true if the DECL_UID in both trees are equal.  */
> >
> >   static bool
> >   equal (tree a, tree b)
> >   {
> >   return (a->ssa_name.var->decl_minimal.uid == 
> > b->ssa_name.var->decl_minimal.uid);
> >   }
> > };
> >
> > The parameter "a" is associated with "*entry" on the 2nd to last
> > line shown (it's trimmed off after that.) This is from hash-table.h:
> >
> > template <typename Descriptor, bool Lazy,
> >  template <typename Type> class Allocator>
> > typename hash_table <Descriptor, Lazy, Allocator>::value_type &
> > hash_table <Descriptor, Lazy, Allocator>
> > ::find_with_hash (const compare_type &comparable, hashval_t hash)
> > {
> >   m_searches++;
> >   size_t size = m_size;
> >   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
> >
> >   if (Lazy && m_entries == NULL)
> > m_entries = alloc_entries (size);
> >
> > #if CHECKING_P
> >   if (m_sanitize_eq_and_hash)
> > verify (comparable, hash);
> > #endif
> >
> >   value_type *entry = &m_entries[index];
> >   if (is_empty (*entry)
> >   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> > return *entry;
> >   .
> >   .
> >
> > Is there any way this could happen other than by a memory corruption
> > of some kind? This is a show stopper for me and I really need some help on
> > this issue.
> >
> > 2) I tried to dump out all the gimple in the following way at the very
> > beginning of my program:
> >
> > void
> > print_program ( FILE *file, int leading_space )
> > {
> >   struct cgraph_node *node;
> >   fprintf ( file, "%*sProgram:\n", leading_space, "");
> >
> >   // Print Global Decls
> >   //
> >   varpool_node *var;
> >   FOR_EACH_VARIABLE ( var)
> >   {
> > tree decl = var->decl;
> > fprintf ( file, "%*s", leading_space, "");
> > print_generic_decl ( file, decl, (dump_flags_t)0);
> > fprintf ( file, "\n");
> >   }
> >
> >   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> >   {
> > struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> > dump_function_header ( file, func->decl, (dump_flags_t)0);
> > dump_function_to_file ( func->decl, file, (dump_flags_t)0);
> >   }
> > }
> &

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:16 PM Florian Weimer  wrote:
>
> * Richard Biener:
>
> > On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> > wrote:
> >>
> >> * Dongsheng Song:
> >>
> >> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> >> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> >> > python's platform tags (e.g. manylinux2010, manylinux2014).
> >>
> >> I started out with a year number, but that was before there was Level A.
> >> Too many new CPUs only fall under level A unfortunately because they do
> >> not even have AVX.  This even applies to some new server CPU designs
> >> released this year.
> >>
> >> I'm concerned that putting a year into the level name suggests that
> >> everything main-stream released after that year supports that level, and
> >> that's not true.  I think for manylinux, it's different, and it actually
> >> works out there.  No one is building a new GNU/Linux distribution that
> >> is based on glibc 2.12 today, for example.  But not so much for x86
> >> CPUs.
> >>
> >> If you think my worry is unfounded, then a year-based approach sounds
> >> compelling.
> >
> > I think the main question is whether those levels are supposed to be
> > an implementation detail hidden from most software developers or
> > if people are expected to make conscious decisions between
> > -march=x86-100 and -march=x86-101.  Implementation detail
> > for system integrators, that is.
>
> Anyone who wants to optimize their software for something more
> current than what was available in 2003 has to think about this in some
> form.
>
> With these levels, I hope to provide a pre-packaged set of choices, with
> a consistent user interface, in the sense that -march= options and file
> system locations match.  Programmers will definitely encounter these
> strings, and they need to know what they mean for their users.  We need
> to provide them with the required information so that they can make
> decisions based on their knowledge of their user base.  But the ultimate
> decision really has to be a programmer choice.
>
> I'm not sure if GCC documentation or glibc documentation would be the
> right place for this.  An online resource that can be linked to directly
> seems more appropriate.
>
> Apart from that, there is the more limited audience of general purpose
> distribution builders.  I expect they will pick one of these levels to
> build all the distribution binaries, unless they want to be stuck in
> 2003.  But as long they do not choose the highest level defined,
> programmers might still want to provide optimized library builds for
> run-time selection, and then they need the same guidance as before.
>
> > If it's not merely an implementation detail then names without
> > any chance of providing false hints (x86-2014 - oh, it will
> > run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> > course I want avx2) are better.  But this also means this feature
> > should come with extensive documentation on how it is
> > supposed to be used.  For example we might suggest ISVs
> > provide binaries for all architecture levels or use IFUNCs
> > or other runtime CPU selection capabilities.
>
> I think we should document the mechanism as best as we can, and provide
> intended use cases.  We shouldn't go as far as to tell programmers what
> library versions they must build, except that they should always include
> a fallback version if no optimized library can be selected.
>
> Describing the interactions with IFUNCs also makes sense.
>
> But I think we should not go overboard with this.  Historically, we've
> done not such a great job with documenting toolchain features, I know,
> and we should do better now.  I will try to write something helpful, but
> it should still match the relative importance of this feature.
>
> > It's also required to provide an (extensive?) list of SKUs that fall
> > into the respective categories (probably up to CPU vendors to amend
> > those).
>
> I'm afraid, but SKUs are not very useful in this context.
> Virtualization can disable features (e.g., some cloud providers
> advertise they use certain SKUs, but some features are not available to
> guests), and firmware updates have done so as well.  I think the only
> way is to document our selection criteria, and encourage CPU vendors to
> enhance their SKU browsers so that you can search by the (lack of)
> support for certain CPU features.
>
> The selection criteria I sugg

Re: Three issues

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
>
> Some background:
>
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
>
> My issues in order of importance are:
>
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
>
> struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
>
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
>
>   /* Return true if the DECL_UID in both trees are equal.  */
>
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid == 
> b->ssa_name.var->decl_minimal.uid);
>   }
> };
>
> The parameter "a" is associated with "*entry" on the 2nd to last
> line shown (it's trimmed off after that.) This is from hash-table.h:
>
> template <typename Descriptor, bool Lazy,
>  template <typename Type> class Allocator>
> typename hash_table <Descriptor, Lazy, Allocator>::value_type &
> hash_table <Descriptor, Lazy, Allocator>
> ::find_with_hash (const compare_type &comparable, hashval_t hash)
> {
>   m_searches++;
>   size_t size = m_size;
>   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
>
>   if (Lazy && m_entries == NULL)
> m_entries = alloc_entries (size);
>
> #if CHECKING_P
>   if (m_sanitize_eq_and_hash)
> verify (comparable, hash);
> #endif
>
>   value_type *entry = &m_entries[index];
>   if (is_empty (*entry)
>   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> return *entry;
>   .
>   .
>
> Is there any way this could happen other than by a memory corruption
> of some kind? This is a show stopper for me and I really need some help on
> this issue.
>
> 2) I tried to dump out all the gimple in the following way at the very
> beginning of my program:
>
> void
> print_program ( FILE *file, int leading_space )
> {
>   struct cgraph_node *node;
>   fprintf ( file, "%*sProgram:\n", leading_space, "");
>
>   // Print Global Decls
>   //
>   varpool_node *var;
>   FOR_EACH_VARIABLE ( var)
>   {
> tree decl = var->decl;
> fprintf ( file, "%*s", leading_space, "");
> print_generic_decl ( file, decl, (dump_flags_t)0);
> fprintf ( file, "\n");
>   }
>
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>   {
> struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> dump_function_header ( file, func->decl, (dump_flags_t)0);
> dump_function_to_file ( func->decl, file, (dump_flags_t)0);
>   }
> }
>
> When I run this the first two (out of three) functions print
> just fine. However, for the third, func->decl is (nil) and
> it segfaults.
>
> Now the really odd thing is that this works perfectly at the
> end or middle of my optimization.
>
> What gives?
>
> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
>
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
>
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88 alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*) 
> (ipa-prototype.c:329)
> ==18572==by 0x106E987: gcc::pass_manager::pass_manager(gcc::context*) 
> (pass-instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool) (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
>
> Are these known issues with lto or is this a valgrind issue?

It smells like you are modifying IL via APIs that rely on cfun set to the
function you are modifying.  Note such API dependence might not be
obvious, so it's advisable to do

 push_cfun (function to modify);
... modify IL of function ...
 pop_cfun ();

note push/pop_cfun can be expensive so try to glob function modifications.
That said, the underlying issue is likely garbage collector related - try
building with --enable-valgrind-annotations which makes valgrind a bit more
GCC GC aware.

Richard.

> Thanks,
>
> Gary
>

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  wrote:
>
> * Dongsheng Song:
>
> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> > python's platform tags (e.g. manylinux2010, manylinux2014).
>
> I started out with a year number, but that was before there was Level A.
> Too many new CPUs only fall under level A unfortunately because they do
> not even have AVX.  This even applies to some new server CPU designs
> released this year.
>
> I'm concerned that putting a year into the level name suggests that
> everything main-stream released after that year supports that level, and
> that's not true.  I think for manylinux, it's different, and it actually
> works out there.  No one is building a new GNU/Linux distribution that
> is based on glibc 2.12 today, for example.  But not so much for x86
> CPUs.
>
> If you think my worry is unfounded, then a year-based approach sounds
> compelling.

I think the main question is whether those levels are supposed to be
an implementation detail hidden from most software developers or
if people are expected to make conscious decisions between
-march=x86-100 and -march=x86-101.  Implementation detail
for system integrators, that is.

If it's not merely an implementation detail then names without
any chance of providing false hints (x86-2014 - oh, it will
run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
course I want avx2) are better.  But this also means this feature
should come with extensive documentation on how it is
supposed to be used.  For example we might suggest ISVs
provide binaries for all architecture levels or use IFUNCs
or other runtime CPU selection capabilities.  It's also required
to provide an (extensive?) list of SKUs that fall into the respective
categories (probably up to CPU vendors to amend those).
Since this is a feature crossing multiple projects - at least
glibc and GCC - sharing the source of said documentation
would be important.

So for the bike-shedding I indeed think x86-10{0,1,2,3}
or x86-{A,B,C,..}, eventually duplicating as x86_64- as
suggested by Jan is better than x86-2014 or x86-avx2.

Richard.

> Thanks,
> Florian
>


Re: GCC 10.2 Release Candidate available from gcc.gnu.org

2020-07-17 Thread Richard Biener
On Fri, 17 Jul 2020, Romain Naour wrote:

> Hello,
> 
> Le 15/07/2020 à 13:50, Richard Biener a écrit :
> > 
> > The first release candidate for GCC 10.2 is available from
> > 
> >  https://gcc.gnu.org/pub/gcc/snapshots/10.2.0-RC-20200715/
> >  ftp://gcc.gnu.org/pub/gcc/snapshots/10.2.0-RC-20200715/
> > 
> > and shortly its mirrors.  It has been generated from git commit
> > 932e9140d3268cf2033c1c3e93219541c53fcd29.
> > 
> > I have so far bootstrapped and tested the release candidate on
> > x86_64-linux.  Please test it and report any issues to bugzilla.
> > 
> > If all goes well, I'd like to release 10.2 on Thursday, July 23th.
> > 
> 
> GCC 10 and 9 may fail to build due to a missing build dependency; see
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546248.html
> 
> We need to backport this patch from master:
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b19d8aac15649f31a7588b2634411a1922906ea8

I've pushed it to gcc-10 after bootstrapping on x86_64-unknown-linux-gnu.

Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: Complete 'ChangeLog' files state (was: GCC 10.2 Release Candidate available from gcc.gnu.org)

2020-07-17 Thread Richard Biener
On Fri, 17 Jul 2020, Thomas Schwinge wrote:

> Hi!
> 
> On 2020-07-15T13:50:35+0200, Richard Biener  wrote:
> > The first release candidate for GCC 10.2 is available from
> >
> >  https://gcc.gnu.org/pub/gcc/snapshots/10.2.0-RC-20200715/
> >  ftp://gcc.gnu.org/pub/gcc/snapshots/10.2.0-RC-20200715/
> >
> > and shortly its mirrors.  It has been generated from git commit
> > 932e9140d3268cf2033c1c3e93219541c53fcd29.
> 
> That's probably something to be careful about for any tarball etc.
> releases: per that commit 932e9140d3268cf2033c1c3e93219541c53fcd29, we
> don't have complete 'ChangeLog' files state -- the "catch-up" commit
> 25f8c7101d1a1e7304ed53ec727240a88eb65086 "Daily bump" (2020-07-16) was
> not included.

Not so important for the release candidate, but indeed for last-minute
commits to the branch before the actual release.
https://gcc.gnu.org/releasing.html doesn't list how to manually invoke
the ChangeLog updating.

Of course I'm just hoping we won't have any last-minute checkins ;)

Richard.
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: Default defs question

2020-07-15 Thread Richard Biener via Gcc
On July 16, 2020 7:09:21 AM GMT+02:00, Gary Oblock via Gcc  
wrote:
>Regarding the other question I asked today could somebody explain to
>me what the default_defs are all about. 

Default defs are SSA names without an explicit defining statement, for
example those representing values at function entry. They are also used
for uninitialized variables.
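A minimal sketch of how a pass can recognize and look up default defs, assuming GCC's internal plugin API (the fragment only compiles inside a plugin or the GCC tree, and the helper names are illustrative):

```
/* Sketch against GCC internals (tree.h, tree-ssa.h); not
   standalone-compilable.  */

/* True for SSA names with no real defining statement: values on
   function entry (parameters) and uses of uninitialized variables.  */
static bool
is_default_def_p (tree name)
{
  return TREE_CODE (name) == SSA_NAME && SSA_NAME_IS_DEFAULT_DEF (name);
}

/* Fetch the default def associated with a PARM_DECL or VAR_DECL;
   returns NULL_TREE if the function has none for that decl.  */
static tree
entry_value_of (struct function *fn, tree decl)
{
  return ssa_default_def (fn, decl);
}
```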

>I suspect I'm doing something
>wrong with regard of them. Note, I've isolated the failure in the last
>email
>down to this bit (in red):
>
>if (is_empty (*entry)
>|| (!is_deleted (*entry) && Descriptor::equal (*entry, comparable))
>
>Which doesn't make much sense to me.
>
>Thanks,
>
>Gary
>
>



GCC 10.2 Release Candidate available from gcc.gnu.org

2020-07-15 Thread Richard Biener


The first release candidate for GCC 10.2 is available from

 https://gcc.gnu.org/pub/gcc/snapshots/10.2.0-RC-20200715/
 ftp://gcc.gnu.org/pub/gcc/snapshots/10.2.0-RC-20200715/

and shortly its mirrors.  It has been generated from git commit
932e9140d3268cf2033c1c3e93219541c53fcd29.

I have so far bootstrapped and tested the release candidate on
x86_64-linux.  Please test it and report any issues to bugzilla.

If all goes well, I'd like to release 10.2 on Thursday, July 23rd.


GCC 10.1.1 Status Report (2020-07-15)

2020-07-15 Thread Richard Biener


Status
==

The GCC 10 branch is now frozen for the GCC 10.2 release, all changes
to the branch require RM approval.


Quality Data


Priority  #   Change from last report
---   ---
P1 
P2  218   +   2
P3   53   +   6
P4  176   +   2
P5   22
---   ---
Total P1-P3 271   +   8
Total   469   +  10


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2020-June/232986.html


Re: Crash at gimple_code(gimple* )

2020-07-15 Thread Richard Biener via Gcc
On Wed, Jul 15, 2020 at 9:30 AM Shuai Wang via Gcc  wrote:
>
> Hello,
>
> I am using the following code to iterate different gimple statements:
>
> ...
>  gimple* stmt = gsi_stmt(gsi);
> if (gimple_assign_load_p(stmt)) {
>  tree rhs = gimple_assign_rhs1 (stmt);
>  if (!rhs) return;
>   gimple* def_stmt = SSA_NAME_DEF_STMT(rhs);
>   if (!def_stmt) return;
>
>  switch (gimple_code (def_stmt)) {
>  
>  }
> }
>
> While the above code works smoothly for most of the cases, to my surprise,
> the following statement (pointed by gsi) would cause a crash at
> gimple_code(def_stmt):
>
> stderr.9_1 = stderr;
>
> It seems that `stderr` is a special tree node; however, it successfully
> passes the two if checks and reaches the gimple_code(def_stmt), but still
> caused an exception:
>
> 0xb5cd5f crash_signal
> ../../gcc-10.1.0/gcc/toplev.c:328
> 0x7f4214557838 gimple_code
>
> /export/d1/shuaiw/gcc-build-10/gcc-install/lib/gcc/x86_64-pc-linux-gnu/10.1.0/plugin/include/gimple.h:1783
> 
>
> Am I missing anything?

I see you're working on 10.1; please make sure to configure your
development compiler with --enable-checking, which would have told you
that SSA_NAME_DEF_STMT expects an SSA name argument but you are passing
it a VAR_DECL.
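Concretely, the plugin code from the question can guard the lookup itself; a hedged sketch against GCC internals (not standalone-compilable):

```
/* 'stderr.9_1 = stderr;' is an assign-load whose RHS is the VAR_DECL
   'stderr', not an SSA name, so check the tree code before asking for
   a defining statement.  */
if (gimple_assign_load_p (stmt))
  {
    tree rhs = gimple_assign_rhs1 (stmt);
    if (rhs == NULL_TREE || TREE_CODE (rhs) != SSA_NAME)
      return;  /* VAR_DECLs like 'stderr' have no SSA_NAME_DEF_STMT.  */
    gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
    switch (gimple_code (def_stmt))
      {
      default:
        break;
      }
  }
```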

Richard.


> Best,
> Shuai


Re: GCC Plugin to insert new expressions/statements in the code

2020-07-15 Thread Richard Biener via Gcc
On Tue, Jul 14, 2020 at 11:23 PM Masoud Gholami  wrote:
>
> Hi,
>
> I am writing a plugin that  uses the PLUGIN_PRAGMAS event to register a 
> custom pragma that is expected to be before a function call as follows:
>
> int main() {
>
> char *filename = “path/to/file”;
> #pragma inject_before_call
> File *f = fopen(filename, …);   // marked fopen (by the 
> pragma)
> …
> fclose(f);
> char *filename2 = “path/to/file2”;
> File *f2 = fopen(filename2, …); // non-marked fopen
> …
> fclose(f2);
> return 0;
>
> }
>
> In fact, I am using the inject_before_call pragma to mark some fopen calls in 
> the code (in this example, the first  fopen call is marked). Then, for each 
> marked fopen call, some extra expressions/statements/declarations are 
> injected into the code before calling the marked function. For example, the 
> above main function would be transformed as follows:
>
> int main() {
>
> char *filename = “/path/to/file”;
> File *tmp_f = fopen(“/path/to/another/file”, “w+");
> fclose(tmp_f);
> File *f = fopen(filename, …);
> …
> fclose(f);
> char *filename2 = “path/to/file2”;  // codes not injected for the 
> non-marked fopen
> File *f2 = fopen(filename2, …);
> …
> fclose(f2);
> return 0;
>
> }
>
> Here, because of the inject_before_call pragma, the grey code is injected 
> into the main function before calling the marked fopen. It simply opens a new 
> file (“/path/to/another/file”) and closes it.
> The thing about the injected code is that it should be inserted only if a 
> fopen call is marked by a inject_before_call pragma. And if after the 
> inject_before_call pragma no fopen calls are made, the user gets an error 
> (the pragma should be only inserted before a fopen call).
>
> I implemented this in 3 steps as follows:
>
> 1. detection of the marked fopen calls: I created a pragma_handler which 
> remembers the location_t of all inject_before_call pragmas. Then using a pass 
> (before ssa), I look for the statements/expressions that are in the next line 
> of each remembered location. If it’s a fopen call, it is considered as a 
> marked call and the code should be inserted before the fopen call. If it’s 
> something other than a fopen call, an error will be generated. However, I’m 
> not aware if there are any better ways to detect the marked calls.
>
> Here is the simplified pass to find the marked fopen calls (generating errors 
> not covered):
>
> unsigned int execute(function *func) {
> basic_block bb;
> FOR_EACH_BB_FN (bb, func) {
> for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); 
> gsi_next (&gsi)) {
> gimple *stmt = gsi_stmt (gsi);
> if (gimple_is_fopen(stmt)) {
> if (marked_fopen(stmt)) {
> handle_marked_fopen(stmt);
> }
> }
> }
> }
> }
>
> 2. create the GIMPLE representation of the code to be injected: after finding 
> the marked fopen calls, I construct some declaration and expressions as 
> follows:
>
> // create the strings “/path/to/another/file" and “w+"
> tree another_path = build_string (20, “/path/to/another/file");
> fix_string_type (another_path);
> tree mode = build_string (3, “w+\0");
> fix_string_type (mode);
>
> // create a call to the fopen function with the created strings
> tree fopen_decl = lookup_qualified_name (global_namespace, 
> get_identifier("fopen"), 0, true, false);
> gimple *new_open_call = gimple_build_call(fopen_decl, 2, another_path, mode);
>
> // create the tmp_f declaration
> f_decl = build_decl(UNKNOWN_LOCATION, VAR_DECL, get_identifier(“tmp_f"), 
> fileptr_type_node);
> pushdecl (f_decl);
> rest_of_decl_compilation (f_decl, 0, 0);

That's the wrong interface for GIMPLE code.  Is f_decl supposed to be
a global variable
or a function local one?  For the latter simply use

 f_decl = create_tmp_var (fileptr_type_node, "tmp_f");

> // set the lhs of the fopen call to be f_decl
> gimple_call_set_lhs(new_open_call, f_decl)
>
> // create a call to the fclose function with the tmp_f variable
> tree fclose_decl = lookup_qualified_name (global_namespace, 
> get_identifier("fclose"), 0, true, false);

Likewise, lookup_qualified_name is a frontend-specific function; since
there's no builtin declaration
for fclose you'll have to build one yourself.
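One way to build such a declaration by hand, sketched against GCC internals (fileptr_type_node is FILE *; the fragment is not standalone-compilable):

```
/* fclose has prototype 'int fclose (FILE *)'; build the function type
   and a bare declaration for it, then emit the call.  */
tree fclose_type = build_function_type_list (integer_type_node,
                                             fileptr_type_node,
                                             NULL_TREE);
tree fclose_decl = build_fn_decl ("fclose", fclose_type);
gimple *new_close_call = gimple_build_call (fclose_decl, 1, f_decl);
```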

> gimple *new_close_call = gimple_build_call(fclose_decl, 1, f_decl);
>
>
> 3. add the created GIMPLE trees to the code (basic-blocks):
>
> basic_block bb = gimple_bb(stmt);
> for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next 
> (&gsi)) { gimple *st = gsi_stmt (gsi);
> if (st == stmt) {  // the marked fopen call
> gsi_insert_before(&gsi, new_open_call, GSI_NEW_STMT);
> gsi_insert_after(&gsi

Re: Understand pointer dereferences in GIMPLE

2020-07-14 Thread Richard Biener via Gcc
On Tue, Jul 14, 2020 at 9:17 AM Shuai Wang via Gcc  wrote:
>
> Hello,
>
> I am trying to traverse the GIMPLE statements and identify all pointer
> dereferences (i.e., memory loads and stores). For instance, something like:
>
>   *_4 = 0;
>...
>   _108 = (signed char *) _107;
>   _109 = *_108;
>
> After some quick searches in the GCC codebase, I am thinking to use the
> following statements to identify variables like _4 and _108:
>
> tree op0 = gimple_op(stmt, 0);// get the left variable

Use gimple_get_lhs (stmt)

> if (TREE_CODE (op0) == SSA_NAME) {
>   struct ptr_info_def *pi = SSA_NAME_PTR_INFO (op0);
>   if (pi) {

That's the wrong thing to look at.  You can use gimple_store_p
which also can end up with DECL_P in position op0.

But what you are running into is that the LHS of *_4 = 0; is _not_
the SSA name _4 but a MEM_REF tree with tree operand zero
being the SSA name _4.
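Putting both remarks together, a hedged sketch (GCC internals, not standalone-compilable) of classifying such a store:

```
/* For '*_4 = 0;' the LHS is a MEM_REF whose operand 0 is the SSA
   pointer _4; gimple_store_p also matches stores to plain DECLs.  */
tree lhs = gimple_get_lhs (stmt);
if (gimple_store_p (stmt) && lhs && TREE_CODE (lhs) == MEM_REF)
  {
    tree ptr = TREE_OPERAND (lhs, 0);
    if (TREE_CODE (ptr) == SSA_NAME)
      return STORE;  /* a store through an SSA pointer like _4 */
  }
```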

> std::cerr << "find a store\n";
> return STORE;
>   }
> }
>
> However, to my surprise, variables like _4 just cannot be matched. Actually
> _4 and _108 will be both treated as "NOT" SSA_NAME, and therefore cannot
> satisfy the first if condition anyway.
>
> So here is my question:
>
> 1. How come variables like _4 and _108 are NOT ssa forms?
> 2. then, what would be the proper way of identifying pointer dereferences,
> something like *_4 = 0; and _109 = *_108 + 1?
>
> Best,
> Shuai


Re: RISC-V: `ld.so' fails linking against `libgcc.a' built at `-O0'

2020-07-13 Thread Richard Biener via Gcc
On Tue, Jul 14, 2020 at 7:24 AM Andreas Schwab  wrote:
>
> On Jul 14 2020, Maciej W. Rozycki wrote:
>
> >  Arguably this might probably be called a deficiency in libgcc, however
> > the objects are built with `-fexceptions -fnon-call-exceptions'
>
> I consider that broken.  It doesn't make any sense to build a lowlevel
> runtime library like libgcc with exceptions.

Indeed - you only need to be able to unwind through those, so
-fasynchronous-unwind-tables should be used.
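The distinction can be sketched as a makefile fragment (the object name is illustrative; the flags are the real GCC options under discussion):

```
# Callers must be able to unwind *through* this runtime code, but the
# code itself neither throws nor catches, so unwind tables suffice:
CFLAGS-riscv-runtime.o += -fasynchronous-unwind-tables

# By contrast, -fexceptions -fnon-call-exceptions makes the compiler
# treat trapping instructions as potential throwers and emit EH-region
# bookkeeping -- overkill for a low-level runtime like libgcc.
```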

Richard.

> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: New x86-64 micro-architecture levels

2020-07-13 Thread Richard Biener via Gcc
On Mon, Jul 13, 2020 at 9:40 AM Florian Weimer  wrote:
>
> * Richard Biener:
>
> >> Looks good.  I like it.
> >
> > Likewise.  Btw, did you check that VIA family chips slot into Level A
> > at least?
>
> Those seem to lack SSE4.2, so they land in the baseline.
>
> > Where do AMD bdverN slot in?
>
> bdver1 to bdver3 (as defined by GCC) should land in Level B (so Level A
> if that is dropped).  bdver4 and znver1 (and later) should land in
> Level C.
>
> >>  My only concerns are
> >>
> >> 1. Names like “x86-100”, “x86-101”, what features do they support?
> >
> > Indeed I didn't get the -100, -101 part.  On the GCC side I'd have
> > suggested -march=generic-{A,B,C,D} implying the respective
> > -mtune.
>
> With literal A, B, C, D, or are they just placeholders?  If not literal
> levels, then what we should use there?
>
> I like the simplicity of numbers.  I used letters in the proposal to
> avoid confusion if we alter the proposal by dropping or levels, shifting
> the meaning of those that come later.  I expect to switch back to
> numbers again for the final version.

They are indeed placeholders, though I somehow prefer letters to
numbers.  But this is really bike-shedding territory.  Good documentation
on the tools side will be more important, as will consistent spelling
across tool sets, possibly driven by a good choice from within the
psABI document.

Richard.


Re: New x86-64 micro-architecture levels

2020-07-12 Thread Richard Biener via Gcc
On Fri, Jul 10, 2020 at 11:45 PM H.J. Lu via Gcc  wrote:
>
> On Fri, Jul 10, 2020 at 10:30 AM Florian Weimer  wrote:
> >
> > Most Linux distributions still compile against the original x86-64
> > baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
> > EM64T compatibility).
> >
> > There has been an attempt to use the existing AT_PLATFORM-based loading
> > mechanism in the glibc dynamic linker to enable a selection of optimized
> > libraries.  But the general selection mechanism in glibc is problematic:
> >
> >   hwcaps subdirectory selection in the dynamic loader
> >   
> >
> > We also have the problem that the glibc version of "haswell" is distinct
> > from GCC's -march=haswell (and presumably other compilers):
> >
> >   Definition of "haswell" platform is inconsistent with GCC
> >   
> >
> > And that the selection criteria are not what people expect:
> >
> >   Epyc and other current AMD CPUs do not select the "haswell" platform
> >   subdirectory
> >   
> >
> > Since the hwcaps-based selection does not work well regardless of
> > architecture (even in cases the kernel provides glibc with data), I
> > worked on a new mechanism that does not have the problems associated
> > with the old mechanism:
> >
> >   [PATCH 00/30] RFC: elf: glibc-hwcaps support
> >   
> >
> > (Don't be concerned that these patches have not been reviewed; we are
> > busy preparing the glibc 2.32 release, and these changes do not alter
> > the glibc ABI itself, so they do not have immediate priority.  I'm
> > fairly confident that a version of these changes will make it into glibc
> > 2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
> > Enterprise Linux 8.4.  Debian as well, but I have never done anything
> > like it there, so I don't know if the patches will be accepted.)
> >
> > Out of the box, this should work fairly well for IBM POWER and Z, where
> > there is a clear progression of silicon versions (at least on paper
> > —virtualization may blur the picture somewhat).
> >
> > However, for x86, we do not have such a clear progression of
> > micro-architecture versions.  This is not just as a result of the
> > AMD/Intel competition, but also due to ongoing product differentiation
> > within one chip vendor.  I think we need these levels broadly for the
> > following reasons:
> >
> > * Selecting on individual CPU features (similar to the old hwcaps
> >   mechanism) in glibc has scalability issues, particularly for
> >   LD_LIBRARY_PATH processing.
> >
> > * Developers need guidance about useful targets for optimization.  I
> >   think there is value in limiting the choices, in the sense that “if
> >   you are able to test three builds in total, these are the things you
> >   should build”.
> >
> > * glibc and the compilers should align in their definition of the
> >   levels, so that developers can use an -march= option to build for a
> >   particular level that is recognized by glibc.  This is why I think the
> >   description of the levels should go into the psABI supplement.
> >
> > * A preference order for these levels avoids falling back to the K8
> >   baseline if the platform progresses to a new version due to
> >   glibc/kernel/hypervisor/hardware upgrades.
> >
> > I'm including a proposal for the levels below.  I use single letters for
> > them, but I expect that the concrete implementation of this proposal
> > will use names like “x86-100”, “x86-101”, like in the glibc patch
> > referenced above.  (But we can discuss other approaches.)
> >
> > I looked at various machines in the Red Hat labs and talked to Intel and
> > AMD engineers about this, but this concrete proposal is based on my own
> > analysis of the situation.  I excluded CPU features related to
> > cryptography and cache management, including hardware transactional
> > memory, and CPU timing.  I assume that we will see some of these
> > features being disabled by the firmware or the kernel over time.  That
> > would eliminate entire levels from selection, which is not desirable.
> > For cryptographic code, I expect that localized selection of an
> > optimized implementation works because such code tends to be isolated
> > blocks, running for dozens of cycles each time, not something that gets
> > scattered all over the place by the compiler.
> >
> > We previously discussed not emitting VZEROUPPER at later levels, but I
> > don't think this is beneficial because the ABI does not have
> > callee-saved vector registers, so it can only be useful with local
> > functions (or whatever LTO considers local), where there is no ABI
> > impact anyway.
> >
> > I did not include FSGSBASE because the FS base is already available at
> > %fs:0.  Changing the FS base in userspace breaks too much,

Re: documentation of powerpc64{,le}-linux-gnu as primary platform

2020-07-09 Thread Richard Biener via Gcc
On July 9, 2020 3:43:19 PM GMT+02:00, David Edelsohn via Gcc  
wrote:
>On Thu, Jul 9, 2020 at 9:07 AM Matthias Klose  wrote:
>>
>> On 7/9/20 1:58 PM, David Edelsohn via Gcc wrote:
>> > On Thu, Jul 9, 2020 at 7:03 AM Matthias Klose 
>wrote:
>> >>
>> >> https://gcc.gnu.org/gcc-8/criteria.html lists the little endian
>platform first
>> >> as a primary target, however it's not mentioned for GCC 9 and GCC
>10. Just an
>> >> omission?
>> >>
>> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg00854.html
>suggests that
>> >> the little endian platform should be mentioned, and maybe the big
>endian
>> >> platform should be dropped?
>> >>
>> >> Jakub suggested to fix that for GCC 9 and GCC 10, and get a
>consensus for GCC 11.
>> >
>> > Why are you so insistent to drop big endian?  No.  Please leave
>this alone.
>>
>> No, I don't leave this alone.  The little endian target is dropped in
>GCC 9 and
>> GCC 10.  Is this really what you intended to do?
>
>No, it's not dropped.  Some people are being pedantic about the name,
>which is why Bill added {,le}.  powerpc64-unknown-linux-gnu means
>everything.  If you want to add {,le} back, that's fine.  But there
>always is some variant omitted, and that doesn't mean it is ignored.
>The more that one over-specifies and enumerates some variants, the
>more that it implies the other variants intentionally are ignored.
>
>I would appreciate that we would separate the discussion about
>explicit reference to {,le} from the discussion about dropping the big
>endian platform.

I think for primary platforms it is important to be as specific as possible 
since certain regressions are supposed to block a release. That's less of an 
issue for secondary platforms but it's still a valid concern there as well for 
build issues. 

Richard. 

>Thanks, David



Re: GCC 10.1.1 Status Report (2020-06-29)

2020-07-08 Thread Richard Biener
On Mon, 29 Jun 2020, Richard Biener wrote:

> 
> Status
> ==
> 
> The GCC 10 branch is in regression and documentation fixing mode.
> 
> We're close to two months after the GCC 10.1 release which means
> a first bugfix release is about to happen.  The plan is to release
> mid July and I am targeting for a release candidate mid next
> week, no later than July 17th.

So this sparked some confusion, as "mid next week" is now but July 17th
is the end of next week.  Thus I will do the 10.2 RC1 next week, July 15th.

Sorry for the confusion,
Richard.

