Re: GCC 9 web timeline dates incorrect?

2019-01-16 Thread Richard Biener
On Wed, Jan 16, 2019 at 12:55 PM Tadeus Prastowo
 wrote:
>
> On Wed, Jan 16, 2019 at 12:52 PM Richard Biener
>  wrote:
> >
> > On Wed, Jan 16, 2019 at 10:40 AM Tadeus Prastowo
> >  wrote:
> > >
> > > Hi,
> > >
> > > On this link https://gcc.gnu.org/develop.html#timeline, I see the 
> > > following:
> > >
> > > GCC 9 Stage 1 (starts 2018-04-25)   GCC 8.1 release (2018-05-02)
> > >  |                                   \
> > >  |                                    v
> > >  |                                  GCC 8.2 release (2018-07-26)
> > > GCC 9 Stage 3 (starts 2017-11-12)
> > >  |
> > > GCC 9 Stage 4 (starts 2018-01-07)
> > >  |
> > >  v
> > >
> > > Are the dates for GCC 9 Stage 3 and Stage 4 correct?
> >
> > Fixed.
>
> Ack for Stage 4's date.  But, are you sure that Stage 3 should still
> be shown as below?
>
> GCC 9 Stage 3 (starts 2017-11-12)

Shows 2018 for me.

> Thank you.
>
> --
> Best regards,
> Tadeus


Re: Parallelize the compilation using Threads

2019-01-16 Thread Richard Biener
:   0.02 (  0%)   0.01 (  0%)   0.04 (  0%)   0 kB (  0%)
>  CPROP  :   1.27 (  1%)   0.01 (  0%)   1.14 (  1%)   30881 kB (  2%)
>  PRE:   0.61 (  1%)   0.00 (  0%)   0.59 (  1%)    1920 kB (  0%)
>  CSE 2  :   0.57 (  1%)   0.01 (  0%)   0.58 (  1%)    2822 kB (  0%)
>  branch prediction  :   0.08 (  0%)   0.01 (  0%)   0.10 (  0%) 887 kB (  0%)
>  combiner   :   1.15 (  1%)   0.00 (  0%)   1.28 (  1%)   35520 kB (  2%)
>  if-conversion  :   0.24 (  0%)   0.00 (  0%)   0.22 (  0%)    5851 kB (  0%)
>  integrated RA  :   2.29 (  2%)   0.03 (  1%)   2.37 (  2%)   54041 kB (  3%)
>  LRA non-specific   :   0.97 (  1%)   0.01 (  0%)   1.04 (  1%)    5294 kB (  0%)
>  LRA virtuals elimination   :   0.44 (  0%)   0.00 (  0%)   0.39 (  0%)    6089 kB (  0%)
>  LRA reload inheritance :   0.17 (  0%)   0.00 (  0%)   0.27 (  0%)    5783 kB (  0%)
>  LRA create live ranges :   1.07 (  1%)   0.00 (  0%)   1.09 (  1%)    1004 kB (  0%)
>  LRA hard reg assignment:   0.11 (  0%)   0.00 (  0%)   0.09 (  0%)   0 kB (  0%)
>  LRA rematerialization  :   0.20 (  0%)   0.00 (  0%)   0.20 (  0%)   0 kB (  0%)
>  reload :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)   0 kB (  0%)
>  reload CSE regs:   0.90 (  1%)   0.01 (  0%)   0.80 (  1%)   13780 kB (  1%)
>  ree:   0.13 (  0%)   0.00 (  0%)   0.10 (  0%) 589 kB (  0%)
>  thread pro- & epilogue :   0.51 (  1%)   0.01 (  0%)   0.57 (  1%)    2328 kB (  0%)
>  if-conversion 2:   0.08 (  0%)   0.00 (  0%)   0.08 (  0%) 319 kB (  0%)
>  combine stack adjustments  :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)   0 kB (  0%)
>  peephole 2 :   0.12 (  0%)   0.00 (  0%)   0.18 (  0%)    1242 kB (  0%)
>  hard reg cprop :   0.57 (  1%)   0.00 (  0%)   0.49 (  0%) 189 kB (  0%)
>  scheduling 2   :   2.53 (  3%)   0.03 (  1%)   2.53 (  2%)    5740 kB (  0%)
>  machine dep reorg  :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)   0 kB (  0%)
>  reorder blocks :   0.74 (  1%)   0.01 (  0%)   0.69 (  1%)    6926 kB (  0%)
>  shorten branches   :   0.20 (  0%)   0.00 (  0%)   0.16 (  0%)   0 kB (  0%)
>  final  :   0.85 (  1%)   0.01 (  0%)   0.97 (  1%)  115151 kB (  6%)
>  symout :   1.17 (  1%)   0.11 (  2%)   1.25 (  1%)  202121 kB ( 11%)
>  variable tracking  :   0.77 (  1%)   0.01 (  0%)   0.81 (  1%)   45792 kB (  2%)
>  var-tracking dataflow  :   1.30 (  1%)   0.01 (  0%)   1.24 (  1%) 926 kB (  0%)
>  var-tracking emit  :   1.43 (  1%)   0.01 (  0%)   1.42 (  1%)   57281 kB (  3%)
>  tree if-combine:   0.06 (  0%)   0.00 (  0%)   0.02 (  0%) 417 kB (  0%)
>  uninit var analysis:   0.03 (  0%)   0.00 (  0%)   0.02 (  0%)   0 kB (  0%)
>  straight-line strength reduction   :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%) 525 kB (  0%)
>  store merging  :   0.04 (  0%)   0.00 (  0%)   0.03 (  0%) 492 kB (  0%)
>  initialize rtl :   0.01 (  0%)   0.00 (  0%)   0.04 (  0%)  12 kB (  0%)
>  address lowering   :   0.04 (  0%)   0.00 (  0%)   0.02 (  0%)   2 kB (  0%)
>  early local passes :   0.02 (  0%)   0.01 (  0%)   0.00 (  0%)   0 kB (  0%)
>  unaccounted optimizations  :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)   0 kB (  0%)
>  rest of compilation:   1.29 (  1%)   0.01 (  0%)   1.11 (  1%)    5063 kB (  0%)
>  remove unused locals   :   0.25 (  0%)   0.04 (  1%)   0.25 (  0%)  37 kB (  0%)
>  address taken  :   0.11 (  0%)   0.10 (  2%)   0.25 (  0%)   0 kB (  0%)
>  verify loop closed :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)   0 kB (  0%)
>  verify RTL sharing :   5.24 (  5%)   0.05 (  1%)   5.37 (  5%)   0 kB (  0%)
>  rebuild frequencies:   0.04 (  0%)   0.00 (  0%)   0.06 (  0%) 621 kB (  0%)
>  repair loop structures :   0.17 (  0%)   0.00 (  0%)   0.24 (  0%)   0 kB (  0%)
>  TOTAL  

Re: Problems with GGC and bitmap/hash_set

2019-01-18 Thread Richard Biener
On Thu, Jan 17, 2019 at 8:16 PM Michael Ploujnikov
 wrote:
>
> Hi,
>
> I've been doing some investigations that required using a bitmap to keep 
> track of decl IDs and I ran into segmentation fault after my bitmap has been 
> loaded from a PCH. Attached is a short patch that can reliably reproduce my 
> problem. Looks like head->current is set to point to uninitialized memory for 
> some reason. Could someone with a strong understanding of GGC take a look and 
> let me know what I might be doing wrong and how to fix this?

If you're just doing investigation why do you care about PCH?

In any event I guess that bitmaps are not supported by PCH because of
GTY(skip)ing the bitmap_head current member (so that's not saved).
I guess PCH writers/readers need to ignore skipping.
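
For illustration, roughly what such a skipped member looks like in a
GTY-marked structure (a sketch only; demo_head/demo_element are made-up
names, not the actual gcc/bitmap.h declaration):

struct GTY(()) demo_head
{
  /* Walked by the generated GC/PCH routines and relocated on PCH read.  */
  struct demo_element *first;

  /* GTY((skip)) makes gengtype ignore this field entirely, so it is neither
     marked during GC nor written out/fixed up for PCH.  After a PCH restore
     it can still point into the address space of the compiler that wrote
     the PCH, which matches the stale head->current observed above.  */
  struct demo_element * GTY((skip)) current;
};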

> I can reliably reproduce this with the following command:
>
> gdb --args /gcc/build/./gcc/cc1plus -nostdinc++ -nostdinc++ -I 
> /gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I 
> /gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include -I 
> /gcc/src/libstdc++-v3/libsupc++ -I /gcc/src/libstdc++-v3/include/backward -I 
> /gcc/src/libstdc++-v3/testsuite/util -iprefix 
> /gcc/build/gcc/../lib/gcc/x86_64-pc-linux-gnu/9.0.0/ -isystem 
> /gcc/build/./gcc/include -isystem /gcc/build/./gcc/include-fixed 
> -D_GNU_SOURCE -D _GNU_SOURCE -D LOCALEDIR=. -isystem 
> /gcc-install/x86_64-pc-linux-gnu/include -isystem 
> /gcc-install/x86_64-pc-linux-gnu/sys-include 
> /gcc/src/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_c++200x_compatibility.cc
>  -quiet -dumpbase all_c++200x_compatibility.cc -mtune=generic -march=x86-64 
> -auxbase-strip all_c++200x_compatibility.s -g -O2 -Wc++11-compat -Werror 
> -version -fdiagnostics-color=never -fchecking=1 -fmessage-length=0 
> -fno-show-column -ffunction-sections -fdata-sections 
> -fno-diagnostics-show-caret -o all_c++200x_compatibility.s
>
> Here's a sample GDB session:
>
> Breakpoint 2, c_common_read_pch (pfile=0x31eafa0,
> name=0x31fc950 
> "/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/extc++.h.gch/O2g.gch",
>  fd=9,
> orig_name=0x31ff0e0 
> "/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/extc++.h")
>  at ../../src/gcc/c-family/c-pch.c:348
> 348   timevar_push (TV_PCH_RESTORE);
> (gdb) enable 1
> (gdb) c
> Continuing.
>
> Breakpoint 1, allocate_decl_uid () at ../../src/gcc/tree.c:1024
> 1024  if (!all_ids)
> (gdb) p *all_ids
> $8 = {static crashme = {elements = 0x0, heads = 0x0, obstack = {chunk_size = 
> 0, chunk = 0x0,
>   object_base = 0x0, next_free = 0x0, chunk_limit = 0x0, temp = {i = 0, p 
> = 0x0},
>   alignment_mask = 0, chunkfun = {plain = 0x0, extra = 0x0}, freefun = 
> {plain = 0x0,
> extra = 0x0}, extra_arg = 0x0, use_extra_arg = 0, maybe_empty_object 
> = 0,
>   alloc_failed = 0}}, indx = 1790, tree_form = true, first = 0x1002dd9950,
>   current = 0x7fdf0f865370, obstack = 0x0}
> (gdb) n
> 1029  gcc_assert (all_ids->tree_form);
> (gdb) n
> 1030  int new_id = next_decl_uid++;
> (gdb) n
> 1031  gcc_assert (!bitmap_bit_p (all_ids, new_id));
> (gdb) s
> bitmap_bit_p (head=0x10006cc3c0, bit=229123) at ../../src/gcc/bitmap.c:963
> 963   unsigned int indx = bit / BITMAP_ELEMENT_ALL_BITS;
> (gdb) p indx
> $10 = 32767
> (gdb) s
> 968   if (!head->tree_form)
> (gdb) s
> 971 ptr = bitmap_tree_find_element (head, indx);
> (gdb) s
> bitmap_tree_find_element (head=0x10006cc3c0, indx=1790) at 
> ../../src/gcc/bitmap.c:546
> 546   if (head->current == NULL
> (gdb) p head->current
> $11 = (bitmap_element *) 0x7fdf0f865370
> (gdb) p *head->current
> Cannot access memory at address 0x7fdf0f865370
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00c4866b in bitmap_bit_p (head=0x10006cc3c0, bit=229123) at 
> ../../src/gcc/bitmap.c:978
> 978   return (ptr->bits[word_num] >> bit_num) & 1;
>
>
>
>
> Alternatively I tried using a hash_set to keep track of ids:
>
> typedef int_hash  unsigned_int_hasher;
> static GTY(()) hash_set *all_ids;
>
> But I couldn't figure out how to write the required gt_ggc_mx and gt_pch_nx 
> methods.
>
>
> Any help would be greatly appreciated!
>
>
> - Michael


Re: [Question]: How to tracking the relationship between gimple expr and expanded rtx ?

2019-01-18 Thread Richard Biener
On Fri, Jan 18, 2019 at 4:11 AM Li Kun  wrote:
>
> I need to know which rtx is expanded from a specific CALL_EXPR, how
> could I do that?
>
> Is INSN_LOCATION accurate enough ?

No.  There's no accurate way to do this so you have to invent something.
Or start by explaining what you are wanting to do.

Richard.

>
> --
> Best Regards
> Li Kun
>


Re: [Question]: How to tracking the relationship between gimple expr and expanded rtx ?

2019-01-18 Thread Richard Biener
On Fri, Jan 18, 2019 at 10:33 AM Li Kun  wrote:
>
>
>
> > On 2019/1/18 16:52, Richard Biener wrote:
> > On Fri, Jan 18, 2019 at 4:11 AM Li Kun  wrote:
> >> I need to know which rtx is expanded from a specific CALL_EXPR, how
> >> could I do that?
> >>
> >> Is INSN_LOCATION accurate enough ?
> > No.  There's no accurate way to do this so you have to invent something.
> > Or start by explaining what you are wanting to do.
> I'm trying to implement safestack as a pass after expand, so I have to
> know where the args are laid out when a composite struct param is passed
> by reference, so that I can move the arg to the unsafe region.
> I'm trying not to interfere with the expand_call procedure, but then I
> can't get accurate information.
> What I'm thinking of is to get the rtxs of the CALL_EXPR and locate the
> args from them.
> Is there any better way to do this?

I fear the only appropriate place to do something like this is hooking
into call expansion (expand_call).

Richard.

>
> Thanks a lot!
> >
> > Richard.
> >
> >> --
> >> Best Regards
> >> Li Kun
> >>
>
> --
> Best Regards
> Li Kun
>


Re: Problems with GGC and bitmap/hash_set

2019-01-18 Thread Richard Biener
On January 18, 2019 2:38:44 PM GMT+01:00, Michael Ploujnikov 
 wrote:
>On 2019-01-18 3:45 a.m., Richard Biener wrote:
>> On Thu, Jan 17, 2019 at 8:16 PM Michael Ploujnikov
>>  wrote:
>>>
>>> Hi,
>>>
>>> I've been doing some investigations that required using a bitmap to
>keep track of decl IDs and I ran into segmentation fault after my
>bitmap has been loaded from a PCH. Attached is a short patch that can
>reliably reproduce my problem. Looks like head->current is set to point
>to uninitialized memory for some reason. Could someone with a strong
>understanding of GGC take a look and let me know what I might be doing
>wrong and how to fix this?
>> 
>> If you're just doing investigation why do you care about PCH?
>
>My idea is to write a patch that does something similar to this, but
>not exactly (obviously not storing all of the object IDs for example).
>It's very preliminary and I don't know about the feasibility of what
>I'm trying to do, but I do know that I need a bitmap/hash_set for the
>full duration of toplev::main which inevitably involves PCH. However
>because of this segfault I haven't been able to actually try my idea
>yet!
>
>> 
>> In any event I guess that bimaps are not supported by PCH because of
>> GTY(skip)ing the bitmap_head current member (so that's not saved).
>> I guess PCH writers/readers need to ignore skipping.
>
>Why is the current member not saved? I know the documentation says that
>another pointer in the chain already points to the same thing, but
>what's the harm in not skipping it?

It disturbs optimal chaining of the list elements during the GC walk.  As said,
we probably need a different thing than skip for this purpose.  For your
experiments, simply remove the skipping.

Richard. 

>
>- Michael



Re: Support for AVX512 ternary logic instruction

2019-01-21 Thread Richard Biener
On Mon, Jan 21, 2019 at 2:46 AM Andi Kleen  wrote:
>
> Wojciech Muła  writes:
> >
> > The main concern is if it's a proper approach? Seems that to match
> > other logic functions, like "a & b | c", a separate pattern is required.
> > Since an argument can be either negated or not, and we can use three
> > logic ops (or, and, xor) there would be 72 patterns. So maybe a new
> > optimization pass would be easier to create and maintain? (Just a silly
> > guess.)
>
> Yes that's not scalable.

You can use code iterators for the logic ops so you only need to
explicitly write down the not variants.
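
For reference, a sketch of the mechanism (illustrative only, not the actual
sse.md patterns; the iterator and attribute names here are made up):

(define_code_iterator any_logic3 [and ior xor])
(define_code_attr logic3_mnemonic [(and "and") (ior "or") (xor "xor")])

;; One define_insn now stands for the and/ior/xor variants; only the forms
;; with negated operands still need to be written out explicitly.
(define_insn "*<code>_example"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (any_logic3:SI (match_operand:SI 1 "register_operand" "r")
                       (match_operand:SI 2 "register_operand" "r")))]
  ""
  "<logic3_mnemonic>\t%0, %1, %2")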

> >
> > I'd be grateful for any comments and advice.
>
> Maybe you could write it in the simplifier pattern language
> and then generate a suitable builtin.

Using an UNSPEC and machine-reorg might also be an option...

> See https://gcc.gnu.org/onlinedocs/gccint/Match-and-Simplify.html
>
> However the problem is that this may affect other optimizations
> because it happens too early. e.g. the compiler would also need
> to learn to constant propagate the new builtin, and understand
> its side effects, which might affect a lot of places.
>
> So a custom compiler patch that runs late may be better.
> Or perhaps some extension of the simplifier that does it.
>
> I looked at this at some point for PCMP*STR* which are similarly
> powerful instructions that could potentially replace a lot of
> others.
>
> -Andi


Re: SLP-based reduction vectorization

2019-01-24 Thread Richard Biener
On Mon, Jan 21, 2019 at 2:20 PM Anton Youdkevitch
 wrote:
>
> Here is the prototype for doing vectorized reduction
> using SLP approach. I would appreciate feedback if this
> is a feasible approach and if overall the direction is
> right.
>
> The idea is to vectorize reduction like this
>
> S = A[0]+A[1]+...A[N];
>
> into
>
> Sv = Av[0]+Av[1]+...+Av[N/VL];
>
>
> So that, for instance, the following code:
>
> typedef double T;
> T sum;
>
> void foo (T*  __restrict__ a)
> {
> sum = a[0]+ a[1] + a[2]+ a[3] + a[4]+ a[5] + a[6]+ a[7];
> }
>
>
> instead of:
>
> foo:
> .LFB23:
> .cfi_startproc
> movsd   (%rdi), %xmm0
> movsd   16(%rdi), %xmm1
> addsd   8(%rdi), %xmm0
> addsd   24(%rdi), %xmm1
> addsd   %xmm1, %xmm0
> movsd   32(%rdi), %xmm1
> addsd   40(%rdi), %xmm1
> addsd   %xmm1, %xmm0
> movsd   48(%rdi), %xmm1
> addsd   56(%rdi), %xmm1
> addsd   %xmm1, %xmm0
> movsd   %xmm0, sum2(%rip)
> ret
> .cfi_endproc
>
>
> be compiled into:
>
> foo:
> .LFB11:
> .cfi_startproc
> movupd  32(%rdi), %xmm0
> movupd  48(%rdi), %xmm3
> movupd  (%rdi), %xmm1
> movupd  16(%rdi), %xmm2
> addpd   %xmm3, %xmm0
> addpd   %xmm2, %xmm1
> addpd   %xmm1, %xmm0
> haddpd  %xmm0, %xmm0
> movlpd  %xmm0, sum(%rip)
> ret
> .cfi_endproc
>
>
> As this is a very crude prototype there are some things
> to consider.
>
> 1. As the current SLP framework assumes presence of
> group stores I cannot use directly it as reduction
> does not require group stores (or even stores at all),
> so, I'm partially using the existing functionality but
> sometimes I have to create a stripped down version
> of it for my own needs;
>
> 2. The current version considers only PLUS reduction
> as it is encountered most often and therefore is the
> most practical;
>
> 3. While normally SLP transformation should operate
> inside single basic block this requirement greatly
> restricts it's practical application as in a code
> complex enough there will be vectorizable subexpressions
> defined in basic block(s) different from that where the
> reduction result resides. However, for the sake of
> simplicity only single uses in the same block are
> considered now;
>
> 4. For the same sake the current version does not deal
> with partial reductions which would require partial sum
> merging and careful removal of the scalars that participate
> in the vector part. The latter gets done automatically
> by DCE in the case of full reduction vectorization;
>
> 5. There is no cost model yet for the reasons mentioned
> in the paragraphs 3 and 4.

First sorry for the late response.

No, I don't think your approach of bypassing the "rest"
is OK.  I've once started to implement BB reduction
support but somehow got distracted IIRC.  Your testcase
(and the prototype I worked on) still has a (scalar non-grouped)
store which can be keyed on in SLP discovery phase.

You should be able to "re-use" (by a lot of refactoring I guess)
the reduction finding code (vect_is_slp_reduction) to see
wheter such a store is fed by a reduction chain.  Note that
if you adjust the testcase to have

 sum[0] = a[0] + ... + a[n];
 sum[1] = b[0] + ... + b[n];

then you'll have a grouped store fed by reductions.  You
can also consider

 for (i = ...)
  {
 sum[i] = a[i*4] + a[i*4+1] + a[i*4+2] + a[i*4+3];
  }

which we should be able to handle.

So the whole problem of doing BB reductions boils down
to SLP tree discovery, the rest should be more straight-forward
(of course code-gen needs to be adapted for the non-loop case
as well).

It's not the easiest problem you're trying to tackle, btw ;)  May
I suggest you become familiar with the code by BB vectorizing
vector CONSTRUCTORs instead?

typedef int v4si __attribute__((vector_size(16)));

v4si foo (int *i, int *j)
{
  return (v4si) { i[0] + j[0], i[1] + j[1], i[2] + j[2], i[3] + j[3] };
}

it has the same SLP discovery "issue", this time somewhat
easier as a CONSTRUCTOR directly plays the role of the
"grouped store".

Richard.

> Thanks in advance.
>
> --
>   Anton


Re: __builtin_dynamic_object_size

2019-01-24 Thread Richard Biener
On Wed, Jan 23, 2019 at 12:33 PM Jakub Jelinek  wrote:
>
> On Wed, Jan 23, 2019 at 10:40:43AM +, Jonathan Wakely wrote:
> > There's a patch to add __builtin_dynamic_object_size to clang:
> > https://reviews.llvm.org/D56760
> >
> > It was suggested that this could be done via a new flag bit for
> > __builtin_object_size, but only if GCC would support that too
> > (otherwise it would be done as a separate builtin).
> >
> > Is there any interest in adding that as an option to __builtin_object_size?
> >
> > I know Jakub is concerned about arbitrarily complex expressions, when
> > __builtin_object_size is supposed to always be efficient and always
> > evaluate at compile time (which would imply the dynamic behaviour
> > should be a separate builtin, if it exists at all).
>
> The current modes (0-3) certainly must not be changed and must return a
> constant, that is what huge amounts of code in the wild relies on.

I wouldn't overload _bos but only use a new builtin.
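
As a small illustration of the constant-only behavior of the existing modes
(a sketch; assumes optimization is enabled so the object-size machinery runs):

#include <stdlib.h>

/* Constant allocation size: __builtin_object_size folds to 42 at compile
   time.  */
size_t known_size (void)
{
  char *p = malloc (42);
  size_t sz = __builtin_object_size (p, 0);  /* 42 */
  free (p);
  return sz;
}

/* Runtime size: the current modes must give up and return the "unknown"
   value, (size_t)-1 for type 0.  The dynamic variant under discussion would
   be allowed to return n here instead.  */
size_t unknown_size (size_t n)
{
  char *p = malloc (n);
  size_t sz = __builtin_object_size (p, 0);  /* (size_t)-1 */
  free (p);
  return sz;
}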

> The reason to choose constants only was the desire to make _FORTIFY_SOURCE
> cheap at runtime.  For the dynamically computed expressions, the question
> is how far it should go, how complex expressions it wants to build and how
> much code and runtime can be spent on computing that.
>
> The rationale for __builtin_dynamic_object_size lists only very simple
> cases, where the builtin is just called on result of malloc, so that is
> indeed easy, the argument is already evaluated before the malloc call, so
> you can just save it in a temporary and use later.  Slightly more complex
> is calloc, where you need to multiply two numbers (do you handle overflow
> somehow, or not?).  But in real world, it can be arbitrarily more complex,
> there can be pointer arithmetics with constant or variable offsets,
> there can be conditional adjustments of pointers or PHIs with multiple
> "dynamic" expressions for the sizes (shall the dynamic expression evaluate
> as max over expressions for different phi arguments (that is essentially
> what is done for the constant __builtin_object_size, but for dynamic
> expressions max needs to be computed at runtime, or shall it reconstruct
> the conditional or remember it and compute whatever ? val1 : val2),
> loops which adjust pointers, etc. and all that can be done many times in
> between where the objects are allocated and where the builtin is used.

Which means I'd like to see a thorough specification of the builtin.
If it is allowed to return "failure" in any event then of what use is
the builtin in practice?

Richard.

> Jakub


Re: SLP-based reduction vectorization

2019-01-24 Thread Richard Biener
On Thu, Jan 24, 2019 at 1:04 PM Anton Youdkevitch
 wrote:
>
> Richard,
>
> Thanks a lot for the response! I will definitely
> try the constructor approach.
>
> You mentioned non-grouped store. Is the handling
> of such stores actually there and I just missed
> it? It was a big hassle for me as I didn't manage
> to find it and assumed there isn't one.

No, it isn't there.  On a branch I'm working on I'm
just doing sth like

+  /* Find SLP sequences starting from single stores.  */
+  data_reference_p dr;
+  FOR_EACH_VEC_ELT (vinfo->shared->datarefs, i, dr)
+   if (DR_IS_WRITE (dr))
+ {
+   stmt_vec_info stmt_info = vinfo->lookup_dr (dr)->stmt;
+   if (STMT_SLP_TYPE (stmt_info))
+ continue;
+   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+ continue;
+   vect_analyze_slp_instance (vinfo, stmt_info, max_tree_size);
+ }

note that this alone won't work since you have to actually
build a first set of scalar stmts (like vect_analyze_slp_instance
does for groups just by gathering its elements) from the
reduction.  But here you could (and IIRC I did back in time
when prototyping reduction BB vect support) hack in
some ad-hoc pattern-matching of a series of PLUS.

> I have a question (a lot of them, though, but this
> one is bothering me most). It is actually paragraph
> 4 of my previous letter. In real world code there can
> be a case that the loading of the elements and the use
> of them (for a reduction) are in different BBs (I saw
> this myself). So, not only it complicates the things
> in general but for me it breaks some SLP code assuming
> single BB operation (IIRC, some dataref analysis phase
> assumes single BB). Did anybody consider this before?

Sure I considered this but usually restricting one to a
single BB works quite well and simplifies dependence
analysis a lot.

> OK, I know I start looking kind of stubborn here but
> what about the case:
>
> foo(A[0]+...A[n])
>
> There won't be any store here by definition while a
> reduction will. Or is it something too rarely seen?

You are right - in principle a reduction can be "rooted"
at any point.  But you need to come up with an
algorithm with sensible cost (in time and memory)
to detect the reduction group.  The greedy matching
I talked about above can be applied anywhere, not
just at stores.

> --
>Thanks,
>Anton
>
>
> On 24/1/2019 13:47, Richard Biener wrote:
> > On Mon, Jan 21, 2019 at 2:20 PM Anton Youdkevitch
> >  wrote:
> >>
> >> Here is the prototype for doing vectorized reduction
> >> using SLP approach. I would appreciate feedback if this
> >> is a feasible approach and if overall the direction is
> >> right.
> >>
> >> The idea is to vectorize reduction like this
> >>
> >> S = A[0]+A[1]+...A[N];
> >>
> >> into
> >>
> >> Sv = Av[0]+Av[1]+...+Av[N/VL];
> >>
> >>
> >> So that, for instance, the following code:
> >>
> >> typedef double T;
> >> T sum;
> >>
> >> void foo (T*  __restrict__ a)
> >> {
> >>  sum = a[0]+ a[1] + a[2]+ a[3] + a[4]+ a[5] + a[6]+ a[7];
> >> }
> >>
> >>
> >> instead of:
> >>
> >> foo:
> >> .LFB23:
> >>  .cfi_startproc
> >>  movsd   (%rdi), %xmm0
> >>  movsd   16(%rdi), %xmm1
> >>  addsd   8(%rdi), %xmm0
> >>  addsd   24(%rdi), %xmm1
> >>  addsd   %xmm1, %xmm0
> >>  movsd   32(%rdi), %xmm1
> >>  addsd   40(%rdi), %xmm1
> >>  addsd   %xmm1, %xmm0
> >>  movsd   48(%rdi), %xmm1
> >>  addsd   56(%rdi), %xmm1
> >>  addsd   %xmm1, %xmm0
> >>  movsd   %xmm0, sum2(%rip)
> >>  ret
> >>  .cfi_endproc
> >>
> >>
> >> be compiled into:
> >>
> >> foo:
> >> .LFB11:
> >>  .cfi_startproc
> >>  movupd  32(%rdi), %xmm0
> >>  movupd  48(%rdi), %xmm3
> >>  movupd  (%rdi), %xmm1
> >>  movupd  16(%rdi), %xmm2
> >>  addpd   %xmm3, %xmm0
> >>  addpd   %xmm2, %xmm1
> >>  addpd   %xmm1, %xmm0
> >>  haddpd  %xmm0, %xmm0
> >>  movlpd  %xmm0, sum(%rip)
> >>  ret
> >>  .cfi_endproc
> >>
> >>
> >> As this is a very crude prototype there are some things
> >> to consider.
> >>
> >

Re: Enabling LTO for target libraries (e.g., libgo, libstdc++)

2019-01-25 Thread Richard Biener
On January 25, 2019 7:22:36 AM GMT+01:00, Nikhil Benesch 
 wrote:
>I am attempting to convince GCC to build target libraries with
>link-time
>optimizations enabled. I am primarily interested in libgo, but this
>discussion
>seems like it would be applicable to libstdc++, libgfortran, etc. The
>benchmarking I've done suggests that LTOing libgo yields a 5-20%
>speedup on
>various Go programs, which is quite substantial.
>
>The trouble is convincing GCC's build system to apply the various LTO
>flags to
>the correct places. Ian Taylor suggested the following to plumb -flto
>into
>libgo compilation:
>
>$ make GOCFLAGS_FOR_TARGET="-g -O3 -flto"
>
>This nearly works, and I believe there are analogous options that would
>apply to
>the other target libraries that GCC builds.
>
>The trouble is that while building libgo, the build system uses ar and
>ranlib
>directly from binutils, without providing them with the LTO plugin that
>was
>built earlier. This means that the LTO information is dropped on the
>floor, and
>attempting to link with the built libgo archive will fail.
>
>I have a simple patch to the top-level configure.ac that resolves the
>issue by
>teaching the build system to use the gcc-ar and gcc-ranlib wrappers
>which were
>built earlier and know how to pass the linker plugin to the underlying
>ar/ranlib
>commands. The patch is small enough that I've included it at the end of
>this
>email.
>
>My question is whether this is a reasonable thing to do. It seems like
>using
>the gcc-ar and gcc-ranlib wrappers strictly improves the situation, and
>won't
>impact compilations that don't specify -flto. But I'm not familiar
>enough with
>the build system to say for sure.
>
>Does anyone have advice to offer? Has anyone tried convincing the build
>system
>to compile some of the other target libraries (like libstdc++ or
>libgfortran)
>with -flto?

Using the built gcc-ar and gcc-ranlib sounds good, but the patch doesn't look
right from a quick look.  I think we want to change the top-level Makefile to
pass down the appropriate AR_FOR_TARGET/RANLIB_FOR_TARGET, similar to how we
pass CC_FOR_TARGET.
Richard. 

>diff --git a/configure.ac b/configure.ac
>index 87f2aee05008..1c38ac5979ff 100644
>--- a/configure.ac
>+++ b/configure.ac
>@@ -3400,7 +3400,8 @@ ACX_CHECK_INSTALLED_TARGET_TOOL(WINDMC_FOR_TARGET, windmc)
> 
> RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
> 
>-GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [binutils/ar])
>+GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR,
>+  [gcc/gcc-ar -B$$r/$(HOST_SUBDIR)/gcc/])
> GCC_TARGET_TOOL(as, AS_FOR_TARGET, AS, [gas/as-new])
> GCC_TARGET_TOOL(cc, CC_FOR_TARGET, CC, [gcc/xgcc -B$$r/$(HOST_SUBDIR)/gcc/])
> dnl see comments for CXX_FOR_TARGET_FLAG_TO_PASS
>@@ -3424,7 +3425,8 @@ GCC_TARGET_TOOL(nm, NM_FOR_TARGET, NM, [binutils/nm-new])
> GCC_TARGET_TOOL(objcopy, OBJCOPY_FOR_TARGET, OBJCOPY, [binutils/objcopy])
> GCC_TARGET_TOOL(objdump, OBJDUMP_FOR_TARGET, OBJDUMP, [binutils/objdump])
> GCC_TARGET_TOOL(otool, OTOOL_FOR_TARGET, OTOOL)
>-GCC_TARGET_TOOL(ranlib, RANLIB_FOR_TARGET, RANLIB, [binutils/ranlib])
>+GCC_TARGET_TOOL(ranlib, RANLIB_FOR_TARGET, RANLIB,
>+  [gcc/gcc-ranlib -B$$r/$(HOST_SUBDIR)/gcc/])
> GCC_TARGET_TOOL(readelf, READELF_FOR_TARGET, READELF, [binutils/readelf])
> GCC_TARGET_TOOL(strip, STRIP_FOR_TARGET, STRIP, [binutils/strip-new])
> GCC_TARGET_TOOL(windres, WINDRES_FOR_TARGET, WINDRES, [binutils/windres])



Re: Enabling LTO for target libraries (e.g., libgo, libstdc++)

2019-01-25 Thread Richard Biener
On January 25, 2019 6:17:54 PM GMT+01:00, Joseph Myers 
 wrote:
>On Fri, 25 Jan 2019, Nikhil Benesch wrote:
>
>> I am attempting to convince GCC to build target libraries with
>link-time
>> optimizations enabled. I am primarily interested in libgo, but this
>discussion
>
>Note that as far as I know issues with host-dependencies of LTO
>bytecode 
>(bug 41526) may still exist, so this wouldn't work with Canadian cross 
>compilers.

I was expecting the LTO byte code not to be retained in the built libraries. 

>> I have a simple patch to the top-level configure.ac that resolves the
>issue by
>> teaching the build system to use the gcc-ar and gcc-ranlib wrappers
>which were
>> built earlier and know how to pass the linker plugin to the
>underlying ar/ranlib
>> commands. The patch is small enough that I've included it at the end
>of this
>> email.
>
>Will those wrappers properly wrap round tools for the target that were 
>configured using --with-build-time-tools?



Re: -fno-common

2019-01-29 Thread Richard Biener
On Mon, Jan 28, 2019 at 4:59 PM Bernhard Schommer
 wrote:
>
> Hi,
>
> I would like to know if the handling of the option -fno-common has
> changed between version 7.3 and 8.2 for x86. I tried it with the
> default system version of OpenSUSE and for example:
>
> const int i;
>
> is placed in the .bss section. With a newer self-compiled version 8.2
> the same variable is placed in the section .rodata. I could not find
> any information in the Changelog whether the behavior has changed and
> thus would like to know if there was any change.

I can confirm this change in behavior.  I vaguely remember changes in this
area but I'm not sure if the change was on purpose.

Richard.

> Best,
> -Bernhard


Re: libgomp platform customization

2019-01-31 Thread Richard Biener
On Wed, Jan 30, 2019 at 3:46 PM Sebastian Huber
 wrote:
>
> Hello,
>
> we would like to use libgomp in a quite constrained environment. In this
> environment using for example the C locale support, errno, malloc(),
> realloc(), free(), and abort() are problematic. One option would be to
> introduce a new header file "config/*/platform.h" which is included in
> libgomp.h right after the #include "config.h". A platform could then do
> something like this:
>
> #define malloc(size) platform_malloc(size)
> ...
>
> In env.c there are some uses of strto*() like this:
>
>errno = 0;
>stride = strtol (env, &env, 10);
>if (errno)
>  return false;
>
> I would like to introduce a new header file "strto.h" which defines
> something like this:
>
> static inline char *
> gomp_strtol (char *s, long *value)
> {
>char *end;
>
>errno = 0;
>*value = strtol (s, &end, 10);
>if (errno != 0)
>  return NULL;
>
>return end;
> }
>
> Then use:
>
>env = gomp_strtol (env, &stride);
>if (env == NULL)
>  return false;
>
> A platform could then provide its own "config/*/strto.h" with an
> alternative implementation.
>
> Would this be acceptable after the GCC 9 release?

I guess you could look at what nvptx and HSA (and GCN on some branch)
do here?

Richard.

> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax : +49 89 189 47 41-09
> E-Mail  : sebastian.hu...@embedded-brains.de
> PGP : Public key available on request.
>
> This message is not a business communication within the meaning of the EHUG.
>


Re: Small typo on site

2019-01-31 Thread Richard Biener
On Wed, Jan 30, 2019 at 5:19 PM  wrote:
>
> hi,
> on site:
> https://www.gnu.org/software/gcc/bugs/#dontwant
> in section
> Reporting Bugs > Summarized bug reporting instructions > What we do not want
> in 3 point you have
>
> An attached archive (tar, zip, shar, whatever) containing all (or some :-)
> of the above.
>
> (or some :-)  should be replaced with (or some :-) )

Fixed.

> ~wytrzeszcz
>
> Ps. thank for all you did already
>


Re: libgomp platform customization

2019-01-31 Thread Richard Biener
On Thu, Jan 31, 2019 at 10:37 AM Sebastian Huber
 wrote:
>
> On 31/01/2019 10:29, Richard Biener wrote:
> > On Wed, Jan 30, 2019 at 3:46 PM Sebastian Huber
> >   wrote:
> >> Hello,
> >>
> >> we would like to use libgomp in a quite constrained environment. In this
> >> environment using for example the C locale support, errno, malloc(),
> >> realloc(), free(), and abort() are problematic. One option would be to
> >> introduce a new header file "config/*/platform.h" which is included in
> >> libgomp.h right after the #include "config.h". A platform could then do
> >> something like this:
> >>
> >> #define malloc(size) platform_malloc(size)
> >> ...
> >>
> >> In env.c there are some uses of strto*() like this:
> >>
> >> errno = 0;
> >> stride = strtol (env, &env, 10);
> >> if (errno)
> >>   return false;
> >>
> >> I would like to introduce a new header file "strto.h" which defines
> >> something like this:
> >>
> >> static inline char *
> >> gomp_strtol (char *s, long *value)
> >> {
> >> char *end;
> >>
> >> errno = 0;
> >> *value = strtol (s, &end, 10);
> >> if (errno != 0)
> >>   return NULL;
> >>
> >> return end;
> >> }
> >>
> >> Then use:
> >>
> >> env = gomp_strtol (env, &stride);
> >> if (env == NULL)
> >>   return false;
> >>
> >> A platform could then provide its own "config/*/strto.h" with an
> >> alternative implementation.
> >>
> >> Would this be acceptable after the GCC 9 release?
> > I guess you could look at what nvptx and HSA (and GCN on some branch)
> > do here?
>
> My problem is that our real-time operating system (RTEMS) is somewhere
> in between a full blown Linux and the offload hardware. I would like to
> get rid of stuff which depends on the Newlib struct _reent since this
> pulls in a lot of dependencies. The heavy weight functions are just used
> for the initialization (env.c) and error reporting. Containing the heap
> allocation functions helps to control the memory used by OpenMP
> computations.

I suppose supplying an RTEMS-specific no-op stub would work here
and that's close enough to what the offload stuff does.  But I'm not
very familiar with this, so sorry if I'm misleading you.

Richard.

> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax : +49 89 189 47 41-09
> E-Mail  : sebastian.hu...@embedded-brains.de
> PGP : Public key available on request.
>
> This message is not a business communication within the meaning of the EHUG.
>


GCC 9 Status Report (2019-02-08)

2019-02-08 Thread Richard Biener


Status
======

We're one month into the stabilization phase (Stage 4), making some
good progress in the long march towards zero P1 regressions.  Please
have a look at those that you assigned yourself to.

As usual this is a good time to test your non-{primary,secondary}
target making sure it builds and works correctly.


Quality Data


Priority          #   Change from last report
--------        ---   -----------------------
P1               24   -  18
P2              185   -   2
P3               32   -  15
P4              169   -  13
P5               24   -   1
--------        ---   -----------------------
Total P1-P3     241   -  35
Total           434   -  49


Previous Report
===

https://gcc.gnu.org/ml/gcc/2019-01/msg00027.html


Re: Parallelize the compilation using Threads

2019-02-12 Thread Richard Biener
On Mon, Feb 11, 2019 at 10:46 PM Giuliano Belinassi
 wrote:
>
> Hi,
>
> I was just wondering what API should I use to spawn threads and control
> its flow. Should I use OpenMP, pthreads, or something else?
>
> My point what if we break compatibility with something. If we use
> OpenMP, I'm afraid that we will break compatibility with compilers not
> supporting it. On the other hand, If we use pthread, we will break
> compatibility with non-POSIX systems (Windows).

I'm not sure we have a thread abstraction for the host - we do have
one for the target via libgcc gthr.h though.  For prototyping I'd resort
to this same interface and fixup the host != target case as needed.

Richard.
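
A rough sketch of the kind of host-side fixup that implies (purely
illustrative, not existing GCC code; HAVE_PTHREAD_H and the host_thread_*
names are invented here):

#include <stddef.h>

#ifdef HAVE_PTHREAD_H
#include <pthread.h>

typedef pthread_t host_thread_t;

static inline int
host_thread_spawn (host_thread_t *t, void *(*fn) (void *), void *arg)
{
  return pthread_create (t, NULL, fn, arg);
}

static inline int
host_thread_join (host_thread_t t)
{
  return pthread_join (t, NULL);
}

#else

typedef int host_thread_t;

static inline int
host_thread_spawn (host_thread_t *t, void *(*fn) (void *), void *arg)
{
  /* Single-threaded fallback: run the work inline.  */
  *t = 0;
  fn (arg);
  return 0;
}

static inline int
host_thread_join (host_thread_t t)
{
  (void) t;
  return 0;
}

#endif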

>
> Giuliano.


Re: GCC missing -flto optimizations? SPEC lbm benchmark

2019-02-15 Thread Richard Biener
On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel  
wrote:
>I never could understand, why field reordering was removed from GCC?

The implementation simply was seriously broken, bitrotten and unmaintained. 

Richard 
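
For context, a sketch of the kind of layout change that machinery performed
(illustrative struct names; such a reordering is only legal when the compiler
can prove the program cannot observe the layout):

/* 24 bytes on a typical LP64 target: 7 bytes of padding after 'a' and
   7 more after 'c' to satisfy the 8-byte alignment of 'b'.  */
struct as_written   { char a; double b; char c; };

/* 16 bytes after reordering the fields by decreasing alignment.  */
struct as_reordered { double b; char a; char c; };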

>I mean, I know that it's prohibited in C and C++, but, sure, GCC can
>detect whether it possibly can influence application behavior, and if
>not, just do the reorder.
>
>The veto is important to C/C++ as programming languages, but not to
>machine code that is being generated from them. As long as app can't
>detect that its fields were reordered through means defined by C/C++,
>field reordering by compiler is fine, isn't it?
>
>On Fri, 15 Feb 2019 at 12:49, Jun Ma  wrote:
>>
>> Bin.Cheng  wrote on Fri, Feb 15, 2019 at 5:12 PM:
>>
>> > On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey 
>wrote:
>> > >
>> > > I have a question about SPEC CPU 2017 and what GCC can and cannot
>do
>> > > with -flto.  As part of some SPEC analysis I am doing I found
>that with
>> > > -Ofast, ICC and GCC were not that far apart (especially spec int
>rate,
>> > > spec fp rate was a slightly larger difference).
>> > >
>> > > But when I added -ipo to the ICC command and -flto to the GCC
>command,
>> > > the difference got larger.  In particular the 519.lbm_r was more
>than
>> > > twice as fast with ICC and -ipo, but -flto did not help GCC at
>all.
>> > >
>> > > There are other tests that also show this type of improvement
>with -ipo
>> > > like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
>> > > 548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone
>have
>> > > any idea on what ICC is doing that GCC is missing?  Is GCC just
>not
>> > > agressive enough with its inlining?
>> >
>> > IIRC Jun did some investigation before? CCing.
>> >
>> > Thanks,
>> > bin
>> > >
>> > > Steve Ellcey
>> > > sell...@marvell.com
>>
>> ICC is doing much more than GCC in ipo, especially memory layout
>> optimizations. See https://software.intel.com/en-us/node/522667.
>> ICC is more aggressive in array transposition/structure splitting
>> /field reordering. However, these optimizations have been removed
>> from GCC a long time ago.
>> As for case lbm_r, IIRC a loop with a memory access whose stride is 20
>> is the most time-consuming.  ICC will optimize the array (maybe structure?)
>> and vectorize the loop under ipo.
>>
>> Thanks
>> Jun



Re: Gcc profile questions

2019-02-19 Thread Richard Biener
On February 19, 2019 5:19:11 PM GMT+01:00, Qing Zhao  
wrote:
>Hi,
>
>Suppose we have a program called foo which is built with gcc
>-fprofile-generate,  Now when foo is executed a bunch of .gcda files
>are created.  
>
>What happens when foo is executed more than once.  Are the .gcda files
>updated with each execution? 

Yes. 

>Or are the .gcda files overwritten with
>new .gcda files (and thus contain profile info only for the last
>execution)?
>
>Thanks for the info.
>
>Qing
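
For reference, a quick way to see the merge behavior confirmed above (the file
name is hypothetical; build with -fprofile-generate, run the binary twice,
then run gcov on the source):

/* loop.c: after two runs of the instrumented program, gcov reports the
   loop body as executed 2000000 times, i.e. the counters from both runs
   were summed into the .gcda file rather than the second run overwriting
   the first.  */
int
main (void)
{
  volatile long sink = 0;
  for (int i = 0; i < 1000000; i++)
    sink += i;
  return 0;
}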



Re: Should invalid __RTL testcase "startwith" passes emit a warning?

2019-02-20 Thread Richard Biener
On Tue, Feb 19, 2019 at 3:29 PM Matthew Malcomson
 wrote:
>
> Hi there,
>
> I'd like to make handling of the __RTL function testcases where the
> startwith pass name is either invalid, not used for that optimisation
> level, or non-existent more understandable.
>
> Currently a problem with the pass name leaves around state that causes
> the compiler to ICE on other functions.
> If the pass name is invalid or one not used for the current optimisation
> level then "dfinit" is run, but "dfinish" is not, which breaks an
> assertion in the `rest_of_handle_df_finish` function.
> For any of the problems the "*clean_state" pass is not run, which causes
> an ICE on the first C function in the TU.
>
> The ICE's I've seen can be avoided by always running the "*clean_state"
> pass (including if the startwith pass of the function is not specified)
> and by always running the "dfinish" pass if the "dfinit" pass is run and
> I am working on a patch to do this.
>
> Since the function will not emit any code for any of these problems, I
> was wondering whether to emit a -Wunused-function warning pointing to
> the bad name (or to the area where a name should be), since it's
> unlikely to be intended.
> The current behaviour (apart from causing an ICE on other functions) is
> to silently do nothing.

You are supposed to write correct __RTL (or __GIMPLE).

Possibly clearing bad state to not leak to other functions might be a
good idea though.

Richard.

>
> Example for "ICE after a bad name".
>
>
> int
> foo_a ()
> {
>return 200;
> }
>
>
> int __RTL (startwith ("badname")) foo2 ()
> {
> (function "foo2"
>(insn-chain
>  (block 2
>(edge-from entry (flags "FALLTHRU"))
>(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
>(cinsn 101 (set (reg:DI x19) (reg:DI x0)))
>(cinsn 10 (use (reg/i:SI x19)))
>(edge-to exit (flags "FALLTHRU"))
>  ) ;; block 2
>) ;; insn-chain
> ) ;; function "foo2"
> }
>
>
> Cheers,
> Matthew


[RFC] Change PCH "checksum"

2019-02-22 Thread Richard Biener


GCC builds are currently not reproducible because for one the checksum
we compute for PCH purposes (by genchecksum) nowadays includes checksums
of archives (since we switched from checksumming a dummy executable
to checksumming object files).  That includes dates (unless built with
-D which we don't do).

Then later we switched to do thin archives so for example libbackend.a
we checksum doesn't contain the actual code anymore...

A pragmatic approach to "fix" things would be to just checksum
gtype-desc.o which should have enough state to cover PCH dependences
if I understand the workings correctly (patch below - a single
checksum would suffice so more simplifications are possible).

Another solution working on ELF systems with build-id support is
simply forgo checksumming anything and rely on the executable
build-id instead (pat^whack below as well).

Does anybody think that just checksumming gtype-desc.o is a
degradation over the current state (which checksums thin archives)?

Thanks,
Richard.

2019-02-22  Richard Biener  

c/
* Make-lang.in (cc1-checksum.c): Checksum only gtype-desc.o.

cp/
* Make-lang.in (cc1plus-checksum.c): Checksum only gtype-desc.o.

objc/
* Make-lang.in (cc1obj-checksum.c): Checksum only gtype-desc.o.

objcp/
* Make-lang.in (cc1objplus-checksum.c): Checksum only gtype-desc.o.

Index: gcc/c/Make-lang.in
===
--- gcc/c/Make-lang.in  (revision 269111)
+++ gcc/c/Make-lang.in  (working copy)
@@ -70,14 +70,13 @@ endif
 # compute checksum over all object files and the options
 # re-use the checksum from the prev-final stage so it passes
 # the bootstrap comparison and allows comparing of the cc1 binary
-cc1-checksum.c : build/genchecksum$(build_exeext) checksum-options \
-   $(C_OBJS) $(BACKEND) $(LIBDEPS) 
+cc1-checksum.c : build/genchecksum$(build_exeext) gtype-desc.o 
if [ -f ../stage_final ] \
   && cmp -s ../stage_current ../stage_final; then \
  cp ../prev-gcc/cc1-checksum.c cc1-checksum.c; \
else \
- build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \
- checksum-options > cc1-checksum.c.tmp &&   \
+ build/genchecksum$(build_exeext) gtype-desc.o \
+ > cc1-checksum.c.tmp &&\
  $(srcdir)/../move-if-change cc1-checksum.c.tmp cc1-checksum.c; \
fi
 
Index: gcc/cp/Make-lang.in
===
--- gcc/cp/Make-lang.in (revision 269111)
+++ gcc/cp/Make-lang.in (working copy)
@@ -105,14 +105,13 @@ cp-warn = $(STRICT_WARN)
 # compute checksum over all object files and the options
 # re-use the checksum from the prev-final stage so it passes
 # the bootstrap comparison and allows comparing of the cc1 binary
-cc1plus-checksum.c : build/genchecksum$(build_exeext) checksum-options \
-   $(CXX_OBJS) $(BACKEND) $(LIBDEPS) 
+cc1plus-checksum.c : build/genchecksum$(build_exeext) gtype-desc.o 
if [ -f ../stage_final ] \
   && cmp -s ../stage_current ../stage_final; then \
   cp ../prev-gcc/cc1plus-checksum.c cc1plus-checksum.c; \
else \
- build/genchecksum$(build_exeext) $(CXX_OBJS) $(BACKEND) $(LIBDEPS) \
- checksum-options > cc1plus-checksum.c.tmp && \
+ build/genchecksum$(build_exeext) gtype-desc.o \
+ > cc1plus-checksum.c.tmp &&  \
  $(srcdir)/../move-if-change cc1plus-checksum.c.tmp 
cc1plus-checksum.c; \
fi
 
Index: gcc/objc/Make-lang.in
===
--- gcc/objc/Make-lang.in   (revision 269111)
+++ gcc/objc/Make-lang.in   (working copy)
@@ -56,10 +56,9 @@ OBJC_OBJS = objc/objc-lang.o objc/objc-a
 
 objc_OBJS = $(OBJC_OBJS) cc1obj-checksum.o
 
-cc1obj-checksum.c : build/genchecksum$(build_exeext) checksum-options \
-$(OBJC_OBJS) $(C_AND_OBJC_OBJS) $(BACKEND) $(LIBDEPS)
-   build/genchecksum$(build_exeext) $(OBJC_OBJS) $(C_AND_OBJC_OBJS) \
-$(BACKEND) $(LIBDEPS) checksum-options > cc1obj-checksum.c.tmp && \
+cc1obj-checksum.c : build/genchecksum$(build_exeext) gtype-desc.o
+   build/genchecksum$(build_exeext) gtype-desc.o
+   > cc1obj-checksum.c.tmp && \
$(srcdir)/../move-if-change cc1obj-checksum.c.tmp cc1obj-checksum.c
 
 cc1obj$(exeext): $(OBJC_OBJS) $(C_AND_OBJC_OBJS) cc1obj-checksum.o $(BACKEND) 
$(LIBDEPS)
Index: gcc/objcp/Make-lang.in
===
--- gcc/objcp/Make-lang.in  (revision 269111)
+++ gcc/objcp/Make-lang.in  (working copy)
@@ -59,10 +59,9 @@ OBJCXX_OBJS = objcp/objcp-act.o objcp/ob
 
 obj-c++_OBJS = $(OBJCXX_OBJS) cc1objplus-checksum.o
 
-cc1objplus-checksum.c : build/ge

Re: [RFC] Change PCH "checksum"

2019-02-22 Thread Richard Biener
On February 22, 2019 5:03:46 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Fri, Feb 22, 2019 at 08:47:09AM -0700, Jeff Law wrote:
>> > 2019-02-22  Richard Biener  
>> > 
>> >c/
>> >* Make-lang.in (cc1-checksum.c): Checksum only gtype-desc.o.
>> > 
>> >cp/
>> >* Make-lang.in (cc1plus-checksum.c): Checksum only gtype-desc.o.
>> > 
>> >objc/
>> >* Make-lang.in (cc1obj-checksum.c): Checksum only gtype-desc.o.
>> > 
>> >objcp/
>> >* Make-lang.in (cc1objplus-checksum.c): Checksum only
>gtype-desc.o.
>> ISTM that gtype-desc effectively describes the structure of all the
>GC data.
>> 
>> Given we're summing the thin-archives, we're already missing things
>like
>> a change in static data.  So I don't think your patch is a
>degradation
>> over the current state.  I'm not 100% sure the current state is
>correct
>> though :-)
>
>Does it cover everything though?  I believe gtype-desc.c only covers a
>small
>portion, the rest is in all the gtype-*.h and gt-*.h headers that are
>included in the various object files.
>So, either we need to checksum all the object files that include gt-*.h
>or
>gtype-*.h headers in addition to gtype-desc.o, or perhaps checksum
>gtype.state ?  Though, that state wouldn't cover changes in ABI etc.

gtype-desc.o does not cover everything indeed.  But the current state doesn't
cover gtype-desc.o... Slightly better would be to re-include the frontend objects.

Not sure why we checksummed build flags, for example.  Isn't it enough to handle
GTY walking changes?

Anyway, for SUSE I'm probably using the build-id thing.

Richard. 

>   Jakub



Re: [RFC] Change PCH "checksum"

2019-02-26 Thread Richard Biener
On Mon, 25 Feb 2019, Mark Wielaard wrote:

> On Fri, 2019-02-22 at 12:29 +0100, Richard Biener wrote:
> > +struct build_id_note {
> > +/* The NHdr.  */
> > +uint32_t namesz;
> > +uint32_t descsz;
> > +uint32_t type;
> > +
> > +char name[4]; /* Note name for build-id is "GNU\0" */
> > +unsigned char build_id[16];
> > +};
> 
> Note that build-ids can be of different sizes depending on the style
> used to generate them, you get the correct size by looking at the
> descsz.

Yeah, as said it's currently a hack...

> > +static int
> > +get_build_id_1 (struct dl_phdr_info *info, size_t, void *data)
> > +{
> > +  for (unsigned i = 0; i < info->dlpi_phnum; ++i)
> > +{
> > +  if (info->dlpi_phdr[i].p_type != PT_NOTE)
> > +   continue;
> > +  build_id_note *note
> > +   = (build_id_note *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
> > +  ptrdiff_t size = info->dlpi_phdr[i].p_filesz;
> > +  while (size >= (ptrdiff_t)sizeof (build_id_note))
> > +   {
> > + if (note->type == NT_GNU_BUILD_ID
> > + && note->namesz == 4
> > + && note->descsz >= 16)
> > +   {
> > + memcpy (data, note->build_id, 16);
> > + return 1;
> > +   }
> > + size_t offset = (sizeof (uint32_t) * 3
> > +  + ALIGN(note->namesz, 4)
> > +  + ALIGN(note->descsz, 4));
> > + note = (build_id_note *)((char *)note + offset);
> > + size -= offset;
> 
> Since the introduction of GNU Property notes this is (sadly) no longer
> the correct way to iterate through ELF notes. The padding of names and
> desc  might now depend on the alignment of the PT_NOTE segment.
> https://sourceware.org/ml/binutils/2018-09/msg00359.html

Ick, that's of course worse ;)  So it's not entirely clear what
the correct thing to do is - from how I read the mail at the above
link, only if sh_align of the note section is exactly 8 would the above
ALIGN use 8-byte alignment, and otherwise 4 is correct (independent
of sh_align).  Or can I assume sh_align of the note section is
"correct" for all existing binaries?  Note also the eventual difference
between note sections and note program headers which have another,
possibly different(?) alignment?  It's of course "easy" to replace
4 above by info->dlpi_phdr[i].p_align (but the align field differs
in width between elfclass 32 and 64 ... :/).

So - is merely changing the re-alignment from 4 to 
info->dlpi_phdr[i].p_align "correct"?

Richard.

> Cheers,
> 
> Mark
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC] Change PCH "checksum"

2019-02-26 Thread Richard Biener
On Tue, 26 Feb 2019, Mark Wielaard wrote:

> On Tue, 2019-02-26 at 09:33 +0100, Richard Biener wrote:
> > On Mon, 25 Feb 2019, Mark Wielaard wrote:
> > > Since the introduction of GNU Property notes this is (sadly) no
> > > longer
> > > the correct way to iterate through ELF notes. The padding of names
> > > and
> > > desc  might now depend on the alignment of the PT_NOTE segment.
> > > https://sourceware.org/ml/binutils/2018-09/msg00359.html
> > 
> > Ick, that's of course worse ;)  So it's not entirely clear what
> > the correct thing to do is - from how I read the mail at the above
> > link only iff sh_align of the note section is exactly 8 the above
> > ALIGN would use 8 byte alignment and else 4 is correct (independent
> > on sh_align).  Or can I assume sh_align of the note section is
> > "correct" for all existing binaries?  Note also the eventual
> > difference
> > between note sections and note program headers which have another,
> > possibly different(?) alignment?  It's of course "easy" to replace
> > 4 above by info->dlpi_phdr[i].p_align (but the align field differs
> > in width between elfclass 32 and 64 ... :/).
> > 
> > So - is merely changing the re-alignment from 4 to 
> > info->dlpi_phdr[i].p_align "correct"?
> 
> Yes, you will have multiple note segments one that combines the 4
> padded notes and one that combines the 8 padded notes.
> Some tools put 0 or 1 in the align field, so you might want to use
> (completely untested):
> align = (p_align <= 4) ? 4 : 8;
> offset += ALIGN ((ALIGN (sizeof (uint32_t) * 3 + namesz, align)
>   + descsz), align);

That would mean when p_align == 8 the note name isn't 8-aligned
but just 4-aligned?  That is, sizeof (Elf*_Nhdr) == 12, and the
name starts right after that instead of being aligned according
to p_align?  That sounds odd...  So p_align only applies to
the descriptor?

Richard.

> Cheers,
> 
> Mark
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC] Change PCH "checksum"

2019-02-26 Thread Richard Biener
On Tue, 26 Feb 2019, Richard Biener wrote:

> On Tue, 26 Feb 2019, Mark Wielaard wrote:
> 
> > On Tue, 2019-02-26 at 09:33 +0100, Richard Biener wrote:
> > > On Mon, 25 Feb 2019, Mark Wielaard wrote:
> > > > Since the introduction of GNU Property notes this is (sadly) no
> > > > longer
> > > > the correct way to iterate through ELF notes. The padding of names
> > > > and
> > > > desc  might now depend on the alignment of the PT_NOTE segment.
> > > > https://sourceware.org/ml/binutils/2018-09/msg00359.html
> > > 
> > > Ick, that's of course worse ;)  So it's not entirely clear what
> > > the correct thing to do is - from how I read the mail at the above
> > > link only iff sh_align of the note section is exactly 8 the above
> > > ALIGN would use 8 byte alignment and else 4 is correct (independent
> > > on sh_align).  Or can I assume sh_align of the note section is
> > > "correct" for all existing binaries?  Note also the eventual
> > > difference
> > > between note sections and note program headers which have another,
> > > possibly different(?) alignment?  It's of course "easy" to replace
> > > 4 above by info->dlpi_phdr[i].p_align (but the align field differs
> > > in width between elfclass 32 and 64 ... :/).
> > > 
> > > So - is merely changing the re-alignment from 4 to 
> > > info->dlpi_phdr[i].p_align "correct"?
> > 
> > Yes, you will have multiple note segments one that combines the 4
> > padded notes and one that combines the 8 padded notes.
> > Some tools put 0 or 1 in the align field, so you might want to use
> > (completely untested):
> > align = (p_align <= 4) ? 4 : 8;
> > offset += ALIGN ((ALIGN (sizeof (uint32_t) * 3 + namesz, align)
> >   + descsz), align);
> 
> That would mean when p_align == 8 the note name isn't 8-aligned
> but just 4-aligned?  That is, sizeof (Elf*_Nhdr) == 12, and the
> name starts right after that instead of being aligned according
> to p_align?  That sounds odd...  So p_align only applies to
> the descriptor?

So rather like the following (simplified for _GNU_SOURCE since
link.h includes elf.h there).  I've not yet come across binaries
with different p_align so I can't really test it.

#if _GNU_SOURCE
#include <link.h>

#define ALIGN(val, align)  (((val) + (align) - 1) & ~((align) - 1))

static int
get_build_id_1 (struct dl_phdr_info *info, size_t, void *data)
{ 
  for (unsigned i = 0; i < info->dlpi_phnum; ++i)
{ 
  if (info->dlpi_phdr[i].p_type != PT_NOTE)
continue;
  ElfW(Nhdr) *nhdr
= (ElfW(Nhdr) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
  ptrdiff_t size = info->dlpi_phdr[i].p_filesz;
  ptrdiff_t align = info->dlpi_phdr[i].p_align;
  if (align < 4)
align = 4;
  while (size >= (ptrdiff_t)sizeof (ElfW(Nhdr)))
{ 
  if (nhdr->n_type == NT_GNU_BUILD_ID
  && nhdr->n_namesz == 4
  && strncmp ((char *)nhdr
  + ALIGN (sizeof (ElfW(Nhdr)), align),
  "GNU", 4) == 0
  && nhdr->n_descsz >= 16)
{ 
  memcpy (data,
  (char *)nhdr
  + ALIGN (sizeof (ElfW(Nhdr)), align)
  + ALIGN (nhdr->n_namesz, align), 16);
  return 1;
}
  size_t offset = (ALIGN (sizeof (ElfW(Nhdr)), align)
   + ALIGN(nhdr->n_namesz, align)
   + ALIGN(nhdr->n_descsz, align));
  nhdr = (ElfW(Nhdr) *)((char *)nhdr + offset);
  size -= offset;
}
}

  return 0;
}



Re: [RFC] Change PCH "checksum"

2019-02-26 Thread Richard Biener
On Tue, 26 Feb 2019, Michael Matz wrote:

> Hi,
> 
> On Tue, 26 Feb 2019, Richard Biener wrote:
> 
> > get_build_id_1 (struct dl_phdr_info *info, size_t, void *data)
> > { 
> 
> Isn't this all a bit silly?  We could simply encode the svn revision, or 
> maybe even just some random bytes generated once in stage1 at build time 
> as "checksum" and be done with.  In the latter case PCHs will then not 
> work across different compiler builds, but so what?

Yes, a random number would work for PCH purposes but of course not
for reproducible builds.  Somehow even compile-options are relevant
though so I'm not really sure how volatile the PCH format is.
That is, whether for example checksumming sources would work.

But yeah, I considered a --with-pch-checksum=XYZ to make this
configurable (where we could for example checksum the rpm
changes - iff PCHs of two different builds - say, one with checking
enabled and one with checking disabled - really interoperate).

Still, using the build-id looks so "obvious" ...

Richard.


Re: [RFC] Change PCH "checksum"

2019-02-26 Thread Richard Biener
On Tue, 26 Feb 2019, Mark Wielaard wrote:

> On Tue, 2019-02-26 at 15:36 +0100, Richard Biener wrote:
> > On Tue, 26 Feb 2019, Mark Wielaard wrote:
> > 
> > > On Tue, 2019-02-26 at 09:33 +0100, Richard Biener wrote:
> > > > On Mon, 25 Feb 2019, Mark Wielaard wrote:
> > > > > Since the introduction of GNU Property notes this is (sadly) no
> > > > > longer
> > > > > the correct way to iterate through ELF notes. The padding of
> > > > > names
> > > > > and
> > > > > desc  might now depend on the alignment of the PT_NOTE segment.
> > > > > https://sourceware.org/ml/binutils/2018-09/msg00359.html
> > > > 
> > > > Ick, that's of course worse ;)  So it's not entirely clear what
> > > > the correct thing to do is - from how I read the mail at the
> > > > above
> > > > link only iff sh_align of the note section is exactly 8 the above
> > > > ALIGN would use 8 byte alignment and else 4 is correct
> > > > (independent
> > > > on sh_align).  Or can I assume sh_align of the note section is
> > > > "correct" for all existing binaries?  Note also the eventual
> > > > difference
> > > > between note sections and note program headers which have
> > > > another,
> > > > possibly different(?) alignment?  It's of course "easy" to
> > > > replace
> > > > 4 above by info->dlpi_phdr[i].p_align (but the align field
> > > > differs
> > > > in width between elfclass 32 and 64 ... :/).
> > > > 
> > > > So - is merely changing the re-alignment from 4 to 
> > > > info->dlpi_phdr[i].p_align "correct"?
> > > 
> > > Yes, you will have multiple note segments one that combines the 4
> > > padded notes and one that combines the 8 padded notes.
> > > Some tools put 0 or 1 in the align field, so you might want to use
> > > (completely untested):
> > > align = (p_align <= 4) ? 4 : 8;
> > > offset += ALIGN ((ALIGN (sizeof (uint32_t) * 3 + namesz, align)
> > >   + descsz), align);
> > 
> > That would mean when p_align == 8 the note name isn't 8-aligned
> > but just 4-aligned?  That is, sizeof (Elf*_Nhdr) == 12, and the
> > name starts right after that instead of being aligned according
> > to p_align?  That sounds odd...  So p_align only applies to
> > the descriptor?
> 
> Yes, it is that odd. There are 3 kinds of ELF notes.
> 
> The traditional ones as used by GNU and Solaris, which use 4 byte words
> for everything whether in ELFCLASS32 or ELFCLASS64 and which are 4 byte
> aligned themselves.
> 
> The gabi ones, which are similar for ELFCLASS32 but for ELFCLASS64 all
> words are 8 bytes and 8 bytes aligned themselves (as used by HPUX).
> 
> And the new style GNU Property notes, only used in ELFCLASS64, which
> use 4 byte words for the first 3 fields, immediately followed by the
> name bytes, padded so that desc is 8 bytes aligned and the note as a
> whole is 8 byte aligned.

I wonder how to distinguish the latter two - does one really need
to test the size of ElfW(Nhdr).n_namesz for example?  Why was the
GNU Property one chosen this way?!  Is the first case (traditional
GNU note) with p_align == 8 invalid?  That is, is testing p_align
really the correct way to determine how the individual parts are
aligned?  I guess not.

So - how do I identify a GNU Property note vs. a traditional
note vs. a gabi one?

Why was the third one added?! (I guess I asked that already...)

Richard.

> Cheers,
> 
> Mark
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
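
For reference, a minimal, untested sketch of the note-advance step following
the suggestion above, using the 64-bit types for brevity: pick 4- or 8-byte
padding from p_align, keep the name immediately after the three 4-byte header
words, and pad only the descriptor and the start of the next note (next_note
and NOTE_ALIGN are names made up for this illustration):

#include <elf.h>
#include <stddef.h>

#define NOTE_ALIGN(val, a)  (((val) + (a) - 1) & ~((size_t)(a) - 1))

static const Elf64_Nhdr *
next_note (const Elf64_Nhdr *nhdr, size_t p_align)
{
  size_t align = (p_align <= 4) ? 4 : 8;
  /* The name follows the three 4-byte header words without extra padding;
     only the descriptor and the next note header are aligned.  */
  size_t offset = NOTE_ALIGN (sizeof (Elf64_Word) * 3 + nhdr->n_namesz, align);
  offset = NOTE_ALIGN (offset + nhdr->n_descsz, align);
  return (const Elf64_Nhdr *) ((const char *) nhdr + offset);
}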


Re: A bug in vrp_meet?

2019-03-01 Thread Richard Biener
On March 1, 2019 6:49:20 PM GMT+01:00, Qing Zhao  wrote:
>Jeff,
>
>thanks a lot for the reply.
>
>this is really helpful.
>
>I double checked the dumped intermediate file for pass “dom3", and
>located the following for _152:
>
>BEFORE the pass “dom3”, there is no _152, the corresponding Block
>looks like:
>
>   [local count: 12992277]:
>  _98 = (int) ufcMSR_52(D);
>  k_105 = (sword) ufcMSR_52(D);
>  i_49 = _98 > 0 ? k_105 : 0;
>
>***During the pass “doms”,  _152 is generated as following:
>
>Optimizing block #4
>….
>Visiting statement:
>i_49 = _98 > 0 ? k_105 : 0;
>Meeting
>  [0, 65535]
>and
>  [0, 0]
>to
>  [0, 65535]
>Intersecting
>  [0, 65535]
>and
>  [0, 65535]
>to
>  [0, 65535]
>Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
>  Replaced 'k_105' with variable '_98'
>gimple_simplified to _152 = MAX_EXPR <_98, 0>;
>i_49 = _152;
>  Folded to: i_49 = _152;
>LKUP STMT i_49 = _152
> ASGN i_49 = _152
>
>then bb 4 becomes:
>
>   [local count: 12992277]:
>  _98 = (int) ufcMSR_52(D);
>  k_105 = _98;
>  _152 = MAX_EXPR <_98, 0>;
>  i_49 = _152;
>
>and all the i_49 are replaced with _152. 
>
>However, the value range info for _152 does not reflect the one for
>i_49, it stays UNDEFINED.
>
>is this the root problem?  

It looks like DOM fails to visit stmts generated by simplification. Can you 
open a bug report with a testcase?

Richard. 

>thanks a lot.
>
>Qing
>
>> On Feb 28, 2019, at 1:54 PM, Jeff Law  wrote:
>> 
>> On 2/28/19 10:05 AM, Qing Zhao wrote:
>>> Hi,
>>> 
>>> I have been debugging a runtime error caused by value range
>propagation. and finally located to the following gcc routine:
>>> 
>>> vrp_meet_1 in gcc/tree-vrp.c
>>> 
>>> 
>>> /* Meet operation for value ranges.  Given two value ranges VR0 and
>>>   VR1, store in VR0 a range that contains both VR0 and VR1.  This
>>>   may not be the smallest possible such range.  */
>>> 
>>> static void 
>>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>>> {
>>>  value_range saved;
>>> 
>>>  if (vr0->type == VR_UNDEFINED)
>>>{
>>>  set_value_range (vr0, vr1->type, vr1->min, vr1->max,
>vr1->equiv);
>>>  return;
>>>}
>>> 
>>>  if (vr1->type == VR_UNDEFINED)
>>>{
>>>  /* VR0 already has the resulting range.  */
>>>  return;
>>>}
>>> 
>>> 
>>> In the above, when one of vr0 or vr1 is VR_UNDEFINED,  the meet
>result of these two will be  the other VALUE. 
>>> 
>>> This seems not correct to me. 
>> That's the optimistic nature of VRP.  It's generally desired
>behavior.
>> 
>>> 
>>> For example, the following is the located incorrect value range
>propagation:  (portion from the dump file *.181t.dom3)
>>> 
>>> 
>>> Visiting PHI node: i_83 = PHI <_152(20), 0(22)>
>>>Argument #0 (20 -> 10 executable)
>>>_152: UNDEFINED
>>>Argument #1 (22 -> 10 executable)
>>>0: [0, 0]
>>> Meeting
>>>  UNDEFINED
>>> and
>>>  [0, 0]
>>> to
>>>  [0, 0]
>>> Intersecting
>>>  [0, 0]
>>> and
>>>  [0, 65535]
>>> to
>>>  [0, 0]
>>> 
>>> 
>>> 
>>> In the above, “i_83” is defined as PHI <_152(20), 0(22)>,   the 1st
>argument is UNDEFINED at this time(but its value range definitely is
>NOT [0,0]),
>>> and the 2nd argument is 0.
>> If its value is undefined then it can be any value we want.  We
>choose
>> to make it equal to the other argument.
>> 
>> If VRP later finds that _152 changes, then it will go back and
>> reevaluate the PHI.  That's one of the basic design principles of the
>> optimistic propagators.
>> 
>>> 
>>> “vrp_meet” generate a VR_RANGE with [0,0] for “i_83” based on the
>current algorithm.  Obviously, this result VR_RANGE with [0,0] does NOT
>
>>> contain the value ranges for _152. 
>>> 
>>> the result of “vrp_meet” is Not correct.  and this incorrect value
>range result finally caused the runtime error. 
>>> 
>>> I ‘d like to modify the vrp_meet_1 as following:
>>> 
>>> 
>>> static void 
>>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>>> {
>>>  value_range saved;
>>> 
>>>  if (vr0->type == VR_UNDEFINED)
>>>{
>>>  /* VR0 already has the resulting range. */
>>>  return;
>>>}
>>> 
>>>  if (vr1->type == VR_UNDEFINED)
>>>{
>>>  set_value_range_to_undefined (vr0)
>>> return;
>>>}
>>> 
>>> 
>>> let me know your opinion.
>>> 
>>> thanks a lot for the help.
>> I think we (Richi and I) went through this about a year ago and the
>> conclusion was we should be looking at how you're getting into the
>> vrp_meet with the VR_UNDEFINED.
>> 
>> If it happens because the user's code has an undefined use, then, the
>> consensus has been to go ahead and optimize it.
>> 
>> If it happens for any other reason, then it's likely a bug in GCC. 
>We
>> had a couple of these when we made EVRP a re-usable module and
>started
>> exploiting its data in the jump threader.
>> 
>> So you need to work backwards from this point to figure out how you
>got
>> here.
>> 
>> jeff



Re: A bug in vrp_meet?

2019-03-04 Thread Richard Biener
On Fri, Mar 1, 2019 at 10:02 PM Qing Zhao  wrote:
>
>
> On Mar 1, 2019, at 1:25 PM, Richard Biener  wrote:
>
> On March 1, 2019 6:49:20 PM GMT+01:00, Qing Zhao  wrote:
>
> Jeff,
>
> thanks a lot for the reply.
>
> this is really helpful.
>
> I double checked the dumped intermediate file for pass “dom3", and
> located the following for _152:
>
> BEFORE the pass “dom3”, there is no _152, the corresponding Block
> looks like:
>
>  [local count: 12992277]:
> _98 = (int) ufcMSR_52(D);
> k_105 = (sword) ufcMSR_52(D);
> i_49 = _98 > 0 ? k_105 : 0;
>
> ***During the pass “doms”,  _152 is generated as following:
>
> Optimizing block #4
> ….
> Visiting statement:
> i_49 = _98 > 0 ? k_105 : 0;
> Meeting
> [0, 65535]
> and
> [0, 0]
> to
> [0, 65535]
> Intersecting
> [0, 65535]
> and
> [0, 65535]
> to
> [0, 65535]
> Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
> Replaced 'k_105' with variable '_98'
> gimple_simplified to _152 = MAX_EXPR <_98, 0>;
> i_49 = _152;
> Folded to: i_49 = _152;
> LKUP STMT i_49 = _152
>  ASGN i_49 = _152
>
> then bb 4 becomes:
>
>  [local count: 12992277]:
> _98 = (int) ufcMSR_52(D);
> k_105 = _98;
> _152 = MAX_EXPR <_98, 0>;
> i_49 = _152;
>
> and all the i_49 are replaced with _152.
>
> However, the value range info for _152 does not reflect the one for
> i_49, it stays UNDEFINED.
>
> is this the root problem?
>
>
> It looks like DOM fails to visit stmts generated by simplification. Can you 
> open a bug report with a testcase?
>
>
> The problem is, It took me quite some time in order to come up with a small 
> and independent testcase for this problem,
> a little bit change made the error disappear.
>
> do you have any suggestion on this?  or can you give me some hint on how to 
> fix this in DOM?  then I can try the fix on my side?

I remember running into similar issues in the past where I tried to
extract temporary nonnull ranges from divisions.
I have there

@@ -1436,11 +1436,16 @@ dom_opt_dom_walker::before_dom_children
   m_avail_exprs_stack->pop_to_marker ();

   edge taken_edge = NULL;
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-{
-  evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
-  taken_edge = this->optimize_stmt (bb, gsi);
-}
+  gsi = gsi_start_bb (bb);
+  if (!gsi_end_p (gsi))
+while (1)
+  {
+   evrp_range_analyzer.record_def_ranges_from_stmt (gsi_stmt (gsi), false);
+   taken_edge = this->optimize_stmt (bb, &gsi);
+   if (gsi_end_p (gsi))
+ break;
+   evrp_range_analyzer.record_use_ranges_from_stmt (gsi_stmt (gsi));
+  }

   /* Now prepare to process dominated blocks.  */
   record_edge_info (bb);

OTOH the issue in your case is that fold emits new stmts before gsi but the
above loop will never look at them.  See tree-ssa-forwprop.c for code how
to deal with this (setting a pass-local flag on stmts visited and walking back
to unvisited, newly inserted ones).  The fold_stmt interface could in theory
also be extended to insert new stmts on a sequence passed to it so the
caller would be responsible for inserting them into the IL and could then
more easily revisit them (but that's a bigger task).

So, does the following help?

Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  (revision 269361)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -1482,8 +1482,25 @@ dom_opt_dom_walker::before_dom_children
   edge taken_edge = NULL;
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
+  gimple_stmt_iterator pgsi = gsi;
+  gsi_prev (&pgsi);
   evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
   taken_edge = this->optimize_stmt (bb, gsi);
+  gimple_stmt_iterator npgsi = gsi;
+  gsi_prev (&npgsi);
+  /* Walk new stmts eventually inserted by DOM.  gsi_stmt (gsi) itself
+while it may be changed should not have gotten a new definition.  */
+  if (gsi_stmt (pgsi) != gsi_stmt (npgsi))
+   do
+ {
+   if (gsi_end_p (pgsi))
+ pgsi = gsi_start_bb (bb);
+   else
+ gsi_next (&pgsi);
+   evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (pgsi),
+false);
+ }
+   while (gsi_stmt (pgsi) != gsi_stmt (gsi));
 }

   /* Now prepare to process dominated blocks.  */


Richard.

> Thanks a lot.
>
> Qing
>
>
>
> Richard.
>
>


Re: About BZ#87210 [RFE] To initialize automatic stack variables

2019-03-04 Thread Richard Biener
On Mon, Mar 4, 2019 at 11:44 AM P J P  wrote:
>
> On Tuesday, 19 February, 2019, 3:55:35 PM IST, P J P  
> wrote:
> >
> >Hello,
> >
> >  -> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87210
> >
> >This RFE is about providing gcc option(s) to eliminate information leakage
> >issues from programs. Information leakage via uninitialised memory has been
> >a chronic/recurring issue across all software. They are found quite often
> >and may lead to severe effects if found in system software/kernel, OR an
> >application which handles sensitive information.
> >
> >Various projects/efforts are underway to keep such information exposure
> >from happening
> >
> >* STACKLEAK - http://lkml.iu.edu/hypermail/linux/kernel/1810.3/00522.html
> >* KLEAK - https://netbsd.org/gallery/presentations/maxv/kleak.pdf
> >* https://j00ru.vexillium.org/papers/2018/bochspwn_reloaded.pdf
> >
> >But these are still external corrections to improve specific project and/or
> >software. It does not help to fix/eliminate all information leakage issues.
> >Automatic memory initialisation:
> >
> >* https://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html
> >* https://reviews.llvm.org/D54604
> >
> >It'd be immensely helpful and welcome if gcc(1) could provide compile/build
> >time options to enable/disable - automatic memory initialisation.
> >
> >Could we please consider it as more viable/useful option?
>
> Ping...!

Patches welcome(?)

Richard.

> ---
>   -P J P
> http://feedmug.com


Re: GSoC Project Ideas

2019-03-04 Thread Richard Biener
On Mon, Mar 4, 2019 at 12:16 AM Jeff Law  wrote:
>
> On 3/3/19 4:06 PM, Patrick Palka wrote:
> > Hi everyone,
> >
> > I am very interested in working on GCC as part of GSoC this year.  A few 
> > years
> > ago I was a somewhat active code contributor[1] and unfortunately my
> > contributing waned once I went back to school, but I'm excited to 
> > potentially
> > have the opportunity to work on GCC again this summer.  My contributions 
> > were
> > mainly to the C++ frontend and to the middle end, and I've been thinking 
> > about
> > potential projects in these areas of the compiler.  Here are some project 
> > ideas
> > related to parts of the compiler that I've worked on in the past:
> >
> >   * Extend VRP to track unions of intervals
> > (inspired by comment #2 of PR72443 [2])
> >   Value ranges tracked by VRP currently are represented as an interval 
> > or
> >   its complement: [a,b] and ~[a,b].  A natural extension of this is
> >   to support unions of intervals, e.g. [a,b]U[c,d].  Such an extension
> >   would make VRP more powerful and at the same time would subsume
> >   anti-ranges, potentially making the code less complex overall.
> You should get in contact with Aldy and Andrew.  I believe their work
> already subsumes everything you've mentioned here.

I'm not so sure so work on this would definitely be appreciated.

> >
> >   * Make TREE_NO_WARNING more fine-grained
> > (inspired by comment #7 of PR74762 [3])
> >   TREE_NO_WARNING is currently used as a catch-all marker that inhibits 
> > all
> >   warnings related to the marked expression.  The problem with this is 
> > that
> >   if some warning routine sets the flag for its own purpose,
> >   then that later may inhibit another unrelated warning from firing, 
> > see for
> >   example PR74762.  Implementing a more fine-grained mechanism for
> >   inhibiting particular warnings would eliminate such issues.
> Might be interesting.  You'd probably need to discuss the details further.

I guess an implementation could use TREE_NO_WARNING (or gimple_no_warning_p)
as an indicator that there's out-of-band detail information which could be stored as
a map keyed off either a location or a tree or gimple *.

> >
> >   * Make -Wmaybe-uninitialized more robust
> >   (Inspired by the recent thread to move -Wmaybe-uninitialized to
> > -Wextra [4])
> >   Right now the pass generates too many false-positives, and hopefully 
> > that
> >   can be fixed somewhat.
> >   I think a distinction could be made between the following two 
> > scenarios in
> >   which a false-positive warning is emitted:
> > 1. the pass incorrectly proves that there exists an execution path 
> > that
> >results in VAR being used uninitialized due to a deficiency in 
> > the
> >implementation, or
> > 2. the pass gives up on exhaustively verifying that all execution 
> > paths
> >use VAR initialized (e.g. because there are too many paths to 
> > check).
> >The MAX_NUM_CHAINS, MAX_CHAIN_LEN, etc constants currently 
> > control
> >when this happens.
> >   I'd guess that a significant fraction of false-positives occur due to 
> > the
> >   second case, so maybe it would be worthwhile to allow the user to 
> > suppress
> >   warnings of this second type by specifying a warning level argument, 
> > e.g.
> >   -Wmaybe-uninitialized=1|2.
> >   Still, false-positives are generated in the first case too, see e.g.
> >   PR61112.  These can be fixed by improving the pass to understand such
> >   control flow.
> I'd suggest you look at my proposal from 2005 if you want to improve
> some of this stuff.
>
> You might also look at the proposal to distinguish between simple
> scalars that are SSA_NAMEs and the addressable/aggregate cases.
>
> In general I'm not a fan of extending the predicate analysis as-is in
> tree-ssa-uninit.c.  I'd first like to see it broken into an independent
> analysis module.  The analysis it does has applications for other
> warnings and optimizations.  Uninit warnings would just be a client of
> hte generic analysis pass.
>
> I'd love a way to annotate paths (or subpaths, or ssa-names) for cases
> where the threaders identify a jump threading path, but don't actually
> optimize it (often because it's a cold path or to avoid code bloat
> problems).   THese unexecutable paths that we leave in the CFG are often
> a source of false positives when folks use -O1, -Os and profile directed
> optimizations.  Bodik has some thoughts in this space, but I haven't
> really looked to see how feasible they are in the real world.
>
> >
> >   * Bug fixing in the C++ frontend / general C++ frontend improvements
> >   There are 100s of open PRs about the C++ frontend, and the goal here
> >   would just be to resolve as many as one can over the summer.
> Bugfixing is always good :-)
>
> jeff


Re: GSoC Project Ideas

2019-03-04 Thread Richard Biener
On Mon, Mar 4, 2019 at 1:23 PM Jakub Jelinek  wrote:
>
> On Mon, Mar 04, 2019 at 01:13:29PM +0100, Richard Biener wrote:
> > > >   * Make TREE_NO_WARNING more fine-grained
> > > > (inspired by comment #7 of PR74762 [3])
> > > >   TREE_NO_WARNING is currently used as a catch-all marker that 
> > > > inhibits all
> > > >   warnings related to the marked expression.  The problem with this 
> > > > is that
> > > >   if some warning routine sets the flag for its own purpose,
> > > >   then that later may inhibit another unrelated warning from 
> > > > firing, see for
> > > >   example PR74762.  Implementing a more fine-grained mechanism for
> > > >   inhibiting particular warnings would eliminate such issues.
> > > Might be interesting.  You'd probably need to discuss the details further.
> >
> > I guess an implementation could use TREE_NO_WARNING (or gimple_no_warning_p)
> > as an indicator that there's out-of-band detail information which could be
> > stored as
> > a map keyed off either a location or a tree or gimple *.
>
> I guess on tree or gimple * is better, there would need to be some hook for
> copy_node/gimple_copy that would add the info for the new copy as well if
> the TREE_NO_WARNING or gimple_no_warning_p bit was set.  Plus there could be
> some purging of this on the side information, e.g.  once code is handed over
> from the FE to the middle-end (maybe do that only at free_lang_data time),
> for any warnings that are FE only there is no need to keep records in the on
> the side mapping that have info about those FE warnings only, as later on
> the FE warnings will not be reported anymore.
> The implementation could be e.g. a hash map from tree/gimple * (pointers) to
> bitmaps of warning numbers, with some hash table to ensure that the same
> bitmap is used for all the spots that need to have the same set of warnings
> disabled.

A possibly related project is to "defer" output of diagnostics until we know
the stmt/expression we emit it for survived dead code elimination.  Here there's
the question what to key the diagnostic off and how to move it (that is, detect
if the code causing it really fully went dead).

Richard.

> Jakub


Re: A bug in vrp_meet?

2019-03-05 Thread Richard Biener
On Mon, Mar 4, 2019 at 11:01 PM Qing Zhao  wrote:
>
> Hi, Richard,
>
> > On Mar 4, 2019, at 5:45 AM, Richard Biener  
> > wrote:
> >>
> >> It looks like DOM fails to visit stmts generated by simplification. Can 
> >> you open a bug report with a testcase?
> >>
> >>
> >> The problem is, It took me quite some time in order to come up with a 
> >> small and independent testcase for this problem,
> >> a little bit change made the error disappear.
> >>
> >> do you have any suggestion on this?  or can you give me some hint on how 
> >> to fix this in DOM?  then I can try the fix on my side?
> >
> > I remember running into similar issues in the past where I tried to
> > extract temporary nonnull ranges from divisions.
> > I have there
> >
> > @@ -1436,11 +1436,16 @@ dom_opt_dom_walker::before_dom_children
> >   m_avail_exprs_stack->pop_to_marker ();
> >
> >   edge taken_edge = NULL;
> > -  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -{
> > -  evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), false);
> > -  taken_edge = this->optimize_stmt (bb, gsi);
> > -}
> > +  gsi = gsi_start_bb (bb);
> > +  if (!gsi_end_p (gsi))
> > +while (1)
> > +  {
> > +   evrp_range_analyzer.record_def_ranges_from_stmt (gsi_stmt (gsi), 
> > false);
> > +   taken_edge = this->optimize_stmt (bb, &gsi);
> > +   if (gsi_end_p (gsi))
> > + break;
> > +   evrp_range_analyzer.record_use_ranges_from_stmt (gsi_stmt (gsi));
> > +  }
> >
> >   /* Now prepare to process dominated blocks.  */
> >   record_edge_info (bb);
> >
> > OTOH the issue in your case is that fold emits new stmts before gsi but the
> > above loop will never look at them.  See tree-ssa-forwprop.c for code how
> > to deal with this (setting a pass-local flag on stmts visited and walking 
> > back
> > to unvisited, newly inserted ones).  The fold_stmt interface could in theory
> > also be extended to insert new stmts on a sequence passed to it so the
> > caller would be responsible for inserting them into the IL and could then
> > more easily revisit them (but that's a bigger task).
> >
> > So, does the following help?
>
> Yes, this change fixed the error in my side, now, in the dumped file for pass 
> dom3:
>
> 
> Visiting statement:
> i_49 = _98 > 0 ? k_105 : 0;
> Meeting
>   [0, 65535]
> and
>   [0, 0]
> to
>   [0, 65535]
> Intersecting
>   [0, 65535]
> and
>   [0, 65535]
> to
>   [0, 65535]
> Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
>   Replaced 'k_105' with variable '_98'
> gimple_simplified to _152 = MAX_EXPR <_98, 0>;
> i_49 = _152;

Ah, that looks interesting.  From this detail we might be
able to derive a testcase as well - a GIMPLE one
eventually because DOM runs quite late.  It's also interesting
to see the inefficient code here (the extra copy), probably
some known issue with match-and-simplify, I'd have to check.

>   Folded to: i_49 = _152;
> LKUP STMT i_49 = _152
>  ASGN i_49 = _152
>
> Visiting statement:
> _152 = MAX_EXPR <_98, 0>;
>
> Visiting statement:
> i_49 = _152;
> Intersecting
>   [0, 65535]  EQUIVALENCES: { _152 } (1 elements)
> and
>   [0, 65535]
> to
>   [0, 65535]  EQUIVALENCES: { _152 } (1 elements)
> 
>
> We can clearly see from the above, all the new stmts generated by fold are 
> visited now.

We can also see that DOMs optimize_stmt code is not executed on the first stmt
of the folding result (the MAX_EXPR), so the fix can be probably
amended/simplified
with that in mind.

> it is also confirmed that the runtime error caused by this bug was gone with 
> this fix.
>
> So, what’s the next step for this issue?
>
> will you commit this fix to gcc9 and gcc8  (we need it in gcc8)?

I'll see to carve out some cycles trying to find a testcase and amend
the fix a bit
and will take care of testing/submitting the fix.  Thanks for testing
that it works
for your case.

Richard.

> or I can test this fix on my side and commit it to both gcc9 and gcc8?
>
> thanks.
>
> Qing
>
> >
> > Index: gcc/tree-ssa-dom.c
> > ===
> > --- gcc/tree-ssa-dom.c  (revision 269361)
> > +++ gcc/tree-ssa-dom.c  (working copy)
> > @@ -1482,8 +1482,25 @@ dom_opt_dom_walker::before_dom_children
> >   edge taken_edge = NULL;
> >   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi)

Re: [RFC] gcc lto&binutils: Load different gcc's bfd-plugin automatically

2019-03-05 Thread Richard Biener
On Tue, Mar 5, 2019 at 10:41 AM JunMa  wrote:
>
> Hi All
>
> We are now optimizing some projects with lto enabled, however,
> there are some issues.
> First, lto_plugin.so needs to be passed to ar/nm/ranlib.
> For example, build static library with lto:
>
> gcc -flto -O2 a.c -c -o a.o
> gcc -flto -O2 b.c -c -o b.o
> ar rcs --plugin=/path/to/lto_plugin.so   libx.a  a.o b.o
>
> This is a little bit annoying. Also, it is not easy or convenient to use
> gcc-ar/gcc-nm/gcc-ranlib on those projects. Luckily, binutils offers a
> default plugin searching path(/lib/bfd-plugins), it can load plugin
> automatically from that directory.
>
> However, this brings up the second issue: binutils doesn't support multiple
> version plugins of gccs which have the same plugin name:"lto_plugin.so"
> int the same directory. Since these projects require different versions of
> gccs, multiple gccs co-exist in our build system, we cannot put lto_plugin
> in /lib/bfd-plugins. Although plugins may be compatible with each other,
> we do want to decouple it among different versions of GCC.
>
> I also have seen some discussions in
> https://bugzilla.redhat.com/show_bug.cgi?id=1467409
> where I don't see any clear solutions.
>
> I thought about this and had some ideas. I want to ask for some feedback here.
> The idea is to use versioned plugin searching paths
> (/lib/bfd-plugins/$cc/$target/$version)
> The $cc and $version information can be found in .comment section
> of elf  file, as for file which does not have comment section, just keep
> original searching path(/lib/bfd-plugins).

Well, why not go a step further and add a bfd-plugin note that suggests
the plugin to be used if it is installed?  That could contain for example
lto_plugin_gcc8.so (to be installed in /lib/bfd-plugins/).  Alternatively
a full path could be specified (though the files wouldn't then necessarily
work when moving between different compiler installs).

The install location could be modified to a location BFD searches just
for this note/comment but not for other auto-loading tries.

Or, since mostly archive related stuff is the issue, we could bundle
the plugin as a special archive member... (ok, that creates a chicken
and egg issue at archive creation time).

Richard.

> Here are few steps:
> 1) ar/nm/ranlib keep same behavior when '--plugin' is passed.
> 2) If --plugin is missing, check whether there is at least one .comment
> section in the object file/archive file; if not, goto 5).
> 3) for elf object file, get $cc and $version information. for elf
> archive file,
> iterate comment section of all object files in archive, make sure $cc and
> $version are same. if not, goto 5).
> 4) Get $target from target_alias variable in configure.
> 5) Find plugin from /lib/bfd-plugins/$cc/$target/$version or
> /lib/bfd-plugins directory.
>
> Looking forward to your replies!
>
> Regards
> Jun


Re: A bug in vrp_meet?

2019-03-05 Thread Richard Biener
On Tue, Mar 5, 2019 at 10:48 AM Richard Biener
 wrote:
>
> On Mon, Mar 4, 2019 at 11:01 PM Qing Zhao  wrote:
> >
> > Hi, Richard,
> >
> > > On Mar 4, 2019, at 5:45 AM, Richard Biener  
> > > wrote:
> > >>
> > >> It looks like DOM fails to visit stmts generated by simplification. Can 
> > >> you open a bug report with a testcase?
> > >>
> > >>
> > >> The problem is, It took me quite some time in order to come up with a 
> > >> small and independent testcase for this problem,
> > >> a little bit change made the error disappear.
> > >>
> > >> do you have any suggestion on this?  or can you give me some hint on how 
> > >> to fix this in DOM?  then I can try the fix on my side?
> > >
> > > I remember running into similar issues in the past where I tried to
> > > extract temporary nonnull ranges from divisions.
> > > I have there
> > >
> > > @@ -1436,11 +1436,16 @@ dom_opt_dom_walker::before_dom_children
> > >   m_avail_exprs_stack->pop_to_marker ();
> > >
> > >   edge taken_edge = NULL;
> > > -  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > -{
> > > -  evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), 
> > > false);
> > > -  taken_edge = this->optimize_stmt (bb, gsi);
> > > -}
> > > +  gsi = gsi_start_bb (bb);
> > > +  if (!gsi_end_p (gsi))
> > > +while (1)
> > > +  {
> > > +   evrp_range_analyzer.record_def_ranges_from_stmt (gsi_stmt (gsi), 
> > > false);
> > > +   taken_edge = this->optimize_stmt (bb, &gsi);
> > > +   if (gsi_end_p (gsi))
> > > + break;
> > > +   evrp_range_analyzer.record_use_ranges_from_stmt (gsi_stmt (gsi));
> > > +  }
> > >
> > >   /* Now prepare to process dominated blocks.  */
> > >   record_edge_info (bb);
> > >
> > > OTOH the issue in your case is that fold emits new stmts before gsi but 
> > > the
> > > above loop will never look at them.  See tree-ssa-forwprop.c for code how
> > > to deal with this (setting a pass-local flag on stmts visited and walking 
> > > back
> > > to unvisited, newly inserted ones).  The fold_stmt interface could in 
> > > theory
> > > also be extended to insert new stmts on a sequence passed to it so the
> > > caller would be responsible for inserting them into the IL and could then
> > > more easily revisit them (but that's a bigger task).
> > >
> > > So, does the following help?
> >
> > Yes, this change fixed the error in my side, now, in the dumped file for 
> > pass dom3:
> >
> > 
> > Visiting statement:
> > i_49 = _98 > 0 ? k_105 : 0;
> > Meeting
> >   [0, 65535]
> > and
> >   [0, 0]
> > to
> >   [0, 65535]
> > Intersecting
> >   [0, 65535]
> > and
> >   [0, 65535]
> > to
> >   [0, 65535]
> > Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
> >   Replaced 'k_105' with variable '_98'
> > gimple_simplified to _152 = MAX_EXPR <_98, 0>;
> > i_49 = _152;
>
> Ah, that looks interesting.  From this detail we might be
> able to derive a testcase as well - a GIMPLE one
> eventually because DOM runs quite late.  It's also interesting
> to see the inefficient code here (the extra copy), probably
> some known issue with match-and-simplify, I'd have to check.
>
> >   Folded to: i_49 = _152;
> > LKUP STMT i_49 = _152
> >  ASGN i_49 = _152
> >
> > Visiting statement:
> > _152 = MAX_EXPR <_98, 0>;
> >
> > Visiting statement:
> > i_49 = _152;
> > Intersecting
> >   [0, 65535]  EQUIVALENCES: { _152 } (1 elements)
> > and
> >   [0, 65535]
> > to
> >   [0, 65535]  EQUIVALENCES: { _152 } (1 elements)
> > 
> >
> > We can clearly see from the above, all the new stmts generated by fold are 
> > visited now.
>
> We can also see that DOMs optimize_stmt code is not executed on the first stmt
> of the folding result (the MAX_EXPR), so the fix can be probably
> amended/simplified
> with that in mind.
>
> > it is also confirmed that the runtime error caused by this bug was gone 
> > with this fix.
> >
> > So, what’s the next step for this issue?
> >
> > will you commit this fix to gcc9 and gcc8  (we need it in gcc8)?
>
> I'll see to carve out some cycles trying to find a testcase and amend the fix a bit
> and will take care of testing/submitting the fix.

Re: A bug in vrp_meet?

2019-03-05 Thread Richard Biener
On Tue, Mar 5, 2019 at 11:44 AM Richard Biener
 wrote:
>
> On Tue, Mar 5, 2019 at 10:48 AM Richard Biener
>  wrote:
> >
> > On Mon, Mar 4, 2019 at 11:01 PM Qing Zhao  wrote:
> > >
> > > Hi, Richard,
> > >
> > > > On Mar 4, 2019, at 5:45 AM, Richard Biener  
> > > > wrote:
> > > >>
> > > >> It looks like DOM fails to visit stmts generated by simplification. 
> > > >> Can you open a bug report with a testcase?
> > > >>
> > > >>
> > > >> The problem is, It took me quite some time in order to come up with a 
> > > >> small and independent testcase for this problem,
> > > >> a little bit change made the error disappear.
> > > >>
> > > >> do you have any suggestion on this?  or can you give me some hint on 
> > > >> how to fix this in DOM?  then I can try the fix on my side?
> > > >
> > > > I remember running into similar issues in the past where I tried to
> > > > extract temporary nonnull ranges from divisions.
> > > > I have there
> > > >
> > > > @@ -1436,11 +1436,16 @@ dom_opt_dom_walker::before_dom_children
> > > >   m_avail_exprs_stack->pop_to_marker ();
> > > >
> > > >   edge taken_edge = NULL;
> > > > -  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > -{
> > > > -  evrp_range_analyzer.record_ranges_from_stmt (gsi_stmt (gsi), 
> > > > false);
> > > > -  taken_edge = this->optimize_stmt (bb, gsi);
> > > > -}
> > > > +  gsi = gsi_start_bb (bb);
> > > > +  if (!gsi_end_p (gsi))
> > > > +while (1)
> > > > +  {
> > > > +   evrp_range_analyzer.record_def_ranges_from_stmt (gsi_stmt 
> > > > (gsi), false);
> > > > +   taken_edge = this->optimize_stmt (bb, &gsi);
> > > > +   if (gsi_end_p (gsi))
> > > > + break;
> > > > +   evrp_range_analyzer.record_use_ranges_from_stmt (gsi_stmt 
> > > > (gsi));
> > > > +  }
> > > >
> > > >   /* Now prepare to process dominated blocks.  */
> > > >   record_edge_info (bb);
> > > >
> > > > OTOH the issue in your case is that fold emits new stmts before gsi but 
> > > > the
> > > > above loop will never look at them.  See tree-ssa-forwprop.c for code 
> > > > how
> > > > to deal with this (setting a pass-local flag on stmts visited and 
> > > > walking back
> > > > to unvisited, newly inserted ones).  The fold_stmt interface could in 
> > > > theory
> > > > also be extended to insert new stmts on a sequence passed to it so the
> > > > caller would be responsible for inserting them into the IL and could 
> > > > then
> > > > more easily revisit them (but that's a bigger task).
> > > >
> > > > So, does the following help?
> > >
> > > Yes, this change fixed the error in my side, now, in the dumped file for 
> > > pass dom3:
> > >
> > > 
> > > Visiting statement:
> > > i_49 = _98 > 0 ? k_105 : 0;
> > > Meeting
> > >   [0, 65535]
> > > and
> > >   [0, 0]
> > > to
> > >   [0, 65535]
> > > Intersecting
> > >   [0, 65535]
> > > and
> > >   [0, 65535]
> > > to
> > >   [0, 65535]
> > > Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
> > >   Replaced 'k_105' with variable '_98'
> > > gimple_simplified to _152 = MAX_EXPR <_98, 0>;
> > > i_49 = _152;
> >
> > Ah, that looks interesting.  From this detail we might be
> > able to derive a testcase as well - a GIMPLE one
> > eventually because DOM runs quite late.  It's also interesting
> > to see the inefficient code here (the extra copy), probably
> > some known issue with match-and-simplify, I'd have to check.
> >
> > >   Folded to: i_49 = _152;
> > > LKUP STMT i_49 = _152
> > >  ASGN i_49 = _152
> > >
> > > Visiting statement:
> > > _152 = MAX_EXPR <_98, 0>;
> > >
> > > Visiting statement:
> > > i_49 = _152;
> > > Intersecting
> > >   [0, 65535]  EQUIVALENCES: { _152 } (1 elements)
> > > and
> > >   [0, 65535]
> > > to
> > >   [0, 65535]  EQUIVALENCE

Re: A bug in vrp_meet?

2019-03-06 Thread Richard Biener
On Tue, Mar 5, 2019 at 10:36 PM Jeff Law  wrote:
>
> On 3/5/19 7:44 AM, Richard Biener wrote:
>
> > So fixing it properly with also re-optimize_stmt those stmts so we'd CSE
> > the MAX_EXPR introduced by folding makes it somewhat ugly.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Any ideas how to make it less so?  I can split out making optimize_stmt
> > take a gsi * btw, in case that's a more obvious change and it makes the
> > patch a little smaller.
> >
> > Richard.
> >
> > 2019-03-05  Richard Biener  
> >
> > PR tree-optimization/89595
> > * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> > stmt iterator as reference, take boolean output parameter to
> > indicate whether the stmt was removed and thus the iterator
> > already advanced.
> > (dom_opt_dom_walker::before_dom_children): Re-iterate over
> > stmts created by folding.
> >
> > * gcc.dg/torture/pr89595.c: New testcase.
> >
>
> Well, all the real logic changes are in the before_dom_children method.
> The bits in optimize_stmt are trivial enough to effectively ignore.
>
> I don't see a better way to discover and process statements that are
> created in the bowels of fold_stmt.

I'm not entirely happy so I created the following alternative which
is a bit larger and slower due to the pre-pass clearing the visited flag
but is IMHO easier to follow.  I guess there's plenty of TLC opportunity
here but then I also hope to retire the VN parts of DOM in favor
of the non-iterating RPO-VN code...

So - I'd lean to this variant even though it has the extra loop over stmts,
would you agree?

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-03-06  Richard Biener  

PR tree-optimization/89595
* tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
stmt iterator as reference, take boolean output parameter to
indicate whether the stmt was removed and thus the iterator
already advanced.
(dom_opt_dom_walker::before_dom_children): Re-iterate over
stmts created by folding.

* gcc.dg/torture/pr89595.c: New testcase.


fix-pr89595-2
Description: Binary data


Re: A bug in vrp_meet?

2019-03-07 Thread Richard Biener
On Wed, Mar 6, 2019 at 11:05 AM Richard Biener
 wrote:
>
> On Tue, Mar 5, 2019 at 10:36 PM Jeff Law  wrote:
> >
> > On 3/5/19 7:44 AM, Richard Biener wrote:
> >
> > > So fixing it properly with also re-optimize_stmt those stmts so we'd CSE
> > > the MAX_EXPR introduced by folding makes it somewhat ugly.
> > >
> > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > >
> > > Any ideas how to make it less so?  I can split out making optimize_stmt
> > > take a gsi * btw, in case that's a more obvious change and it makes the
> > > patch a little smaller.
> > >
> > > Richard.
> > >
> > > 2019-03-05  Richard Biener  
> > >
> > > PR tree-optimization/89595
> > > * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> > > stmt iterator as reference, take boolean output parameter to
> > > indicate whether the stmt was removed and thus the iterator
> > > already advanced.
> > > (dom_opt_dom_walker::before_dom_children): Re-iterate over
> > > stmts created by folding.
> > >
> > > * gcc.dg/torture/pr89595.c: New testcase.
> > >
> >
> > Well, all the real logic changes are in the before_dom_children method.
> > The bits in optimize_stmt are trivial enough to effectively ignore.
> >
> > I don't see a better way to discover and process statements that are
> > created in the bowels of fold_stmt.
>
> I'm not entirely happy so I created the following alternative which
> is a bit larger and slower due to the pre-pass clearing the visited flag
> but is IMHO easier to follow.  I guess there's plenty of TLC opportunity
> here but then I also hope to retire the VN parts of DOM in favor
> of the non-iterating RPO-VN code...
>
> So - I'd lean to this variant even though it has the extra loop over stmts,
> would you agree?

I have now applied this variant.

Richard.

> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2019-03-06  Richard Biener  
>
> PR tree-optimization/89595
> * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> stmt iterator as reference, take boolean output parameter to
> indicate whether the stmt was removed and thus the iterator
> already advanced.
> (dom_opt_dom_walker::before_dom_children): Re-iterate over
> stmts created by folding.
>
> * gcc.dg/torture/pr89595.c: New testcase.


Re: GSoC Project Ideas

2019-03-08 Thread Richard Biener
On Thu, Mar 7, 2019 at 7:20 PM Martin Sebor  wrote:
>
> On 3/4/19 6:17 AM, Richard Biener wrote:
> > On Mon, Mar 4, 2019 at 1:23 PM Jakub Jelinek  wrote:
> >>
> >> On Mon, Mar 04, 2019 at 01:13:29PM +0100, Richard Biener wrote:
> >>>>>* Make TREE_NO_WARNING more fine-grained
> >>>>>  (inspired by comment #7 of PR74762 [3])
> >>>>>TREE_NO_WARNING is currently used as a catch-all marker that 
> >>>>> inhibits all
> >>>>>warnings related to the marked expression.  The problem with 
> >>>>> this is that
> >>>>>if some warning routine sets the flag for its own purpose,
> >>>>>then that later may inhibit another unrelated warning from 
> >>>>> firing, see for
> >>>>>example PR74762.  Implementing a more fine-grained mechanism for
> >>>>>inhibiting particular warnings would eliminate such issues.
> >>>> Might be interesting.  You'd probably need to discuss the details 
> >>>> further.
> >>>
> >>> I guess an implementation could use TREE_NO_WARNING (or 
> >>> gimple_no_warning_p)
> >>> as an indicator that there's out-of-band detail information which could be
> >>> stored as
> >>> a map keyed off either a location or a tree or gimple *.
> >>
> >> I guess on tree or gimple * is better, there would need to be some hook for
> >> copy_node/gimple_copy that would add the info for the new copy as well if
> >> the TREE_NO_WARNING or gimple_no_warning_p bit was set.  Plus there could 
> >> be
> >> some purging of this on the side information, e.g.  once code is handed 
> >> over
> >> from the FE to the middle-end (maybe do that only at free_lang_data time),
> >> for any warnings that are FE only there is no need to keep records in the 
> >> on
> >> the side mapping that have info about those FE warnings only, as later on
> >> the FE warnings will not be reported anymore.
> >> The implementation could be e.g. a hash map from tree/gimple * (pointers) 
> >> to
> >> bitmaps of warning numbers, with some hash table to ensure that the same
> >> bitmap is used for all the spots that need to have the same set of warnings
> >> disabled.
> >
> > A possibly related project is to "defer" output of diagnostics until we know
> > the stmt/expression we emit it for survived dead code elimination.  Here 
> > there's
> > the question what to key the diagnostic off and how to move it (that is, 
> > detect
> > if the code causing it really fully went dead).
>
> Another (maybe only remotely related) aspect of this project might
> be getting #pragma GCC diagnostic to work reliably with middle-end
> warnings emitted for inlined code.  That it doesn't work is one of
> the frustrations for users who run into false positives with "late"
> warnings like -Wstringop-overflow or -Wformat-overflow.

A similar issue is they are not carried along from compile-time to
LTO link time.  I'm not even sure how they are attached to anything
right now ... certainly not in DECL_FUNCTION_SPECIFIC_OPTIMIZATION.

> I'm sure there are bugs that track this but here's a test case
> involving -Warray-bounds:
>
>int a[3];
>
>int f (int i)
>{
>  return a[i];
>}
>
>#pragma GCC diagnostic push
>#pragma GCC diagnostic ignored "-Warray-bounds"
>int g (void)
>{
>  return f (7);   // expect no -Warray-bounds
>}
>#pragma GCC diagnostic pop
>
>int h (void)
>{
>  return f (7);   // expect -Warray-bounds
>}
>
> Martin


Re: Ryzen PPA znver1 optimizations

2019-03-08 Thread Richard Biener
On Fri, Mar 8, 2019 at 8:56 AM Vanida Plamondon
 wrote:
>
> I have been working on some PPA's that will provide standard Ubuntu
> and Linux Mint packages that are compiled with the znver1 cpu
> optimisations (Ryzen CPU). It has been quite tedious (though not
> particularly hard) to modify existing packages to be compiled with
> "-march=znver1" cflags and cxxflags, and since I started creating a
> toolchain to make the builds in the PPAs compile more reliably while
> producing broken less packages, I decided to modify GCC to always spit
> out ryzen optimised code automatically regardless of what code is
> thrown at it.
>
> I changed each instance of =generic in gcc/config.gcc to =znver1, and
> each instance of cpu= to cpu=znver1, and each instance of
> arch= that wasn't i386, i486, i586, i686, i786, x86-64, or
> x86_64 to arch=znver1.

You can configure with --with-arch=znver1 --with-tune=znver1 to
achieve the same effect.
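
For example, something along the lines of

  .../gcc/configure --with-arch=znver1 --with-tune=znver1 [your usual options]

(the exact path and remaining options depend on your setup) makes
-march=znver1 -mtune=znver1 the compiler default without patching config.gcc.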

> So what I think will happen is that I will set a PPA with a dependency
> on the PPA with the modified GCC, and any package I upload/copy to the
> aforementioned PPA that is compiling to x86 code will compile as
> though I set the "-march=znver1" option. Does anyone know whether or
> not this is going to work the way I think it will, or know how I can
> test to see if such is the case with the resulting binary packages?
>
> Also, is there a better way to do what I am trying to do?


Re: A bug in vrp_meet?

2019-03-20 Thread Richard Biener
On Tue, Mar 19, 2019 at 8:53 PM Jeff Law  wrote:
>
> On 3/6/19 3:05 AM, Richard Biener wrote:
> > On Tue, Mar 5, 2019 at 10:36 PM Jeff Law  wrote:
> >>
> >> On 3/5/19 7:44 AM, Richard Biener wrote:
> >>
> >>> So fixing it properly with also re-optimize_stmt those stmts so we'd CSE
> >>> the MAX_EXPR introduced by folding makes it somewhat ugly.
> >>>
> >>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >>>
> >>> Any ideas how to make it less so?  I can split out making optimize_stmt
> >>> take a gsi * btw, in case that's a more obvious change and it makes the
> >>> patch a little smaller.
> >>>
> >>> Richard.
> >>>
> >>> 2019-03-05  Richard Biener  
> >>>
> >>> PR tree-optimization/89595
> >>> * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> >>> stmt iterator as reference, take boolean output parameter to
> >>> indicate whether the stmt was removed and thus the iterator
> >>> already advanced.
> >>> (dom_opt_dom_walker::before_dom_children): Re-iterate over
> >>> stmts created by folding.
> >>>
> >>> * gcc.dg/torture/pr89595.c: New testcase.
> >>>
> >>
> >> Well, all the real logic changes are in the before_dom_children method.
> >> The bits in optimize_stmt are trivial enough to effectively ignore.
> >>
> >> I don't see a better way to discover and process statements that are
> >> created in the bowels of fold_stmt.
> >
> > I'm not entirely happy so I created the following alternative which
> > is a bit larger and slower due to the pre-pass clearing the visited flag
> > but is IMHO easier to follow.  I guess there's plenty of TLC opportunity
> > here but then I also hope to retire the VN parts of DOM in favor
> > of the non-iterating RPO-VN code...
> >
> > So - I'd lean to this variant even though it has the extra loop over stmts,
> > would you agree?
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2019-03-06  Richard Biener  
> >
> > PR tree-optimization/89595
> > * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> > stmt iterator as reference, take boolean output parameter to
> > indicate whether the stmt was removed and thus the iterator
> > already advanced.
> > (dom_opt_dom_walker::before_dom_children): Re-iterate over
> > stmts created by folding.
> >
> > * gcc.dg/torture/pr89595.c: New testcase.
> This one is easier to follow from a logic standpoint.  I don't think the
> gimple_set_visited bits are going to be terribly expensive in general.
>
> Is that flag in a known state for new statements?  I'm guessing it's
> cleared by some structure-sized memset as we create the raw statement?

Yes, it's of course not documented that way but IMHO the only reasonable
state.

> Might be worth clarifying that in the comments in gimple.h.
>
> jeff
>


Re: Indicating function exit points in debug data

2019-03-20 Thread Richard Biener
On Tue, Mar 19, 2019 at 9:38 PM Justin Paston-Cooper
 wrote:
>
> Hello,
>
> In my message https://sourceware.org/ml/gdb/2019-03/msg00042.html to
> the gdb mailing list, I asked whether it would be possible to
> implement a command which breaks at all exit points of the current
> stack frame. This would be very useful for evaluating a function's
> final state before it returns its result, independent of where in its
> definition it returns from.
>
> Tom Tromey suggested in that thread that this would be quite easy on
> gdb's side if gcc indicates exit locations in the DWARF data, for

Did he indicate _how_ to represent this in DWARF?  I suppose the breakpoint
should happen before the local frame is torn down.

> instance in the C case, it would indicate the locations of return
> statements. On a related note, he mentions that the "finish" command
> does not work for inlined functions because the compiler does not emit
> the required information.

While "finish" wants a location at the caller side, after the inlined frame is
torn down.  So they are somewhat distinct.

Can you open two enhancement requests in bugzilla?

Thanks,
Richard.

> It would be nice if this new break on exit command worked both for
> inlined functions, and also the case of breaking after tail-recursions
> exit. With a single stone, the "finish" bird above is also killed.

> Would it be feasible to implement such a feature in gcc? If I'm not
> the first person to ask for this, are there any architectural or
> practical reasons as to why it might not be possible to implement? As
> it stands, I don't have the level of familiarity with gcc to come to
> you with a patch, but with guidance I would certainly be interested in
> working on the C/C++ case if that would be useful.
>
> Thanks,
>
> Justin


Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-21 Thread Richard Biener
On Wed, Mar 20, 2019 at 6:36 PM Andrew Haley  wrote:
>
> On 3/20/19 2:08 PM, Moritz Strübe wrote:
> >
> > Ok, I played around a bit. Interestingly, if I set
> > -fsanitize=udefined and -fsanitize-undefined-trap-on-error the
> > compiler detects that it will always trap, and optimizes the code
> > accordingly (the code after the trap is removed).* Which kind of
> > brings me to David's argument: Shouldn't the compiler warn if there
> > is undefined behavior it certainly knows of?
>
> Maybe an example would help.
>
> Consider this code:
>
> for (int i = start; i < limit; i++) {
>   foo(i * 5);
> }
>
> Should GCC be entitled to turn it into
>
> int limit_tmp = i * 5;
> for (int i = start * 5; i < limit_tmp; i += 5) {
>   foo(i);
> }
>
> If you answered "Yes, GCC should be allowed to do this", would you
> want a warning? And how many such warnings might there be in a typical
> program?

I assume i is signed int.  Even then GCC may not do this unless it knows
the loop is entered (start < limit).

Richard.

>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. 
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: Indicating function exit points in debug data

2019-03-21 Thread Richard Biener
On Wed, Mar 20, 2019 at 8:05 PM Tom Tromey  wrote:
>
> > "Segher" == Segher Boessenkool  writes:
>
> >> Section 6.2.5.2 outlines the line number information state machine's
> >> opcodes. One of them is "DW_LNS_set_epilogue_begin". Its definition
> >> is:
>
> Segher> How should this work with shrink-wrapping?  The whole point of that is
> Segher> you do not tear down the frame after all other code, etc.  I don't see
> Segher> how we can do better than putting this DW_LNS_set_epilogue_begin right
> Segher> before the actual return -- and that is after all the tear down etc.
>
> I think it's fine if the epilogue marker is inexact or missing from
> optimized code, because (1) that's the current state, and (2) it doesn't
> really make sense to talk about an epilogue in some cases.
>
> Similarly, IMO it is fine not to worry about non-local exits.  You can
> already catch exceptions and examine them in gdb -- the epilogue marker
> feature is mostly to address the unmet need of wanting to set a
> breakpoint at the end of a function.

Btw, the feature I am missing is not breaking at the end of a function
but conditionally breaking on a specific return value of a specific function.
Those are probably related but my usecase might be easier because
the return value location is defined by the ABI and catching the actual
return assembly instruction should already work.

> Ideally, in -O0 / -Og code, the marker would be reliable where it
> appears.
>
> It would be great if there was a way to indicate the location of the
> value-to-be-returned in the DWARF.  That way gdb could extract it at the
> epilogue point.  AFAIK this would require a DWARF extension.

The ABI specifies this at the 'ret' instruction?

Richard.

> thanks,
> Tom


Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-21 Thread Richard Biener
On Thu, Mar 21, 2019 at 9:25 AM Alexander Monakov  wrote:
>
> On Thu, 21 Mar 2019, Richard Biener wrote:
> > > Maybe an example would help.
> > >
> > > Consider this code:
> > >
> > > for (int i = start; i < limit; i++) {
> > >   foo(i * 5);
> > > }
> > >
> > > Should GCC be entitled to turn it into
> > >
> > > int limit_tmp = i * 5;
> > > for (int i = start * 5; i < limit_tmp; i += 5) {
> > >   foo(i);
> > > }
> > >
> > > If you answered "Yes, GCC should be allowed to do this", would you
> > > want a warning? And how many such warnings might there be in a typical
> > > program?
> >
> > I assume i is signed int.  Even then GCC may not do this unless it knows
> > the loop is entered (start < limit).
>
> Additionally, the compiler needs to prove that 'foo' always returns normally
> (i.e. cannot invoke exit/longjmp or such).

Ah, yes.  Andrew's example probably meant limit_tmp = limit * 5, not i * 5.
Computing start * 5 is fine if the loop is entered.

Richard.

>
> Alexander


Re: GSOC

2019-03-26 Thread Richard Biener
On Tue, 26 Mar 2019, David Malcolm wrote:

> On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > Greetings All,
> > 
> > I would like to take up parallelize compilation using threads or make
> > c++/c 
> > memory issues not automatically promote. I did ask about this before
> > but
> > not get a reply. When someone replies I'm just a little concerned as 
> > my writing for proposals has never been great so if someone just
> > reviews
> > and doubt checks that's fine.
> > 
> > As for the other things building gcc and running the testsuite is
> > fine. Plus
> > I already working on gcc so I've pretty aware of most things and this
> > would
> > be a great steeping stone into more serious gcc development work.
> > 
> > If sample code is required that's in mainline gcc I sent out a trial
> > patch
> > for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > 
> > Cheers,
> > 
> > Nick
> 
> It's good to see that you've gotten as far as attaching a patch to BZ
> [1]
> 
> I think someone was going to attempt the "parallelize compilation using
> threads" idea last year, but then pulled out before the summer; you may
> want to check the archives (or was that you?)

There's also Giuliano Belinassi who is interested in the same project
(CCed).

> IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] - but
> I don't know if he's still interested/able to spare the cycles.

I've offered mentoring to Giuliano, so yes.

> That said, the parallel compilation one strikes me as very ambitious;
> it's not clear to me what could realistically be done as a GSoC
> project.  I think a good proposal on that would come up with some
> subset of the problem that's doable over a summer, whilst also being
> useful to the project.  The RTL infrastructure has a lot of global
> state, so maybe either focus on the gimple passes, or on fixing global
> state on the RTL side?  (I'm not sure)

That was the original intent for the experiment.  There's also
the already somewhat parallel WPA stage in LTO compilation mode
(but it simply forks for the sake of simplicity...).

> Or maybe a project to be more
> explicit about regions of the code that assume that the garbage-
> collector can't run within them?[3] (since the GC is state that would
> be shared by the threads).

The GC will be one obstacle.  The original idea was to drive
parallelization on the pass level by the pass manager for the
GIMPLE passes, so serialization points would be in it.

Richard.

> Hope this is constructive/helpful
> Dave
> 
> [1] though typically our workflow involved sending patches to the gcc-
> patches mailing list
> [2] as libgccjit maintainer I have an interest in global state within
> the compiler
> [3] I posted some ideas about this back in 2013 IIRC; probably
> massively bit-rotted since then.  I also gave a talk at Cauldron 2013
> about global state in the compiler (with a view to gcc-as-a-shared-
> library); likewise I expect much of the ideas there to be out-of-date); 
> for libgccjit I went with a different approach


Re: Function pointers to a nested function / contained procedure

2019-03-27 Thread Richard Biener
On Wed, Mar 27, 2019 at 8:48 AM Thomas König  wrote:
>
> Hi Eric,
> > There is an entire machinery in the middle-end and the back-ends to support 
> > this (look for trampolines/descriptors in the manual and the source code). 
> > This should essentially work out of the box for any language front-end.
> Thanks for the pointer. The documentation I have seen seems to point out what 
> to do in a back end to implement this, less towards what to do in a front 
> end. And the source is big :-)
> Could somebody maybe shed some additional light on what magic I would have to 
> invoke in the Fortran front end?

I think the only thing required is that the function nesting is
apparent by means of DECL_CONTEXT of the inner function being
the outer function.  Of course accesses to outer-function variables
from the inner function have to use the same decl tree
(not just a copy of the decl with the same name).

Richard.

> Regards, Thomas


Re: Function pointers to a nested function / contained procedure

2019-03-27 Thread Richard Biener
On Wed, Mar 27, 2019 at 10:09 AM Jakub Jelinek  wrote:
>
> On Wed, Mar 27, 2019 at 10:02:21AM +0100, Richard Biener wrote:
> > On Wed, Mar 27, 2019 at 8:48 AM Thomas König  wrote:
> > >
> > > Hi Eric,
> > > > There is an entire machinery in the middle-end and the back-ends to 
> > > > support this (look for trampolines/descriptors in the manual and the 
> > > > source code). This should essentially work out of the box for any 
> > > > language front-end.
> > > Thanks for the pointer. The documentation I have seen seems to point out 
> > > what to do in a back end to implement this, less towards what to do in a 
> > > front end. And the source is big :-)
> > > Could somebody maybe shed some additional light on what magic I would 
> > > have to invoke in the Fortran front end?
> >
> > I think the only thing required is that the function nesting is
> > appearant by means of DECL_CONTEXT of the inner function being
> > the outer function.  Of course accesses of outer function variables
> > from the inner function have to use the same decl tree
> > (not just a copy of the decl with the same name).
>
> Yeah, and then tree-nested.c should do most of the magic needed to make it
> working (plus expansion emit trampolines unless target has some other ways
> to pass the static chain pointer).
>
> Just look what will
>
> __attribute__((noipa)) void foo (void (*fn) (void))
> {
>   fn ();
>   fn ();
> }
>
> int
> main ()
> {
>   int i = 0;
>   void bar (void) { i++; }
>   foo (bar);
>   return i - 2;
> }
>
> do in C, the attribute is there to make sure it isn't inlined or otherwise
> IPA optimized.

Btw, I've looked at the bug and the issue there is the FE does

void foo()
{
  int a;
  int bar()
{
  return a;
}

  static int (*bp)() = bar;
  bp();
}

that is, tries to statically initialize the procedure pointer.  That causes
the indirect call to not receive a static chain and thus be called with
the wrong ABI.

You can't do that.
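By way of contrast, a minimal GNU C sketch of what does work: initializing the
pointer at run time, so a trampoline is materialized and the indirect call
carries the static chain.  This mirrors Jakub's example above; it is only an
illustration, not the Fortran FE's actual output.

/* Sketch only: the pointer is initialized at run time, so GCC builds a
   trampoline for 'bar' and the indirect call carries the static chain.  */
int
main (void)
{
  int i = 0;
  int bar (void) { return ++i; }   /* nested function capturing 'i' */
  int (*bp) (void) = bar;          /* run-time init, not a static initializer */
  bp ();
  bp ();
  return i - 2;                    /* returns 0 */
}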

Richard.

>
> Jakub


Re: GCC-Reordering-Optimization-Options in Os and O2 when using __builtin_expect() and documentation

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 1:27 PM Martin Liška  wrote:
>
> On 3/25/19 1:36 PM, Moritz Strübe wrote:
> > Hi,
> >
> > I have an issue with the optimization options. We are on an stm32 and it
> > only has a prefetcher, but no cache. Thus it's nice to have linear
> > default path. For example, we use  __builtin_expect in our asserts. Yet
> > it seems that this does not work when using -Os. I confirmed that this
> > is not an arm issue, but can also be seen on x86.
> > I have the following code:
> > --
> > #include 
> > #ifdef UN
> > #define UNLIKELY(x) __builtin_expect((x),0)
> > #else
> > #define UNLIKELY(x) (x)
> > #endif
> >
> > float a = 66;
> >
> > int test (float b, int test) {
> >  if(UNLIKELY(test)) {
> >  return b / a;
> >  } else {
> >  return b * a;
> >  }
> > }
> > --
> > "gcc -O2" reorders the code depending on a passed -DUN, while -Os always
> > produces the same output (see https://godbolt.org/z/cL-Pbg)
> >
> > I played around with different options running
> > gcc -O{s|2} -Q  --help=optimizers
> > , but didn't manage to get -Os to do that optimization.
>
> Hi.
>
> So first we have a misleading documentation for 8.3.0, you're hitting:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87829
>
> So with -Os we enable BB reordering, the only difference is that we set
> -freorder-blocks-algorithm=simple with -Os. However, on x86_64, it stays
> with -freorder-blocks-algorithm=stv.
>
> Then you can use -fdump-rtl-bbro and see a dump file:
> pr-expect.c.300r.bbro
>
> The issues you're seeing is caused by fact that bbro uses
> optimize_function_for_size_p functions that return true with -Os.
> That's probably why you see the difference. That's my quick analysis.

Yeah, I think its use of optimize_function_for_size_p is at least odd.
It seems to be a defensive check in place to do as little as possible,
keeping the original order.  I think the only thing that should be disabled
is tracing of cold paths.

But this is all stage1 material.

Richard.

> Maybe Honza can help here?
>
> Martin
>
> > Opposed to what the manual[1] says, this only differs in
> > -finline-functions and -foptimize-strlen for 8.3
> > (OT: Especially the info about freorder-blocks-algorithm seems to be
> > outdated for gcc 8.3 (my arm 7.3.1 produces smaller code using stc, too).)
> > Since adjusting all those options didn't help I tried
> > gcc -O{s|2} -Q  --help={param|common|target|c++}
> > but that didn't give me any new insight. (BTW: "-Q --help=param" should
> > probably be documented in the --param-section)
> >
> > Cheers
> > Moritz
> >
> >
> > [1] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/Optimize-Options.html
> >
>


Re: GSOC

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
 wrote:
>
> Hi,
>
> On 03/26, Richard Biener wrote:
> > On Tue, 26 Mar 2019, David Malcolm wrote:
> >
> > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > Greetings All,
> > > >
> > > > I would like to take up parallelize compilation using threads or make
> > > > c++/c
> > > > memory issues not automatically promote. I did ask about this before
> > > > but
> > > > not get a reply. When someone replies I'm just a little concerned as
> > > > my writing for proposals has never been great so if someone just
> > > > reviews
> > > > and doubt checks that's fine.
> > > >
> > > > As for the other things building gcc and running the testsuite is
> > > > fine. Plus
> > > > I already working on gcc so I've pretty aware of most things and this
> > > > would
> > > > be a great steeping stone into more serious gcc development work.
> > > >
> > > > If sample code is required that's in mainline gcc I sent out a trial
> > > > patch
> > > > for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > >
> > > > Cheers,
> > > >
> > > > Nick
> > >
> > > It's good to see that you've gotten as far as attaching a patch to BZ
> > > [1]
> > >
> > > I think someone was going to attempt the "parallelize compilation using
> > > threads" idea last year, but then pulled out before the summer; you may
> > > want to check the archives (or was that you?)
> >
> > There's also Giuliano Belinassi who is interested in the same project
> > (CCed).
>
> Yes, I will apply for this project, and I will submit the final version
> of my proposal by the end of the week.
>
> Currently, my target is the `expand_all_functions` routine, as most of
> the time is spent on it according to the experiments that I performed as
> part of my Master's research on compiler parallelization.
> (-O2, --disable-checking)

Yes, more specifically I think the realistic target is the GIMPLE part
of   execute_pass_list (cfun, g->get_passes ()->all_passes);  done in
cgraph_node::expand.  If you look at passes.def you'll see all_passes
also contains RTL expansion (pass_expand) and the RTL optimization
queue (pass_rest_of_compilation).  The RTL part isn't a realistic target.
Without changing the pass hierarchy the obvious part that can be
handled would be the pass_all_optimizations pass sub-queue of
all_passes since those are all passes that perform transforms on the
GIMPLE IL where we have all functions in this state at the same time
and where no interactions between the functions happen anymore
and thus functions can be processed in parallel (as much as make
processes individual translation units in parallel).

To simplify the task further, a useful constraint is to not have
a single optimization pass executed multiple times at the same time
(otherwise you have to look at pass-specific global state as well).
Thus the parallel part could be coded so that it keeps, per function,
the state of which pass to execute next, and have a scheduler pick
a function whose next pass is "free", scheduling that to a fixed set of
worker threads.  There are no dependences between functions
for the scheduling, but each pass has only one execution resource
in the pipeline.  You can start processing an arbitrarily large number
of functions, but a slow function will keep others from advancing across
the pass it executes.

Passes could of course be individually marked as thread-safe
(multiple instances execute concurrently).
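Purely as illustration, here is a freestanding sketch of that scheduling
scheme: plain C with pthreads, where N_FUNCS, N_PASSES, N_WORKERS and
run_pass are made up and stand in for GCC's real pass manager.  Each function
remembers which pass it runs next, and a worker may only start a pass that no
other worker is currently executing.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define N_FUNCS   8     /* functions queued for compilation (made up) */
#define N_PASSES  5     /* length of the GIMPLE pass queue (made up)  */
#define N_WORKERS 3

static int next_pass[N_FUNCS];     /* index of the next pass per function */
static bool pass_busy[N_PASSES];   /* one execution resource per pass     */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void
run_pass (int pass, int func)
{
  /* Stand-in for executing one pass on one function.  */
  printf ("pass %d on function %d\n", pass, func);
}

static void *
worker (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&lock);
  for (;;)
    {
      int func = -1;
      bool all_done = true;
      for (int f = 0; f < N_FUNCS; f++)
        if (next_pass[f] < N_PASSES)
          {
            all_done = false;
            if (!pass_busy[next_pass[f]])
              {
                func = f;
                break;
              }
          }
      if (all_done)
        break;
      if (func == -1)
        {
          /* Every runnable function's next pass is busy elsewhere.  */
          pthread_cond_wait (&cond, &lock);
          continue;
        }
      int pass = next_pass[func];
      pass_busy[pass] = true;          /* claim the pass's single slot */
      pthread_mutex_unlock (&lock);
      run_pass (pass, func);           /* the only unlocked region */
      pthread_mutex_lock (&lock);
      pass_busy[pass] = false;
      next_pass[func]++;
      pthread_cond_broadcast (&cond);
    }
  pthread_mutex_unlock (&lock);
  return NULL;
}

int
main (void)
{
  pthread_t tid[N_WORKERS];
  for (int i = 0; i < N_WORKERS; i++)
    pthread_create (&tid[i], NULL, worker, NULL);
  for (int i = 0; i < N_WORKERS; i++)
    pthread_join (tid[i], NULL);
  return 0;
}

Because a function's next pass is marked busy while it runs, no function is
ever processed by two workers at once, yet different functions flow through
different passes concurrently.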

Garbage collection is already under the control of the pass manager,
which would also be the thread scheduler.  For GC the remaining issue
is allocation, which passes do occasionally.  Locking is the short-term
solution for GSoC I guess; long-term, per-thread GC pools
might be better (so as not to slow down non-threaded parts of the compiler).

Richard.

>
> Thank you,
> Giuliano.
>
>
> >
> > > IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] - but
> > > I don't know if he's still interested/able to spare the cycles.
> >
> > I've offered mentoring to Giuliano, so yes.
> >
> > > That said, the parallel compilation one strikes me as very ambitious;
> > > it's not clear to me what could realistically be done as a GSoC
> > > project.  I think a good proposal on that would come up with some
> > > subset of the problem that's doable over a summer, whilst also being
> > > useful to the project.

Re: GSOC

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 3:43 PM nick  wrote:
>
>
>
> On 2019-03-27 9:55 a.m., Giuliano Belinassi wrote:
> > Hi,
> >
> > On 03/26, Richard Biener wrote:
> >> On Tue, 26 Mar 2019, David Malcolm wrote:
> >>
> >>> On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> >>>> Greetings All,
> >>>>
> >>>> I would like to take up parallelize compilation using threads or make
> >>>> c++/c
> >>>> memory issues not automatically promote. I did ask about this before
> >>>> but
> >>>> not get a reply. When someone replies I'm just a little concerned as
> >>>> my writing for proposals has never been great so if someone just
> >>>> reviews
> >>>> and doubt checks that's fine.
> >>>>
> >>>> As for the other things building gcc and running the testsuite is
> >>>> fine. Plus
> >>>> I already working on gcc so I've pretty aware of most things and this
> >>>> would
> >>>> be a great steeping stone into more serious gcc development work.
> >>>>
> >>>> If sample code is required that's in mainline gcc I sent out a trial
> >>>> patch
> >>>> for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Nick
> >>>
> >>> It's good to see that you've gotten as far as attaching a patch to BZ
> >>> [1]
> >>>
> >>> I think someone was going to attempt the "parallelize compilation using
> >>> threads" idea last year, but then pulled out before the summer; you may
> >>> want to check the archives (or was that you?)
> >>
> >> There's also Giuliano Belinassi who is interested in the same project
> >> (CCed).
> >
> > Yes, I will apply for this project, and I will submit the final version
> > of my proposal by the end of the week.
> >
> > Currently, my target is the `expand_all_functions` routine, as most of
> > the time is spent on it according to the experiments that I performed as
> > part of my Master's research on compiler parallelization.
> > (-O2, --disable-checking)
> >
> > Thank you,
> > Giuliano.
> >
> >
> My goal was this:
> Or maybe a project to be more
> explicit about regions of the code that assume that the garbage-
> collector can't run within them?[3] (since the GC is state that would
> be shared by the threads).

That's already pretty clear, so it's a non-project.  Honestly, you are somewhat
late to the project, but if you can come up with a solid proposal we will
definitely have a look.  The project itself is of course large enough to
cover tens of GSoC students ;)

Richard.

> So there is no conflict between me and Giuliano. Richard and me have
> already been going back and forth. The remaining tasks for me are
> just write the proposal as the big one and I asked Richard to sent
> me a example you guys liked. I've signed up to contribute for the
> last year so that's fine.
>
> Just letting the list known as well as Richard where I am,
>
> Nick
> >>
> >>> IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] - but
> >>> I don't know if he's still interested/able to spare the cycles.
> >>
> >> I've offered mentoring to Giuliano, so yes.
> >>
> >>> That said, the parallel compilation one strikes me as very ambitious;
> >>> it's not clear to me what could realistically be done as a GSoC
> >>> project.  I think a good proposal on that would come up with some
> >>> subset of the problem that's doable over a summer, whilst also being
> >>> useful to the project.  The RTL infrastructure has a lot of global
> >>> state, so maybe either focus on the gimple passes, or on fixing global
> >>> state on the RTL side?  (I'm not sure)
> >>
> >> That was the original intent for the experiment.  There's also
> >> the already somewhat parallel WPA stage in LTO compilation mode
> >> (but it simply forks for the sake of simplicity...).
> >>
> >>> Or maybe a project to be more
> >>> explicit about regions of the code that assume that the garbage-
> >>> collector can't run within them?[3] (since the GC is state that would
> >>> be shared by the threads).
> >>
> >> The GC will be one obstackle.  The original idea was to drive
> >> parallelization on the pass level by the pass manager for the
> >> GIMPLE passes, so serialization points would be in it.
> >>
> >> Richard.
> >>
> >>> Hope this is constructive/helpful
> >>> Dave
> >>>
> >>> [1] though typically our workflow involved sending patches to the gcc-
> >>> patches mailing list
> >>> [2] as libgccjit maintainer I have an interest in global state within
> >>> the compiler
> >>> [3] I posted some ideas about this back in 2013 IIRC; probably
> >>> massively bit-rotted since then.  I also gave a talk at Cauldron 2013
> >>> about global state in the compiler (with a view to gcc-as-a-shared-
> >>> library); likewise I expect much of the ideas there to be out-of-date);
> >>> for libgccjit I went with a different approach


Re: GSOC Proposal

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 6:31 PM nick  wrote:
>
> Greetings All,
>
> I've already done most of the work required for signing up for GSoC
> as of last year i.e. reading getting started, being signed up legally
> for contributions.
>
> My only real concern would be the proposal which I started writing here:
> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit?usp=sharing
>
> The biography and success section I'm fine with my bigger concern would be 
> the project and roadmap
> section. The roadmap is there and I will go into more detail about it in the 
> projects section as
> need be. Just wanted to known if the roadmap is detailed enough or can I just 
> write out a few
> paragraphs discussing it in the Projects Section.

I'm not sure I understand either the problem analysis or the project
goal parts.  What shared state with respect to garbage collection
are you talking about?

Richard.

> Any other comments are welcome as well as I write it there,
> Nick


Re: GSOC

2019-03-29 Thread Richard Biener
On Thu, 28 Mar 2019, Giuliano Belinassi wrote:

> Hi, Richard
> 
> On 03/28, Richard Biener wrote:
> > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> >  wrote:
> > >
> > > Hi,
> > >
> > > On 03/26, Richard Biener wrote:
> > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > >
> > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > Greetings All,
> > > > > >
> > > > > > I would like to take up parallelize compilation using threads or 
> > > > > > make
> > > > > > c++/c
> > > > > > memory issues not automatically promote. I did ask about this before
> > > > > > but
> > > > > > not get a reply. When someone replies I'm just a little concerned as
> > > > > > my writing for proposals has never been great so if someone just
> > > > > > reviews
> > > > > > and doubt checks that's fine.
> > > > > >
> > > > > > As for the other things building gcc and running the testsuite is
> > > > > > fine. Plus
> > > > > > I already working on gcc so I've pretty aware of most things and 
> > > > > > this
> > > > > > would
> > > > > > be a great steeping stone into more serious gcc development work.
> > > > > >
> > > > > > If sample code is required that's in mainline gcc I sent out a trial
> > > > > > patch
> > > > > > for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Nick
> > > > >
> > > > > It's good to see that you've gotten as far as attaching a patch to BZ
> > > > > [1]
> > > > >
> > > > > I think someone was going to attempt the "parallelize compilation 
> > > > > using
> > > > > threads" idea last year, but then pulled out before the summer; you 
> > > > > may
> > > > > want to check the archives (or was that you?)
> > > >
> > > > There's also Giuliano Belinassi who is interested in the same project
> > > > (CCed).
> > >
> > > Yes, I will apply for this project, and I will submit the final version
> > > of my proposal by the end of the week.
> > >
> > > Currently, my target is the `expand_all_functions` routine, as most of
> > > the time is spent on it according to the experiments that I performed as
> > > part of my Master's research on compiler parallelization.
> > > (-O2, --disable-checking)
> > 
> > Yes, more specifically I think the realistic target is the GIMPLE part
> > of   execute_pass_list (cfun, g->get_passes ()->all_passes);  done in
> > cgraph_node::expand.  If you look at passes.def you'll see all_passes
> > also contains RTL expansion (pass_expand) and the RTL optimization
> > queue (pass_rest_of_compilation).  The RTL part isn't a realistic target.
> > Without changing the pass hierarchy the obvious part that can be
> > handled would be the pass_all_optimizations pass sub-queue of
> > all_passes since those are all passes that perform transforms on the
> > GIMPLE IL where we have all functions in this state at the same time
> > and where no interactions between the functions happen anymore
> > and thus functions can be processed in parallel (as much as make
> > processes individual translation units in parallel).
> > 
> 
> Great. So if I understood correctly, I will need to split
> cgraph_node::expand() into three parts: IPA, GIMPLE and RTL, and then
> refactor `expand_all_functions` so that the loop
> 
>  for (i = new_order_pos - 1; i >= 0; i--)
> 
>  use these three functions, then partition
> 
>  g->get_passes()->all_passes
> 
> into get_passes()->gimple_passes and get_passes()->rtl_passes, so I
> can run RTL after GIMPLE is finished, to finally start the
> paralellization of per function GIMPLE passes.

Yes, it involves refactoring of the loop - you may notice that
parts of the compilation pipeline are under the control of the
pass manager (passes.c) but some are still manually driven
by symbol_table::compile.  Whether it's more convenient to
move more control into the pass manager and perform the
threading under its control (I'd say that would be the cleaner
design) or to try to do this in the c

Re: GSOC Proposal

2019-03-29 Thread Richard Biener
On Thu, 28 Mar 2019, nick wrote:

> 
> 
> On 2019-03-28 4:59 a.m., Richard Biener wrote:
> > On Wed, Mar 27, 2019 at 6:31 PM nick  wrote:
> >>
> >> Greetings All,
> >>
> >> I've already done most of the work required for signing up for GSoC
> >> as of last year i.e. reading getting started, being signed up legally
> >> for contributions.
> >>
> >> My only real concern would be the proposal which I started writing here:
> >> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit?usp=sharing
> >>
> >> The biography and success section I'm fine with my bigger concern would be 
> >> the project and roadmap
> >> section. The roadmap is there and I will go into more detail about it in 
> >> the projects section as
> >> need be. Just wanted to known if the roadmap is detailed enough or can I 
> >> just write out a few
> >> paragraphs discussing it in the Projects Section.
> > 
> > I'm not sure I understand either the problem analysis nor the project
> > goal parts.  What
> > shared state with respect to garbage collection are you talking about?
> > 
> > Richard.
> > 
> I just fixed it. Seems we were discussing RTL itself. I edited it to 
> reflect those changes. Let me know if it's unclear or you would actually 
> like me to discuss some changes that may occur in the RTL layer itself.
> 
> 
> I'm glad to be more exact if that's better but seems your confusion was 
> just what layer we were touching.

Let me just throw in some knowledge here.  The issue with RTL
is that we currently can only have a single function in this
intermediate language state since a function in RTL has some
state in global variables that would differ if it were another
function.  We can have multiple functions in GIMPLE intermediate
language state since all such state is in a function-specific
data structure (struct function).  The hard thing about moving
all this "global" state of RTL into the same place is that
there's global state in the various backends (and there's
already a struct function 'machine' part for such state, so there's
hope the issue isn't as big as it could be) and that some of
the global state is big and only changes very rarely.
That said, I'm not sure if anybody knows the full details here.

So as far as I understand you'd like to tackle this as project
with the goal to be able to have multiple functions in RTL
state.

That's laudable but IMHO also quite ambitious for a GSoC
project.  It's also an area I am not very familiar with so
I opt out of being a mentor for this project.

Richard.

> Nick
> >> Any other comments are welcome as well as I write it there,
> >> Nick
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

GCC 9 Status Report (2019-03-29)

2019-03-29 Thread Richard Biener


Status
======

We should be at the end of the stabilization phase (Stage 4) having
made some good progress in the long march towards zero P1 regressions.
Please have a look at those that you assigned yourself to.

There are still 12 P1s left at the moment, though at some point bugs
that are not severe (wrong-code, rejects-valid) might be downgraded and
deferred for fixing in a subsequent release.  We would like to follow
the historical precedent of releasing at the end of April or the
beginning of May and not slip further.

As usual this is a good time to test your non-{primary,secondary}
target making sure it builds and works correctly.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1               12   -  12
P2              158   -  27
P3               25   -   7
P4              138   -  31
P5               24
--------        ---   -----------------------
Total P1-P3     195   -  46
Total           357   -  77


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2019-02/msg00028.html


Re: [GSoC 2019] Proposal: Parallelize GCC With Threads

2019-04-01 Thread Richard Biener
On Sun, 31 Mar 2019, Giuliano Belinassi wrote:

> Hi,
> 
> I wrote my GSoC Proposal to the "Parallelize GCC with threads" project,
> and if someone is interested in it, I am linking the text here in order
> to get feedback. Please let me know if something is not entirely clear,
> or if there are any problems with the calendar, or if you have any
> suggestions. :-)
> 
> Link to the proposal:
> https://github.com/giulianobelinassi/proposta-gsoc/blob/master/proposta.pdf
> 
> GitHub seems mess up with the PDF hyperlinks in their viewer, therefore you 
> will
> need to download the PDF if you want to click on them, or use this
> mirror i've set:
> https://www.ime.usp.br/~belinass/GSoC-Proposal.pdf

Hi,

I've read the proposal and it is great - the planned task timeline
is solid and realistic.  Of course what will happen between the
second and final evaluations depends a lot on the number of issues
uncovered.

Thanks,
Richard.

> Thank you,
> Giuliano.
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Re: GSOC Proposal

2019-04-01 Thread Richard Biener
On Fri, 29 Mar 2019, nick wrote:

> 
> 
> On 2019-03-29 10:28 a.m., nick wrote:
> > 
> > 
> > On 2019-03-29 5:08 a.m., Richard Biener wrote:
> >> On Thu, 28 Mar 2019, nick wrote:
> >>
> >>>
> >>>
> >>> On 2019-03-28 4:59 a.m., Richard Biener wrote:
> >>>> On Wed, Mar 27, 2019 at 6:31 PM nick  wrote:
> >>>>>
> >>>>> Greetings All,
> >>>>>
> >>>>> I've already done most of the work required for signing up for GSoC
> >>>>> as of last year i.e. reading getting started, being signed up legally
> >>>>> for contributions.
> >>>>>
> >>>>> My only real concern would be the proposal which I started writing here:
> >>>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit?usp=sharing
> >>>>>
> >>>>> The biography and success section I'm fine with my bigger concern would 
> >>>>> be the project and roadmap
> >>>>> section. The roadmap is there and I will go into more detail about it 
> >>>>> in the projects section as
> >>>>> need be. Just wanted to known if the roadmap is detailed enough or can 
> >>>>> I just write out a few
> >>>>> paragraphs discussing it in the Projects Section.
> >>>>
> >>>> I'm not sure I understand either the problem analysis nor the project
> >>>> goal parts.  What
> >>>> shared state with respect to garbage collection are you talking about?
> >>>>
> >>>> Richard.
> >>>>
> >>> I just fixed it. Seems we were discussing RTL itself. I edited it to 
> >>> reflect those changes. Let me know if it's unclear or you would actually 
> >>> like me to discuss some changes that may occur in the RTL layer itself.
> >>>
> >>>
> >>> I'm glad to be more exact if that's better but seems your confusion was 
> >>> just what layer we were touching.
> >>
> >> Let me just throw in some knowledge here.  The issue with RTL
> >> is that we currently can only have a single function in this
> >> intermediate language state since a function in RTL has some
> >> state in global variables that would differ if it were another
> >> function.  We can have multiple functions in GIMPLE intermediate
> >> language state since all such state is in a function-specific
> >> data structure (struct function).  The hard thing about moving
> >> all this "global" state of RTL into the same place is that
> >> there's global state in the various backends (and there's
> >> already a struct funtion 'machine' part for such state, so there's
> >> hope the issue isn't as big as it could be) and that some of
> >> the global state is big and only changes very rarely.
> >> That said, I'm not sure if anybody knows the full details here.
> >>
> >> So as far as I understand you'd like to tackle this as project
> >> with the goal to be able to have multiple functions in RTL
> >> state.
> >>
> >> That's laudable but IMHO also quite ambitious for a GSoC
> >> project.  It's also an area I am not very familiar with so
> >> I opt out of being a mentor for this project.
> >>
> > While I'm aware of three areas where the shared state is an issue
> > currently:
> > 1, Compiler's Proper
> > 2. The expand_functions 
> > 3. RTL
> > 4.Garbage Collector
> > 
> > Or maybe a project to be more
> > explicit about regions of the code that assume that the garbage-
> > collector can't run within them?[3] (since the GC is state that would
> > be shared by the threads).
> > 
> > This is what we were discussing previously and I wrote my proposal for
> > that. You however seem confused about what parts of the garbage collector
> > would be touched. That's fine with me, however seems you want be to
> > be more exact about which part  is touched.
> > 
> > My questions would be as it's changed back to the garbage collector project:
> > https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
> > 
> > 1. Your confusion about which part of the garbage collector is touched 
> > doesn't
> > really make sense s it's for the whole garbage collector as related to 
> > shared

Re: GSOC Proposal

2019-04-01 Thread Richard Biener
On Mon, 1 Apr 2019, nick wrote:

> 
> 
> On 2019-04-01 5:56 a.m., Richard Biener wrote:
> > On Fri, 29 Mar 2019, nick wrote:
> > 
> >>
> >>
> >> On 2019-03-29 10:28 a.m., nick wrote:
> >>>
> >>>
> >>> On 2019-03-29 5:08 a.m., Richard Biener wrote:
> >>>> On Thu, 28 Mar 2019, nick wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On 2019-03-28 4:59 a.m., Richard Biener wrote:
> >>>>>> On Wed, Mar 27, 2019 at 6:31 PM nick  wrote:
> >>>>>>>
> >>>>>>> Greetings All,
> >>>>>>>
> >>>>>>> I've already done most of the work required for signing up for GSoC
> >>>>>>> as of last year i.e. reading getting started, being signed up legally
> >>>>>>> for contributions.
> >>>>>>>
> >>>>>>> My only real concern would be the proposal which I started writing 
> >>>>>>> here:
> >>>>>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit?usp=sharing
> >>>>>>>
> >>>>>>> The biography and success section I'm fine with my bigger concern 
> >>>>>>> would be the project and roadmap
> >>>>>>> section. The roadmap is there and I will go into more detail about it 
> >>>>>>> in the projects section as
> >>>>>>> need be. Just wanted to known if the roadmap is detailed enough or 
> >>>>>>> can I just write out a few
> >>>>>>> paragraphs discussing it in the Projects Section.
> >>>>>>
> >>>>>> I'm not sure I understand either the problem analysis nor the project
> >>>>>> goal parts.  What
> >>>>>> shared state with respect to garbage collection are you talking about?
> >>>>>>
> >>>>>> Richard.
> >>>>>>
> >>>>> I just fixed it. Seems we were discussing RTL itself. I edited it to 
> >>>>> reflect those changes. Let me know if it's unclear or you would 
> >>>>> actually 
> >>>>> like me to discuss some changes that may occur in the RTL layer itself.
> >>>>>
> >>>>>
> >>>>> I'm glad to be more exact if that's better but seems your confusion was 
> >>>>> just what layer we were touching.
> >>>>
> >>>> Let me just throw in some knowledge here.  The issue with RTL
> >>>> is that we currently can only have a single function in this
> >>>> intermediate language state since a function in RTL has some
> >>>> state in global variables that would differ if it were another
> >>>> function.  We can have multiple functions in GIMPLE intermediate
> >>>> language state since all such state is in a function-specific
> >>>> data structure (struct function).  The hard thing about moving
> >>>> all this "global" state of RTL into the same place is that
> >>>> there's global state in the various backends (and there's
> >>>> already a struct funtion 'machine' part for such state, so there's
> >>>> hope the issue isn't as big as it could be) and that some of
> >>>> the global state is big and only changes very rarely.
> >>>> That said, I'm not sure if anybody knows the full details here.
> >>>>
> >>>> So as far as I understand you'd like to tackle this as project
> >>>> with the goal to be able to have multiple functions in RTL
> >>>> state.
> >>>>
> >>>> That's laudable but IMHO also quite ambitious for a GSoC
> >>>> project.  It's also an area I am not very familiar with so
> >>>> I opt out of being a mentor for this project.
> >>>>
> >>> While I'm aware of three areas where the shared state is an issue
> >>> currently:
> >>> 1, Compiler's Proper
> >>> 2. The expand_functions 
> >>> 3. RTL
> >>> 4.Garbage Collector
> >>>
> >>> Or maybe a project to be more
> >>> explicit about regions of the code that assume that the garbage-
> >>> collector can't run within them?[3] (since the GC i

Re: [RFC/RFA] Obsolete Cell Broadband Engine SPU targets

2019-04-02 Thread Richard Biener
On April 2, 2019 11:46:14 AM GMT+02:00, Ulrich Weigand  
wrote:
>Hello,
>
>the spu-elf target in GCC supports generating code for the SPU
>processors
>of the Cell Broadband Engine; it has been part of upstream GCC since
>2008.
>
>However, at this point I believe this target is no longer in use:
>- There is no supported Cell/B.E. hardware any more.
>- There is no supported operating system supporting Cell/B.E. any more.
>
>I've still been running daily regression tests until now, but I'll be
>unable to continue to do so much longer since the systems I've been
>using for this will go away.
>
>Rather than leave SPU support untested/maintained, I'd therefore
>propose to declare all SPU targets obsolete in GCC 9 and remove
>the code with GCC 10.
>
>Any objections to this approach?

Works for me. 

Richard. 

>Bye,
>Ulrich
>
>
>gcc/ChangeLog:
>
>   * config.gcc: Mark spu* targets as deprecated/obsolete.
>
>Index: gcc/config.gcc
>===
>--- gcc/config.gcc (revision 270076)
>+++ gcc/config.gcc (working copy)
>@@ -248,6 +248,7 @@ md_file=
> # Obsolete configurations.
> case ${target} in
>   *-*-solaris2.10*\
>+  | spu*-*-*  \
>   | tile*-*-* \
>  )
> if test "x$enable_obsolete" != xyes; then



Re: vector alignment

2019-04-03 Thread Richard Biener
On Tue, Apr 2, 2019 at 6:20 PM Martin Sebor  wrote:
>
> GCC tries to align a vector on its natural boundary, i.e., that
> given by its size, up to MAX_OBJECT_ALIGNMENT.  Vectors that are
> bigger than that are either silently [mis]aligned on that same
> maximum boundary (PR 89798), silently truncated (and misaligned),
> or cause an ICE (PR 89797).  Compiling the following:
>
>__attribute__ ((vector_size (N))) char v;
>
>_Static_assert (sizeof (v) == N, "size");
>_Static_assert (__alignof__ (v) == N, "alignment");
>
> with N set to 1LLU << I shows these failures:
>
>I < 29   succeeds
>I < 31   fails alignment
>I < 32   ICE
>I >= 32  fails alignment and size
>
> Attribute aligned doesn't seem to have any effect on types or
> variables declared with attribute vector_size.  The alignment
> set by the latter prevails.
>
> This happens no matter what scope the vector is defined in (i.e.,
> file or local).
>
> I have some questions:
>
> 1) Is there some reason to align vectors on the same boundary
> as their size no matter how big it is?  I can't find such
> a requirement in the ABIs I looked at.  Or would it be more
> appropriate to align the big ones on the preferred boundary
> for the target?  For instance, does it make more sense to
> align a 64KB vector on a 64KB boundary than on, say,
> a 64-byte boundary (or some other boundary less than 64K?)

I don't think there's a good reason.  Instead I think that
BIGGEST_ALIGNMENT is what we should go for as the upper limit;
anything bigger doesn't make sense (unless the user explicitly
requests it).
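To make the difference concrete, a small C example (a sketch of the proposed
behaviour, not what a current GCC prints): today the default alignment of the
vector below tracks its size, while under the proposal it would be capped at
the target's BIGGEST_ALIGNMENT unless the user explicitly asks for more with
attribute aligned.

#include <stdio.h>

/* A 64 KiB vector; today _Alignof (V) equals its size, the proposal would
   cap the default alignment at __BIGGEST_ALIGNMENT__ (a target-dependent
   value).  */
typedef char V __attribute__ ((vector_size (1 << 16)));

int
main (void)
{
  printf ("sizeof = %zu, alignof = %zu, BIGGEST_ALIGNMENT = %d\n",
          sizeof (V), _Alignof (V), __BIGGEST_ALIGNMENT__);
  return 0;
}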

> 2) If not, is it then appropriate to underalign very large
> vectors on a boundary less than their size?

Yes.

> 3) Should the aligned attribute not override the default vector
> alignment?

Yes, but doesn't it already?

> I would like to think the answer to (1) is that vectors should
> be aligned on the preferred boundary for the target/ABI.  If
> that's feasible, it should also obviate question (2).
>
> I believe the answer to (3) is yes.  If not, GCC should issue
> a warning that it doesn't honor the aligned attribute.
>
> Thanks
> Martin


Re: GSOC Proposal

2019-04-03 Thread Richard Biener
On Mon, 1 Apr 2019, nick wrote:

> 
> 
> On 2019-04-01 9:47 a.m., Richard Biener wrote:
> > On Mon, 1 Apr 2019, nick wrote:
> > 
> >> Well I'm talking about the shared roots of this garbage collector core 
> >> state 
> >> data structure or just struct ggc_root_tab.
> >>
> >> But also this seems that this to be no longer shared globally if I'm not 
> >> mistaken 
> >> or this:
> >> static vec extra_root_vec;
> >>
> >> Not sure after reading the code which is a bigger deal through so I wrote
> >> my proposal not just asking which is a better issue for not being thread
> >> safe. Sorry about that.
> >>
> >> As for the second question injection seems to not be the issue or outside
> >> callers but just internal so phase 3 or step 3 would now be:
> >> Find internal callers or users of x where x is one of the above rather
> >> than injecting outside callers. Which answers my second question about
> >> external callers being a issue still.
> >>
> >> Let me know which  of the two is a better issue:
> >> 1. struct ggc_root_tabs being shared
> >> 2.static vec extra_root_vec; as a shared heap or
> >> vector of root nodes for each type of allocation
> >>
> >> and I will gladly rewrite my proposal sections for that
> >> as needs to be reedited.
> > 
> > I don't think working on the garbage collector as a separate
> > GSoC project is useful at this point.  Doing locking around
> > allocation seems like a good short-term solution and if that
> > turns out to be a performance issue for the threaded part
> > using per-thread freelists is likely an easy to deploy
> > solution.
> > 
> > Richard.
> > 
> I agree but we were discussing this:
> Or maybe a project to be more
> explicit about regions of the code that assume that the garbage-
> collector can't run within them?[3] (since the GC is state that would
> be shared by the threads).

The process of collecting garbage is not the only issue (and that
very issue is most easily mitigated by collecting only at specific
points - which is what we do - and having those be serialization points).
The main issue is the underlying memory allocator (GCC uses memory
that is garbage collected plus regular heap memory).

> In addition I moved my paper back to our discussion about garbage collector
> state with outside callers.Seems we really need to do something about
> my wording as the idea of my project in a nutshell was to figure
> out how to mark shared state by callers and inject it into the
> garbage collector letting it known that the state was not shared between
> threads or shared. Seems that was on the GSoc page and in our discussions the 
> issue
> is marking outside code for shared state. If that's correct then my
> wording of outside callers is incorrect it should have been shared
> state between threads on outside callers to the garbage collector.
> If the state is that in your wording above then great as I understand
> where we are going and will gladly change my wording.

I'm still not sure what you are shooting at; the above sentences do
not make any sense to me.

> Also freelists don't work here as the state is shared at the caller's 
> end which would need two major issues:
> 1. Locking on nodes of the 
> freelists when two threads allocate at the same thing which can be a 
> problem if the shared state is shared a lot
> 2. Locking allocation with 
> large numbers of callers can starve threads

First of all, allocating memory from the GC pool is not the main
work of GIMPLE passes, so simply serializing at allocation time might
work out.  Second, free lists of course do work.  What you'd do is
have a fast path in allocation using a thread-local "free list"
which you can allocate from without taking any lock.  Maybe I should
explain "free list" since that term doesn't make too much sense in
a garbage-collector world.  What I'd do is, when a client thread
asks for memory of size N, allocate M objects of that size but put
M - 1 of them on the client thread's local "free list", to be allocated
from lock-free for the next M - 1 calls.  Note that garbage-collected
memory objects are only handed out in fixed chunks (powers of two plus
a few special sizes), so you'd have one "free list" per chunk size
per thread.

The collection itself (mark & sweep) would still be fully serialized
(and would not return anything to a thread's local "free list").

ggc_free'd objects _might_ go to the thread's "free lists" (yeah, we
_do_ have ggc_free ...).
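A toy illustration of that allocation fast path, standalone C rather than
actual ggc code: the refill count, the single size class and all names are
made up, and malloc stands in for the shared, serialized GC pool.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define REFILL_COUNT 16   /* "M" above: objects grabbed per refill (made up) */

/* The real thing would keep one list per chunk size per thread; a single
   size class keeps this sketch short.  */
struct free_obj { struct free_obj *next; };
static __thread struct free_obj *free_list;   /* thread-local, lock-free */

static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the shared, serialized GC-pool allocator.  */
static void *
pool_alloc_locked (size_t size)
{
  pthread_mutex_lock (&gc_lock);
  void *p = malloc (size);
  pthread_mutex_unlock (&gc_lock);
  return p;
}

/* Fast-path allocator; in this sketch all calls must use one size class
   and size must be at least sizeof (struct free_obj).  */
static void *
thread_alloc (size_t size)
{
  if (free_list)                       /* fast path: no lock taken */
    {
      void *p = free_list;
      free_list = free_list->next;
      return p;
    }
  /* Slow path: fetch M objects from the shared pool, hand out one and
     push the remaining M - 1 onto the thread-local free list.  */
  void *first = pool_alloc_locked (size);
  for (int i = 1; i < REFILL_COUNT; i++)
    {
      struct free_obj *o = pool_alloc_locked (size);
      o->next = free_list;
      free_list = o;
    }
  return first;
}

int
main (void)
{
  for (int i = 0; i < 64; i++)         /* only every 16th call hits the lock */
    thread_alloc (32);
  printf ("done\n");
  return 0;
}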

As said, I don't see GC or the memory allocator as sth interes

Re: GSoC Project Ideas

2019-04-03 Thread Richard Biener
On Tue, Apr 2, 2019 at 1:43 AM Patrick Palka  wrote:
>
> Hi Richard, Jakub and Martin,
>
> First of all I'm sorry for the very late reply, and I will be more
> punctual with my replies from now on.
>
> On Fri, Mar 8, 2019 at 4:35 AM Richard Biener
>  wrote:
> >
> > On Thu, Mar 7, 2019 at 7:20 PM Martin Sebor  wrote:
> > >
> > > On 3/4/19 6:17 AM, Richard Biener wrote:
> > > > On Mon, Mar 4, 2019 at 1:23 PM Jakub Jelinek  wrote:
> > > >>
> > > >> On Mon, Mar 04, 2019 at 01:13:29PM +0100, Richard Biener wrote:
> > > >>>>>* Make TREE_NO_WARNING more fine-grained
> > > >>>>>  (inspired by comment #7 of PR74762 [3])
> > > >>>>>TREE_NO_WARNING is currently used as a catch-all marker that 
> > > >>>>> inhibits all
> > > >>>>>warnings related to the marked expression.  The problem with 
> > > >>>>> this is that
> > > >>>>>if some warning routine sets the flag for its own purpose,
> > > >>>>>then that later may inhibit another unrelated warning from 
> > > >>>>> firing, see for
> > > >>>>>example PR74762.  Implementing a more fine-grained mechanism 
> > > >>>>> for
> > > >>>>>inhibiting particular warnings would eliminate such issues.
> > > >>>> Might be interesting.  You'd probably need to discuss the details 
> > > >>>> further.
> > > >>>
> > > >>> I guess an implementation could use TREE_NO_WARNING (or 
> > > >>> gimple_no_warning_p)
> > > >>> as indicator that there's out-of-bad detail information which could 
> > > >>> be stored as
> > > >>> a map keyed off either a location or a tree or gimple *.
> > > >>
> > > >> I guess on tree or gimple * is better, there would need to be some 
> > > >> hook for
> > > >> copy_node/gimple_copy that would add the info for the new copy as well 
> > > >> if
> > > >> the TREE_NO_WARNING or gimple_no_warning_p bit was set.  Plus there 
> > > >> could be
> > > >> some purging of this on the side information, e.g.  once code is 
> > > >> handed over
> > > >> from the FE to the middle-end (maybe do that only at free_lang_data 
> > > >> time),
> > > >> for any warnings that are FE only there is no need to keep records in 
> > > >> the on
> > > >> the side mapping that have info about those FE warnings only, as later 
> > > >> on
> > > >> the FE warnings will not be reported anymore.
> > > >> The implementation could be e.g. a hash map from tree/gimple * 
> > > >> (pointers) to
> > > >> bitmaps of warning numbers, with some hash table to ensure that the 
> > > >> same
> > > >> bitmap is used for all the spots that need to have the same set of 
> > > >> warnings
> > > >> disabled.
>
> This design makes a lot of sense, thank you for this!
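For what it's worth, a tiny freestanding sketch of that side table: plain C
with made-up names and a fixed-size probing table keyed on the tree/gimple
pointer, each key carrying a bitmask of warning numbers.  As the quoted
design suggests, the real thing would use GCC's hash_map and bitmap types
and handle copying/purging; this only shows the lookup shape.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024               /* power of two, made up; no resizing */

struct nowarn_entry
{
  const void *key;                    /* the tree or gimple * being marked */
  uint64_t mask;                      /* bit N set => warning N suppressed
                                         (toy limit of 64 warning numbers) */
};

static struct nowarn_entry table[TABLE_SIZE];

static struct nowarn_entry *
lookup (const void *key)
{
  size_t i = ((uintptr_t) key >> 3) & (TABLE_SIZE - 1);
  while (table[i].key && table[i].key != key)
    i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing; no deletion here */
  table[i].key = key;
  return &table[i];
}

/* Replaces "set TREE_NO_WARNING" for one specific warning number.  */
static void
suppress_warning_at (const void *key, unsigned wcode)
{
  lookup (key)->mask |= UINT64_C (1) << (wcode & 63);
}

/* Queried by the diagnostic machinery before emitting warning WCODE.  */
static bool
warning_suppressed_at (const void *key, unsigned wcode)
{
  return (lookup (key)->mask >> (wcode & 63)) & 1;
}

int
main (void)
{
  int dummy_stmt;                     /* stands in for a gimple statement */
  suppress_warning_at (&dummy_stmt, 7);
  return warning_suppressed_at (&dummy_stmt, 7) ? 0 : 1;
}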
>
> > > >
> > > > A possibly related project is to "defer" output of diagnostics until we 
> > > > know
> > > > the stmt/expression we emit it for survived dead code elimination.  
> > > > Here there's
> > > > the question what to key the diagnostic off and how to move it (that 
> > > > is, detect
> > > > if the code causing it really fully went dead).
>
> Interesting.  Which diagnostics would you have in mind to defer in this way?
>
> > >
> > > Another (maybe only remotely related) aspect of this project might
> > > be getting #pragma GCC diagnostic to work reliably with middle-end
> > > warnings emitted for inlined code.  That it doesn't work is one of
> > > the frustrations for users who run into false positives with "late"
> > > warnings like -Wstringop-overflow or -Wformat-overflow.
>
> Thank you Martin for bringing this up!
>
> >
> > A similar issue is they are not carried along from compile-time to
> > LTO link time.  I'm not even sure how they are attached to anything
> > right now ... certainly not in DECL_FUNCTION_SPECIFIC_OPTIMIZATION.
>
> This is good to know too.
>
> I know that there is only a week left to submit a proposal, but I am
> thinking of a projec

Re: vector alignment

2019-04-03 Thread Richard Biener
On April 3, 2019 7:59:47 PM GMT+02:00, Martin Sebor  wrote:
>On 4/3/19 5:13 AM, Richard Biener wrote:
>> On Tue, Apr 2, 2019 at 6:20 PM Martin Sebor  wrote:
>>>
>>> GCC tries to align a vector on its natural boundary, i.e., that
>>> given by its size, up to MAX_OBJECT_ALIGNMENT.  Vectors that are
>>> bigger than that are either silently [mis]aligned on that same
>>> maximum boundary (PR 89798), silently truncated (and misaligned),
>>> or cause an ICE (PR 89797).  Compiling the following:
>>>
>>> __attribute__ ((vector_size (N))) char v;
>>>
>>> _Static_assert (sizeof (v) == N, "size");
>>> _Static_assert (__alignof__ (v) == N, "alignment");
>>>
>>> with N set to 1LLU << I shows these failures:
>>>
>>> I < 29   succeeds
>>> I < 31   fails alignment
>>> I < 32   ICE
>>> I >= 32  fails alignment and size
>>>
>>> Attribute aligned doesn't seem to have any effect on types or
>>> variables declared with attribute vector_size.  The alignment
>>> set by the latter prevails.
>>>
>>> This happens no matter what scope the vector is defined in (i.e.,
>>> file or local).
>>>
>>> I have some questions:
>>>
>>> 1) Is there some reason to align vectors on the same boundary
>>>  as their size no matter how big it is?  I can't find such
>>>  a requirement in the ABIs I looked at.  Or would it be more
>>>  appropriate to align the big ones on the preferred boundary
>>>  for the target?  For instance, does it make more sense to
>>>  align a 64KB vector on a 64KB boundary than on, say,
>>>  a 64-byte boundary (or some other boundary less than 64K?)
>> 
>> I don't think there's a good reason.  Instead I think that
>> BIGGEST_ALIGNMENT is what we should go for as upper limit,
>> anything bigger doesn't make sense (unless the user explicitely
>> requests it).
>
>Sounds good.  Changing the alignment will impact object layout.

Do we really apply that alignment there?  How do other compilers lay things out here?

>How do you suggest to deal with it? (Presumably for GCC 10.)
>Issuing an ABI warning and adding an option to override
>the new setting come to mind as possible mitigating solutions.

We could reject these vector types in aggregates in favor of arrays. Of course 
that ship has sailed... 

>> 
>>> 2) If not, is it then appropriate to underalign very large
>>>  vectors on a boundary less than their size?
>> 
>> Yes.
>
>Ack.
>
>> 
>>> 3) Should the aligned attribute not override the default vector
>>>  alignment?
>> 
>> Yes, but doesn't it already?
>
>Not if both are specified on the same declaration, as in:
>
>   typedef __attribute__ ((aligned (1024), vector_size (256))) int V;
>
>I opened PR 89950 for this.
>
>Martin



Re: Putting an all-zero variable into BSS

2019-04-05 Thread Richard Biener
On Thu, Apr 4, 2019 at 9:53 PM Thomas Koenig  wrote:
>
> Hi Andreas,
>
> >> Well, nothing is going to write to it (this is not accessible by
> >> user code), so that should not be a problem.
> > Then don't make it read-only.
>
> I tried this, and while it solves the executable size problem, it
> causes an OpenMP regression (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84487#c22 ),
> so now I am out of ideas.
>
> Oh well, I would have liked fixing this before 9.0, but it
> seems that this may not be possible.

Putting read-only data into .rodata isn't required by the C standard, I think,
so we could freely choose .bss for data exceeding a reasonable
size limit.  IIRC GCC behaved one way or the other in the past already,
and the last change might have been due to security concerns.  Btw, large
all-zero constant objects don't make very much sense...
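As a C stand-in for the Fortran case (hedged: actual section placement varies
by target and by flags such as -fzero-initialized-in-bss, and the variable is
made up), the object below is all zeros and read-only; emitted as initialized
data it bloats the executable, whereas a .bss placement costs almost nothing
on disk.

#include <stdio.h>

/* 8 MiB of zeros, never written to.  */
static const double zeros[1 << 20];

int
main (void)
{
  printf ("%f\n", zeros[0]);
  return 0;
}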

Richard.


Re: GSOC Proposal

2019-04-05 Thread Richard Biener
On Wed, 3 Apr 2019, nick wrote:

> 
> 
> On 2019-04-03 7:30 a.m., Richard Biener wrote:
> > On Mon, 1 Apr 2019, nick wrote:
> > 
> >>
> >>
> >> On 2019-04-01 9:47 a.m., Richard Biener wrote:
> >>> On Mon, 1 Apr 2019, nick wrote:
> >>>
> >>>> Well I'm talking about the shared roots of this garbage collector core 
> >>>> state 
> >>>> data structure or just struct ggc_root_tab.
> >>>>
> >>>> But also this seems that this to be no longer shared globally if I'm not 
> >>>> mistaken 
> >>>> or this:
> >>>> static vec extra_root_vec;
> >>>>
> >>>> Not sure after reading the code which is a bigger deal through so I wrote
> >>>> my proposal not just asking which is a better issue for not being thread
> >>>> safe. Sorry about that.
> >>>>
> >>>> As for the second question injection seems to not be the issue or outside
> >>>> callers but just internal so phase 3 or step 3 would now be:
> >>>> Find internal callers or users of x where x is one of the above rather
> >>>> than injecting outside callers. Which answers my second question about
> >>>> external callers being a issue still.
> >>>>
> >>>> Let me know which  of the two is a better issue:
> >>>> 1. struct ggc_root_tabs being shared
> >>>> 2.static vec extra_root_vec; as a shared heap or
> >>>> vector of root nodes for each type of allocation
> >>>>
> >>>> and I will gladly rewrite my proposal sections for that
> >>>> as needs to be reedited.
> >>>
> >>> I don't think working on the garbage collector as a separate
> >>> GSoC project is useful at this point.  Doing locking around
> >>> allocation seems like a good short-term solution and if that
> >>> turns out to be a performance issue for the threaded part
> >>> using per-thread freelists is likely an easy to deploy
> >>> solution.
> >>>
> >>> Richard.
> >>>
> >> I agree but we were discussing this:
> >> Or maybe a project to be more
> >> explicit about regions of the code that assume that the garbage-
> >> collector can't run within them?[3] (since the GC is state that would
> >> be shared by the threads).
> > 
> > The process of collecting garbage is not the only issue (and that
> > very issue is easiest mitigated by collecting only at specific
> > points - which is what we do - and have those be serializing points).
> > The main issue is the underlying memory allocator (GCC uses memory
> > that is garbage collected plus regular heap memory).
> > 
> >> In addition I moved my paper back to our discussion about garbage collector
> >> state with outside callers.Seems we really need to do something about
> >> my wording as the idea of my project in a nutshell was to figure
> >> out how to mark shared state by callers and inject it into the
> >> garbage collector letting it known that the state was not shared between
> >> threads or shared. Seems that was on the GSoc page and in our discussions 
> >> the issue
> >> is marking outside code for shared state. If that's correct then my
> >> wording of outside callers is incorrect it should have been shared
> >> state between threads on outside callers to the garbage collector.
> >> If the state is that in your wording above then great as I understand
> >> where we are going and will gladly change my wording.
> > 
> > I'm still not sure what you are shooting at, the above sentences do
> > not make any sense to me.
> > 
> >> Also freelists don't work here as the state is shared at the caller's 
> >> end which would need two major issues:
> >> 1. Locking on nodes of the 
> >> freelists when two threads allocate at the same thing which can be a 
> >> problem if the shared state is shared a lot
> >> 2. Locking allocation with 
> >> large numbers of callers can starve threads
> > 
> > First of all allocating memory from the GC pool is not the main
> > work of GIMPLE passes so simply serializing at allocation time might
> > work out.  Second free lists of course do work.  What you'd do is
> > have a fast path in allocation using a thread-local "free list"
> > which you can allocate from without taking any lock.  Maybe I should
> 

Re: Putting an all-zero variable into BSS

2019-04-07 Thread Richard Biener
On April 6, 2019 3:59:41 PM GMT+02:00, Thomas Koenig  
wrote:
>Am 05.04.19 um 12:15 schrieb Richard Biener:
>
>> Putting readonly data into .rodata isn't required by the C standard I
>think
>> so we could freely choose .bss for data exceeding a reasonable
>> size limit.
>
>That would be the best solution, I think.
>
>> IIRC GCC behaved one or another way in the past already
>> and the last change might be due to security concerns.
>
>I cannot speak to that. If there is concern for C, we could also
>limit this to Fortran.
>
>> Btw, large
>> all-zeros constant objects don't make very much sense...
>
>I am well aware of this, but we're not going to change this
>before the GCC 9 release :-)
>
>So, would it be possible for you to make the change wrt .bss?
>I would not have the first idea where to start looking.

I don't know without looking, but I'd start at assemble_variable in varasm.c.

Richard. 

>Regards
>
>   Thomas



Re: GSOC Proposal

2019-04-07 Thread Richard Biener
On April 5, 2019 6:11:15 PM GMT+02:00, nick  wrote:
>
>
>On 2019-04-05 6:25 a.m., Richard Biener wrote:
>> On Wed, 3 Apr 2019, nick wrote:
>> 
>>>
>>>
>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>
>>>>>>> Well I'm talking about the shared roots of this garbage
>collector core state 
>>>>>>> data structure or just struct ggc_root_tab.
>>>>>>>
>>>>>>> But also this seems that this to be no longer shared globally if
>I'm not mistaken 
>>>>>>> or this:
>>>>>>> static vec extra_root_vec;
>>>>>>>
>>>>>>> Not sure after reading the code which is a bigger deal through
>so I wrote
>>>>>>> my proposal not just asking which is a better issue for not
>being thread
>>>>>>> safe. Sorry about that.
>>>>>>>
>>>>>>> As for the second question injection seems to not be the issue
>or outside
>>>>>>> callers but just internal so phase 3 or step 3 would now be:
>>>>>>> Find internal callers or users of x where x is one of the above
>rather
>>>>>>> than injecting outside callers. Which answers my second question
>about
>>>>>>> external callers being a issue still.
>>>>>>>
>>>>>>> Let me know which  of the two is a better issue:
>>>>>>> 1. struct ggc_root_tabs being shared
>>>>>>> 2.static vec extra_root_vec; as a shared
>heap or
>>>>>>> vector of root nodes for each type of allocation
>>>>>>>
>>>>>>> and I will gladly rewrite my proposal sections for that
>>>>>>> as needs to be reedited.
>>>>>>
>>>>>> I don't think working on the garbage collector as a separate
>>>>>> GSoC project is useful at this point.  Doing locking around
>>>>>> allocation seems like a good short-term solution and if that
>>>>>> turns out to be a performance issue for the threaded part
>>>>>> using per-thread freelists is likely an easy to deploy
>>>>>> solution.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>> I agree but we were discussing this:
>>>>> Or maybe a project to be more
>>>>> explicit about regions of the code that assume that the garbage-
>>>>> collector can't run within them?[3] (since the GC is state that
>would
>>>>> be shared by the threads).
>>>>
>>>> The process of collecting garbage is not the only issue (and that
>>>> very issue is easiest mitigated by collecting only at specific
>>>> points - which is what we do - and have those be serializing
>points).
>>>> The main issue is the underlying memory allocator (GCC uses memory
>>>> that is garbage collected plus regular heap memory).
>>>>
>>>>> In addition I moved my paper back to our discussion about garbage
>collector
>>>>> state with outside callers.Seems we really need to do something
>about
>>>>> my wording as the idea of my project in a nutshell was to figure
>>>>> out how to mark shared state by callers and inject it into the
>>>>> garbage collector letting it known that the state was not shared
>between
>>>>> threads or shared. Seems that was on the GSoc page and in our
>discussions the issue
>>>>> is marking outside code for shared state. If that's correct then
>my
>>>>> wording of outside callers is incorrect it should have been shared
>>>>> state between threads on outside callers to the garbage collector.
>>>>> If the state is that in your wording above then great as I
>understand
>>>>> where we are going and will gladly change my wording.
>>>>
>>>> I'm still not sure what you are shooting at, the above sentences do
>>>> not make any sense to me.
>>>>
>>>>> Also freelists don't work here as the state is shared at the
>caller's 
>>>>> end which would need two major issues:
>>>>> 1. Locking on nodes of the 
>

Re: GSOC Proposal

2019-04-08 Thread Richard Biener
On Sun, 7 Apr 2019, nick wrote:

> 
> 
> On 2019-04-07 5:31 a.m., Richard Biener wrote:
> > On April 5, 2019 6:11:15 PM GMT+02:00, nick  wrote:
> >>
> >>
> >> On 2019-04-05 6:25 a.m., Richard Biener wrote:
> >>> On Wed, 3 Apr 2019, nick wrote:
> >>>
> >>>>
> >>>>
> >>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
> >>>>> On Mon, 1 Apr 2019, nick wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
> >>>>>>> On Mon, 1 Apr 2019, nick wrote:
> >>>>>>>
> >>>>>>>> Well I'm talking about the shared roots of this garbage
> >> collector core state 
> >>>>>>>> data structure or just struct ggc_root_tab.
> >>>>>>>>
> >>>>>>>> But also this seems that this to be no longer shared globally if
> >> I'm not mistaken 
> >>>>>>>> or this:
> >>>>>>>> static vec extra_root_vec;
> >>>>>>>>
> >>>>>>>> Not sure after reading the code which is a bigger deal through
> >> so I wrote
> >>>>>>>> my proposal not just asking which is a better issue for not
> >> being thread
> >>>>>>>> safe. Sorry about that.
> >>>>>>>>
> >>>>>>>> As for the second question injection seems to not be the issue
> >> or outside
> >>>>>>>> callers but just internal so phase 3 or step 3 would now be:
> >>>>>>>> Find internal callers or users of x where x is one of the above
> >> rather
> >>>>>>>> than injecting outside callers. Which answers my second question
> >> about
> >>>>>>>> external callers being a issue still.
> >>>>>>>>
> >>>>>>>> Let me know which  of the two is a better issue:
> >>>>>>>> 1. struct ggc_root_tabs being shared
> >>>>>>>> 2.static vec extra_root_vec; as a shared
> >> heap or
> >>>>>>>> vector of root nodes for each type of allocation
> >>>>>>>>
> >>>>>>>> and I will gladly rewrite my proposal sections for that
> >>>>>>>> as needs to be reedited.
> >>>>>>>
> >>>>>>> I don't think working on the garbage collector as a separate
> >>>>>>> GSoC project is useful at this point.  Doing locking around
> >>>>>>> allocation seems like a good short-term solution and if that
> >>>>>>> turns out to be a performance issue for the threaded part
> >>>>>>> using per-thread freelists is likely an easy to deploy
> >>>>>>> solution.
> >>>>>>>
> >>>>>>> Richard.
> >>>>>>>
> >>>>>> I agree but we were discussing this:
> >>>>>> Or maybe a project to be more
> >>>>>> explicit about regions of the code that assume that the garbage-
> >>>>>> collector can't run within them?[3] (since the GC is state that
> >> would
> >>>>>> be shared by the threads).
> >>>>>
> >>>>> The process of collecting garbage is not the only issue (and that
> >>>>> very issue is easiest mitigated by collecting only at specific
> >>>>> points - which is what we do - and have those be serializing
> >> points).
> >>>>> The main issue is the underlying memory allocator (GCC uses memory
> >>>>> that is garbage collected plus regular heap memory).
> >>>>>
> >>>>>> In addition I moved my paper back to our discussion about garbage
> >> collector
> >>>>>> state with outside callers.Seems we really need to do something
> >> about
> >>>>>> my wording as the idea of my project in a nutshell was to figure
> >>>>>> out how to mark shared state by callers and inject it into the
> >>>>>> garbage collector letting it known that the state was not shared
> >> between
> >>>>>> threads or shared. Seems that was on the GSoc page and in our
> >> dis

Re: non-volatile automatic variables in setjmp tests

2019-04-08 Thread Richard Biener
On Fri, Apr 5, 2019 at 6:25 PM Michael Matz  wrote:
>
> Hello,
>
> On Fri, 5 Apr 2019, Jozef Lawrynowicz wrote:
>
> > Some setjmp/longjmp tests[1] depend on the value of an auto set before 
> > setjmp
> > to be retained after returning from the longjmp. As I understand, this
> > behaviour is actually undefined, according to the gccint manual.
> >
> > Section 3 "Interfacing to GCC Output" of gccint says:
> >   If you use longjmp, beware of automatic variables. ISO C says that 
> > automatic
> >   variables that are not declared volatile have undefined values after a
> >   longjmp. And this is all GCC promises to do, because it is very difficult 
> > to
> >   restore register variables correctly, and one of GCC’s features is that it
> >   can put variables in registers without your asking it to.
>
> That is very old text, from 1997, and doesn't reflect what GCC actually
> does, which is ...
>
> > However, ISO C says [2]:
> >   ... values of objects of automatic storage duration are unspecified if 
> > they
> >   meet all the following conditions:
> >   * They are local to the function containing the corresponding setjmp()
> > invocation.
> >   * They do not have volatile-qualified type.
> >   * They are changed between the setjmp() invocation and longjmp() call.
>
> ... supporting this (and in any case if there's a direct conflict between
> the GCC docu and a relevant standard, then the former is likely the wrong
> one).  There are two modi: (a) the target has a setjmp/longjmp combination
> which restores all callee-saved registers (to the values at just before
> the setjmp call) and (b) the target doesn't save all these.
>
> For (a) GCC treats the setjmp call as a normal function call from a
> register-clobber perspective, so any auto variable live over the
> setjmp that is stored in a callee-saved register is restored to the
> pre-setjmp value.
>
> For (b) GCC makes sure to not allocate variables live over setjmp to
> registers.  This is the conservative assumption.
>
> Most targets in GCC follow the (b) model, even though the specific
> setjmp/longjmp mechanism would allow for (a).
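For reference, a minimal self-contained example of the pattern under
discussion (not one of the cited testcases): 'v' is live across setjmp and
changed before the longjmp, so ISO C only guarantees its value afterwards
when it is volatile-qualified, while the (a)/(b) handling described above is
what lets GCC make the non-volatile case work too.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

int main (void)
{
  int v = 1;            /* ISO C only guarantees the value if declared volatile */
  if (setjmp (env) == 0)
    {
      v = 2;            /* changed between setjmp and longjmp */
      longjmp (env, 1);
    }
  printf ("%d\n", v);   /* expected to print 2 given the behaviour described above */
  return 0;
}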
>
> > gcc.dg/torture/stackalign/setjmp-1.c and
> > gcc.c-torture/execute/built-in-setjmp.c actually fail at execution for
> > msp430-elf @ -O1, because the auto being tested after the longjmp has not 
> > been
> > restored correctly.
>
> So the testcases should indeed work without volatile and you need to
> investigate why the restoring doesn't happen correctly.

There's one known "hole" in that the abnormal edges created to support all
of the above on GIMPLE are thrown away at RTL expansion time and
expected to be re-created "properly" but that doesn't actually happen.

The proper fix is to not throw them away but carry them over (with the
additional complication that you have to deal with that
find-many-sub-basic-block
case in some way).

Not sure if in this case we run into an RTL optimization that breaks things
(PRE / scheduling / invariant motion are candidates).

> > These tests feature a contrived way of making the auto appear used up to the
> > longjmp, so I wonder if they have been written deliberately without a
> > volatile, or if it was just an oversight.
> >
> > For msp430-elf @ -O1, this code to make the auto appear used is
> > ineffective as an optimization replaces the variable with a constant,
> > which is what triggers the test failure.
>
> Well, then it should be the correct constant, and hence be unaffected by
> setjmp/longjmp, so if that still doesn't work the constant is wrong,
> right?
>
> > Are these tests that rely on the value of a non-volatile auto after a
> > longjmp invalid? I can patch them to make the auto variable a global
> > instead.
>
> No, they should work as is.
>
>
> Ciao,
> Michael.


Re: is re-running bootstrap after a change safe?

2019-04-08 Thread Richard Biener
On Sat, Apr 6, 2019 at 1:09 AM Martin Sebor  wrote:
>
> On 4/5/19 4:02 PM, Jeff Law wrote:
> > On 4/5/19 3:37 PM, Martin Sebor wrote:
> >> On 4/5/19 3:29 PM, Jeff Law wrote:
> >>> On 4/5/19 2:50 PM, Eric Botcazou wrote:
> > Say if the first bootstrap succeeds and I then change a single
> > GCC .c file and rerun make bootstrap, am I guaranteed to see
> > the same fallout of the change as I would if I did a pristine
> > build in a clean directory?
> 
>  No, this would imply deleting the stage2 and stage3 compilers and
>  that isn't
>  what happens.  Instead the compiler of each stage is updated in
>  isolation.
> 
> >>> RIght.  Thus I always blow away stage2-* stage3-*, and stage1 target
> >>> directories along with the "compare" stamp file.
> >>
> >> Thanks (all of you).  It's amazing that I have been getting away
> >> with it for all these years.
> > I got away without removing the "compare" stamp file for a long time,
> > then broke the trunk with a comparison failure :(
> >
> >>
> >> Why is this not done automatically?  I mean, what is the use case
> >> for make bootstrap without doing these steps first?
> > During development folks often want to rebuild without going through a
> > full bootstrap.  Obviously for testing the final version of a patch the
> > "quick" approach of just rebuilding without blowing away the stage
> > directories isn't sufficient.
>
> I see.  So after a bootstrap and a subsequent change to a .c file,
> at each stage the next bootstrap recompiles just the changed file
> and relinks gcc.  It doesn't actually recompile all source files
> in stage 2 or 3 with the changed compiler from the last stage.
> That's why it's so much faster!  Make check then correctly reflects
> the change but the compiler doesn't get as fully exercised because
> the test suite has low coverage.
>
> > This is actually one of the things I'd really like to just automate.
> > You point to a git commit in a public repo, the tester picks it up and
> > does a bootstrap & regression test from scratch on whatever targets you
> > ask for.
>
> That would be great to validate the final patch.
>
> So to be clear: the safest and also most efficient way to "rebootstrap"
> GCC is to remove what exactly?

Just remove everything?  Toplevel configure and stage1 build is
fast anyway.

That's what I'm doing since forever.

Richard.

>  (I don't see any stage2 or stage3
> directories in my build tree.)  Is there a make target for this?
>
> Thanks
> Martin


Re: GSOC

2019-04-08 Thread Richard Biener
On Sun, Apr 7, 2019 at 11:10 AM ashwina kumar  wrote:
>
> Hi ,
>
> While working I just figured out that -Wconversion is buggy. Please see the
> below code- -
>
> $ cat b.c
> #include <stdint.h>
>
> void main (void)
> {
> //contains build errors
> uint16_t x = 1;
> uint16_t y = 2;
> y += x;  /* contains error */
>
> }
>
> $ gcc b.c -Wconversion
> b.c: In function ‘main’:
> b.c:22:7: warning: conversion to ‘uint16_t {aka short unsigned int}’
> from ‘int’ may alter its value [-Wconversion]
>   y += x;  /* contains error */

The warning is correct unless you factor in that x == 1 and y == 2.
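The warning triggers because of the usual integer promotions; a minimal
sketch of what the compound assignment amounts to (plain C semantics, not a
GCC-internal detail; the function name is made up):

#include <stdint.h>

uint16_t add (uint16_t x, uint16_t y)
{
  /* y += x is evaluated as y = y + x; both operands are promoted to int,
     the addition is done in int, and the result is implicitly narrowed
     back to uint16_t -- that narrowing is what -Wconversion reports.  */
  y = (uint16_t) (y + x);   /* an explicit cast documents the narrowing */
  return y;
}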

> Please help me to know as an GSOC student can I work on this for this year
> to make -Wconversion more robust.
>
> Thanks & Regards,
> Ashwina
>
> --
> Ashwina Kumar
> BIT Mesra


Re: Putting an all-zero variable into BSS

2019-04-08 Thread Richard Biener
On Mon, Apr 8, 2019 at 10:38 AM Andrew Haley  wrote:
>
> On 4/7/19 5:03 PM, Thomas Koenig wrote:
> > Hi Richard,
> >
> >> I don't know without looking, but I'd start at assemble_variable in 
> >> varasm.c.
> >
> > Thanks.  I've done that, and this is what a patch could look like.
> > However, I will not have time to formally submit this until next
> > weekend.
> >
> > In the meantime, comments are still welcome :-)
>
> Did you look at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83100
>
> This was the change that caused this behaviour.

Actually that just changed the behavior for DECL_COMMONs
which may of course match the fortran case here in case you
bisected this.  OTOH DECL_COMMONs are tentative and
do not have an initializer (and do not go to .rodata either).

Richard.

>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. 
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: Putting an all-zero variable into BSS

2019-04-08 Thread Richard Biener
On Mon, Apr 8, 2019 at 11:33 AM Richard Biener
 wrote:
>
> On Mon, Apr 8, 2019 at 10:38 AM Andrew Haley  wrote:
> >
> > On 4/7/19 5:03 PM, Thomas Koenig wrote:
> > > Hi Richard,
> > >
> > >> I don't know without looking, but I'd start at assemble_variable in 
> > >> varasm.c.
> > >
> > > Thanks.  I've done that, and this is what a patch could look like.
> > > However, I will not have time to formally submit this until next
> > > weekend.
> > >
> > > In the meantime, comments are still welcome :-)
> >
> > Did you look at
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83100
> >
> > This was the change that caused this behaviour.
>
> Actually that just changed the behavior for DECL_COMMONs
> which may of course match the fortran case here in case you
> bisected this.  OTOH DECL_COMMONs are tentative and
> do not have an initializer (and not go to .rodata either).

That is, the C testcase

const char x[1024*1024] = {};

reproduces the "issue".  The comment in bss_initializer_p though
explicitly says

  /* Do not put non-common constants into the .bss section, they belong in
 a readonly section, except when NAMED is true.  */
  return ((!TREE_READONLY (decl) || DECL_COMMON (decl) || named)

(where named refers to explicit .bss section marked decls).  Note
the docs for -fzero-initialized-in-bss don't mention that this doesn't
apply to readonly variables.
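A minimal illustration of the distinction the comment draws (assuming the
default -fzero-initialized-in-bss; the section names are the usual ELF ones):

char writable_zeros[1024 * 1024] = {};        /* all-zero and writable: .bss candidate */
const char readonly_zeros[1024 * 1024] = {};  /* all-zero but const: a read-only
                                                 section, per bss_initializer_p above */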

Richard.

> Richard.
>
> >
> > --
> > Andrew Haley
> > Java Platform Lead Engineer
> > Red Hat UK Ltd. <https://www.redhat.com>
> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: non-volatile automatic variables in setjmp tests

2019-04-08 Thread Richard Biener
On Mon, Apr 8, 2019 at 2:31 PM Michael Matz  wrote:
>
> Hi,
>
> On Mon, 8 Apr 2019, Richard Biener wrote:
>
> > Not sure if in this case we run into an RTL optimization that breaks things
> > (PRE / scheduling / invariant motion are candidates).
>
> That's true, what Jozef sees might point to a genuine bug in the
> middle-end observed only on msp430; but we do want to make this situation
> work generally, as required by ISO C, not like how it's spelled in our
> manual.

Yes, and there's at least one existing bug, PR57067 for which it was
observed the scheduler generates wrong code.

Richard.

>
> Ciao,
> Michael.


Re: GSOC Proposal

2019-04-08 Thread Richard Biener
On Mon, 8 Apr 2019, nick wrote:

> 
> 
> On 2019-04-08 3:29 a.m., Richard Biener wrote:
> > On Sun, 7 Apr 2019, nick wrote:
> > 
> >>
> >>
> >> On 2019-04-07 5:31 a.m., Richard Biener wrote:
> >>> On April 5, 2019 6:11:15 PM GMT+02:00, nick  wrote:
> >>>>
> >>>>
> >>>> On 2019-04-05 6:25 a.m., Richard Biener wrote:
> >>>>> On Wed, 3 Apr 2019, nick wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
> >>>>>>> On Mon, 1 Apr 2019, nick wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
> >>>>>>>>> On Mon, 1 Apr 2019, nick wrote:
> >>>>>>>>>
> >>>>>>>>>> Well I'm talking about the shared roots of this garbage
> >>>> collector core state 
> >>>>>>>>>> data structure or just struct ggc_root_tab.
> >>>>>>>>>>
> >>>>>>>>>> But also this seems that this to be no longer shared globally if
> >>>> I'm not mistaken 
> >>>>>>>>>> or this:
> >>>>>>>>>> static vec extra_root_vec;
> >>>>>>>>>>
> >>>>>>>>>> Not sure after reading the code which is a bigger deal through
> >>>> so I wrote
> >>>>>>>>>> my proposal not just asking which is a better issue for not
> >>>> being thread
> >>>>>>>>>> safe. Sorry about that.
> >>>>>>>>>>
> >>>>>>>>>> As for the second question injection seems to not be the issue
> >>>> or outside
> >>>>>>>>>> callers but just internal so phase 3 or step 3 would now be:
> >>>>>>>>>> Find internal callers or users of x where x is one of the above
> >>>> rather
> >>>>>>>>>> than injecting outside callers. Which answers my second question
> >>>> about
> >>>>>>>>>> external callers being a issue still.
> >>>>>>>>>>
> >>>>>>>>>> Let me know which  of the two is a better issue:
> >>>>>>>>>> 1. struct ggc_root_tabs being shared
> >>>>>>>>>> 2.static vec extra_root_vec; as a shared
> >>>> heap or
> >>>>>>>>>> vector of root nodes for each type of allocation
> >>>>>>>>>>
> >>>>>>>>>> and I will gladly rewrite my proposal sections for that
> >>>>>>>>>> as needs to be reedited.
> >>>>>>>>>
> >>>>>>>>> I don't think working on the garbage collector as a separate
> >>>>>>>>> GSoC project is useful at this point.  Doing locking around
> >>>>>>>>> allocation seems like a good short-term solution and if that
> >>>>>>>>> turns out to be a performance issue for the threaded part
> >>>>>>>>> using per-thread freelists is likely an easy to deploy
> >>>>>>>>> solution.
> >>>>>>>>>
> >>>>>>>>> Richard.
> >>>>>>>>>
> >>>>>>>> I agree but we were discussing this:
> >>>>>>>> Or maybe a project to be more
> >>>>>>>> explicit about regions of the code that assume that the garbage-
> >>>>>>>> collector can't run within them?[3] (since the GC is state that
> >>>> would
> >>>>>>>> be shared by the threads).
> >>>>>>>
> >>>>>>> The process of collecting garbage is not the only issue (and that
> >>>>>>> very issue is easiest mitigated by collecting only at specific
> >>>>>>> points - which is what we do - and have those be serializing
> >>>> points).
> >>>>>>> The main issue is the underlying memory allocator (GCC uses memory
> >>>>>>> tha

Re: GCC 8 vs. GCC 9 speed and size comparison

2019-04-16 Thread Richard Biener
On Tue, Apr 16, 2019 at 10:53 AM Michael Matz  wrote:
>
> Hello Martin,
>
> On Tue, 16 Apr 2019, Martin Liška wrote:
>
> > Yes, except kdecore.cc I used in all cases .ii pre-processed files. I'm
> > going to start using kdecore.ii as well.
>
> If the kdecore.cc is the one from me it's also preprocessed and doesn't
> contain any #include directives, I just edited it somewhat to be
> compilable for different architecture.

Btw, the tramp3d sources on our testers _do_ contain #include directives.

Richard.

>
> Ciao,
> Michael.
>
> >
> > As Honza pointed out in the email that hasn't reached this mailing list
> > due to file size, there's a significant change in inline-unit-growth. The 
> > param
> > has changed from 20 to 40 for GCC 9. Using --param inline-unit-growth=20 
> > for all
> > benchmarks, I see green numbers for GCC 9!
> >
> > Martin
> >
> > >
> > >
> > > Ciao,
> > > Michael.
> > >
> >
> >


Re: GCC 8 vs. GCC 9 speed and size comparison

2019-04-16 Thread Richard Biener
On Tue, Apr 16, 2019 at 11:56 AM Richard Biener
 wrote:
>
> On Tue, Apr 16, 2019 at 10:53 AM Michael Matz  wrote:
> >
> > Hello Martin,
> >
> > On Tue, 16 Apr 2019, Martin Liška wrote:
> >
> > > Yes, except kdecore.cc I used in all cases .ii pre-processed files. I'm
> > > going to start using kdecore.ii as well.
> >
> > If the kdecore.cc is the one from me it's also preprocessed and doesn't
> > contain any #include directives, I just edited it somewhat to be
> > compilable for different architecture.
>
> Btw, the tramp3d sources on our testers _do_ contain #include directives.

So for the parser it's small differences that accumulate, for example
a lot more comptype calls via null_ptr_cst_p (via char_type_p) via the new
conversion_null_warnings which is called even without any warning option.

Possible speedup to null_ptr_cst_p is to avoid the expensive char_type_p
(called 5 times in GCC 9 vs. only 2000 times in GCC 8):

Index: gcc/cp/call.c
===
--- gcc/cp/call.c   (revision 270387)
+++ gcc/cp/call.c   (working copy)
@@ -541,11 +541,11 @@ null_ptr_cst_p (tree t)
   STRIP_ANY_LOCATION_WRAPPER (t);

   /* Core issue 903 says only literal 0 is a null pointer constant.  */
-  if (TREE_CODE (type) == INTEGER_TYPE
- && !char_type_p (type)
- && TREE_CODE (t) == INTEGER_CST
+  if (TREE_CODE (t) == INTEGER_CST
+ && !TREE_OVERFLOW (t)
+ && TREE_CODE (type) == INTEGER_TYPE
  && integer_zerop (t)
- && !TREE_OVERFLOW (t))
+ && !char_type_p (type))
return true;
 }
   else if (CP_INTEGRAL_TYPE_P (type))

brings down the number of char_type_p calls to ~5000.  Still null_ptr_cst_p
calls are 15 vs. 17000, caused by the conversion_null_warnings code
doing

  /* Handle zero as null pointer warnings for cases other
 than EQ_EXPR and NE_EXPR */
  else if (null_ptr_cst_p (expr) &&
   (TYPE_PTR_OR_PTRMEM_P (totype) || NULLPTR_TYPE_P (totype)))
{

similarly "easy" to short-cut most of them:

@@ -6882,8 +6882,8 @@ conversion_null_warnings (tree totype, t
 }
   /* Handle zero as null pointer warnings for cases other
  than EQ_EXPR and NE_EXPR */
-  else if (null_ptr_cst_p (expr) &&
-  (TYPE_PTR_OR_PTRMEM_P (totype) || NULLPTR_TYPE_P (totype)))
+  else if ((TYPE_PTR_OR_PTRMEM_P (totype) || NULLPTR_TYPE_P (totype))
+  && null_ptr_cst_p (expr))
 {
   location_t loc =
get_location_for_expr_unwinding_for_system_header (expr);
   maybe_warn_zero_as_null_pointer_constant (expr, loc);

brings them down to 25000.

All this looks like there's plenty of low-hanging micro-optimization possible in
the C++ frontend.

I'm going to test the above two hunks, the overall savings are of course
small (and possibly applicable to branches as well).

Richard.


> Richard.
>
> >
> > Ciao,
> > Michael.
> >
> > >
> > > As Honza pointed out in the email that hasn't reached this mailing list
> > > due to file size, there's a significant change in inline-unit-growth. The 
> > > param
> > > has changed from 20 to 40 for GCC 9. Using --param inline-unit-growth=20 
> > > for all
> > > benchmarks, I see green numbers for GCC 9!
> > >
> > > Martin
> > >
> > > >
> > > >
> > > > Ciao,
> > > > Michael.
> > > >
> > >
> > >


Re: GCC 8 vs. GCC 9 speed and size comparison

2019-04-16 Thread Richard Biener
On Tue, Apr 16, 2019 at 1:39 PM Jakub Jelinek  wrote:
>
> On Tue, Apr 16, 2019 at 01:25:38PM +0200, Richard Biener wrote:
> > So for the parser it's small differences that accumulate, for example
> > a lot more comptype calls via null_ptr_cst_p (via char_type_p) via the new
> > conversion_null_warnings which is called even without any warning option.
> >
> > Possible speedup to null_ptr_cst_p is to avoid the expensive char_type_p
> > (called 5 times in GCC 9 vs. only 2000 times in GCC 8):
>
> If we do this (looks like a good idea to me), perhaps we should do also
> following (first part just doing what you've done in yet another spot,
> moving the less expensive checks first, because null_node_p strips location
> wrappers etc.) and the second not to call conversion_null_warnings at all
> if we don't want to warn (though, admittedly while
> warn_zero_as_null_pointer_constant defaults to 0, warn_conversion_null
> defaults to 1).
>
> --- gcc/cp/call.c   2019-04-12 21:47:06.301924378 +0200
> +++ gcc/cp/call.c   2019-04-16 13:35:59.779977641 +0200
> @@ -6844,8 +6844,9 @@ static void
>  conversion_null_warnings (tree totype, tree expr, tree fn, int argnum)
>  {
>/* Issue warnings about peculiar, but valid, uses of NULL.  */
> -  if (null_node_p (expr) && TREE_CODE (totype) != BOOLEAN_TYPE
> -  && ARITHMETIC_TYPE_P (totype))
> +  if (TREE_CODE (totype) != BOOLEAN_TYPE
> +  && ARITHMETIC_TYPE_P (totype)
> +  && null_node_p (expr))
>  {
>location_t loc = get_location_for_expr_unwinding_for_system_header 
> (expr);
>if (fn)
> @@ -7059,7 +7060,9 @@ convert_like_real (conversion *convs, tr
>return cp_convert (totype, expr, complain);
>  }
>
> -  if (issue_conversion_warnings && (complain & tf_warning))
> +  if (issue_conversion_warnings
> +  && (complain & tf_warning)
> +  && (warn_conversion_null || warn_zero_as_null_pointer_constant))
>  conversion_null_warnings (totype, expr, fn, argnum);
>
>switch (convs->kind)

Yes, that looks good to me as well.

Btw, I noticed the C++ FE calls build_qualified_type a _lot_, in 99% picking
up an existing variant from the list and those list walks visit ~20 types
_on average_!  A simple LRU cache (just put the found variant first) manages
to improve compile-time to be even better than GCC 8 (~1% improvement).
It improves the number of types checked to ~2.5 (from those 20).  Also
-fsyntax-only compile-time drops from 2.9s to 2.75s (consistently).

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 270387)
+++ gcc/tree.c  (working copy)
@@ -6459,9 +6459,22 @@ get_qualified_type (tree type, int type_
   /* Search the chain of variants to see if there is already one there just
  like the one we need to have.  If so, use that existing one.  We must
  preserve the TYPE_NAME, since there is code that depends on this.  */
-  for (t = TYPE_MAIN_VARIANT (type); t; t = TYPE_NEXT_VARIANT (t))
-if (check_qualified_type (t, type, type_quals))
-  return t;
+  for (tree *t = &TYPE_MAIN_VARIANT (type); *t; t = &TYPE_NEXT_VARIANT (*t))
+{
+  if (check_qualified_type (*t, type, type_quals))
+   {
+ tree mv = TYPE_MAIN_VARIANT (type);
+ tree x = *t;
+ if (x != mv)
+   {
+ /* LRU.  */
+ *t = TYPE_NEXT_VARIANT (*t);
+ TYPE_NEXT_VARIANT (x) = TYPE_NEXT_VARIANT (mv);
+ TYPE_NEXT_VARIANT (mv) = x;
+   }
+ return x;
+   }
+}

   return NULL_TREE;
 }

peeling the main-variant case above might make the code a little bit prettier.
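For readers unfamiliar with the pointer-to-pointer splicing used in the patch,
here is the same move-to-front idea on a generic singly linked list (a sketch
only; node, find_mtf and prevp are made-up names, and the patch additionally
keeps the main variant at the head instead of moving the hit all the way to
the front):

struct node { int key; struct node *next; };

struct node *find_mtf (struct node **head, int key)
{
  /* 'prevp' always points at the link leading to the current node,
     so unlinking is a single store.  */
  for (struct node **prevp = head; *prevp; prevp = &(*prevp)->next)
    if ((*prevp)->key == key)
      {
        struct node *n = *prevp;
        if (n != *head)
          {
            *prevp = n->next;   /* unlink n from its current position */
            n->next = *head;    /* splice it in at the front */
            *head = n;
          }
        return n;
      }
  return 0;
}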

Richard.

>
> Jakub


Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell  wrote:
>
> On Fri, 12 Apr 2019 at 15:51, Jeff Law  wrote:
> >
> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> > > Dear all,
> > >
> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
> > > (the WG14 C memory object model study group) now
> > > have a detailed proposal for pointer provenance semantics, refining
> > > the "provenance not via integers (PNVI)" model presented there.
> > > This will be discussed at the ISO WG14 C standards committee at the
> > > end of April, and comments from the GCC community before then would
> > > be very welcome.   The proposal reconciles the needs of existing code
> > > and the behaviour of existing compilers as well as we can, but it doesn't
> > > exactly match any of the latter, so we'd especially like to know whether
> > > it would be feasible to implement - our hope is that it would only require
> > > minor changes.  It's presented in three documents:
> > >
> > > N2362  Moving to a provenance-aware memory model for C: proposal for C2x
> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> > > This introduces the proposal and gives the proposed change to the standard
> > > text, presented as change-highlighted pages of the standard
> > > (though one might want to read the N2363 examples before going into that).
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> > >
> > > N2363  C provenance semantics: examples.
> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin 
> > > Uecker.
> > > This explains the proposal and its design choices with discussion of a
> > > series of examples.
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> > >
> > > N2364  C provenance semantics: detailed semantics.
> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> > > This gives a detailed mathematical semantics for the proposal
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> > >
> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> > > executable version of the semantics, with a web interface that
> > > allows one to explore and visualise the behaviour of small test
> > > programs, stepping through and seeing the abstract-machine
> > > memory state including provenance information.   N2363 compares
> > > the results of this for the example programs with gcc, clang, and icc
> > > results, though the tests are really intended as tests of the semantics
> > > rather than compiler tests, so one has to interpret this with care.
> > Thanks.  I just noticed this came up in EuroLLVM as well.  Getting
> > some standards clarity in this space would be good.
> >
> > Richi is in the best position to cover for GCC, but I suspect he's
> > buried with gcc-9 issues as we approach the upcoming release.  Hopefully
> > he'll have time to review this once crunch time has past.  I think more
> > than anything sanity checking the proposal's requirements vs what can be
> > reasonably implmemented is most important at this stage.
>
> Indeed.  We talked with him at the GNU cauldron, without uncovering
> any serious problems, but more detailed review from an implementability
> point of view would be great.   For the UB mailing list we just made
> a brief plain-text summary of the proposal (leaving out all the examples
> and standards diff, and glossing over some details).  I'll paste that
> in below in case it's helpful.  The next WG14 meeting is the week of
> April 29; comments before then would be particularly useful if that's 
> possible.
>
> best,
> Peter
>
> C pointer values are typically represented at runtime as simple
> concrete numeric values, but mainstream compilers routinely exploit
> information about the "provenance" of pointers to reason that they
> cannot alias, and hence to justify optimisations.  This is
> long-standing practice, but exactly what it means (what programmers
> can rely on, and what provenance-based alias analysis is allowed to
> do), has never been nailed down.   That's what the proposal does.
>
>
> The basic idea is to associate a *provenance* with every pointer
> value, identifying the original storage instance (or allocation, in
> other words) that the pointer is derived from.  In more detail:
>
> - We take abstract-machine pointer values to be pairs (pi,a), adding a
>   provenance pi, either @i where i is a storage instance ID, or the
>   *empty* provenance, to their concrete address a.
>
> - On every storage instance creation (of objects with static, thread,
>   automatic, and allocated storage duration), the abstract machine
>   nondeterministically chooses a fresh storage instance ID i (unique
>   across the entire execution), and the resulting pointer value
>   carries that single storage instance ID as its provenance @i.
>
> - Provenance is preserved by pointer arithmetic that adds or subtracts
>   an integer to a pointer.
>
> - At any acc

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell  wrote:
>
> On 17/04/2019, Richard Biener  wrote:
> > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell 
> > wrote:
> >>
> >> On Fri, 12 Apr 2019 at 15:51, Jeff Law  wrote:
> >> >
> >> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> >> > > Dear all,
> >> > >
> >> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
> >> > > (the WG14 C memory object model study group) now
> >> > > have a detailed proposal for pointer provenance semantics, refining
> >> > > the "provenance not via integers (PNVI)" model presented there.
> >> > > This will be discussed at the ISO WG14 C standards committee at the
> >> > > end of April, and comments from the GCC community before then would
> >> > > be very welcome.   The proposal reconciles the needs of existing code
> >> > > and the behaviour of existing compilers as well as we can, but it
> >> > > doesn't
> >> > > exactly match any of the latter, so we'd especially like to know
> >> > > whether
> >> > > it would be feasible to implement - our hope is that it would only
> >> > > require
> >> > > minor changes.  It's presented in three documents:
> >> > >
> >> > > N2362  Moving to a provenance-aware memory model for C: proposal for
> >> > > C2x
> >> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
> >> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> >> > > This introduces the proposal and gives the proposed change to the
> >> > > standard
> >> > > text, presented as change-highlighted pages of the standard
> >> > > (though one might want to read the N2363 examples before going into
> >> > > that).
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> >> > >
> >> > > N2363  C provenance semantics: examples.
> >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt,
> >> > > Martin Uecker.
> >> > > This explains the proposal and its design choices with discussion of
> >> > > a
> >> > > series of examples.
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> >> > >
> >> > > N2364  C provenance semantics: detailed semantics.
> >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> >> > > This gives a detailed mathematical semantics for the proposal
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> >> > >
> >> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> >> > > executable version of the semantics, with a web interface that
> >> > > allows one to explore and visualise the behaviour of small test
> >> > > programs, stepping through and seeing the abstract-machine
> >> > > memory state including provenance information.   N2363 compares
> >> > > the results of this for the example programs with gcc, clang, and icc
> >> > > results, though the tests are really intended as tests of the
> >> > > semantics
> >> > > rather than compiler tests, so one has to interpret this with care.
> >> > Thanks.  I just noticed this came up in EuroLLVM as well.  Getting
> >> > some standards clarity in this space would be good.
> >> >
> >> > Richi is in the best position to cover for GCC, but I suspect he's
> >> > buried with gcc-9 issues as we approach the upcoming release.
> >> > Hopefully
> >> > he'll have time to review this once crunch time has past.  I think more
> >> > than anything sanity checking the proposal's requirements vs what can
> >> > be
> >> > reasonably implmemented is most important at this stage.
> >>
> >> Indeed.  We talked with him at the GNU cauldron, without uncovering
> >> any serious problems, but more detailed review from an implementability
> >> point of view would be great.   For the UB mailing list we just made
> >> a brief plain-text summary of the proposal (leaving out all the examples
> >> and standards diff, and glossing over some details).  I'll paste that
> >> in below in case it's helpful.  The next WG14 meeting is the week of
> >> April 29; comments b

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
 wrote:
>
>
> Hi Richard,
>
> Am Mittwoch, den 17.04.2019, 11:41 +0200 schrieb Richard Biener:
> > On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell  
> > wrote:
> > >
> > > On 17/04/2019, Richard Biener  wrote:
> > > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell 
> > > > wrote:
>
> ...
> > > > So this is not what GCC implements which tracks provenance through
> > > > non-pointer types to a limited extent when only copying is taking place.
> > > >
> > > > Your proposal makes
> > > >
> > > >  int a, b;
> > > >  int *p = &a;
> > > >  int *q = &b;
> > > >  uintptr_t pi = (uintptr_t)p; //expose
> > > >  uintptr_t qi = (uintptr_t)q; //expose
> > > >  pi += 4;
> > > >  if (pi == qi)
> > > >*(int *)pi = 1;
> > > >
> > > > well-defined since (int *)pi now has the provenance of &b.
> > >
> > > Yes.  (Just to be clear: it's not that we think the above example is
> > > desirable in itself, but it's well-defined as a consequence of what
> > > we do to make other common idioms, eg pointer bit manipulation,
> > > well-defined.)
> > >
> > > > Note GCC, when tracking provenance of non-pointer type
> > > > adds like in
> > > >
> > > >   int *p = &a;
> > > >   uintptr_t pi = (uintptr_t)p;
> > > >   pi += 4;
> > > >
> > > > considers pi to have provenance "anything" (not sure if you
> > > > have something like that) since we add 4 which has provenance
> > > > "anything" to pi which has provenance &a.
> > >
> > > We don't at present have a provenance "anything", but if the gcc
> > > "anything" means that it's assumed that it might alias with anything,
> > > then it looks like gcc's implementing a sound approximation to
> > > the proposal here?
> >
> > GCC makes the code well-defined whereas the proposal would make
> > dereferencing a pointer based on pi invoke undefined behavior?
>
> No, if there is an exposed object where pi points to, it is
> defined behaviour.
>
> >  Since
> > your proposal is based on an abstract machine there isn't anything
> > like a pointer with multiple provenances (which "anything" is), just
> > pointers with no provenance (pointing outside of any object), right?
>
> This is correct. What the proposal does though is put a limit
> on where pointers obtained from integers are allowed to point
> to: They cannot point to non-exposed objects. I assume GCC
> "anything" provenances also cannot point to all possible
> objects.

Yes.  We exclude objects that do not have their address taken
though (so somewhat similar to your "exposed").
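A small example of what that exclusion buys (a sketch of the reasoning, not
of GCC's points-to implementation; the function and variable names are made
up):

#include <stdint.h>

int f (uintptr_t i)
{
  int local = 0;        /* address never taken, so never considered exposed */
  int *p = (int *) i;   /* whatever 'i' encodes, it is assumed not to name 'local' ... */
  *p = 1;               /* ... so this store is assumed not to touch it */
  return local;         /* and this can be optimized to 'return 0;' */
}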

> > For points-to analysis we of course have to track all possible
> > provenances of a pointer (and if we know it doesn't point inside
> > any object we make it point to nothing).
>
> Yes, a compiler should track what it knows (it could also track
> if it knows that some pointers point to the same object, etc.)
> while the abstract machine knows everything there is to know.
>
> > Btw, GCC changed its behavior here to support optimizing matlab
> > generated C code which passes pointers to arrays across functions
> > by marshalling them in two float typed halves (yikes!).  GCC is able
> > to properly track provenance across the decomposition / recomposition
> > when doing points-to analysis ;)
>
> Impressive ;-)  I would have thought that such encoding
> happens at ABI boundaries, where you cannot track anyway.
> But this seems to occur inside compiled code?

It occurs when matlab generates C code for an expression.  They
seem to use floating-point for everything at their self-invented ABI
boundary so "obviously" that includes pointers.

> While we do not attach a provenance to integers
> in our proposal, it does not necessarily imply that a compiler
> is not allowed to track such information. It then depends on
> how it uses it.
>
> For example,
>
> int z;
> int x;
> uintptr_t pi = (uintptr_t)&x;
>
> // encode in two floats ;-)
>
> // pass floats around
>
> // decode
>
> int* p = (int*)pi;
>
> If the compiler can prove that the address is still
> the same, it can also reattach the original provenance
> under some conditions.
>
> But there is a caveat: It can only do this if it cannot
> also be  a one-after p

Re: C provenance semantics proposal

2019-04-17 Thread Richard Biener
On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
 wrote:
>
> Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> >  wrote:
>
> > >
> > > >  Since
> > > > your proposal is based on an abstract machine there isn't anything
> > > > like a pointer with multiple provenances (which "anything" is), just
> > > > pointers with no provenance (pointing outside of any object), right?
> > >
> > > This is correct. What the proposal does though is put a limit
> > > on where pointers obtained from integers are allowed to point
> > > to: They cannot point to non-exposed objects. I assume GCC
> > > "anything" provenances also cannot point to all possible
> > > objects.
> >
> > Yes.  We exclude objects that do not have their address taken
> > though (so somewhat similar to your "exposed").
>
> Also if the address never escapes?

Yes.

> Using address-taken as the criterion is one option we considered,
> but we felt this exposes too many objects, like automatic
> arrays or locally used malloced/alloced data etc.
>
> Using integer-casts as criterion means that all
> objects whose address is taken but where (a) it is not
> seen that the pointer is cast to an integer and
> where (b) the pointer never escapes can be assumed safe.

Yeah, since the abstract machine sees everything using whatever
seems fit is possible.

Richard.

> Best,
> Martin


Re: C provenance semantics proposal

2019-04-18 Thread Richard Biener
On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
 wrote:
>
> Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener:
> > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> >  wrote:
> > >
> > > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > >  wrote:
> > > > >
> > > > > >  Since
> > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > >
> > > > > This is correct. What the proposal does though is put a limit
> > > > > on where pointers obtained from integers are allowed to point
> > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > "anything" provenances also cannot point to all possible
> > > > > objects.
> > > >
> > > > Yes.  We exclude objects that do not have their address taken
> > > > though (so somewhat similar to your "exposed").
> > >
> > > Also if the address never escapes?
> >
> > Yes.
>
> Then with respect to "expose" it seems GCC implements
> a superset which means it allows some behavior which
> is undefined according to the proposal. So all seems
> well with respect to this part.
>
>
> With respect to tracking provenance through integers
> some changes might be required.
>
> Let's consider this example:
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>int* p = (int*)pj; // can be one-after pointer of 'x'
>p[-1] = 1; // well defined?
> }
>
> If I understand correctly, a pointer obtained from
> pi + 4 would have a "anything" provenance (which is
> fine). But the pointer obtained from 'pj' would have the
> provenance of 'y' so the access to 'x' would not
> be allowed.

Correct.  This is the most difficult case for us to handle
exactly also because (also valid for the proposal?)

int x;
int y;
uintptr_t pi = (uintptr_t)&x;
uintptr_t pj = (uintptr_t)&y;

if (pi + 4 == pj) {

   int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
   p[-1] = 1; // well defined?
}

while well-handled by GCC in the written form (as you
say, pi + 4 yields "anything" provenance), GCC itself
may transform it into the first variant by noticing
the conditional equivalence and substituting pj for
pi + 4.
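Spelled out, the result of that substitution would look like the following (a
sketch of the transformed source, not actual compiler output):

#include <stdint.h>

int x;
int y;

void f (void)
{
  uintptr_t pi = (uintptr_t)&x;
  uintptr_t pj = (uintptr_t)&y;

  if (pi + 4 == pj)
    {
      int *p = (int *)pj;   /* was (int *)(pi + 4) before the substitution */
      p[-1] = 1;            /* now judged against the provenance of 'y' alone */
    }
}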

> But according to the preferred version of
> our proposal, the pointer could also be used to
> access 'x' because it is also exposed.
>
> GCC could make pj have a "anything" provenance
> even though it is not modified. (This would break
> some optimization such as the one for Matlab.)
>
> Maybe one could also refine this optimization to check
> for additional conditions which rule out the case
> that there is another object the pointer could point
> to.

The only feasible solution would be to not track
provenance through non-pointers and make
conversions of non-pointers to pointers have
"anything" provenance.

The additional issue that appears here though
is that we cannot even turn (int *)(uintptr_t)p
into p anymore since with the conditional
substitution we can then still arrive at
effectively (&y)[-1] = 1 which is of course
undefined behavior.

That is, your proposal makes

 ((int *)(uintptr_t)&y)[-1] = 1

well-defined (if &y - 1 == &x) but keeps

  (&y)[-1] = 1

as undefined which strikes me as a little bit
inconsistent.  If that's true it's IMHO worth
a defect report and second consideration.

Richard.

> Best,
> Martin


Re: C provenance semantics proposal

2019-04-18 Thread Richard Biener
On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
 wrote:
>
> On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
>  wrote:
> >
> > Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener:
> > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > >  wrote:
> > > >
> > > > Am Mittwoch, den 17.04.2019, 14:41 +0200 schrieb Richard Biener:
> > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > > >  wrote:
> > > > > >
> > > > > > >  Since
> > > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > > like a pointer with multiple provenances (which "anything" is), 
> > > > > > > just
> > > > > > > pointers with no provenance (pointing outside of any object), 
> > > > > > > right?
> > > > > >
> > > > > > This is correct. What the proposal does though is put a limit
> > > > > > on where pointers obtained from integers are allowed to point
> > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > "anything" provenances also cannot point to all possible
> > > > > > objects.
> > > > >
> > > > > Yes.  We exclude objects that do not have their address taken
> > > > > though (so somewhat similar to your "exposed").
> > > >
> > > > Also if the address never escapes?
> > >
> > > Yes.
> >
> > Then with respect to "expose" it seems GCC implements
> > a superset which means it allows some behavior which
> > is undefined according to the proposal. So all seems
> > well with respect to this part.
> >
> >
> > With respect to tracking provenance through integers
> > some changes might be required.
> >
> > Let's consider this example:
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >int* p = (int*)pj; // can be one-after pointer of 'x'
> >p[-1] = 1; // well defined?
> > }
> >
> > If I understand correctly, a pointer obtained from
> > pi + 4 would have a "anything" provenance (which is
> > fine). But the pointer obtained from 'pj' would have the
> > provenance of 'y' so the access to 'x' would not
> > be allowed.
>
> Correct.  This is the most difficult case for us to handle
> exactly also because (also valid for the proposal?)
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
>p[-1] = 1; // well defined?
> }
>
> while well-handled by GCC in the written form (as you
> say, pi + 4 yields "anything" provenance), GCC itself
> may transform it into the first variant by noticing
> the conditional equivalence and substituting pj for
> pi + 4.
>
> > But according to the preferred version of
> > our proposal, the pointer could also be used to
> > access 'x' because it is also exposed.
> >
> > GCC could make pj have a "anything" provenance
> > even though it is not modified. (This would break
> > some optimization such as the one for Matlab.)
> >
> > Maybe one could also refine this optimization to check
> > for additional conditions which rule out the case
> > that there is another object the pointer could point
> > to.
>
> The only feasible solution would be to not track
> provenance through non-pointers and make
> conversions of non-pointers to pointers have
> "anything" provenance.
>
> The additional issue that appears here though
> is that we cannot even turn (int *)(uintptr_t)p
> into p anymore since with the conditional
> substitution we can then still arrive at
> effectively (&y)[-1] = 1 which is of course
> undefined behavior.
>
> That is, your proposal makes
>
>  ((int *)(uintptr_t)&y)[-1] = 1
>
> well-defined (if &y - 1 == &x) but keeps
>
>   (&y)[-1] = 1
>
> as undefined which strikes me as a little bit
> inconsistent.  If that's true it's IMHO worth
> a defect report and second consideration.

Similarly that

int x;
int y;
uintptr_t pj = (uintptr_t)&y;

if (&x + 1 == &y) {

   int* p = (int*)pj; // can be one-after pointer of 'x'
   p[-1] = 1; // well defined?
}

is undefined but when I add a no-op

 (uintptr_t)&x;

it becomes well-defined, which is undesirable.  Can this no-op
stmt appear in another function?  Or even in
another translation unit (if x and y are global variables)?
And does such stmt have to be present (in another
TU) to make the example valid in this case?

To me all this makes requiring exposal through a cast
to a non-pointer (or accessing its representation) not
in any way more "useful" for an optimizing compiler than
modeling exposal through address-taking.

Richard.

> Richard.
>
> > Best,
> > Martin


Re: C provenance semantics proposal

2019-04-18 Thread Richard Biener
On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin
 wrote:
>
> Am Donnerstag, den 18.04.2019, 11:56 +0200 schrieb Richard Biener:
> > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> >  wrote:
> > >
> > > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > >  wrote:
> > > >
> > > > Am Mittwoch, den 17.04.2019, 15:34 +0200 schrieb Richard Biener:
> > > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > >  wrote:
>
> 
> > > > Let's consider this example:
> > > >
> > > > int x;
> > > > int y;
> > > > uintptr_t pi = (uintptr_t)&x;
> > > > uintptr_t pj = (uintptr_t)&y;
> > > >
> > > > if (pi + 4 == pj) {
> > > >
> > > >int* p = (int*)pj; // can be one-after pointer of 'x'
> > > >p[-1] = 1; // well defined?
> > > > }
> > > >
> > > > If I understand correctly, a pointer obtained from
> > > > pi + 4 would have a "anything" provenance (which is
> > > > fine). But the pointer obtained from 'pj' would have the
> > > > provenance of 'y' so the access to 'x' would not
> > > > be allowed.
> > >
> > > Correct.  This is the most difficult case for us to handle
> > > exactly also because (also valid for the proposal?)
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> > >p[-1] = 1; // well defined?
> > > }
> > >
> > > while well-handled by GCC in the written form (as you
> > > say, pi + 4 yields "anything" provenance), GCC itself
> > > may tranform it into the first variant by noticing
> > > the conditional equivalence and substituting pj for
> > > pi + 4.
>
> Integers are just integers in the proposal, so conditional
> equivalence is not a problem for them. In my opinion this
> is a strength of the proposal. Tracking provenance for
> integers would mean that all computations would be affected
> by such subtle semantics issues (where you can not even
> replace an integer by an equivalent one). In this
> proposal this is limited to pointers where it at least
> makes some sense.
>
> > > > But according to the preferred version of
> > > > our proposal, the pointer could also be used to
> > > > access 'x' because it is also exposed.
> > > >
> > > > GCC could make pj have a "anything" provenance
> > > > even though it is not modified. (This would break
> > > > some optimization such as the one for Matlab.)
> > > >
> > > > Maybe one could also refine this optimization to check
> > > > for additional conditions which rule out the case
> > > > that there is another object the pointer could point
> > > > to.
> > >
> > > The only feasible solution would be to not track
> > > provenance through non-pointers and make
> > > conversions of non-pointers to pointers have
> > > "anything" provenance.
>
> This would be one solution, yes. But you could
> reattach the same provenance if you know that the
> pointer points in the middle of an object (so is
> not a first or one-after pointer) or if you know
> that there is no exposed object directly adjacent
> to this object, etc..
>
> > > The additional issue that appears here though
> > > is that we cannot even turn (int *)(uintptr_t)p
> > > into p anymore since with the conditional
> > > substitution we can then still arrive at
> > > effectively (&y)[-1] = 1 which is of course
> > > undefined behavior.
> > >
> > > That is, your proposal makes
> > >
> > >  ((int *)(uintptr_t)&y)[-1] = 1
> > >
> > > well-defined (if &y - 1 == &x) but keeps
> > >
> > >   (&y)[-1] = 1
> > >
> > > as undefined which strikes me as a little bit
> > > inconsistent.  If that's true it's IMHO worth
> > > a defect report and second consideration.
>
> This is true. But I would not call it inconsistent.
> It is just unusual if you expect that casts to integers
> and back are no-ops.  In this proposal a round-trip has
> the effect of stripping the original provenan

Re: C provenance semantics proposal

2019-04-18 Thread Richard Biener
On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin
 wrote:
>
> Am Donnerstag, den 18.04.2019, 11:45 +0100 schrieb Peter Sewell:
> > On Thu, 18 Apr 2019 at 10:32, Richard Biener  
> > wrote:
>
>
> > An equality test of two pointers, on the other hand, doesn't necessarily
> > mean that they are interchangeable.  I don't see any good way to
> > avoid that in a provenance semantics, where a one-past
> > pointer might sometimes compare equal to a pointer to an
> > adjacent object but be illegal for accessing it.
>
> As I see it, there are essentially four options:
>
> 1.) Compilers do not use conditional equivalences for
> optimizations of pointers (or only when additional
> conditions apply which make it safe)
>
> 2.) We make pointer comparison between a pointer
> and a one-after pointer of a different object
> undefined behaviour.

Yes please!  OTOH GCC transforms
(uintptr_t)&a != (uintptr_t)(&b+1)
into &a != &b + 1 (for equality compares) and then
doesn't follow this C rule anyways.

> 3.) We make comparison have the side effect that
> afterwards any of the two pointers could have any
> of the two provenances. (with disambiguation
> similar to what we have for casts).
>
> 4.) Compilers make sure that exposed objects never
> are allocated next to each other (as Jens proposed).

5.) While the standard guarantees that (int *)(uintptr_t)p == p
it does not guarantee that (uintptr_t)&a and (uintptr_t)&b
have a specific relation to each other.  To me this means
that (uintptr_t)(&b + 1) - (uintptr_t)&b is not necessarily
equal to sizeof(b).  (of course it's a QOI issue if that doesn't
hold)
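In code, option 5 separates the two guarantees roughly like this (a sketch;
whether 'd' equals sizeof b is the quality-of-implementation matter mentioned
above, and the names are made up):

#include <stdint.h>
#include <assert.h>

int a, b;

void check (void)
{
  int *p = &a;
  assert ((int *)(uintptr_t)p == p);   /* the round trip is guaranteed */

  uintptr_t d = (uintptr_t)(&b + 1) - (uintptr_t)&b;
  (void) d;   /* under option 5, d need not equal sizeof b */
}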

> None of these options is great.

Indeed.  But you are now writing down one specific variant
(which isn't great either).  Sometimes no written down variant
is better than a not so great one, even if there isn't any obviously
greater one.

That said, GCC's implementation of the proposal might be
to require -fno-tree-pta to follow it.  And even that might not
fully rescue us because of that (int *)(uintptr_t) stripping...

At least I see no way to make use of the "exposed"ness
and thus we have to assume every variable is exposed.
Of course similar if the address-taken variant would be
written down in the standard given the standard applies
to the source form and not some intermediate (optimized)
compiler language.

Richard.

>
>
> Best,
> Martin


Re: C provenance semantics proposal

2019-04-24 Thread Richard Biener
On Thu, Apr 18, 2019 at 3:29 PM Jeff Law  wrote:
>
> On 4/18/19 6:50 AM, Jakub Jelinek wrote:
> > On Thu, Apr 18, 2019 at 02:47:18PM +0200, Jakub Jelinek wrote:
> >> On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
> >>>> 1.) Compilers do not use conditional equivalences for
> >>>> optimizations of pointers (or only when additional
> >>>> conditions apply which make it safe)
> >>>>
> >>>> 2.) We make pointer comparison between a pointer
> >>>> and a one-after pointer of a different object
> >>>> undefined behaviour.
> >>>
> >>> Yes please!  OTOH GCC transforms
> >>> (uintptr_t)&a != (uintptr_t)(&b+1)
> >>> into &a != &b + 1 (for equality compares) and then
> >>
> >> I think we don't.  It was http://gcc.gnu.org/PR88775, but we haven't 
> >> applied
> >> those changes, because we don't consider the pointer to start of one object
> >> vs. pointer to end of another one case in pointer comparisons (but do
> >> consider it in integral comparisons).
> >
> > That said, in RTL we really don't differentiate between pointers and
> > integers and we'll need to do something about that one day.
> I'd be happy to get things sorted out up to the RTL transition,
> particularly the cases involving equivalences.  Distinguishing between
> pointer and same sized integers in RTL will be difficult.

But we run into this with very simple testcases so we do have to fix it.
There's no point trying to "enhance" the GIMPLE side when RTL makes
it break so easily.

Richard.

> jeff


Re: C provenance semantics proposal

2019-04-24 Thread Richard Biener
On Thu, Apr 18, 2019 at 3:42 PM Jeff Law  wrote:
>
> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > Am Donnerstag, den 18.04.2019, 11:45 +0100 schrieb Peter Sewell:
> >> On Thu, 18 Apr 2019 at 10:32, Richard Biener  
> >> wrote:
> >
> >
> >> An equality test of two pointers, on the other hand, doesn't necessarily
> >> mean that they are interchangeable.  I don't see any good way to
> >> avoid that in a provenance semantics, where a one-past
> >> pointer might sometimes compare equal to a pointer to an
> >> adjacent object but be illegal for accessing it.
> >
> > As I see it, there are essentially four options:
> >
> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
> VRP as well, maybe PTA.  It seems simple enough though :-)

Also touches fundamental PHI-OPT transforms like

 if (a == b)
...

 # c = PHI 

where we'd lose eliding such a conditional.  IMHO that's bad
and very undesirable.
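In source terms the transform at stake is roughly this (a sketch, not GIMPLE;
the function name is made up):

int *pick (int *a, int *b)
{
  int *c;
  if (a == b)
    c = a;
  else
    c = b;
  /* The PHI-OPT folding in question collapses the conditional to  c = b;
     which is only valid if pointers that compare equal are
     interchangeable.  */
  return c;
}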

> >
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.
> I generally like this as well, though I suspect it probably makes a lot
> of currently well defined code undefined.
>
> >
> > 3.) We make comparison have the side effect that
> > afterwards any of the two pointers could have any
> > of the two provenances. (with disambiguation
> > similar to what we have for casts).
> This could have some interesting effects on PTA.  Richi?

I played with this and doing this in an incomplete way like
just handling

  if (a == b)

as two-way assignment during constraint building is possible.
But that's not enough of course since every call is implicitly
producing equivalences between everything [escaped] ...
which makes points-to degrade to a point where it is useless.

So I think we need a working scheme where points-to doesn't
degrade from equivalencies being computed and the compiler
being free to introduce equivalences as well as copy-propagate
those.

Honestly I can't come up with a working solution to this
problem.

>
> >
> > 4.) Compilers make sure that exposed objects never
> > are allocated next to each other (as Jens proposed).
> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> ultimately have no control over layout of data in static storage.

Make everything 1 byte larger.

> jeff


Re: C provenance semantics proposal

2019-04-24 Thread Richard Biener
On Fri, Apr 19, 2019 at 11:09 AM Jens Gustedt  wrote:
>
> Hello Jakub,
>
> On Fri, 19 Apr 2019 10:49:08 +0200 Jakub Jelinek 
> wrote:
>
> > On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
> > > > OTOH GCC transforms
> > > > (uintptr_t)&a != (uintptr_t)(&b+1)
> > > > into &a != &b + 1 (for equality compares) and then
> > > > doesn't follow this C rule anyways.
> > >
> > > Actually our proposal we are discussing here goes exactly the other
> > > way around. It basically reduces
> > >
> > >   &a != &b + 1
> > >
> > > to
> > >
> > >   (uintptr_t)&a != (uintptr_t)(&b+1)
> > >
> > > with only an exception for null pointers, but which probably don't
> > > matter for a platform where null pointers are just all bits 0.
> >
> > That penalizes quite a few optimizations though.
> > If you have
> > ptr != ptr2
> > and points-to analysis finds a set of variables ptr as well as ptr2
> > points to and the sets would be disjoint, it would be nice to be able
> > to optimize that comparison away
>
> yes
>
> > (gcc does);
>
> great
>
> > similarly, if one of the
> > pointers is &object or &object + sizeof (object).
>
> Here I don't follow. Why would one waste brain and resources to
> optimize code that does such tricks?
>
> > By requiring what you request above, it can be pretty much never
> > optimized, unless the points-to analysis is able to also record if
> > the pointer points to the start, middle or end of object and only if
> > it is known to be in the middle it can safely optimize, for start or
> > end it would need to prove the other pointer is to end or start and
> > only non-zero sized objects are involved.
>
> I have the impression that you just propose an inversion of the
> roles. What you require is for the user to keep track of this kind of
> information, and to know when they may (or should not) compare a
> one-past pointer to something with a different provenance.
>
> I just don't feel that it is adequate to impose such detailed
> knowledge on users for what is basically a marginal use
> case. One-off pointers don't occur "naturally" in many places,

They occur in the single important place - loop IV tests in
C++ style iterator != end where end is a "pointer" to one after
the last valid iterator value.
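
For example (a plain C sketch of that idiom):

    /* "end" is a one-past-the-end pointer; the loop exit test is an
       equality compare against it.  */
    unsigned sum (const unsigned *buf, unsigned n)
    {
      unsigned s = 0;
      for (const unsigned *p = buf, *end = buf + n; p != end; ++p)
        s += *p;
      return s;
    }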

> I'd
> guess. Using them for anything else than to test bounds for array
> traversal is insane, and there "usually" the test is with `<`, anyhow,
> which has different rules.

Unfortunately then C++ arrived and compilers were expected to
also optimize that nasty code.

Richard.

>
> Jens
>
> --
> :: INRIA Nancy Grand Est ::: Camus ::: ICube/ICPS :::
> :: ::: office Strasbourg : +33 368854536   ::
> :: :: gsm France : +33 651400183   ::
> :: ::: gsm international : +49 15737185122 ::
> :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::


Re: C provenance semantics proposal

2019-04-25 Thread Richard Biener
On Wed, Apr 24, 2019 at 8:41 PM Jeff Law  wrote:
>
> On 4/24/19 4:19 AM, Richard Biener wrote:
> > On Thu, Apr 18, 2019 at 3:42 PM Jeff Law  wrote:
> >>
> >> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> >>> Am Donnerstag, den 18.04.2019, 11:45 +0100 schrieb Peter Sewell:
> >>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener 
> >>>>  wrote:
> >>>
> >>>
> >>>> An equality test of two pointers, on the other hand, doesn't necessarily
> >>>> mean that they are interchangeable.  I don't see any good way to
> >>>> avoid that in a provenance semantics, where a one-past
> >>>> pointer might sometimes compare equal to a pointer to an
> >>>> adjacent object but be illegal for accessing it.
> >>>
> >>> As I see it, there are essentially four options:
> >>>
> >>> 1.) Compilers do not use conditional equivalences for
> >>> optimizations of pointers (or only when additional
> >>> conditions apply which make it safe)
> >> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
> >> VRP as well, maybe PTA.  It seems simple enough though :-)
> >
> > Also touches fundamental PHI-OPT transforms like
> >
> >  if (a == b)
> > ...
> >
> >  # c = PHI <a, b>
> >
> > where we'd lose eliding such a conditional.  IMHO that's bad
> > and very undesirable.
> But if we only suppress this optimization for pointers is it that terrible?

I've at least seen a lot of cases with c = PHI  for null pointer
checks.  It's just we're going to chase a lot of cases down even
knowing RTL will fuck up later big times.

>
>
> >>>
> >>> 3.) We make comparison have the side effect that
> >>> afterwards any of the two pointers could have any
> >>> of the two provenances. (with disambiguation
> >>> similar to what we have for casts).
> >> This could have some interesting effects on PTA.  Richi?
> >
> > I played with this and doing this in an incomplete way like
> > just handling
> >
> >   if (a == b)
> >
> > as two-way assignment during constraint building is possible.
> > But that's not enough of course since every call is implicitly
> > producing equivalences between everything [escaped] ...
> > which makes points-to degrade to a point where it is useless.
> But the calls aren't generating conditional equivalences.  I must be
> missing something here.

if (compare_a_and_b (a, b))
  ...

yes, they are not creating conditional equivalences that can be
propagated out (w/o IPA info).  But we compute points-to
early, then inline (exposing the propagation opportunity),
preserving the points-to result.
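
A small C sketch of that ordering problem (compare_a_and_b is the
hypothetical helper from above):

    static int compare_a_and_b (int *a, int *b) { return a == b; }

    void f (int *a, int *b)
    {
      /* Early points-to sees no equivalence between a and b at this call;
         after inlining the condition becomes "if (a == b)", but the
         already-computed points-to result is preserved.  */
      if (compare_a_and_b (a, b))
        *a = 1;
    }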

>  You're the expert in this space, so if you say
> it totally degrades PTA, then it's a non-starter.

Well, it's possible to fix all testcases that get thrown to us but
what I have difficulties with is designing a way to follow the
proposed standard.

Btw, I've tried the trivial points-to patch for conditionals only
and even that regressed points-to testcases.

> >
> > So I think we need a working scheme where points-to doesn't
> > degrade from equivalencies being computed and the compiler
> > being free to introduce equivalences as well as copy-propagate
> > those.
> >
> > Honestly I can't come up with a working solution to this
> > problem.
> >
> >>
> >>>
> >>> 4.) Compilers make sure that exposed objects never
> >>> are allocated next to each other (as Jens proposed).
> >> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> >> ultimately have no control over layout of data in static storage.
> >
> > Make everything 1 byte larger.
> Not a bad idea.  I suspect the embedded folks would go bananas though.

Maybe, but those folks are also using -fno-strict-aliasing ...

Anyhow, my issue is that I don't see a clean design that would follow
the proposed standard wording (even our current desired implementation
behavior btw!) and not degrade simple testcases :/

Richard.

> jeff
>


Re: C provenance semantics proposal

2019-04-25 Thread Richard Biener
On Wed, Apr 24, 2019 at 11:18 PM Peter Sewell  wrote:
>
> On 24/04/2019, Jeff Law  wrote:
> > On 4/24/19 4:19 AM, Richard Biener wrote:
> >> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law  wrote:
> >>>
> >>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> >>>> Am Donnerstag, den 18.04.2019, 11:45 +0100 schrieb Peter Sewell:
> >>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
> >>>>>  wrote:
> >>>>
> >>>>
> >>>>> An equality test of two pointers, on the other hand, doesn't
> >>>>> necessarily
> >>>>> mean that they are interchangeable.  I don't see any good way to
> >>>>> avoid that in a provenance semantics, where a one-past
> >>>>> pointer might sometimes compare equal to a pointer to an
> >>>>> adjacent object but be illegal for accessing it.
> >>>>
> >>>> As I see it, there are essentially four options:
> >>>>
> >>>> 1.) Compilers do not use conditional equivalences for
> >>>> optimizations of pointers (or only when additional
> >>>> conditions apply which make it safe)
> >>> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
> >>> VRP as well, maybe PTA.  It seems simple enough though :-)
> >>
> >> Also touches fundamental PHI-OPT transforms like
> >>
> >>  if (a == b)
> >> ...
> >>
> >>  # c = PHI <a, b>
> >>
> >> where we'd lose eliding such a conditional.  IMHO that's bad
> >> and very undesirable.
> > But if we only suppress this optimization for pointers is it that terrible?
>
> As far as I can see right now, there isn't a serious alternative.
> Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
> be true (either in a semantics for the source-language == that just
> compares the concrete representations or in one that's allowed
> but not required to be provenance-sensitive).   It's not possible
> to simultaneously have *p UB (which AIUI the compiler has to
> have in the intermediate language, to make alias analysis sound),
> *q not UB, and p interchangeable with q.Am I missing something?

No, you are not missing anything.  We do have this issue right now,
independent of standard wordings.  But the standard has that, too,
not allowing *(&x + 1), allowing the compare and allowing *&y.
Isn't that a defect as well?
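
For concreteness, a C sketch of the adjacent-object case under discussion
(x, y, p, q as in the example above, assuming the two objects happen to be
placed next to each other):

    int x, y;           /* suppose y is laid out directly after x */
    int *p = &x + 1;    /* valid one-past-the-end pointer for x   */
    int *q = &y;        /* valid pointer to y                     */
    /* p == q may evaluate to true and *q is well-defined, yet *p is
       undefined behaviour - so equality cannot imply that p and q are
       interchangeable.  */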

Richard.

> Peter
>
>
> >
> >>>>
> >>>> 3.) We make comparison have the side effect that
> >>>> afterwards any of the two pointers could have any
> >> >>>> of the two provenances. (with disambiguation
> >>>> similar to what we have for casts).
> >>> This could have some interesting effects on PTA.  Richi?
> >>
> >> I played with this and doing this in an incomplete way like
> >> just handling
> >>
> >>   if (a == b)
> >>
> >> as two-way assignment during constraint building is possible.
> >> But that's not enough of course since every call is implicitly
> >> producing equivalences between everything [escaped] ...
> >> which makes points-to degrade to a point where it is useless.
> > But the calls aren't generating conditional equivalences.  I must be
> > missing something here.  You're the expert in this space, so if you say
> > it totally degrades PTA, then it's a non-starter.
> >
> >>
> >> So I think we need a working scheme where points-to doesn't
> >> degrade from equivalencies being computed and the compiler
> >> being free to introduce equivalences as well as copy-propagate
> >> those.
> >>
> >> Honestly I can't come up with a working solution to this
> >> problem.
> >>
> >>>
> >>>>
> >>>> 4.) Compilers make sure that exposed objects never
> >>>> are allocated next to each other (as Jens proposed).
> >>> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> >>> ultimately have no control over layout of data in static storage.
> >>
> >> Make everything 1 byte larger.
> > Not a bad idea.  I suspect the embedded folks would go bananas though.
> >
> > jeff
> >
> >


Re: C provenance semantics proposal

2019-04-25 Thread Richard Biener
On Thu, Apr 25, 2019 at 3:03 PM Peter Sewell  wrote:
>
> On 25/04/2019, Richard Biener  wrote:
> > On Wed, Apr 24, 2019 at 11:18 PM Peter Sewell 
> > wrote:
> >>
> >> On 24/04/2019, Jeff Law  wrote:
> >> > On 4/24/19 4:19 AM, Richard Biener wrote:
> >> >> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law  wrote:
> >> >>>
> >> >>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> >> >>>> Am Donnerstag, den 18.04.2019, 11:45 +0100 schrieb Peter Sewell:
> >> >>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
> >> >>>>>  wrote:
> >> >>>>
> >> >>>>
> >> >>>>> An equality test of two pointers, on the other hand, doesn't
> >> >>>>> necessarily
> >> >>>>> mean that they are interchangeable.  I don't see any good way to
> >> >>>>> avoid that in a provenance semantics, where a one-past
> >> >>>>> pointer might sometimes compare equal to a pointer to an
> >> >>>>> adjacent object but be illegal for accessing it.
> >> >>>>
> >> >>>> As I see it, there are essentially four options:
> >> >>>>
> >> >>>> 1.) Compilers do not use conditional equivalences for
> >> >>>> optimizations of pointers (or only when additional
> >> >>>> conditions apply which make it safe)
> >> >>> I know this will hit DOM and CSE.  I wouldn't be surprised if it
> >> >>> touches
> >> >>> VRP as well, maybe PTA.  It seems simple enough though :-)
> >> >>
> >> >> Also touches fundamental PHI-OPT transforms like
> >> >>
> >> >>  if (a == b)
> >> >> ...
> >> >>
> >> >>  # c = PHI <a, b>
> >> >>
> >> >> where we'd lose eliding such a conditional.  IMHO that's bad
> >> >> and very undesirable.
> >> > But if we only suppress this optimization for pointers is it that
> >> > terrible?
> >>
> >> As far as I can see right now, there isn't a serious alternative.
> >> Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
> >> be true (either in a semantics for the source-language == that just
> >> compares the concrete representations or in one that's allowed
> >> but not required to be provenance-sensitive).   It's not possible
> >> to simultaneously have *p UB (which AIUI the compiler has to
> >> have in the intermediate language, to make alias analysis sound),
> >> *q not UB, and p interchangeable with q.Am I missing something?
> >
> > No, you are not missing anything.  We do have this issue right now,
> > independent of standard wordings.  But the standard has that, too,
> > not allowing *(&x + 1), allowing the compare and allowing *&y.
> > Isn't that a defect as well?
>
> In the source-language semantics, it's ok for p==q to not imply
> that p and q are interchangeable, and if compilers are doing
> provenance-based alias analysis (so address equality doesn't
> imply equally-readable/writable), it's pretty much inescapable.
>
> Hence why (without knowing much about the optimisations that
> actually go on) it's tempting to suggest that for pointer equality
> comparison one could just not infer that interchangeability. I'd be
> very interested to know the actual cost of that.

Since we at the moment track provenance through non-pointers,
we cannot do this for non-pointer equivalences either.
So doing this means no longer tracking provenance through
non-pointers.
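
A C sketch of what "tracking provenance through non-pointers" means here
(the names are made up):

    #include <stdint.h>

    int x;

    int *roundtrip (void)
    {
      uintptr_t i = (uintptr_t) &x;  /* provenance of &x follows i          */
      return (int *) i;              /* the result is treated as pointing   */
                                     /* to x, so integer equivalences raise */
                                     /* the same problem as pointer ones    */
    }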

> (The standard does also have a defect in its definition of equality - on
> the one hand, it says that &x+1==&y comparison must be true
> if they are adjacent, but on the other (in DR260) that everything
> might be provenance-aware.   My preference would be to resolve
> that by requiring source-language == to not be provenance aware,
> but I think this is a more-or-less independent thing.)

I think it's related at least to us using provenance to optimize
pointer comparisons.

Richard.

> Peter


Re: __attribute__((early_branch))

2019-05-02 Thread Richard Biener
On Tue, Apr 30, 2019 at 9:53 PM Jeff Law  wrote:
>
> On 4/30/19 12:34 PM, cmdLP #CODE wrote:
> > Hello GCC-team,
> >
> > I use GCC for all my C and C++ programs. I know how to use GCC, but I am
> > not a contributor to GCC (yet). I often discover some problems C and C++
> > code have in general. There is often the choice between fast or readable
> > code. Some code I have seen performs well but looks ugly (goto, etc.);
> > other code is simple, but has some performance problems. What if we could
> > make the simple code perform well?
> >
> > There is a common problem with conditional branches inside loops. This can
> > decrease the performance of a program. To fix this issue, the conditional
> > branch should be moved outside of the loop. Sometimes this optimization is
> > done by the compiler, but guessing on optimizations done by the compiler is
> > really bad. Often it is not easy to transform the source code to have the
> > conditional branching outside the loop. Instead I propose a new attribute,
> > which forces the compiler to do a conditional branch (based on the
> > annotated parameter) at the beginning of a function. It branches to the
> > corresponding code of the function compiled with the value being constant.
> >
> > Here is a code example, which contains this issue.
> >
> > enum reduce_op
> > {
> > REDUCE_ADD,
> > REDUCE_MULT,
> > REDUCE_MIN,
> > REDUCE_MAX
> > };
> >
> > /* performance critical function */
> > unsigned reduce_data(enum reduce_op reduce,
> >  unsigned const* data,
> >  unsigned data_size)
> > {
> > unsigned i, result, item;
> >
> > result = reduce == REDUCE_MULT ?  1u
> >: reduce == REDUCE_MIN  ? ~0u // ~0u is UINT_MAX
> >:  0u;
> >
> > for(i = 0; i < data_size; ++i)
> > {
> > item = data[i];
> >
> > switch(reduce)
> > {
> > case REDUCE_ADD:
> > result += item;
> > break;
> >
> > case REDUCE_MULT:
> > result *= item;
> > break;
> >
> > case REDUCE_MIN:
> > if(item < result) result = item;
> > // RIP: result 
> > break;
> >
> > case REDUCE_MAX:
> > if(item > result) result = item;
> > // RIP: result >?= item;
> > break;
> > }
> > }
> >
> > return result;
> > }
> >
> > The value of  reduce  does not change inside the function. For this
> > example, the optimization is trivial. But consider more complex examples.
> > The function should be optimized to:
> [  ]
> This is loop unswitching.  It's a standard GCC optimization.  If it's
> not working as well as it should, we're far better off determining why
> and fixing the automatic transformation rather than relying on
> attributes to drive the transformation.

It's currently not implemented for switch () stmts, just for conditionals.
This also hurts SPEC cactusADM.  There might be a missed-optimization
bug about this.  A simple recursive implementation might be possible;
unswitch one case at a time - maybe order by profile probability.  We
already recurse on the unswitched bodies (in case multiple conditions
can be unswitched).
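
Roughly, "unswitch one case at a time" applied by hand to the reduce_data
example above (reusing its enum reduce_op) would look like the following
sketch, not actual GCC output:

    unsigned reduce_data_unswitched (enum reduce_op reduce,
                                     unsigned const *data,
                                     unsigned data_size)
    {
      unsigned i, result = reduce == REDUCE_MULT ? 1u
                         : reduce == REDUCE_MIN  ? ~0u : 0u;

      if (reduce == REDUCE_ADD)
        for (i = 0; i < data_size; ++i)
          result += data[i];             /* loop specialized for this case */
      else
        for (i = 0; i < data_size; ++i)
          switch (reduce)                /* remaining cases; these could be
                                            unswitched recursively */
            {
            case REDUCE_MULT: result *= data[i]; break;
            case REDUCE_MIN:  if (data[i] < result) result = data[i]; break;
            case REDUCE_MAX:  if (data[i] > result) result = data[i]; break;
            default: break;
            }
      return result;
    }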

>
> > What if the variable changes?
> >
> > When the variable changes there should be a jump to the corresponding
> > parallel code of the compiled code with new value.
> >
> > Unoptimized
> >
> > /* removes comments starting with # and ending in a newline */
> > void remove_comment(char* dst,
> > char const* src)
> > {
> > // initialization necessary
> > int state __attribute__((early_branch(0, 1))) = 0;
> >
> > char c;
> >
> > while(*src)
> > {
> > c = *src++;
> >
> > switch(state)
> > {
> > case 0:
> > if(c == '#')
> > state = 1;
> > else
> > *dst++ = c;
> >
> > break;
> >
> > case 1:
> > if(c == '\n')
> > {
> > *dst++ = '\n';
> > state = 0;
> > }
> >
> > break;
> > }
> > }
> > *dst = '\0';
> > }
> >
> > changed to
> >
> > void remove_comment(char* dst,
> > char const* src)
> > {
> > char c;
> >
> > switch(0)
> > {
> > case 0:
> > while(*src)
> > {
> > c = *src++;
> > if(c == '#')
> > goto branch1;
> > else
> >

Re: __attribute__((early_branch))

2019-05-03 Thread Richard Biener
On Thu, May 2, 2019 at 6:16 PM Segher Boessenkool
 wrote:
>
> On Thu, May 02, 2019 at 02:17:51PM +0200, Richard Biener wrote:
> > On Tue, Apr 30, 2019 at 9:53 PM Jeff Law  wrote:
> > > This is loop unswitching.  It's a standard GCC optimization.  If it's
> > > not working as well as it should, we're far better off determining why
> > > and fixing the automatic transformation rather than relying on
> > > attributes to drive the transformation.
> >
> > It's currently not implemented for switch () stmts, just for conditionals.
> > This also hurts SPEC cactusADM.  There might be a missed-optimization
> > bug about this.  A simple recursive implementation might be possible;
> > unswitch one case at a time - maybe order by profile probability.  We
> > already recurse on the unswitched bodies (in case multiple conditions
> > can be unswitched)
>
> Well, if for some case value we can prove the controlling expression is
> constant in the loop, we can almost always prove it is constant without
> looking at the case value?  So we can pull the whole switch statement
> outside just as easily?

There isn't any infrastructure to "easily" do that (copy the loop N times,
wrap it in a switch stmt and put the N loop copies into the switch cases).
The infrastructure we have (loop versioning) manages to copy a loop once
and wrap the two copies with a conditional.  It might also be preferable
to only unswitch the most frequently executed case to avoid code size
explosion (IIRC the cactusADM case has 4 cases, only one is actually
executed).

Richard.

>
>
> Segher


Re: A bug in vrp_meet?

2019-05-06 Thread Richard Biener
On Sun, May 5, 2019 at 11:09 PM Eric Botcazou  wrote:
>
> > I have now applied this variant.
>
> You backported it onto the 8 branch on Friday:
>
> 2019-05-03  Richard Biener  
>
> Backport from mainline
> [...]
>     2019-03-07  Richard Biener  
>
> PR tree-optimization/89595
> * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> stmt iterator as reference, take boolean output parameter to
> indicate whether the stmt was removed and thus the iterator
> already advanced.
> (dom_opt_dom_walker::before_dom_children): Re-iterate over
> stmts created by folding.
>
> and this introduced a regression for the attached Ada testcase at -O:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0102173c in set_value_range (
> vr=0x17747a0  const*)::vr_const_varying>, t=VR_RANGE, min=0x76c3df78, max=<optimized out>, equiv=0x0)
> at /home/eric/svn/gcc-8-branch/gcc/tree-vrp.c:298
> 298   vr->type = t;
>
> on x86-64 at least.  Mainline and 9 branch are not affected.

It looks like backporting r269597 might fix it, though reverting that on trunk
doesn't make the testcase fail there.

Richard.

> --
> Eric Botcazou


Re: GSOC

2019-05-07 Thread Richard Biener
On Mon, 6 May 2019, Giuliano Belinassi wrote:

> Hi,
> 
> On 03/29, Richard Biener wrote:
> > On Thu, 28 Mar 2019, Giuliano Belinassi wrote:
> > 
> > > Hi, Richard
> > > 
> > > On 03/28, Richard Biener wrote:
> > > > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On 03/26, Richard Biener wrote:
> > > > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > > >
> > > > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > > > Greetings All,
> > > > > > > >
> > > > > > > > I would like to take up parallelize compilation using threads 
> > > > > > > > or make
> > > > > > > > c++/c
> > > > > > > > memory issues not automatically promote. I did ask about this 
> > > > > > > > before
> > > > > > > > but
> > > > > > > > not get a reply. When someone replies I'm just a little 
> > > > > > > > concerned as
> > > > > > > > my writing for proposals has never been great so if someone just
> > > > > > > > reviews
> > > > > > > > and doubt checks that's fine.
> > > > > > > >
> > > > > > > > As for the other things building gcc and running the testsuite 
> > > > > > > > is
> > > > > > > > fine. Plus
> > > > > > > > I already working on gcc so I've pretty aware of most things 
> > > > > > > > and this
> > > > > > > > would
> > > > > > > > be a great stepping stone into more serious gcc development 
> > > > > > > > work.
> > > > > > > >
> > > > > > > > If sample code is required that's in mainline gcc I sent out a 
> > > > > > > > trial
> > > > > > > > patch
> > > > > > > > for this issue: 
> > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Nick
> > > > > > >
> > > > > > > It's good to see that you've gotten as far as attaching a patch 
> > > > > > > to BZ
> > > > > > > [1]
> > > > > > >
> > > > > > > I think someone was going to attempt the "parallelize compilation 
> > > > > > > using
> > > > > > > threads" idea last year, but then pulled out before the summer; 
> > > > > > > you may
> > > > > > > want to check the archives (or was that you?)
> > > > > >
> > > > > > There's also Giuliano Belinassi who is interested in the same 
> > > > > > project
> > > > > > (CCed).
> > > > >
> > > > > Yes, I will apply for this project, and I will submit the final 
> > > > > version
> > > > > of my proposal by the end of the week.
> > > > >
> > > > > Currently, my target is the `expand_all_functions` routine, as most of
> > > > > the time is spent on it according to the experiments that I performed 
> > > > > as
> > > > > part of my Master's research on compiler parallelization.
> > > > > (-O2, --disable-checking)
> > > > 
> > > > Yes, more specifically I think the realistic target is the GIMPLE part
> > > > of   execute_pass_list (cfun, g->get_passes ()->all_passes);  done in
> > > > cgraph_node::expand.  If you look at passes.def you'll see all_passes
> > > > also contains RTL expansion (pass_expand) and the RTL optimization
> > > > queue (pass_rest_of_compilation).  The RTL part isn't a realistic 
> > > > target.
> > > > Without changing the pass hierarchy the obvious part that can be
> > > > handled would be the pass_all_optimizations pass sub-queue of
> > > > all_passes since those are all passes that perform transforms on the
> > > > GIMPLE IL where we have all functions in this state at the same time
> &

Re: GSOC

2019-05-13 Thread Richard Biener
On Sun, 12 May 2019, Giuliano Belinassi wrote:

> Hi, Richard
> 
> On 05/07, Richard Biener wrote:
> > On Mon, 6 May 2019, Giuliano Belinassi wrote:
> > 
> > > Hi,
> > > 
> > > On 03/29, Richard Biener wrote:
> > > > On Thu, 28 Mar 2019, Giuliano Belinassi wrote:
> > > > 
> > > > > Hi, Richard
> > > > > 
> > > > > On 03/28, Richard Biener wrote:
> > > > > > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 03/26, Richard Biener wrote:
> > > > > > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > > > > >
> > > > > > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > > > > > Greetings All,
> > > > > > > > > >
> > > > > > > > > > I would like to take up parallelize compilation using 
> > > > > > > > > > threads or make
> > > > > > > > > > c++/c
> > > > > > > > > > memory issues not automatically promote. I did ask about 
> > > > > > > > > > this before
> > > > > > > > > > but
> > > > > > > > > > not get a reply. When someone replies I'm just a little 
> > > > > > > > > > concerned as
> > > > > > > > > > my writing for proposals has never been great so if someone 
> > > > > > > > > > just
> > > > > > > > > > reviews
> > > > > > > > > > and doubt checks that's fine.
> > > > > > > > > >
> > > > > > > > > > As for the other things building gcc and running the 
> > > > > > > > > > testsuite is
> > > > > > > > > > fine. Plus
> > > > > > > > > > I already working on gcc so I've pretty aware of most 
> > > > > > > > > > things and this
> > > > > > > > > > would
> > > > > > > > > > be a great stepping stone into more serious gcc development 
> > > > > > > > > > work.
> > > > > > > > > >
> > > > > > > > > > If sample code is required that's in mainline gcc I sent 
> > > > > > > > > > out a trial
> > > > > > > > > > patch
> > > > > > > > > > for this issue: 
> > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >
> > > > > > > > > > Nick
> > > > > > > > >
> > > > > > > > > It's good to see that you've gotten as far as attaching a 
> > > > > > > > > patch to BZ
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > > I think someone was going to attempt the "parallelize 
> > > > > > > > > compilation using
> > > > > > > > > threads" idea last year, but then pulled out before the 
> > > > > > > > > summer; you may
> > > > > > > > > want to check the archives (or was that you?)
> > > > > > > >
> > > > > > > > There's also Giuliano Belinassi who is interested in the same 
> > > > > > > > project
> > > > > > > > (CCed).
> > > > > > >
> > > > > > > Yes, I will apply for this project, and I will submit the final 
> > > > > > > version
> > > > > > > of my proposal by the end of the week.
> > > > > > >
> > > > > > > Currently, my target is the `expand_all_functions` routine, as 
> > > > > > > most of
> > > > > > > the time is spent on it according to the experiments that I 
> > > > > > > performed as
> > > > > > > part of my Master's research on

Re: Fixing inline expansion of overlapping memmove and non-overlapping memcpy

2019-05-15 Thread Richard Biener
On Tue, May 14, 2019 at 9:21 PM Aaron Sawdey  wrote:
>
> GCC does not currently do inline expansion of overlapping memmove, nor does it
> have an expansion pattern to allow for non-overlapping memcpy, so I plan to 
> add
> patterns and support to implement this in gcc 10 timeframe.
>
> At present memcpy and memmove are kind of entangled. Here's the current state 
> of
> play:
>
> memcpy -> expand with movmem pattern
> memmove (no overlap) -> transform to memcpy -> expand with movmem pattern
> memmove (overlap) -> remains memmove -> glibc call
>
> There are several problems currently. If the memmove() arguments are in fact
> overlapping, then the expansion is not used at all, which makes no sense and
> costs performance: we call a library function instead of inline expanding
> memmove() for small blocks.
>
> There is currently no way to have a separate memcpy pattern. I know from
> experience with expansion of memcmp on power that lengths on the order of
> hundreds of bytes are needed before the function call overhead is overcome by
> optimized glibc code. But we need the memcpy guarantee of non-overlapping
> arguments to make that happen, as we don't want to do a runtime overlap test.
>
> There is some analysis that happens in gimple_fold_builtin_memory_op() that
> determines when memmove calls cannot have an overlap between the arguments and
> converts them into memcpy() which is nice.
>
> However in builtins.c expand_builtin_memmove() does not actually do the
> expansion using the memmove pattern. This is why a memmove() call that cannot 
> be
> converted to memcpy() by gimple_fold_builtin_memory_op() is not expanded and 
> we
> call glibc memmove(). Only expand_builtin_memcpy() actually uses the memmove
> pattern.
>
> So here's my proposed set of fixes:
>  * Add new optab entries for nonoverlapping_memcpy and overlapping_memmove
>cases.
>  * The movmem optab will continue to be treated exactly as it is today so
>that ports that might have a broken movmem pattern that doesn't actually
>handle the overlap cases will continue to work.
>  * expand_builtin_memmove() needs to actually do the memmove() expansion.
>  * expand_builtin_memcpy() needs to use cpymem. Currently this happens down in
>emit_block_move_via_movmem() so some functions might need to be renamed.
>  * ports can then add the new overlapping move and nonoverlapping copy 
> expanders
>and will get better expansion of both memmove and memcpy functions.
>
> I'd be interested in any comments about pieces of this machinery that need to
> work a certain way, or other related issues that should be addressed in
> between expand_builtin_memcpy() and emit_block_move_via_movmem().

I wonder if introducing a __builtin_memmove_with_hints specifying whether
src < dst or src > dst or unknown, and/or a safe block size where that
doesn't matter, would help?  It can then be safely expanded to memmove()
or to specific inline code.
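
Something along these lines, purely as a sketch of the suggested interface
(no such builtin exists today; the name, argument order and encoding are
illustrative only):

    #include <stddef.h>

    /* direction: -1 if src is known to be below dst, 1 if src is known to
       be above dst, 0 if unknown; safe_size: a length up to which the copy
       direction does not matter for the chosen expansion.  */
    void *__builtin_memmove_with_hints (void *dst, const void *src, size_t n,
                                        int direction, size_t safe_size);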

Richard.

> Thanks!
>Aaron
>
> --
> Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>


Re: Determining maximum vector length supported by the CPU?

2019-05-22 Thread Richard Biener
On Wed, May 22, 2019 at 10:36 AM Martin Reinecke
 wrote:
>
> Hi Matthias!
>
> > I agree, we need more information from the compiler. Esp. whether the user
> > specified `-mprefer-avx128` or `-mprefer-vector-width=none/128/256/512`.
> > OTOH `-msve-vector-bits=N` is reported as __ARM_FEATURE_SVE_BITS. So that's
> > covered.
>
> Almost ... except that I'd need a platform-agnostic definition. The
> point is that the code does not care about the underlying hardware at
> all, only for the vector length supported by it.

And then you run into AVX + SSE vs. AVX2 + SSE cases where the (optimal) length
depends on the component type...

I wonder if we'd want to have an 'auto' length instead ;)

I suppose exposing a __BIGGEST_VECTOR__ might be possible (not for SVE
though?).
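
Today user code has to approximate that per target, e.g. with a preprocessor
ladder like the following sketch (MAX_VLEN_BYTES is a made-up name; the byte
counts ignore the AVX-vs-AVX2 component-type issue mentioned above):

    #if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
    # define MAX_VLEN_BYTES (__ARM_FEATURE_SVE_BITS / 8)
    #elif defined(__AVX512F__)
    # define MAX_VLEN_BYTES 64
    #elif defined(__AVX__)
    # define MAX_VLEN_BYTES 32
    #elif defined(__SSE2__) || defined(__ALTIVEC__) || defined(__ARM_NEON)
    # define MAX_VLEN_BYTES 16
    #else
    # define MAX_VLEN_BYTES 8
    #endif

A single __BIGGEST_VECTOR__-style macro would replace exactly that kind of
per-target ladder.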

> > Related: PR83875 - because while we're adding things in that area, it'd be
> > nice if they worked with target clones as well.
>
> Yes, this is a problem I've come across as well in the past.
> (https://gcc.gnu.org/ml/gcc-help/2018-10/msg00118.html)
>
> > Are you aware of std::experimental::simd? It didn't make GCC 9.1, but you
> > can easily patch your (installed) libstdc++ using 
> > https://github.com/VcDevel/
> > std-simd.
>
> This looks extremely interesting! I have to look at it in more detail,
> but this might be the way to go in the future.
> However, the code I'm working on may be incorporated into numpy/scipy at
> some point, and the minimum required compilers for these packages are
> pretty old. I can't expect more than vanilla C++11 support there.
>
> Cheers,
>   Martin
>

