Re: Linux and Windows generate different binaries

2017-07-12 Thread Klaus Kruse Pedersen (Klaus)
On Wed, 2017-07-12 at 08:57 -0600, Sandra Loosemore wrote:
> On 07/12/2017 05:07 AM, Klaus Kruse Pedersen (Klaus) wrote:
> > I have seen reproducible builds being discussed here, but what is
> > the
> > position on inter c-lib and OS reproducible builds?
> 
> I think we consider unstable sort problems bugs and have fixed them
> in 
> the past.  Bugzilla search turned up #28964 and I remember at least
> one 
> more recent instance of this as well (although not the details any
> more).


Yes, 28964 is similar to the issue I was hit by.

By extension, does that mean that all qsort compare/rank functions that
can return 0, should be considered buggy?

I went through a some of the 140'ish rank functions - and it does
indeed look like considerable effort went into returning only +1 and
-1.

A general pattern seem to be:

return da ? 1 : -1;

And comments like:

  /* If regs are equally good, sort by their numbers, so that the
 results of qsort leave nothing to chance.  */


But there are exceptions, all rank functions in

tree-ssa-loop-ivopts.c,
tree-ssa-loop-niter.c
tree-ssa-loop-im.c

can return 0.




> 
> -Sandra
> 

Loop reversal

2017-07-12 Thread Kugan Vivekanandarajah
I am looking into reversing loop to increased efficiency. There is
already a PR22041 for this and an old patch
https://gcc.gnu.org/ml/gcc-patches/2006-01/msg01851.html by Zdenek
which never made it to mainline.

For constant loop count, ivcanon pass is adding reverse iv but this
not selected by ivopt.

For example:

void copy (unsigned int N, double *a, double *c)
{
  for (int i = 0; i < 800; ++i)
  c[i] = a[i];
}

ivcanon pass Added canonical iv to loop 1, 799 iterations.
ivtmp_14 = ivtmp_15 – 1;

in ivopt, it selects candidates 10

Candidate 10:
Var befor: ivtmp.11
Var after: ivtmp.11
Incr POS: before exit test
IV struct:
Type: sizetype
Base: 0
Step: 8
Biv: N

If we look at the group :

Group 0:
Type: ADDRESS
Use 0.0:
At stmt: _5 = *_3;
At pos: *_3
IV struct:
Type: double *
Base: a_9(D)
Step: 8
Object: (void *) a_9(D)
Biv: N
Overflowness wrto loop niter: Overflow

Group 1:
Type: ADDRESS
Use 1.0:
At stmt: *_4 = _5;
At pos: *_4
IV struct:
Type: double *
Base: c_10(D)
Step: 8
Object: (void *) c_10(D)
Biv: N
Overflowness wrto loop niter: Overflow

Group 2:
Type: COMPARE
Use 2.0:
At stmt: if (ivtmp_14 != 0)
At pos: ivtmp_14
IV struct:
Type: unsigned int
Base: 799
Step: 4294967295
Biv: Y
Overflowness wrto loop niter: Overflow

ivopt cost model assumes that group0 and 1 will have infinite cost for
the iv added by ivcanon pass because of the lower precision with the
IV added by ivcanon pass.

If I change the example to:

void copy (unsigned int N, double *a, double *c)
{
 for (long i = 0; i < 800; ++i)
 c[i] = a[i];
}

It still has higher cost for group0 and 1 due to the negative step. I
think this can be improved. My question is:

1. For the case where the loop count is not constant, can we make
ivcanon to add reverse IV with the current implementation. Can ivopt
be taught to select the reverse iv ?

2. Or is the patch by Zdenek a better option. I am re-basing it for the trunk.

Thanks,
Kugan


Re: Linux and Windows generate different binaries

2017-07-12 Thread Ramana Radhakrishnan
On Wed, Jul 12, 2017 at 3:57 PM, Sandra Loosemore
 wrote:
> On 07/12/2017 05:07 AM, Klaus Kruse Pedersen (Klaus) wrote:
>>
>> I have seen reproducible builds being discussed here, but what is the
>> position on inter c-lib and OS reproducible builds?
>
>
> I think we consider unstable sort problems bugs and have fixed them in the
> past.  Bugzilla search turned up #28964 and I remember at least one more
> recent instance of this as well (although not the details any more).

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02444.html

>
> -Sandra
>


Re: Linux and Windows generate different binaries

2017-07-12 Thread Sandra Loosemore

On 07/12/2017 05:07 AM, Klaus Kruse Pedersen (Klaus) wrote:

I have seen reproducible builds being discussed here, but what is the
position on inter c-lib and OS reproducible builds?


I think we consider unstable sort problems bugs and have fixed them in 
the past.  Bugzilla search turned up #28964 and I remember at least one 
more recent instance of this as well (although not the details any more).


-Sandra



Re: Could preprocessor warn for unsafe macros and side-effects?

2017-07-12 Thread Martin Sebor

On 07/11/2017 11:50 PM, sa...@hederstierna.com wrote:

Hi

Reading about macro pitfalls and eg duplication side-effects
https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html#Macro-Pitfalls

would it be possible to let the preprocessor generate warnings for any of these 
pitfalls?


The preprocessor has no knowledge of the language rules but it is
possible to detect some of these problems in the parser.  In fact,
a checker for one of the CERT problems was added not too long ago:
-Wmulti-statement-macros.  I would suggest to open an enhancement
request asking for features you would find particularly valuable.



Maybe all language specific parts are not know at this early preprocessing 
stage, but possibly some info could be stored for use in later pass?

I'm thinking of eg. for "function-like macros" with arguments, checking

-Wmacro-side-effects

* IF function-like macro expands/duplicates an argument more than once THEN
WARN if function() is part as the argument
WARN if unary ++ or -- is used on variable as part of argument
WARN if assignment operator = is part of argument
WARN if volatile variable part as the argument

-Wmacro-operator-precedence

* WARN if macro argument contains an expression with operator(s), an a _higher_ 
precedence operator is used within the macro on this argument, without 
parenthesis around

I'm not sure its even possible at preprocessing stage, but it would be nice to 
have,
I saw some static code analysis tools like Coverity detects these
https://www.securecoding.cert.org/confluence/display/c/PRE31-C.+Avoid+side+effects+in+arguments+to+unsafe+macros

Of course it might generate some false-positives so warning might not be 
enabled by default, maybe just -Wall or -Wextra,
but perhaps it hard to solve, and I'm not sure where and how to implement the 
checking algorithm.


Those sound like good ideas.  Some of them and the challenges
with implementing them were discussed in the context of the
-Wmulti-statement-macros enhancement:

  https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00064.html

Martin


Re: Getting spurious FAILS in testsuite?

2017-07-12 Thread Bernd Edlinger
On 07/11/17 22:28, Bernd Edlinger wrote:
> On 07/11/17 21:42, Andrew Pinski wrote:
>> On Tue, Jul 11, 2017 at 12:31 PM, Andrew Pinski  
>> wrote:
>>> On Tue, Jul 11, 2017 at 12:27 PM, Andrew Pinski  
>>> wrote:
 On Tue, Jul 11, 2017 at 12:15 PM, Bernd Edlinger
  wrote:
> Hi,
>
> I see this now as well on Ubuntu 16.04, but I doubt that the Kernel is
> to blame.

 I don't see these failures when I use a 4.11 kernel.  Only with a 
 4.4 kernel.
 Also the guality testsuite does not run at all with a 4.4 kernel, it
 does run when using a 4.11 kernel; I suspect this is the same symptom
 of the bug.
>>>
>>>
>>> 4.11 kernel results (with CentOS 7.4 glibc):
>>> https://gcc.gnu.org/ml/gcc-testresults/2017-07/msg00998.html
>>>
>>> 4.4 kernel results (with ubuntu 1604 glibc):
>>> https://gcc.gnu.org/ml/gcc-testresults/2017-07/msg01009.html
>>>
>>> Note the glibc does not matter here as I get a similar testsuite
>>> results as the CentOS 7.4 glibc using the Ubuntu 1604 glibc but with
>>> the 4.11 kernel.
>>>
>>> Also note I get a similar results with a plain 4.11 kernel compared to
>>> the above kernel.
>>
>>
>> Maybe commit 6a7e6f78c235975cc14d4e141fa088afffe7062c  or
>> 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa to the kernel fixes both
>> issues.
>>
> 
> As Georg-Johann points out here:
> https://gcc.gnu.org/ml/gcc/2017-07/msg00028.html
> updating the kernel alone does not fix the issue.
> 
> These patches do all circle around pty and tty devices,
> but in the strace file I see a pipe(2) command
> creating the fileno 10 and the sequence of events is
> as follows:
> 
> pipe([7, 10]) = 0   (not in my previous e-mail)
> clone() = 16141
> clone() = 16142
> write(4, "spawn: returns {0}\r\n", 20)  = 20
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16141,
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16142,
> read(10,...,4096) = 4096
> read(10,...,4096) = 2144
> read(10, "", 4096) = 0
> close(10) = 0
> wait4(16141, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16141
> wait4(16142, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16142
> 
> All data arrive at the expect process, but apparently too quickly.
> 
> 
> Thanks
> Bernd.



Oh, I think the true reason is this patch:
Author: Upstream
Description: Report and fix both by Nils Carlson.
  Replaced a cc==0 check with proper Tcl_Eof() check.
Last-Modified: Fri, 23 Oct 2015 11:09:07 +0300
Bug: http://sourceforge.net/p/expect/patches/18/
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799301

--- a/expect.c
+++ b/expect.c
@@ -1860,19 +1860,11 @@
  if (cc == EXP_DATA_NEW) {
/* try to read it */
cc = expIRead(interp,esPtr,timeout,tcl_set_flags);
-   
-   /* the meaning of 0 from i_read means eof.  Muck with it a */
-   /* little, so that from now on it means "no new data arrived */
-   /* but it should be looked at again anyway". */
-   if (cc == 0) {
+
+   if (Tcl_Eof(esPtr->channel)) {
cc = EXP_EOF;
-   } else if (cc > 0) {
-   /* successfully read data */
-   } else {
-   /* failed to read data - some sort of error was encountered such as
-* an interrupt with that forced an error return
-*/
}
+
  } else if (cc == EXP_DATA_OLD) {
cc = 0;
  } else if (cc == EXP_RECONFIGURE) {


The correct way to fix this would be something like this:

 if (cc == 0 && Tcl_Eof(esPtr->channel)) {
 cc = EXP_EOF;
 }

What happens for me is cc > 0 AND Tcl_Eof at the same time.
That makes dejagnu ignore the last few lines, because it does
not expect EOF and data at the same time.

Apparently tcl starts to read ahead because the default match
buffer size is unreasonably small like 2000 bytes.

I can work around it by increasing the buffer size like this:

ndex: gcc/testsuite/lib/gcc-dg.exp
===
--- gcc/testsuite/lib/gcc-dg.exp(revision 250150)
+++ gcc/testsuite/lib/gcc-dg.exp(working copy)
@@ -43,6 +43,9 @@
setenv LANG C.ASCII
  }

+# Avoid sporadic data-losses with expect
+match_max -d 1
+
  # Ensure GCC_COLORS is unset, for the rare testcases that verify
  # how output is colorized.
  if [info exists ::env(GCC_COLORS) ] {


What do you think?
Can debian/ubuntu people reproduce and fix this?
Should we increase the match buffer size on the gcc side?



Thanks
Bernd.


[patch] RFC: Hook for insn costs?

2017-07-12 Thread Georg-Johann Lay

Hi,

the current cost computations in rtlanal.c and maybe other places
suffer from the fact that they are hiding parts of the expressions
from the back-end, like SET_DESTs of single_set or the anatomy of
PARALELLs.

Would it be in order to have a hook like the one attached?

I am aware of that, in an ideal world, there wouldn't be more
than one hook to get rtx costs.  But well...

Whilst rtx_costs does in the majority of cases, there are cases
where hiding information leads to performance degradation,
for example when insn combine cooks up a set zero_extract.
combine.c does actually the right thing as it uses insn_rtx_costs,
but insn_rtx_cost is already a lie because it only uses SET_SRC
and digs into PARALELL without exposing the whole story.

The patch just uses a new targetm.insn_cost hook.  If the
back-end doesn't come up with something useful, the classic
functions with rtx_costs for sub-rtxes are called.

The purpose is to allow a friendly transition and not no
raise any performance degradations which would be very likely
if we just called rtx_costs with outer_code = INSN.

If a back-end finds it useful to implement this hook and need
the whole story, they can do so.  Otherwise, or if it is too
lazy to analyse a specific rtx, they can switch to the old
infrastructure.

Returning a magic value for "unknown" is just an implementation
detail; it could just as well be some bool* that would be set
to true or false depending on whether or not the computation
returned something useful or not.

The patch only touches seq_cost insn_rtx_cost of rtlanal.c.

Would something like this be in order, or is a new hook just
a complete no-go?

Index: coretypes.h
===
--- coretypes.h	(revision 250090)
+++ coretypes.h	(working copy)
@@ -253,6 +253,10 @@ construct that is easy to avoid even whe
   WARN_STRICT_OVERFLOW_MAGNITUDE = 5
 };
 
+/* A magic value to be returned by TARGET_INSN_COSTS to indicate that
+   the costs are not known or too complicated to work out there.  */
+#define INSN_COSTS_UNKNOWN (-1234567)
+
 /* The type of an alias set.  Code currently assumes that variables of
this type can take the values 0 (the alias set which aliases
everything) and -1 (sometimes indicating that the alias set is
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 250123)
+++ doc/tm.texi	(working copy)
@@ -6564,6 +6564,39 @@ optimizers should use optab @code{rintdf
 The default hook returns true for all inputs.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_INSN_COSTS (const rtx_insn *@var{insn}, rtx @var{pattern}, bool @var{speed})
+This target hook describes the relative costs of an insn.
+
+@var{insn} might be NULL or an @code{INSN_P}.
+@var{pattern} is the pattern of @var{insn} or an rtx that is supposed
+to be used as the pattern of an insn the remainder of the compilation.
+
+In implementing this hook, you can use the construct
+@code{COSTS_N_INSNS (@var{n})} to specify a cost equal to @var{n}
+instructions.
+When optimizing for execution speed, i.e.@: when @var{speed} is
+true, this target hook should be used to estimate the relative
+execution cost of the pattern, and the size cost of the pattern if
+@var{speed} is false.
+
+Use this pattern if a @code{SET_DEST} is needed to calculate the correct
+costs or when you want to see the whole of a @code{PARALLEL} and not only
+parts of it. Notice that for a @code{single_set} you might see a
+@code{PARALLEL} @var{pattern} that contains a @code{SET} together with
+@code{COBBER}s.
+
+If the pattern is too complicated to be evaluated in this hook, return
+@code{INSN_COSTS_UNKNOWN} which directs the middle-end to use other
+approaches to get the respective costs like calling
+@code{TARGET_RTX_COSTS} for sub-rtxes of @var{pattern}.
+
+Don't implement this hook if it would always return
+@code{INSN_COSTS_UNKNOWN}.
+
+Don't use @code{INSN_COSTS_UNKNOWN} except as a return value for this hook.
+Do @emph{not use} that value as a return value for @code{TARGET_RTX_COSTS}!
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_RTX_COSTS (rtx @var{x}, machine_mode @var{mode}, int @var{outer_code}, int @var{opno}, int *@var{total}, bool @var{speed})
 This target hook describes the relative costs of RTL expressions.
 
Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 250123)
+++ doc/tm.texi.in	(working copy)
@@ -4790,6 +4790,8 @@ @samp{fold_range_test ()} is optimal.  T
 
 @hook TARGET_OPTAB_SUPPORTED_P
 
+@hook TARGET_INSN_COSTS
+
 @hook TARGET_RTX_COSTS
 
 @hook TARGET_ADDRESS_COST
Index: rtlanal.c
===
--- rtlanal.c	(revision 250090)
+++ rtlanal.c	(working copy)
@@ -5263,6 +5263,10 @@ insn_rtx_cost (rtx pat, bool speed)
   int i, cost;
   rtx set;
 
+  int icost = targetm.insn_costs (NULL, pat, speed);
+  if (icost != 

Re: try_finally_expr wrong source location info

2017-07-12 Thread Vyacheslav Barinov
Hello,

I've fixed the location issue by generating new STATEMENT_LIST_END tree for
each compound statement and using it to store the location of closing brace.

I do understand that it's rather an ugly workaround than a solution, but still
it resolves the issue for us and doesn't break GCC testsuite. So I'll send it
here, maybe someone else needs the fix.

Best Regards,
Vyacheslav Barinov

Eric Botcazou  writes:

>> But then, after lowering on the eh pass (gcc/tree-eh.c:1161), the "finally"
>> location is set up to the `gimple_location (tf->try_finally_expr)' which
>> actually expands to "testcase.cxx:4", a place where the "try" block starts.
>>
>> The dump testcase.cxx.009t.ehopt shows no location info for destructor:
>>  std::unique_ptr::~unique_ptr (&lang);
>>
>> And next testcase.cxx.010t.eh dump shows equal location for constructor and
>> destructor:
>>  [testcase.cxx:4:38] std::unique_ptr::unique_ptr ([testcase.cxx:4:38]
>> &lang, D.42272); [testcase.cxx:4:38] std::unique_ptr::~unique_ptr
>> (&lang);
>>
>> I performed several experiments trying to get information about gimple_block
>> around the try_finally_expr statement right in `lower_try_finally_onedest',
>> but unsuccessfully. And also, as far as I understand, this issue should be
>> fixed in all the lowering functions, not only this one.
>>
>> Where should I dig in order to fix the issue?
>
> We have been using this patchlet locally for some time:
>
> Index: tree-eh.c
> ===
> --- tree-eh.c   (revision 248944)
> +++ tree-eh.c   (working copy)
> @@ -1413,11 +1413,12 @@ lower_try_finally_switch (struct leh_sta
>x = gimple_build_assign (finally_tmp,
>build_int_cst (integer_type_node,
>   fallthru_index));
> +  gimple_set_location (x, finally_loc);
>gimple_seq_add_stmt (&tf->top_p_seq, x);
>
>tmp = build_int_cst (integer_type_node, fallthru_index);
>last_case = build_case_label (tmp, NULL,
> -   create_artificial_label (tf_loc));
> +   create_artificial_label (finally_loc));
>case_label_vec.quick_push (last_case);
>last_case_index++;
>
> @@ -1426,7 +1427,7 @@ lower_try_finally_switch (struct leh_sta
>
>tmp = lower_try_finally_fallthru_label (tf);
>x = gimple_build_goto (tmp);
> -  gimple_set_location (x, tf_loc);
> +  gimple_set_location (x, finally_loc);
>gimple_seq_add_stmt (&switch_body, x);
>  }
>From 2e3782efbf39bc26342131eb92b1bf45aeed9e8d Mon Sep 17 00:00:00 2001
From: Slava Barinov 
Date: Thu, 6 Jul 2017 16:14:58 +0300
Subject: [PATCH] Fix cleanup location for try_finally_expr.

gcc/
* tree.def: Add STATEMENT_LIST_END tree code.
* tree.c: Add STATEMENT_LIST_END handling as TS_COMMON.
* gimplify.c (gimplify_expr): Use STATEMENT_LIST_END location to
provide right information for try_finally_expr.
* tree-eh.c (lower_try_finally_onedest): Set finally location
* c-family/c-semantics.c (pop_stmt_list): Support single-statement
lists extraction with STATEMENT_LIST_END in the end.
* fold-const.c (operand_equal_p): Add STATEMENT_LIST_END support.
	gcc/cp/
* parser.c (cp_parser_compound_statement): Use STATEMENT_LIST_END
to keep the location of closing brace.
* pt.c: Handle STATEMENT_LIST_END.
* constraint.cc (check_function_concept): Handle concept definitions
with STATEMENT_LIST_END.
* error.c (dump_expr): Add STATEMENT_LIST_END support.
	gcc/testsuite/
* g++.dg/ext/statement-list-end.C: New.

Signed-off-by: Slava Barinov 
---
 gcc/ChangeLog  | 17 ++
 gcc/c-family/c-semantics.c |  6 +
 gcc/cp/ChangeLog   |  7 ++
 gcc/cp/constexpr.c |  7 ++
 gcc/cp/constraint.cc   |  9 
 gcc/cp/error.c |  1 +
 gcc/cp/parser.c| 15 ++---
 gcc/cp/pt.c|  3 +++
 gcc/fold-const.c   |  4 
 gcc/gimple.c   |  1 +
 gcc/gimplify.c | 26 ++
 gcc/testsuite/ChangeLog|  4 
 .../c-c++-common/Wimplicit-fallthrough-6.c | 16 ++---
 .../c-c++-common/Wimplicit-fallthrough-7.c |  4 ++--
 gcc/testsuite/g++.dg/ext/statement-list-end.C  | 11 +
 gcc/testsuite/g++.dg/gcov/gcov-2.C |  4 ++--
 gcc/testsuite/g++.dg/parse/error26.C   |  4 ++--
 gcc/testsuite/g++.dg/tm/inherit2.C |  4 ++--
 gcc/testsuite/g++.dg/tm/unsafe1.C  |  4 ++--
 .

Linux and Windows generate different binaries

2017-07-12 Thread Klaus Kruse Pedersen (Klaus)
I have seen reproducible builds being discussed here, but what is the
position on inter c-lib and OS reproducible builds?

As it happens, I just hit by an interesting case were different OS'es
generate (significant) different code. The difference originate from a
relatively small memory ordering difference in the 'lim2' / tree-ssa-
loop-im  pass". Memory references show something like:

Memory reference 1: AGC_S_error
Memory reference 2: AGC_ACC_error
Memory reference 3: AGC_Coeff
Memory reference 4: *_35
Memory reference 5: *_9

Investigation show that a call to qsort() without any tie breaker is
the root-cause:

static int
sort_bbs_in_loop_postorder_cmp (const void *bb1_, const void *bb2_)
{
  basic_block bb1 = *(basic_block *)const_cast(bb1_);
  basic_block bb2 = *(basic_block *)const_cast(bb2_);
  struct loop *loop1 = bb1->loop_father;
  struct loop *loop2 = bb2->loop_father;
 ->  fprintf(stderr, "Debug: loop1->num == loop2->num (%d,%d)\n",
 loop1->num, loop2->num);
  if (loop1->num == loop2->num)
return 0;
  return bb_loop_postorder[loop1->num] < bb_loop_postorder[loop2-
>num] ? -1 : 1;
}

The debug statement show (for this test-case) that:

loop1->num == loop2->num (1,1)
loop1->num == loop2->num (1,1)
loop1->num == loop2->num (1,1)
[...]
loop1->num == loop2->num (1,1)

... all (loop_father->num)'s are the same and thus the sort_bbs..()
function return 0. This guarantee that qsort shuffle the elements in a
implementation specific way.

Obviously not the best case if you are interested in reproducible
builds (and the qsort() implementation isn't part of your build).


Maybe it is time to add a stable sort?


BR, Klaus