Re: predicated code motion (in lim)

2014-10-10 Thread Evgeniya Maenkova
Got it, thanks.

(Now there is a PHI instead of the if.)

On Fri, Oct 10, 2014 at 6:18 PM, Richard Biener
 wrote:
> On Fri, Oct 10, 2014 at 3:44 PM, Evgeniya Maenkova
>  wrote:
>> Hi,
>> could anyone clarify about predicated code motion in lim?
>>
>> After reading a TODO in /tree-ssa-loop-im.c (see [1]) I tried several
>> examples, say [2]. However, in all of them the code was moved out of
>> the loop successfully (either by pre or by lim, as in [2]).
>>
>> So my question is: what did the author of this code mean by
>> "predicated code motion"? (What is the TODO asking for?)
>
> It means transforming
>
>> Thanks,
>>
>> Evgeniya
>>
>> [1]
>> TODO:  Support for predicated code motion.  I.e.
>>
>>    while (1)
>>      {
>>        if (cond)
>>          {
>>            a = inv;
>>            something;
>>          }
>>      }
>
> this to
>
>    if (cond)
>      {
>        a = inv;
>        something;
>      }
>    while (1)
>      ;
>
> which is currently supported in a very limited way by emitting
> this as
>
>a = cond ? inv : a;
>
> for at most two statements.  As it executes stmts unconditionally
> that way it is limited to non-trapping operations.
>
> Richard.
>
>
>> [2]
>>
>> void foo(int cond, int inv)
>> {
>>   int a;
>>   int i = 0;
>>   int j = 0;
>>
>>   while (j++ < 100)
>>     {
>>       while (i++ < 2000)
>>         {
>>           if (j % 2)
>>             {
>>               a = 528*j;
>>               printf("Hey1%d %d", a, i); //something;
>>             }
>>         }
>>     }
>> }
>>
>> lim:
>>
>> ;; Function foo (foo, funcdef_no=0, decl_uid=1394, cgraph_uid=0, symbol_order=0)
>>
>> foo (int cond, int inv)
>> {
>>   int j;
>>   int i;
>>   int a;
>>   unsigned int j.0_12;
>>   unsigned int _13;
>>   unsigned int _18;
>>   unsigned int _21;
>>
>>   :
>>   goto ;
>>
>>   :
>>   if (_13 != 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   goto ;
>>
>>   :
>>   printf ("Hey1%d %d", a_14, i_11);
>>
>>   :
>>
>>   :
>>   # i_1 = PHI 
>>   i_11 = i_1 + 1;
>>   if (i_1 <= 1999)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   # i_20 = PHI 
>>   j_9 = j_23 + 1;
>>   if (j_23 != 100)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # i_22 = PHI 
>>   # j_23 = PHI 
>>   j.0_12 = (unsigned int) j_23;
>>   _13 = j.0_12 & 1;
>>   _21 = (unsigned int) j_23;
>>   _18 = _21 * 528;
>>   a_14 = (int) _18;
>>   goto ;
>>
>>   :
>>   return;
>>
>> }
>>
>>
>> However, in loopinit (the optimization before lim) there was no motion
>> (so this was done by lim):
>>   :
>>   a_14 = j_23 * 528;
>>   printf ("Hey1%d %d", a_14, i_11);



-- 
Thanks,

Evgeniya

perfstories.wordpress.com


Re: RTL infrastructure leaks VALUE expressions into aliasing-detecting functions

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 8:18 PM, Jeff Law  wrote:

>>>> I'd like to bring PR 63475 to the attention of RTL maintainers. The
>>>> problem in the referenced PR exposed an RTL infrastructure problem,
>>>> where VALUE expressions are leaked instead of MEM expressions into
>>>> various parts of aliasing-detecting support functions.
>>>>
>>>> As an example, please consider the following patch for base_alias_check:
>>>>
>>>> --cut here--
>>>> Index: alias.c
>>>> ===
>>>> --- alias.c (revision 216025)
>>>> +++ alias.c (working copy)
>>>> @@ -1824,6 +1824,13 @@ base_alias_check (rtx x, rtx x_base, rtx y, rtx y_
>>>>    if (rtx_equal_p (x_base, y_base))
>>>>      return 1;
>>>>
>>>> +  if (GET_CODE (x) == VALUE || GET_CODE (y) == VALUE)
>>>> +    {
>>>> +      debug_rtx (x);
>>>> +      debug_rtx (y);
>>>> +      gcc_unreachable ();
>>>> +    }
>>>> +
>>>>    /* The base addresses are different expressions.  If they are not accessed
>>>>       via AND, there is no conflict.  We can bring knowledge of object
>>>>       alignment into play here.  For example, on alpha, "char a, b;" can
>>>
>>>
>>> But when base_alias_check  returns, we call memrefs_conflict_p which does
>>> know how to dig down into a VALUE expression.
>>
>>
>> IIRC, the problem was that base_alias_check returned 0 due to:
>>
>>/* Differing symbols not accessed via AND never alias.  */
>>if (GET_CODE (x_base) != ADDRESS && GET_CODE (y_base) != ADDRESS)
>>  return 0;
>>
>> so, the calling code never reached memrefs_conflict_p down the stream.
>
> Right.  And my question is what happens if we aren't as aggressive here.
> What happens if before this check we return nonzero if X or Y is a VALUE?
> Do we then get into memrefs_conflict_p and does it do the right thing?

The following patch, placed just after the AND detection in base_alias_check,
fixes the testcase from the PR:

--cut here--
Index: alias.c
===
--- alias.c (revision 216100)
+++ alias.c (working copy)
@@ -1842,6 +1842,8 @@ base_alias_check (rtx x, rtx x_base, rtx y, rtx y_
           || (int) GET_MODE_UNIT_SIZE (x_mode) < -INTVAL (XEXP (y, 1))))
     return 1;

+  return 1;
+
   /* Differing symbols not accessed via AND never alias.  */
   if (GET_CODE (x_base) != ADDRESS && GET_CODE (y_base) != ADDRESS)
 return 0;
--cut here--

I have started a bootstrap on alpha native with this patch (it will
take a day or so) and will report back with the findings.

The results with unpatched gcc are at [1].

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01151.html

Uros.


Re: RTL infrastructure leaks VALUE expressions into aliasing-detecting functions

2014-10-10 Thread Jeff Law

On 10/10/14 12:12, Uros Bizjak wrote:
> On Fri, Oct 10, 2014 at 7:56 PM, Jeff Law  wrote:
>> On 10/09/14 06:14, Uros Bizjak wrote:
>>>
>>> Hello!
>>>
>>> I'd like to bring PR 63475 to the attention of RTL maintainers. The
>>> problem in the referenced PR exposed an RTL infrastructure problem,
>>> where VALUE expressions are leaked instead of MEM expressions into
>>> various parts of aliasing-detecting support functions.
>>>
>>> As an example, please consider the following patch for base_alias_check:
>>>
>>> --cut here--
>>> Index: alias.c
>>> ===
>>> --- alias.c (revision 216025)
>>> +++ alias.c (working copy)
>>> @@ -1824,6 +1824,13 @@ base_alias_check (rtx x, rtx x_base, rtx y, rtx y_
>>>    if (rtx_equal_p (x_base, y_base))
>>>      return 1;
>>>
>>> +  if (GET_CODE (x) == VALUE || GET_CODE (y) == VALUE)
>>> +    {
>>> +      debug_rtx (x);
>>> +      debug_rtx (y);
>>> +      gcc_unreachable ();
>>> +    }
>>> +
>>>    /* The base addresses are different expressions.  If they are not accessed
>>>       via AND, there is no conflict.  We can bring knowledge of object
>>>       alignment into play here.  For example, on alpha, "char a, b;" can
>>
>> But when base_alias_check  returns, we call memrefs_conflict_p which does
>> know how to dig down into a VALUE expression.
>
> IIRC, the problem was that base_alias_check returned 0 due to:
>
>   /* Differing symbols not accessed via AND never alias.  */
>   if (GET_CODE (x_base) != ADDRESS && GET_CODE (y_base) != ADDRESS)
>     return 0;
>
> so, the calling code never reached memrefs_conflict_p down the stream.

Right.  And my question is what happens if we aren't as aggressive here. 
  What happens if before this check we return nonzero if X or Y is a 
VALUE?  Do we then get into memrefs_conflict_p and does it do the right 
thing?
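
To make the question concrete, here is a toy model of the control flow being discussed. It is only a sketch: the enum, the function name and the `punt_on_value` switch are hypothetical stand-ins, not GCC's real `rtx_code` values or the real base_alias_check, and the final `return 1` is a simplification of whatever the real function does afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RTL codes; not GCC's real enum rtx_code. */
enum toy_code { TOY_SYMBOL, TOY_ADDRESS, TOY_VALUE };

/* Toy version of the tail of base_alias_check.  Returns 1 for "may alias,
   let the caller go on to the detailed check" and 0 for "cannot alias". */
int toy_base_alias_check(enum toy_code x, enum toy_code y,
                         enum toy_code x_base, enum toy_code y_base,
                         bool punt_on_value)
{
    /* The idea asked about above: answer conservatively as soon as
       either operand is a VALUE. */
    if (punt_on_value && (x == TOY_VALUE || y == TOY_VALUE))
        return 1;

    /* Differing symbols not accessed via AND never alias. */
    if (x_base != TOY_ADDRESS && y_base != TOY_ADDRESS)
        return 0;

    return 1;   /* simplification: everything else is left to the caller */
}
```

With `punt_on_value` off, two VALUE operands with non-ADDRESS bases get the early 0 and the detailed check is never reached; with it on, the same inputs return 1 and the caller would continue to memrefs_conflict_p. Whether that is the right thing in the real code is exactly the open question.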






> It might be that targets without AND addresses are immune to this
> issue, but the code that deals with ANDs is certainly not prepared to
> handle VALUEs.
>
> (The testcase from the PR can be compiled with a crosscompiler to
> alpha-linux-gnu, as outlined in the PR. Two AND addresses should be
> detected as aliasing, but they are not - resulting in CSE propagating
> aliased read after store in (insn 29).)

Yea, but I'm slammed right now and won't be able to look at it under a 
debugger for a while.


jeff


Re: RTL infrastructure leaks VALUE expressions into aliasing-detecting functions

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 7:56 PM, Jeff Law  wrote:
> On 10/09/14 06:14, Uros Bizjak wrote:
>>
>> Hello!
>>
>> I'd like to bring PR 63475 to the attention of RTL maintainers. The
>> problem in the referenced PR exposed an RTL infrastructure problem,
>> where VALUE expressions are leaked instead of MEM expressions into
>> various parts of aliasing-detecting support functions.
>>
>> As an example, please consider the following patch for base_alias_check:
>>
>> --cut here--
>> Index: alias.c
>> ===
>> --- alias.c (revision 216025)
>> +++ alias.c (working copy)
>> @@ -1824,6 +1824,13 @@ base_alias_check (rtx x, rtx x_base, rtx y, rtx y_
>>    if (rtx_equal_p (x_base, y_base))
>>      return 1;
>>
>> +  if (GET_CODE (x) == VALUE || GET_CODE (y) == VALUE)
>> +    {
>> +      debug_rtx (x);
>> +      debug_rtx (y);
>> +      gcc_unreachable ();
>> +    }
>> +
>>    /* The base addresses are different expressions.  If they are not accessed
>>       via AND, there is no conflict.  We can bring knowledge of object
>>       alignment into play here.  For example, on alpha, "char a, b;" can
>
> But when base_alias_check  returns, we call memrefs_conflict_p which does
> know how to dig down into a VALUE expression.

IIRC, the problem was that base_alias_check returned 0 due to:

  /* Differing symbols not accessed via AND never alias.  */
  if (GET_CODE (x_base) != ADDRESS && GET_CODE (y_base) != ADDRESS)
    return 0;

so, the calling code never reached memrefs_conflict_p down the stream.

It might be that targets without AND addresses are immune to this
issue, but the code that deals with ANDs is certainly not prepared to
handle VALUEs.

(The testcase from the PR can be compiled with a crosscompiler to
alpha-linux-gnu, as outlined in the PR. Two AND addresses should be
detected as aliasing, but they are not - resulting in CSE propagating
aliased read after store in (insn 29).)

Uros.


Re: RTL infrastructure leaks VALUE expressions into aliasing-detecting functions

2014-10-10 Thread Jeff Law

On 10/09/14 06:14, Uros Bizjak wrote:

> Hello!
>
> I'd like to bring PR 63475 to the attention of RTL maintainers. The
> problem in the referenced PR exposed an RTL infrastructure problem,
> where VALUE expressions are leaked instead of MEM expressions into
> various parts of aliasing-detecting support functions.
>
> As an example, please consider the following patch for base_alias_check:
>
> --cut here--
> Index: alias.c
> ===
> --- alias.c (revision 216025)
> +++ alias.c (working copy)
> @@ -1824,6 +1824,13 @@ base_alias_check (rtx x, rtx x_base, rtx y, rtx y_
>    if (rtx_equal_p (x_base, y_base))
>      return 1;
>
> +  if (GET_CODE (x) == VALUE || GET_CODE (y) == VALUE)
> +    {
> +      debug_rtx (x);
> +      debug_rtx (y);
> +      gcc_unreachable ();
> +    }
> +
>    /* The base addresses are different expressions.  If they are not accessed
>       via AND, there is no conflict.  We can bring knowledge of object
>       alignment into play here.  For example, on alpha, "char a, b;" can

But when base_alias_check  returns, we call memrefs_conflict_p which 
does know how to dig down into a VALUE expression.


Is it simply the case that we want/need to consider anything with a 
VALUE as not passing the base alias check and defer memrefs_conflict_p?


I really don't know, it's been a long time since I worked with this code 
(it predates the introduction of cselib, so that gives you an idea of 
how long it's been :-)


jeff




Re: [PATCH] gcc parallel make check

2014-10-10 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 04:50:47PM +0200, Christophe Lyon wrote:
> On 10 October 2014 16:19, Jakub Jelinek  wrote:
> > On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote:
> >> my.exp contains the following construct which is often used in the testsuite:
> >> ==
> >> foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
> >>     # If we're only testing specific files and this isn't one of them, skip it.
> >>     if ![runtest_file_p $runtests $src] then {
> >>         continue
> >>     }
> >>     c-torture-execute $src $additional_flags
> >>     gcc-dg-runtest $src "" $additional_flags
> >> }
> >> ==
> >> Note that gcc-dg-runtest calls runtest_file_p too.
> >
> > Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p
> > is run the same number of times in all instances unless
> > gcc_parallel_test_enable has been disabled.
> 
> Thanks for your prompt answer.
> 
> Is this documented somewhere, so that such cases do not happen in the future?

Feel free to submit a documentation patch.

> It's in a patch which has been under review for quite some time
> (started before your change), that's why you missed it.

Ah, ok.

> What about my remark about:
> >  # For parallelized check-% targets, this decides whether parallelization
> >  # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything
> >  # but optional --target_board or --extra_opts arguments).  If desirable,
> I think it should be removed from gcc/Makefile.in

Only the " and RUNTESTFLAGS ... arguments" part of that.  Patch preapproved.

Jakub


Re: [PATCH] gcc parallel make check

2014-10-10 Thread Christophe Lyon
On 10 October 2014 16:19, Jakub Jelinek  wrote:
> On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote:
>> my.exp contains the following construct which is often used in the testsuite:
>> ==
>> foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
>>     # If we're only testing specific files and this isn't one of them, skip it.
>>     if ![runtest_file_p $runtests $src] then {
>>         continue
>>     }
>>     c-torture-execute $src $additional_flags
>>     gcc-dg-runtest $src "" $additional_flags
>> }
>> ==
>> Note that gcc-dg-runtest calls runtest_file_p too.
>
> Such my.exp is invalid, you need to guarantee gcc_parallel_test_run_p
> is run the same number of times in all instances unless
> gcc_parallel_test_enable has been disabled.

Thanks for your prompt answer.

Is this documented somewhere, so that such cases do not happen in the future?


> See the patches I've posted when adding the fine-grained parallelization,
> e.g. go testsuite has been fixed that way, etc.
> So, in your above example, you'd need:
> gcc_parallel_test_enable 0
> line before c-torture-execute and
> gcc_parallel_test_enable 1
> line after gcc-dg-runtest.  That way, if runtest_file_p says the test should
> be scheduled by current instance, all the subtests will be run there.
>
> If my.exp is part of gcc/testsuite, I'm sorry for missing it, if it is
> elsewhere, just fix it up.

It's in a patch which has been under review for quite some time
(started before your change), that's why you missed it.

> Note, there are #verbose lines in gcc_parallel_test_run_p, you can uncomment
> them and through sed on the log files verify that each instance performs the
> same parallelization checks (same strings).
Yep, I saw those and also added other traces of my own :-)


What about my remark about:
>  # For parallelized check-% targets, this decides whether parallelization
>  # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything
>  # but optional --target_board or --extra_opts arguments).  If desirable,
I think it should be removed from gcc/Makefile.in

Thanks,

Christophe.


Re: fast-math optimization question

2014-10-10 Thread Richard Biener
On Fri, Oct 10, 2014 at 3:27 PM, Vincent Lefevre  wrote:
> On 2014-10-10 11:07:52 +0200, Jakub Jelinek wrote:
>> Though, is such optimization desirable even for fast-math?
>
> I wonder whether fast-math has a well-defined spec, but it should be
> noted that because of possible cancellations, even if the final result
> is a float, it may be better to keep intermediate results in double
> if the user has used double types (explicitly or implicitly via a
> function that returns a double). For instance, if x is a double,
> (float) sin(x) and (float) sin((float) x) can give very different
> results if x is large compared to the period (2*pi).

I think the only case handled right now is x being float and
(float) sin ((double) x) being called.  I think for that case the
demotion is fine (and it is a common mistake that people don't know
about the 'f' variants and expect sin to behave as a type-generic function).

Richard.

> --
> Vincent Lefèvre  - Web: 
> 100% accessible validated (X)HTML - Blog: 
> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: [PATCH] gcc parallel make check

2014-10-10 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 04:09:39PM +0200, Christophe Lyon wrote:
> my.exp contains the following construct which is often used in the testsuite:
> ==
> foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
>     # If we're only testing specific files and this isn't one of them, skip it.
>     if ![runtest_file_p $runtests $src] then {
>         continue
>     }
>     c-torture-execute $src $additional_flags
>     gcc-dg-runtest $src "" $additional_flags
> }
> ==
> Note that gcc-dg-runtest calls runtest_file_p too.

Such a my.exp is invalid: you need to guarantee that gcc_parallel_test_run_p
is run the same number of times in all instances unless
gcc_parallel_test_enable has been disabled.

See the patches I've posted when adding the fine-grained parallelization,
e.g. go testsuite has been fixed that way, etc.
So, in your above example, you'd need:
gcc_parallel_test_enable 0
line before c-torture-execute and
gcc_parallel_test_enable 1
line after gcc-dg-runtest.  That way, if runtest_file_p says the test should
be scheduled by current instance, all the subtests will be run there.

If my.exp is part of gcc/testsuite, I'm sorry for missing it; if it is
elsewhere, just fix it up.

Note, there are #verbose lines in gcc_parallel_test_run_p, you can uncomment
them and through sed on the log files verify that each instance performs the
same parallelization checks (same strings).

Jakub


Re: predicated code motion (in lim)

2014-10-10 Thread Richard Biener
On Fri, Oct 10, 2014 at 3:44 PM, Evgeniya Maenkova
 wrote:
> Hi,
> could anyone clarify about predicated code motion in lim?
>
> After reading a TODO in /tree-ssa-loop-im.c (see [1]) I tried several
> examples, say [2]. However, in all of them the code was moved out of
> the loop successfully (either by pre or by lim, as in [2]).
>
> So my question is: what did the author of this code mean by
> "predicated code motion"? (What is the TODO asking for?)

It means transforming

> Thanks,
>
> Evgeniya
>
> [1]
> TODO:  Support for predicated code motion.  I.e.
>
>    while (1)
>      {
>        if (cond)
>          {
>            a = inv;
>            something;
>          }
>      }

this to

   if (cond)
     {
       a = inv;
       something;
     }
   while (1)
     ;

which is currently supported in a very limited way by emitting
this as

   a = cond ? inv : a;

for at most two statements.  As it executes stmts unconditionally
that way it is limited to non-trapping operations.
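
As a source-level illustration of that select form, here is a minimal sketch in plain C. The function names and the `n` parameter are made up for the example, and a do-while loop is used so that the body runs at least once, as it does in the `while (1)` case; this is not what lim emits, only the shape of the transformation. The unsigned multiply mirrors the `_18 = _21 * 528` step in the lim dump quoted in this thread.

```c
/* Loop as written: the invariant store is guarded by cond on every
   iteration. */
int guarded_in_loop(int cond, int j, int n)
{
    int a = 0;
    int i = 0;
    do {
        if (cond)
            a = (int) ((unsigned int) j * 528u);   /* invariant in the loop */
    } while (++i < n);
    return a;
}

/* Select form: the invariant value is computed unconditionally before
   the loop and only the assignment itself is predicated. */
int select_before_loop(int cond, int j, int n)
{
    int a = 0;
    int i = 0;
    int tmp = (int) ((unsigned int) j * 528u);     /* runs even if !cond */
    a = cond ? tmp : a;
    do {
        /* nothing invariant left in the body */
    } while (++i < n);
    return a;
}
```

Both functions return the same value for any `cond`. The multiply is safe to run when `cond` is false; something like a division by a value that may be zero would not be, which is why the select form has to be restricted to non-trapping operations.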

Richard.


> [2]
>
> void foo(int cond, int inv)
> {
>   int a;
>   int i = 0;
>   int j = 0;
>
>   while (j++ < 100)
>     {
>       while (i++ < 2000)
>         {
>           if (j % 2)
>             {
>               a = 528*j;
>               printf("Hey1%d %d", a, i); //something;
>             }
>         }
>     }
> }
>
> lim:
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1394, cgraph_uid=0, symbol_order=0)
>
> foo (int cond, int inv)
> {
>   int j;
>   int i;
>   int a;
>   unsigned int j.0_12;
>   unsigned int _13;
>   unsigned int _18;
>   unsigned int _21;
>
>   :
>   goto ;
>
>   :
>   if (_13 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   goto ;
>
>   :
>   printf ("Hey1%d %d", a_14, i_11);
>
>   :
>
>   :
>   # i_1 = PHI 
>   i_11 = i_1 + 1;
>   if (i_1 <= 1999)
> goto ;
>   else
> goto ;
>
>   :
>   # i_20 = PHI 
>   j_9 = j_23 + 1;
>   if (j_23 != 100)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # i_22 = PHI 
>   # j_23 = PHI 
>   j.0_12 = (unsigned int) j_23;
>   _13 = j.0_12 & 1;
>   _21 = (unsigned int) j_23;
>   _18 = _21 * 528;
>   a_14 = (int) _18;
>   goto ;
>
>   :
>   return;
>
> }
>
>
> However, in loopinit (the optimization before lim) there was no motion
> (so this was done by lim):
>   :
>   a_14 = j_23 * 528;
>   printf ("Hey1%d %d", a_14, i_11);


Re: [PATCH] gcc parallel make check

2014-10-10 Thread Christophe Lyon
Hi Jakub,


On 15 September 2014 18:05, Jakub Jelinek  wrote:
[...]
>  # For parallelized check-% targets, this decides whether parallelization
>  # is desirable (if -jN is used and RUNTESTFLAGS doesn't contain anything
>  # but optional --target_board or --extra_opts arguments).  If desirable,
>  # recursive make is run with check-parallel-$lang{,1,2,3,4,5} etc. goals,
>  # which can be executed in parallel, as they are run in separate directories.
> -# check-parallel-$lang{1,2,3,4,5} etc. goals invoke runtest with the longest
> -# running *.exp files from the testsuite, as determined by check_$lang_parallelize
> -# variable.  The check-parallel-$lang goal in that case invokes runtest with
> -# all the remaining *.exp files not handled by the separate goals.
> +# check-parallel-$lang{,1,2,3,4,5} etc. goals invoke runtest with
> +# GCC_RUNTEST_PARALLELIZE_DIR var in the environment and runtest_file_p
> +# dejaGNU procedure is overridden to additionally synchronize through
> +# a $lang-parallel directory which tests will be run by which runtest instance.
>  # Afterwards contrib/dg-extract-results.sh is used to merge the sum and log
>  # files.  If parallelization isn't desirable, only one recursive make
>  # is run with check-parallel-$lang goal and check_$lang_parallelize variable
> @@ -3662,76 +3645,60 @@ check_p_subdirs=$(wordlist 1,$(words $(c
>  # to lang_checks_parallelized variable and define check_$lang_parallelize
>  # variable (see above check_gcc_parallelize description).
>  $(lang_checks_parallelized): check-% : site.exp
> -   @if [ -z "$(filter-out --target_board=%,$(filter-out --extra_opts%,$(RUNTESTFLAGS)))" ] \

Since you removed this test, the comment above is no longer accurate:
setting RUNTESTFLAGS to whatever value no longer disables
parallelization.

Which leads me to discuss a bug I faced after you committed this
change: I am testing a patch which brings a series of new tests.
$ RUNTESTFLAGS=my.exp make -jN check (in fact the 'make -j' is
embedded in a larger build script)

my.exp contains the following construct which is often used in the testsuite:
==
foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
    # If we're only testing specific files and this isn't one of them, skip it.
    if ![runtest_file_p $runtests $src] then {
        continue
    }
    c-torture-execute $src $additional_flags
    gcc-dg-runtest $src "" $additional_flags
}
==
Note that gcc-dg-runtest calls runtest_file_p too.

What I observed is that if I use -j1, all my .c files get tested,
while with N>2 some of them are silently skipped.

It took me a while to figure out that it's because gcc-dg-runtest
calls runtest_file_p, which means that runtest_file_p is called twice
when the 1st invocation returns 1, and only once when the 1st
invocation returns 0.

For example, if we have pid0, pid1 the concurrent runtest processes,
and file0.c, file1.c, ... the testcases, then:
* pid0 decides to keep file0.c file1.c file2.c file3.c file4.c. Since
the above loop calls runtest_file_p twice for each, we reach the
"minor" counter of 10.
* in the meantime, pid1 decides to skip file0.c, file1.c ... file9.c
since it calls runtest_file_p only once for each
* pid1 increments its parallel counter to 1, and creates the new testing subdir
* pid1 decides to keep file10, file11, file12, file13 and file14
(again, 2 calls to runtest_file_p per testcase)
* pid0 increments its parallel counter to 1, and decides it has to skip it
* pid0 thus decides to skip file5, file6, file7, ... file14, calling
runtest_file_p once for each
* etc...

In the end, we have ignored file5...file9
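
The walk-through above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `toy_run_p` and `count_untested` are made-up names, batches are 10 calls long as in the description, and batch ownership is fixed as "batch b belongs to process b % 2" instead of being decided by which instance creates the directory first. Nested call results are ignored, which is harmless here because the two calls for a kept file always fall in the same batch.

```c
#include <stdbool.h>

#define BATCH 10   /* run_p calls per batch (the "minor" counter above) */
#define NPROC 2    /* pid0 and pid1 */

/* Toy stand-in for gcc_parallel_test_run_p.  Each process counts its own
   calls; the answer depends only on which batch the call falls into.
   Assumption: batch b belongs to process b % NPROC. */
static bool toy_run_p(int pid, int *calls)
{
    int batch = *calls / BATCH;
    (*calls)++;
    return batch % NPROC == pid;
}

/* Number of files that no process ends up testing.  extra_calls is how
   many further run_p calls a process makes for a file it has accepted
   (1 models the nested runtest_file_p call in gcc-dg-runtest, 0 models
   exactly one call per file). */
int count_untested(int nfiles, int extra_calls)
{
    int calls[NPROC] = { 0 };
    int untested = 0;

    for (int f = 0; f < nfiles; f++) {
        bool taken = false;
        for (int pid = 0; pid < NPROC; pid++) {
            if (toy_run_p(pid, &calls[pid])) {
                taken = true;
                for (int k = 0; k < extra_calls; k++)
                    toy_run_p(pid, &calls[pid]);   /* result ignored */
            }
        }
        if (!taken)
            untested++;
    }
    return untested;
}
```

With 20 files, `count_untested(20, 1)` gives 5, which corresponds to file5...file9 being dropped, while `count_untested(20, 0)` gives 0: once every process makes the same number of calls per file, the counters stay in step and nothing is lost.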

I'm not sure why you have made special cases for some of the existing
*.exp when you forced them to disable parallelization.
Was it to handle such cases?

I'm not sure about the next step:
- should I modify my .exp file?
- should you modify gcc_parallel_test_run_p?

Even if I have to modify my .exp file, I think this is error prone,
and others could introduce a similar construct in the future.

Thanks,

Christophe.


predicated code motion (in lim)

2014-10-10 Thread Evgeniya Maenkova
Hi,
could anyone clarify about predicated code motion in lim?

After reading a TODO in /tree-ssa-loop-im.c (see [1]) I tried several
examples, say [2]. However, in all of them the code was moved out of
the loop successfully (either by pre or by lim, as in [2]).

So my question is: what did the author of this code mean by
"predicated code motion"? (What is the TODO asking for?)

Thanks,

Evgeniya

[1]
TODO:  Support for predicated code motion.  I.e.

   while (1)
     {
       if (cond)
         {
           a = inv;
           something;
         }
     }

[2]

void foo(int cond, int inv)
{
  int a;
  int i = 0;
  int j = 0;

  while (j++ < 100)
    {
      while (i++ < 2000)
        {
          if (j % 2)
            {
              a = 528*j;
              printf("Hey1%d %d", a, i); //something;
            }
        }
    }
}

lim:

;; Function foo (foo, funcdef_no=0, decl_uid=1394, cgraph_uid=0, symbol_order=0)

foo (int cond, int inv)
{
  int j;
  int i;
  int a;
  unsigned int j.0_12;
  unsigned int _13;
  unsigned int _18;
  unsigned int _21;

  :
  goto ;

  :
  if (_13 != 0)
goto ;
  else
goto ;

  :
  goto ;

  :
  printf ("Hey1%d %d", a_14, i_11);

  :

  :
  # i_1 = PHI 
  i_11 = i_1 + 1;
  if (i_1 <= 1999)
goto ;
  else
goto ;

  :
  # i_20 = PHI 
  j_9 = j_23 + 1;
  if (j_23 != 100)
goto ;
  else
goto ;

  :

  :
  # i_22 = PHI 
  # j_23 = PHI 
  j.0_12 = (unsigned int) j_23;
  _13 = j.0_12 & 1;
  _21 = (unsigned int) j_23;
  _18 = _21 * 528;
  a_14 = (int) _18;
  goto ;

  :
  return;

}


However, in loopinit (the optimization before lim) there was no motion
(so this was done by lim):
  :
  a_14 = j_23 * 528;
  printf ("Hey1%d %d", a_14, i_11);


Re: fast-math optimization question

2014-10-10 Thread Vincent Lefevre
On 2014-10-10 11:07:52 +0200, Jakub Jelinek wrote:
> Though, is such optimization desirable even for fast-math?

I wonder whether fast-math has a well-defined spec, but it should be
noted that because of possible cancellations, even if the final result
is a float, it may be better to keep intermediate results in double
if the user has used double types (explicitly or implicitly via a
function that returns a double). For instance, if x is a double,
(float) sin(x) and (float) sin((float) x) can give very different
results if x is large compared to the period (2*pi).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: Towards GNU11

2014-10-10 Thread Marek Polacek
On Thu, Oct 09, 2014 at 02:34:51PM -0700, Mike Stump wrote:
> On Oct 7, 2014, at 2:07 PM, Marek Polacek  wrote:
> > I'd like to kick off a discussion about moving the default standard
> > for C from gnu89 to gnu11.
> 
> I endorse the change of default.
 
Thanks for chiming in.

> A wiki page that lists the kinds of changes people hit in real code, with how
> to fix them, would be helpful.  It might be nice to have this in the document,
> but I don’t know if people want to do that much work.  The wiki is nice
> because others who do world builds can easily add whatever they hit, which
> makes it even more complete.  This is a nice-to-have; I don’t think the work
> going in should be gated on this.
 
Yeah.  I plan to write something into the "porting to" document.

> Two comments:
> 
>   Thank you for all your hard work.
> 
>   Yes please.

Thank you very much!

Marek


Re: fast-math optimization question

2014-10-10 Thread Jakub Jelinek
On Thu, Oct 09, 2014 at 03:55:34PM -0700, Steve Ellcey wrote:
> On Thu, 2014-10-09 at 19:50 +, Joseph S. Myers wrote:
> > On Thu, 9 Oct 2014, Steve Ellcey wrote:
> > 
> > > Do you know which pass does the simple
> > > '(float)function((double)float_val)' demotion?  Maybe that would be a
> > > good place to extend things.
> > 
> > convert.c does such transformations.  Maybe the transformations in there 
> > could move to the match-and-simplify infrastructure - convert.c is not a 
> > particularly good place for optimization, and having similar 
> > transformations scattered around (fold-const, convert.c, front ends, SSA 
> > optimizers) isn't helpful; hopefully match-and-simplify will allow some 
> > unification of this sort of optimization.
> 
> I did a quick and dirty experiment with the match-and-simplify branch
> just to get an idea of what it might be like.  The branch built for MIPS
> right out of the box so that was great and I added a couple of rules
> (see below) just to see if it would trigger the optimization I wanted
> and it did.  I was impressed with the match-and-simplify infrastructure,
> it seemed to work quite well.  Will this branch be included in GCC 5.0?

Though, is such an optimization desirable even for fast-math?
I mean, in the normal demotion, all that changes compared to the original
source is the possibility of double rounding, or the fact that right now, as in
glibc, the *f suffixed variants aren't 0.5ulp precise while the double ones are.

If you want to demote a chain of calls, you add roundings in the middle too,
and depending on which function it is and which exact argument, I'd worry
the maximum error would already be not just slightly higher, but significantly
worse.  Even for -ffast-math we want only slightly worse precision, not
significantly worse one.

Jakub


Re: fast-math optimization question

2014-10-10 Thread Richard Biener
On Fri, Oct 10, 2014 at 12:55 AM, Steve Ellcey  wrote:
> On Thu, 2014-10-09 at 19:50 +, Joseph S. Myers wrote:
>> On Thu, 9 Oct 2014, Steve Ellcey wrote:
>>
>> > Do you know which pass does the simple
>> > '(float)function((double)float_val)' demotion?  Maybe that would be a
>> > good place to extend things.
>>
>> convert.c does such transformations.  Maybe the transformations in there
>> could move to the match-and-simplify infrastructure - convert.c is not a
>> particularly good place for optimization, and having similar
>> transformations scattered around (fold-const, convert.c, front ends, SSA
>> optimizers) isn't helpful; hopefully match-and-simplify will allow some
>> unification of this sort of optimization.
>
> I did a quick and dirty experiment with the match-and-simplify branch
> just to get an idea of what it might be like.  The branch built for MIPS
> right out of the box so that was great and I added a couple of rules
> (see below) just to see if it would trigger the optimization I wanted
> and it did.  I was impressed with the match-and-simplify infrastructure,
> it seemed to work quite well.  Will this branch be included in GCC 5.0?

Yes, I'm working towards merging it piecewise this stage1.

> Steve Ellcey
> sell...@mips.com
>
>
> Code added to match-builtin.pd:
>
>
> (if (flag_unsafe_math_optimizations)
>  /* Optimize "(float) expN(x)" [where x is type double] to
>  "expNf((float) x)", i.e. call the 'f' single precision func */
>  (simplify
>   (convert (BUILT_IN_LOG @0))
>   (if ((TYPE_MODE (type) == SFmode) && (TYPE_MODE (TREE_TYPE (@0)) == DFmode))
>    (BUILT_IN_LOGF (convert @0))))
> )
>
> (if (flag_unsafe_math_optimizations)
>  /* Optimize "(float) expN(x)" [where x is type double] to
>  "expNf((float) x)", i.e. call the 'f' single precision func */
>  (simplify
>   (convert (BUILT_IN_SIN @0))
>   (if ((TYPE_MODE (type) == SFmode) && (TYPE_MODE (TREE_TYPE (@0)) == DFmode))
>    (BUILT_IN_SINF (convert @0))))
> )

Even better:

(if (flag_unsafe_math_optimizations)
  (for fn (BUILT_IN_LOG BUILT_IN_SIN)
       ffn (BUILT_IN_LOGF BUILT_IN_SINF)
   (simplify
     (convert (fn @0))
     (if ((TYPE_MODE (type) == SFmode) && (TYPE_MODE (TREE_TYPE (@0)) == DFmode))
      (ffn (convert @0))))))

by means of recursion this will pick up (float)log(sin((double)x)) but
also (float)log(sin(x)) if x is double
which may be undesired.  To avoid that make it match (convert (fn
(convert @0))) instead.

Richard.


>
>