The conditions when convert from double to float is permitted?

2020-12-10 Thread Xionghu Luo via Gcc

Hi,

I have a maybe silly question about whether there is any *standard*
or *options* (like -ffast-math) for GCC that allow double to float
demotion optimization?  For example,

1) from PR22326:

#include <math.h>

float foo(float f, float x, float y) {
return (fabs(f)*x+y);
}

The fabs call returns a double result, but it could actually be demoted
to float, since the function ultimately returns float.


2) From PR90070:

 double temp1 = (double)r->red;
 double temp2 = (double)aggregate.red;
 double temp3 = temp2 + (temp1 * 5.0);
 aggregate.red = (float) temp3;

The last data type is also float, so we could also replace the double
precision calculation with single precision.

So, is it OK to use float instead of double in all cases, or may we
ONLY replace cases where there is an explicit double-to-float conversion
in the source code for fast-math builds?

However, the fast-math option doesn't directly mean we can ignore
precision.

@item -ffast-math
@opindex ffast-math
Sets the options @option{-fno-math-errno}, @option{-funsafe-math-optimizations},
@option{-ffinite-math-only}, @option{-fno-rounding-math},
@option{-fno-signaling-nans}, @option{-fcx-limited-range} and
@option{-fexcess-precision=fast}.


The background is that I cooked a patch to track all the double<->float
conversion instructions backwards in the backprop pass and optimize
these instructions (including assignments within a basic block and phi
instructions across basic blocks) from double to float, with type
checking.  Though most regression test cases pass, a few gfortran cases
report an IEEE_INVALID_FLAG error at run time; I haven't root-caused
where the error happens yet.
There are doubts about whether this kind of optimization is *legal* for
fast math.  If many conversions happen in different blocks/regions, how
should they be split to avoid interference?

The patch is attached.  Sorry that the code has many hacks and is not
well refined, as it is still at a very early stage and just works
functionally for most cases.


Thanks,
Xionghu
From 8cd4e2ad438466db87bce5535af1847d3d1ef844 Mon Sep 17 00:00:00 2001
From: Xionghu Luo 
Date: Wed, 25 Nov 2020 20:39:47 -0600
Subject: [PATCH] Implement double promotion remove in backprop

(float) ((double) abs (x) * (double) y + (double) z) could be optimized
to (float)(abs(x) * y + z) if x, y, z are all float types with fast-math
mode.
---
 gcc/gimple-ssa-backprop.c | 247 +-
 1 file changed, 243 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-ssa-backprop.c b/gcc/gimple-ssa-backprop.c
index ced0e6ed83c..e1fb09a8fb3 100644
--- a/gcc/gimple-ssa-backprop.c
+++ b/gcc/gimple-ssa-backprop.c
@@ -103,6 +103,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "tree-hash-traits.h"
 #include "case-cfn-macros.h"
+#include "stor-layout.h"
+#include "flags.h"
 
 namespace {
 
@@ -113,6 +115,8 @@ public:
   usage_info () : flag_word (0) {}
   usage_info &operator &= (const usage_info &);
   usage_info operator & (const usage_info &) const;
+  usage_info &operator |= (const usage_info &);
+  usage_info operator | (const usage_info &) const;
   bool operator == (const usage_info &) const;
   bool operator != (const usage_info &) const;
   bool is_useful () const;
@@ -125,6 +129,8 @@ public:
 {
   /* True if the uses treat x and -x in the same way.  */
   unsigned int ignore_sign : 1;
+  /* True if the uses treat double and float in the same way.  */
+  unsigned int ignore_convert : 1;
 } flags;
 /* All the flag bits as a single int.  */
 unsigned int flag_word;
@@ -163,6 +169,21 @@ usage_info::operator & (const usage_info &other) const
   return info;
 }
 
+usage_info &
+usage_info::operator |= (const usage_info &other)
+{
+  flag_word |= other.flag_word;
+  return *this;
+}
+
+usage_info
+usage_info::operator | (const usage_info &other) const
+{
+  usage_info info (*this);
+  info |= other;
+  return info;
+}
+
 bool
 usage_info::operator == (const usage_info &other) const
 {
@@ -203,6 +224,11 @@ dump_usage_info (FILE *file, tree var, usage_info *info)
   dump_usage_prefix (file, var);
   fprintf (file, "sign bit not important\n");
 }
+  if (info->flags.ignore_convert)
+{
+  dump_usage_prefix (file, var);
+  fprintf (file, "convert from float to double not important\n");
+}
 }
 
 /* Represents one execution of the pass.  */
@@ -257,6 +283,10 @@ private:
  along with information that describes all uses.  */
   auto_vec  m_vars;
 
+  int m_converts;
+
+  bool m_start;
+
   /* A bitmap of blocks that we have finished processing in the initial
  post-order walk.  */
   auto_sbitmap m_visited_blocks;
@@ -279,7 +309,7 @@ backprop::backprop (function *fn)
   : m_fn (fn),
 m_info_pool ("usage_info"),
 m_visited_blocks (last_basic_block_for_fn (m_fn)),
-m_worklist_names (BITMAP_ALLOC (NULL))
+m_worklist_names (BITMAP_ALLOC (NULL)), m_converts(0), m_start(false)
 {
   bitmap_clear (m_visited_blocks);
 }
@@ -413,10

Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Richard Biener via Gcc
On Thu, Dec 10, 2020 at 9:47 AM Xionghu Luo via Gcc  wrote:
>
> Hi,
>
> I have a maybe silly question about whether there is any *standard*
> or *options* (like -ffast-math) for GCC that allow double to float
> demotion optimization?  For example,

The only option we have to this effect would be -funsafe-math-optimizations.
But even with that we usually try to bound errors - not miscomparing
SPEC CPU with -Ofast (which includes -funsafe-math-optimizations)
is one of the goals.

For the specific case...

> 1) from PR22326:
>
> #include <math.h>
>
> float foo(float f, float x, float y) {
> return (fabs(f)*x+y);
> }
>
> The fabs will return double result but it could be demoted to float
> actually since the function returns float finally.
>
> 2) From PR90070:
>
>   double temp1 = (double)r->red;
>   double temp2 = (double)aggregate.red;
>   double temp3 = temp2 + (temp1 * 5.0);

temp1 * 5.0 might not be representable in float, but the result
of the add could be, so the transform could produce +-Inf where
the original computation was fine (though still a very large
result).

Usually in such cases one could say we should implement diagnostic
hints suggesting that the user consider refactoring the code to use
float computations, because we cannot really say whether it's safe
(at the moment we do not implement value-range propagation for
floating-point types).

>   aggregate.red = (float) temp3;
>
> The last data type is also float, so we could also replace the double
> precision calculation with single precision.
>
> So, Is it OK to use float instead of double for all or we could ONLY
> replace cases when there is explicit double to float conversion in
> source code for fast-math build?
>
> However, fast-math option doesn't directly means we could ignore the
> precision.
>
> @item -ffast-math
> @opindex ffast-math
> Sets the options @option{-fno-math-errno}, 
> @option{-funsafe-math-optimizations},
> @option{-ffinite-math-only}, @option{-fno-rounding-math},
> @option{-fno-signaling-nans}, @option{-fcx-limited-range} and
> @option{-fexcess-precision=fast}.
>
>
> Background is I cooked a patch to track all the double<->float related
> convert instructions in backprop pass backwardly, and optimize these
> instructions (including assignment in basic block and phi instructions
> cross basic block) from double to float with type check, though most
> regression test cases passed, there are a few gfortran cases reported
> run error of IEEE_INVALID_FLAG, I didn't root caused where the error
> happens yet.
> There are doubts whether this kind of optimization is *legal* for fast math?
> If there are many converts happens in different blocks/regions, how to split
> them to avoid interference?

With fast math any optimization is *legal* (all applicable rules are lifted) but
we still want to adhere to some basic QOI, otherwise fast math becomes
useless.

> Attached the patch.  Sorry that the code has many hacks and not well refined 
> as
> it is still at very early version and just functionally works for most cases.
>
>
> Thanks,
> Xionghu


Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Marc Glisse

On Thu, 10 Dec 2020, Xionghu Luo via Gcc wrote:


I have a maybe silly question about whether there is any *standard*
or *options* (like -ffast-math) for GCC that allow double to float
demotion optimization?  For example,

1) from PR22326:

#include <math.h>

float foo(float f, float x, float y) {
return (fabs(f)*x+y);
}

The fabs will return double result but it could be demoted to float
actually since the function returns float finally.


With fp-contract, this is (float)fma((double)f,(double)x,(double)y). This 
could almost be transformed into fmaf(f,x,y), except that the double 
rounding may not be strictly equivalent. Still, that seems like it would 
be no problem with -funsafe-math-optimizations, just like turning 
(float)((double)x*(double)y) into x*y, as long as it is a single operation 
with casts on all inputs and output. Whether there are cases that can be 
optimized without -funsafe-math-optimizations is harder to tell.


--
Marc Glisse


Re: Move STV(scalars_to_vector) RTL pass from i386 to target independent

2020-12-10 Thread Richard Biener
On Thu, 10 Dec 2020, Dinar Temirbulatov wrote:

> Hi,
> I have observed that the STV2 pass gained ~20% on CPU2006 456.hmmer,
> mostly by transforming V4SI operations.  Looking at the pass itself, it
> looks like it might be turned into architecture-independent RTL, and
> the pass deals only with non-wide integer operations.  I think it might
> be useful on other targets as well?

The pass moves GPR operations to vector register operations.  While
conceptually this is something generic, the implementation is quite
dependent on the actual implementation of the vector patterns,
specifically in the way it uses paradoxical subregs, and of course it
restricts itself to supported operations and costing.

The 456.hmmer improvement is because the x86 micro-architectures do
not seem to like back-to-back cmov (or even cmov + branch, but less so)
implementing min(min(...)) but the vector min operation is much
faster.

So I guess if there's another target that lacks integer min/max
operations on GPRs but does have vector integer min/max, it's easy to
look at 456.hmmer and replace the one (or was it two?) important
occurrence with manually crafted assembly to see if it's worth it.  Then
implement a target-local STV copy that "works".  After we have two
implementations we can see whether commonizing is possible.

Note there is/was quite some fallout because doing STV is not always
profitable and it's difficult to determine exactly when it is (not),
because we still don't quite understand _why_ 456.hmmer is so much
faster with vector min/max compared to cmov.

Richard.


Re: Help with PR97872

2020-12-10 Thread Richard Biener
On Wed, 9 Dec 2020, Prathamesh Kulkarni wrote:

> On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 7 Dec 2020 at 17:37, Hongtao Liu  wrote:
> > >
> > > On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > For the test mentioned in PR, I was trying to see 
> > > > > > > > > > > > > > > if we could do
> > > > > > > > > > > > > > > specialized expansion for vcond in target when 
> > > > > > > > > > > > > > > operands are -1 and 0.
> > > > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > where r117 and r118 are set to vector constants 
> > > > > > > > > > > > > > > -1 and 0 respectively.
> > > > > > > > > > > > > > > However, I am not sure if there's a way to check 
> > > > > > > > > > > > > > > if the register is
> > > > > > > > > > > > > > > constant during expansion time (since we don't 
> > > > > > > > > > > > > > > have df analysis yet) ?
> > > > >
> > > > > It seems to me that all you need to do is relax the predicates of op1
> > > > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > > > debugged it, but I see that vcondmn in neon.md only accepts
> > > > > s_register_operand.
> > > > >
> > > > > (define_expand "vcond"
> > > > >   [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > (if_then_else:VDQW
> > > > >   (match_operator 3 "comparison_operator"
> > > > > [(match_operand:VDQW 4 "s_register_operand")
> > > > >  (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > >   (match_operand:VDQW 1 "s_register_operand")
> > > > >   (match_operand:VDQW 2 "s_register_operand")))]
> > > > >   "TARGET_NEON && (! || 
> > > > > flag_unsafe_math_optimizations)"
> > > > > {
> > > > >   arm_expand_vcond (operands, mode);
> > > > >   DONE;
> > > > > })
> > > > >
> > > > > in sse.md it's defined as
> > > > > (define_expand "vcondu"
> > > > >   [(set (match_operand:V_512 0 "register_operand")
> > > > > (if_then_else:V_512
> > > > >   (match_operator 3 ""
> > > > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > > >  (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > > >   (match_operand:V_512 1 "general_operand")
> > > > >   (match_operand:V_512 2 "general_operand")))]
> > > > >   "TARGET_AVX512F
> > > > >&& (GET_MODE_NUNITS (mode)
> > > > >== GET_MODE_NUNITS (mode))"
> > > > > {
> > > > >   bool ok = ix86_expand_int_vcond (operands);
> > > > >   gcc_assert (ok);
> > > > >   DONE;
> > > > > })
> > > > >
> > > > > then we can get operands[1] and operands[2] as
> > > > >
> > > > > (gdb) p debug_rtx (operands[1])
> > > > >  (const_vector:V16QI [
> > > > > (const_int -1 [0x]) repeated x16
> > > > > ])
> > > > > (gdb) p debug_rtx (operands[2])
> > > > > (reg:V16QI 82 [ _2 ])
> > > > > (const_vector:V16QI [
> > > > > (const_int 0 [0]) repeated x16
> > > > > ])
> > > > Hi Hongtao,
> > > > Thanks for the suggestions!
> > > > However IIUC from vector extensions doc page, the result of vector
> > > > comparison is defined to be 0
> > > > or -1, so would it be better to canonicalize
> > > > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > > > adjust targets if required ?
> > >
> > > Yes, it would be more straightforward to handle it in gimple isel, I
> > > would adjust the backend and testcase after you check in the patch.

Re: Help with PR97872

2020-12-10 Thread Prathamesh Kulkarni via Gcc
On Thu, 10 Dec 2020 at 17:11, Richard Biener  wrote:
>
> On Wed, 9 Dec 2020, Prathamesh Kulkarni wrote:
>
> > On Tue, 8 Dec 2020 at 14:36, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 7 Dec 2020 at 17:37, Hongtao Liu  wrote:
> > > >
> > > > On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu  wrote:
> > > > > >
> > > > > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > For the test mentioned in PR, I was trying to 
> > > > > > > > > > > > > > > > see if we could do
> > > > > > > > > > > > > > > > specialized expansion for vcond in target when 
> > > > > > > > > > > > > > > > operands are -1 and 0.
> > > > > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > where r117 and r118 are set to vector constants 
> > > > > > > > > > > > > > > > -1 and 0 respectively.
> > > > > > > > > > > > > > > > However, I am not sure if there's a way to 
> > > > > > > > > > > > > > > > check if the register is
> > > > > > > > > > > > > > > > constant during expansion time (since we don't 
> > > > > > > > > > > > > > > > have df analysis yet) ?
> > > > > >
> > > > > > It seems to me that all you need to do is relax the predicates of 
> > > > > > op1
> > > > > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > > > > debugged it, but I see that vcondmn in neon.md only accepts
> > > > > > s_register_operand.
> > > > > >
> > > > > > (define_expand "vcond"
> > > > > >   [(set (match_operand:VDQW 0 "s_register_operand")
> > > > > > (if_then_else:VDQW
> > > > > >   (match_operator 3 "comparison_operator"
> > > > > > [(match_operand:VDQW 4 "s_register_operand")
> > > > > >  (match_operand:VDQW 5 "reg_or_zero_operand")])
> > > > > >   (match_operand:VDQW 1 "s_register_operand")
> > > > > >   (match_operand:VDQW 2 "s_register_operand")))]
> > > > > >   "TARGET_NEON && (! || 
> > > > > > flag_unsafe_math_optimizations)"
> > > > > > {
> > > > > >   arm_expand_vcond (operands, mode);
> > > > > >   DONE;
> > > > > > })
> > > > > >
> > > > > > in sse.md it's defined as
> > > > > > (define_expand "vcondu"
> > > > > >   [(set (match_operand:V_512 0 "register_operand")
> > > > > > (if_then_else:V_512
> > > > > >   (match_operator 3 ""
> > > > > > [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > > > > >  (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > > > > >   (match_operand:V_512 1 "general_operand")
> > > > > >   (match_operand:V_512 2 "general_operand")))]
> > > > > >   "TARGET_AVX512F
> > > > > >&& (GET_MODE_NUNITS (mode)
> > > > > >== GET_MODE_NUNITS (mode))"
> > > > > > {
> > > > > >   bool ok = ix86_expand_int_vcond (operands);
> > > > > >   gcc_assert (ok);
> > > > > >   DONE;
> > > > > > })
> > > > > >
> > > > > > then we can get operands[1] and operands[2] as
> > > > > >
> > > > > > (gdb) p debug_rtx (operands[1])
> > > > > >  (const_vector:V16QI [
> > > > > > (const_int -1 [0x]) repeated x16
> > > > > > ])
> > > > > > (gdb) p debug_rtx (operands[2])
> > > > > > (reg:V16QI 82 [ _2 ])
> > > > > > (const_vector:V16QI [
> > > > > > (const_int 0 [0]) repeated x16
> > > > > > ])
> > > > > Hi Hongtao,
> > > > > Thanks for the suggestions!
> > > > > However IIUC from vector extensions doc page, the result of vector
> > > > > comparison is defined to be 0
> > > > > or -1, so would it be b

Re: No warning for module global variable which is set but never used

2020-12-10 Thread webmaster
Is it possible to request such a feature?

Am 09.12.2020 um 16:45 schrieb webmaster:
> I have the following C/C++ code:
> 
> static int foo = 0;
> 
> static void bar(void)
> {
> foo = 1;
> }
> 
> Here it is clear to the compiler that the variable foo can only be
> accessed from the same module and not from other modules. From the
> explanations before, I understand that the variable is removed due to
> optimization. But I do not understand why GCC does not throw a warning.
> 
> From my point of view, it is the responsibility of the developer to
> remove the unused variable.
> 



Re: No warning for module global variable which is set but never used

2020-12-10 Thread David Brown
On 10/12/2020 16:10, webmaster wrote:

(As a general rule, you'll get more useful responses if you use your
name in your posts.  It's common courtesy.)


> Is it possible to request such feature?
> 

Of course you can file a request for it.  Go to the gcc bugzilla site:



First, search thoroughly to see if it is already requested - obvious
duplicate requests just waste developers' time.  If you find a
duplicate, add a comment and put yourself on the cc list.  If you don't
find a duplicate, file it as a new bug.

Given the replies on this list from gcc developers, I would not hold my
breath waiting for this feature.  It is unlikely to be implemented
unless the relevant compiler passes are re-organised in some way, or
extra information is tracked.  So I don't think it will be a priority.

However, it's always good to track these things - and if many people
want a particular feature, it can't harm its chances of getting done
eventually.

mvh.,

David


> Am 09.12.2020 um 16:45 schrieb webmaster:
>> I have the following Code C\C++:
>>
>> static int foo = 0;
>>
>> static void bar(void)
>> {
>> foo = 1;
>> }
>>
>> Here it is clear for the compiler that the variable foo can only be
>> accessed from the same modul and not from ther modules. From the
>> explanations before I understand that the variable is removed due to
>> optimization. But I do not understand why GCC does not throws a warning.
>>
>> >From my point of view it is responsibility of the developer to remove
>> the unused variable.
>>
> 
> 



Re: No warning for module global variable which is set but never used

2020-12-10 Thread webmaster
Ahhh, OK.  Good to know.

I think also it is not of high priority ;-)

Greets

Patrick

Am 10.12.2020 um 16:26 schrieb David Brown:
> On 10/12/2020 16:10, webmaster wrote:
> 
> (As a general rule, you'll get more useful responses if you use your
> name in your posts.  It's common courtesy.)
> 
> 
>> Is it possible to request such feature?
>>
> 
> Of course you can file a request for it.  Go to the gcc bugzilla site:
> 
> 
> 
> First, search thoroughly to see if it is already requested - obvious
> duplicate requests just waste developers' time.  If you find a
> duplicate, add a comment and put yourself on the cc list.  If you don't
> find a duplicate, file it as a new bug.
> 
> Given the replies on this list from gcc developers, I would not hold my
> breath waiting for this feature.  It is unlikely to be implemented
> unless the relevant compiler passes are re-organised in some way, or
> extra information is tracked.  So I don't think it will be a priority.
> 
> However, it's always good to track these things - and if many people
> want a particular feature, it can't harm its chances of getting done
> eventually.
> 
> mvh.,
> 
> David
> 
> 
>> Am 09.12.2020 um 16:45 schrieb webmaster:
>>> I have the following Code C\C++:
>>>
>>> static int foo = 0;
>>>
>>> static void bar(void)
>>> {
>>> foo = 1;
>>> }
>>>
>>> Here it is clear for the compiler that the variable foo can only be
>>> accessed from the same modul and not from ther modules. From the
>>> explanations before I understand that the variable is removed due to
>>> optimization. But I do not understand why GCC does not throws a warning.
>>>
>>> >From my point of view it is responsibility of the developer to remove
>>> the unused variable.
>>>
>>
>>



RFC v2: Re: cacheflush.2

2020-12-10 Thread Alejandro Colomar (man-pages) via Gcc
Hi all,

v2:

[
NOTES
   Unless  you  need  the finer grained control that this system
   call provides, you probably want  to  use  the  GCC  built-in
   function  __builtin___clear_cache(),  which  provides  a more
   portable interface:

   void __builtin___clear_cache(void *begin, void *end);
]

If you like it, I'll send the patch.

BTW, I'll also have a look and document the different prototypes for
cacheflush(2).

Thanks,

Alex

On 12/10/20 8:20 PM, Heinrich Schuchardt wrote:
> On 12/10/20 7:17 PM, Dave Martin wrote:
>> On Wed, Dec 09, 2020 at 07:34:09PM +0100, Alejandro Colomar
>> (man-pages) wrote:
>>> Hi Heinrich & Michael,
>>>
>>> What about the following?:
>>>
>>> [
>>> NOTES
>>>     GCC provides a similar function, which may be useful on
>>>     architectures that lack this system call:
>>>
>>>     void __builtin___clear_cache(void *begin, void *end);
>>> ]
>>>
>>> Cheers,
>>>
>>> Alex
>>
>> Maybe we should discourage people from calling the cacheflush syscall?
>>
>> I think that people shouldn't be using the syscall unless they really
>> need the finer grained control it provides, and are prepared to take a
>> hit to portability.
>>
>> (On arches where userspace is allowed to do cache flushing directly,
>> __builtin___clear_cache() should transparently do the right thing, with
>> no syscall overhead -- if not, that's probably a bug in the toolchain or
>> compiler support library.)
> 
> What the compiler builtin does depends on the architecture (e.g. nothing
> for x86, cacheflush() for MIPS, a private syscall (0xf0002) on ARM,
> assembly code on ARM64, ...) and on the operating system (Linux,
> BSD, OS X). For portable code the builtin is really the best choice.
> 
> Best regards
> 
> Heinrich
> 
>>
>> [...]
>>
>> Cheers
>> ---Dave
>>
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es


gcc-8-20201210 is now available

2020-12-10 Thread GCC Administrator via Gcc
Snapshot gcc-8-20201210 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/8-20201210/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 
revision 87c40733898283f0d1e48bcbf8055c2718064e77

You'll find:

 gcc-8-20201210.tar.xz               Complete GCC

  SHA256=71cc524e38f05a484c410bbf821a5121e5bef942df4d27e9c64de7bfdc26b799
  SHA1=e8032a1bfcde750add6fee53b91107f828e102f5

Diffs from 8-20201203 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Integer division on x86 -m32

2020-12-10 Thread Lucas de Almeida via Gcc
Hello,
when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
__divdi3 is always output, even though it seems the use of the idiv
instruction could be faster.
This seems to remain even under -Ofast and other available options.

To illustrate, this godbolt link: https://godbolt.org/z/hq4GKb
With code

#include <stdint.h>
int32_t d(int64_t a, int32_t b) {
return a / b;
}

Compiles to

d(long long, int):
sub esp, 12
mov eax, DWORD PTR [esp+24]
cdq
push    edx
push    eax
push    DWORD PTR [esp+28]
push    DWORD PTR [esp+28]
call    __divdi3
add esp, 28
ret

Why is this?

-- 
Lucas de Almeida


Looking for a Minecraft Server?

2020-12-10 Thread PUSCraft via Gcc


Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Xionghu Luo via Gcc
Thanks,

On 2020/12/10 17:12, Richard Biener wrote:
>> 2) From PR90070:
>>
>>double temp1 = (double)r->red;
>>double temp2 = (double)aggregate.red;
>>double temp3 = temp2 + (temp1 * 5.0);
> temp1 * 5 could be not representable in float but the
> result of the add could so the transform could result
> in -+Inf where the original computation was fine (but
> still very large result).
> 
> Usually in such cases one could say we should implement some
> diagnostic hints to the user that he might consider refactoring
> his code to use float computations because we cannot really say
> whether it's safe (we do at the moment not implement value-range
> propagation for floating point types).
> 

  #include <math.h>
  #include <stdio.h>

  float foo (double x, float y, float z)
  {
    return fabs (x) * y - z;
  }

  int main ()
  {
    float res = foo (1e38, 5.0, 3e38);
    printf ("res:%f\n", res);
  }

(1) $ gcc a.c -Ofast -ffp-contract=off: 

0880 :
 880:   10 0a 20 fc     fabs    f1,f1
 884:   b2 00 21 fc     fmul    f1,f1,f2
 888:   28 18 21 fc     fsub    f1,f1,f3
 88c:   18 08 20 fc     frsp    f1,f1
 890:   20 00 80 4e blr

$ ./a.out
res:19993605713849301312521538346418176.00

(2) $ gcc_MODIFIED a.c -Ofast -ffp-contract=off: 

1660 :
1660:   18 08 00 fc     frsp    f0,f1
1664:   10 02 00 fc     fabs    f0,f0
1668:   b2 00 00 ec     fmuls   f0,f0,f2   // Inf
166c:   28 18 20 ec     fsubs   f1,f0,f3
1670:   20 00 80 4e blr

$ ./a.out
res:inf

It's true that changing all double computation to float results in Inf
when "fabs (x) * y" exceeds FLT_MAX, while the double computation in (1)
brings the result back under FLT_MAX before the final truncation.


But add/sub alone can produce Inf in the same way:

  #include <math.h>
  #include <stdio.h>

  float foo (double x, float y, float z)
  {
    return -fabs (x) + y + z;
  }

  int main ()
  {
    float res = foo (1e38, 1e38, 3e38);
    printf ("res:%f\n", res);
  }

(3) $ gcc a.c -Ofast: 

0880 :
 880:   10 0a 20 fc     fabs    f1,f1
 884:   28 08 42 fc     fsub    f2,f2,f1
 888:   2a 18 22 fc     fadd    f1,f2,f3
 88c:   18 08 20 fc     frsp    f1,f1
 890:   20 00 80 4e blr

$ ./a.out
res:3549775575777803994281145270272.00

(4) $ gcc_MODIFIED a.c -Ofast:

1660 :
1660:   18 08 20 fc     frsp    f1,f1
1664:   2a 18 42 ec     fadds   f2,f2,f3
1668:   10 0a 20 fc     fabs    f1,f1
166c:   28 08 22 ec     fsubs   f1,f2,f1
1670:   20 00 80 4e blr

$ ./a.out
res:inf


Note that the add/sub sequence differs between (3) and (4) since
-funsafe-math-optimizations is implicitly true.  -ffp-contract=fast in
(1) and (2) could avoid the Inf, since a fused multiply-add (fmadds)
rounds only once and so tolerates an intermediate product above FLT_MAX
(verified on Power; not sure other targets behave the same), but without
float value-range info, it is unsafe to change the computation from
double to float even for add/sub-only expressions.

What's more, it seems difficult to do the computation *partly float
and partly double* in the backprop pass, since all the expressions are
chained and strongly dependent, unlike the sign-changing operations,
which can rewrite expressions only partly.


Thanks,
Xionghu


Re: Integer division on x86 -m32

2020-12-10 Thread Alexander Monakov via Gcc


On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote:

> Hello,
> when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
> __divdi3 is always output, even though it seems the use of the idiv
> instruction could be faster.
> This seems to remain even under -Ofast and other available options.
> 
> To illustrate, this godbolt link: https://godbolt.org/z/hq4GKb
> With code
> 
> #include 
> int32_t d(int64_t a, int32_t b) {
> return a / b;
> }
> 
> Compiles to
> 
> d(long long, int):
> sub esp, 12
> mov eax, DWORD PTR [esp+24]
> cdq
> push edx
> push eax
> push DWORD PTR [esp+28]
> push DWORD PTR [esp+28]
> call __divdi3
> add esp, 28
> ret
> 
> Why is this?

C evaluation rules for this are such that first 'b' is extended to int64_t,
the division is done in int64_t, and its result is truncated to int32_t in
an implementation-defined manner. Thus, it must always produce a value,
except if (b == 0 || b == -1 && a == INT64_MIN), in which case division
causes undefined behavior.

The x86 'idiv' instruction, however, will raise a divide error if the result
does not fit in a register, so e.g. dividing INT64_MAX by 1 would trap.

Alexander


Re: Integer division on x86 -m32

2020-12-10 Thread Marc Glisse

On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote:


when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
__divdi3 is always output, even though it seems the use of the idiv
instruction could be faster.


IIRC, idiv requires that the quotient fit in 32 bits, which your C code 
doesn't guarantee. (1LL << 60) / 3 would cause an error with idiv.


It would be possible to use idiv in some cases, if the compiler can prove 
that variables are in the right range, but that's not so easy. You can use 
inline asm to force the use of idiv if you know it is safe for your case, 
the most common being modular arithmetic: if you know that uint32_t a, b, 
c, d are smaller than m (and m!=0), you can compute a*b+c+d in uint64_t, 
then use div to compute that modulo m.


--
Marc Glisse


Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Richard Biener via Gcc
On Fri, Dec 11, 2020 at 7:26 AM Xionghu Luo  wrote:
>
> Thanks,
>
> On 2020/12/10 17:12, Richard Biener wrote:
> >> 2) From PR90070:
> >>
> >>double temp1 = (double)r->red;
> >>double temp2 = (double)aggregate.red;
> >>double temp3 = temp2 + (temp1 * 5.0);
> > temp1 * 5 could be not representable in float but the
> > result of the add could so the transform could result
> > in -+Inf where the original computation was fine (but
> > still very large result).
> >
> > Usually in such cases one could say we should implement some
> > diagnostic hints to the user that he might consider refactoring
> > his code to use float computations because we cannot really say
> > whether it's safe (we do at the moment not implement value-range
> > propagation for floating point types).
> >
>
> #include <math.h>
> #include <stdio.h>
>
> float foo (double x, float y, float z)
> {
>   return fabs (x) * y - z;
> }
>
> int main ()
> {
>   float res = foo (1e38, 5.0, 3e38);
>   printf ("res:%f\n", res);
> }
>
> (1) $ gcc a.c -Ofast -ffp-contract=off:
>
> 0880 :
>  880:   10 0a 20 fc     fabs    f1,f1
>  884:   b2 00 21 fc     fmul    f1,f1,f2
>  888:   28 18 21 fc     fsub    f1,f1,f3
>  88c:   18 08 20 fc     frsp    f1,f1
>  890:   20 00 80 4e blr
>
> $ ./a.out
> res:19993605713849301312521538346418176.00
>
> (2) $ gcc_MODIFIED a.c -Ofast -ffp-contract=off:
>
> 1660 :
> 1660:   18 08 00 fc     frsp    f0,f1
> 1664:   10 02 00 fc     fabs    f0,f0
> 1668:   b2 00 00 ec     fmuls   f0,f0,f2   // Inf
> 166c:   28 18 20 ec     fsubs   f1,f0,f3
> 1670:   20 00 80 4e blr
>
> $ ./a.out
> res:inf
>
> It's true that changing all double computation to float results in Inf
> when "fabs (x) * y" exceeds FLT_MAX, while the double computation in (1)
> brings the result back under FLT_MAX before the final truncation.
>
>
> But add/sub alone can produce Inf in the same way:
>
> #include <math.h>
> #include <stdio.h>
>
> float foo (double x, float y, float z)
> {
>   return -fabs (x) + y + z;
> }
>
> int main ()
> {
>   float res = foo (1e38, 1e38, 3e38);
>   printf ("res:%f\n", res);
> }
>
> (3) $ gcc a.c -Ofast:
>
> 0880 :
>  880:   10 0a 20 fc     fabs    f1,f1
>  884:   28 08 42 fc     fsub    f2,f2,f1
>  888:   2a 18 22 fc     fadd    f1,f2,f3
>  88c:   18 08 20 fc     frsp    f1,f1
>  890:   20 00 80 4e blr
>
> $ ./a.out
> res:3549775575777803994281145270272.00
>
> (4) $ gcc_MODIFIED a.c -Ofast:
>
> 1660 :
> 1660:   18 08 20 fc     frsp    f1,f1
> 1664:   2a 18 42 ec     fadds   f2,f2,f3
> 1668:   10 0a 20 fc     fabs    f1,f1
> 166c:   28 08 22 ec     fsubs   f1,f2,f1
> 1670:   20 00 80 4e blr
>
> $ ./a.out
> res:inf
>
>
> Note that the add/sub sequence differs between (3) and (4) since
> -funsafe-math-optimizations is implicitly true.  -ffp-contract=fast in
> (1) and (2) could avoid the Inf, since a fused multiply-add (fmadds)
> rounds only once and so tolerates an intermediate product above FLT_MAX
> (verified on Power; not sure other targets behave the same), but without
> float value-range info, it is unsafe to change the computation from
> double to float even for add/sub-only expressions.

Yes.  As said it's difficult to second guess the programmer here.
The existing cases doing promotion look at unary functions,
doing exp->expf when the argument is a promoted float and the
return value is cast back to float.  That's also a common programmer
"error": not knowing expf, or assuming some kind of magic overloading.

So I'm not entirely convinced such a transform is a good idea, at least
by default with -ffast-math.  Maybe have a -fassume-float-limited-range
or so, documented to mean that we assume double or long double values
used fit in float?

Richard.

> What's more, it seems difficult to do the computation *partly float
> and partly double* in the backprop pass, since all the expressions are
> chained and strongly dependent, unlike the sign-changing operations,
> which can rewrite expressions only partly.
>
>
> Thanks,
> Xionghu


Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Xionghu Luo via Gcc

+cc.


On 2020/12/11 14:25, Xionghu Luo via Gcc wrote:




--
Thanks,
Xionghu