[Bug web/112960] omission in documentation: complex numbers can also have uppercase I and J suffixes

2023-12-11 Thread stephan.stiller at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112960

--- Comment #1 from Stephan Stiller  ---
name of document (forgotten in original bug report text): "GNU C Language
Introduction and Reference Manual"

[Bug web/112960] New: omission in documentation: complex numbers can also have uppercase I and J suffixes

2023-12-11 Thread stephan.stiller at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112960

Bug ID: 112960
   Summary: omission in documentation: complex numbers can also
have uppercase I and J suffixes
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: web
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stephan.stiller at outlook dot com
  Target Milestone: ---

Created attachment 56849
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56849&action=edit
example for the 4 different complex number suffixes denoting the imaginary part

This page
https://gcc.gnu.org/onlinedocs/gcc/Complex.html
doesn't mention uppercase 'I' and 'J' as legal suffixes denoting the imaginary
part of a complex number, both equivalent to 'i' and 'j'.

These suffixes are described in Richard Stallman's document "GNU C Language Introduction and Reference Manual":
https://www.gnu.org/software/c-intro-and-ref/
https://www.gnu.org/software/c-intro-and-ref/manual/
(Edition 0.0, section 12.4 Imaginary Constants)
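
The attachment itself is not reproduced in this archive. As a minimal sketch of the behaviour the report describes, the GNU C fragment below spells the same imaginary constant with each of the four suffixes and checks that they agree. The helper name imaginary_suffixes_agree is made up for illustration. Both the suffixes and the __real__/__imag__ operators are GNU extensions, so other compilers may reject or warn about this code.

```c
/* GNU C extension: the suffixes i, j, I and J each mark an imaginary constant.
   imaginary_suffixes_agree() is a hypothetical helper, not taken from the report. */
int imaginary_suffixes_agree(void)
{
    _Complex double a = 2.0i;
    _Complex double b = 2.0j;
    _Complex double c = 2.0I;
    _Complex double d = 2.0J;

    /* Every spelling should give real part 0 and imaginary part 2. */
    return a == b && b == c && c == d
        && __real__ a == 0.0 && __imag__ a == 2.0;
}
```

With -pedantic, GCC is expected to warn that imaginary constants are an extension, whichever of the four spellings is used.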

[Bug tree-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2023-10-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #29 from Andrew Pinski  ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630011.html

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-05-23 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #12 from rsandifo at gcc dot gnu.org ---
The patch in comment 11 is just a related spot improvement.
The PR itself is still unfixed.

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-05-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #11 from CVS Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:b096a6ebe9d9f9fed4c105f6555f724eb32af95c

commit r14-1131-gb096a6ebe9d9f9fed4c105f6555f724eb32af95c
Author: Richard Sandiford 
Date:   Tue May 23 11:34:42 2023 +0100

aarch64: Provide FPR alternatives for some bit insertions [PR109632]

At -O2, and so with SLP vectorisation enabled:

struct complx_t { float re, im; };
complx_t add(complx_t a, complx_t b) {
  return {a.re + b.re, a.im + b.im};
}

generates:

fmov    w3, s1
fmov    x0, d0
fmov    x1, d2
fmov    w2, s3
bfi     x0, x3, 32, 32
fmov    d31, x0
bfi     x1, x2, 32, 32
fmov    d30, x1
fadd    v31.2s, v31.2s, v30.2s
fmov    x1, d31
lsr     x0, x1, 32
fmov    s1, w0
lsr     w0, w1, 0
fmov    s0, w0
ret

This is because complx_t is passed and returned in FPRs, but GCC gives
it DImode.  We therefore “need” to assemble a DImode pseudo from the
two individual floats, bitcast it to a vector, do the arithmetic,
bitcast it back to a DImode pseudo, then extract the individual floats.
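
For readers who do not read AArch64 assembly, the packing and unpacking that the listing above performs can be sketched in portable C. This is only an illustration of the data movement, not code from the patch. The names pack_floats and unpack_floats are hypothetical, and the sketch assumes 32-bit IEEE floats with re in the low half of the 64-bit value, as in the bfi ..., 32, 32 instructions.

```c
#include <stdint.h>
#include <string.h>

/* Build one 64-bit value from two floats: re in bits 0-31, im in bits 32-63.
   This corresponds to the fmov + bfi part of the sequence. */
uint64_t pack_floats(float re, float im)
{
    uint32_t lo, hi;
    memcpy(&lo, &re, sizeof lo);   /* bit-copy, no numeric conversion */
    memcpy(&hi, &im, sizeof hi);
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Split the 64-bit value back into two floats, like the lsr + fmov tail. */
void unpack_floats(uint64_t bits, float *re, float *im)
{
    uint32_t lo = (uint32_t)bits;
    uint32_t hi = (uint32_t)(bits >> 32);
    memcpy(re, &lo, sizeof lo);
    memcpy(im, &hi, sizeof hi);
}
```

A round trip through pack_floats and unpack_floats returns the original two floats, so all of this work is pure overhead around the single addition.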

There are many problems here.  The most basic is that we shouldn't
use SLP for such a trivial example.  But SLP should in principle be
beneficial for more complicated examples, so preventing SLP for the
example above just changes the reproducer needed.  A more fundamental
problem is that it doesn't make sense to use single DImode pseudos in a
testcase like this.  I have a WIP patch to allow re and im to be stored
in individual SFmode pseudos instead, but it's quite an invasive change
and might end up going nowhere.

A simpler problem to tackle is that we allow DImode pseudos to be stored
in FPRs, but we don't provide any patterns for inserting values into
them, even though INS makes that easy for element-like insertions.
This patch adds some patterns for that.

Doing that showed that aarch64_modes_tieable_p was too strict:
it didn't allow SFmode and DImode values to be tied, even though
both of them occupy a single GPR and FPR, and even though we allow
both classes to change between the modes.

The *aarch64_bfidi_subreg_ pattern is
especially ugly, but it's not clear what target-independent
code ought to simplify it to, if it was going to simplify it.

We should probably do the same thing for extractions, but that's left
as future work.

After the patch we generate:

ins     v0.s[1], v1.s[0]
ins     v2.s[1], v3.s[0]
fadd    v0.2s, v0.2s, v2.2s
fmov    x0, d0
ushr    d1, d0, 32
lsr     w0, w0, 0
fmov    s0, w0
ret

which seems like a step in the right direction.

All in all, there's nothing elegant about this patch.  It just
seems like the least worst option.

gcc/
PR target/109632
* config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow
subregs between any scalars that are 64 bits or smaller.
* config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
(bits_etype): New int attribute.
* config/aarch64/aarch64.md (*insv_reg_)
(*aarch64_bfi_): New patterns.
(*aarch64_bfidi_subreg_): Likewise.

gcc/testsuite/
* gcc.target/aarch64/ins_bitfield_1.c: New test.
* gcc.target/aarch64/ins_bitfield_2.c: Likewise.
* gcc.target/aarch64/ins_bitfield_3.c: Likewise.
* gcc.target/aarch64/ins_bitfield_4.c: Likewise.
* gcc.target/aarch64/ins_bitfield_5.c: Likewise.
* gcc.target/aarch64/ins_bitfield_6.c: Likewise.

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-05-02 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #10 from rsandifo at gcc dot gnu.org ---
After prototyping this further, I no longer think that lowering
at the gimple level is the best answer.  (I should have listened
to Richi.)  Although it works, its major drawback is that
it's one-sided: it allows the current function's PARM_DECLs
and returns to be lowered to individual scalars, but it does
nothing for calls to other functions.  Being one-sided means
(a) that lowering only solves half the problem and (b) that tail
calls cannot be handled easily after lowering.

One thing that does seem to work is to force the structure to have
V2SF (and fix the inevitable ABI fallout).  That could only be done
conditionally, based on a target hook.  But it seems to fix both
test cases: the pass-by-reference one and the pass-by-value one.

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #9 from Tamar Christina  ---
Thank you!

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-04-27
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #8 from rsandifo at gcc dot gnu.org ---
Have a (still hacky) patch that also fixes the example in
comment 4, giving:

fadd    s1, s1, s3
fadd    s0, s0, s2
ret

Will work on it a bit more before sending an RFC.  Can imagine
the approach will be somewhat controversial!

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #7 from rsandifo at gcc dot gnu.org ---
Thinking more about it, it would probably be better to defer the
split until around lower_complex time, after IPA (especially inlining),
NRV and tail-recursion.  Doing it there should also make it easier
to split arguments.

(In reply to Tamar Christina from comment #6)
> That's an interesting approach, I think it would also fix
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 would it not? Since the
> int16x8x3_t return would be "scalarized" avoiding the bad expansion?
I don't think it will help with that, since the returned value
there is a natural V3x8HI (rather than something that the ABI splits
apart).  Splitting in that case might pessimise cases where the
return value is loaded as a whole, rather than assigned to
individually.

But it might be worth giving SRA the option of splitting even
in that case, as a follow-on optimisation, if it fits naturally
with the definitions.

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #6 from Tamar Christina  ---
That's an interesting approach, I think it would also fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 would it not? Since the
int16x8x3_t return would be "scalarized" avoiding the bad expansion?

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #5 from rsandifo at gcc dot gnu.org ---
Created attachment 54941
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54941&action=edit
hacky proof-of-concept patch

This is a very hacky proof of concept patch.  Don't try it on
anything serious, and certainly don't try to bootstrap with it --
it'll fall over in the slightest breeze.

But it does produce:

ldp     s3, s2, [x0]
ldp     s0, s1, [x1]
fadd    s1, s2, s1
fadd    s0, s3, s0
ret

for the original testcase.

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Maybe worth noting that if the complex arguments are passed
by value, to give:

struct complx_t {
float re;
float im;
};

complx_t
add(const complx_t a, const complx_t b) {
  return {a.re + b.re, a.im + b.im};
}

and SLP is disabled, we get:

fmov    w4, s1
fmov    w3, s3
fmov    x0, d0
fmov    x1, d2
mov     x2, 0
bfi     x0, x4, 32, 32
bfi     x1, x3, 32, 32
fmov    d0, x0
fmov    d1, x1
sbfx    x3, x0, 0, 32
sbfx    x0, x1, 0, 32
ushr    d1, d1, 32
fmov    d3, x0
fmov    d2, x3
ushr    d0, d0, 32
fadd    s2, s2, s3
fadd    s0, s0, s1
fmov    w1, s2
fmov    w0, s0
bfi     x2, x1, 0, 32
bfi     x2, x0, 32, 32
lsr     x0, x2, 32
lsr     w2, w2, 0
fmov    s1, w0
fmov    s0, w2
ret

which is almost impressive, in its way.

I think we need a way in gimple of “SRA-ing” the arguments
and return value, in cases where that's forced by the ABI.
I.e. provide separate incoming values of a.re and a.im,
and store them to “a” on entry.  Then similarly make the
return stmt return RETURN_DECL.re and RETURN_DECL.im
separately.
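
To make the “SRA-ing” idea concrete, here is a source-level sketch in C of what the lowering would conceptually expose. It is an analogy only: the proposal is about gimple, not about rewriting user code. The function add_scalarised and its out-parameters are hypothetical stand-ins for the separate incoming and outgoing values.

```c
struct complx_t { float re; float im; };

/* The by-value form from this comment, written as plain C. */
struct complx_t add_by_value(struct complx_t a, struct complx_t b)
{
    struct complx_t r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Hypothetical hand-scalarised equivalent: each field arrives and leaves
   as its own scalar, so nothing has to be assembled into one 64-bit value. */
void add_scalarised(float a_re, float a_im, float b_re, float b_im,
                    float *ret_re, float *ret_im)
{
    *ret_re = a_re + b_re;
    *ret_im = a_im + b_im;
}
```

Both functions compute the same two sums. The second form corresponds to the two-fadd sequence one would hope to get for the first.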

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #3 from Tamar Christina  ---
Note that even if we can't stop SLP, we should be able to generate equally
efficient code by being creative about the instruction selection; that's why I
marked it as a target bug :)

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #2 from Tamar Christina  ---
(In reply to Richard Biener from comment #1)
> Well, the usual unknown ABI boundary at function entry/exit.

Yes, but LLVM gets it right, so it should be a solvable computer science problem.
:)

Note that this was reduced from a bigger routine, but the end result is the same:
the thing shouldn't have been vectorized.

[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #1 from Richard Biener  ---
Well, the usual unknown ABI boundary at function entry/exit.

[Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs

2023-04-26 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

Bug ID: 109632
   Summary: Inefficient codegen when complex numbers are emulated
with structs
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64*

The following two cases are the same

struct complx_t {
float re;
float im;
};

complx_t
add(const complx_t &a, const complx_t &b) {
  return {a.re + b.re, a.im + b.im};
}

_Complex float
add(const _Complex float *a, const _Complex float *b) {
  return {__real__ *a + __real__ *b, __imag__ *a + __imag__ *b};
}

But we generate very different code (looking at -O2).  For the first one we do:

ldr     d1, [x1]
ldr     d0, [x0]
fadd    v0.2s, v0.2s, v1.2s
fmov    x0, d0
lsr     x1, x0, 32
lsr     w0, w0, 0
fmov    s1, w1
fmov    s0, w0
ret

which is bad for obvious reasons, but it also never needed to go through the
genreg for such a reversal; we could have used many other NEON instructions.

For the second one we generate the good instructions:

add(float _Complex const*, float _Complex const*):
ldp     s3, s2, [x0]
ldp     s0, s1, [x1]
fadd    s1, s2, s1
fadd    s0, s3, s0
ret

The difference being that in the second one we have decomposed the initial
structure by loading the elements:

   [local count: 1073741824]:
  _1 = REALPART_EXPR <*a_8(D)>;
  _2 = REALPART_EXPR <*b_9(D)>;
  _3 = _1 + _2;
  _4 = IMAGPART_EXPR <*a_8(D)>;
  _5 = IMAGPART_EXPR <*b_9(D)>;
  _6 = _4 + _5;
  _10 = COMPLEX_EXPR <_3, _6>;
  return _10;

In the first one we've kept them as vectors:

   [local count: 1073741824]:
  vect__1.6_13 = MEM  [(float *)a_8(D)];
  vect__2.9_15 = MEM  [(float *)b_9(D)];
  vect__3.10_16 = vect__1.6_13 + vect__2.9_15;
  MEM  [(float *)] = vect__3.10_16;
  return D.4435;

This part is probably a costing issue, we SLP them even though it's not
profitable because for the APCS we have to return them in separate registers.

Using -fno-tree-vectorize gets the gimple code right:

   [local count: 1073741824]:
  _1 = a_8(D)->re;
  _2 = b_9(D)->re;
  _3 = _1 + _2;
  D.4435.re = _3;
  _4 = a_8(D)->im;
  _5 = b_9(D)->im;
  _6 = _4 + _5;
  D.4435.im = _6;
  return D.4435;

But we generate worse code:

ldp     s1, s0, [x0]
mov     x2, 0
ldp     s3, s2, [x1]
fadd    s1, s1, s3
fadd    s0, s0, s2
fmov    w1, s1
fmov    w0, s0
bfi     x2, x1, 0, 32
bfi     x2, x0, 32, 32
lsr     x0, x2, 32
lsr     w2, w2, 0
fmov    s1, w0
fmov    s0, w2

where we again use genreg as a very complicated way to do a no-op.

So there are two bugs here:

1. a costing, we shouldn't SLP
2. an expansion, the code out of expand is bad to begin with.
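
The claim that the two cases are the same can be checked at the source level. The sketch below is a pointer-based C rendering of the two functions from the report, not the reporter's exact testcase. The names add_struct, add_complex and same_result are hypothetical, and __real__/__imag__ are GNU extensions.

```c
struct complx_t { float re; float im; };

/* Struct-emulated complex addition. */
struct complx_t add_struct(const struct complx_t *a, const struct complx_t *b)
{
    struct complx_t r = { a->re + b->re, a->im + b->im };
    return r;
}

/* The same addition on the built-in complex type. */
_Complex float add_complex(const _Complex float *a, const _Complex float *b)
{
    _Complex float r;
    __real__ r = __real__ *a + __real__ *b;
    __imag__ r = __imag__ *a + __imag__ *b;
    return r;
}

/* Returns 1 when both versions produce identical components. */
int same_result(void)
{
    struct complx_t sa = { 1.5f, -2.0f }, sb = { 0.25f, 4.0f };
    _Complex float ca, cb;
    __real__ ca = sa.re; __imag__ ca = sa.im;
    __real__ cb = sb.re; __imag__ cb = sb.im;

    struct complx_t sr = add_struct(&sa, &sb);
    _Complex float cr = add_complex(&ca, &cb);
    return sr.re == __real__ cr && sr.im == __imag__ cr;
}
```

Since the results are identical, any difference in the generated code comes from how the compiler represents the two types rather than from the arithmetic.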

[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2022-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |5.0

[Bug tree-optimization/105451] miss optimizations due to inconsistency in complex numbers associativity

2022-05-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105451

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2022-05-02

--- Comment #1 from Richard Biener  ---
It's difficult to even spot the difference in the SLP tree, but the
associatable multiplications make it somewhat random which of the valid SLP
tree representations we'll pick.  All of the SLP discovery is a heuristically
greedy search for the _first_ match, not for the "best" even ("best"
interpreted as "largest" in case of BB vectorization).

The only thing that reliably happens at the moment is associating until the
leafs are all from one load group (until we support multiple load groups in
one leaf).

[Bug tree-optimization/105451] New: miss optimizations due to inconsistency in complex numbers associativity

2022-05-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105451

Bug ID: 105451
   Summary: miss optimizations due to inconsistency in complex
numbers associativity
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: tnfchris at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The two functions

#include <complex.h>

void f (complex double * restrict a, complex double *restrict b,
complex double *restrict c, complex double *res, int n)
{
  for (int i = 0; i < n; i++)
   res[i] = a[i] * (b[i] * c[i]);
}

and

void g (complex double * restrict a, complex double *restrict b,
complex double *restrict c, complex double *res, int n)
{
  for (int i = 0; i < n; i++)
   res[i] = (a[i] * b[i]) * c[i];
}

At -Ofast produce the same code, but internally they get there using different
SLP trees.

The former creates a chain of VEC_PERM_EXPR nodes, as expected
(tinyurl.com/cmulslp1); however, the latter avoids the need for the permutes by
duplicating the elements of the complex number (https://tinyurl.com/cmulslp2).

The former we can detect as back to back complex multiplication but the latter
we can't.

Not sure what the best way to get consistency here is.
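
As a small sanity check that the two groupings are interchangeable at the value level, the following sketch runs both on one element. The loops are modelled on f and g from the report; the names mul_right_assoc, mul_left_assoc and groupings_agree are hypothetical. The inputs are small integers so the floating-point results are exact. In general the two groupings may round differently, which is why reassociation needs something like -Ofast.

```c
#include <complex.h>

/* a * (b * c), as in f. */
void mul_right_assoc(const double complex *restrict a,
                     const double complex *restrict b,
                     const double complex *restrict c,
                     double complex *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = a[i] * (b[i] * c[i]);
}

/* (a * b) * c, as in g. */
void mul_left_assoc(const double complex *restrict a,
                    const double complex *restrict b,
                    const double complex *restrict c,
                    double complex *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = (a[i] * b[i]) * c[i];
}

/* Returns 1 when both groupings give -85+20i for (1+2i)(3+4i)(5+6i). */
int groupings_agree(void)
{
    double complex a = 1.0 + 2.0 * I, b = 3.0 + 4.0 * I, c = 5.0 + 6.0 * I;
    double complex r1, r2;
    mul_right_assoc(&a, &b, &c, &r1, 1);
    mul_left_assoc(&a, &b, &c, &r2, 1);
    return r1 == r2 && r1 == -85.0 + 20.0 * I;
}
```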

[Bug tree-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2022-02-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Richard Biener  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #28 from Richard Biener  ---
*** Bug 104406 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2021-08-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Andrew Pinski  changed:

   What|Removed |Added

  Component|rtl-optimization|tree-optimization

--- Comment #27 from Andrew Pinski  ---
And there are two issues here, one is related to SLP not happening and the
other deals with the argument and return value passing.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2021-08-16 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #26 from Andrew Pinski  ---
Note there might be a dup of this bug somewhere too.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #25 from Joel Yliluoma  ---
(In reply to Jakub Jelinek from comment #24)
> on x86 read e.g. about MXCSR register and in the description of each
> instruction on which Exceptions it can raise.

So the quick answer to #15 is that addps instruction may raise exceptions. Ok,
thanks for clearing that up. My bad. So it seems that LLVM relies on the
assumption that the upper portions of the register are zeroed, and this is what
you said in the first place.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #24 from Jakub Jelinek  ---
Bugzilla is not the right place to educate users.  Of course the C FE_*
exceptions map to real hardware exceptions, on x86 read e.g. about MXCSR
register and in the description of each instruction on which Exceptions it can
raise.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #23 from Joel Yliluoma  ---
(In reply to Jakub Jelinek from comment #21)
> (In reply to Joel Yliluoma from comment #20)
> > Which exceptions would be generated by data in an unused portion of a
> > register?
> 
> addps adds 4 float elements, there is no "unused" portion.
> If some of the elements contain garbage, it can trigger for e.g. the addition
> FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW or FE_INEXACT (FE_DIVBYZERO obviously
> isn't relevant to addition).
> Please read the standard about floating point exceptions, fenv.h etc.

There is an “unused” portion, for the purposes of the data use. Same as with
padding in structs; the memory is unused because no part in program relies on
its contents, even though the CPU may load those portions in registers when
e.g. moving and copying the struct. The CPU won’t know whether it’s used or
not.

You mention FE_INVALID etc., but those are concepts within the C standard
library, not in the hardware. The C standard library will not make judgments on
the upper portions of the register. So if you have two float[2]s, and you add
them together into another float[2], and the compiler uses addps to achieve
this task, what is the mechanism that would supposedly generate an exception,
when no part in the software depends and makes judgments on the irrelevant
parts of the register?

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #22 from rguenther at suse dot de  ---
On Tue, 21 Apr 2020, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>  CC||hjl.tools at gmail dot com,
>||hubicka at gcc dot gnu.org,
>||matz at gcc dot gnu.org
> 
> --- Comment #19 from Jakub Jelinek  ---
> CCing Micha and Honza on the ABI question.

The arguments are class SSE (__m64), but I fail to find clarification
as to whether "unused" parts of argument registers (the SSEUP part
of the %xmmN register) is supposed to be zeroed or has unspecified
contents.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #21 from Jakub Jelinek  ---
(In reply to Joel Yliluoma from comment #20)
> Which exceptions would be generated by data in an unused portion of a
> register?

addps adds 4 float elements, there is no "unused" portion.
If some of the elements contain garbage, it can trigger for e.g. the addition
FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW or FE_INEXACT (FE_DIVBYZERO obviously
isn't relevant to addition).
Please read the standard about floating point exceptions, fenv.h etc.
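
The point can be demonstrated on x86 with SSE intrinsics. The sketch below is an illustration under assumptions, not code from this thread: it only does real work when __SSE__ is defined, and the function name upper_lanes_raise_overflow is hypothetical. It puts ordinary values in the two low lanes and FLT_MAX in the two upper lanes, then checks the sticky overflow flag in MXCSR after a packed add.

```c
#include <float.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Returns 1 if a packed add whose UPPER two lanes hold FLT_MAX sets the
   sticky overflow flag in MXCSR, 0 if it does not, -1 on targets without SSE. */
int upper_lanes_raise_overflow(void)
{
#if defined(__SSE__)
    volatile float big = FLT_MAX;   /* volatile: stop the add being folded away */
    volatile __m128 sink;           /* volatile: force the full vector result */
    __m128 v;

    _MM_SET_EXCEPTION_STATE(0);             /* clear all sticky flags */
    v = _mm_set_ps(big, big, 2.0f, 1.0f);   /* lanes 0,1 payload; lanes 2,3 "garbage" */
    sink = _mm_add_ps(v, v);                /* addps operates on all four lanes */
    (void)sink;

    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_OVERFLOW) != 0;
#else
    return -1;
#endif
}
```

The two low lanes compute 1+1 and 2+2, which cannot overflow, so a set flag can only come from the lanes the program never looks at.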

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #20 from Joel Yliluoma  ---
(In reply to Jakub Jelinek from comment #16)
> (In reply to Joel Yliluoma from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > I also think llvms code generation is bogus since it appears the ABI
> > > does not guarantee zeroed upper elements of the xmm0 argument
> > > which means they could contain sNaNs:
> > 
> > Why would it matter that the unused portions of the register contain NaNs?
> 
> Because it could then raise exceptions that shouldn't be raised?

Which exceptions would be generated by data in an unused portion of a register?
Does for example “addps” generate an exception if one or two of the operands
contains NaNs? Which instructions would generate exceptions?

I can only think of divps, when dividing by a zero, but it does not seem that
even LLVM compiles the two-element vector division into divps.

If the register is passed as a parameter to a library function, they would not
make judgments based on the values of the unused portions of the registers.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Jakub Jelinek  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com,
   ||hubicka at gcc dot gnu.org,
   ||matz at gcc dot gnu.org

--- Comment #19 from Jakub Jelinek  ---
CCing Micha and Honza on the ABI question.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #18 from Jakub Jelinek  ---
Note, we could do movq %xmm0, %xmm0; movq %xmm1, %xmm1; addpd %xmm1, %xmm0 for
the #c4 first function.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #17 from rguenther at suse dot de  ---
On Tue, 21 Apr 2020, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>  CC||jakub at gcc dot gnu.org
> 
> --- Comment #16 from Jakub Jelinek  ---
> (In reply to Joel Yliluoma from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > I also think llvms code generation is bogus since it appears the ABI
> > > does not guarantee zeroed upper elements of the xmm0 argument
> > > which means they could contain sNaNs:
> > 
> > Why would it matter that the unused portions of the register contain NaNs?
> 
> Because it could then raise exceptions that shouldn't be raised?

Note it might be llvm actually zeros the upper half at the caller
(in disagreement with GCC).  Maybe also the psABI specifies that
should happen and GCC is wrong.  Just at the moment interoperating
GCC and LLVM is prone to the above mentioned issue.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek  ---
(In reply to Joel Yliluoma from comment #15)
> (In reply to Richard Biener from comment #14)
> > I also think llvms code generation is bogus since it appears the ABI
> > does not guarantee zeroed upper elements of the xmm0 argument
> > which means they could contain sNaNs:
> 
> Why would it matter that the unused portions of the register contain NaNs?

Because it could then raise exceptions that shouldn't be raised?

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #15 from Joel Yliluoma  ---
(In reply to Richard Biener from comment #14)
> I also think llvms code generation is bogus since it appears the ABI
> does not guarantee zeroed upper elements of the xmm0 argument
> which means they could contain sNaNs:

Why would it matter that the unused portions of the register contain NaNs?

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #14 from Richard Biener  ---
(In reply to Joel Yliluoma from comment #13)
> GCC 4.1.2 is indicated in the bug report headers.
> Luckily, Compiler Explorer has a copy of that exact version, and it indeed
> vectorizes the second function: https://godbolt.org/z/DC_SSb
> 
> On my own system, the earliest I have is 4.6. The Compiler Explorer has 4.4,
> and it, or anything newer than that, no longer vectorizes either function.

Ah, OK - that's before GCC learned vectorization and is code-generated by
RTL expanding

  return {BIT_FIELD_REF  + BIT_FIELD_REF };

so the only vector support was GCCs generic vectors (and intrinsics).  The
generated code is far from perfect though.  I also think llvms code
generation is bogus since it appears the ABI does not guarantee zeroed
upper elements of the xmm0 argument which means they could contain sNaNs:

typedef float ss2 __attribute__((vector_size(8)));
typedef float ss4 __attribute__((vector_size(16)));
ss2 add2(ss2 a, ss2 b);
void bar(ss4 a)
{
  volatile ss2 x;
  x = add2 ((ss2){a[0], a[1]}, (ss2){a[0], a[1]});
}

produces

bar:
.LFB1:  
.cfi_startproc
subq    $56, %rsp
.cfi_def_cfa_offset 64
movdqa  %xmm0, %xmm1
call    add2
movq    %xmm0, 24(%rsp)
addq    $56, %rsp

which means we pass through 'a' unchanged.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #13 from Joel Yliluoma  ---
GCC 4.1.2 is indicated in the bug report headers.
Luckily, Compiler Explorer has a copy of that exact version, and it indeed
vectorizes the second function: https://godbolt.org/z/DC_SSb

On my own system, the earliest I have is 4.6. The Compiler Explorer has 4.4,
and it, or anything newer than that, no longer vectorizes either function.

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Richard Biener  changed:

   What|Removed |Added

 Blocks||53947
 CC||uros at gcc dot gnu.org

--- Comment #12 from Richard Biener  ---
(In reply to Joel Yliluoma from comment #11)
> Looks like this issue has taken a step or two *backwards* in the past years.
> 
> Whereas the second function used to be vectorized properly, today it seems
> neither of them is.

Which version do you see vectorizing the second (add2) function?

> Contrast this with Clang, which compiles *both* functions into a single
> instruction:
> 
>   vaddps xmm0, xmm1, xmm0
> 
> or some variant thereof depending on the -m options.
> 
> Compiler Explorer link: https://godbolt.org/z/2AKhnt

The main issues on the GCC side are
  a) ABI details not exposed at the point of vectorization (several PRs about
 this exist)
  b) "Poor" support for two-element float vectors (an understatement, we have
 some support for MMX but that's integer only, but I'm not sure we've
 enabled the 3dnow part to be emulated with SSE)

oddly enough even with -mmmx -m3dnow I see add2 lowered by veclower so
the vector type or the vector add must be unsupported(?).

llvm is known to support emulating smaller vectors just fine (and by
design is also aware of ABI details).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2020-04-21 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #11 from Joel Yliluoma  ---
Looks like this issue has taken a step or two *backwards* in the past years.

Whereas the second function used to be vectorized properly, today it seems
neither of them is.

Contrast this with Clang, which compiles *both* functions into a single
instruction:

  vaddps xmm0, xmm1, xmm0

or some variant thereof depending on the -m options.

Compiler Explorer link: https://godbolt.org/z/2AKhnt

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2020-03-10 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #28 from Jonathan Wakely  ---
(In reply to kargl from comment #21)
> Created attachment 46102 [details]
> fix g++ problem with sqrt(z) where z is complex and imag(z) = -0

This one assumes copysign is valid for arguments of type _Tp, which is only
true for float, double and long double. std::pow might make sense for complex
integers, but seems to be already broken, so this doesn't make it any worse
there.

To preserve support for user-defined numeric types (and decimal floats?) we
could add an overloaded helper for copysign which retains the `__x < _Tp()`
check for the generic overload and uses copysign for floating point types.
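
A minimal sketch of what such an overloaded helper could look like. The name apply_sign is hypothetical and this is not the code that was committed to libstdc++; it only illustrates dispatching on whether the type is a built-in floating-point type, using std::copysign where it is known to be valid and the `s < T()` comparison otherwise.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical helper (name invented for this sketch).
// Floating-point overload: std::copysign honours the sign bit of a zero.
template<typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type
apply_sign(T magnitude, T sign_source)
{
    return std::copysign(magnitude, sign_source);
}

// Generic overload: keeps the comparison-based check, which is all that
// can be assumed for integers or user-defined numeric types.
template<typename T>
typename std::enable_if<!std::is_floating_point<T>::value, T>::type
apply_sign(T magnitude, T sign_source)
{
    return sign_source < T() ? -magnitude : magnitude;
}
```

With this split, apply_sign(2.0, -0.0) yields -2.0, while integer arguments still go through the comparison.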


(In reply to kargl from comment #23)
> Created attachment 46105 [details]
> fix g++ problem with pow(z,0.5) where imag(z) = -0.
> 
> This patch has only been tested with the original test provided by the
> reporter.

This one makes me a little uncomfortable with the use of abs, but we already
use that elsewhere (and inconsistently qualify it as std::abs). Again, this
doesn't seem to make anything worse w.r.t our support for types other than
float, double and long double.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2020-03-10 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #27 from Jonathan Wakely  ---
Not in stage 4.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2020-03-07 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org
           Priority|P3                          |P2
           Severity|normal                      |major

--- Comment #26 from kargl at gcc dot gnu.org ---
Any chance a libstdc++ person will commit the supplied patches?

[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.

2019-08-03 Thread chinoune.mehdi at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337

Chinoune  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Chinoune  ---
Sorry, it wasn't a bug. The compiler is not skipping the if statement.

-fassociative-math : Allow re-association of operands in series of
floating-point operations. This violates the ISO C and C++ language standard by
possibly changing computation result. NOTE: re-ordering may change the sign of
zero as well as ignore NaNs and inhibit or create underflow or overflow (and
thus cannot be used on code that relies on rounding behavior like
(x + 2**52) - 2**52). May also reorder floating-point comparisons and thus may
not be used when ordered comparisons are required.

-fassociative-math is enabled by -funsafe-math-optimizations .
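
To illustrate the "(x + 2**52) - 2**52" idiom mentioned in the manual text, here is a small C++ sketch (the thread itself is about Fortran; the function name is invented). It assumes IEEE double arithmetic, the default round-to-nearest mode, and a non-negative x below 2**52. The volatile temporary is an addition of this sketch to keep the intermediate sum from being simplified; without it, a compiler running with -fassociative-math is permitted to reduce the expression to plain x.

```cpp
// Rounds x to the nearest integer by pushing it into the range where
// doubles have a spacing of exactly 1, then subtracting the offset again.
double round_via_magic(double x)
{
    const double magic = 4503599627370496.0;  // 2**52
    volatile double t = x + magic;            // rounding happens in this addition
    return t - magic;
}
```

For example, round_via_magic(2.3) gives 2.0 and round_via_magic(2.5) gives 2.0 as well, because ties round to even.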

[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.

2019-08-03 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337

--- Comment #2 from Steve Kargl  ---
On Sat, Aug 03, 2019 at 03:14:57PM +0000, kargl at gcc dot gnu.org wrote:
> --- Comment #1 from kargl at gcc dot gnu.org ---
> (In reply to Chinoune from comment #0)
> > I have encountered some underflows/overflows in my code compiled with
> > -Ofast, and after investigations it seems like the complex abs gives zero
> > with small numbers. So I added a workaround, but it didn't work:
> > 
> 
> (snip)
> 
> > 
> > gfortran-9 -O1 -funsafe-math-optimizations -ffinite-math-only
> > bug_skip_if.f90 -o test.x
> > ./test.x
> 
> (snip)
> 
> > 
> > Q : Why does gfortran skip the if statement?
> 
> What happens if you don't use options that allow
> a compiler to violate the standard?
> 

BTW, with the posted code, I cannot reproduce your results
on either i586-*-freebsd or  x86_64-*-freebsd.

[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.

2019-08-03 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
           Severity|normal                      |minor

[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.

2019-08-03 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #1 from kargl at gcc dot gnu.org ---
(In reply to Chinoune from comment #0)
> I have encountered some underflows/overflows in my code compiled with
> -Ofast, and after investigations it seems like the complex abs gives zero
> with small numbers. So I added a workaround, but it didn't work:
> 

(snip)

> 
> gfortran-9 -O1 -funsafe-math-optimizations -ffinite-math-only
> bug_skip_if.f90 -o test.x
> ./test.x

(snip)

> 
> Q : Why does gfortran skip the if statement?

What happens if you don't use options that allow
a compiler to violate the standard?


'-Ofast'
 Disregard strict standards compliance.  '-Ofast' enables all '-O3'
 optimizations.  It also enables optimizations that are not valid
 for all standard-compliant programs.  It turns on '-ffast-math' and
 the Fortran-specific '-fstack-arrays', unless
 '-fmax-stack-var-size' is specified, and '-fno-protect-parens'.

'-ffast-math'
 Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
 '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
 '-fcx-limited-range' and '-fexcess-precision=fast'.

 This option causes the preprocessor macro '__FAST_MATH__' to be
 defined.

 This option is not turned on by any '-O' option besides '-Ofast'
 since it can result in incorrect output for programs that depend on
 an exact implementation of IEEE or ISO rules/specifications for
 math functions.  It may, however, yield faster code for programs
 that do not require the guarantees of these specifications.

'-funsafe-math-optimizations'

 Allow optimizations for floating-point arithmetic that (a) assume
 that arguments and results are valid and (b) may violate IEEE or
 ANSI standards.  When used at link time, it may include libraries
 or startup files that change the default FPU control word or other
 similar optimizations.

 This option is not turned on by any '-O' option since it can result
 in incorrect output for programs that depend on an exact
 implementation of IEEE or ISO rules/specifications for math
 functions

[Bug fortran/91337] New: gfortran skips an if statement with some mathematical optimisations with complex numbers.

2019-08-03 Thread chinoune.mehdi at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337

            Bug ID: 91337
           Summary: gfortran skips an if statement with some mathematical
                    optimisations with complex numbers.
           Product: gcc
           Version: 9.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chinoune.mehdi at hotmail dot com
  Target Milestone: ---

I have encountered some underflows/overflows in my code compiled with -Ofast,
and after investigations it seems like the complex abs gives zero with small
numbers. So I added a workaround, but it didn't work:

module m
  implicit none
  !
  integer, parameter :: sp = selected_real_kind(6)
  real(sp), parameter :: tiny_sp = tiny(1._sp), sqrt_tiny = sqrt( tiny_sp )
  !
contains
  subroutine sub(z,y)
complex(sp), intent(in) :: z
real(sp), intent(out) :: y
real(sp) :: az
!
az = abs(z)
if( az

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-09 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #25 from Steve Kargl  ---
On Tue, Apr 09, 2019 at 08:24:29PM +0000, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
> 
> --- Comment #24 from Jonathan Wakely  ---
> Thanks for the patch, I'll test it fully tomorrow.
> 

I think the patch for complex_sqrt() is correct.  The one
for complex_pow(), I think, accidentally works for OP, but is
likely broken for some general regions of the complex plane.

> I'll open a separate bug for the FreeBSD issue. We could use more fine-grained
> configure checks so that most C99 math functions are enabled, even if some of
> the complex ones are missing.

libgfortran has c99_functions.c that implements missing C99 math
functions when configure cannot find one.  The implementations
are likely to be fairly direct, without much optimization,
attention to exceptional cases, or even extensive testing.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #24 from Jonathan Wakely  ---
Thanks for the patch, I'll test it fully tomorrow.

I'll open a separate bug for the FreeBSD issue. We could use more fine-grained
configure checks so that most C99 math functions are enabled, even if some of
the complex ones are missing.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #23 from kargl at gcc dot gnu.org ---
Created attachment 46105
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46105=edit
fix g++ problem with pow(z,0.5) where imag(z) = -0.

This patch has only been tested with the original test provided by the
reporter.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #22 from Steve Kargl  ---
On Mon, Apr 08, 2019 at 10:17:00PM +0000, redi at gcc dot gnu.org wrote:
> (In reply to kargl from comment #19)
> > I get the expected.  So, if you're on a system that has
> > _GLIBCXX_USE_C99_COMPLEX, you won't see the bug.
> 
> Wow, why isn't libstdc++ using the C99 <complex.h> functions on FreeBSD?
> 

Because it is all or nothing.  See comment #8 and #9 in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89125
If you're too busy to look at PR89125 the upshot is that
FreeBSD is missing ccoshl, ccosl, cexpl, csinhl, csinl,
ctanhl, and ctanl.  I have BSD licensed versions of 
ccoshl, ccosl, cexpl, csinhl, and csinl, but testing
on FreeBSD ran into what I consider to be a very bad
bug in clang (FreeBSD system compiler).

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #21 from kargl at gcc dot gnu.org ---
Created attachment 46102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46102=edit
fix g++ problem with sqrt(z) where z is complex and imag(z) = -0

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #20 from Jonathan Wakely  ---
(In reply to Steve Kargl from comment #16)
> If Andrew is correct and a builtin is called,

Wasn't that me, not Andrew?

> you might find
> my results if you use -fno-builtins (check spelling).

No, same results. Calling __builtin_csqrt doesn't necessarily mean GCC
evaluates it. Without optimisation it still generates a call to csqrt from
libc.

> Looking at ./libstdc++-v3/include/std/complex, one finds.
> 
>   // 26.2.8/13 sqrt(__z): Returns the complex square root of __z.
>   // The branch cut is on the negative axis.
>   template<typename _Tp>
>     complex<_Tp>
>     __complex_sqrt(const complex<_Tp>& __z)
>     {
>       _Tp __x = __z.real();
>       _Tp __y = __z.imag();
> 
>       if (__x == _Tp())
>         {
>           _Tp __t = sqrt(abs(__y) / 2);
>           return complex<_Tp>(__t, __y < _Tp() ? -__t : __t);
>         }
>       else
>         {
>           _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x)));
>           _Tp __u = __t / 2;
>           return __x > _Tp()
>             ? complex<_Tp>(__u, __y / __t)
>             : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u);
>         }
>     }
> 
> Doesn't this get the wrong answer for __y = -0 (as -0 < 0 is false)?

Yes, but that code shouldn't be used for modern targets ...

(In reply to kargl from comment #19)
> I get the expected.  So, if you're on a system that has
> _GLIBCXX_USE_C99_COMPLEX, you won't see the bug.

Wow, why isn't libstdc++ using the C99 <complex.h> functions on FreeBSD?

I'll have to look into that.

> It is likely that everywhere that a construct of the
> form __y < _Tp() ? -__u : __u appears, it needs to use
> copysign.

That won't always work, because the generic functions should really only get
used when _Tp is not one of float, double or long double. And in that case
there might be no copysign for the type.

For float, double and long double we should be using the libc routines. So the
bug is that FreeBSD isn't using them.

The _original_ bug report is for std::pow though, and on Ubuntu, which does use
glibc. Comment 9 needs more analysis.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #19 from kargl at gcc dot gnu.org ---
(In reply to Steve Kargl from comment #18)
> On Mon, Apr 08, 2019 at 08:03:36PM +0000, pinskia at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
> > 
> > --- Comment #17 from Andrew Pinski  ---
> > >Doesn't this get the wrong answer for __y = -0 (as -0 < 0 is false)?
> > 
> > No, you missed this part:
> > // The branch cut is on the negative axis.
> 
> No, I didn't miss that part.
> 
> > So maybe the bug is inside FreeBSD and Windows' libm.  Glibc fixed the
> > branch cuts issues back in 2012 for csqrt but the other OS's did not change
> > theirs.
> 
> For the C++ code in comment, on x86_64-*-freebsd.
> 
> % g++8 -o z a.cpp -lm && ./z
> z = (-1.84250315177824e-07,-0)
>pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
>   sqrt(z) = (0,0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,-0.000429243887758258)
> 
> The last two lines are definitely wrong.
> 
> troutmask:sgk[209] nm z | grep csqrt
> troutmask:sgk[210] nm z | grep sqrt
> 0040156b W _ZSt14__complex_sqrtIdESt7complexIT_ERKS2_
> 0040143d W _ZSt4sqrtIdESt7complexIT_ERKS2_
>  U sqrt@@FBSD_1.0
> 
> There is no reference to csqrt in the executable.  If I change
> /usr/local/lib/gcc8/include/c++/complex to use copysign
> to account for __y = -0 like
> 
>   template<typename _Tp>
>     complex<_Tp>
>     __complex_sqrt(const complex<_Tp>& __z)
>     {
>       _Tp __x = __z.real();
>       _Tp __y = __z.imag();
> 
>       if (__x == _Tp())
>         {
>           _Tp __t = sqrt(abs(__y) / 2);
> //        return complex<_Tp>(__t, __y < _Tp() ? -__t : __t);
>           return complex<_Tp>(__t, copysign(__t, __y));
>         }
>       else
>         {
>           _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x)));
>           _Tp __u = __t / 2;
> //        return __x > _Tp()
> //          ? complex<_Tp>(__u, __y / __t)
> //          : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u);
>           return __x > _Tp()
>             ? complex<_Tp>(__u, __y / __t)
>             : complex<_Tp>(abs(__y) / __t, copysign(__u, __y));
>         }
>     }
> 
> 
> The C++ code in comment #10 gives
> 
>  g++8 -o z a.cpp -lm && ./z
> z = (-1.84250315177824e-07,-0)
>pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
>   sqrt(z) = (0,-0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,0.000429243887758258)
> 
> The correct answer.  QED.

BTW, if I change /usr/local/lib/gcc8/include/c++/complex back to
not using copysign(), and instead change

#if _GLIBCXX_USE_C99_COMPLEX
  inline __complex__ float
  __complex_sqrt(__complex__ float __z) { return __builtin_csqrtf(__z); }

to 

#if _GLIBCXX_USE_C99_COMPLEX || SOMETHING_UGLY
  inline __complex__ float
  __complex_sqrt(__complex__ float __z) { return __builtin_csqrtf(__z); }

and do 

 g++8 -DSOMETHING_UGLY -o z a.cpp -lm && ./z
z = (-1.84250315177824e-07,-0)
   pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,-0.000429243887758258)
sqrt(conj(z)) = (0,0.000429243887758258)
conj(sqrt(z)) = (0,0.000429243887758258)

I get the expected.  So, if you're on a system that has
_GLIBCXX_USE_C99_COMPLEX, you won't see the bug.
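
To find out which of the two code paths a given installation takes, one can test the macro after including <complex>. This is only a sketch: the function name is made up, and it relies on _GLIBCXX_USE_C99_COMPLEX being visible once a libstdc++ header has been included, as it is in the header excerpt quoted above.

```cpp
#include <complex>

// Reports which std::sqrt(std::complex<T>) path the headers select.
const char* complex_sqrt_backend()
{
#if defined(_GLIBCXX_USE_C99_COMPLEX) && _GLIBCXX_USE_C99_COMPLEX
    return "C99 builtins (e.g. __builtin_csqrt)";
#elif defined(__GLIBCXX__)
    return "generic libstdc++ template (__complex_sqrt)";
#else
    return "not libstdc++";
#endif
}
```

Printing the returned string on FreeBSD and on a glibc system should show the difference discussed here.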

It is likely that everywhere that a construct of the
form __y < _Tp() ? -__u : __u appears, it needs to use
copysign.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #18 from Steve Kargl  ---
On Mon, Apr 08, 2019 at 08:03:36PM +0000, pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
> 
> --- Comment #17 from Andrew Pinski  ---
> >Doesn't this get the wrong answer for __y = -0 (as -0 < 0 is false)?
> 
> No, you missed this part:
> // The branch cut is on the negative axis.

No, I didn't miss that part.

> So maybe the bug is inside FreeBSD and Windows' libm.  Glibc fixed the branch
> cuts issues back in 2012 for csqrt but the other OS's did not change theirs.

For the C++ code in comment, on x86_64-*-freebsd.

% g++8 -o z a.cpp -lm && ./z
z = (-1.84250315177824e-07,-0)
   pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,0.000429243887758258)
sqrt(conj(z)) = (0,0.000429243887758258)
conj(sqrt(z)) = (0,-0.000429243887758258)

The last two lines are definitely wrong.

troutmask:sgk[209] nm z | grep csqrt
troutmask:sgk[210] nm z | grep sqrt
0040156b W _ZSt14__complex_sqrtIdESt7complexIT_ERKS2_
0040143d W _ZSt4sqrtIdESt7complexIT_ERKS2_
 U sqrt@@FBSD_1.0

There is no reference to csqrt in the executable.  If I change
/usr/local/lib/gcc8/include/c++/complex to use copysign
to account for __y = -0 like

  template<typename _Tp>
    complex<_Tp>
    __complex_sqrt(const complex<_Tp>& __z)
    {
      _Tp __x = __z.real();
      _Tp __y = __z.imag();

      if (__x == _Tp())
        {
          _Tp __t = sqrt(abs(__y) / 2);
//        return complex<_Tp>(__t, __y < _Tp() ? -__t : __t);
          return complex<_Tp>(__t, copysign(__t, __y));
        }
      else
        {
          _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x)));
          _Tp __u = __t / 2;
//        return __x > _Tp()
//          ? complex<_Tp>(__u, __y / __t)
//          : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u);
          return __x > _Tp()
            ? complex<_Tp>(__u, __y / __t)
            : complex<_Tp>(abs(__y) / __t, copysign(__u, __y));
        }
    }


The C++ code in comment #10 gives

 g++8 -o z a.cpp -lm && ./z
z = (-1.84250315177824e-07,-0)
   pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,-0.000429243887758258)
sqrt(conj(z)) = (0,0.000429243887758258)
conj(sqrt(z)) = (0,0.000429243887758258)

The correct answer.  QED.
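
For readers who want to try the change without editing the installed header, the same algorithm can be written as a free function for double only. This is a sketch under that assumption; sqrt_with_copysign is an invented name and the code is not part of libstdc++.

```cpp
#include <cmath>
#include <complex>

// Stand-alone double-only version of the modified __complex_sqrt above.
std::complex<double> sqrt_with_copysign(const std::complex<double>& z)
{
    double x = z.real();
    double y = z.imag();

    if (x == 0.0) {
        double t = std::sqrt(std::abs(y) / 2);
        // copysign looks at the sign bit, so y = -0.0 selects the lower branch.
        return std::complex<double>(t, std::copysign(t, y));
    }

    double t = std::sqrt(2 * (std::abs(z) + std::abs(x)));
    double u = t / 2;
    return x > 0.0
        ? std::complex<double>(u, y / t)
        : std::complex<double>(std::abs(y) / t, std::copysign(u, y));
}
```

Calling it with (-1.8425031517782417e-07, -0.0) gives a negative imaginary part, and with +0.0 a positive one, which is the behaviour shown in the output above.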

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #17 from Andrew Pinski  ---
>Doesn't this get the wrong answer for __y = -0 (as -0 < 0 is false)?

No, you missed this part:
// The branch cut is on the negative axis.

So maybe the bug is inside FreeBSD and Windows' libm.  Glibc fixed the branch
cuts issues back in 2012 for csqrt but the other OS's did not change theirs.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #16 from Steve Kargl  ---
On Mon, Apr 08, 2019 at 07:20:22PM +0000, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
> 
> --- Comment #13 from Jonathan Wakely  ---
> (In reply to Steve Kargl from comment #10)
> > %  g++8 -o z  a.cpp -lm && ./z
> > z = (-1.84250315177824e-07,-0)
> >pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
> >   sqrt(z) = (0,0.000429243887758258)
> > sqrt(conj(z)) = (0,0.000429243887758258)
> > conj(sqrt(z)) = (0,-0.000429243887758258)
> > 
> > This looks wrong.
> 
> I can't reproduce this, I get:
> 
> z = (-1.84250315177824e-07,-0)
>pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
>   sqrt(z) = (0,-0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,0.000429243887758258)
> 

My results are for i586-*-freebsd, which doesn't use glibc.
If Andrew is correct and a builtin is called, you might find
my results if you use -fno-builtins (check spelling).

Looking at ./libstdc++-v3/include/std/complex, one finds.

  // 26.2.8/13 sqrt(__z): Returns the complex square root of __z.
  // The branch cut is on the negative axis.
  template<typename _Tp>
    complex<_Tp>
    __complex_sqrt(const complex<_Tp>& __z)
    {
      _Tp __x = __z.real();
      _Tp __y = __z.imag();

      if (__x == _Tp())
        {
          _Tp __t = sqrt(abs(__y) / 2);
          return complex<_Tp>(__t, __y < _Tp() ? -__t : __t);
        }
      else
        {
          _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x)));
          _Tp __u = __t / 2;
          return __x > _Tp()
            ? complex<_Tp>(__u, __y / __t)
            : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u);
        }
    }

Doesn't this get the wrong answer for __y = -0 (as -0 < 0 is false)?

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #15 from Andrew Pinski  ---
(In reply to Jonathan Wakely from comment #13)
> (In reply to Steve Kargl from comment #10)
> > %  g++8 -o z  a.cpp -lm && ./z
> > z = (-1.84250315177824e-07,-0)
> >pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
> >   sqrt(z) = (0,0.000429243887758258)
> > sqrt(conj(z)) = (0,0.000429243887758258)
> > conj(sqrt(z)) = (0,-0.000429243887758258)
> > 
> > This looks wrong.
> 
> I can't reproduce this, I get:
> 
> z = (-1.84250315177824e-07,-0)
>pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
>   sqrt(z) = (0,-0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,0.000429243887758258)

My bet now comes to the fact there have been improvements to glibc which
changed the behavior here   

Also I used the wrong term, it is the branch cut that is the issue.  Most of
the branch cuts were fixed in glibc in 2012; though there might have been some
fixed later on.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #14 from Jonathan Wakely  ---
Which is unsurprising because std::sqrt(z) just calls
__builtin_csqrt(z.__rep())

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #13 from Jonathan Wakely  ---
(In reply to Steve Kargl from comment #10)
> %  g++8 -o z  a.cpp -lm && ./z
> z = (-1.84250315177824e-07,-0)
>pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
>   sqrt(z) = (0,0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,-0.000429243887758258)
> 
> This looks wrong.

I can't reproduce this, I get:

z = (-1.84250315177824e-07,-0)
   pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,-0.000429243887758258)
sqrt(conj(z)) = (0,0.000429243887758258)
conj(sqrt(z)) = (0,0.000429243887758258)

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #12 from Jonathan Wakely  ---
(In reply to Steve Kargl from comment #11)
> unless [Note: ...] is non-normative text.

That's exactly what it is.

But we can still aim to meet the intended behaviour.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #11 from Steve Kargl  ---
On Mon, Apr 08, 2019 at 02:32:38PM +0000, sgk at troutmask dot
apl.washington.edu wrote:
> 
> I don't have a copy of the C++ standard, so take this as speculation.
> pow(z,0.5) is equivalent to sqrt(z).  From the C standard, one has
> 
> conj(csqrt(z)) = csqrt(conj(z)).
> 
> g++ does not enforce this when the imaginary part is -0;
> while gcc does.

(code snipped)

> % gcc8 -o z c.c -lm && ./z
>  z = CMPLX(-1.8425031517782417e-07, -0.0000000000000000e+00)
>   cpow(z, 0.5) = CMPLX( 2.6283607659835831e-20, -4.2924388775825818e-04)
>   csqrt(z) = CMPLX( 0.0000000000000000e+00, -4.2924388775825818e-04)
> csqrt(conj(z)) = CMPLX( 0.0000000000000000e+00,  4.2924388775825818e-04)
> conj(csqrt(z)) = CMPLX( 0.0000000000000000e+00,  4.2924388775825818e-04)

(code snipped)

> %  g++8 -o z  a.cpp -lm && ./z
> z = (-1.84250315177824e-07,-0)
>pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
>   sqrt(z) = (0,0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,-0.000429243887758258)
> 
> This looks wrong.

It is wrong.  From n4810.pdf, page 1102,

  template<class T> complex<T> sqrt(const complex<T>& x);

  Returns: The complex square root of x, in the range of the right
  half-plane. [Note: The semantics of this function are intended to
  be the same in C++ as they are for csqrt in C. -- end note]

unless [Note: ...] is non-normative text.
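
The identity conj(csqrt(z)) = csqrt(conj(z)) can be checked from C++ with a small helper. The sketch below uses invented function names and compares the two sides including the sign of zero components, since operator== alone would treat +0 and -0 as equal.

```cpp
#include <cmath>
#include <complex>

// True if a and b compare equal and also agree in their sign bit,
// so that +0.0 and -0.0 are told apart.
bool same_including_zero_sign(double a, double b)
{
    return a == b && std::signbit(a) == std::signbit(b);
}

// Checks conj(sqrt(z)) against sqrt(conj(z)) component by component.
bool sqrt_commutes_with_conj(const std::complex<double>& z)
{
    std::complex<double> lhs = std::conj(std::sqrt(z));
    std::complex<double> rhs = std::sqrt(std::conj(z));
    return same_including_zero_sign(lhs.real(), rhs.real())
        && same_including_zero_sign(lhs.imag(), rhs.imag());
}
```

On a system showing the problem reported here, sqrt_commutes_with_conj would be expected to return false for z = (-1.8425031517782417e-07, -0.0).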

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #10 from Steve Kargl  ---
On Mon, Apr 08, 2019 at 09:59:22AM +0000, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
> 
> --- Comment #7 from Jonathan Wakely  ---
> I think it's allowed. The standards have very little to say about accuracy of
> any mathematical functions, and complex(0, 0.0) == complex(0,
> -0.0) is true according to the standard, because +0.0 == -0.0 is true.
> 


I don't have a copy of the C++ standard, so take this as speculation.
pow(z,0.5) is equivalent to sqrt(z).  From the C standard, one has

conj(csqrt(z)) = csqrt(conj(z)).

g++ does not enforce this when the imaginary part is -0;
while gcc does.

% cat c.c
#include <complex.h>
#include <stdio.h>

int
main(void)
{
   double complex z, t0, t1, t2, t3;

   z = CMPLX(-1.8425031517782417e-07, -0.0);
   t0 = cpow(z, 0.5);
   t1 = csqrt(z);
   t2 = csqrt(conj(z));
   t3 = conj(csqrt(z));
   printf(" z = CMPLX(% .16le, % .16le)\n", creal(z), cimag(z));
   printf("  cpow(z, 0.5) = CMPLX(% .16le, % .16le)\n", creal(t0), cimag(t0));
   printf("  csqrt(z) = CMPLX(% .16le, % .16le)\n", creal(t1), cimag(t1));
   printf("csqrt(conj(z)) = CMPLX(% .16le, % .16le)\n", creal(t2), cimag(t2));
   printf("conj(csqrt(z)) = CMPLX(% .16le, % .16le)\n", creal(t3), cimag(t3));
   return 0;
}
% gcc8 -o z c.c -lm && ./z
 z = CMPLX(-1.8425031517782417e-07, -0.0000000000000000e+00)
  cpow(z, 0.5) = CMPLX( 2.6283607659835831e-20, -4.2924388775825818e-04)
  csqrt(z) = CMPLX( 0.0000000000000000e+00, -4.2924388775825818e-04)
csqrt(conj(z)) = CMPLX( 0.0000000000000000e+00,  4.2924388775825818e-04)
conj(csqrt(z)) = CMPLX( 0.0000000000000000e+00,  4.2924388775825818e-04)


mobile:kargl[210] cat a.cpp
#include <complex>
#include <iomanip>
#include <iostream>

int
main(int argc, char *argv[])
{
   std::complex<double> z, t0, t1, t2, t3;
   z  = std::complex<double>(-1.8425031517782417e-07, -0.0);
   t0 = std::pow(z, 0.5);
   t1 = std::sqrt(z);
   t2 = std::sqrt(std::conj(z));
   t3 = std::conj(std::sqrt(z));
   std::cout << "z = " << std::setprecision(15) << z  << std::endl;
   std::cout << "   pow(z,0.5) = " << std::setprecision(15) << t0 << std::endl;
   std::cout << "  sqrt(z) = " << std::setprecision(15) << t1 << std::endl;
   std::cout << "sqrt(conj(z)) = " << std::setprecision(15) << t2 << std::endl;
   std::cout << "conj(sqrt(z)) = " << std::setprecision(15) << t3 << std::endl;
   return 0;
}

%  g++8 -o z  a.cpp -lm && ./z
z = (-1.84250315177824e-07,-0)
   pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,0.000429243887758258)
sqrt(conj(z)) = (0,0.000429243887758258)
conj(sqrt(z)) = (0,-0.000429243887758258)

This looks wrong.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #9 from Jonathan Wakely  ---
(In reply to Richard Biener from comment #5)
> The issue is
> 
> std::pow (__x=..., __y=@0x7fffdcb8: 0.5)
> at /home/space/rguenther/install/gcc-9.0/include/c++/9.0.1/complex:1027
> (gdb) l
> 1022      {
> 1023  #if ! _GLIBCXX_USE_C99_COMPLEX
> 1024        if (__x == _Tp())
> 1025          return _Tp();
> 1026  #endif
> 1027        if (__x.imag() == _Tp() && __x.real() > _Tp())
> 1028          return pow(__x.real(), __y);
> 
> where __x.imag () == _Tp() says true for -0.0 == 0.0.  This means
> std::pow will return the same values for r + -0.0i and r + 0.0i,
> not sure if that is allowed by the C++ standard.

But __x.real() > _Tp() is false here, so that branch isn't taken anyway.

Instead the pow(val, 0.5) result comes from:

  _Complex double val = -1.8425031517782417e-07 + -0.0 * I;
  _Complex double t = __builtin_clog(val);
  double rho = exp(0.5 * __real__ t);
  double theta = 0.5 * __imag__ t;
  _Complex result = rho * cos(theta) + rho * sin(theta) * I;
  __builtin_printf("%f\n", __imag__ result);
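
The same computation can be written with std::complex to make it easier to experiment with. This is only a sketch of the polar-form approach shown above; pow_via_log is an invented name, not a libstdc++ function.

```cpp
#include <cmath>
#include <complex>

// Computes z**y as exp(y * log(z)), written out in polar form.
std::complex<double> pow_via_log(const std::complex<double>& z, double y)
{
    std::complex<double> t = std::log(z);   // log|z| + i*arg(z)
    double rho = std::exp(y * t.real());    // |z|**y
    double theta = y * t.imag();            // y * arg(z)
    return std::complex<double>(rho * std::cos(theta), rho * std::sin(theta));
}
```

For a negative real part, arg(z) is -pi when the imaginary part is -0.0 and +pi when it is +0.0, so with y = 0.5 the imaginary part of the result changes sign accordingly.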

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #8 from Andrew Pinski  ---
Also isn't it true that this is just a different quadrant of the solution? 
That is the answer is correct but which quadrant being selected is different?

That is (a^0.5) actually has two answers where the imaginary part can be
positive or negative?  That is they are conjugate of each other.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #7 from Jonathan Wakely  ---
I think it's allowed. The standards have very little to say about accuracy of
any mathematical functions, and complex(0, 0.0) == complex(0,
-0.0) is true according to the standard, because +0.0 == -0.0 is true.
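
A short sketch of that point, with invented helper names: operator== on std::complex<double> cannot tell the two zeros apart, while a check based on std::signbit can.

```cpp
#include <cmath>
#include <complex>

// Plain value comparison, as defined for std::complex.
bool equal_by_value(const std::complex<double>& a,
                    const std::complex<double>& b)
{
    return a == b;
}

// Value comparison that additionally requires matching sign bits,
// so (0, 0.0) and (0, -0.0) are considered different.
bool equal_including_zero_sign(const std::complex<double>& a,
                               const std::complex<double>& b)
{
    return a == b
        && std::signbit(a.real()) == std::signbit(b.real())
        && std::signbit(a.imag()) == std::signbit(b.imag());
}
```

The first function returns true for (0, 0.0) and (0, -0.0); the second returns false.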

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #6 from Andrew Pinski  ---
(In reply to Richard Biener from comment #5)
> The issue is
> 
> std::pow (__x=..., __y=@0x7fffdcb8: 0.5)
> at /home/space/rguenther/install/gcc-9.0/include/c++/9.0.1/complex:1027
> (gdb) l
> 1022      {
> 1023  #if ! _GLIBCXX_USE_C99_COMPLEX
> 1024        if (__x == _Tp())
> 1025          return _Tp();
> 1026  #endif
> 1027        if (__x.imag() == _Tp() && __x.real() > _Tp())
> 1028          return pow(__x.real(), __y);
> 
> where __x.imag () == _Tp() says true for -0.0 == 0.0.  This means
> std::pow will return the same values for r + -0.0i and r + 0.0i,
> not sure if that is allowed by the C++ standard.

If it does not allow it, then adding copysign is needed.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code

--- Comment #5 from Richard Biener  ---
The issue is

std::pow (__x=..., __y=@0x7fffdcb8: 0.5)
at /home/space/rguenther/install/gcc-9.0/include/c++/9.0.1/complex:1027
(gdb) l
1022      {
1023  #if ! _GLIBCXX_USE_C99_COMPLEX
1024        if (__x == _Tp())
1025          return _Tp();
1026  #endif
1027        if (__x.imag() == _Tp() && __x.real() > _Tp())
1028          return pow(__x.real(), __y);

where __x.imag () == _Tp() says true for -0.0 == 0.0.  This means
std::pow will return the same values for r + -0.0i and r + 0.0i,
not sure if that is allowed by the C++ standard.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-04-08
                 CC|                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #4 from Martin Liška  ---
I see the expected result when replacing '-0.0' with '0.0'.
Well, negative zero should be equal to the positive one according to standard.

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-06 Thread t.sprodowski at web dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #3 from t.sprodowski at web dot de ---
Octave 4.2.2: ans = 2.6284e-20 + 4.2924e-04i

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-05 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org

--- Comment #2 from kargl at gcc dot gnu.org ---
(In reply to t.sprodowski from comment #0)
> Following calculation of the complex number leads to a wrong imaginary part:
> 
> 
> #include <complex>
> #include <iomanip>
> #include <iostream>
> 
> int main(int argc, char *argv[])
> {
>   std::complex<double> val = std::complex<double>(-1.8425031517782417e-07,
> -0.0);
>   std::complex<double> testExp = std::pow(val, 0.5);
>   std::cout << "textExp: " << std::setprecision(30) << testExp << std::endl;
>   return 0;
> }
> 
> Result is:
> (2.6283607659835830609796003783e-20,-0.000429243887758258178214548772544),
> but it should be
> (2.628360765983583e-20, 0.0004292438877582582), obtained from Visual Studio,
> MATLAB and Octave.
>

What version of Octave.  I get

>> z = complex(-1.8425031517782417e-07, -0.0)
z = -0.0018425 - 0.000i
>> z**0.5
ans =  2.6284e-20 - 4.2924e-04i

which agrees with clang++ version 7.0.1 (and apparently g++
which I haven't tested).

[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct

2019-04-05 Thread t.sprodowski at web dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

--- Comment #1 from t.sprodowski at web dot de ---
Created attachment 46095
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46095&action=edit
Source file

Source file to illustrate this bug.

[Bug libstdc++/89991] New: Complex numbers: Calculation of imaginary part is not correct

2019-04-05 Thread t.sprodowski at web dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

Bug ID: 89991
   Summary: Complex numbers: Calculation of imaginary part is not
correct
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: t.sprodowski at web dot de
  Target Milestone: ---

Following calculation of the complex number leads to a wrong imaginary part:


#include <complex>
#include <iomanip>
#include <iostream>

int main(int argc, char *argv[])
{
  std::complex<double> val = std::complex<double>(-1.8425031517782417e-07,
-0.0);
  std::complex<double> testExp = std::pow(val, 0.5);
  std::cout << "textExp: " << std::setprecision(30) << testExp << std::endl;
  return 0;
}

Result is:
(2.6283607659835830609796003783e-20,-0.000429243887758258178214548772544), but
it should be
(2.628360765983583e-20, 0.0004292438877582582), obtained from Visual Studio,
MATLAB and Octave.

Compilation was done with gnu 8.2.0 and 7.3.0 on Ubuntu 18.04:

g++ -c -pipe -g -std=gnu++1y -Wall -W -D_REENTRANT -fPIC
-DQT_DEPRECATED_WARNINGS -DQT_QML_DEBUG -DQT_CORE_LIB -I../testPrecision -I.
-isystem /usr/include/x86_64-linux-gnu/qt5 -isystem
/usr/include/x86_64-linux-gnu/qt5/QtCore -I.
-I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o main.o
../testPrecision/main.cpp

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-12-03 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

Jerry DeLisle  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jerry DeLisle  ---
Fixed on trunk and on 7. Closing

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-12-03 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #7 from Jerry DeLisle  ---
Author: jvdelisle
Date: Sun Dec  3 20:43:59 2017
New Revision: 255368

URL: https://gcc.gnu.org/viewcvs?rev=255368&root=gcc&view=rev
Log:
2017-12-03  Jerry DeLisle  
Dominique d'Humieres  

Backport from trunk
PR libgfortran/83191
* io/transfer.c (list_formatted_read_scalar): Do not set
namelist_mode bit here. (namelist_read): Likewise.
(data_transfer_init): Clear the mode bit here.
(finalize_transfer): Do set the mode bit just before any calls
to namelist_read or namelist_write. It can now be referred to
in complex_write.
* io/write.c (write_complex): Suppress the leading blanks when
namelist_mode bit is not set to 1.

* gfortran.dg/namelist_95.f90: New test.

Added:
branches/gcc-7-branch/gcc/testsuite/gfortran.dg/namelist_95.f90
Modified:
branches/gcc-7-branch/gcc/testsuite/ChangeLog
branches/gcc-7-branch/libgfortran/ChangeLog
branches/gcc-7-branch/libgfortran/io/list_read.c
branches/gcc-7-branch/libgfortran/io/transfer.c
branches/gcc-7-branch/libgfortran/io/write.c

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-12-03 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #6 from Jerry DeLisle  ---
Author: jvdelisle
Date: Sun Dec  3 16:47:12 2017
New Revision: 255365

URL: https://gcc.gnu.org/viewcvs?rev=255365&root=gcc&view=rev
Log:
2017-12-03  Jerry DeLisle  
Dominique d'Humieres  

PR libgfortran/83191
* io/transfer.c (list_formatted_read_scalar): Do not set
namelist_mode bit here. (namelist_read): Likewise.
(data_transfer_init): Clear the mode bit here.
(finalize_transfer): Do set the mode bit just before any calls
to namelist_read or namelist_write. It can now be referred to
in complex_write.
* io/write.c (write_complex): Suppress the leading blanks when
namelist_mode bit is not set to 1.

* gfortran.dg/namelist_95.f90: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/namelist_95.f90
Modified:
trunk/gcc/testsuite/ChangeLog
trunk/libgfortran/ChangeLog
trunk/libgfortran/io/list_read.c
trunk/libgfortran/io/transfer.c
trunk/libgfortran/io/write.c

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-11-28 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #5 from Jerry DeLisle  ---
(In reply to Jerry DeLisle from comment #4)
> Alternatively one could do this:
> 
> @@ -1809,9 +1809,11 @@ write_complex (st_parameter_dt *dtp, const char
> *source, int kind, size_t size)
> precision, buf_size, result1, &res_len1);
>    get_float_string (dtp, &f, source + size / 2 , kind, 0, buffer,
> precision, buf_size, result2, &res_len2);
> -  lblanks = width - res_len1 - res_len2 - 3;
> -
> -  write_x (dtp, lblanks, lblanks);
> +  if (!dtp->u.p.namelist_mode)
> +{
> +  lblanks = width - res_len1 - res_len2 - 3;
> +  write_x (dtp, lblanks, lblanks);
> +}
>write_char (dtp, '(');
>write_float_string (dtp, result1, res_len1);
>write_char (dtp, semi_comma);

With the following tweak:

@@ -1950,6 +1952,7 @@ list_formatted_write (st_parameter_dt *dtp, bt type, void
*p, int kind,
  size * GFC_SIZE_OF_CHAR_KIND(kind) : size;

   tmp = (char *) p;
+  dtp->u.p.namelist_mode = 0;

   /* Big loop over all the elements.  */
   for (elem = 0; elem < nelems; elem++)
@@ -2394,6 +2397,7 @@ namelist_write (st_parameter_dt *dtp)
   char c;
   char *dummy_name = NULL;

+  dtp->u.p.namelist_mode = 1;
   /* Set the delimiter for namelist output.  */
   switch (dtp->u.p.current_unit->delim_status)
 {

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-11-28 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #4 from Jerry DeLisle  ---
Alternatively one could do this:

@@ -1809,9 +1809,11 @@ write_complex (st_parameter_dt *dtp, const char *source,
int kind, size_t size)
precision, buf_size, result1, &res_len1);
   get_float_string (dtp, &f, source + size / 2 , kind, 0, buffer,
precision, buf_size, result2, &res_len2);
-  lblanks = width - res_len1 - res_len2 - 3;
-
-  write_x (dtp, lblanks, lblanks);
+  if (!dtp->u.p.namelist_mode)
+{
+  lblanks = width - res_len1 - res_len2 - 3;
+  write_x (dtp, lblanks, lblanks);
+}
   write_char (dtp, '(');
   write_float_string (dtp, result1, res_len1);
   write_char (dtp, semi_comma);

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-11-28 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #3 from Dominique d'Humieres  ---
The following patch does the trick:

--- ../_clean/libgfortran/io/write.c  2017-11-22 20:37:44.0 +0100
+++ libgfortran/io/write.c  2017-11-28 23:45:55.0 +0100
@@ -1552,7 +1552,7 @@ select_string (st_parameter_dt *dtp, con
   int kind)
 {
   char *result;
-  *size = size_from_kind (dtp, f, kind) + f->u.real.d;
+  *size = size_from_kind (dtp, f, kind) + f->u.real.d + 1;
   if (*size > BUF_STACK_SZ)
  result = xmalloc (*size);
   else
@@ -1769,7 +1769,8 @@ write_real_g0 (st_parameter_dt *dtp, con


 static void
-write_complex (st_parameter_dt *dtp, const char *source, int kind, size_t size)
+write_complex (st_parameter_dt *dtp, const char *source, int kind, size_t size,
+   bool justify)
 {
   char semi_comma =
dtp->u.p.current_unit->decimal_status == DECIMAL_POINT ? ',' : ';';
@@ -1809,9 +1810,12 @@ write_complex (st_parameter_dt *dtp, con
precision, buf_size, result1, &res_len1);
   get_float_string (dtp, &f, source + size / 2 , kind, 0, buffer,
precision, buf_size, result2, &res_len2);
-  lblanks = width - res_len1 - res_len2 - 3;
+  if (justify)
+{
+  lblanks = width - res_len1 - res_len2 - 3;

-  write_x (dtp, lblanks, lblanks);
+  write_x (dtp, lblanks, lblanks);
+}
   write_char (dtp, '(');
   write_float_string (dtp, result1, res_len1);
   write_char (dtp, semi_comma);
@@ -1889,7 +1893,7 @@ list_formatted_write_scalar (st_paramete
   write_real (dtp, p, kind);
   break;
 case BT_COMPLEX:
-  write_complex (dtp, p, kind, size);
+  write_complex (dtp, p, kind, size, true);
   break;
 case BT_CLASS:
   {
@@ -2202,7 +2206,7 @@ nml_write_obj (st_parameter_dt *dtp, nam
   case BT_COMPLEX:
  dtp->u.p.no_leading_blank = 0;
  num++;
-  write_complex (dtp, p, len, obj_size);
+  write_complex (dtp, p, len, obj_size, false);
   break;

case BT_DERIVED:

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-11-28 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

Jerry DeLisle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jvdelisle at gcc dot 
gnu.org

--- Comment #2 from Jerry DeLisle  ---
My bad. I will look into it.

[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers

2017-11-28 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

Dominique d'Humieres  changed:

   What|Removed |Added

   Priority|P3  |P4
 Status|UNCONFIRMED |NEW
  Known to work||6.4.0
   Keywords||wrong-code
   Last reconfirmed||2017-11-28
 CC||jvdelisle at gcc dot gnu.org
 Ever confirmed|0   |1
Summary|Writing a namelist with |[7/8 Regression] Writing a
   |repeated complex numbers|namelist with repeated
   ||complex numbers
   Target Milestone|--- |7.3
  Known to fail||7.2.0, 8.0

--- Comment #1 from Dominique d'Humieres  ---
Likely caused by r237735 (pr48852).

The test in pr48852 comment 0 prints now

 (1.,0.)

It should probably be

(1.,0.)

If I read the code correctly, it is caused by the lines

  lblanks = width - res_len1 - res_len2 - 3;

  write_x (dtp, lblanks, lblanks);

needed to have right justified outputs (case C in pr48852 comment 12).
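
To make the effect of those two lines concrete, here is a minimal sketch of the padding arithmetic. It is not the libgfortran code: `format_complex` and its parameters are hypothetical stand-ins, and the real routine in io/write.c works on different types. The sketch assumes the real and imaginary parts are already formatted as strings, and that three characters are reserved for '(', ',' and ')'.

```cpp
#include <string>

// Hypothetical stand-in for the padding logic discussed above; the real
// code lives in libgfortran's io/write.c and works on different types.
// re and im are the already formatted real and imaginary parts.
std::string format_complex(const std::string& re, const std::string& im,
                           int width, bool justify)
{
  std::string out;
  if (justify)
    {
      // Three characters are reserved for '(', ',' and ')'.
      int lblanks = width - static_cast<int>(re.size())
                    - static_cast<int>(im.size()) - 3;
      if (lblanks > 0)
        out.append(static_cast<std::string::size_type>(lblanks), ' ');
    }
  out += '(';
  out += re;
  out += ',';
  out += im;
  out += ')';
  return out;
}
```

With a field width of 10, `format_complex("1.", "0.", 10, true)` yields three leading blanks before `(1.,0.)`, while passing `false` yields the value with no leading blanks, which is the form a repeat count such as `3*` needs directly after it.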

[Bug fortran/83191] New: Writing a namelist with repeated complex numbers

2017-11-27 Thread ccyang at unlv dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

Bug ID: 83191
   Summary: Writing a namelist with repeated complex numbers
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ccyang at unlv dot edu
  Target Milestone: ---

This is a program that generates the bug on my Mac OS 10.12.6 and 10.13.1 with
the latest MacPorts port gcc7:

program test

implicit none

integer, parameter :: UNIT = 1
character(len=8), parameter :: FILE = "namelist"

complex, dimension(3) :: a = (/ (0.0, 0.0), (0.0, 0.0), (0.0, 0.0) /)

namelist /complex_namelist/ a

open(UNIT, file=FILE)
write(UNIT, nml=complex_namelist)
close(UNIT)

open(UNIT, file=FILE)
read(UNIT, nml=complex_namelist)
close(UNIT)

end program test

It compiles without any warning, but when run, it fails at reading the newly
created namelist:

$ gfortran test.f90 -o test -Wall -Wextra
$ ./test 
At line 17 of file test.f90 (unit = 1, file = 'namelist')
Fortran runtime error: Cannot match namelist object name (0.0.)

Error termination. Backtrace:
#0  0x10320d0dc
#1  0x10320d99c
#2  0x10320dfff
#3  0x10329b03a
#4  0x1032a0b78
#5  0x1032a0d9f
#6  0x103203df8
#7  0x103203e6c
$

The problem is in the content of the namelist file:

$ cat namelist
&COMPLEX_NAMELIST
 A= 3*(0.,0.),
 /
$

where the repeated count (3*) has a gap of blanks to the complex number that
needs to be repeated.

If I have a namelist file without that gap, it can be read in by a program
correctly.

In summary, the reading and writing of a namelist file with a repeated count is
not mutually valid.

[Bug middle-end/65796] unnecessary stack spills during complex numbers function calls

2015-04-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65796

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-04-20
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
There is some older bug where I noticed the same issue.  It's basically an
artifact of the x86 calling conventions and GCC going through generic
argument setup code (and RTL optimizers not being able to optimize
spill + load into unpcklps).

That said, somebody needs to find the duplicate bugreport.

PR48607 is also related.


[Bug middle-end/65796] New: unnecessary stack spills during complex numbers function calls

2015-04-17 Thread jtaylor.debian at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65796

Bug ID: 65796
   Summary: unnecessary stack spills during complex numbers
function calls
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jtaylor.debian at googlemail dot com

following function calling cabsf exhibits poor performance when compiled with
gcc:

#include <complex>
using namespace std;
void __attribute__((noinline)) v(int nCor, complex<float> * inp, complex<float>
* out)
{
    for (int icorr = 0; icorr < nCor; icorr++) {
        float amp = abs(inp[icorr]);
        if (amp > 0.f) {
            out[icorr] = amp * inp[icorr];
        }
        else {
            out[icorr] = 0.;
        }
    }
}

with gcc 4.9 and 5 (20150208) on x86_64 produces:
g++- test.cc -O2  -c -S

.L15:
        movss   4(%rsp), %xmm2
        addq    $8, %rbx
        addq    $8, %rbp
        movss   (%rsp), %xmm1
        mulss   %xmm0, %xmm2
        mulss   %xmm0, %xmm1
        movss   %xmm2, -8(%rbx)
        movss   %xmm1, -4(%rbx)
        cmpq    %r12, %rbx
        je      .L14
.L7:
        movss   0(%rbp), %xmm2
        movss   4(%rbp), %xmm1
        movss   %xmm2, 8(%rsp)
        movss   %xmm1, 12(%rsp)
        movq    8(%rsp), %xmm0
        movss   %xmm2, 4(%rsp)
        movss   %xmm1, (%rsp)
        call    cabsf
        pxor    %xmm3, %xmm3
        ucomiss %xmm3, %xmm0
        ja      .L15


Note the spills of xmm[12] onto the stack and the reload into xmm0.
Instead of spilling to the stack, one could use unpcklps to prepare xmm0.

with a simple benchmark on 5000 floats this would speed up the function by
about 30% on an intel core2 and an i5 which is quite significant given the
expensive cabs call that is also done in it.


[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #15 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Tue Jan 20 11:06:13 2015
New Revision: 219885

URL: https://gcc.gnu.org/viewcvs?rev=219885&root=gcc&view=rev
Log:
2015-01-20  Richard Biener  rguent...@suse.de

PR tree-optimization/64410
* g++.dg/vect/pr64410.cc: Require vect_double.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/g++.dg/vect/pr64410.cc


[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-20 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Rainer Orth ro at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ro at gcc dot gnu.org

--- Comment #13 from Rainer Orth ro at gcc dot gnu.org ---
The new testcase FAILs on Solaris/SPARC (both 32 and 64-bit):

FAIL: g++.dg/vect/pr64410.cc  -std=c++11  scan-tree-dump vect vectorized 1
loops in function
FAIL: g++.dg/vect/pr64410.cc  -std=c++14  scan-tree-dump vect vectorized 1
loops in function
FAIL: g++.dg/vect/pr64410.cc  -std=c++98  scan-tree-dump vect vectorized 1
loops in function

I'm attaching the .vect dump.

  Rainer


[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-20 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #14 from Rainer Orth ro at gcc dot gnu.org ---
Created attachment 34496
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34496&action=edit
sparc-sun-solaris2.11 .vect dump


[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||5.0
 Resolution|--- |FIXED

--- Comment #11 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed in GCC 5.


[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #12 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Fri Jan  9 11:14:55 2015
New Revision: 219380

URL: https://gcc.gnu.org/viewcvs?rev=219380&root=gcc&view=rev
Log:
2015-01-09  Richard Biener  rguent...@suse.de

PR tree-optimization/64410
* tree-ssa.c (non_rewritable_lvalue_p): Allow REALPART/IMAGPART_EXPR
on the LHS.
(execute_update_addresses_taken): Deal with that.
* tree-ssa-forwprop.c (pass_forwprop::execute): Use component-wise
loads/stores for complex variables.

* g++.dg/vect/pr64410.cc: New testcase.

Added:
trunk/gcc/testsuite/g++.dg/vect/pr64410.cc
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-forwprop.c
trunk/gcc/tree-ssa.c


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Marc Glisse from comment #1)
> There are a number of things that make it complicated.
> 1) gcc doesn't like to vectorize when the number of iterations is not known
> at compile time.

Not an issue, we know it here (it's symbolic)

> 2) gcc doesn't vectorize anything already involving complex or vector
> operations.

Indeed - here the issue is that we have C++ 'complex' aggregate
load / store operations:

  _67 = MEM[(const struct complex &)_75];
  __r$_M_value = _67;
...
  _51 = REALPART_EXPR <__r$_M_value>;
  REALPART_EXPR <__r$_M_value> = _104;
...
  IMAGPART_EXPR <__r$_M_value> = _107;
  _108 = __r$_M_value;
  MEM[(struct cx_double *)_72] = _108;

which SRA for some reason didn't decompose as they are not aggregate
(well, they are COMPLEX_TYPE).  They are not in SSA form either because
they are partly written to.  In this case it would have been profitable
to SRA __r$_M_value.  Eventually this should have been complex lowerings
job (but it doesn't try to decompose complex assignments).

> 3) the ABI for complex uses 2 separate double instead of a vector of 2
> double.

I think that's unrelated.

> I believe there are dups at least for 2).


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|NEW |ASSIGNED
 Blocks||53947
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #1)
> > There are a number of things that make it complicated.
> > 1) gcc doesn't like to vectorize when the number of iterations is not known
> > at compile time.
> 
> Not an issue, we know it here (it's symbolic)
> 
> > 2) gcc doesn't vectorize anything already involving complex or vector
> > operations.
> 
> Indeed - here the issue is that we have C++ 'complex' aggregate
> load / store operations:
> 
>   _67 = MEM[(const struct complex &)_75];
>   __r$_M_value = _67;
> ...
>   _51 = REALPART_EXPR <__r$_M_value>;
>   REALPART_EXPR <__r$_M_value> = _104;
> ...
>   IMAGPART_EXPR <__r$_M_value> = _107;
>   _108 = __r$_M_value;
>   MEM[(struct cx_double *)_72] = _108;
> 
> which SRA for some reason didn't decompose as they are not aggregate
> (well, they are COMPLEX_TYPE).  They are not in SSA form either because
> they are partly written to.

And this forces it to be TREE_ADDRESSABLE.  Which means update-address-taken
might be a better candidate to fix this.

Note that it will still run into the issue that the vectorizer does not
like complex types (in loads), nor does it like building complex
registers via COMPLEX_EXPR.  After fixing update-address-taken we have

  __r$_M_value_70 = MEM[(const struct complex &)_78];
  _66 = MEM[(const double &)_77];
  _54 = REALPART_EXPR <__r$_M_value_70>;
  _105 = _54 + _66;
  _135 = IMAGPART_EXPR <__r$_M_value_70>;
  _106 = MEM[(const double &)_77 + 8];
  _107 = _106 + _135;
  __r$_M_value_180 = COMPLEX_EXPR <_105, _107>;
  MEM[(struct cx_double *)_76] = __r$_M_value_180;

which we ideally would have converted to piecewise loading / storing,
but the vectorizer may also be able to recover here with some twists.

> In this case it would have been profitable
> to SRA __r$_M_value.  Eventually this should have been complex lowerings
> job (but it doesn't try to decompose complex assignments).
> 
> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
> 
> I think that's unrelated.
> 
> > I believe there are dups at least for 2).


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 34400
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34400&action=edit
update-address-taken fix


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #8 from Marc Glisse glisse at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #1)
> > There are a number of things that make it complicated.
> > 1) gcc doesn't like to vectorize when the number of iterations is not known
> > at compile time.
> 
> Not an issue, we know it here (it's symbolic)

IIRC I tried modifying the original code by replacing all complex operations by
explicit scalar operations and it failed to vectorize, but worked when
replacing the number of iterations by a constant.

> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
> 
> I think that's unrelated.

Indeed, it's just that with a different ABI we could have been lucky and
stumbled upon the optimal code, almost by accident.


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #10 from Richard Biener rguenth at gcc dot gnu.org ---
Improves runtime from 8.3s to 6.5s (~25%).


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2015-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 34402
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34402&action=edit
patch to pattern-detect the load/store

This pattern matches real/imagpart uses and single-use complex stores and
transforms them to component-wise accesses in forwprop.  Together we vectorize
the loop now and produce:

.L28:
        movupd  (%rbx,%rax), %xmm1
        movupd  (%r15,%rax), %xmm0
        addpd   %xmm1, %xmm0
        movups  %xmm0, 0(%r13,%rax)
        addq    $16, %rax
        cmpq    %rax, %rdx
        jne     .L28

note that we need a runtime alias check to disambiguate things (because
std::vector memory cannot be disambiguated statically) and similarly we
cannot prove sufficient alignment to use aligned loads/stores.


[Bug target/47540] ARM THUMB crash with complex numbers

2015-01-02 Thread joel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47540

Joel Sherrill joel at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Joel Sherrill joel at gcc dot gnu.org ---
Based on Sebastian's last comment, I am marking this as resolved/fixed.


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2014-12-27 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

David Edelsohn dje at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-12-28
 Ever confirmed|0   |1

--- Comment #4 from David Edelsohn dje at gcc dot gnu.org ---
Confirmed.


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2014-12-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #1 from Marc Glisse glisse at gcc dot gnu.org ---
There are a number of things that make it complicated.
1) gcc doesn't like to vectorize when the number of iterations is not known at
compile time.
2) gcc doesn't vectorize anything already involving complex or vector
operations.
3) the ABI for complex uses 2 separate double instead of a vector of 2 double.

I believe there are dups at least for 2).


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2014-12-26 Thread conradsand.arma at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #2 from Conrad conradsand.arma at gmail dot com ---
(In reply to Marc Glisse from comment #1)
> 3) the ABI for complex uses 2 separate double instead of a vector of 2
> double.

Technically yes, but in practice aren't the 2 separate doubles guaranteed to be
consecutive in memory?
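
On the in-memory layout question: the C++ standard (since C++11) guarantees array-oriented access for std::complex, so an array of `std::complex<double>` can be read and written as a flat array of doubles. The sketch below relies on that guarantee; `add_as_doubles` is a hypothetical helper written for illustration and is not taken from the bug report or its attachment.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; assumes a.size() == b.size().
// Adds two complex arrays by viewing each as a flat array of doubles
// (real part at index 2*i, imaginary part at index 2*i + 1).
std::vector<std::complex<double> >
add_as_doubles(const std::vector<std::complex<double> >& a,
               const std::vector<std::complex<double> >& b)
{
  std::vector<std::complex<double> > out(a.size());
  const double* pa = reinterpret_cast<const double*>(a.data());
  const double* pb = reinterpret_cast<const double*>(b.data());
  double* po = reinterpret_cast<double*>(out.data());
  for (std::size_t i = 0; i < 2 * a.size(); ++i)
    po[i] = pa[i] + pb[i];
  return out;
}
```

Written this way, the loop body is a plain addition over doubles, which is the shape a packed instruction such as addpd operates on.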


[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers

2014-12-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #3 from Marc Glisse glisse at gcc dot gnu.org ---
(In reply to Conrad from comment #2)
> (In reply to Marc Glisse from comment #1)
> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
> 
> Technically yes, but in practice aren't the 2 separate doubles guaranteed to
> be consecutive in memory?

When the complex is in memory, yes. But passing a complex by value to a
function is done with 2 separate registers. And somehow that means the default
expansion for complex addition is 2 addsd, whereas the default expansion for
vector addition is addpd. Using addpd by default for complex would make some
code better (this example would hopefully be optimal without need for any
optimization) and some worse, I don't know if there are good benchmarks for
complex numbers.

Clang's use of add[ps]d seems based entirely on what is done with the result,
as can be seen on:

typedef _Complex double cd;
void f(cd&r,cd x,cd y){
  r=x+y;
}
cd f(cd&x,cd&y,cd&z){
  return x+y+z;
}

(I agree that gcc should be improved, I am not trying to defend the current
code generation. And now I'll shut up and let people who actually know the code
speak ;-)


[Bug c++/64410] New: gcc 25% slower than clang 3.5 for adding complex numbers

2014-12-25 Thread conradsand.arma at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Bug ID: 64410
   Summary: gcc 25% slower than clang 3.5 for adding complex
numbers
   Product: gcc
   Version: 4.9.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: conradsand.arma at gmail dot com

Created attachment 34336
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34336&action=edit
cxaddspeed.cpp

gcc 4.9.2 has worse performance than clang 3.5 when dealing with complex
numbers.

Attached is a simple program which adds two vectors with complex numbers. 
Compiled with -O3 on x86-64 (i7), Fedora 21, gcc 4.9.2 and clang 3.5.0.

$ time ./cxaddspeed_gcc 5000 100
5.364u 0.002s 0:05.36 100.0%

$ time ./cxaddspeed_clang 5000 100
4.417u 0.001s 0:04.41 100.0%

i.e. gcc is about 25% slower.
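
The attachment cxaddspeed.cpp is not reproduced in this archive. As a rough sketch of the kind of loop being timed (element-wise addition of two vectors of complex doubles), something like the following could be assumed; `add_vectors` is a hypothetical name, and the real program may differ in how it allocates, repeats and times the work.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cx_double;

// Element-wise sum of two complex vectors into out.
// Hypothetical sketch; assumes all three vectors have the same length.
void add_vectors(const std::vector<cx_double>& a,
                 const std::vector<cx_double>& b,
                 std::vector<cx_double>& out)
{
  const std::size_t n = out.size();
  for (std::size_t i = 0; i < n; ++i)
    out[i] = a[i] + b[i];
}
```

Each iteration adds one real part and one imaginary part, which matches the pair of addsd instructions per element in the gcc listing and the single addpd per element in the clang listing.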


inner loop produced by gcc:
.L52:
        movsd   (%r15,%rax), %xmm1
        movsd   8(%r15,%rax), %xmm0
        addsd   0(%rbp,%rax), %xmm1
        addsd   8(%rbp,%rax), %xmm0
        movsd   %xmm1, (%rbx,%rax)
        movsd   %xmm0, 8(%rbx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L52

inner loop produced by clang:
.LBB0_145:
        movupd  -16(%rbx), %xmm0
        movupd  -16(%rax), %xmm1
        addpd   %xmm0, %xmm1
        movupd  %xmm1, -16(%rdi)
        movupd  (%rbx), %xmm0
        movupd  (%rax), %xmm1
        addpd   %xmm0, %xmm1
        movupd  %xmm1, (%rdi)
        addq    $2, %rbp
        addq    $32, %rbx
        addq    $32, %rax
        addq    $32, %rdi
        addl    $-2, %ecx
        jne     .LBB0_145


[Bug target/47540] ARM THUMB crash with complex numbers

2013-07-10 Thread sebastian.hu...@embedded-brains.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47540

Sebastian Huber sebastian.hu...@embedded-brains.de changed:

   What|Removed |Added

 CC||sebastian.huber@embedded-br
   ||ains.de

--- Comment #9 from Sebastian Huber sebastian.hu...@embedded-brains.de ---
I cannot reproduce this problem with GCC 4.6.4 and 4.8.1.

