[Bug web/112960] omission in documentation: complex numbers can also have uppercase I and J suffixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112960 --- Comment #1 from Stephan Stiller --- name of document (forgotten in original bug report text): "GNU C Language Introduction and Reference Manual"
[Bug web/112960] New: omission in documentation: complex numbers can also have uppercase I and J suffixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112960

Bug ID: 112960
Summary: omission in documentation: complex numbers can also have uppercase I and J suffixes
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: web
Assignee: unassigned at gcc dot gnu.org
Reporter: stephan.stiller at outlook dot com
Target Milestone: ---

Created attachment 56849 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56849=edit
example for the 4 different complex number suffixes denoting the imaginary part

This page https://gcc.gnu.org/onlinedocs/gcc/Complex.html doesn't mention uppercase 'I' and 'J' as legal suffixes denoting the imaginary part of a complex number, both equivalent to 'i' and 'j'.

These suffixes are described in Richard Stallman's document "":
https://www.gnu.org/software/c-intro-and-ref/
https://www.gnu.org/software/c-intro-and-ref/manual/
(Edition 0.0, section 12.4 Imaginary Constants)
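The report's own attachment is not reproduced in this archive. As a stand-in, the following is a minimal sketch of what such a test might look like, assuming a compiler in GNU C mode (imaginary constants are a GNU extension, and -pedantic will warn about them). The function name imaginary_suffixes_agree is made up for illustration.

```c
#include <assert.h>

/* Returns 1 when the four GNU imaginary-constant suffixes (i, j, I, J)
   all produce the same value.  <complex.h> is deliberately not included:
   "3.0I" is a single token with a suffix, not a use of the macro I.  */
int imaginary_suffixes_agree(void)
{
    double _Complex lower_i = 2.0 + 3.0i;
    double _Complex lower_j = 2.0 + 3.0j;
    double _Complex upper_i = 2.0 + 3.0I;
    double _Complex upper_j = 2.0 + 3.0J;

    /* __real__ and __imag__ are GNU extensions as well. */
    assert(__real__ lower_i == 2.0);
    return __imag__ lower_i == 3.0
        && lower_i == lower_j
        && lower_i == upper_i
        && lower_i == upper_j;
}
```

Calling imaginary_suffixes_agree() from a test harness is enough to confirm that a given compiler treats the uppercase spellings like the lowercase ones.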
[Bug tree-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #29 from Andrew Pinski --- https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630011.html
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #12 from rsandifo at gcc dot gnu.org --- The patch in comment 11 is just a related spot improvement. The PR itself is still unfixed.
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #11 from CVS Commits ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:b096a6ebe9d9f9fed4c105f6555f724eb32af95c

commit r14-1131-gb096a6ebe9d9f9fed4c105f6555f724eb32af95c
Author: Richard Sandiford
Date: Tue May 23 11:34:42 2023 +0100

    aarch64: Provide FPR alternatives for some bit insertions [PR109632]

    At -O2, and so with SLP vectorisation enabled:

        struct complx_t { float re, im; };
        complx_t add(complx_t a, complx_t b) {
          return {a.re + b.re, a.im + b.im};
        }

    generates:

        fmov    w3, s1
        fmov    x0, d0
        fmov    x1, d2
        fmov    w2, s3
        bfi     x0, x3, 32, 32
        fmov    d31, x0
        bfi     x1, x2, 32, 32
        fmov    d30, x1
        fadd    v31.2s, v31.2s, v30.2s
        fmov    x1, d31
        lsr     x0, x1, 32
        fmov    s1, w0
        lsr     w0, w1, 0
        fmov    s0, w0
        ret

    This is because complx_t is passed and returned in FPRs, but GCC gives it DImode. We therefore “need” to assemble a DImode pseudo from the two individual floats, bitcast it to a vector, do the arithmetic, bitcast it back to a DImode pseudo, then extract the individual floats.

    There are many problems here. The most basic is that we shouldn't use SLP for such a trivial example. But SLP should in principle be beneficial for more complicated examples, so preventing SLP for the example above just changes the reproducer needed.

    A more fundamental problem is that it doesn't make sense to use single DImode pseudos in a testcase like this. I have a WIP patch to allow re and im to be stored in individual SFmode pseudos instead, but it's quite an invasive change and might end up going nowhere.

    A simpler problem to tackle is that we allow DImode pseudos to be stored in FPRs, but we don't provide any patterns for inserting values into them, even though INS makes that easy for element-like insertions. This patch adds some patterns for that.

    Doing that showed that aarch64_modes_tieable_p was too strict: it didn't allow SFmode and DImode values to be tied, even though both of them occupy a single GPR and FPR, and even though we allow both classes to change between the modes.

    The *aarch64_bfidi_subreg_ pattern is especially ugly, but it's not clear what target-independent code ought to simplify it to, if it was going to simplify it.

    We should probably do the same thing for extractions, but that's left as future work.

    After the patch we generate:

        ins     v0.s[1], v1.s[0]
        ins     v2.s[1], v3.s[0]
        fadd    v0.2s, v0.2s, v2.2s
        fmov    x0, d0
        ushr    d1, d0, 32
        lsr     w0, w0, 0
        fmov    s0, w0
        ret

    which seems like a step in the right direction.

    All in all, there's nothing elegant about this patch. It just seems like the least worst option.

    gcc/
        PR target/109632
        * config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow subregs between any scalars that are 64 bits or smaller.
        * config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
        (bits_etype): New int attribute.
        * config/aarch64/aarch64.md (*insv_reg_)
        (*aarch64_bfi_): New patterns.
        (*aarch64_bfidi_subreg_): Likewise.

    gcc/testsuite/
        * gcc.target/aarch64/ins_bitfield_1.c: New test.
        * gcc.target/aarch64/ins_bitfield_2.c: Likewise.
        * gcc.target/aarch64/ins_bitfield_3.c: Likewise.
        * gcc.target/aarch64/ins_bitfield_4.c: Likewise.
        * gcc.target/aarch64/ins_bitfield_5.c: Likewise.
        * gcc.target/aarch64/ins_bitfield_6.c: Likewise.
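For readers unfamiliar with the bit-field insertions discussed in the commit message, here is a rough C model of what "bfi x0, x3, 32, 32" does when the two floats are assembled into one 64-bit value. It is only a sketch: bfi64, float_bits and pack_complx are hypothetical helper names, not GCC internals, and the layout (re in the low half, im in the high half) assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of AArch64 "bfi xd, xn, #lsb, #width": copy the low `width` bits
   of src into dst starting at bit `lsb`; all other bits of dst are kept.  */
uint64_t bfi64(uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 64);
    uint64_t field = (width == 64) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1);
    return (dst & ~(field << lsb)) | ((src & field) << lsb);
}

/* Reinterpret the bits of a float as an integer (what fmov wN, sM does). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Build the 64-bit image of {re, im}: re in bits 0-31, im in bits 32-63. */
uint64_t pack_complx(float re, float im)
{
    uint64_t x = float_bits(re);
    return bfi64(x, float_bits(im), 32, 32);
}
```

For example, pack_complx(1.0f, 2.0f) yields 0x400000003f800000, since 1.0f is 0x3f800000 and 2.0f is 0x40000000. The point made in the commit message is that all of this shuffling through general registers is avoidable.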
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #10 from rsandifo at gcc dot gnu.org --- After prototyping this further, I no longer think that lowering at the gimple level is the best answer. (I should have listened to Richi.) Although it works, its major drawback is that it's one-sided: it allows the current function's PARM_DECLs and returns to be lowered to individual scalars, but it does nothing for calls to other functions. Being one-sided means (a) that lowering only solves half the problem and (b) that tail calls cannot be handled easily after lowering. One thing that does seem to work is to force the structure to have V2SF (and fix the inevitable ABI fallout). That could only be done conditionally, based on a target hook. But it seems to fix both test cases: the pass-by-reference one and the pass-by-value one.
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #9 from Tamar Christina --- Thank you!
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

rsandifo at gcc dot gnu.org changed:

   What             |Removed                       |Added
   Status           |UNCONFIRMED                   |ASSIGNED
   Last reconfirmed |                              |2023-04-27
   Ever confirmed   |0                             |1
   Assignee         |unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #8 from rsandifo at gcc dot gnu.org ---
Have a (still hacky) patch that also fixes the example in comment 4, giving:

        fadd    s1, s1, s3
        fadd    s0, s0, s2
        ret

Will work on it a bit more before sending an RFC. Can imagine the approach will be somewhat controversial!
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #7 from rsandifo at gcc dot gnu.org --- Thinking more about it, it would probably be better to defer the split until around lower_complex time, after IPA (especially inlining), NRV and tail-recursion. Doing it there should also make it easier to split arguments. (In reply to Tamar Christina from comment #6) > That's an interesting approach, I think it would also fix > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 would it not? Since the > int16x8x3_t return would be "scalarized" avoiding the bad expansion? I don't think it will help with that, since the returned value there is a natural V3x8HI (rather than something that the ABI splits apart). Splitting in that case might pessimise cases where the return value is loaded as a whole, rather than assigned to individually. But it might be worth giving SRA the option of splitting even in that case, as a follow-on optimisation, if it fits naturally with the definitions.
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #6 from Tamar Christina --- That's an interesting approach, I think it would also fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 would it not? Since the int16x8x3_t return would be "scalarized" avoiding the bad expansion?
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

--- Comment #5 from rsandifo at gcc dot gnu.org ---
Created attachment 54941 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54941=edit
hacky proof-of-concept patch

This is a very hacky proof of concept patch. Don't try it on anything serious, and certainly don't try to bootstrap with it -- it'll fall over in the slightest breeze. But it does produce:

        ldp     s3, s2, [x0]
        ldp     s0, s1, [x1]
        fadd    s1, s2, s1
        fadd    s0, s3, s0
        ret

for the original testcase.
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

rsandifo at gcc dot gnu.org changed:

   What |Removed |Added
   CC   |        |rsandifo at gcc dot gnu.org

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Maybe worth noting that if the complex arguments are passed by value, to give:

    struct complx_t { float re; float im; };
    complx_t add(const complx_t a, const complx_t b) {
      return {a.re + b.re, a.im + b.im};
    }

and SLP is disabled, we get:

        fmov    w4, s1
        fmov    w3, s3
        fmov    x0, d0
        fmov    x1, d2
        mov     x2, 0
        bfi     x0, x4, 32, 32
        bfi     x1, x3, 32, 32
        fmov    d0, x0
        fmov    d1, x1
        sbfx    x3, x0, 0, 32
        sbfx    x0, x1, 0, 32
        ushr    d1, d1, 32
        fmov    d3, x0
        fmov    d2, x3
        ushr    d0, d0, 32
        fadd    s2, s2, s3
        fadd    s0, s0, s1
        fmov    w1, s2
        fmov    w0, s0
        bfi     x2, x1, 0, 32
        bfi     x2, x0, 32, 32
        lsr     x0, x2, 32
        lsr     w2, w2, 0
        fmov    s1, w0
        fmov    s0, w2
        ret

which is almost impressive, in its way.

I think we need a way in gimple of “SRA-ing” the arguments and return value, in cases where that's forced by the ABI. I.e. provide separate incoming values of a.re and a.im, and store them to “a” on entry. Then similarly make the return stmt return RETURN_DECL.re and RETURN_DECL.im separately.
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #3 from Tamar Christina --- Note that even if we can't stop SLP, we should be able to generate equally efficient code by being creative about the instruction selection; that's why I marked it as a target bug :)
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #2 from Tamar Christina --- (In reply to Richard Biener from comment #1) > Well, the usual unknown ABI boundary at function entry/exit. Yes, but LLVM gets it right, so it should be a solvable computer science problem. :) Note that this was reduced from a bigger routine, but the end result is the same: the thing shouldn't have been vectorized.
[Bug target/109632] Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632 --- Comment #1 from Richard Biener --- Well, the usual unknown ABI boundary at function entry/exit.
[Bug target/109632] New: Inefficient codegen when complex numbers are emulated with structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632

Bug ID: 109632
Summary: Inefficient codegen when complex numbers are emulated with structs
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*

The following two cases are the same:

    struct complx_t {
      float re;
      float im;
    };

    complx_t add(const complx_t &a, const complx_t &b) {
      return {a.re + b.re, a.im + b.im};
    }

    _Complex float add(const _Complex float *a, const _Complex float *b) {
      return {__real__ *a + __real__ *b, __imag__ *a + __imag__ *b};
    }

But we generate much different code (looking at -O2). For the first one we do:

        ldr     d1, [x1]
        ldr     d0, [x0]
        fadd    v0.2s, v0.2s, v1.2s
        fmov    x0, d0
        lsr     x1, x0, 32
        lsr     w0, w0, 0
        fmov    s1, w1
        fmov    s0, w0
        ret

which is bad for obvious reasons, but it also never needed to go through the genreg for such a reversal. We could have used many other NEON instructions.

For the second one we generate the good instructions:

    add(float _Complex const*, float _Complex const*):
        ldp     s3, s2, [x0]
        ldp     s0, s1, [x1]
        fadd    s1, s2, s1
        fadd    s0, s3, s0
        ret

The difference being that in the second one we have decomposed the initial structure by loading the elements:

    [local count: 1073741824]:
    _1 = REALPART_EXPR <*a_8(D)>;
    _2 = REALPART_EXPR <*b_9(D)>;
    _3 = _1 + _2;
    _4 = IMAGPART_EXPR <*a_8(D)>;
    _5 = IMAGPART_EXPR <*b_9(D)>;
    _6 = _4 + _5;
    _10 = COMPLEX_EXPR <_3, _6>;
    return _10;

In the first one we've kept them as vectors:

    [local count: 1073741824]:
    vect__1.6_13 = MEM [(float *)a_8(D)];
    vect__2.9_15 = MEM [(float *)b_9(D)];
    vect__3.10_16 = vect__1.6_13 + vect__2.9_15;
    MEM [(float *)] = vect__3.10_16;
    return D.4435;

This part is probably a costing issue: we SLP them even though it's not profitable, because for the APCS we have to return them in separate registers.

Using -fno-tree-vectorize gets the gimple code right:

    [local count: 1073741824]:
    _1 = a_8(D)->re;
    _2 = b_9(D)->re;
    _3 = _1 + _2;
    D.4435.re = _3;
    _4 = a_8(D)->im;
    _5 = b_9(D)->im;
    _6 = _4 + _5;
    D.4435.im = _6;
    return D.4435;

But we generate worse code:

        ldp     s1, s0, [x0]
        mov     x2, 0
        ldp     s3, s2, [x1]
        fadd    s1, s1, s3
        fadd    s0, s0, s2
        fmov    w1, s1
        fmov    w0, s0
        bfi     x2, x1, 0, 32
        bfi     x2, x0, 32, 32
        lsr     x0, x2, 32
        lsr     w2, w2, 0
        fmov    s1, w0
        fmov    s0, w2

where we again use genreg as a very complicated way to do a no-op.

So there are two bugs here:

1. a costing issue: we shouldn't SLP
2. an expansion issue: the code out of expand is bad to begin with.
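The testcase in the report is C++. For anyone who wants to compare the two forms with a C compiler, the following is one possible C translation, offered as a sketch rather than the reporter's exact input: it assumes the struct version takes its arguments by address, and the names add_struct, add_complex and versions_agree are made up here. Compiling it with -O2 -S for an aarch64 target should allow a similar side-by-side comparison of the generated code, though the output is not guaranteed to match the listings above.

```c
#include <assert.h>

struct complx_t {
    float re;
    float im;
};

/* Complex number emulated with a struct, arguments passed by address. */
struct complx_t add_struct(const struct complx_t *a, const struct complx_t *b)
{
    struct complx_t r = { a->re + b->re, a->im + b->im };
    return r;
}

/* Native complex type; __real__ and __imag__ are GNU extensions. */
_Complex float add_complex(const _Complex float *a, const _Complex float *b)
{
    _Complex float r;
    __real__ r = __real__ *a + __real__ *b;
    __imag__ r = __imag__ *a + __imag__ *b;
    return r;
}

/* Returns 1 when both versions compute the same sum for a sample input. */
int versions_agree(void)
{
    struct complx_t sa = { 1.0f, 2.0f }, sb = { 3.0f, 4.0f };
    _Complex float ca, cb;
    __real__ ca = 1.0f; __imag__ ca = 2.0f;
    __real__ cb = 3.0f; __imag__ cb = 4.0f;

    struct complx_t sr = add_struct(&sa, &sb);
    _Complex float cr = add_complex(&ca, &cb);

    assert(sr.re == 4.0f && sr.im == 6.0f);
    return sr.re == __real__ cr && sr.im == __imag__ cr;
}
```

The two functions are semantically equivalent, which is what makes the difference in generated code a missed optimization rather than a correctness question.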
[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Andrew Pinski changed:

   What             |Removed |Added
   Target Milestone |---     |5.0
[Bug tree-optimization/105451] miss optimizations due to inconsistency in complex numbers associativity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105451

Richard Biener changed:

   What             |Removed     |Added
   Status           |UNCONFIRMED |NEW
   Ever confirmed   |0           |1
   Last reconfirmed |            |2022-05-02

--- Comment #1 from Richard Biener ---
It's difficult to even spot the difference in the SLP tree, but the associatable multiplications make it somewhat random which of the valid SLP tree representations we'll pick. All of the SLP discovery is a heuristically greedy search for the _first_ match, not for the "best" even ("best" interpreted as "largest" in case of BB vectorization). The only thing that reliably happens at the moment is associating until the leafs are all from one load group (until we support multiple load groups in one leaf).
[Bug tree-optimization/105451] New: miss optimizations due to inconsistency in complex numbers associativity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105451

Bug ID: 105451
Summary: miss optimizations due to inconsistency in complex numbers associativity
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: tnfchris at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---

The two functions

    #include <complex.h>

    void f (complex double * restrict a, complex double *restrict b,
            complex double *restrict c, complex double *res, int n)
    {
      for (int i = 0; i < n; i++)
        res[i] = a[i] * (b[i] * c[i]);
    }

and

    void g (complex double * restrict a, complex double *restrict b,
            complex double *restrict c, complex double *res, int n)
    {
      for (int i = 0; i < n; i++)
        res[i] = (a[i] * b[i]) * c[i];
    }

produce the same code at -Ofast, but internally they get there using different SLP trees.

The former creates a chain of VEC_PERM_EXPR nodes, as is expected (tinyurl.com/cmulslp1); however, the latter avoids the need for the permutes by duplicating the elements of the complex number (https://tinyurl.com/cmulslp2).

The former we can detect as back-to-back complex multiplication, but the latter we can't.

Not sure what the best way to get consistency here is.
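To make the two association orders concrete, here is a small self-contained C sketch that runs both loops on sample data and compares the results. The inputs are small integers chosen so that both orders are exact; with general floating-point inputs the two orders can differ in rounding unless reassociation is permitted, which -Ofast does permit. The helper name f_and_g_agree is hypothetical.

```c
#include <assert.h>
#include <complex.h>

/* res[i] = a[i] * (b[i] * c[i]) */
void f(complex double *restrict a, complex double *restrict b,
       complex double *restrict c, complex double *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = a[i] * (b[i] * c[i]);
}

/* res[i] = (a[i] * b[i]) * c[i] */
void g(complex double *restrict a, complex double *restrict b,
       complex double *restrict c, complex double *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = (a[i] * b[i]) * c[i];
}

/* Returns 1 when f and g agree on a small exact sample. */
int f_and_g_agree(void)
{
    complex double a[2] = { 1.0 + 2.0 * I, 3.0 - 1.0 * I };
    complex double b[2] = { 3.0 - 1.0 * I, 2.0 + 2.0 * I };
    complex double c[2] = { 2.0 + 1.0 * I, 1.0 - 4.0 * I };
    complex double rf[2], rg[2];

    f(a, b, c, rf, 2);
    g(a, b, c, rg, 2);

    /* (1+2i)(3-i)(2+i) = 5 + 15i, whichever product is formed first. */
    assert(creal(rf[0]) == 5.0 && cimag(rf[0]) == 15.0);
    return rf[0] == rg[0] && rf[1] == rg[1];
}
```

This only checks the arithmetic result; the difference the report is about lives in the vectorizer's internal SLP trees, which can be inspected with GCC's dump options rather than from C.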
[Bug tree-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Richard Biener changed:

   What |Removed |Added
   CC   |        |tnfchris at gcc dot gnu.org

--- Comment #28 from Richard Biener ---
*** Bug 104406 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Andrew Pinski changed:

   What      |Removed          |Added
   Component |rtl-optimization |tree-optimization

--- Comment #27 from Andrew Pinski ---
And there are two issues here, one is related to SLP not happening and the other deals with the argument and return value passing.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #26 from Andrew Pinski --- Note there might be a dup of this bug somewhere too.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #25 from Joel Yliluoma --- (In reply to Jakub Jelinek from comment #24) > on x86 read e.g. about MXCSR register and in the description of each > instruction on which Exceptions it can raise. So the quick answer to #15 is that addps instruction may raise exceptions. Ok, thanks for clearing that up. My bad. So it seems that LLVM relies on the assumption that the upper portions of the register are zeroed, and this is what you said in the first place.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #24 from Jakub Jelinek --- Bugzilla is not the right place to educate users. Of course the C FE_* exceptions map to real hardware exceptions, on x86 read e.g. about MXCSR register and in the description of each instruction on which Exceptions it can raise.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #23 from Joel Yliluoma --- (In reply to Jakub Jelinek from comment #21) > (In reply to Joel Yliluoma from comment #20) > > Which exceptions would be generated by data in an unused portion of a > > register? > > addps adds 4 float elements, there is no "unused" portion. > If some of the elements contain garbage, it can trigger for e.g. the addition > FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW or FE_INEXACT (FE_DIVBYZERO obviously > isn't relevant to addition). > Please read the standard about floating point exceptions, fenv.h etc. There is “unused” portion, for the purposes of the data use. Same as with padding in structs; the memory is unused because no part in program relies on its contents, even though the CPU may load those portions in registers when e.g. moving and copying the struct. The CPU won’t know whether it’s used or not. You mention FE_INVALID etc., but those are concepts within the C standard library, not in the hardware. The C standard library will not make judgments on the upper portions of the register. So if you have two float[2]s, and you add them together into another float[2], and the compiler uses addps to achieve this task, what is the mechanism that would supposedly generate an exception, when no part in the software depends and makes judgments on the irrelevant parts of the register?
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #22 from rguenther at suse dot de --- On Tue, 21 Apr 2020, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 > > Jakub Jelinek changed: > >What|Removed |Added > > CC||hjl.tools at gmail dot com, >||hubicka at gcc dot gnu.org, >||matz at gcc dot gnu.org > > --- Comment #19 from Jakub Jelinek --- > CCing Micha and Honza on the ABI question. The arguments are class SSE (__m64), but I fail to find clarification as to whether "unused" parts of argument registers (the SSEUP part of the %xmmN register) is supposed to be zeroed or has unspecified contents.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #21 from Jakub Jelinek --- (In reply to Joel Yliluoma from comment #20) > Which exceptions would be generated by data in an unused portion of a > register? addps adds 4 float elements, there is no "unused" portion. If some of the elements contain garbage, it can trigger for e.g. the addition FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW or FE_INEXACT (FE_DIVBYZERO obviously isn't relevant to addition). Please read the standard about floating point exceptions, fenv.h etc.
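The point about "unused" lanes can be checked on real hardware. The following is a rough sketch, assuming an x86 target with SSE: it places large values in the two upper lanes, performs a four-lane addition with the _mm_add_ps intrinsic (which maps to addps), and then inspects the sticky overflow flag (bit 3, OE) in MXCSR. The function name upper_lanes_raise_overflow is invented for this example; on non-SSE targets it simply returns -1.

```c
#include <assert.h>
#include <float.h>

#if defined(__SSE__)
#include <xmmintrin.h>

/* Returns 1 if a 4-lane add whose upper two lanes hold "garbage" sets the
   overflow flag in MXCSR, 0 if it does not.  */
int upper_lanes_raise_overflow(void)
{
    /* volatile keeps the compiler from folding the addition away. */
    volatile float lanes_a[4] = { 1.0f, 2.0f, FLT_MAX, FLT_MAX };
    volatile float lanes_b[4] = { 3.0f, 4.0f, FLT_MAX, FLT_MAX };
    volatile __m128 sum;
    float out[4];

    _mm_setcsr(_mm_getcsr() & ~0x3Fu);  /* clear the six sticky flags */
    __m128 a = _mm_set_ps(lanes_a[3], lanes_a[2], lanes_a[1], lanes_a[0]);
    __m128 b = _mm_set_ps(lanes_b[3], lanes_b[2], lanes_b[1], lanes_b[0]);
    sum = _mm_add_ps(a, b);             /* all four lanes are added */
    unsigned int flags = _mm_getcsr();

    _mm_storeu_ps(out, sum);
    assert(out[0] == 4.0f && out[1] == 6.0f);  /* the "used" lanes are fine */
    return (flags & 0x08u) != 0;        /* bit 3 of MXCSR is OE (overflow) */
}
#else
/* Not an SSE target: nothing to demonstrate. */
int upper_lanes_raise_overflow(void) { return -1; }
#endif
```

With exceptions masked (the default) the flag is merely recorded, but a program that later calls fetestexcept, or that runs with the overflow exception unmasked, would observe an event caused purely by the lanes it never looked at.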
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #20 from Joel Yliluoma --- (In reply to Jakub Jelinek from comment #16) > (In reply to Joel Yliluoma from comment #15) > > (In reply to Richard Biener from comment #14) > > > I also think llvms code generation is bogus since it appears the ABI > > > does not guarantee zeroed upper elements of the xmm0 argument > > > which means they could contain sNaNs: > > > > Why would it matter that the unused portions of the register contain NaNs? > > Because it could then raise exceptions that shouldn't be raised? Which exceptions would be generated by data in an unused portion of a register? Does for example “addps” generate an exception if one or two of the operands contains NaNs? Which instructions would generate exceptions? I can only think of divps, when dividing by a zero, but it does not seem that even LLVM compiles the two-element vector division into divps. If the register is passed as a parameter to a library function, they would not make judgments based on the values of the unused portions of the registers.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Jakub Jelinek changed:

   What |Removed |Added
   CC   |        |hjl.tools at gmail dot com,
        |        |hubicka at gcc dot gnu.org,
        |        |matz at gcc dot gnu.org

--- Comment #19 from Jakub Jelinek ---
CCing Micha and Honza on the ABI question.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #18 from Jakub Jelinek --- Note, we could do movq %xmm0, %xmm0; movq %xmm1, %xmm1; addpd %xmm1, %xmm0 for the #c4 first function.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #17 from rguenther at suse dot de --- On Tue, 21 Apr 2020, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 > > Jakub Jelinek changed: > >What|Removed |Added > > CC||jakub at gcc dot gnu.org > > --- Comment #16 from Jakub Jelinek --- > (In reply to Joel Yliluoma from comment #15) > > (In reply to Richard Biener from comment #14) > > > I also think llvms code generation is bogus since it appears the ABI > > > does not guarantee zeroed upper elements of the xmm0 argument > > > which means they could contain sNaNs: > > > > Why would it matter that the unused portions of the register contain NaNs? > > Because it could then raise exceptions that shouldn't be raised? Note it might be llvm actually zeros the upper half at the caller (in disagreement with GCC). Maybe also the psABI specifies that should happen and GCC is wrong. Just at the moment interoperating GCC and LLVM is prone to the above mentioned issue.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #16 from Jakub Jelinek --- (In reply to Joel Yliluoma from comment #15) > (In reply to Richard Biener from comment #14) > > I also think llvms code generation is bogus since it appears the ABI > > does not guarantee zeroed upper elements of the xmm0 argument > > which means they could contain sNaNs: > > Why would it matter that the unused portions of the register contain NaNs? Because it could then raise exceptions that shouldn't be raised?
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #15 from Joel Yliluoma --- (In reply to Richard Biener from comment #14) > I also think llvms code generation is bogus since it appears the ABI > does not guarantee zeroed upper elements of the xmm0 argument > which means they could contain sNaNs: Why would it matter that the unused portions of the register contain NaNs?
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #14 from Richard Biener ---
(In reply to Joel Yliluoma from comment #13)
> GCC 4.1.2 is indicated in the bug report headers.
> Luckily, Compiler Explorer has a copy of that exact version, and it indeed
> vectorizes the second function: https://godbolt.org/z/DC_SSb
>
> On my own system, the earliest I have is 4.6. The Compiler Explorer has 4.4,
> and it, or anything newer than that, no longer vectorizes either function.

Ah, OK - that's before GCC learned vectorization and is code-generated by RTL expanding

    return {BIT_FIELD_REF + BIT_FIELD_REF };

so the only vector support was GCCs generic vectors (and intrinsics). The generated code is far from perfect though.

I also think llvms code generation is bogus since it appears the ABI does not guarantee zeroed upper elements of the xmm0 argument which means they could contain sNaNs:

    typedef float ss2 __attribute__((vector_size(8)));
    typedef float ss4 __attribute__((vector_size(16)));
    ss2 add2(ss2 a, ss2 b);
    void bar(ss4 a)
    {
      volatile ss2 x;
      x = add2 ((ss2){a[0], a[1]}, (ss2){a[0], a[1]});
    }

produces

    bar:
    .LFB1:
            .cfi_startproc
            subq    $56, %rsp
            .cfi_def_cfa_offset 64
            movdqa  %xmm0, %xmm1
            call    add2
            movq    %xmm0, 24(%rsp)
            addq    $56, %rsp

which means we pass through 'a' unchanged.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485 --- Comment #13 from Joel Yliluoma --- GCC 4.1.2 is indicated in the bug report headers. Luckily, Compiler Explorer has a copy of that exact version, and it indeed vectorizes the second function: https://godbolt.org/z/DC_SSb On my own system, the earliest I have is 4.6. The Compiler Explorer has 4.4, and it, or anything newer than that, no longer vectorizes either function.
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

Richard Biener changed:

   What   |Removed |Added
   Blocks |        |53947
   CC     |        |uros at gcc dot gnu.org

--- Comment #12 from Richard Biener ---
(In reply to Joel Yliluoma from comment #11)
> Looks like this issue has taken a step or two *backwards* in the past years.
>
> Whereas the second function used to be vectorized properly, today it seems
> neither of them are.

Which version do you see vectorizing the second (add2) function?

> Contrast this with Clang, which compiles *both* functions into a single
> instruction:
>
> vaddps xmm0, xmm1, xmm0
>
> or some variant thereof depending on the -m options.
>
> Compiler Explorer link: https://godbolt.org/z/2AKhnt

The main issues on the GCC side are

a) ABI details not exposed at the point of vectorization (several PRs about this exist)
b) "Poor" support for two-element float vectors (an understatement, we have some support for MMX but that's integer only, but I'm not sure we've enabled the 3dnow part to be emulated with SSE)

Oddly enough, even with -mmmx -m3dnow I see add2 lowered by veclower, so the vector type or the vector add must be unsupported(?).

llvm is known to support emulating smaller vectors just fine (and by design is also aware of ABI details).

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485

--- Comment #11 from Joel Yliluoma ---
Looks like this issue has taken a step or two *backwards* in the past years.

Whereas the second function used to be vectorized properly, today it seems neither of them are.

Contrast this with Clang, which compiles *both* functions into a single instruction:

    vaddps xmm0, xmm1, xmm0

or some variant thereof depending on the -m options.

Compiler Explorer link: https://godbolt.org/z/2AKhnt
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #28 from Jonathan Wakely --- (In reply to kargl from comment #21) > Created attachment 46102 [details] > fix g++ problem with sqrt(z) where z is complex and imag(z) = -0 This one assumes copysign is valid for arguments of type _Tp, which is only true for float, double and long double. std::pow might make sense for complex integers, but seems to be already broken, so this doesn't make it any worse there. To preserve support for user-defined numeric types (and decimal floats?) we could add an overloaded helper for copysign which retains the `__x < _Tp()` check for the generic overload and uses copysign for floating point types. (In reply to kargl from comment #23) > Created attachment 46105 [details] > fix g++ problem with pow(z,0.5) where imag(z) = -0. > > This patch has only been tested with the original test provided by the > reporter. This one makes me a little uncomfortable with the use of abs, but we already use that elsewhere (and inconsistently qualify it as std::abs). Again, this doesn't seem to make anything worse w.r.t our support for types other than float, double and long double.
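The reason the patches move from a comparison to copysign is that a less-than test cannot see the sign of a negative zero. The libstdc++ code is C++, so the following is only an analogous C sketch of the two checks, with hypothetical function names, to show where they disagree.

```c
#include <assert.h>
#include <math.h>

/* Generic check, analogous to `__x < _Tp()`: -0.0 is not less than 0.0,
   so a negative zero is reported as non-negative.  */
int is_negative_by_comparison(double x)
{
    return x < 0.0;
}

/* Sign-bit check via copysign: distinguishes -0.0 from +0.0. */
int is_negative_by_copysign(double x)
{
    return copysign(1.0, x) < 0.0;
}
```

For x = -0.0 the first function returns 0 and the second returns 1, which is exactly the case (imag(z) = -0) that the attached patches address. The comparison form still works for types where copysign is unavailable, which is why the comment suggests keeping it for the generic overload.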
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #27 from Jonathan Wakely --- Not in stage 4.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991

kargl at gcc dot gnu.org changed:

   What     |Removed |Added
   CC       |        |kargl at gcc dot gnu.org
   Priority |P3      |P2
   Severity |normal  |major

--- Comment #26 from kargl at gcc dot gnu.org ---
Any chance a libstdc++ person will commit the supplied patches?
[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337

Chinoune changed:

   What       |Removed     |Added
   Status     |UNCONFIRMED |RESOLVED
   Resolution |---         |INVALID

--- Comment #3 from Chinoune ---
Sorry, it wasn't a bug. The compiler is not skipping the if statement.

-fassociative-math:
Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like (x + 2**52) - 2**52). May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required.

-fassociative-math is enabled by -funsafe-math-optimizations.
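The "(x + 2**52) - 2**52" idiom mentioned in the quoted documentation is a classic example of code that depends on the operations being performed in the written order. A minimal C sketch follows, assuming IEEE double arithmetic, the default round-to-nearest mode, and a compilation without -fassociative-math; the function name round_via_magic is made up for this illustration.

```c
#include <assert.h>

/* Rounds a non-negative double below 2**51 to the nearest integer.
   Adding 2**52 pushes the value into a range where doubles have a spacing
   of exactly 1, so the addition itself performs the rounding; subtracting
   2**52 again recovers the rounded value.  */
double round_via_magic(double x)
{
    const double two52 = 4503599627370496.0;  /* 2**52 */
    return (x + two52) - two52;
}
```

If the compiler is allowed to reassociate, it may simplify the expression to plain x, and the rounding silently disappears. That is the kind of behavior change the reporter ran into: the code was compiled as permitted by the flags, not skipped.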
[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337 --- Comment #2 from Steve Kargl --- On Sat, Aug 03, 2019 at 03:14:57PM +, kargl at gcc dot gnu.org wrote: > --- Comment #1 from kargl at gcc dot gnu.org --- > (In reply to Chinoune from comment #0) > > I have encountered some underflows/overflows in my code compiled with > > -Ofast, and after investigations it seems like the complex abs gives zero > > with small numbers. So I added a workaround. but it didn't work: > > > > (snip) > > > > > gfortran-9 -O1 -funsafe-math-optimizations -ffinite-math-only > > bug_skip_if.f90 -o test.x > > ./test.x > > (snip) > > > > > Q : Why does gfortran skip the if statement? > > What happens if you don't use options that allow > a compiler to violate the standard? > BTW, with the posted code, I cannot reproduce your results on either i586-*-freebsd or x86_64-*-freebsd.
[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337 kargl at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P4 Severity|normal |minor
[Bug fortran/91337] gfortran skips an if statement with some mathematical optimisations with complex numbers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337 kargl at gcc dot gnu.org changed: What|Removed |Added CC||kargl at gcc dot gnu.org --- Comment #1 from kargl at gcc dot gnu.org --- (In reply to Chinoune from comment #0) > I have encountered some underflows/overflows in my code compiled with > -Ofast, and after investigations it seems like the complex abs gives zero > with small numbers. So I added a workaround. but it didn't work: > (snip) > > gfortran-9 -O1 -funsafe-math-optimizations -ffinite-math-only > bug_skip_if.f90 -o test.x > ./test.x (snip) > > Q : Why does gfortran skip the if statement? What happens if you don't use options that allow a compiler to violate the standard? '-Ofast' Disregard strict standards compliance. '-Ofast' enables all '-O3' optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on '-ffast-math' and the Fortran-specific '-fstack-arrays', unless '-fmax-stack-var-size' is specified, and '-fno-protect-parens'. '-ffast-math' Sets the options '-fno-math-errno', '-funsafe-math-optimizations', '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans', '-fcx-limited-range' and '-fexcess-precision=fast'. This option causes the preprocessor macro '__FAST_MATH__' to be defined. This option is not turned on by any '-O' option besides '-Ofast' since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. '-funsafe-math-optimizations' Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. 
This option is not turned on by any '-O' option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions
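Since the documentation quoted above says that -ffast-math defines the preprocessor macro __FAST_MATH__, a program can check how it was built. The following C++ sketch shows one way to do that; the function name built_with_fast_math is hypothetical.

```cpp
// Reports whether this translation unit was compiled with -ffast-math
// (directly or through -Ofast), based on the documented __FAST_MATH__ macro.
constexpr bool built_with_fast_math() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}
```

A numerical routine that depends on signed zeros or NaN handling could use this in a static_assert to refuse to build under fast-math options.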
[Bug fortran/91337] New: gfortran skips an if statement with some mathematical optimisations with complex numbers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91337 Bug ID: 91337 Summary: gfortran skips an if statement with some mathematical optimisations with complex numbers. Product: gcc Version: 9.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: chinoune.mehdi at hotmail dot com Target Milestone: --- I have encountered some underflows/overflows in my code compiled with -Ofast, and after investigations it seems like the complex abs gives zero with small numbers. So I added a workaround. but it didn't work: module m implicit none ! integer, parameter :: sp = selected_real_kind(6) real(sp), parameter :: tiny_sp = tiny(1._sp), sqrt_tiny = sqrt( tiny_sp ) ! contains subroutine sub(z,y) complex(sp), intent(in) :: z real(sp), intent(out) :: y real(sp) :: az ! az = abs(z) if( az
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #25 from Steve Kargl ---
On Tue, Apr 09, 2019 at 08:24:29PM +, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
>
> --- Comment #24 from Jonathan Wakely ---
> Thanks for the patch, I'll test it fully tomorrow.

I think the patch for complex_sqrt() is correct. The one for complex_pow(), I think, accidentally works for the OP, but is likely broken for some general regions of the complex plane.

> I'll open a separate bug for the FreeBSD issue. We could use more fine-grained
> configure checks so that most C99 math functions are enabled, even if some of
> the complex ones are missing.

libgfortran has c99_functions.c, which implements missing C99 math functions when configure cannot find one. The implementations are likely to be fairly direct, without much optimization, attention to exceptional cases, or extensive testing.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #24 from Jonathan Wakely --- Thanks for the patch, I'll test it fully tomorrow. I'll open a separate bug for the FreeBSD issue. We could use more fine-grained configure checks so that most C99 math functions are enabled, even if some of the complex ones are missing.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #23 from kargl at gcc dot gnu.org --- Created attachment 46105 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46105=edit fix g++ problem with pow(z,0.5) where imag(z) = -0. This patch has only been tested with the original test provided by the reporter.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #22 from Steve Kargl --- On Mon, Apr 08, 2019 at 10:17:00PM +, redi at gcc dot gnu.org wrote: > (In reply to kargl from comment #19) > > I get the expected. So, if you're on a system that has > > _GLIBCXX_USE_C99_COMPLEX, you won't see the bug. > > Wow, why isn't libstdc++ using the C99 functions on FreeBSD? > Because it is all or nothing. See comment #8 and #9 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89125 If you're too busy to look at PR89125 the upshot is that FreeBSD is missing ccoshl, ccosl, cexpl, csinhl, csinl, ctanhl, and ctanl. I have BSD licensed versions of ccoshl, ccosl, cexpl, csinhl, and csinl, but testing on FreeBSD ran into what I consider to be a very bad bug in clang (FreeBSD system compiler).
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #21 from kargl at gcc dot gnu.org --- Created attachment 46102 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46102=edit fix g++ problem with sqrt(z) where z is complex and imag(z) = -0
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #20 from Jonathan Wakely --- (In reply to Steve Kargl from comment #16) > If Andrew is correct and a builtin is called, Wasn't that me, not Andrew? > you might find > my results if you use -fno-builtins (check spelling). No, same results. Calling __builtin_csqrt doesn't necessarily mean GCC evaluates it. Without optimisation it still generates a call to csqrt from libc. > Looking at ./libstdc++-v3/include/std/complex, one finds. > > // 26.2.8/13 sqrt(__z): Returns the complex square root of __z. > // The branch cut is on the negative axis. > template > complex<_Tp> > __complex_sqrt(const complex<_Tp>& __z) > { > _Tp __x = __z.real(); > _Tp __y = __z.imag(); > > if (__x == _Tp()) > { > _Tp __t = sqrt(abs(__y) / 2); > return complex<_Tp>(__t, __y < _Tp() ? -__t : __t); > } > else > { > _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x))); > _Tp __u = __t / 2; > return __x > _Tp() > ? complex<_Tp>(__u, __y / __t) > : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u); > } > } > > Doesn't this gets the wrong answer for __y = -0 (as -0 < 0 is false)? Yes, but that code shouldn't be used for modern targets ... (In reply to kargl from comment #19) > I get the expected. So, if you're on a system that has > _GLIBCXX_USE_C99_COMPLEX, you won't see the bug. Wow, why isn't libstdc++ using the C99 functions on FreeBSD? I'll have to look into that. > It is likely that everywhere that a construct of the > form __y < _Tp() ? -__u : __u appear, it needs to use > copysign. That won't always work, because the generic functions should really only get used when _Tp is not one of float, double or long double. And in that case there might be no copysign for the type. For float, double and long double we should be using the libc routines. So the bug is that FreeBSD isn't using them. The _original_ bug report is for std::pow though, and on Ubuntu, which does use glibc. Comment 9 needs more analysis.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #19 from kargl at gcc dot gnu.org --- (In reply to Steve Kargl from comment #18) > On Mon, Apr 08, 2019 at 08:03:36PM +, pinskia at gcc dot gnu.org wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 > > > > --- Comment #17 from Andrew Pinski --- > > >Doesn't this gets the wrong answer for __y = -0 (as -0 < 0 is false)? > > > > No, you missed this part: > > // The branch cut is on the negative axis. > > No, I didn't miss that part. > > > So maybe the bug is inside FreeBSD and Window's libm. Glibc fixed the > > branch > > cuts issues back in 2012 for csqrt but the other OS's did not change theirs. > > For the C++ code in comment, on x86_64-*-freebsd. > > % g++8 -o z a.cpp -lm && ./z > z = (-1.84250315177824e-07,-0) >pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258) > sqrt(z) = (0,0.000429243887758258) > sqrt(conj(z)) = (0,0.000429243887758258) > conj(sqrt(z)) = (0,-0.000429243887758258) > > The last two lines are definitely wrong. > > troutmask:sgk[209] nm z | grep csqrt > troutmask:sgk[210] nm z | grep sqrt > 0040156b W _ZSt14__complex_sqrtIdESt7complexIT_ERKS2_ > 0040143d W _ZSt4sqrtIdESt7complexIT_ERKS2_ > U sqrt@@FBSD_1.0 > > There is no reference to csqrt in the exectuable. If I change > /usr/local/lib/gcc8/include/c++/complex to use copysign > to account for __y = -0 like > > template > complex<_Tp> > __complex_sqrt(const complex<_Tp>& __z) > { > _Tp __x = __z.real(); > _Tp __y = __z.imag(); > > if (__x == _Tp()) > { > _Tp __t = sqrt(abs(__y) / 2); > // return complex<_Tp>(__t, __y < _Tp() ? -__t : __t); > return complex<_Tp>(__t, copysign(__t, __y)); > } > else > { > _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x))); > _Tp __u = __t / 2; > // return __x > _Tp() > //? complex<_Tp>(__u, __y / __t) > //: complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u); > return __x > _Tp() > ? 
complex<_Tp>(__u, __y / __t)
> : complex<_Tp>(abs(__y) / __t, copysign(__u, __y));
> }
> }
>
> The C++ code in comment #10 gives
>
> g++8 -o z a.cpp -lm && ./z
> z = (-1.84250315177824e-07,-0)
> pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
> sqrt(z) = (0,-0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,0.000429243887758258)
>
> The correct answer. QED.

BTW, if I change /usr/local/lib/gcc8/include/c++/complex back to not using copysign(), and instead change

  #if _GLIBCXX_USE_C99_COMPLEX
  inline __complex__ float
  __complex_sqrt(__complex__ float __z) { return __builtin_csqrtf(__z); }

to

  #if _GLIBCXX_USE_C99_COMPLEX || SOMETHING_UGLY
  inline __complex__ float
  __complex_sqrt(__complex__ float __z) { return __builtin_csqrtf(__z); }

and do

  g++8 -DSOMETHING_UGLY -o z a.cpp -lm && ./z
  z = (-1.84250315177824e-07,-0)
  pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,-0.000429243887758258)
  sqrt(conj(z)) = (0,0.000429243887758258)
  conj(sqrt(z)) = (0,0.000429243887758258)

I get the expected result. So, if you're on a system that has _GLIBCXX_USE_C99_COMPLEX, you won't see the bug. It is likely that everywhere a construct of the form __y < _Tp() ? -__u : __u appears, it needs to use copysign.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #18 from Steve Kargl ---
On Mon, Apr 08, 2019 at 08:03:36PM +, pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
>
> --- Comment #17 from Andrew Pinski ---
> >Doesn't this gets the wrong answer for __y = -0 (as -0 < 0 is false)?
>
> No, you missed this part:
> // The branch cut is on the negative axis.

No, I didn't miss that part.

> So maybe the bug is inside FreeBSD and Window's libm. Glibc fixed the branch
> cuts issues back in 2012 for csqrt but the other OS's did not change theirs.

For the C++ code in comment, on x86_64-*-freebsd.

  % g++8 -o z a.cpp -lm && ./z
  z = (-1.84250315177824e-07,-0)
  pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,0.000429243887758258)
  sqrt(conj(z)) = (0,0.000429243887758258)
  conj(sqrt(z)) = (0,-0.000429243887758258)

The last two lines are definitely wrong.

  troutmask:sgk[209] nm z | grep csqrt
  troutmask:sgk[210] nm z | grep sqrt
  0040156b W _ZSt14__complex_sqrtIdESt7complexIT_ERKS2_
  0040143d W _ZSt4sqrtIdESt7complexIT_ERKS2_
  U sqrt@@FBSD_1.0

There is no reference to csqrt in the executable. If I change /usr/local/lib/gcc8/include/c++/complex to use copysign to account for __y = -0 like

  template<typename _Tp>
    complex<_Tp>
    __complex_sqrt(const complex<_Tp>& __z)
    {
      _Tp __x = __z.real();
      _Tp __y = __z.imag();

      if (__x == _Tp())
        {
          _Tp __t = sqrt(abs(__y) / 2);
          // return complex<_Tp>(__t, __y < _Tp() ? -__t : __t);
          return complex<_Tp>(__t, copysign(__t, __y));
        }
      else
        {
          _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x)));
          _Tp __u = __t / 2;
          // return __x > _Tp()
          //   ? complex<_Tp>(__u, __y / __t)
          //   : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u);
          return __x > _Tp()
            ? complex<_Tp>(__u, __y / __t)
            : complex<_Tp>(abs(__y) / __t, copysign(__u, __y));
        }
    }

The C++ code in comment #10 gives

  g++8 -o z a.cpp -lm && ./z
  z = (-1.84250315177824e-07,-0)
  pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,-0.000429243887758258)
  sqrt(conj(z)) = (0,0.000429243887758258)
  conj(sqrt(z)) = (0,0.000429243887758258)

The correct answer. QED.
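The difference between the comparison and copysign can be checked in isolation. The sketch below is a stand-alone C++ rendering of the patched generic algorithm from this comment, specialised to double; it is not the libstdc++ function itself, and the names sqrt_signed_zero and less_than_zero are hypothetical. It assumes IEEE 754 signed zeros.

```cpp
#include <cmath>
#include <complex>

// The test used by the unpatched code: false for y = -0.0, because
// negative zero compares equal to positive zero.
bool less_than_zero(double y) { return y < 0.0; }

// Hypothetical stand-alone version of the patched algorithm.
std::complex<double> sqrt_signed_zero(const std::complex<double>& z) {
    const double x = z.real();
    const double y = z.imag();
    if (x == 0.0) {
        const double t = std::sqrt(std::abs(y) / 2);
        // copysign reads the sign bit, so y = -0.0 yields -t.
        return {t, std::copysign(t, y)};
    }
    const double t = std::sqrt(2 * (std::abs(z) + std::abs(x)));
    const double u = t / 2;
    if (x > 0.0)
        return {u, y / t};
    return {std::abs(y) / t, std::copysign(u, y)};
}
```

For z = (-1.8425031517782417e-07, -0.0) this returns a negative imaginary part, and the result for conj(z) equals the conjugate of the result for z, which is the symmetry the thread expects from csqrt.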
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #17 from Andrew Pinski ---
>Doesn't this gets the wrong answer for __y = -0 (as -0 < 0 is false)?

No, you missed this part:
// The branch cut is on the negative axis.

So maybe the bug is inside FreeBSD's and Windows' libm. Glibc fixed the branch cut issues back in 2012 for csqrt, but the other OSes did not change theirs.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #16 from Steve Kargl --- On Mon, Apr 08, 2019 at 07:20:22PM +, redi at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 > > --- Comment #13 from Jonathan Wakely --- > (In reply to Steve Kargl from comment #10) > > % g++8 -o z a.cpp -lm && ./z > > z = (-1.84250315177824e-07,-0) > >pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258) > > sqrt(z) = (0,0.000429243887758258) > > sqrt(conj(z)) = (0,0.000429243887758258) > > conj(sqrt(z)) = (0,-0.000429243887758258) > > > > This looks wrong. > > I can't reproduce this, I get: > > z = (-1.84250315177824e-07,-0) >pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258) > sqrt(z) = (0,-0.000429243887758258) > sqrt(conj(z)) = (0,0.000429243887758258) > conj(sqrt(z)) = (0,0.000429243887758258) > My results are for i585-*-freebsd, which doesn't use glibc. If Andrew is correct and a builtin is called, you might find my results if you use -fno-builtins (check spelling). Looking at ./libstdc++-v3/include/std/complex, one finds. // 26.2.8/13 sqrt(__z): Returns the complex square root of __z. // The branch cut is on the negative axis. template complex<_Tp> __complex_sqrt(const complex<_Tp>& __z) { _Tp __x = __z.real(); _Tp __y = __z.imag(); if (__x == _Tp()) { _Tp __t = sqrt(abs(__y) / 2); return complex<_Tp>(__t, __y < _Tp() ? -__t : __t); } else { _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x))); _Tp __u = __t / 2; return __x > _Tp() ? complex<_Tp>(__u, __y / __t) : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u); } } Doesn't this gets the wrong answer for __y = -0 (as -0 < 0 is false)?
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #15 from Andrew Pinski ---
(In reply to Jonathan Wakely from comment #13)
> (In reply to Steve Kargl from comment #10)
> > % g++8 -o z a.cpp -lm && ./z
> > z = (-1.84250315177824e-07,-0)
> > pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
> > sqrt(z) = (0,0.000429243887758258)
> > sqrt(conj(z)) = (0,0.000429243887758258)
> > conj(sqrt(z)) = (0,-0.000429243887758258)
> >
> > This looks wrong.
>
> I can't reproduce this, I get:
>
> z = (-1.84250315177824e-07,-0)
> pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
> sqrt(z) = (0,-0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,0.000429243887758258)

My bet now is that improvements to glibc changed the behavior here. Also, I used the wrong term: it is the branch cut that is the issue. Most of the branch cuts were fixed in glibc in 2012, though some might have been fixed later on.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #14 from Jonathan Wakely --- Which is unsurprising because std::sqrt(z) just calls __builtin_csqrt(z.__rep())
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #13 from Jonathan Wakely --- (In reply to Steve Kargl from comment #10) > % g++8 -o z a.cpp -lm && ./z > z = (-1.84250315177824e-07,-0) >pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258) > sqrt(z) = (0,0.000429243887758258) > sqrt(conj(z)) = (0,0.000429243887758258) > conj(sqrt(z)) = (0,-0.000429243887758258) > > This looks wrong. I can't reproduce this, I get: z = (-1.84250315177824e-07,-0) pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258) sqrt(z) = (0,-0.000429243887758258) sqrt(conj(z)) = (0,0.000429243887758258) conj(sqrt(z)) = (0,0.000429243887758258)
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #12 from Jonathan Wakely --- (In reply to Steve Kargl from comment #11) > unless [Note: ...] is non-normative text. That's exactly what it is. But we can still aim to meet the intended behaviour.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #11 from Steve Kargl ---
On Mon, Apr 08, 2019 at 02:32:38PM +, sgk at troutmask dot apl.washington.edu wrote:
>
> I don't have a copy of the C++ standard, so take this as speculation.
> pow(z,0.5) is equivalent to sqrt(z). From the C standard, one has
>
> conj(csqrt(z)) = csqrt(conj(z)).
>
> g++ does not enforce this when the imaginary part is -0;
> while gcc does.

(code snipped)

> % gcc8 -o z c.c -lm && ./z
> z = CMPLX(-1.8425031517782417e-07, -0.e+00)
> cpow(z, 0.5) = CMPLX( 2.6283607659835831e-20, -4.2924388775825818e-04)
> csqrt(z) = CMPLX( 0.e+00, -4.2924388775825818e-04)
> csqrt(conj(z)) = CMPLX( 0.e+00, 4.2924388775825818e-04)
> conj(csqrt(z)) = CMPLX( 0.e+00, 4.2924388775825818e-04)

(code snipped)

> % g++8 -o z a.cpp -lm && ./z
> z = (-1.84250315177824e-07,-0)
> pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
> sqrt(z) = (0,0.000429243887758258)
> sqrt(conj(z)) = (0,0.000429243887758258)
> conj(sqrt(z)) = (0,-0.000429243887758258)
>
> This looks wrong.

It is wrong. From n4810.pdf, page 1102,

  template<class T> complex<T> sqrt(const complex<T>& x);

  Returns: The complex square root of x, in the range of the right half-plane. [Note: The semantics of this function are intended to be the same in C++ as they are for csqrt in C. -- end note]

unless [Note: ...] is non-normative text.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #10 from Steve Kargl ---
On Mon, Apr 08, 2019 at 09:59:22AM +, redi at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
>
> --- Comment #7 from Jonathan Wakely ---
> I think it's allowed. The standards have very little to say about accuracy of
> any mathematical functions, and complex(0, 0.0) == complex(0,
> -0.0) is true according to the standard, because +0.0 == -0.0 is true.
>

I don't have a copy of the C++ standard, so take this as speculation. pow(z,0.5) is equivalent to sqrt(z). From the C standard, one has

  conj(csqrt(z)) = csqrt(conj(z)).

g++ does not enforce this when the imaginary part is -0, while gcc does.

  % cat c.c
  #include <complex.h>
  #include <stdio.h>

  int
  main(void)
  {
    double complex z, t0, t1, t2, t3;

    z = CMPLX(-1.8425031517782417e-07, -0.0);
    t0 = cpow(z, 0.5);
    t1 = csqrt(z);
    t2 = csqrt(conj(z));
    t3 = conj(csqrt(z));
    printf(" z = CMPLX(% .16le, % .16le)\n", creal(z), cimag(z));
    printf(" cpow(z, 0.5) = CMPLX(% .16le, % .16le)\n", creal(t0), cimag(t0));
    printf(" csqrt(z) = CMPLX(% .16le, % .16le)\n", creal(t1), cimag(t1));
    printf("csqrt(conj(z)) = CMPLX(% .16le, % .16le)\n", creal(t2), cimag(t2));
    printf("conj(csqrt(z)) = CMPLX(% .16le, % .16le)\n", creal(t3), cimag(t3));
    return 0;
  }

  % gcc8 -o z c.c -lm && ./z
  z = CMPLX(-1.8425031517782417e-07, -0.e+00)
  cpow(z, 0.5) = CMPLX( 2.6283607659835831e-20, -4.2924388775825818e-04)
  csqrt(z) = CMPLX( 0.e+00, -4.2924388775825818e-04)
  csqrt(conj(z)) = CMPLX( 0.e+00, 4.2924388775825818e-04)
  conj(csqrt(z)) = CMPLX( 0.e+00, 4.2924388775825818e-04)

  mobile:kargl[210] cat a.cpp
  #include <complex>
  #include <iostream>
  #include <iomanip>

  int
  main(int argc, char *argv[])
  {
    std::complex<double> z, t0, t1, t2, t3;

    z = std::complex<double>(-1.8425031517782417e-07, -0.0);
    t0 = std::pow(z, 0.5);
    t1 = std::sqrt(z);
    t2 = std::sqrt(std::conj(z));
    t3 = std::conj(std::sqrt(z));
    std::cout << "z = " << std::setprecision(15) << z << std::endl;
    std::cout << " pow(z,0.5) = " << std::setprecision(15) << t0 << std::endl;
    std::cout << " sqrt(z) = " << std::setprecision(15) << t1 << std::endl;
    std::cout << "sqrt(conj(z)) = " << std::setprecision(15) << t2 << std::endl;
    std::cout << "conj(sqrt(z)) = " << std::setprecision(15) << t3 << std::endl;
    return 0;
  }

  % g++8 -o z a.cpp -lm && ./z
  z = (-1.84250315177824e-07,-0)
  pow(z,0.5) = (2.62836076598358e-20,-0.000429243887758258)
  sqrt(z) = (0,0.000429243887758258)
  sqrt(conj(z)) = (0,0.000429243887758258)
  conj(sqrt(z)) = (0,-0.000429243887758258)

This looks wrong.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #9 from Jonathan Wakely --- (In reply to Richard Biener from comment #5) > The issue is > > std::pow (__x=..., __y=@0x7fffdcb8: 0.5) > at /home/space/rguenther/install/gcc-9.0/include/c++/9.0.1/complex:1027 > (gdb) l > 1022{ > 1023#if ! _GLIBCXX_USE_C99_COMPLEX > 1024 if (__x == _Tp()) > 1025return _Tp(); > 1026#endif > 1027 if (__x.imag() == _Tp() && __x.real() > _Tp()) > 1028return pow(__x.real(), __y); > > where __x.imag () == _Tp() says true for -0.0 == 0.0. This means > std::pow will return the same values for r + -0.0i and r + 0.0i, > not sure if that is allowed by the C++ standard. But __x.real() > _Tp() is false here, so that branch isn't taken anyway. Instead the pow(val, 0.5) result comes from: _Complex double val = -1.8425031517782417e-07 + -0.0 * I; _Complex double t = __builtin_clog(val); double rho = exp(0.5 * __real__ t); double theta = 0.5 * __imag__ t; _Complex result = rho * cos(theta) + rho * sin(theta) * I; __builtin_printf("%f\n", __imag__ result);
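The computation in this comment can be written out in C++ so that the role of the signed zero is visible. This is only a sketch of the exp(0.5 * log(z)) formula: it uses atan2 and hypot directly rather than __builtin_clog, the name pow_half_via_log is hypothetical, and it assumes an IEEE 754 conforming atan2, for which atan2(-0.0, x) with negative x is -pi.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: pow(z, 0.5) evaluated as exp(0.5 * log(z)), with
// log(z) = log|z| + i * arg(z).
std::complex<double> pow_half_via_log(const std::complex<double>& z) {
    const double log_modulus = std::log(std::hypot(z.real(), z.imag()));
    // For negative real part, the sign of a zero imaginary part selects
    // -pi or +pi here, i.e. which side of the branch cut z lies on.
    const double argument = std::atan2(z.imag(), z.real());
    const double rho = std::exp(0.5 * log_modulus);
    const double theta = 0.5 * argument;
    return {rho * std::cos(theta), rho * std::sin(theta)};
}
```

With the reporter's value and an imaginary part of -0.0, theta is -pi/2 and the imaginary part of the result is negative; with +0.0 it is positive. The tiny nonzero real part comes from cos(theta) not being exactly zero in floating point.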
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #8 from Andrew Pinski --- Also isn't it true that this is just a different quadrant of the solution? That is the answer is correct but which quadrant being selected is different? That is (a^0.5) actually has two answers where the imaginary part can be positive or negative? That is they are conjugate of each other.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #7 from Jonathan Wakely --- I think it's allowed. The standards have very little to say about accuracy of any mathematical functions, and complex(0, 0.0) == complex(0, -0.0) is true according to the standard, because +0.0 == -0.0 is true.
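The equality mentioned here can be checked with a short C++ sketch. Two values can compare equal with operator== while still differing in the sign bit that functions with a branch cut look at. The helper name equal_but_distinct_zero is hypothetical, and IEEE 754 signed zeros are assumed.

```cpp
#include <cmath>
#include <complex>

// True when two complex values compare equal yet differ in the sign bit
// of their imaginary parts (for example +0.0 versus -0.0).
bool equal_but_distinct_zero(const std::complex<double>& a,
                             const std::complex<double>& b) {
    return a == b && std::signbit(a.imag()) != std::signbit(b.imag());
}
```

Passing complex<double>(0, 0.0) and complex<double>(0, -0.0) returns true, so an equality test alone cannot tell which side of a branch cut a value is on.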
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #6 from Andrew Pinski --- (In reply to Richard Biener from comment #5) > The issue is > > std::pow (__x=..., __y=@0x7fffdcb8: 0.5) > at /home/space/rguenther/install/gcc-9.0/include/c++/9.0.1/complex:1027 > (gdb) l > 1022{ > 1023#if ! _GLIBCXX_USE_C99_COMPLEX > 1024 if (__x == _Tp()) > 1025return _Tp(); > 1026#endif > 1027 if (__x.imag() == _Tp() && __x.real() > _Tp()) > 1028return pow(__x.real(), __y); > > where __x.imag () == _Tp() says true for -0.0 == 0.0. This means > std::pow will return the same values for r + -0.0i and r + 0.0i, > not sure if that is allowed by the C++ standard. If it does not allow it, then adding copysign is needed.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 Richard Biener changed: What|Removed |Added Keywords||wrong-code --- Comment #5 from Richard Biener --- The issue is std::pow (__x=..., __y=@0x7fffdcb8: 0.5) at /home/space/rguenther/install/gcc-9.0/include/c++/9.0.1/complex:1027 (gdb) l 1022{ 1023#if ! _GLIBCXX_USE_C99_COMPLEX 1024 if (__x == _Tp()) 1025return _Tp(); 1026#endif 1027 if (__x.imag() == _Tp() && __x.real() > _Tp()) 1028return pow(__x.real(), __y); where __x.imag () == _Tp() says true for -0.0 == 0.0. This means std::pow will return the same values for r + -0.0i and r + 0.0i, not sure if that is allowed by the C++ standard.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-04-08 CC||marxin at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #4 from Martin Liška --- I see the expected result when replacing '-0.0' with '0.0'. Well, negative zero should be equal to the positive one according to the standard.
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #3 from t.sprodowski at web dot de --- Octave 4.2.2: ans = 2.6284e-20 + 4.2924e-04i
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 kargl at gcc dot gnu.org changed: What|Removed |Added CC||kargl at gcc dot gnu.org --- Comment #2 from kargl at gcc dot gnu.org --- (In reply to t.sprodowski from comment #0) > Following calculation of the complex number leads to a wrong imaginary part: > > > #include > #include > #include > > int main(int argc, char *argv[]) > { > std::complex val = std::complex(-1.8425031517782417e-07, > -0.0); > std::complex testExp = std::pow(val, 0.5); > std::cout << "textExp: " << std::setprecision(30) << testExp << std::endl; > return 0; > } > > Result is: > (2.6283607659835830609796003783e-20,-0.000429243887758258178214548772544), > but it should be > (2.628360765983583e-20, 0.0004292438877582582), obtained from Visual Studio, > MATLAB and Octave. > What version of Octave. I get >> z = complex(-1.8425031517782417e-07, -0.0) z = -0.0018425 - 0.000i >> z**0.5 ans = 2.6284e-20 - 4.2924e-04i which agrees with clang++ version 7.0.1 (and apparently g++ which I haven't tested).
[Bug libstdc++/89991] Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991 --- Comment #1 from t.sprodowski at web dot de --- Created attachment 46095 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46095=edit Source file Source file to illustrate this bug.
[Bug libstdc++/89991] New: Complex numbers: Calculation of imaginary part is not correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
Bug ID: 89991
Summary: Complex numbers: Calculation of imaginary part is not correct
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: t.sprodowski at web dot de
Target Milestone: ---

The following calculation with a complex number produces a wrong imaginary part:

  #include <complex>
  #include <iostream>
  #include <iomanip>

  int main(int argc, char *argv[])
  {
    std::complex<double> val = std::complex<double>(-1.8425031517782417e-07, -0.0);
    std::complex<double> testExp = std::pow(val, 0.5);
    std::cout << "textExp: " << std::setprecision(30) << testExp << std::endl;
    return 0;
  }

Result is:
(2.6283607659835830609796003783e-20,-0.000429243887758258178214548772544),
but it should be
(2.628360765983583e-20, 0.0004292438877582582), obtained from Visual Studio, MATLAB and Octave.

Compilation was done with GCC 8.2.0 and 7.3.0 on Ubuntu 18.04:

  g++ -c -pipe -g -std=gnu++1y -Wall -W -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DQT_QML_DEBUG -DQT_CORE_LIB -I../testPrecision -I. -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o main.o ../testPrecision/main.cpp
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191 Jerry DeLisle changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Jerry DeLisle --- Fixed on trunk and on 7. Closing
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191 --- Comment #7 from Jerry DeLisle ---
Author: jvdelisle
Date: Sun Dec 3 20:43:59 2017
New Revision: 255368

URL: https://gcc.gnu.org/viewcvs?rev=255368=gcc=rev
Log:
2017-12-03  Jerry DeLisle
            Dominique d'Humieres

    Backport from trunk
    PR libgfortran/83191
    * io/transfer.c (list_formatted_read_scalar): Do not set namelist_mode bit here.
    (namelist_read): Likewise.
    (data_transfer_init): Clear the mode bit here.
    (finalize_transfer): Do set the mode bit just before any calls to namelist_read or namelist_write. It can now be referred to in complex_write.
    * io/write.c (write_complex): Suppress the leading blanks when namelist_mode bit is not set to 1.

    * gfortran.dg/namelist_95.f90: New test.

Added:
    branches/gcc-7-branch/gcc/testsuite/gfortran.dg/namelist_95.f90
Modified:
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/libgfortran/ChangeLog
    branches/gcc-7-branch/libgfortran/io/list_read.c
    branches/gcc-7-branch/libgfortran/io/transfer.c
    branches/gcc-7-branch/libgfortran/io/write.c
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191 --- Comment #6 from Jerry DeLisle ---
Author: jvdelisle
Date: Sun Dec 3 16:47:12 2017
New Revision: 255365

URL: https://gcc.gnu.org/viewcvs?rev=255365=gcc=rev
Log:
2017-12-03  Jerry DeLisle
            Dominique d'Humieres

    PR libgfortran/83191
    * io/transfer.c (list_formatted_read_scalar): Do not set namelist_mode bit here.
    (namelist_read): Likewise.
    (data_transfer_init): Clear the mode bit here.
    (finalize_transfer): Do set the mode bit just before any calls to namelist_read or namelist_write. It can now be referred to in complex_write.
    * io/write.c (write_complex): Suppress the leading blanks when namelist_mode bit is not set to 1.

    * gfortran.dg/namelist_95.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/namelist_95.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/io/list_read.c
    trunk/libgfortran/io/transfer.c
    trunk/libgfortran/io/write.c
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #5 from Jerry DeLisle ---
(In reply to Jerry DeLisle from comment #4)
> Alternatively one could do this:
>
> @@ -1809,9 +1809,11 @@ write_complex (st_parameter_dt *dtp, const char
> *source, int kind, size_t size)
>                      precision, buf_size, result1, &res_len1);
>    get_float_string (dtp, &f, source + size / 2 , kind, 0, buffer,
>                      precision, buf_size, result2, &res_len2);
> -  lblanks = width - res_len1 - res_len2 - 3;
> -
> -  write_x (dtp, lblanks, lblanks);
> +  if (!dtp->u.p.namelist_mode)
> +    {
> +      lblanks = width - res_len1 - res_len2 - 3;
> +      write_x (dtp, lblanks, lblanks);
> +    }
>    write_char (dtp, '(');
>    write_float_string (dtp, result1, res_len1);
>    write_char (dtp, semi_comma);

With the following tweak:

@@ -1950,6 +1952,7 @@ list_formatted_write (st_parameter_dt *dtp, bt type, void *p, int kind,
                   size * GFC_SIZE_OF_CHAR_KIND(kind) : size;

   tmp = (char *) p;
+  dtp->u.p.namelist_mode = 0;

   /* Big loop over all the elements.  */
   for (elem = 0; elem < nelems; elem++)
@@ -2394,6 +2397,7 @@ namelist_write (st_parameter_dt *dtp)
   char c;
   char *dummy_name = NULL;

+  dtp->u.p.namelist_mode = 1;
   /* Set the delimiter for namelist output.  */
   switch (dtp->u.p.current_unit->delim_status)
     {
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #4 from Jerry DeLisle ---
Alternatively one could do this:

@@ -1809,9 +1809,11 @@ write_complex (st_parameter_dt *dtp, const char *source, int kind, size_t size)
                     precision, buf_size, result1, &res_len1);
   get_float_string (dtp, &f, source + size / 2 , kind, 0, buffer,
                     precision, buf_size, result2, &res_len2);
-  lblanks = width - res_len1 - res_len2 - 3;
-
-  write_x (dtp, lblanks, lblanks);
+  if (!dtp->u.p.namelist_mode)
+    {
+      lblanks = width - res_len1 - res_len2 - 3;
+      write_x (dtp, lblanks, lblanks);
+    }
   write_char (dtp, '(');
   write_float_string (dtp, result1, res_len1);
   write_char (dtp, semi_comma);
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

--- Comment #3 from Dominique d'Humieres ---
The following patch does the trick:

--- ../_clean/libgfortran/io/write.c    2017-11-22 20:37:44.0 +0100
+++ libgfortran/io/write.c      2017-11-28 23:45:55.0 +0100
@@ -1552,7 +1552,7 @@ select_string (st_parameter_dt *dtp, con
                int kind)
 {
   char *result;
-  *size = size_from_kind (dtp, f, kind) + f->u.real.d;
+  *size = size_from_kind (dtp, f, kind) + f->u.real.d + 1;
   if (*size > BUF_STACK_SZ)
      result = xmalloc (*size);
   else
@@ -1769,7 +1769,8 @@ write_real_g0 (st_parameter_dt *dtp, con


 static void
-write_complex (st_parameter_dt *dtp, const char *source, int kind, size_t size)
+write_complex (st_parameter_dt *dtp, const char *source, int kind, size_t size,
+               bool justify)
 {
   char semi_comma =
     dtp->u.p.current_unit->decimal_status == DECIMAL_POINT ? ',' : ';';
@@ -1809,9 +1810,12 @@ write_complex (st_parameter_dt *dtp, con
                     precision, buf_size, result1, &res_len1);
   get_float_string (dtp, &f, source + size / 2 , kind, 0, buffer,
                     precision, buf_size, result2, &res_len2);
-  lblanks = width - res_len1 - res_len2 - 3;
+  if (justify)
+    {
+      lblanks = width - res_len1 - res_len2 - 3;

-  write_x (dtp, lblanks, lblanks);
+      write_x (dtp, lblanks, lblanks);
+    }
   write_char (dtp, '(');
   write_float_string (dtp, result1, res_len1);
   write_char (dtp, semi_comma);
@@ -1889,7 +1893,7 @@ list_formatted_write_scalar (st_paramete
       write_real (dtp, p, kind);
       break;
     case BT_COMPLEX:
-      write_complex (dtp, p, kind, size);
+      write_complex (dtp, p, kind, size, true);
       break;
     case BT_CLASS:
       {
@@ -2202,7 +2206,7 @@ nml_write_obj (st_parameter_dt *dtp, nam
         case BT_COMPLEX:
           dtp->u.p.no_leading_blank = 0;
           num++;
-          write_complex (dtp, p, len, obj_size);
+          write_complex (dtp, p, len, obj_size, false);
           break;

         case BT_DERIVED:
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191 Jerry DeLisle changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jvdelisle at gcc dot gnu.org --- Comment #2 from Jerry DeLisle --- My bad. I will look into it.
[Bug fortran/83191] [7/8 Regression] Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

Dominique d'Humieres changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |6.4.0
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2017-11-28
                 CC|                            |jvdelisle at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|Writing a namelist with     |[7/8 Regression] Writing a
                   |repeated complex numbers    |namelist with repeated
                   |                            |complex numbers
   Target Milestone|---                         |7.3
      Known to fail|                            |7.2.0, 8.0

--- Comment #1 from Dominique d'Humieres ---
Likely caused by r237735 (pr48852). The test in pr48852 comment 0 prints now (1.,0.) It should probably be (1.,0.)

If I read the code correctly, it is caused by the lines

  lblanks = width - res_len1 - res_len2 - 3;
  write_x (dtp, lblanks, lblanks);

needed to have right justified outputs (case C in pr48852 comment 12).
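The effect of those two lines can be illustrated with a small stand-alone model. This is only a sketch and not libgfortran code: `format_complex`, `with_repeat`, their parameters and the field width of 12 used below are hypothetical stand-ins. Only the padding formula `width - res_len1 - res_len2 - 3` is taken from the comment above, with the string lengths playing the role of `res_len1` and `res_len2`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the padding logic quoted above; not the libgfortran code.
// re and im are the already formatted real and imaginary parts.
// The constant 3 accounts for the three characters '(', ',' and ')'.
std::string format_complex(const std::string& re, const std::string& im,
                           std::size_t width, bool justify)
{
    std::string out;
    const std::size_t used = re.size() + im.size() + 3;
    if (justify && width > used)
        out.append(width - used, ' ');   // leading blanks for right justification
    out += '(' + re + ',' + im + ')';
    return out;
}

// Hypothetical helper: prefix a repeat count, as namelist output does for
// a run of equal values.
std::string with_repeat(int count, const std::string& value)
{
    return std::to_string(count) + '*' + value;
}
```

With an assumed width of 12, `with_repeat(3, format_complex("0.", "0.", 12, true))` yields `3*` followed by five blanks and then `(0.,0.)`, which is the shape of output the reporter could not read back. With `justify` set to `false` the result is `3*(0.,0.)`.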
[Bug fortran/83191] New: Writing a namelist with repeated complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83191

            Bug ID: 83191
           Summary: Writing a namelist with repeated complex numbers
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ccyang at unlv dot edu
  Target Milestone: ---

This is a program that generates the bug on my Mac OS 10.12.6 and 10.13.1 with the latest MacPorts port gcc7:

program test

  implicit none

  integer, parameter :: UNIT = 1
  character(len=8), parameter :: FILE = "namelist"

  complex, dimension(3) :: a = (/ (0.0, 0.0), (0.0, 0.0), (0.0, 0.0) /)

  namelist /complex_namelist/ a

  open(UNIT, file=FILE)
  write(UNIT, nml=complex_namelist)
  close(UNIT)

  open(UNIT, file=FILE)
  read(UNIT, nml=complex_namelist)
  close(UNIT)

end program test

It compiles without any warning, but when run, it fails at reading the newly created namelist:

$ gfortran test.f90 -o test -Wall -Wextra
$ ./test
At line 17 of file test.f90 (unit = 1, file = 'namelist')
Fortran runtime error: Cannot match namelist object name (0.0.)

Error termination. Backtrace:
#0  0x10320d0dc
#1  0x10320d99c
#2  0x10320dfff
#3  0x10329b03a
#4  0x1032a0b78
#5  0x1032a0d9f
#6  0x103203df8
#7  0x103203e6c
$

The problem is in the content of the namelist file:

$ cat namelist
&COMPLEX_NAMELIST
 A= 3*               (0.,0.),
 /
$

where the repeated count (3*) has a gap of blanks to the complex number that needs to be repeated. If I have a namelist file without that gap, it can be read in by a program correctly.

In summary, the reading and writing of a namelist file with a repeated count is not mutually valid.
[Bug middle-end/65796] unnecessary stack spills during complex numbers function calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65796

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-04-20
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is some older bug where I noticed the same issue. It's basically an artifact of the x86 calling conventions and GCC going through generic argument setup code (and RTL optimizers not being able to optimize spill + load into unpcklps).

That said, somebody needs to find the duplicate bugreport. PR48607 is also related.
[Bug middle-end/65796] New: unnecessary stack spills during complex numbers function calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65796

            Bug ID: 65796
           Summary: unnecessary stack spills during complex numbers function calls
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jtaylor.debian at googlemail dot com

The following function calling cabsf exhibits poor performance when compiled with gcc:

#include <complex>
using namespace std;

void __attribute__((noinline)) v(int nCor, complex<float> * inp, complex<float> * out)
{
    for (int icorr = 0; icorr < nCor; icorr++) {
        float amp = abs(inp[icorr]);
        if (amp > 0.f) {
            out[icorr] = amp * inp[icorr];
        }
        else {
            out[icorr] = 0.;
        }
    }
}

With gcc 4.9 and 5 (20150208) on x86_64 it produces:

g++- test.cc -O2 -c -S

.L15:
        movss   4(%rsp), %xmm2
        addq    $8, %rbx
        addq    $8, %rbp
        movss   (%rsp), %xmm1
        mulss   %xmm0, %xmm2
        mulss   %xmm0, %xmm1
        movss   %xmm2, -8(%rbx)
        movss   %xmm1, -4(%rbx)
        cmpq    %r12, %rbx
        je      .L14
.L7:
        movss   0(%rbp), %xmm2
        movss   4(%rbp), %xmm1
        movss   %xmm2, 8(%rsp)
        movss   %xmm1, 12(%rsp)
        movq    8(%rsp), %xmm0
        movss   %xmm2, 4(%rsp)
        movss   %xmm1, (%rsp)
        call    cabsf
        pxor    %xmm3, %xmm3
        ucomiss %xmm3, %xmm0
        ja      .L15

Note the spills of xmm[12] onto the stack and the reload into xmm0. Instead of spilling to the stack, one could use unpcklps to prepare xmm0.

With a simple benchmark on 5000 floats this would speed up the function by about 30% on an Intel Core 2 and an i5, which is quite significant given the expensive cabs call that is also done in it.
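To make the suggested unpcklps approach concrete, here is a portable scalar model of the lane shuffle that the instruction performs. It is a sketch for illustration only: `Lanes`, `unpcklps_model` and `pack_complex_arg` are hypothetical names, and the assumption that the by-value complex argument carries the real part in lane 0 and the imaginary part in lane 1 of xmm0 is read off the `movq 8(%rsp), %xmm0` sequence in the listing above rather than taken from an ABI document.

```cpp
#include <array>

// Models one 128-bit XMM register as four float lanes, lane 0 being the lowest.
using Lanes = std::array<float, 4>;

// Scalar model of "unpcklps src, dst": interleave the two low lanes of dst and src.
Lanes unpcklps_model(const Lanes& dst, const Lanes& src)
{
    return Lanes{{dst[0], src[0], dst[1], src[1]}};
}

// Build the register image for the cabsf argument without going through memory.
// Assumption: real part in lane 0, imaginary part in lane 1.
Lanes pack_complex_arg(float re, float im)
{
    const Lanes lo{{re, 0.0f, 0.0f, 0.0f}};   // register already holding the real part
    const Lanes hi{{im, 0.0f, 0.0f, 0.0f}};   // register already holding the imaginary part
    return unpcklps_model(lo, hi);
}
```

In this model `pack_complex_arg(3.0f, 4.0f)` leaves 3 in lane 0 and 4 in lane 1, the same layout that the two `movss` stores followed by the `movq` load build through the stack.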
[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Tue Jan 20 11:06:13 2015
New Revision: 219885

URL: https://gcc.gnu.org/viewcvs?rev=219885&root=gcc&view=rev
Log:
2015-01-20  Richard Biener  rguent...@suse.de

        PR tree-optimization/64410
        * g++.dg/vect/pr64410.cc: Require vect_double.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/vect/pr64410.cc
[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410 Rainer Orth ro at gcc dot gnu.org changed: What|Removed |Added CC||ro at gcc dot gnu.org --- Comment #13 from Rainer Orth ro at gcc dot gnu.org --- The new testcase FAILs on Solaris/SPARC (both 32 and 64-bit): FAIL: g++.dg/vect/pr64410.cc -std=c++11 scan-tree-dump vect vectorized 1 loops in function FAIL: g++.dg/vect/pr64410.cc -std=c++14 scan-tree-dump vect vectorized 1 loops in function FAIL: g++.dg/vect/pr64410.cc -std=c++98 scan-tree-dump vect vectorized 1 loops in function I'm attaching the .vect dump. Rainer
[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #14 from Rainer Orth <ro at gcc dot gnu.org> ---
Created attachment 34496
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34496&action=edit
sparc-sun-solaris2.11 .vect dump
[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to work||5.0 Resolution|--- |FIXED --- Comment #11 from Richard Biener rguenth at gcc dot gnu.org --- Fixed in GCC 5.
[Bug tree-optimization/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Jan 9 11:14:55 2015
New Revision: 219380

URL: https://gcc.gnu.org/viewcvs?rev=219380&root=gcc&view=rev
Log:
2015-01-09  Richard Biener  rguent...@suse.de

        PR tree-optimization/64410
        * tree-ssa.c (non_rewritable_lvalue_p): Allow REALPART/IMAGPART_EXPR
        on the LHS.
        (execute_update_addresses_taken): Deal with that.
        * tree-ssa-forwprop.c (pass_forwprop::execute): Use component-wise
        loads/stores for complex variables.

        * g++.dg/vect/pr64410.cc: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/vect/pr64410.cc
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
    trunk/gcc/tree-ssa.c
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #1)
> There are a number of things that make it complicated.
> 1) gcc doesn't like to vectorize when the number of iterations is not known
> at compile time.

Not an issue, we know it here (it's symbolic)

> 2) gcc doesn't vectorize anything already involving complex or vector
> operations.

Indeed - here the issue is that we have C++ 'complex' aggregate load / store operations:

  _67 = MEM[(const struct complex &)_75];
  __r$_M_value = _67;
...
  _51 = REALPART_EXPR <__r$_M_value>;
  REALPART_EXPR <__r$_M_value> = _104;
...
  IMAGPART_EXPR <__r$_M_value> = _107;
  _108 = __r$_M_value;
  MEM[(struct cx_double *)_72] = _108;

which SRA for some reason didn't decompose as they are not aggregate (well, they are COMPLEX_TYPE). They are not in SSA form either because they are partly written to.

In this case it would have been profitable to SRA __r$_M_value. Eventually this should have been complex lowering's job (but it doesn't try to decompose complex assignments).

> 3) the ABI for complex uses 2 separate double instead of a vector of 2
> double.

I think that's unrelated.

> I believe there are dups at least for 2).
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Status|NEW                           |ASSIGNED
             Blocks|                              |53947
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #1)
> > There are a number of things that make it complicated.
> > 1) gcc doesn't like to vectorize when the number of iterations is not known
> > at compile time.
>
> Not an issue, we know it here (it's symbolic)
>
> > 2) gcc doesn't vectorize anything already involving complex or vector
> > operations.
>
> Indeed - here the issue is that we have C++ 'complex' aggregate load / store
> operations:
>
>   _67 = MEM[(const struct complex &)_75];
>   __r$_M_value = _67;
> ...
>   _51 = REALPART_EXPR <__r$_M_value>;
>   REALPART_EXPR <__r$_M_value> = _104;
> ...
>   IMAGPART_EXPR <__r$_M_value> = _107;
>   _108 = __r$_M_value;
>   MEM[(struct cx_double *)_72] = _108;
>
> which SRA for some reason didn't decompose as they are not aggregate
> (well, they are COMPLEX_TYPE). They are not in SSA form either because
> they are partly written to.

And this forces it to be TREE_ADDRESSABLE. Which means update-address-taken might be a better candidate to fix this. Note that it will still run into the issue that the vectorizer does not like complex types (in loads), nor does it like building complex registers via COMPLEX_EXPR.

After fixing update-address-taken we have

  __r$_M_value_70 = MEM[(const struct complex &)_78];
  _66 = MEM[(const double &)_77];
  _54 = REALPART_EXPR <__r$_M_value_70>;
  _105 = _54 + _66;
  _135 = IMAGPART_EXPR <__r$_M_value_70>;
  _106 = MEM[(const double &)_77 + 8];
  _107 = _106 + _135;
  __r$_M_value_180 = COMPLEX_EXPR <_105, _107>;
  MEM[(struct cx_double *)_76] = __r$_M_value_180;

which we ideally would have converted to piecewise loading / storing, but the vectorizer may also be able to recover here with some twists.

> In this case it would have been profitable to SRA __r$_M_value. Eventually
> this should have been complex lowerings job (but it doesn't try to decompose
> complex assignments).
>
> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
>
> I think that's unrelated.
>
> > I believe there are dups at least for 2).
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34400
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34400&action=edit
update-address-taken fix
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #8 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #1)
> > There are a number of things that make it complicated.
> > 1) gcc doesn't like to vectorize when the number of iterations is not known
> > at compile time.
>
> Not an issue, we know it here (it's symbolic)

IIRC I tried modifying the original code by replacing all complex operations by explicit scalar operations and it failed to vectorize, but worked when replacing the number of iterations by a constant.

> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
>
> I think that's unrelated.

Indeed, it's just that with a different ABI we could have been lucky and stumbled upon the optimal code, almost by accident.
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410 --- Comment #10 from Richard Biener rguenth at gcc dot gnu.org --- Improves runtime from 8.3s to 6.5s (~25%).
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34402
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34402&action=edit
patch to pattern-detect the load/store

This pattern matches real/imagpart uses and single-use complex stores and transforms them to component-wise accesses in forwprop.

Together we vectorize the loop now and produce:

.L28:
        movupd  (%rbx,%rax), %xmm1
        movupd  (%r15,%rax), %xmm0
        addpd   %xmm1, %xmm0
        movups  %xmm0, 0(%r13,%rax)
        addq    $16, %rax
        cmpq    %rax, %rdx
        jne     .L28

Note that we need a runtime alias check to disambiguate things (because std::vector memory cannot be disambiguated statically), and similarly we cannot prove sufficient alignment to use aligned loads/stores.
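A hand-written analogue of such a runtime alias check might look as follows. This is a sketch under stated assumptions and not what the vectorizer emits: `ranges_overlap` and `add_versioned` are hypothetical names, the overlap test is a conservative byte-range comparison, and the fast path relies on the C++11 guarantee that an array of `std::complex<double>` may be accessed as an array of `double`.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>

using cx = std::complex<double>;

// True if the n-element ranges starting at p and q share at least one byte.
bool ranges_overlap(const cx* p, const cx* q, std::size_t n)
{
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    const auto b = reinterpret_cast<std::uintptr_t>(q);
    const std::uintptr_t bytes = n * sizeof(cx);
    return a < b + bytes && b < a + bytes;
}

// Hypothetical source-level picture of loop versioning: check for aliasing once,
// then run either a component-wise loop or the plain complex loop.
void add_versioned(cx* out, const cx* x, const cx* y, std::size_t n)
{
    if (!ranges_overlap(out, x, n) && !ranges_overlap(out, y, n)) {
        // No aliasing: treat the data as 2*n independent doubles.
        double* o = reinterpret_cast<double*>(out);
        const double* a = reinterpret_cast<const double*>(x);
        const double* b = reinterpret_cast<const double*>(y);
        for (std::size_t i = 0; i < 2 * n; ++i)
            o[i] = a[i] + b[i];
    } else {
        // Possible aliasing: keep the original element-wise semantics.
        for (std::size_t i = 0; i < n; ++i)
            out[i] = x[i] + y[i];
    }
}
```

The check is deliberately conservative: an in-place call such as `add_versioned(x, x, y, n)` takes the fallback loop even though that particular case would be safe to run component-wise.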
[Bug target/47540] ARM THUMB crash with complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47540 Joel Sherrill joel at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #10 from Joel Sherrill joel at gcc dot gnu.org --- Based on Sebastian's last comment, I am marking this as resolved/fixed.
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410 David Edelsohn dje at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-12-28 Ever confirmed|0 |1 --- Comment #4 from David Edelsohn dje at gcc dot gnu.org --- Confirmed.
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
There are a number of things that make it complicated.
1) gcc doesn't like to vectorize when the number of iterations is not known at compile time.
2) gcc doesn't vectorize anything already involving complex or vector operations.
3) the ABI for complex uses 2 separate double instead of a vector of 2 double.

I believe there are dups at least for 2).
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #2 from Conrad <conradsand.arma at gmail dot com> ---
(In reply to Marc Glisse from comment #1)
> 3) the ABI for complex uses 2 separate double instead of a vector of 2
> double.

Technically yes, but in practice aren't the 2 separate doubles guaranteed to be consecutive in memory?
[Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Conrad from comment #2)
> (In reply to Marc Glisse from comment #1)
> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
>
> Technically yes, but in practice aren't the 2 separate doubles guaranteed to
> be consecutive in memory?

When the complex is in memory, yes. But passing a complex by value to a function is done with 2 separate registers. And somehow that means the default expansion for complex addition is 2 addsd, whereas the default expansion for vector addition is addpd. Using addpd by default for complex would make some code better (this example would hopefully be optimal without need for any optimization) and some worse, I don't know if there are good benchmarks for complex numbers.

Clang's use of add[ps]d seems based entirely on what is done with the result, as can be seen on:

typedef _Complex double cd;
void f(cd&r,cd x,cd y){
  r=x+y;
}
cd f(cd&x,cd&y,cd&z){
  return x+y+z;
}

(I agree that gcc should be improved, I am not trying to defend the current code generation. And now I'll shut up and let people who actually know the code speak ;-)
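The in-memory layout point can be checked at the source level. The sketch below uses `std::complex<double>` rather than the `_Complex double` of the comment, and its helper names (`real_via_array`, `imag_via_array`, `add_scalarwise`) are hypothetical. It relies only on the C++11 rule that a `std::complex<T>` object may be accessed as an array of two `T`, real part first.

```cpp
#include <complex>

using cx = std::complex<double>;

// Follows from the array-oriented access rule: no padding between or after the parts.
static_assert(sizeof(cx) == 2 * sizeof(double),
              "complex<double> is expected to be exactly two doubles");

// Read the components through the array view instead of real()/imag().
double real_via_array(const cx& z)
{
    return reinterpret_cast<const double(&)[2]>(z)[0];
}

double imag_via_array(const cx& z)
{
    return reinterpret_cast<const double(&)[2]>(z)[1];
}

// Complex addition spelled out as two independent double additions, the
// source-level counterpart of the two addsd instructions mentioned above.
cx add_scalarwise(const cx& x, const cx& y)
{
    return cx(x.real() + y.real(), x.imag() + y.imag());
}
```

Because the two additions are independent and the operands are adjacent in memory, a single packed add over both doubles computes the same result, which is what the addpd form does.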
[Bug c++/64410] New: gcc 25% slower than clang 3.5 for adding complex numbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

            Bug ID: 64410
           Summary: gcc 25% slower than clang 3.5 for adding complex numbers
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: conradsand.arma at gmail dot com

Created attachment 34336
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34336&action=edit
cxaddspeed.cpp

gcc 4.9.2 has worse performance than clang 3.5 when dealing with complex numbers.

Attached is a simple program which adds two vectors with complex numbers. Compiled with -O3 on x86-64 (i7), Fedora 21, gcc 4.9.2 and clang 3.5.0.

$ time ./cxaddspeed_gcc 5000 100
5.364u 0.002s 0:05.36 100.0%

$ time ./cxaddspeed_clang 5000 100
4.417u 0.001s 0:04.41 100.0%

ie. gcc is about 25% slower.

inner loop produced by gcc:

.L52:
        movsd   (%r15,%rax), %xmm1
        movsd   8(%r15,%rax), %xmm0
        addsd   0(%rbp,%rax), %xmm1
        addsd   8(%rbp,%rax), %xmm0
        movsd   %xmm1, (%rbx,%rax)
        movsd   %xmm0, 8(%rbx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L52

inner loop produced by clang:

.LBB0_145:
        movupd  -16(%rbx), %xmm0
        movupd  -16(%rax), %xmm1
        addpd   %xmm0, %xmm1
        movupd  %xmm1, -16(%rdi)
        movupd  (%rbx), %xmm0
        movupd  (%rax), %xmm1
        addpd   %xmm0, %xmm1
        movupd  %xmm1, (%rdi)
        addq    $2, %rbp
        addq    $32, %rbx
        addq    $32, %rax
        addq    $32, %rdi
        addl    $-2, %ecx
        jne     .LBB0_145
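The attachment itself is not reproduced in this thread. As a rough idea of the kind of kernel being discussed, the following is a hypothetical stand-in, not the reporter's code: the function name `add_vectors` is made up, while the type name `cx_double` is borrowed from the GIMPLE dumps quoted in later comments.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cx_double = std::complex<double>;

// Hypothetical stand-in for the benchmark kernel: element-wise addition of two
// vectors of complex numbers. Each iteration loads two complex values, adds
// them and stores one result, which matches the shape of the inner loops above.
std::vector<cx_double> add_vectors(const std::vector<cx_double>& a,
                                   const std::vector<cx_double>& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<cx_double> c(n);
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    return c;
}
```

In the gcc listing each iteration of such a loop is carried out with two scalar addsd instructions, while the clang listing uses one addpd per complex element and unrolls the loop by two.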
[Bug target/47540] ARM THUMB crash with complex numbers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47540 Sebastian Huber sebastian.hu...@embedded-brains.de changed: What|Removed |Added CC||sebastian.huber@embedded-br ||ains.de --- Comment #9 from Sebastian Huber sebastian.hu...@embedded-brains.de --- I cannot reproduce this problem with GCC 4.6.4 and 4.8.1.