[Bug libstdc++/69191] Wrong equality comparison between error_code and error_condition + segfault

2018-01-01 Thread chip at pobox dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69191

--- Comment #16 from Chip Salzenberg  ---
Still happening in 7.2

[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2015-04-02 Thread chip at pobox dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

--- Comment #11 from Chip Salzenberg chip at pobox dot com ---
Indeed, 16 is required by the ABI; see
http://www.x86-64.org/documentation/abi.pdf page 12.  Only the SIMD __m256 is
bigger than 16, and there seems no end to Intel's extensions to SIMD registers,
so holding at 16 seems like the Right Thing.
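
One rough way to check the 16-byte figure on a particular system is to sample what malloc() actually returns. The sketch below is only an illustration and is not taken from the ticket; `observed_malloc_alignment` is a hypothetical helper name, and its result reflects whichever libc the program links against rather than anything the ABI document guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: returns the largest power of two that divides every
// pointer returned by malloc(1) .. malloc(max_size), or 0 if nothing could
// be allocated.  This is an observation about one run, not a guarantee.
std::size_t observed_malloc_alignment(std::size_t max_size) {
    std::vector<void*> live;          // keep blocks alive so addresses differ
    std::uintptr_t bits = 0;
    for (std::size_t n = 1; n <= max_size; ++n) {
        void* p = std::malloc(n);
        if (!p) continue;
        bits |= reinterpret_cast<std::uintptr_t>(p);
        live.push_back(p);
    }
    for (void* p : live) std::free(p);
    // The lowest set bit of the OR of all addresses is the common alignment.
    return static_cast<std::size_t>(bits & (~bits + 1));
}
```

Calling `observed_malloc_alignment(256)` and comparing the result with `2 * sizeof(void*)` gives a quick sanity check of the claim for the local allocator.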


[Bug middle-end/28831] [4.8/4.9/4.10 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter

2014-06-12 Thread chip at pobox dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

--- Comment #24 from Chip Salzenberg chip at pobox dot com ---
In 4.8.2 (Ubuntu trusty), the copy is finally elided.  Good job!  But stack
space is still allocated for the copy that is not made.  So it's not all fixed.


[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2014-06-12 Thread chip at pobox dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

--- Comment #8 from Chip Salzenberg chip at pobox dot com ---
Further research says that the alignment of a malloc(N) will be >= N if there
is a basic type that requires alignment N.  So we may be able to ramp this up
quite a bit.


[Bug rtl-optimization/54585] stack space allocated but never used when calling functions that return structs in registers

2013-09-23 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54585

--- Comment #3 from Chip Salzenberg chip at pobox dot com ---
It's worth it for cache reasons, I believe.  The data cache works better if you
don't spread out the stack data unnecessarily.  More concretely, if the stack
frame can entirely disappear then you also reduce the instruction count. 
That's fewer instructions to dispatch and less icache pressure.


[Bug middle-end/28831] [4.7/4.8/4.9 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter

2013-09-05 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

--- Comment #22 from Chip Salzenberg chip at pobox dot com ---
Anyone?  Bueller?


[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2013-09-05 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

--- Comment #7 from Chip Salzenberg chip at pobox dot com ---
Should this ticket have status CONFIRMED?  Also, I suspect it's been fixed in
trunk...


[Bug rtl-optimization/54585] stack space allocated but never used when calling functions that return structs in registers

2013-09-05 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54585

--- Comment #1 from Chip Salzenberg chip at pobox dot com ---
I'd like to suggest this ticket be at least CONFIRMED what with the code
samples in the ticket.

What will it take to fix this?


[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2013-03-29 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

--- Comment #6 from Chip Salzenberg chip at pobox dot com 2013-03-29 06:05:19
UTC ---
May I have this accepted?


[Bug target/56726] New: i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2013-03-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

             Bug #: 56726
           Summary: i386: MALLOC_ABI_ALIGNMENT is too small (usually)
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com


Observed malloc alignment for the i386 ABI is double POINTER_SIZE.
BITS_PER_WORD, the current default, is usually too small.  (It's right only on
X32.)

Proposed patch:

--- gcc/config/i386/i386.h  (revision 197055)
+++ gcc/config/i386/i386.h  (working copy)
@@ -815,6 +815,14 @@
    x86_field_alignment (FIELD, COMPUTED)
 #endif

+/* The maximum alignment 'malloc' honors.
+
+   This value is taken from glibc documentation for memalign().  It may
+   be up to double the very conservative GCC default.  This should be safe,
+   since even the GCC 4.8 default of BIGGEST_ALIGNMENT usually worked.  */
+
+#define MALLOC_ABI_ALIGNMENT (POINTER_SIZE * 2)
+
 /* If defined, a C expression to compute the alignment given to a
    constant that is being placed in memory.  EXP is the constant
    and ALIGN is the alignment that the object would ordinarily have

[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-03-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

--- Comment #12 from Chip Salzenberg chip at pobox dot com 2013-03-25
19:15:10 UTC ---
Thank you.  I've filed #56726 with a patch to update MALLOC_ABI_ALIGNMENT on
i386.


[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2013-03-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

--- Comment #2 from Chip Salzenberg chip at pobox dot com 2013-03-25 21:35:19
UTC ---
I'm a bit skeptical of that.  Glibc malloc alignment is 2 * sizeof(void*), and
void* in X32 is 32 bits.  Unless X32 code uses the x86_64 libc, I am confused.

PS: Hi, HJ


[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)

2013-03-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

--- Comment #4 from Chip Salzenberg chip at pobox dot com 2013-03-25 22:35:57
UTC ---
If I'm reading that correctly, it seems to agree with my patch.

It looks like MALLOC_ABI_ALIGNMENT of POINTER_SIZE*2 is always either correct
or smaller than necessary, but never too large.  If MALLOC_ABI_ALIGNMENT is
smaller than necessary then optimizations may be missed (depending on the
values).  But if it is too large then performance *will* suffer.  It might even
cause exceptions from unaligned accesses, but i386 is very forgiving, so it'll
just be slower for no apparent reason.

Perhaps the glibc version differences in malloc should be advertised with
__attribute__ on the malloc declarations.  Perhaps a new pragma or attribute is
required to do this 100% right.  But in the meantime I like POINTER_SIZE*2.
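
The asymmetry described in this comment (a claim that is too small only loses optimizations, while a claim that is too large is simply wrong) can be illustrated at the source level with the GCC/Clang builtin `__builtin_assume_aligned`. This is a sketch for illustration only; `sum_assuming_16` and `demo_sum` are hypothetical names, and the builtin is a per-pointer promise rather than the target macro discussed in the ticket.

```cpp
#include <cassert>
#include <cstddef>

// Sums n floats while promising the compiler that `data` is 16-byte aligned.
// If the promise is weaker than reality, the code is merely less optimized;
// if it is stronger than reality, the behavior is undefined.
float sum_assuming_16(const float* data, std::size_t n) {
    const float* p =
        static_cast<const float*>(__builtin_assume_aligned(data, 16));
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// The buffer really is 16-byte aligned, so the promise above holds here.
float demo_sum() {
    alignas(16) float buf[8] = {1.0f, 2.0f, 3.0f, 4.0f,
                                5.0f, 6.0f, 7.0f, 8.0f};
    return sum_assuming_16(buf, 8);
}
```

Passing `buf + 1` instead of `buf` would break the promise, which is the source-level analogue of a MALLOC_ABI_ALIGNMENT that is larger than what malloc really delivers.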


[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-03-22 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

--- Comment #10 from Chip Salzenberg chip at pobox dot com 2013-03-22
20:20:11 UTC ---
Thanks muchly.  Then MALLOC_ABI_ALIGNMENT will need fixing, as Jakub observes,
but that needed to happen anyway.


[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-03-20 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

--- Comment #7 from Chip Salzenberg chip at pobox dot com 2013-03-21 00:31:40
UTC ---
So ... is there still a question of the Right Thing here?  It seems that fixing
MALLOC_ABI_ALIGNMENT for the world, and ensuring that BIGGEST_ALIGNMENT never
affects the ABI, are the actions to take.  If this were done soon we could even
see it fixed for 4.8.0.  Help?


[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-02-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

--- Comment #2 from Chip Salzenberg chip at pobox dot com 2013-02-25 17:51:36
UTC ---
I detected this by observing that inlined strlen() on a malloc'd pointer did
not first do an unaligned prologue.  I expected it to first advance by bytes
until it detected alignment, but it didn't do any of that; it leapt right into
the word-sized optimized loop.

This suggests that the compiler knows that an 8-byte-aligned (say) pointer has
its low seven bits off and will evaporate away any code that depends on them
being nonzero.  Or is the strlen inlining special-cased?


[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-02-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

--- Comment #3 from Chip Salzenberg chip at pobox dot com 2013-02-25 17:54:23
UTC ---
I meant the low three bits off, for a maximum value of seven.  Of course.
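
The relationship between an alignment and the low address bits can be written out directly: for a power-of-two alignment A, an aligned address has its low log2(A) bits clear, and the mask covering those bits is A - 1 (seven for A = 8). The following is a small sketch with hypothetical helper names, not code from GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mask of the address bits that must be zero for a power-of-two alignment.
// For alignment 8 this is 7, i.e. the low three bits.
constexpr std::uintptr_t low_bits_mask(std::size_t alignment) {
    return static_cast<std::uintptr_t>(alignment) - 1;
}

// True if p is a multiple of `alignment` (which must be a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & low_bits_mask(alignment)) == 0;
}

// An 8-byte-aligned buffer passes the check; one byte further in does not.
bool demo_alignment_bits() {
    alignas(8) unsigned char buf[16] = {};
    return is_aligned(buf, 8) && !is_aligned(buf + 1, 8);
}
```

A compiler that has been told a pointer is 8-byte aligned may fold a test like `is_aligned(p, 8)` to true, which is consistent with the missing strlen() prologue observed in the previous comment.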


[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-02-25 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

--- Comment #5 from Chip Salzenberg chip at pobox dot com 2013-02-25 20:02:24
UTC ---
Indeed.  So MALLOC_ABI_ALIGNMENT should perhaps default to the largest
alignment of all the C89 types, with platform overrides as needed?


[Bug c/56434] New: document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT

2013-02-22 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

             Bug #: 56434
           Summary: document that __attribute__((__malloc__)) assumes
                    returned pointer has BIGGEST_ALIGNMENT
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com


The docs say that __attribute__((__malloc__)) only has one effect: informing
the compiler that returned pointers do not alias other pointers.  But reading
the compiler output, and then reading gcc source code, proves that it also has
a second effect: informing the compiler that returned pointers are aligned to
BIGGEST_ALIGNMENT.  To quote expand_call:

  /* The return value from a malloc-like function is a pointer.  */
  if (TREE_CODE (rettype) == POINTER_TYPE)
    mark_reg_pointer (temp, BIGGEST_ALIGNMENT);

This should be added to the documentation.

As a side issue, BIGGEST_ALIGNMENT changes on the i386 target depending on
whether -mavx is specified (128 vs. 256).  Is it really a good idea for gcc to
assume different things about the behavior of malloc() depending on -mavx?  It
seems that perhaps an alignment of 128 should always be conferred on malloc on
the i386 platform, regardless of -mavx?

What would the new target macro be?  SMALLEST_BIGGEST_ALIGNMENT?  :)
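
For user-written allocators, the two properties discussed here (no aliasing, known alignment) can be stated separately and explicitly. The sketch below assumes a GCC or Clang version that supports the `assume_aligned` function attribute alongside `malloc`; `alloc16` and `demo_alloc16` are hypothetical names, and posix_memalign() is used only so that the stated alignment is actually honored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical allocator wrapper.  `malloc` tells the compiler the result
// does not alias other pointers; `assume_aligned(16)` states the alignment
// explicitly instead of leaving it to a target-wide default.
__attribute__((malloc, assume_aligned(16)))
void* alloc16(std::size_t n) {
    void* p = nullptr;
    if (posix_memalign(&p, 16, n) != 0) return nullptr;
    return p;
}

// Allocates once and checks that the promised alignment really holds.
bool demo_alloc16() {
    void* p = alloc16(100);
    if (!p) return false;
    bool ok = (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
    free(p);
    return ok;
}
```

The point of spelling out the alignment is that the promise stays the same regardless of flags such as -mavx, which is the concern raised in the report.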


[Bug rtl-optimization/44194] struct returned by value generates useless stores

2012-09-14 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194

--- Comment #48 from Chip Salzenberg chip at pobox dot com 2012-09-14 
17:23:08 UTC ---
May Shub-Internet not see you as you pass.


[Bug rtl-optimization/54585] New: stack space allocated but never used when calling functions that return structs in registers

2012-09-14 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54585

             Bug #: 54585
           Summary: stack space allocated but never used when calling
                    functions that return structs in registers
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com


Now that bug #44194 is fixed, and a returned structure used as a parameter is
no longer stored unnecessarily, a new bug is visible: a stack frame is being
allocated that is entirely unused.  On x86_64 target with the fix for 44194
backported to the 4.7 branch, this code:


#include <stdint.h>

struct blargh { uint32_t a, b, c; } foo();
void bar(uint32_t a, uint32_t b, uint32_t c);

void func() {
  struct blargh s = foo();
  bar(s.a, s.b, s.c);
}


no longer uses any stack memory at all, but still the function call reserves 24
bytes with subq $24,%rsp and promptly returns it with addq $24,%rsp.   The
generated code looks like this:

func:
        .cfi_startproc
        xorl    %eax, %eax
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        call    foo
        movq    %rax, %rsi
        movl    %eax, %edi
        addq    $24, %rsp
        .cfi_def_cfa_offset 8
        shrq    $32, %rsi
        jmp     bar
        .cfi_endproc


[Bug rtl-optimization/44194] struct returned by value generates useless stores

2012-09-12 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194

--- Comment #44 from Chip Salzenberg chip at pobox dot com 2012-09-12 
23:21:21 UTC ---
Note that the x86 target has been changed in svn to use TImode for 128-bit
structures, and structures bigger than 128 bits may not be passed in registers,
so triggering this bug may be quite different now.


[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode

2012-08-15 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020

--- Comment #39 from Chip Salzenberg chip at pobox dot com 2012-08-15 
09:13:36 UTC ---
Avoiding BLKmode avoids unnecessary spills to memory.  See Bug 28831 and Bug
44194 for examples.


[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter

2012-08-15 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

--- Comment #18 from Chip Salzenberg chip at pobox dot com 2012-08-15 
18:00:39 UTC ---
What will it take to get this fixed?  Pass by value is Big in C++11 style, with
move semantics designed to tie right into the optimization that's being missed
here.

This is sucking a lot for C++.


[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode

2012-08-14 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020

--- Comment #31 from Chip Salzenberg chip at pobox dot com 2012-08-14 
22:46:12 UTC ---
I've tested the attached patch, and I find that it succeeds in preventing the
current missed optimizations in structs passed by value from affecting 128-bit
structs.

IOW: Works for me.  Thanks!


[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter

2012-08-14 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

--- Comment #17 from Chip Salzenberg chip at pobox dot com 2012-08-14 
22:50:01 UTC ---
The patch posted in Bug 20020 prevents missed optimization for 128-bit
structures on x86_64.  So this bug does seem to be all about the BLKmode.


[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode

2012-08-14 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020

--- Comment #32 from Chip Salzenberg chip at pobox dot com 2012-08-14 
23:09:01 UTC ---
More good data: this patch reduces the size of libstdc++.so by .5%

$ size usr/lib/libstdc++.so.6.0.17 /usr/lib/libstdc++.so.6.0.17
   text    data     bss     dec     hex filename
 949608   36200   85088 1070896  105730 usr/lib/libstdc++.so.6.0.17
 955484   36200   85088 1076772  106e24 /usr/lib/libstdc++.so.6.0.17
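
The ".5%" figure can be re-derived from the listing above. The helper below is a hypothetical one-liner for the arithmetic, fed with the `dec` (total) and `text` columns as printed.

```cpp
#include <cassert>

// Percentage by which `after` is smaller than `before`.
double percent_reduction(double before, double after) {
    return (before - after) / before * 100.0;
}

// Values copied from the `size` output: unpatched vs. patched library.
double total_size_reduction() { return percent_reduction(1076772.0, 1070896.0); }
double text_size_reduction()  { return percent_reduction(955484.0, 949608.0); }
```

Both columns differ by the same 5876 bytes, since data and bss are unchanged; that is roughly 0.55% of the total size and roughly 0.6% of the text segment.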


[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode

2012-08-06 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020

Chip Salzenberg chip at pobox dot com changed:

           What    |Removed                     |Added

                 CC|                            |chip at pobox dot com

--- Comment #13 from Chip Salzenberg chip at pobox dot com 2012-08-06 
22:52:41 UTC ---
Is this bug obsolete now?


[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter

2012-08-05 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

Chip Salzenberg chip at pobox dot com changed:

           What    |Removed                     |Added

                 CC|                            |chip at pobox dot com

--- Comment #15 from Chip Salzenberg chip at pobox dot com 2012-08-06 
00:37:36 UTC ---
Ping.  I've just run into this with the tip of the gcc 4.7.1 branch.  Is there
a workaround?  Some way to label the struct as not needing to be stored? 
Something like __attribute__((noaddress));

We want to pass and return structs by value as current C++ style recommends,
but the extra register spills are dragging down performance.  For small key
classes we've switched to using big integers with masking functions, but for
larger ones there is no workaround that we know of.

Given this code:

extern val_t foo();
extern int bar(val_t);
int main() {
    return bar(foo());
}

When val_t is a struct of two int64_t on x86_64, the code has two extra stores:
   movq    %rax, (%rsp)
   movq    %rdx, 8(%rsp)
and the stack frame is larger and there is no tail call optimization.

When val_t is __int128 on x86_64, the code is optimal: tail call, no extra
stores, smaller stack frame (because there is no need to store the value).
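
To reproduce the comparison, the two variants of val_t can be placed side by side in one file. The sketch below is an assumption-laden reconstruction: the struct layout, the function bodies, and every name ending in `_struct` or `_int128` are hypothetical, `noinline` stands in for the `extern` declarations of the original so the calls survive optimization, and `__int128` is a GCC/Clang extension available on 64-bit targets.

```cpp
#include <cassert>
#include <cstdint>

// Variant 1: a struct of two int64_t, the case reported to cause extra stores.
struct val_struct { std::int64_t lo, hi; };

__attribute__((noinline)) val_struct foo_struct() { return val_struct{1, 2}; }
__attribute__((noinline)) int bar_struct(val_struct v) {
    return static_cast<int>(v.lo + v.hi);
}
int call_struct() { return bar_struct(foo_struct()); }

// Variant 2: the same 128 bits as a single integer, the case reported optimal.
typedef __int128 val_int128;

__attribute__((noinline)) val_int128 foo_int128() {
    return (static_cast<val_int128>(2) << 64) | 1;
}
__attribute__((noinline)) int bar_int128(val_int128 v) {
    return static_cast<int>(v & 0xff) + static_cast<int>(v >> 64);
}
int call_int128() { return bar_int128(foo_int128()); }
```

Compiling with something like `g++ -O2 -S` and comparing the bodies of `call_struct` and `call_int128` shows whether a given compiler version still emits the stores and the stack adjustment described above.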


[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter

2012-08-05 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

--- Comment #16 from Chip Salzenberg chip at pobox dot com 2012-08-06 
00:57:13 UTC ---
Addendum: In cut down test cases where I only pass by value or only return by
value, but not both, I find no extra stores, which is good; but I still find a
lot of unnecessary frame allocation (either $24 or $40, depending), and tail
call is still missing.


[Bug rtl-optimization/44194] struct returned by value generates useless stores

2012-08-05 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194

Chip Salzenberg chip at pobox dot com changed:

           What    |Removed                     |Added

                 CC|                            |chip at pobox dot com

--- Comment #42 from Chip Salzenberg chip at pobox dot com 2012-08-06 
01:22:43 UTC ---
Is bug #28831 a dup of this?


[Bug libstdc++/54075] [4.7.1] unordered_map 3x slower than 4.6.2

2012-07-26 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075

--- Comment #16 from Chip Salzenberg chip at pobox dot com 2012-07-26 
22:50:17 UTC ---
In my tests, with this patch, 4.7.1 is about 10% slower than 4.6 ... a vast
improvement but certainly not parity.

./bench46  1.75s user 0.82s system 99% cpu 2.577 total
./bench47  8.01s user 2.78s system 99% cpu 10.800 total
./bench47+patch  1.95s user 0.80s system 99% cpu 2.764 total


[Bug libstdc++/54075] [4.7.1] unordered_map insert 3x slower than 4.6.2

2012-07-26 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075

--- Comment #18 from Chip Salzenberg chip at pobox dot com 2012-07-26 
23:38:34 UTC ---
I couldn't say.  I don't understand the issue, I'm just reporting results and
deploying packages for my fellow devs.


[Bug libstdc++/54075] [4.7.1] unordered_map insert 3x slower than 4.6.2

2012-07-26 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075

--- Comment #20 from Chip Salzenberg chip at pobox dot com 2012-07-27 
01:00:14 UTC ---
Are you talking to me?  'cause I was providing results for the patch already
committed to svn, using the code in this very bug description.


[Bug libstdc++/54025] New: atomic<chrono::duration> won't compile: chrono::duration::duration() is not C++11 compliant

2012-07-18 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54025

             Bug #: 54025
           Summary: atomic<chrono::duration> won't compile:
                    chrono::duration::duration() is not C++11 compliant
    Classification: Unclassified
           Product: gcc
           Version: 4.7.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com


Attempting to compile atomic<duration> fails, because the duration default
constructor is not = default as required by the standard, but instead
explicitly initializes its representation.  Here is what libstdc++ says:

constexpr duration() : __r() { }

but here is what the standard says should be there, and if I make the change,
compilation succeeds:

constexpr duration() = default;

Test source:

#include <atomic>
#include <chrono>
using namespace std;
using namespace chrono;
int main() {
    atomic<duration<long, micro> > dur;
}

Error before patch:

/usr/include/c++/4.7/atomic: In instantiation of ‘struct
std::atomic<std::chrono::duration<long int, std::ratio<1ll, 1000000ll> > >’:
atdur.cc:6:35:   required from here
/usr/include/c++/4.7/atomic:160:7: error: function ‘std::atomic<_Tp>::atomic()
[with _Tp = std::chrono::duration<long int, std::ratio<1ll, 1000000ll> >]’
defaulted on its first declaration with an exception-specification that differs
from the implicit declaration ‘constexpr
std::atomic<std::chrono::duration<long int, std::ratio<1ll, 1000000ll> >
>::atomic()’
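
The difference between the two constructor spellings can be observed with type traits, without involving std::atomic at all. The sketch below uses two hypothetical stand-in structs rather than the real chrono::duration, and assumes that the error stems from the user-provided constructor not being noexcept, as the wording about an exception-specification suggests.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a duration whose default constructor is defaulted.
struct defaulted_rep {
    defaulted_rep() = default;
    long r;
};

// Stand-in for a duration whose default constructor is user-provided and
// value-initializes the representation, like `duration() : __r() { }`.
struct user_rep {
    user_rep() : r() {}
    long r;
};

// A defaulted constructor is trivial and implicitly noexcept here.
static_assert(std::is_trivially_default_constructible<defaulted_rep>::value, "");
static_assert(std::is_nothrow_default_constructible<defaulted_rep>::value, "");

// A user-provided constructor without noexcept is neither.
static_assert(!std::is_trivially_default_constructible<user_rep>::value, "");
static_assert(!std::is_nothrow_default_constructible<user_rep>::value, "");

// Run-time view of the same facts, for use in a quick check.
bool defaulted_is_nothrow() {
    return std::is_nothrow_default_constructible<defaulted_rep>::value;
}
bool user_provided_is_nothrow() {
    return std::is_nothrow_default_constructible<user_rep>::value;
}
```

A wrapper whose own default constructor is declared noexcept and defaulted would then disagree with the implicit declaration for a member like `user_rep`, which matches the shape of the diagnostic quoted above.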


[Bug libstdc++/54025] atomic<chrono::duration> won't compile: chrono::duration::duration() is not C++11 compliant

2012-07-18 Thread chip at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54025

--- Comment #1 from Chip Salzenberg chip at pobox dot com 2012-07-19 02:56:57 
UTC ---
Created attachment 27829
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27829
patch to duration default ctor