[Bug c++/99429] ICE for bool return from <=>

2021-03-06 Thread msharov at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99429

Mike Sharov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #50315|0                           |1
        is obsolete|                            |

--- Comment #1 from Mike Sharov  ---
Created attachment 50316
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50316&action=edit
incorrect duration

[Bug c++/99429] New: ICE for bool return from <=>

2021-03-06 Thread msharov at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99429

Bug ID: 99429
   Summary: ICE for bool return from <=>
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

Created attachment 50315
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50315&action=edit
std::strong_ordering and incorrect duration code

When erroneously declaring <=> to return bool, g++ crashes:

> g++ -c -std=c++20 chrono.cc
chrono.cc: In instantiation of ‘class duration<1>’:
chrono.cc:44:42:   required from here
chrono.cc:38:20: internal compiler error: Segmentation fault
   38 | constexpr bool operator<=> (const duration& d) const = default;
  |^~~~

[Bug c/98404] New: Compiler emits unexpected function call that may cause security problems

2020-12-20 Thread msharov at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98404

Bug ID: 98404
   Summary: Compiler emits unexpected function call that may cause
security problems
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

int rotate_argv (const char** argv, int first, int mid, int end)
{
    const char** p = argv+first;
    int n1 = mid-first;
    int n2 = end-mid;
    int nm = n1+n2-1;
    for (int j = 0; j < n2; ++j) {
        const char* v = p[nm];
        for (int i = 0; i < nm; ++i)
            p[nm-i] = p[nm-i-1];
        p[0] = v;
    }
    return n1;
}

This bit of code unexpectedly emits a call to memmove to replace the inner copy
loop. Such behavior is highly inappropriate, breaking the
"what-you-see-is-what-you-get" spirit of C. Sure, the loop is equivalent to a
memmove call, but if I wanted to call memmove, I would have called memmove.
Doing it behind my back brings in code paths that may cause problems impossible
to understand by looking at the code. Worse yet, the compiler only does this in
the optimized build (-Os, -O2, and -O3, but not -O1 or -O0), making debugging
of the resulting problem a beat-your-head-on-the-desk frustrating exercise.

The bug in my code was causing memory corruption in argv to happen in that
inner loop, but looking at the code above will not reveal the problem, no
matter how much you scream at the debugger. The bug was in my memmove
implementation returning the wrong value, which the compiler then helpfully
reloaded into p. Naturally, it's a good thing that I fixed the bug; having
never used the return value of memmove myself I doubt I would have discovered
it anytime soon. But this illustrates how a malicious exploit could be
introduced into that loop without anybody being able to figure it out. Let's
remember that we still have that LD_PRELOAD abomination.

On a more mundane note, replacing the loop with memmove causes the compiled
code to grow from 107 bytes to 166. This is using the -Os switch, of course. I
have complained many times about how gcc doesn't care about size optimization
and doesn't inline stuff because it can't understand that inserting a function
call into code that currently has none has great costs of register saving and
all that. I have by now resigned myself to having to #define inline inline
__attribute__((always_inline)) everywhere, but will you perhaps someday
reconsider your position that size optimization does not matter? If 55% code
bloat in this example doesn't convince you, what will?

Finally, calling memmove will make the code slower, not faster, due to its much
higher startup overhead that is justifiable for copying large blocks, but not
for copying one or two elements, which is what the code above is made for. The
conceit of the compiler, in thinking it knows better, thus results in worse
outcome all around; in size, speed, and security.
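
To make the substitution concrete, here is a minimal sketch of the inner loop
next to the memmove call it is equivalent to. The names shift_up_loop,
shift_up_memmove, and check_shift_helpers are made up for this illustration and
do not appear in the report. GCC documents -fno-tree-loop-distribute-patterns
as the switch that controls replacing loops with library calls; whether it
covers this exact loop in 10.2.0 is an assumption to verify against the
generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: the inner loop from the report. It shifts
// p[0..nm-1] up by one slot, so p[1..nm] ends up holding the old values.
inline void shift_up_loop (const char** p, int nm)
{
    for (int i = 0; i < nm; ++i)
        p[nm-i] = p[nm-i-1];
}

// Hypothetical helper: the same overlapping copy written as an explicit call.
inline void shift_up_memmove (const char** p, int nm)
{
    if (nm > 0)
        std::memmove (p+1, p, static_cast<std::size_t>(nm) * sizeof(p[0]));
}

// Usage sketch: both helpers leave the array in the same state.
inline void check_shift_helpers ()
{
    const char *w = "w", *x = "x", *y = "y", *z = "z";
    const char* a[4] = { w, x, y, z };
    const char* b[4] = { w, x, y, z };
    shift_up_loop (a, 3);
    shift_up_memmove (b, 3);
    for (int i = 0; i < 4; ++i)
        assert (a[i] == b[i]);
}
```

Note that the result of memmove is not used in the explicit version; the report's
problem arose because the compiler-generated call did rely on it.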

[Bug tree-optimization/93896] New: Store merging uses SSE only for trivial types

2020-02-23 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93896

Bug ID: 93896
   Summary: Store merging uses SSE only for trivial types
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

struct M {
    constexpr M() :p{},sz{},cap{}{}
public:
    char* p;
    unsigned sz;
    unsigned cap;
};

struct A { M a,b,c; A(); };
A::A() :a{},b{},c{}{}

gcc 9.2.1 with -march=native -Os on Haswell generates:

_ZN1AC2Ev:
movq    $0, (%rdi)
movq    $0, 8(%rdi)
movq    $0, 16(%rdi)
movq    $0, 24(%rdi)
movq    $0, 32(%rdi)
movq    $0, 40(%rdi)
ret

Store merging is obviously working here, but does not use SSE movups. If the
constructor is removed or defaulted the output is:

_ZN1AC2Ev:
vpxor   %xmm0, %xmm0, %xmm0
vmovups %xmm0, (%rdi)
vmovups %xmm0, 16(%rdi)
vmovups %xmm0, 32(%rdi)
ret

Whether the type is trivial should not matter by the time store merging occurs,
but for some reason it does.
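
The distinction drawn here can be checked with the standard type traits. The
sketch below uses two made-up struct names, MUser and MPlain, which are not in
the report: the first keeps the user-provided constructor, the second drops it.
Both end up with the same bytes after initialization, which is why a difference
in the generated stores is surprising.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical name: M as in the report, with its user-provided constructor.
struct MUser {
    constexpr MUser() : p{}, sz{}, cap{} {}
    char* p;
    unsigned sz;
    unsigned cap;
};

// Hypothetical name: the same members with the constructor removed.
struct MPlain {
    char* p;
    unsigned sz;
    unsigned cap;
};

static_assert (!std::is_trivial<MUser>::value, "user-provided ctor makes the type non-trivial");
static_assert (std::is_trivial<MPlain>::value, "without it the type is trivial");
static_assert (sizeof(MUser) == sizeof(MPlain), "same size either way");

// Usage sketch: both objects are fully zeroed.
inline void check_zeroed ()
{
    MUser u;        // runs the user-provided constructor
    MPlain v {};    // value-initialization zeroes every member
    assert (u.p == nullptr && u.sz == 0 && u.cap == 0);
    assert (v.p == nullptr && v.sz == 0 && v.cap == 0);
}
```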

[Bug rtl-optimization/91482] New: __builtin_assume_aligned should not break write combining

2019-08-18 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91482

Bug ID: 91482
   Summary: __builtin_assume_aligned should not break write
combining
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

void write64 (void* p)
{
    unsigned* p1 = (unsigned*) __builtin_assume_aligned (p, 8);
    *p1++ = 0;
    unsigned* p2 = (unsigned*) __builtin_assume_aligned (p1, 4);
    *p2++ = 1;
}

When the two stores are written without __builtin_assume_aligned, they are
coalesced into a single movq store. The code above, however, results in two
movl stores, even though the new information provided by
__builtin_assume_aligned does not prevent combination.
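
For comparison, a sketch of the two variants side by side. The names
write64_plain, write64_hinted, and check_write64 are made up for this
illustration. The two functions store the same bytes; only the instruction
selection is expected to differ, and that has to be checked in the assembly
output rather than at run time.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical name: the two stores without any alignment hints.
inline void write64_plain (void* p)
{
    unsigned* q = static_cast<unsigned*>(p);
    *q++ = 0;
    *q++ = 1;
}

// Hypothetical name: the version from the report, with both hints.
inline void write64_hinted (void* p)
{
    unsigned* p1 = static_cast<unsigned*>(__builtin_assume_aligned (p, 8));
    *p1++ = 0;
    unsigned* p2 = static_cast<unsigned*>(__builtin_assume_aligned (p1, 4));
    *p2++ = 1;
}

// Usage sketch: both variants produce identical memory contents.
inline void check_write64 ()
{
    alignas(8) unsigned a[2] = { 7, 7 };
    alignas(8) unsigned b[2] = { 7, 7 };
    write64_plain (a);
    write64_hinted (b);
    assert (std::memcmp (a, b, sizeof a) == 0);
    assert (a[0] == 0 && a[1] == 1);
}
```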

[Bug c++/85875] New: -Weffc++ can't understand auto return values

2018-05-22 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85875

Bug ID: 85875
   Summary: -Weffc++ can't understand auto return values
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

struct C {
struct const_iterator {
auto& operator++() { return *this; }
};
};

Compiling with -Weffc++ gives warning:

t.cc:3:24: warning: prefix ‘auto& C::const_iterator::operator++()’
 should return ‘C::const_iterator&’ [-Weffc++]  

even though auto& evaluates to C::const_iterator&.
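
The deduced type can be confirmed at compile time. This is a small sketch, not
part of the original report; check_prefix_increment is a made-up name.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct C {
    struct const_iterator {
        auto& operator++() { return *this; }
    };
};

// The return type deduced for auto& is exactly C::const_iterator&.
static_assert (std::is_same<
    decltype (++std::declval<C::const_iterator&>()),
    C::const_iterator&>::value,
    "auto& deduces to C::const_iterator&");

// Usage sketch: prefix increment returns a reference to the same object.
inline void check_prefix_increment ()
{
    C::const_iterator it;
    assert (&++it == &it);
}
```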

[Bug tree-optimization/85697] At -Os nontrivial ctor does not use SSE to zero

2018-05-22 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85697

--- Comment #2 from Mike Sharov  ---
I previously filed bug #49127 about the non-SSE version of the same xor/mov
optimization. Perhaps both could be addressed in the same manner with a more
general capability of zeroing with a register when doing so is shorter.

[Bug c++/85858] -Weffc++ should not require copy ctor for const pointers

2018-05-22 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85858

--- Comment #12 from Mike Sharov  ---
(In reply to Jonathan Wakely from comment #10)
> It's simply not how C++ works.

Quite right. I already agreed with you here; we are arguing about whether it
/should/ work this way :)

> An object's lifetime is distinct from it's constness, and a pointer-to-const 
> doesn't imply anything about whether the pointed-to object is immutable.

Exactly! I can restate my gripe in these terms: C++ has no way of explicitly
marking the owner of the object or its lifetime. When f() creates object const
A a and passes it as const A* to g(), both f() and g() see the same const A
object, but f() is the owner and should be allowed to delete it, while g() has
only been granted read-only access and should not. If delete required a
non-const pointer, then f() would either keep a non-const pointer to indicate
that it owns a, or have to explicitly const_cast it to delete.

> You seem to be saying that a pointer-to-const implies
> an immortal object that will never be destroyed.

Not at all. Object lifetime is a separate subject, but const correctness should
help enforce it by restricting who gets to set it. Ideally, the object will
have exactly one owner (insert rant on the evils of shared_ptr), and that owner
will determine the lifetime of the object. If const prevented delete, the
compiler could help you catch violations of the one-owner rule that may
compromise defined object lifetime and cause undefined behavior in functions
that hold pointers to that object.

A function can only assume that the pointer it was given remains valid if the
object lifetime is explicitly known, and there is no explicit C++ way of making
it known. We can only define the lifetime in documentation. For example:

> Why should that be true for pointers to the heap
> but not pointers to the stack?

Because the stack frees all owned objects when the scope is exited and the heap
does not. The stack will call destructors to cleanup the objects, the heap will
not. Consequently the stack can be said to be the owner of local objects, but
the heap owns nothing because it destroys nothing.

[Bug c++/85858] -Weffc++ should not require copy ctor for const pointers

2018-05-21 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85858

--- Comment #8 from Mike Sharov  ---
(In reply to Jonathan Wakely from comment #7)
> Your mental model of C++ is simply not how the language works.

My mental model here is actually of const correctness, not C++ specifically.
When I pass around a const object I expect it to stay unmodified. Consider a
function that takes a const T* argument. The signature suggests that the
passed-in object will only be read and will not be modified. If that function
deletes the pointer, "bad things" will likely happen. Suddenly the object
contains garbage and you can't figure out how it happened. All called functions
took it as const, so they shouldn't have messed with it. You assume memory
corruption and onto valgrind we go.

> The const qualifier affects modification of the object, not its destruction.

This is precisely the part I have a problem with. It seems downright loony to
state that destruction is not modification when the object is most definitely
modified by it. It's like saying that if I break into your house and take
stuff, I am a criminal, but if I burn it down, it's all perfectly fine.

> void f() { const int i = 0; }
> 
> Do you think this stack variable can't be destroyed, because it's const?

What it boils down to is this: const restricts access, and so prevents
ownership. If you can't destroy a thing, you don't really own it. The standard
appears to have taken the position that ownership beats const correctness. I
instead argue in favor of const correctness, and its guarantee of invariance.

The stack variable in the example illustrates the difference between access and
ownership. f() has read-only access to i, and therefore does not own it. Who
owns i? The stack does. The stack passes i to f() with limited access, and
then, when f() terminates, the stack destroys i. This way ownership is clearly
delineated. If f() were to say delete i (assuming i were a pointer), it should
be prevented from doing so.

>   using const_int = const int;
>   const int* p = new const_int();
> 
> Do you expect to never be able to delete this object,
> instead being forced to leak it?

Consequently, const objects created with new are owned by nobody, and simply do
not make sense. Somebody has to own the allocated object, so creating a const
object should be an invalid operation.

I suppose it doesn't really matter what my opinion is in this matter. Neither I
nor you write the standard, so I'll just leave this as a closing footnote in a
bug correctly resolved INVALID.

[Bug c++/85858] -Weffc++ should not require copy ctor for const pointers

2018-05-21 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85858

--- Comment #6 from Mike Sharov  ---
(In reply to Jonathan Wakely from comment #5)
> Nope, see the C++ standard:
> 
>   [ Note: A pointer to a const type can be the operand of a
> delete-expression;

Ok, I guess; you have to follow the standard, after all. But I would like to
see the rationale for this, because it sure looks like a violation of const
correctness. I certainly feel violated.

[Bug c++/85858] -Weffc++ should not require copy ctor for const pointers

2018-05-21 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85858

--- Comment #4 from Mike Sharov  ---
(In reply to Jonathan Wakely from comment #3)
> Nothing stops you deallocating a const pointer.

According to http://en.cppreference.com/w/cpp/memory/new/operator_delete
The delete operator takes a void* and attempting to delete a const pointer
would require a const_cast. This is logical, since freeing a memory block is a
modification operation that changes the block's contents by marking it invalid.

To my surprise, I found that g++ actually does currently accept delete of a
const pointer. I believe that should be a bug.
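
A minimal sketch of the behavior in question. The names Counted, destroy, and
check_delete_through_const are made up for this illustration. It compiles
without a const_cast, which matches the standard's note quoted in the follow-up
comments of this thread.

```cpp
#include <cassert>

// Hypothetical type that counts live instances.
struct Counted {
    static int& live () { static int n = 0; return n; }
    Counted () { ++live(); }
    ~Counted () { --live(); }
};

// Takes a pointer-to-const and destroys the object anyway.
inline void destroy (const Counted* p) { delete p; }

// Usage sketch: the destructor runs even though only a const pointer was held.
inline void check_delete_through_const ()
{
    const Counted* p = new Counted;
    assert (Counted::live() == 1);
    destroy (p);
    assert (Counted::live() == 0);
}
```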

[Bug c++/85858] -Weffc++ should not require copy ctor for const pointers

2018-05-21 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85858

--- Comment #2 from Mike Sharov  ---
(In reply to Jonathan Wakely from comment #1)
> (In reply to Mike Sharov from comment #0)
> > When the pointer is const, it can not point to owned memory
> Why not?

Because a const pointer can not be freed. By "owned memory" I mean memory that
was explicitly allocated by the object, which I assume was the situation that
Effective C++ rule was referring to, or memory the ownership of which was
passed to the object. In both cases the object has to keep a non-const pointer
in order to be able to free it or to pass on the ability to free it to some
other object. I can't think of any case for an owned const pointer; can you?

[Bug c++/85858] New: -Weffc++ should not require copy ctor for const pointers

2018-05-21 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85858

Bug ID: 85858
   Summary: -Weffc++ should not require copy ctor for const
pointers
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

-Weffc++ warns about missing operator= and copy ctor in a class containing a
const pointer. The intent of the warning is to detect manually allocated memory
owned by the class, and to ensure copying operation was explicitly considered.
When the pointer is const, it can not point to owned memory and so should not
result in a warning.

[Bug c++/85856] New: -Weffc++ can't see implicitly deleted constructor

2018-05-21 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85856

Bug ID: 85856
   Summary: -Weffc++ can't see implicitly deleted constructor
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

#include <stdio.h>

struct A {
    A (void) {}
    virtual ~A (void) {}
    A (const A&) = delete;
    void operator= (const A&) = delete;
};

struct B : public A {
    B (const char* p) :A(),_p(p) {}
    const char* _p;
};

int main (void)
{
    B b ("hello");
    puts (b._p);
    return 0;
}

When compiling with -Weffc++ enabled, generates:

t.cc:10:8: warning: ‘struct B’ has pointer data members [-Weffc++]
 struct B : public A {
^
t.cc:10:8: warning:   but does not override ‘B(const B&)’ [-Weffc++]
t.cc:10:8: warning:   or ‘operator=(const B&)’ [-Weffc++]

B already has an implicitly deleted copy constructor and operator= because A
implements them deleted. The compiler will correctly give a warning about it on
B b2=b, for example.
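
The implicitly deleted members can also be confirmed with type traits instead of
a failing copy. A small sketch using the classes from the report;
check_b_usable is a made-up name.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct A {
    A () {}
    virtual ~A () {}
    A (const A&) = delete;
    void operator= (const A&) = delete;
};

struct B : public A {
    B (const char* p) : A(), _p(p) {}
    const char* _p;
};

// B declares neither member, yet both are unusable because A deletes them.
static_assert (!std::is_copy_constructible<B>::value, "B(const B&) is implicitly deleted");
static_assert (!std::is_copy_assignable<B>::value, "operator=(const B&) is implicitly deleted");

// Usage sketch: B still works as a non-copyable holder of a borrowed pointer.
inline void check_b_usable ()
{
    B b ("hello");
    assert (std::strcmp (b._p, "hello") == 0);
}
```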

[Bug c/80354] Poor support to silence -Wformat-truncation=1

2018-05-09 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80354

--- Comment #9 from Mike Sharov  ---
(In reply to Martin Sebor from comment #8)
> A simple way to avoid the warning while also avoiding bugs resulting from
> unhandled truncation is to detect it and abort if it happens, e.g.

First of all, you might want to mention this in the error message. The way it
is presently worded gives the impression that the only way to remove the
warning is to increase the buffer size. I guarantee you that most people will
just turn off the warning in this case. And then come here to complain, because
the kind of warning that is wrong in most cases (if only in our opinion) should
not be in -Wall.

Secondly, this is precisely the annoying part about it: you are making the
decision that allowing truncation to happen is always a bug and forcing it to
be handled as one. I do not consider it a problem to pass a truncated filename
to open and having it fail there. There are, naturally, some cases where this
could cause a security problem, but I am the one who should determine whether
each particular snprintf is one of those cases, and consequently I should also
have the option to tell the compiler that it is not. If I was ok with bloating
my program due to an excessive concern with safety, I'd be using Java, not C.
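
For reference, truncation can be detected from the return value of snprintf and
then handled or ignored as the caller sees fit. The sketch below uses a made-up
helper named join_path and a made-up check function; it is not code from either
commenter, and whether a given GCC version stays quiet about it under
-Wformat-truncation is not something this example establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: joins dir and file into buf and reports whether the
// whole result fit. snprintf NUL-terminates the buffer whenever n > 0.
inline bool join_path (char* buf, std::size_t n, const char* dir, const char* file)
{
    const int r = std::snprintf (buf, n, "%s/%s", dir, file);
    return r >= 0 && static_cast<std::size_t>(r) < n;
}

// Usage sketch: the caller decides what a truncated result means.
inline void check_join_path ()
{
    char buf[8];
    assert (join_path (buf, sizeof buf, "a", "b"));
    assert (std::strcmp (buf, "a/b") == 0);
    // "longdir/file" needs 13 bytes, so only 7 characters plus the terminator fit.
    assert (!join_path (buf, sizeof buf, "longdir", "file"));
    assert (std::strcmp (buf, "longdir") == 0);
}
```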

[Bug c/80354] Poor support to silence -Wformat-truncation=1

2018-05-09 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80354

Mike Sharov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |msharov at users dot sourceforge.net

--- Comment #7 from Mike Sharov  ---
I really do have to add my complaint about this one. Can't we have another
override option here? Have the compiler parse "truncates" in a comment, for
example, like it does for fallthrough. Doing format precision is not a good
workaround because it hardcodes the size of the buffer into the format string,
creating a maintenance problem in case the buffer size is increased later. Not
to mention unnecessarily creating multiple format strings where previously a
single one could have been shared. Why make us all create unnecessarily larger
executables?

Worse, truncation is always going to be a false positive here. Nobody wants to
choose buffer size based on worst case output. Sometimes it is merely useless,
such as when writing diagnostic messages. 8k of text won't fit in a message box
anyway and will be truncated. Other times it is distinctly wrong. For example,
if building a path from multiple components in PATH_MAX sized buffers, the
result must not be larger than PATH_MAX anyway, and must be truncated. Another
example is when you are trying to get a prefix from a large string. snprintf is
a great way of doing that, but your warning may now lead people to rewrite the
code with strncpy and its insecure behavior, possibly forgetting that it always
requires explicitly terminating the buffer.

Sure, it is just another warning to fix. I've had to fix some new warning with
every gcc release. Not a single one of them was an actual problem with the
code. It's always just "the way we've got to do things from now on", having to
write each code construct in a particular way to avoid a warning. A 100% false
positive rate is annoying, isn't it? Yet, I keep all warnings on, for some
strange reason. Can't we all be friends and always have a polite way of saying
"I know what I am doing here"?

[Bug target/85697] New: At -Os nontrivial ctor does not use SSE to zero

2018-05-08 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85697

Bug ID: 85697
   Summary: At -Os nontrivial ctor does not use SSE to zero
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

struct alignas(16) A {
    A (void) :a(0),b(0),c(0),d(0) {}
    int a,b,c,d;
};
__attribute__((noinline)) void UseA (A& a) { a.a=1; }

int main (void)
{
    A a {};
    UseA (a);
    return a.a;
}

-Os -march=native on Haswell, generates:

main:
subq    $16, %rsp
movq    %rsp, %rdi
movq    $0, (%rsp)
movq    $0, 8(%rsp)
call    _Z4UseAR1A
movl    (%rsp), %eax
addq    $16, %rsp
ret

Using 16 bytes to zero A with 2 movq. With -O3:

main:
subq    $24, %rsp
vpxor   %xmm0, %xmm0, %xmm0
movq    %rsp, %rdi
vmovaps %xmm0, (%rsp)
call    _Z4UseAR1A
movl    (%rsp), %eax
addq    $24, %rsp
ret

using only 9 bytes for pxor/movaps. With -mno-avx it is 7 bytes for
xorps/movaps. With multiple objects of type A, the savings would be even
greater, since only one pxor would be needed for all and only 4 bytes per
object for zeroing.

Removing A constructor also results in SSE instruction use.

[Bug c++/85695] New: if constexpr misevaluates typedefed type value

2018-05-08 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85695

Bug ID: 85695
   Summary: if constexpr misevaluates typedefed type value
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

template <typename T, T v>
struct integral_constant {
    using value_type = T;
    static constexpr const value_type value = v;
    constexpr operator value_type (void) const { return value; }
};
template <typename T> struct is_trivial
    : public integral_constant<bool, __is_trivial(T)> {};

template <typename T>
T clone_object (const T& p)
{
    if constexpr (is_trivial<T>::value)
        return p;
    else
        return p.clone();
}
int main (void) { return clone_object(0); }

This fails to compile: "error: request for member ‘clone’ in ‘p’". The strange
part is that changing the type of integral_constant::value to T makes it work,
as does using is_trivial<T>() in the conditional, invoking the cast operator.
For some reason, value_type is evaluated differently if it is a variable or
return value, and differently from T.
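
A sketch of the cast-operator workaround mentioned above, extended with a
non-trivial type so that both branches get exercised. The namespace demo, the
type Cloneable, and check_clone_object are made up for this illustration; the
trait definitions follow the report. It assumes a compiler running in C++17
mode or later.

```cpp
#include <cassert>

namespace demo {   // hypothetical namespace, to stay clear of std::

template <typename T, T v>
struct integral_constant {
    using value_type = T;
    static constexpr const value_type value = v;
    constexpr operator value_type (void) const { return value; }
};
template <typename T> struct is_trivial
    : public integral_constant<bool, __is_trivial(T)> {};

// Hypothetical non-trivial type with a clone() member.
struct Cloneable {
    int v;
    explicit Cloneable (int x) : v(x) {}
    ~Cloneable () {}    // user-provided, so the type is not trivial
    Cloneable clone () const { return Cloneable (v); }
};

template <typename T>
T clone_object (const T& p)
{
    // Workaround from the report: go through the conversion operator
    // instead of naming ::value.
    if constexpr (is_trivial<T>())
        return p;
    else
        return p.clone();
}

// Usage sketch: int takes the first branch, Cloneable the second.
inline void check_clone_object ()
{
    assert (clone_object (7) == 7);
    assert (clone_object (Cloneable (3)).v == 3);
}

} // namespace demo
```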

[Bug c++/85689] New: if constexpr compiles false branch

2018-05-07 Thread msharov at users dot sourceforge.net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85689

Bug ID: 85689
   Summary: if constexpr compiles false branch
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net
  Target Milestone: ---

int main (void)
{
    if constexpr (false)
        static_assert (false, "this should not be compiled");
    return 0;
}

g++ 8.1 fails compiling the branch with the static_assert even though if
constexpr condition is false. May be the same as #85149, but still present in
g++ 8.1.0 on Arch.
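
As far as I understand the C++17 rules, a discarded branch is only left
uninstantiated inside a template, and a static_assert whose condition does not
depend on a template parameter is checked regardless, so the behavior above may
be conforming for that language version rather than a compiler bug. A commonly
used workaround, sketched here with the made-up names dependent_false,
size_class, and check_size_class, makes the condition depend on a template
parameter:

```cpp
#include <cassert>

// Hypothetical helper: always false, but only known once T is substituted.
template <typename T>
struct dependent_false { static constexpr bool value = false; };

// Hypothetical example function: the assertion fires only if the last
// branch is actually instantiated for some T.
template <typename T>
constexpr int size_class ()
{
    if constexpr (sizeof(T) == 1)
        return 0;
    else if constexpr (sizeof(T) <= 8)
        return 1;
    else
        static_assert (dependent_false<T>::value, "unsupported size");
}

// Usage sketch: neither call instantiates the static_assert branch.
inline void check_size_class ()
{
    assert (size_class<char>() == 0);
    assert (size_class<char[2]>() == 1);
}
```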

[Bug target/59578] New: Overuse of v prefix for SSE instructions

2013-12-22 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59578

Bug ID: 59578
   Summary: Overuse of v prefix for SSE instructions
   Product: gcc
   Version: 4.8.2
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net

typedef float v16sf __attribute__((vector_size(16)));
v16sf f (v16sf x)
{ return (__builtin_ia32_shufps (x, x, 0xff)); }

Compiled on a Haswell 4770 with -march=native -O emits:

vshufps $255, %xmm0, %xmm0, %xmm0

Even though all registers are the same and

shufps $255, %xmm0, %xmm0

would have worked just as well without the extra byte for the v prefix.
This happens with other __builtin instructions as well. For example:

typedef long long v16so __attribute__((vector_size(16)));
v16so k (v16so x)
{ return (__builtin_ia32_aeskeygenassist128 (x, 1)); }

Emits vaeskeygenassist even though no memory accesses are present.


[Bug target/57288] cfi_restore should precede cfi_def_cfa_offset

2013-11-10 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57288

--- Comment #2 from Mike Sharov msharov at users dot sourceforge.net ---
(In reply to Andrew Pinski from comment #1)
> Can you attach the preprocessed source which is used to create this assembly
> file?

I'm afraid not. This call has been created by a gigantic collection of
templates, macros, and inline functions, so is too large to attach. Furthermore,
when compiled with the current gcc 4.8.2, the .cfi directives are entirely
different, with no .cfi_restore instructions emitted. If you really can't
figure out what the cause was, I'd have to wait until I see another function
showing the behavior.


[Bug rtl-optimization/23684] Combine stores for non strict alignment targets

2013-05-18 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684

--- Comment #12 from msharov at users dot sourceforge.net ---
I'd like to add that this is not some corner case; this is a very common issue.
In my own projects, the compiler's inability to combine stores is the single
largest reason for using inline assembly and raw casts. Pretty much every time
I have an object 8 or 16 bytes in size, I end up writing a zeroing ctor, copy
ctor, and operator= that use full-object memory access. That's cast to uint64_t
for 8 bytes, and movups/movaps for 16 bytes. It also shows up when writing raw
protocol data, such as X calls, where it is very common to write several
constants in succession. The last time I checked, forcing whole-object moves in
these cases resulted in a project-wide code size reduction of ~10%.
Unfortunately, it also causes a variety of aliasing pessimizations, so I also
have to test including or not including each of the above functions to get the
smallest code size. It would be a very big deal if the optimizer could do this.
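
A minimal sketch of the whole-object access described above, for a made-up
8-byte type named Pair. It swaps in std::memcpy through a uint64_t temporary in
place of the raw pointer cast the comment mentions, since memcpy sidesteps the
aliasing question. Compilers commonly lower such a fixed-size memcpy to a single
8-byte move, but that is an assumption to confirm in the generated assembly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Pair { std::int32_t a, b; };   // hypothetical 8-byte object
static_assert (sizeof(Pair) == sizeof(std::uint64_t), "whole-object access assumes 8 bytes");

// Zero the object with one 8-byte store instead of two 4-byte stores.
inline void zero_whole (Pair& p)
{
    const std::uint64_t z = 0;
    std::memcpy (&p, &z, sizeof p);
}

// Copy the object as a single 8-byte value.
inline void copy_whole (Pair& dst, const Pair& src)
{
    std::uint64_t tmp;
    std::memcpy (&tmp, &src, sizeof tmp);
    std::memcpy (&dst, &tmp, sizeof tmp);
}

// Usage sketch.
inline void check_whole_object ()
{
    Pair p = { 1, 2 };
    Pair q = { 3, 4 };
    copy_whole (p, q);
    assert (p.a == 3 && p.b == 4);
    zero_whole (p);
    assert (p.a == 0 && p.b == 0);
}
```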


[Bug rtl-optimization/57302] New: Should merge zeroing multiple consecutive memory locations

2013-05-16 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57302

Bug ID: 57302
   Summary: Should merge zeroing multiple consecutive memory
locations
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net

struct A { short a,b; A (void); };
A::A (void) : a(0),b(0) {}
void MoveA (const A* a, A* b) { *b = *a; }

Generates:

_ZN1AC2Ev:
movw    $0, (%rdi)
movw    $0, 2(%rdi)
ret
_Z5MoveAPK1APS_:
movl    (%rdi), %eax
movl    %eax, (%rsi)
ret

The optimizer can see that a and b are consecutive in memory and can merge the
memory movs into a single 4-byte mov, but does not do the same for the zeroing
code in the constructor. Merging the zeroing to movl, movq, and mov[au]ps (when
SSE is available), would produce smaller code and fewer memory accesses.


[Bug target/57288] New: cfi_restore should precede cfi_def_cfa_offset

2013-05-15 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57288

Bug ID: 57288
   Summary: cfi_restore should precede cfi_def_cfa_offset
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msharov at users dot sourceforge.net

Created attachment 30122
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30122&action=edit
The emitted assembly exhibiting the ordering problem

This is on x86_64, compiled with -Os. In the attached assembly, line 89, .L55,
.cfi_restore is emitted for ebx and ebp after .cfi_def_cfa_offset 8 already
invalidated the location where they were stored. cfa_offset should be emitted
after cfi_restores, as it was in the other codepaths like .LEHE0-.L51


[Bug rtl-optimization/56598] New: Optimizer can't invert conditional when inlining a bool function

2013-03-11 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56598

             Bug #: 56598
           Summary: Optimizer can't invert conditional when inlining a
                    bool function
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: msha...@users.sourceforge.net

Created attachment 29640
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29640
Simple test case and its asm

When a bool macro like #define X(i) (i==3) is replaced with a static inline
bool function, the optimizer sometimes generates tangled control flow. With
code blocks ABCD the jumps go A-C-B-D. If the macro is used, the compiler
can invert the conditional and emit ACBD with two fewer jmps. In the attached
test case, func1 has .L3-.L4 block that is jumped into, out of, and over.
Swapping the .L2-.L3 and .L3-.L4 blocks would produce the simpler control flow
in func2. The test case is compiled with -Os, on x86_64. -O2 and -O3 also
produce the same behavior, but require a larger test case to avoid path
unrolling. C++ compiler must be used. The same test case compiled as C produces
identical func1 and func2.
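
The attached test case is not reproduced here. As a rough sketch of the kind of
pair being compared, the following shows a predicate written once as a macro and
once as a static inline bool function. All names (IS_THREE, is_three,
count_with_macro, count_with_function, check_counts) are made up for this
illustration, and the code is not claimed to trigger the reported control flow.

```cpp
#include <cassert>

#define IS_THREE(i) ((i) == 3)                           // macro form
static inline bool is_three (int i) { return i == 3; }   // inline function form

// Hypothetical caller using the macro.
inline int count_with_macro (const int* v, int n)
{
    int c = 0;
    for (int i = 0; i < n; ++i)
        if (IS_THREE (v[i]))
            ++c;
    return c;
}

// Hypothetical caller using the inline function; semantically identical.
inline int count_with_function (const int* v, int n)
{
    int c = 0;
    for (int i = 0; i < n; ++i)
        if (is_three (v[i]))
            ++c;
    return c;
}

// Usage sketch: both forms must agree on every input.
inline void check_counts ()
{
    const int v[] = { 3, 1, 3, 4, 3 };
    assert (count_with_macro (v, 5) == 3);
    assert (count_with_function (v, 5) == 3);
}
```

Compiling both callers with -Os and comparing the assembly is the way to see
whether the two forms still produce different block ordering.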


[Bug c++/56583] New: ICE with constexpr ctor and nested structs and unions

2013-03-09 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56583

             Bug #: 56583
           Summary: ICE with constexpr ctor and nested structs and unions
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: msha...@users.sourceforge.net

Created attachment 29632
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29632
The code causing the failure

An ICE occurs in a constexpr constructor where a member is initialized that is
in an anonymous member union containing an anonymous struct. See attached file.
Compiling with g++ -std=c++11 -c tes.cc yields:

tes.cc: In function 'int main()':
tes.cc:23:21:   in constexpr expansion of 'r.CRect::CRect(1, 2, 3, 4)'
tes.cc:23:21: internal compiler error: in base_field_constructor_elt, at
cp/semantics.c:7033
Please submit a full bug report,
with preprocessed source if appropriate.

[Bug libgcc/56277] New: libgcc.a and libgcc_eh.a should be compiled with function-level linking

2013-02-10 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56277

             Bug #: 56277
           Summary: libgcc.a and libgcc_eh.a should be compiled with
                    function-level linking
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: libgcc
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: msha...@users.sourceforge.net

libgcc.a and libgcc_eh.a are not compiled with -ffunction-sections
-fdata-sections. libsupc++.a is compiled with those flags. Because a typical
program will not use much of libgcc, enabling function level linking should
noticeably reduce size of statically linked executables.


[Bug libgcc/56277] libgcc.a and libgcc_eh.a should be compiled with function-level linking

2013-02-10 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56277

--- Comment #3 from msharov at users dot sourceforge.net 2013-02-11 00:33:03 UTC ---
(In reply to comment #2)
> Do you think you have an example where adding -ffunction-sections can help for
> the compiled libgcc.a ?

No, sorry. Checking this would require manually recompiling libgcc with
-ffunction-sections and trying it out on my projects. Since I haven't compiled
gcc in quite a long time and have forgotten all the tricks for doing it, this
will be a lot of work. So, if you say you already do all you can to minimize
what is linked in from there, I'll take your word for it.


[Bug c++/53380] .ehframe could be smaller

2012-05-22 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53380

--- Comment #3 from msharov at users dot sourceforge.net 2012-05-22 11:21:41 
UTC ---
> Did -fno-asynchronous-unwind-tables do what you wanted it to do?  In that
> disable the unwinding tables when not using exceptions?

No, it did not. For example:

#include <stdio.h>
int calculate (int x, int y) { return (x * y); }
void print (void) { printf ("%d", calculate(1,2)); }
int main (void) { print(); return (0); }

g++ -Os -c -fno-asynchronous-unwind-tables -o tes.o tes.cc
readelf --debug-dump=frames tes.o

Contents of the .eh_frame section:

 0014  CIE
  Version:   1
  Augmentation:  zR
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data: 1b

  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8
  DW_CFA_nop
  DW_CFA_nop

0018 0010 001c FDE cie= pc=..0006
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

002c 0010 0030 FDE cie= pc=0006..0017
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

0040 0014 0044 FDE cie= pc=..000a
  DW_CFA_advance_loc: 1 to 0001
  DW_CFA_def_cfa_offset: 16
  DW_CFA_advance_loc: 8 to 0009
  DW_CFA_def_cfa_offset: 8
  DW_CFA_nop

As you can see, all three functions still have unwind entries emitted.
According to documentation I saw on the web, -fno-asynchronous-unwind-tables
increases unwind information granularity to function-level, meaning that it
supposedly avoids stepping cfa unless there is a function call there. While I
don't regularly read the unwind tables, I was under the impression that this was
happening by default.


[Bug c++/53380] .ehframe could be smaller

2012-05-22 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53380

--- Comment #5 from msharov at users dot sourceforge.net 2012-05-22 18:53:32 
UTC ---
(In reply to comment #4)
> Adding default handling if there is no FDE is an ABI change, so can't be done
> on existing architectures (except those that have that in their ABI already).

I was not suggesting doing it by default. A switch like
-fno-asynchronous-unwind-tables would be perfectly acceptable.

> Why do you care about the .eh_frame size so much?

I don't care so much. I was merely suggesting a way of making it smaller, for
the use of the same people who prefer to use -Os instead of -O3. Yes, I probably
could write a post-link tool to do this, but it would be much more work since I
would have to implement a disassembler, etc. to find out which functions do not
need unwind info. In the compiler you already have all that info in the parse
tree, so it is just a matter of adding a couple of if statements.

> If you don't use the unwind info, it often won't be even paged in, or can be
> discarded from RAM if needed.

Removing unnecessary entries would make lookup faster. Making .eh_frame smaller
would also help the exception path by having less to page in.

> And if you need it, it better be accurate.

It would still be accurate. Making the common case default does not remove any
useful information.


[Bug c++/53380] New: .ehframe could be smaller

2012-05-16 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53380

 Bug #: 53380
   Summary: .ehframe could be smaller
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: msha...@users.sourceforge.net


The unwinding information can take up a significant (10%+) chunk of a C++
executable, and reducing its size can help make executables smaller. On x86_64
in particular, the need to include .eh_frame erases code size reductions
acquired when porting 32-bit code, and gives the incorrect impression that
64-bit code is larger. Fortunately, it is possible to make the unwinding table
smaller.

The entries for functions without local variables are basically empty, the
information contained being largely the offset and size of the function body.
If this type of entry were made the default action, all of them could be
removed. Then any function not found in the table would be assumed to have the
return address on top of the stack. Validity checking can still be performed by
verifying that the return address is in .text. Another optimization is to
remove entries for functions that do not throw; i.e. declared as noexcept or
not calling any other functions that are not noexcept.

Just doing these two things should remove more than half the table in a typical
executable. Of course, unwinding information is also used by the debugger and
backtrace, so the table should be left alone by default. It would be nice to
have an option to turn these optimizations on explicitly.
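
To make the proposed lookup rule concrete, here is a toy model of it. This is
only a sketch under stated assumptions: Entry, Table, lookup and ra_offset are
hypothetical names invented for this example, and the structure is a
simplified stand-in for the real FDE format, not the libgcc unwinder.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one unwind entry: a code range
// plus the stack offset at which the return address is stored.
struct Entry {
    std::uintptr_t begin, end;  // [begin, end) of the function body
    int ra_offset;              // where to find the return address
};

struct Table {
    std::uintptr_t text_begin, text_end;  // bounds of .text
    std::vector<Entry> entries;           // sorted by begin; trivial entries omitted

    // Returns true and sets ra_offset if pc can be unwound.
    bool lookup(std::uintptr_t pc, int& ra_offset) const {
        // Find the last entry whose begin is <= pc.
        auto it = std::upper_bound(entries.begin(), entries.end(), pc,
            [](std::uintptr_t p, const Entry& e) { return p < e.begin; });
        if (it != entries.begin() && pc < (it - 1)->end) {
            ra_offset = (it - 1)->ra_offset;  // explicit entry found
            return true;
        }
        // Proposed default: no entry, but pc lies inside .text, so assume
        // the return address is on top of the stack.
        if (pc >= text_begin && pc < text_end) {
            ra_offset = 0;
            return true;
        }
        return false;  // pc is outside .text: treat as invalid
    }
};
```

The range check against .text is what keeps the default rule from accepting
arbitrary addresses.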


[Bug rtl-optimization/52888] New: Unable to inline function pointer call with inexact signature match

2012-04-06 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52888

 Bug #: 52888
   Summary: Unable to inline function pointer call with inexact
signature match
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: msha...@users.sourceforge.net


#include <stdio.h>

struct A {
    template <typename T>
    static inline __attribute__((always_inline))
    void Caller (T* pn, void (T::*pm)(void))
    { (pn->*pm)(); }

    void Call (int i) {
        if (i == 1) Caller(this, &A::Func1);
        else if (i == 2) Caller(this, &A::Func2);
    }
    inline void Func1 (void)            { puts ("Func1"); }
    inline void Func2 (void) noexcept   { puts ("Func2"); }
};

int main (void) { A a; a.Call(1); a.Call(2); return (0); }
-
Compiling with: g++ (GCC) 4.7.0 20120324 (prerelease), x86_64
 g++ -O -std=c++11 a.cc
a.cc: In function 'int main()':
a.cc:5:55: error: inlining failed in call to always_inline 'static void
A::Caller(T*, void (T::*)()) [with T = A]': mismatched arguments
a.cc:10:42: error: called from here
a.cc:5:55: error: inlining failed in call to always_inline 'static void
A::Caller(T*, void (T::*)()) [with T = A]': mismatched arguments
a.cc:10:42: error: called from here
-
I'm using always_inline to force the error; without it, Caller is silently
not inlined.

The problem occurs when the function pointer has an inexact signature match to
the pointed-to function. In the above example, Func2 has a noexcept tacked on to
it. In more complex examples involving pointers to functions with arguments,
using a typedef of a type in the argument list results in this error, while
using the actual type works (typedef A a_t; void good(A); void bad(a_t)).
So the compiler is able to explicitly convert an inexact match like A::Func2
to void(A::*)(void) when instantiating the template, but the inliner is not
able to make the same match even though it should have the same information.
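
The "inexact match" cases described above can be sketched as follows. This is
an illustration of the pattern only, not a confirmed reproducer for the
inlining failure; Arg, arg_t, Handler, quiet and call are names invented for
this example.

```cpp
struct Arg { int v; };
typedef Arg arg_t;  // alias for the same type

struct Handler {
    int good (Arg a)            { return a.v; }      // parameter spelled with the class name
    int bad (arg_t a)           { return a.v + 1; }  // parameter spelled with the typedef
    int quiet (Arg a) noexcept  { return a.v + 2; }  // extra noexcept on the declaration
};

// The front end accepts all three member functions for the same
// pointer-to-member parameter type.
template <typename T>
int call (T* obj, int (T::*pm)(Arg), Arg a)
{ return (obj->*pm)(a); }
```

All three calls, call(&h, &Handler::good, a), call(&h, &Handler::bad, a) and
call(&h, &Handler::quiet, a), instantiate the same template with T = Handler,
which is why one would expect the inliner to treat them alike.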


[Bug rtl-optimization/49127] New: -Os generates constant mov instead of instruction xor and mov when zeroing

2011-05-23 Thread msharov at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49127

   Summary: -Os generates constant mov instead of instruction xor
and mov when zeroing
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: msha...@users.sourceforge.net


void zero (void* p) { *reinterpret_cast<ulong*>(p) = 0; }

Generates:

48 c7 07 00 00 00 00    movq   $0x0,(%rdi)

This is shorter by 2 bytes:

31 c0                   xor    %eax,%eax
48 89 07                mov    %rax,(%rdi)

And can be reused in further assignments of zero for more savings:

31 c0                   xor    %eax,%eax
48 89 07                mov    %rax,(%rdi)
48 89 47 04             mov    %rax,0x4(%rdi)