[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-02-18 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

Dzianis Kahanovich  changed:

           What                |Removed |Added

  Attachment #47753 is obsolete|0       |1

--- Comment #102 from Dzianis Kahanovich  ---
Created attachment 47874
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47874&action=edit
additional aligning on demand <10.0 (fixed)

Sorry (my mistake): while playing with 10.0 I lost track, and the 9.2 patch was
broken. Here is the fixed one.

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-31 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #101 from Dzianis Kahanovich  ---
Created attachment 47754
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47754&action=edit
additional aligning on demand 10.0 (unsure)

This is the same for gcc 10.0 and is not fully verified.

It MUST work in gcc 10.0, but in current git the options help shows that nothing
changed:
gcc -Q -O3 -m32 -march=core2 --help=target --help=optimizers | grep 'stackrealign\|cost-model'

Looks like a deep rework of option behaviour is in progress.

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-31 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #100 from Dzianis Kahanovich  ---
Created attachment 47753
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47753&action=edit
additional aligning on demand <10.0

Finally (for me), if somebody thinks the patch by H.J. Lu is not enough, here
is my patch to force "-fvect-cost-model=cheap -fsimd-cost-model=cheap" or
"-mstackrealign" on demand. The default is the first, as it involves no ABI
violation, but if ENABLE_STACKREALIGN_ABI_VIOLATION=1 is defined, the first
choice will be "-mstackrealign". This is for gcc <10.0 and verified.

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-16 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #99 from Dzianis Kahanovich  ---
PPS: About some hidden considerations, purely in theory: "*cost-model=cheap" can
reduce SSE usage, while -mstackrealign can increase function prologue/epilogue
overhead. In my case the x7-Z8700 CPU has 2 FPU cores for 4 CPU cores
(silvermont-1 has even fewer FPUs), so this solution looks clearly better than
"-mstackrealign". But on other CPUs things may differ, and performance needs
to be tested.

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-16 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #98 from Dzianis Kahanovich  ---
Correction: "I have not tried to rebuild the 32-bit "world" without ANY
workaround" on modern gcc (everything is now built with 9.2). My previous
experiments were a long time and many versions ago, so many other new
factors/fixes may have solved most issues.

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-16 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #97 from Dzianis Kahanovich  ---
No. Look into gcc/opts.c, the "-O3 optimizations" section, at this line:
{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },

So for -O3 it's "dynamic". And, per the manual, "cheap" takes more care about
alignment. But anyway, I have not tried to rebuild the 32-bit "world" without
ANY workaround, so it is all still dirty ;)

PS: For some options the configuration behaviour is still non-linear, so
querying "gcc -Q ..." is still an unreliable way to check some defaults...

(In reply to Viktor Ostashevskyi from comment #96)
> Honestly, I don't see how your compiler flags could help. cost-model=cheap
> is default, data-alignment doesn't change incoming stack alignment.
> 
> On Wed, 15 Jan 2020 at 14:31, mahatma at eu dot by <
> gcc-bugzi...@gcc.gnu.org> wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
> >
> > --- Comment #95 from Dzianis Kahanovich  ---
> > Just FYI: nowadays, on my Thinkpad tablet with an Atom (32-bit userspace
> > Gentoo),
> > I have globally replaced the patch/-mstackrealign with "-fvect-cost-model=cheap
> > -fsimd-cost-model=cheap -malign-data=cacheline" and everything works fine for
> > -O3+.
> > (This is a dirty example, as the cache line size differs on some old SSE CPUs,
> > etc).

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-16 Thread ostash at ostash dot kiev.ua
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #96 from Viktor Ostashevskyi  ---
Honestly, I don't see how your compiler flags could help. cost-model=cheap
is default, data-alignment doesn't change incoming stack alignment.

On Wed, 15 Jan 2020 at 14:31, mahatma at eu dot by <
gcc-bugzi...@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
>
> --- Comment #95 from Dzianis Kahanovich  ---
> Just FYI: nowadays, on my Thinkpad tablet with an Atom (32-bit userspace
> Gentoo),
> I have globally replaced the patch/-mstackrealign with "-fvect-cost-model=cheap
> -fsimd-cost-model=cheap -malign-data=cacheline" and everything works fine for
> -O3+.
> (This is a dirty example, as the cache line size differs on some old SSE CPUs,
> etc).

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-15 Thread mahatma at eu dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #95 from Dzianis Kahanovich  ---
Just FYI: nowadays, on my Thinkpad tablet with an Atom (32-bit userspace Gentoo),
I have globally replaced the patch/-mstackrealign with "-fvect-cost-model=cheap
-fsimd-cost-model=cheap -malign-data=cacheline" and everything works fine for -O3+.
(This is a dirty example, as the cache line size differs on some old SSE CPUs, etc).

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-13 Thread ostash at ostash dot kiev.ua
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #94 from Viktor Ostashevskyi  ---
(In reply to Florian Weimer from comment #93)
> (In reply to Viktor Ostashevskyi from comment #92)
> > I've tried to run some old binaries yesterday (StarOffice 5.1, got it from
> > archive.org) and hit this bug.
> > 
> > What are possible workarounds?
> 
> You need to use an operating system which was built with -mstackrealign,
> such as Fedora.

Indeed, I can confirm that rebuilding 32-bit libraries with '-mstackrealign' on
Gentoo helps.

The bug can probably be closed as WONTFIX. Additionally, it would be nice to have
this ABI breakage properly documented somewhere (GCC FAQ?).

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-13 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

--- Comment #93 from Florian Weimer  ---
(In reply to Viktor Ostashevskyi from comment #92)
> I've tried to run some old binaries yesterday (StarOffice 5.1, got it from
> archive.org) and hit this bug.
> 
> What are possible workarounds?

You need to use an operating system which was built with -mstackrealign, such
as Fedora.
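
-mstackrealign applies to a whole build. For a single entry point that may be
called from legacy code, GCC also documents an x86 function attribute,
force_align_arg_pointer, that realigns the stack in that one function's
prologue. The following is only a minimal sketch of that narrower approach;
the function name legacy_callback and the macro REALIGN_ENTRY are hypothetical
and do not come from this thread.

```c
#include <stdint.h>

/* REALIGN_ENTRY is an illustrative macro (an assumption, not a GCC name).
   It expands to the real x86 attribute only where that attribute applies. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ENTRY
#endif

/* Hypothetical entry point that old binaries or hand-written asm might call
   with a stack pointer that is only 4-byte aligned. */
REALIGN_ENTRY int legacy_callback(int x)
{
    /* A local that asks for 16-byte alignment, as an SSE spill slot would. */
    _Alignas(16) unsigned char scratch[16] = {0};
    scratch[0] = (unsigned char)x;

    /* Returns 1 when the local really is 16-byte aligned. */
    return ((uintptr_t)scratch & 15u) == 0;
}
```

This only protects the annotated function and whatever it calls. Library
functions that legacy code calls directly still need the library itself to be
rebuilt with -mstackrealign, as described above.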

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2020-01-13 Thread ostash at ostash dot kiev.ua
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

Viktor Ostashevskyi  changed:

           What    |Removed |Added

                 CC|        |ostash at ostash dot kiev.ua

--- Comment #92 from Viktor Ostashevskyi  ---
I've tried to run some old binaries yesterday (StarOffice 5.1, got it from
archive.org) and hit this bug.

What are possible workarounds?

[Bug target/40838] gcc shouldn't assume that the stack is aligned

2019-10-30 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

Peter Cordes  changed:

           What    |Removed |Added

                 CC|        |peter at cordes dot ca

--- Comment #91 from Peter Cordes  ---
This bug should be closed as "resolved fixed".  The "fix" was to change the ABI
doc and break existing hand-written asm, and old binaries.  This was
intentional and resulted in some pain, but at this point it's a done deal.



My attempt at a summary of the current state of affairs for 32-bit x86 calling
conventions (on Linux and elsewhere):

Yes, the version of the i386 System V ABI used on Linux really did change
between gcc2.8 and gcc8.  Those compilers are not ABI-compatible with each
other.  This is a known fact.  Hand-written asm that makes function calls with
misaligned stack pointers is violating the (updated) ABI, and was also
knowingly broken by this change.


(Perhaps unintentionally at first, with stack alignment intended to just
provide a performance benefit, not a correctness issue.  But the resolution
ended up being to standardize on 16-byte alignment matching x86-64 System V,
instead of reverting to the old ABI and breaking compat with new binaries that
had started to rely on 16-byte incoming alignment, or adding significant
overhead to every function that didn't know how both its caller and callee were
compiled, i.e. most functions.  Using MOVUPS instead of MOVAPS everywhere
wouldn't work well because it would mean no folding of memory operands into ALU
instructions: without AVX's VEX encoding, paddd xmm0, [mem] requires aligned
mem.  And existing binaries that rely on incoming 16-byte alignment weren't
doing that.)
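
To make the alignment requirement concrete, here is a small sketch of the
check that a legacy-SSE aligned memory operand implicitly relies on. The
helper names misalignment and ok_for_aligned_sse are illustrative assumptions,
not anything from GCC or the ABI document.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes by which p lies past the previous boundary-byte boundary.
   boundary must be a power of two; a result of 0 means p is aligned. */
static inline size_t misalignment(const void *p, size_t boundary)
{
    return (size_t)((uintptr_t)p & (boundary - 1));
}

/* Non-zero if p could be used as the memory operand of movaps, or of a
   non-VEX instruction such as paddd xmm0, [mem], without an alignment fault. */
static inline int ok_for_aligned_sse(const void *p)
{
    return misalignment(p, 16) == 0;
}
```

A stack slot that the compiler places at a fixed offset from a 16-byte-aligned
incoming stack pointer passes this check. If the caller only kept the old
4-byte alignment, the same slot can be 4, 8 or 12 bytes off.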


An earlier comment also mentioned common arrays: the ABI also requires arrays
larger than 16 bytes to have 16-byte alignment.



Perhaps unnecessary pain for little real benefit: i386 on Linux has been mostly
obsolete for a long time, and the inefficient stack-args calling convention was
never changed.  It's ironic that Linux broke ABI compat for i386 in the name of
more efficient SSE-usage despite not caring to introduce anything like Windows
fastcall or vectorcall (efficient register-args calling conventions).

(GCC does have ABI-changing -mregparm=3 and -msseregparm to pass integers in
regs, and pass/return FP values in XMM registers (instead of passing on the
stack / returning in x87 st0).  But no distros have switched over to using that
calling convention for i386 binaries, AFAIK.  The Linux kernel does use regparm
for 32-bit kernel builds.)

Even more ironic, probably a lot of 32-bit code is compiled without -msse2
(because one of the main reasons for using 32-bit code is CPUs too old for
x86-64, which is about the same vintage as SSE2).  SSE usage can still happen
with runtime dispatching in binaries that are compatible with old machines
while still being able to take advantage of new ones.


But in most cases, if you want performance you use x86-64 kernel + user-space,
or maybe x32 user-space (ILP32 in 64-bit mode) to get modern calling
conventions and the benefit of twice as many registers.  x86-64 System V has
mandated 16-byte stack alignment from the start.  (I don't know the history,
but perhaps i386 code-gen started assuming / depending on it for correctness,
not just performance, by accident because of devs being used to x86-64?)

The 32-bit ABI on some other OSes, including i386 *BSD and 32-bit Windows, has
*not* changed; presumably gcc there doesn't rely on incoming stack alignment. 
(It might try to propagate 16-byte alignment for performance benefits, though.)

My understanding is that i386 MacOS still uses a version of i386 System V that
doesn't include the 16-byte stack alignment update, like other *BSDs.


(In reply to Harald van Dijk from comment #90)
> compile
> 
>   void exit(int);
>   int main(void) { exit(0); }
> 
> with GCC 2.8, compile current glibc with GCC 8, and there will be a segfault
> in glibc's __run_exit_handlers because GCC 2.8 never kept the stack
> 16-byte-aligned, but GCC 8 does now generate code which assumes it.
>
> For the moment, I've rebuilt glibc with -mincoming-stack-boundary=2 to handle 
> the problem well enough for my current needs, but it's not a complete 
> solution.

Yes, you need workarounds like this to change modern GCC's ABI back to legacy
4-byte.

Note that you might break atomicity of C11 _Atomic 8-byte objects even outside
structs by doing this, if they split across a cache line (Intel) or possibly
narrower (AMD) boundary.  But only if they were stack allocated.
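
As a rough illustration of the split case, the sketch below tests whether an
object crosses a boundary of a given size. The helper name crosses_boundary is
hypothetical, and the 64-byte line size used in the usage note is an
assumption about typical hardware, not something stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Non-zero if the size-byte object starting at addr crosses a line-byte
   boundary. line must be a power of two and size must be non-zero. */
static inline int crosses_boundary(uintptr_t addr, size_t size, size_t line)
{
    return (addr & (line - 1)) + size > line;
}
```

With only 4-byte stack alignment, an 8-byte object can start at an address
that is 4 modulo 8. For example, crosses_boundary(60, 8, 64) is true, while an
8-byte-aligned object at 56 ends exactly at the boundary and does not cross it.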