[Bug other/59648] -O2 compilation of xorg-server-1.15.0 fails

2014-01-01 Thread nheghathivhistha at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

--- Comment #2 from David Kredba nheghathivhistha at gmail dot com ---
I am sorry, but the Xorg developers say this is a GCC problem:

https://bugs.freedesktop.org/show_bug.cgi?id=71127

-O2 compilation is widely used in distributions, so in my opinion this issue
needs to be resolved on one side or the other.

Could you please re-open this, or say what is wrong with the Xorg-server
sources?

Thank you in advance.


[Bug other/59648] -O2 -flto compilation of xorg-server-1.15.0 fails

2014-01-01 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

Andrew Pinski pinskia at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |WAITING
   Last reconfirmed|                            |2014-01-01
         Resolution|INVALID                     |---
            Summary|-O2 compilation of          |-O2 -flto compilation of
                   |xorg-server-1.15.0 fails    |xorg-server-1.15.0 fails
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski pinskia at gcc dot gnu.org ---
This might still not be a GCC bug.

Note the first warning is definitely not a GCC bug and should be fixed in the
sources:
1.15.0/xkb/xkbInit.c:690:22: warning: type of 'XkbDfltAccessXOptions' does not
match original declaration [enabled by default]
 extern unsigned char XkbDfltAccessXOptions;
  ^
/var/tmp/portage/x11-base/xorg-server-1.15.0/work/xorg-server-1.15.0/xkb/xkbAccessX.c:58:16:
note: previously declared here
 unsigned short XkbDfltAccessXOptions =
^

--- CUT ---
You are going to have to try to reduce the testcase.  Also just this one file
is not enough to reproduce the issue as this happens only with -flto it seems.
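
The first warning comes down to a declaration and a definition that disagree on the type. A minimal single-file sketch of the consistent form follows, assuming the `unsigned short` definition from xkbAccessX.c is the intended type. The initial value and the helper function are hypothetical, since the original initializer is cut off in the warning output.

```c
#include <stddef.h>

/* Declaration as it might appear in a shared header (the placement is an
   assumption): the type now matches the definition below. */
extern unsigned short XkbDfltAccessXOptions;

/* Definition; the value 0 is a placeholder, not the xorg-server default. */
unsigned short XkbDfltAccessXOptions = 0;

/* Hypothetical helper: reports the object size every user now agrees on. */
size_t accessx_options_size(void)
{
    return sizeof XkbDfltAccessXOptions;
}
```

Keeping the declaration in one header that both xkbInit.c and xkbAccessX.c include would let the compiler catch such a mismatch even without -flto.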


[Bug c++/59633] [4.7/4.8/4.9 Regression] ICE with __attribute((vector_size(...))) for enum

2014-01-01 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59633

--- Comment #3 from Marc Glisse glisse at gcc dot gnu.org ---
(In reply to Volker Reichelt from comment #2)
> Well, because the C-frontend compiles it, the C++-frontend used to compile
> it and even clang (3.2) compiles it, I was under the impression that this
> should compile (using the underlying type of the enum).

Ok, I'll let someone else decide what behavior is wanted.

> And of course, the docs are at least incomplete, if not inaccurate.
> E.g. the vector extension of the ternary operator ?: is missing in this
> chapter.

The doc for ?: is under review. If other parts are incomplete or inaccurate,
don't hesitate to file bugs (or even post doc patches).


[Bug fortran/59654] [OOP] Broken function table with complex OO use case

2014-01-01 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59654

janus at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-01-01
                 CC|                            |janus at gcc dot gnu.org
            Summary|Broken function table with  |[OOP] Broken function table
                   |complex OO use case         |with complex OO use case
     Ever confirmed|0                           |1
      Known to fail|                            |4.8.1, 4.9.0

--- Comment #7 from janus at gcc dot gnu.org ---
I can confirm the (supposedly wrong) runtime behavior with 4.8 and trunk. 4.7
does not compile the test case.

Uncommenting the private statement in line 144 only changes the behavior with
4.8, but my trunk build still yields the 'wrong' output.

I tried to use -fdump-tree-original to see what changes in the generated code
when flipping the private statement with 4.8, but that does not show *any*
difference.


[Bug other/59648] -O2 -flto compilation of xorg-server-1.15.0 fails

2014-01-01 Thread trippels at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

--- Comment #4 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Xext/panoramiX.c sets:
int PanoramiXNumScreens = 0; 

and events.i has:
extern __attribute__((visibility("default"))) int PanoramiXNumScreens;
...
typedef struct {
  int screens[16];
  int numScreens;
} ScreenInfo;
ScreenInfo screenInfo;
int fn1() {
  if (noPanoramiXExtension) {
    int i;
    i = PanoramiXNumScreens - 1;
    while (i--)
      CheckVirtualMotion_x = (long)screenInfo.screens[i];
  }
  return 0;
}

Which is clearly invalid. 
Setting PanoramiXNumScreens = 1 in Xext/panoramiX.c fixes the issue.
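
To see why the loop is invalid when the optimizer takes PanoramiXNumScreens to be 0, the subscripts it generates can be traced without touching the array. This is a minimal sketch; `trace_subscripts` is a hypothetical helper, not part of xorg-server.

```c
#include <stddef.h>

/* Records the subscripts that the loop in fn1() would use for a given
   PanoramiXNumScreens value, without accessing any array.  At most `max`
   subscripts are recorded, so the trace always terminates. */
size_t trace_subscripts(int num_screens, int *out, size_t max)
{
    size_t n = 0;
    int i = num_screens - 1;

    /* Same control flow as "while (i--)": test the old value, then
       decrement, so the body already sees the decremented i. */
    while (n < max && i--)
        out[n++] = i;

    return n;
}
```

For 0 the first recorded subscript is -2, which is below the bounds of screens[16], and the loop keeps going downwards. For 1 the body never runs, which matches the observation that setting the value to 1 makes the error go away.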


[Bug fortran/59654] [OOP] Broken function table with complex OO use case

2014-01-01 Thread Thomas.L.Clune at nasa dot gov
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59654

tlcclt Thomas.L.Clune at nasa dot gov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #31554|0                           |1
        is obsolete|                            |

--- Comment #8 from tlcclt Thomas.L.Clune at nasa dot gov ---
Created attachment 31556
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31556&action=edit
Updated UML diagram

I've updated/corrected the UML.

Previous version omitted the ConcreteSurrogate class and had some of the
associations off.  New version also reflects all has-a relationships.


[Bug other/59648] -O2 -flto compilation of xorg-server-1.15.0 fails

2014-01-01 Thread nheghathivhistha at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

--- Comment #5 from David Kredba nheghathivhistha at gmail dot com ---
I tried to write a script for c-reduce. It writes the compiler output in two
steps, but grep does not wait for it, and c-reduce would not accept the script
as a valid test for reduction because its exit status was not OK.

When I modified it this way:

#!/bin/bash
TESTCASE=${1:-testcase.i}
x86_64-pc-linux-gnu-gcc -std=gnu99  -Wall -Wpointer-arith \
-Wmissing-declarations -Wformat=2 -Wstrict-prototypes -Wmissing-prototypes \
-Wnested-externs -Wbad-function-cast -Wold-style-definition \
-Wdeclaration-after-statement -Wunused -Wuninitialized -Wshadow \
-Wmissing-noreturn -Wmissing-format-attribute -Wredundant-decls -Wlogical-op \
-Werror=implicit -Werror=nonnull -Werror=init-self -Werror=main \
-Werror=missing-braces -Werror=sequence-point -Werror=return-type \
-Werror=trigraphs -Werror=array-bounds -Werror=write-strings -Werror=address \
-Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -fno-strict-aliasing \
-fvisibility=hidden -O2 -ggdb -pipe -march=native -mtune=native -flto=4 -o \
/dev/null /home/dave2/$TESTCASE /home/dave2/libxservertest.a > \
/home/dave2/test.txt 2>&1

cat /home/dave2/test.txt | grep -q 'error: array subscript is below array bounds'
if ! test $? = 0; then
exit 1
fi
exit 0

then grep returned the expected exit code, but c-reduce was not able to
remove any line. I think I am missing something very basic here :-(.
PS: I know that grep can be called with a file name instead of cat and a pipe,
but that did not work either.


[Bug other/59648] -O2 -flto compilation of xorg-server-1.15.0 fails

2014-01-01 Thread nheghathivhistha at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

--- Comment #6 from David Kredba nheghathivhistha at gmail dot com ---
Maybe reduce a single ii file that combines events.i with all the ii files
that make up libxservertest.a?


[Bug other/59648] -O2 -flto compilation of xorg-server-1.15.0 fails

2014-01-01 Thread trippels at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

--- Comment #7 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
You cannot use full paths for your test output, because 
creduce will run each iteration in a new directory. So
try changing it from:
 -flto=4 -o /dev/null /home/dave2/$TESTCASE /home/dave2/libxservertest.a >
/home/dave2/test.txt 2>&1
cat /home/dave2/test.txt | grep ...

to:
 -flto=4 -o /dev/null $TESTCASE /home/dave2/libxservertest.a > test.txt 2>&1
cat test.txt | grep ...

To reduce /home/dave2/libxservertest.a first extract its contents:
 ar x /home/dave2/libxservertest.a 
and then produce a list of the object files:
 ls -al *.o > list
Edit list and prepend the full path to all object files.
Then run delta on list as described here:
http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction#Reducing_LTO_bugs
and generate preprocessed source for the files.

Now reduce the preprocessed files one by one. 
After a few iterations you'll end up with something like:

x4 test # cat test.i
long a;
extern int noPanoramiXExtension, PanoramiXNumScreens, CheckVirtualMotion_x;
typedef struct {
  int screens[16];
  int numScreens;
} ScreenInfo;
ScreenInfo screenInfo;
int fn1() {
  if (noPanoramiXExtension) {
    int i;
    i = PanoramiXNumScreens - 1;
    while (i--)
      CheckVirtualMotion_x = (long)screenInfo.screens[i];
  }
  return 0;
}

int main() {
  screenInfo.numScreens = 0;
  a = (long)*fn1;
  return 0;
}

x4 test # cat panoramiX.i
int PanoramiXNumScreens;
int noPanoramiXExtension=1;
int CheckVirtualMotion_x;

x4 test # gcc -O2  -Werror=array-bounds test.i panoramiX.i 
x4 test # gcc -O2 -flto  -Werror=array-bounds test.i panoramiX.i 
test.i: In function ‘fn1’:
test.i:13:54: warning: iteration 16 invokes undefined behavior
[-Waggressive-loop-optimizations]
   CheckVirtualMotion_x = (long)screenInfo.screens[i];
  ^
test.i:12:11: note: containing loop
 while (i--)
   ^
test.i:13:54: error: array subscript is below array bounds
[-Werror=array-bounds]
   CheckVirtualMotion_x = (long)screenInfo.screens[i];
  ^
lto1: some warnings being treated as errors
lto-wrapper: /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.0/gcc returned 1 exit status
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.0/../../../../x86_64-pc-linux-gnu/bin/ld:
fatal error: lto-wrapper failed
collect2: error: ld returned 1 exit status
x4 test #

[Bug other/59648] -O2 -flto compilation of xorg-server-1.15.0 fails

2014-01-01 Thread nheghathivhistha at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59648

--- Comment #8 from David Kredba nheghathivhistha at gmail dot com ---
Thank you!


[Bug tree-optimization/59651] [4.9 Regression] Vectorizer failing to spot dependence causes incorrect code generation.

2014-01-01 Thread belagod at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59651

--- Comment #4 from belagod at gcc dot gnu.org ---
Thanks for looking at this.

Just to clarify, do you mean loop versioning happens in the up-counting loop?
Because in the down-counting loop, a partition seems to be happening with 2
iterations of the loop getting vectorized and the remaining 2 are left scalar.


[Bug tree-optimization/59642] Performance regression (4.7/4.8) with -ftree-loop-distribute-patterns

2014-01-01 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59642

--- Comment #2 from Marc Glisse glisse at gcc dot gnu.org ---
(In reply to Marc Glisse from comment #1)
> I've noticed the same in other PRs, normally we manage to track the actual
> value of *p, but we don't manage that when *p was written by __builtin_mem*,
> which should still be doable:

PR 58483 has an example with memcpy.


[Bug libstdc++/59656] New: weak_ptr::lock function crashes when compiling with -fno-exceptions flag

2014-01-01 Thread chus_flores at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59656

            Bug ID: 59656
           Summary: weak_ptr::lock function crashes when compiling with
                    -fno-exceptions flag
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chus_flores at hotmail dot com

weak_ptr::lock crashes when the code is compiled with the -fno-exceptions flag
and the data pointed to by the weak_ptr expires during the execution of the
lock function itself:

(from http://gcc.gnu.org/onlinedocs/gcc-4.8.2/libstdc++/api/a01518_source.html)

        shared_ptr<_Tp>
  494   lock() const noexcept
  495   {
  496 #ifdef __GTHREADS
  497     if (this->expired())
  498       return shared_ptr<_Tp>();
  499
  500     __try
  501       {
  502         return shared_ptr<_Tp>(*this);
  503       }
  504     __catch(const bad_weak_ptr&)
  505       {
  506         return shared_ptr<_Tp>();
  507       }
  508 #else
  509     return this->expired() ? shared_ptr<_Tp>() : shared_ptr<_Tp>(*this);
  510 #endif
  511   }

If the data is valid when line 497 is executed and the data is released in a
different thread just before executing line 502, the program will crash because
it will try to throw an exception (exceptions are disabled because of the flag
-fno-exceptions). This code only works when exceptions are enabled because the
try/catch will resolve the problem, but not otherwise.

The standard definition says that this function must return safely and it
doesn't throw any exception. I presume this must apply even if the exceptions
are not enabled.


[Bug rtl-optimization/41171] register allocator undoing optimal schedule

2014-01-01 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171

Steven Bosscher steven at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #9 from Steven Bosscher steven at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #5)
> Looking at update_equiv_regs(), if I disable the replacement for regs
> that are local to one basic block (patch below) like it existed before
> John Wehle's patch way back in Oct 2000:
>
>   http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00782.html
>
> then we get the ordering we want.  Does anyone know why John removed
> that part of the test in his patch?  Thoughts anyone?

To allow things to be moved around in, or out of loops.


[Bug fortran/59654] [OOP] Broken function table with complex OO use case

2014-01-01 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59654

--- Comment #9 from janus at gcc dot gnu.org ---
Created attachment 31557
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31557&action=edit
reduce test case

Reduced test case. Should print '1' and does so with 4.7.4, but prints '0' with
4.8 and trunk. ICEs with 4.6.


[Bug c/59657] New: SSE intrinsics translates to AVX instructions

2014-01-01 Thread oystein at gnubg dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59657

            Bug ID: 59657
           Summary: SSE intrinsics translates to AVX instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: oystein at gnubg dot org

Created attachment 31558
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31558&action=edit
Example source code file

Happy new year!

I am writing code which should run on both SSE and AVX machines. I have
manually vectorized the code for SSE and AVX in two different functions and
use a function pointer to select the right function for the CPU at startup.
The two functions are in the same translation unit.

(See attached code)

compiled with: 
gcc -Wall -O3 -g -mavx sse_test.c -o sse_test

The problem is that the SSE intrinsics in the SSE function get translated to
AVX instructions. This will of course give an illegal instruction on all
non-AVX machines.

My gdb session:

Program received signal SIGILL, Illegal instruction.
0x08048452 in calculate_sse (data=data@entry=0xb5e0, scale=scale@entry=0.5,
size=size@entry=256)
at sse_test.c:33
33          for ( ; count-- ; p += 4 ){
(gdb) list
28
29      static void calculate_sse(float *data, float scale, int size )
30      {
31          int count = size >> 2;
32          float *p = data;
33          for ( ; count-- ; p += 4 ){
34              __m128 d = _mm_load_ps( p );
35              __m128 s = _mm_set1_ps( scale );
36              _mm_store_ps( p, _mm_mul_ps( d, s ));
37          }
(gdb) disassemble 
Dump of assembler code for function calculate_sse:
   0x08048440 <+0>:     mov    0xc(%esp),%ecx
   0x08048444 <+4>:     mov    0x4(%esp),%eax
   0x08048448 <+8>:     sar    $0x2,%ecx
   0x0804844b <+11>:    test   %ecx,%ecx
   0x0804844d <+13>:    lea    -0x1(%ecx),%edx
   0x08048450 <+16>:    je     0x8048474 <calculate_sse+52>
=> 0x08048452 <+18>:    vbroadcastss 0x8(%esp),%xmm1
   0x08048459 <+25>:    lea    0x0(%esi,%eiz,1),%esi
   0x08048460 <+32>:    vmulps (%eax),%xmm1,%xmm0
   0x08048464 <+36>:    sub    $0x1,%edx
   0x08048467 <+39>:    add    $0x10,%eax
   0x0804846a <+42>:    vmovaps %xmm0,-0x10(%eax)
   0x0804846f <+47>:    cmp    $0x,%edx
   0x08048472 <+50>:    jne    0x8048460 <calculate_sse+32>
   0x08048474 <+52>:    repz ret
End of assembler dump.

(Arch linux)
[oystein@oysteins-laptop ~]$ gcc --version 
gcc (GCC) 4.8.2 20131219 (prerelease)

Bug or feature? I'm not sure if this is the expected way the intrinsics should
translate, but it was not what I expected. If it is supposed to be like this,
can I get out of my problem without splitting the two functions into two
translation units and using two different sets of compile options?

Thanks,
Øystein

[Bug c/59657] SSE intrinsics translates to AVX instructions

2014-01-01 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59657

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|---                         |INVALID

--- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org ---
Feature.  By compiling with -mavx, any function can use AVX instructions.
You can either define the functions in different files and use -mavx to compile
one and -msse2 or whatever to compile the other one, or you can use the target
attribute or #pragma GCC target.
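
A minimal sketch of the target-attribute route, assuming GCC on x86-64 and a file compiled without -mavx. All function names are hypothetical, and the AVX variant is a plain loop left to the compiler rather than hand-written AVX intrinsics, so the sketch does not depend on how a particular GCC release guards its intrinsic headers. On other platforms it falls back to scalar code.

```c
#include <stddef.h>

#if defined(__GNUC__) && defined(__x86_64__)
#include <xmmintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable fallback, also used for the tail elements of the SSE path. */
static void scale_scalar(float *data, float scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= scale;
}

#if HAVE_X86_DISPATCH
/* Baseline path: without -mavx on the command line, these SSE intrinsics
   keep their SSE encoding.  Unaligned loads/stores are used so that the
   caller does not need to provide 16-byte aligned data. */
static void scale_sse(float *data, float scale, size_t n)
{
    const __m128 s = _mm_set1_ps(scale);
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), s));
    scale_scalar(data + i, scale, n - i);
}

/* Only this function is allowed to contain AVX instructions. */
static __attribute__((target("avx")))
void scale_avx(float *data, float scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= scale;
}
#endif

/* Selects the implementation at run time from the CPU's capabilities. */
void scale_array(float *data, float scale, size_t n)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        scale_avx(data, scale, n);
    else
        scale_sse(data, scale, n);
#else
    scale_scalar(data, scale, n);
#endif
}
```

Calling `scale_array(data, 0.5f, 256)` then takes the AVX path only on CPUs that report AVX support, while the SSE function stays free of AVX instructions because the translation unit as a whole is not built with -mavx.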


[Bug libstdc++/54448] many failures with /sbin/loader: Error: libstdc++.so.6: symbol __pthread_mutex_init unresolved

2014-01-01 Thread htl10 at users dot sourceforge.net
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54448

--- Comment #6 from Hin-Tak Leung htl10 at users dot sourceforge.net ---
The latest with 4.6.4 and 4.7.3 :
http://gcc.gnu.org/ml/gcc-testresults/2014-01/msg00048.html
http://gcc.gnu.org/ml/gcc-testresults/2014-01/msg00049.html

seems to be a lot healthier.

During the course of the latest round, I realised that GNU strip from GNU
binutils seems to confuse the configure system
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44959 bootstrap failed at
Comparing stages 2 and 3) on -gtoggle; so I am now putting /usr/local/bin
*last*, instead of first as previously.

gcc these days requires GNU tar to extract, and GNU make, GNU bash to configure
and make, so it is almost a habit to put /usr/local/bin first, but GNU strip
certainly seems to behave differently from DEC strip.

Could any of the GNU binutils cause
/sbin/loader: Error: libstdc++.so.6: symbol __pthread_mutex_init
unresolved? If there is a simple test, I can try it.


[Bug libitm/52695] libitm/config/x86/cacheline.h: '__m64' does not name a type

2014-01-01 Thread dirtyepic at gentoo dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52695

--- Comment #7 from Ryan Hill dirtyepic at gentoo dot org ---
(In reply to Jakub Jelinek from comment #5)
> No idea what brokeness the above talks about, it works just fine for me in
> C++, so IMHO it just should always include x86intrin.h, but certainly if
> __MMX__ is defined, but no __SSE__, the above won't include in C++ any
> header which would define __m64.

For 4.8 it just directly includes x86intrin.h.
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00467.html [1]

However after patching 4.7.3 [2] we're seeing a different error on some
systems.

---8---
In file included from
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/x86intrin.h:27:0,
 from
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/gcc-4.7.3/libitm/config/x86/target.h:72,
 from
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/gcc-4.7.3/libitm/libitm_i.h:82,
 from
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/gcc-4.7.3/libitm/aatree.cc:28:
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/ia32intrin.h:
In function ‘int __bsrd(int)’:
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/ia32intrin.h:41:35:
error: ‘__builtin_ia32_bsrsi’ was not declared in this scope
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/ia32intrin.h:
In function ‘long long unsigned int __rdpmc(int)’:
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/ia32intrin.h:89:35:
error: ‘__builtin_ia32_rdpmc’ was not declared in this scope
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/ia32intrin.h:
In function ‘long long unsigned int __rdtsc()’:
/var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/build/./gcc/include/ia32intrin.h:97:32:
error: ‘__builtin_ia32_rdtsc’ was not declared in this scope
---8---

Both the reporters have AMD K8 processors.  They only hit the bug when using
-march=native; -march=k8 is successful.

$ echo | gcc -march=native -v -E - 2>&1 | grep cc1
 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.3/cc1 -E -quiet -v - -march=k8
-mno-cx16 -mno-sahf -mno-movbe -mno-aes -mno-pclmul -mno-popcnt -mno-abm
-mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-tbm -mno-avx -mno-sse4.2
-mno-sse4.1 --param l1-cache-size=64 --param l1-cache-line-size=64 --param
l2-cache-size=512 -mtune=k8

So it seems there's still a piece missing.

[1] http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=193369
[2]
http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo/src/patchsets/gcc/4.7.3/gentoo/49_all_x86_pr52695_libitm-m64.patch?revision=1.1&view=markup

[Bug tree-optimization/59644] [4.9 Regression] r206243 miscompiles Linux kernel

2014-01-01 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59644

--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org ---
The reason why some changes appear in stdarg functions is:
ix86_update_stack_boundary:
  /* x86_64 vararg needs 16byte stack alignment for register save
     area.  */
  if (TARGET_64BIT
      && cfun->stdarg
      && crtl->stack_alignment_estimated < 128)
    crtl->stack_alignment_estimated = 128;
and kernel uses -mno-sse -mpreferred-stack-boundary=3
But because of -mno-sse, that is completely unnecessary, as
setup_incoming_varargs_64 does:
  /* FPR size of varargs save area.  We don't need it if we don't pass
     anything in SSE registers.  */
  if (TARGET_SSE && cfun->va_list_fpr_size)
    ix86_varargs_fpr_size = X86_64_SSE_REGPARM_MAX * 16;
  else
    ix86_varargs_fpr_size = 0;
thus for !TARGET_SSE it never even allocates the fpr save area that would need
the bigger alignment.  So IMHO we might as well change
ix86_update_stack_boundary
to add && !TARGET_SSE into the condition.

Still, that doesn't explain why the kernel doesn't like it.  As I said earlier,
the explanation could be that something doesn't expect r10 to be clobbered
across some of these calls, or perhaps something assumes > 64bit stack
alignment in the called function (but with -mpreferred-stack-boundary=3 that
would be a broken assumption).