[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread pepalogik at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #11 from Jan Lachnitt  ---
Thank you all for a rapid investigation of the problem.

Here is a confirmation with the large test case:

jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 phsh1.f -std=legacy
-I. -march=core-avx-i -o core-avx-i/phsh1
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ cd core-avx-i/
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/core-avx-i $ time ./phsh1 <
../bmtz
 Slab or Bulk calculation?
 input 1 for Slab or 0 for Bulk
 Input the MTZ value from the substrate calculation

real    221m0.225s
user    220m52.488s
sys     0m4.488s
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/core-avx-i $ rm check.o mufftin.d 
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/core-avx-i $ LD_BIND_NOW=1 time
./phsh1 < ../bmtz
 Slab or Bulk calculation?
 input 1 for Slab or 0 for Bulk
 Input the MTZ value from the substrate calculation
4512.06user 1.50system 1:15:16elapsed 99%CPU (0avgtext+0avgdata
7296maxresident)k
23408inputs+34424outputs (7major+1219minor)pagefaults 0swaps


Really, LD_BIND_NOW=1 does wonders :-).
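
To put a number on the two runs above, here is a small sketch that converts
the bash-style user time to seconds and divides it by the user time reported
with LD_BIND_NOW=1. The helper names to_seconds and ratio are made up for
illustration; the two timings are copied from the transcripts above.

```shell
# Hypothetical helper: convert a bash `time` value such as "220m52.488s"
# to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: ratio of two durations in seconds, two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

lazy=$(to_seconds 220m52.488s)   # user time with default (lazy) binding
eager=4512.06                    # user time with LD_BIND_NOW=1
ratio "$lazy" "$eager"
```

The result is a little under 3, which matches the "3x slower" in the bug
title.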

https://sourceware.org/bugzilla/show_bug.cgi?id=20495#c8 suggests building with
"-Wl,-z,now" (I suppose this does the same as LD_BIND_NOW=1). Can it be used as
a general workaround, before glibc 2.25 is available?
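
For reference, a sketch of how the link-time variant could be checked.
"-z now" is a documented GNU ld option that marks the output so that the
dynamic linker resolves all symbols at startup, which is what LD_BIND_NOW
requests at run time. The helper name has_bind_now is made up for
illustration, and the build line is the one from this comment with
-Wl,-z,now added; it is left as a comment because it needs the phsh1 sources.

```shell
# Hypothetical helper: reads `readelf -d` output on stdin and succeeds if the
# dynamic section requests eager binding (BIND_NOW flag, or NOW in FLAGS_1).
has_bind_now() {
    grep -Eq 'BIND_NOW|FLAGS_1.*NOW'
}

# Build with eager binding requested at link time:
#   gfortran-6 phsh1.f -std=legacy -I. -march=core-avx-i -Wl,-z,now \
#       -o core-avx-i/phsh1
#
# Then check the resulting executable:
#   readelf -d core-avx-i/phsh1 | has_bind_now && echo "eager binding enabled"
```

Whether this is acceptable as a general workaround is a separate question:
eager binding moves the cost of resolving every symbol to program startup.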

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #10 from H.J. Lu  ---
It is a glibc bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=20495

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #9 from Uroš Bizjak  ---
(In reply to H.J. Lu from comment #7)
> (In reply to H.J. Lu from comment #6)
> > Which glibc was used?  You may run into:
> > 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=20495
> 
> Please run your testcase with
> 
> # LD_BIND_NOW=1 ./test
> 
> with and without -mavx.

Yes, way faster with LD_BIND_NOW=1 in -mavx case:

[uros@localhost march]$ time ./a.out < ../bmtz 
 Slab or Bulk calculation?
 input 1 for Slab or 0 for Bulk
 Input the MTZ value from the substrate calculation

real    0m0.693s
user    0m0.691s
sys     0m0.002s
[uros@localhost march]$ LD_BIND_NOW=1 time ./a.out < ../bmtz 
 Slab or Bulk calculation?
 input 1 for Slab or 0 for Bulk
 Input the MTZ value from the substrate calculation
0.27user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 3012maxresident)k
0inputs+152outputs (0major+146minor)pagefaults 0swaps

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #8 from Uroš Bizjak  ---
(In reply to H.J. Lu from comment #6)
> Which glibc was used?  You may run into:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=20495

Fedora 25:

$ /lib/libc.so.6 
GNU C Library (GNU libc) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.1.1 20160721 (Red Hat 6.1.1-4).
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
.

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #7 from H.J. Lu  ---
(In reply to H.J. Lu from comment #6)
> Which glibc was used?  You may run into:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=20495

Please run your testcase with

# LD_BIND_NOW=1 ./test

with and without -mavx.
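
The requested comparison amounts to a small build/run matrix. Below is a
dry-run sketch, assuming the small test case is saved as test.f (a
hypothetical file name) and reads ../bmtz like the other runs in this thread.
print_matrix is a made-up helper that only prints the commands, it does not
execute them.

```shell
# Hypothetical dry-run helper: prints one build line and two run lines
# (lazy binding, then LD_BIND_NOW=1) for each of -msse4 and -mavx.
print_matrix() {
    src=${1:-test.f}              # assumed source file name
    for flag in -msse4 -mavx; do
        exe="test${flag}"         # e.g. test-msse4, test-mavx
        echo "gfortran $src $flag -o $exe"
        echo "time ./$exe < ../bmtz"
        echo "LD_BIND_NOW=1 time ./$exe < ../bmtz"
    done
}

print_matrix
```

Note that the dynamic linker treats any nonempty LD_BIND_NOW value as "on",
so the lazy-binding runs leave the variable unset rather than setting it to 0.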

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #6 from H.J. Lu  ---
Which glibc was used?  You may run into:

https://sourceware.org/bugzilla/show_bug.cgi?id=20495

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #5 from Uroš Bizjak  ---
I can confirm the slowdown with -mavx on ivybridge-E:

-msse4

real    0m0.278s
user    0m0.276s
sys     0m0.002s

-mavx

real    0m0.699s
user    0m0.696s
sys     0m0.003s


In the -msse4 case, perf report annotates mtz_ with:

  1.33 │ → callq  logf@plt
       │   addss  0x7acb(%rip),%xmm0        # 40b0cc
  6.87 │   mulss  0x7ac7(%rip),%xmm0        # 40b0d0
  9.53 │   addss  0x7aa7(%rip),%xmm0        # 40b0b8
  7.54 │   cvttss %xmm0,%eax
  7.76 │   cmp    %r12d,%eax

and in -mavx case:

  0.09 │ → callq  logf@plt
       │   vaddss 0x7a85(%rip),%xmm0,%xmm0  # 40b16c
 59.65 │   vmulss 0x7a81(%rip),%xmm0,%xmm0  # 40b170
  2.92 │   vaddss 0x7a61(%rip),%xmm0,%xmm0  # 40b158
  2.04 │   vcvtts %xmm0,%eax
  2.65 │   cmp    %r12d,%eax

Something happens with vmulss, but I doubt it is the compiler's fault.

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread pepalogik at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #4 from Jan Lachnitt  ---
Small test case with -march=core-avx-i:
real    0m1.300s
user    0m1.296s
sys     0m0.000s

I.e., reproduced.

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #3 from Richard Biener  ---
Can't reproduce (with the small testcase and -march=native) on a CPU with
these flags:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64
monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt
lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid

(that doesn't have avx)

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #2 from Richard Biener  ---
It's hard to guess what happens; especially without optimization, the
difference should be minimal.

It would be nice if you could isolate the difference to a specific
architecture flag rather than native (-march=core-avx-i?).

[Bug fortran/78611] -march=native makes code 3x slower

2016-11-30 Thread pepalogik at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

--- Comment #1 from Jan Lachnitt  ---
Created attachment 40200
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40200&action=edit
Smaller test case

Here is a smaller test case, which runs for a second only, not hours.

without -march=native:
real    0m0.610s
user    0m0.560s
sys     0m0.000s

with -march=native:
real    0m1.271s
user    0m1.268s
sys     0m0.000s