Re: GMP 6.2.1 Aborting when running tuneup program in one.cold()

2021-10-20 Thread Torbjörn Granlund
Simon Sobisch  writes:

  On 20.10.2021 at 16:19, Torbjörn Granlund wrote:
  > When tuneup cannot measure things accurately, it bails out.

  That's interesting. Is there anything I can do to help tuneup measure
  things accurately?

Make sure your system is idle except for the tuneup process.
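
One rough way to check for an idle machine before starting tuneup is to look at the load average. The sketch below assumes Linux (it reads /proc/loadavg); the helper name `load_below` and the 0.10 limit are illustrative assumptions, not part of GMP, and load average is only a coarse proxy for "idle".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a load average is below a limit.
# $1 = load average (e.g. first field of /proc/loadavg), $2 = limit.
load_below() {
  awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 < limit + 0) }'
}

# Assumes Linux (/proc/loadavg); the 0.10 limit is an arbitrary illustration.
if [ -r /proc/loadavg ]; then
  read -r load1 rest < /proc/loadavg
  if load_below "$load1" 0.10; then
    echo "system looks idle; running ./tuneup should be reasonable"
  else
    echo "load is $load1; wait for other processes to finish first"
  fi
fi
```

A stricter or looser limit may suit a given machine; the point is only to avoid measuring while other jobs compete for the CPU.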

  Can you please add that important information (abort of the program is
  no bug, just use the non-optimized version) to the documentation
  https://gmplib.org/manual/Performance-optimization
  ideally together with the answer to the related questions
  "What should -f NNN relate to?" and "Should I manually build the
  average" (if this isn't an effect of "not accurately measured")?

If we find time, perhaps.

Running the tuneup program and making use of its results is mainly
intended for GMP developers.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


macOS version detection broken in configure

2021-10-20 Thread Carlo Cabrera
The configure script mistakenly recognises macOS 11 (Big Sur) as an old version
of macOS, thereby choosing the linker flags `-flat_namespace` and
`-undefined suppress`.

We want to avoid `-flat_namespace` as this can cause name collisions for users
of the library. [1]

This can be fixed by patching libtool.m4 [2] and regenerating configure.

The problem lies in this snippet from `configure`:

case $host_os in
rhapsody* | darwin1.[012])
  _lt_dar_allow_undefined='$wl-undefined ${wl}suppress' ;;
darwin1.*)
  _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
darwin*) # darwin 5.x on
  # if running on 10.5 or later, the deployment target defaults
  # to the OS version, if on x86, and 10.4, the deployment
  # target defaults to 10.4. Don't you love it?
  case ${MACOSX_DEPLOYMENT_TARGET-10.0},$host in
    10.0,*86*-darwin8*|10.0,*-darwin[91]*)
      _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
    10.[012][,.]*)
      _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;;
    10.*)
      _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;;
  esac
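
To see why Big Sur ends up in the flat-namespace branch, the inner `case` can be replayed outside `configure`. The sketch below copies the patterns from the snippet into a hypothetical helper, `pick_allow_undefined` (not part of libtool), and prints plain flag names instead of the `$wl`-prefixed strings. The Catalina triplet `x86_64-apple-darwin19.6.0` is an assumed example value; the Big Sur triplet is the one reported by config.guess below.

```shell
#!/bin/sh
# Hypothetical helper: replays the inner `case` from the configure snippet.
# $1 = host triplet, $2 = optional MACOSX_DEPLOYMENT_TARGET (defaults to 10.0,
# as `${MACOSX_DEPLOYMENT_TARGET-10.0}` does when the variable is unset).
pick_allow_undefined() {
  host=$1
  target=${2-10.0}
  case $target,$host in
    10.0,*86*-darwin8*|10.0,*-darwin[91]*)
      echo "-undefined dynamic_lookup" ;;
    10.[012][,.]*)
      echo "-flat_namespace -undefined suppress" ;;
    10.*)
      echo "-undefined dynamic_lookup" ;;
  esac
}

# Catalina is darwin19: the "1" matches darwin[91]*.
pick_allow_undefined x86_64-apple-darwin19.6.0
# Big Sur is darwin20: "2" is not in [91], so the 10.[012] arm catches "10.0,".
pick_allow_undefined nehalem-apple-darwin20.6.0
```

The first call prints `-undefined dynamic_lookup` and the second prints `-flat_namespace -undefined suppress`, which matches the behaviour described in this report.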

To check, we can use `otool -hV` or a newish version of `file` on
`libgmp.10.dylib`:

❯ otool -hV .libs/libgmp.10.dylib
.libs/libgmp.10.dylib:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64   X86_64        ALL  0x00       DYLIB    14       1616 DYLDLINK NO_REEXPORTED_DYLIBS
❯ file .libs/libgmp.10.dylib
.libs/libgmp.10.dylib: Mach-O 64-bit x86_64 dynamically linked shared library, flags:<|DYLDLINK|NO_REEXPORTED_DYLIBS>

A library built with the correct linker options (`-undefined dynamic_lookup`)
should produce this output (built on macOS 10.15):

❯ otool -hV lib/libgmp.10.dylib
lib/libgmp.10.dylib:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64   X86_64        ALL  0x00       DYLIB    14       1632 NOUNDEFS DYLDLINK TWOLEVEL NO_REEXPORTED_DYLIBS
❯ file lib/libgmp.10.dylib
lib/libgmp.10.dylib: Mach-O 64-bit x86_64 dynamically linked shared library, flags:

In particular, `TWOLEVEL` appears in the flags for the library built on Catalina
but not on Big Sur. Checking the output of `make` also shows that `libtool`
invokes `clang` with the `-flat_namespace` flag.
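
The `TWOLEVEL` check can also be scripted. The sketch below is a minimal example, assuming a hypothetical helper named `namespace_kind` that reads `otool -hV` output on standard input; here it is fed the two flag lists quoted above, so it runs without `otool`.

```shell
#!/bin/sh
# Hypothetical helper: reads `otool -hV` output on stdin and reports whether
# the Mach-O header flags include TWOLEVEL.
namespace_kind() {
  if grep -qw TWOLEVEL; then
    echo "two-level"
  else
    echo "flat"
  fi
}

# Flag lists as reported above for the Big Sur and Catalina builds.
echo "DYLDLINK NO_REEXPORTED_DYLIBS" | namespace_kind
echo "NOUNDEFS DYLDLINK TWOLEVEL NO_REEXPORTED_DYLIBS" | namespace_kind
# On a Mac: otool -hV .libs/libgmp.10.dylib | namespace_kind
```

The first call prints `flat` and the second prints `two-level`.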

Here is the information requested in your bug reporting manual:

configure options: none
configure output: see attached `configure_output.txt`
compiler: Apple clang version 13.0.0 (clang-1300.0.29.3)
uname -a: Darwin hermes.lan 20.6.0 Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64 x86_64 i386 MacBookAir9,1 Darwin
config.guess: nehalem-apple-darwin20.6.0
configfsf.guess: x86_64-apple-darwin20.6.0
config.log: attached

[1] http://mirror.informatimago.com/next/developer.apple.com/releasenotes/DeveloperTools/TwoLevelNamespaces.html#intro
[2] https://lists.gnu.org/archive/html/libtool-patches/2020-06/msg1.html



config.log.tar.gz
Description: GNU Zip compressed data
checking build system type... nehalem-apple-darwin20.6.0
checking host system type... nehalem-apple-darwin20.6.0
checking for a BSD-compatible install... /usr/local/opt/coreutils/libexec/gnubin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/local/opt/coreutils/libexec/gnubin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking ABI=64
checking whether clang is gcc... yes
checking compiler clang -O2 -pedantic -fomit-frame-pointer -m64 ... yes
checking compiler clang -O2 -pedantic -fomit-frame-pointer -m64  -mtune=nehalem... yes
checking compiler clang -O2 -pedantic -fomit-frame-pointer -m64 -mtune=nehalem  -march=nehalem... yes
checking for gcc... clang
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether clang accepts -g... yes
checking for clang option to accept ISO C89... none needed
checking whether clang understands -c and -o together... yes
checking for clang option to accept ISO C99... none needed
checking how to run the C preprocessor... clang -E
checking build system compiler clang... yes
checking for build system preprocessor... clang -E
checking for build system executable suffix... 
checking whether build system compiler is ANSI... yes
checking for build system compiler math library... -lm
checking for grep that handles long lines and -e... /usr/local/opt/grep/libexec/gnubin/grep
checking for egrep... /usr/local

Re: GMP 6.2.1 Aborting when running tuneup program in one.cold()

2021-10-20 Thread Simon Sobisch

Thanks for the prompt answer!

On 20.10.2021 at 16:19, Torbjörn Granlund wrote:

> When tuneup cannot measure things accurately, it bails out.


That's interesting. Is there anything I can do to help tuneup measure
things accurately?


Can you please add that important information (abort of the program is
no bug, just use the non-optimized version) to the documentation

   https://gmplib.org/manual/Performance-optimization

ideally together with the answer to the related questions
"What should -f NNN relate to?" and "Should I manually build the
average" (if this isn't an effect of "not accurately measured")?


> No bug.

Maybe the tuneup program could also hint at this on startup (in addition
to the doc change)?


Something like

./tuneup
Try finding optimal parameters for ./mpn/x86_64/skylake/gmp-mparam.h
If this is not possible this program will abort, which is no bug, and 
you should use the untuned version.

Using: CPU cycle counter, supplemented by microsecond getrusage()
speed_precision 1, speed_unittime 3.34e-10 secs, CPU freq 2992.97 MHz
DEFAULT_MAX_SIZE 1000, fft_max_size 5


Simon


Re: GMP 6.2.1 Aborting when running tuneup program in one.cold()

2021-10-20 Thread Torbjörn Granlund
When tuneup cannot measure things accurately, it bails out.

No bug.


-- 
Torbjörn
Please encrypt, key id 0xC8601622


GMP 6.2.1 Aborting when running tuneup program in one.cold()

2021-10-20 Thread Simon Sobisch

make check passed

Testsuite summary for GNU MP 6.2.1

# TOTAL: 0
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

and the installed library also seems to work fine from a quick test, but 
the tuneup program crashes.


System specs below.


Questions:
* Did I miss a known issue?
* Can I do anything about this?
* Should I care about running tuneup at all?
  The application does some heavy computations with it in the range
  +-9 (mostly multiply and divide [often by 10] and
  addition/subtraction and string representation all via GnuCOBOL)
* As the output of the tuneup utility is different each time and the
  docs at https://gmplib.org/manual/Performance-optimization are
  sparser than for other parts: Should I run it multiple times and then
  use the average? It lists -f NNN but doesn't give a clue how to set
  it - should I set that? Does a high value make an average on its own?

I'm available to provide more info as needed (please CC me in answers as 
I'm not subscribed to this list).



Thanks,
Simon




* uname -a
Linux cent01test 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


* ../config.guess && ./configfsf.guess
sandybridge-pc-linux-gnu
x86_64-pc-linux-gnu

* gcc -v (build directly before on that machine)
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc-11.2.mit.gcc.flags/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/opt/gcc-11.2 --disable-multilib 
--enable-languages=c,c++,lto --with-gmp=/opt/gmp-6.2.1/lib 
CFLAGS='-march=native -mtune=native' CXXFLAGS='-march=native -mtune=native'

Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.2.0 (GCC)

* m4 --version
m4 (GNU M4) 1.4.16

* backtrace:
$ gdb tuneup --quiet -ex run -ex bt -ex quit
Reading symbols from /opt/install/gmp-6.2.1/tune/tuneup...(no debugging symbols found)...done.

Starting program: /opt/install/gmp-6.2.1/tune/tuneup
Parameters for ./mpn/x86_64/skylake/gmp-mparam.h
[Detaching after fork from child process 31364]
Using: CPU cycle counter, supplemented by microsecond getrusage()
speed_precision 1, speed_unittime 3.34e-10 secs, CPU freq 2992.97 MHz
DEFAULT_MAX_SIZE 1000, fft_max_size 5

/* Generated by tuneup.c, 2021-10-20, gcc 11.2 */

#define MOD_1_NORM_THRESHOLD               0  /* always */
#define MOD_1_UNNORM_THRESHOLD             0  /* always */
#define MOD_1N_TO_MOD_1_1_THRESHOLD        4
#define MOD_1U_TO_MOD_1_1_THRESHOLD        3
#define MOD_1_1_TO_MOD_1_2_THRESHOLD      13
#define MOD_1_2_TO_MOD_1_4_THRESHOLD      38
#define PREINV_MOD_1_TO_MOD_1_THRESHOLD    9
#define USE_PREINV_DIVREM_1                1  /* native */
#define DIV_QR_1_NORM_THRESHOLD            1
#define DIV_QR_1_UNNORM_THRESHOLD  MP_SIZE_T_MAX  /* never */
#define DIV_QR_2_PI2_THRESHOLD            33
#define DIVEXACT_1_THRESHOLD               0  /* always (native) */
#define BMOD_1_TO_MOD_1_THRESHOLD         20

#define DIV_1_VS_MUL_1_PERCENT           468

#define MUL_TOOM22_THRESHOLD              26
#define MUL_TOOM33_THRESHOLD              73
#define MUL_TOOM44_THRESHOLD             208
#define MUL_TOOM6H_THRESHOLD             300
#define MUL_TOOM8H_THRESHOLD             406

#define MUL_TOOM32_TO_TOOM43_THRESHOLD    73
#define MUL_TOOM32_TO_TOOM53_THRESHOLD   159
#define MUL_TOOM42_TO_TOOM53_THRESHOLD   137
#define MUL_TOOM42_TO_TOOM63_THRESHOLD   151
#define MUL_TOOM43_TO_TOOM54_THRESHOLD   106

#define SQR_BASECASE_THRESHOLD             0  /* always (native) */
#define SQR_TOOM2_THRESHOLD               32
#define SQR_TOOM3_THRESHOLD              114
#define SQR_TOOM4_THRESHOLD              176
#define SQR_TOOM6_THRESHOLD              446
#define SQR_TOOM8_THRESHOLD              547

#define MULMID_TOOM42_THRESHOLD           48

#define MULMOD_BNM1_THRESHOLD             15
#define SQRMOD_BNM1_THRESHOLD             18

#define MUL_FFT_MODF_THRESHOLD 460  /* k = 5 */
#define MUL_FFT_TABLE3  \
  { {460, 5}, { 23, 6}, { 27, 7}, { 15, 6}, \
{ 31, 7}, { 25, 8}, { 13, 7}, { 29, 8}, \
{ 15, 7}, { 33, 8}, { 17, 7}, { 35, 8}, \
{ 19, 7}, { 40, 8}, { 21, 9}, { 11, 8}, \
{ 35, 9}, { 19, 8}, { 41, 9}, { 23, 8}, \
{ 49, 9}, { 27,10}, { 15, 9}, { 39, 8}, \
{ 81, 9}, { 43,10}, { 23, 9}, { 55,11}, \
{ 15,10}, { 31, 9}, { 71,10}, { 39, 9}, \
{ 87,10}, { 47, 9}, {512,10}, {   1024,11}, \
{   2048,12}, {   4096,13}, {   8192,14}, {  16384,15}, \
{  32768,16}, {  65536,17}, { 131072,18}, { 262144,19}, \
{ 524288,20}, {1048576,21}, {2097152,22}, {4194304,23}, \
{8388608,24}