Bug#572746: marked as done (libm: sinf/cosf performance is awful on amd64)

Debian Bug Tracking System Thu, 17 Dec 2015 10:40:08 -0800

Your message dated Thu, 17 Dec 2015 19:35:14 +0100
with message-id <20151217183514.ga22...@aurel32.net>
and subject line Re: Bug#572746: libm: sinf/cosf performance is awful on amd64
has caused the Debian Bug report #572746,
regarding libm: sinf/cosf performance is awful on amd64
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
572746: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572746
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--- Begin Message ---

Package: libc6
Version: 2.10.2-6
Severity: normal

Hi,

After many tests and much research, I've come to the conclusion that the float
variants of sin/cos (and maybe others) are abnormally slow on Debian amd64.
The performance loss is really impressive (around 8 to 9 times slower).
I've attached the program I used for my experiments and ran it in the
following cases.

+ Lenny-amd64: sinf/cosf is really slow
+ Lenny-i386: float performance is ok (faster than the cos/sin using double)
+ Sid-amd64: sinf/cosf slow
+ Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.

+ OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64,
  the tests run fine!
=> The problem is not compiler related.

There seems to be a problem with the way libm is compiled for the amd64
architecture on Debian.
This is why the OpenSuse test was run: the problem is somewhere in the
compile chain or in Debian-specific patches.

We use these functions extensively for calculations, so this is a real problem.
Using cos/sin as a temporary workaround would do the trick, but it is still
slower than the sinf/cosf implementations that work so well on 32-bit
computers...

Thank you

Jerome

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-trunk-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libc-bin                      2.10.2-6   Embedded GNU C Library: Binaries
ii  libgcc1                       1:4.4.3-3  GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0]         1.5.28     Debian configuration management sy
pn  glibc-doc                     <none>     (no description available)
ii  locales                       2.10.2-6   Embedded GNU C Library: National L

-- debconf information excluded

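CC=gcc
CFLAGS=-DNDEBUG -O3 -D_ISOC99_SOURCE -Wall -Wextra
# Libraries go in LDLIBS so the implicit rule places -lm after the source file.
LDLIBS=-lm

.PHONY: all clean

all: test_trig

clean:
	rm -f test_trig

test_trig: test_trig.c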

#include <math.h>
#include <sys/time.h>
#include <stdio.h>




int main(void) 
{
  const int nbElement_i = 10000000;
  int i=0;
  float f1=0.0f, f2=0.0f, f3=0.0f;
  
  struct timeval tv1, tv2; 

  printf("Testing %d sinf and cosf... ", nbElement_i);
  fflush(stdout);
  
  gettimeofday(&tv1, NULL);

  for(i=0; i<nbElement_i; i++){
    f1 += cosf(i); 
    f2 += sinf(i);
  }

  // This is needed so that gcc knows the f1 and f2 results
  // really matter; otherwise the sinf and cosf calls could
  // be optimized away.
  f3 = f1+f2; 

  gettimeofday(&tv2, NULL);

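  // Fold seconds and microseconds into one total so the usec field is never negative.
  long us1 = (long)(tv2.tv_sec - tv1.tv_sec) * 1000000L + (long)(tv2.tv_usec - tv1.tv_usec);
  printf("Result: %f, Duration: %ld sec %ld usec\n", f3, us1 / 1000000L, us1 % 1000000L);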

  f1 = 0.0f; f2 = 0.0f;
  printf("Testing %d sin and cos (with float args)... ", nbElement_i);
  fflush(stdout);
  
  gettimeofday(&tv1, NULL);

  for(i=0; i<nbElement_i; i++){
    f1 += cos(i); 
    f2 += sin(i);
  }

  // This is needed so that gcc knows the f1 and f2 results
  // really matter; otherwise the sin and cos calls could
  // be optimized away.
  f3 = f1+f2; 

  gettimeofday(&tv2, NULL);

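  // Fold seconds and microseconds into one total so the usec field is never negative.
  long us2 = (long)(tv2.tv_sec - tv1.tv_sec) * 1000000L + (long)(tv2.tv_usec - tv1.tv_usec);
  printf("Result: %f, Duration: %ld sec %ld usec\n", f3, us2 / 1000000L, us2 % 1000000L);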
  
  return 0;
}

--- End Message ---
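
The attached test program measures wall-clock time with gettimeofday(), which
can jump if the system clock is adjusted during a run. As a minimal sketch of
an alternative, and not part of the original report, the same measurement
could be taken with clock_gettime(CLOCK_MONOTONIC). The helper names
elapsed_us and time_calls_us are hypothetical and chosen only for
illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime under strict -std= modes */
#endif
#include <time.h>

/* Elapsed microseconds between two timespec samples. The nanosecond
   difference alone may be negative, so everything is folded into one
   signed total before converting. */
long long elapsed_us(struct timespec start, struct timespec end)
{
    long long ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                 + (long long)(end.tv_nsec - start.tv_nsec);
    return ns / 1000;
}

/* Hypothetical helper: time n calls of a float function such as sinf or
   cosf. The running sum is stored through a volatile so the compiler
   cannot discard the calls. */
long long time_calls_us(float (*fn)(float), int n)
{
    struct timespec t0, t1;
    volatile float sink;
    float acc = 0.0f;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < n; i++)
        acc += fn((float)i);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sink = acc;
    (void)sink;
    return elapsed_us(t0, t1);
}
```

With <math.h> included and -lm on the link line, time_calls_us(sinf, 10000000)
and time_calls_us(cosf, 10000000) would give figures comparable to the first
loop of the attached program. On glibc versions older than 2.17,
clock_gettime also needs -lrt.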

--- Begin Message ---

Version: 2.17-1

On 2010-03-06 11:42, Jerome Vizcaino wrote:
> Package: libc6
> Version: 2.10.2-6
> Severity: normal
> 
> Hi,
> 
> After many tests and much research, I've come to the conclusion that the float
> variants of sin/cos (and maybe others) are abnormally slow on Debian amd64.
> The performance loss is really impressive (around 8 to 9 times slower).
> I've attached the program I used for my experiments and ran it in the
> following cases.
> 
> + Lenny-amd64: sinf/cosf is really slow
> + Lenny-i386: float performance is ok (faster than the cos/sin using double)
> + Sid-amd64: sinf/cosf slow
> + Lenny-amd64 using lenny-i386 binary and 32bits libs: float performance is OK.
> 
> + OpenSuse 64 bits (10.3 and 11.1): using the binary compiled on lenny-amd64,
>   the tests run fine!
> => The problem is not compiler related.
> 
> There seems to be a problem with the way libm is compiled for the amd64
> architecture on Debian.
> This is why the OpenSuse test was run: the problem is somewhere in the
> compile chain or in Debian-specific patches.
> 
> We use these functions extensively for calculations, so this is a real problem.
> Using cos/sin as a temporary workaround would do the trick, but it is still
> slower than the sinf/cosf implementations that work so well on 32-bit
> computers...

SSE2 based sinf/cosf optimized routines have been added in version
2.17-1, fixing the performance and precision issue. I am therefore
closing this bug.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net

--- End Message ---
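
The closing message ties the fix to version 2.17-1. As a rough sketch, and
assuming the Debian package version 2.17-1 corresponds to upstream glibc 2.17,
a program could check at run time whether the C library it is running against
is at least that version. gnu_get_libc_version() is a glibc extension declared
in <gnu/libc-version.h>; the helper names version_at_least and
running_glibc_at_least are hypothetical.

```c
#include <stdio.h>
#include <gnu/libc-version.h>

/* Return 1 if a "major.minor[.patch]" version string is at least
   major.minor, otherwise 0. An unparsable string yields 0. */
int version_at_least(const char *ver, int major, int minor)
{
    int got_major = 0, got_minor = 0;

    if (sscanf(ver, "%d.%d", &got_major, &got_minor) != 2)
        return 0;
    if (got_major != major)
        return got_major > major;
    return got_minor >= minor;
}

/* Check the C library the process is actually running against. */
int running_glibc_at_least(int major, int minor)
{
    return version_at_least(gnu_get_libc_version(), major, minor);
}
```

Calling running_glibc_at_least(2, 17) returns 1 on a system whose glibc
reports 2.17 or later. The check only compares version numbers; it does not
prove which sinf/cosf implementation is in use.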
