Re: 1.3.10 memcmp() bug

2002-04-25 Thread Gareth Pearce


On Tuesday 23 April 2002 23:41, Sami Korhonen wrote:
  On Tue, 23 Apr 2002, Tim Prince wrote:
   On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
 I wasn't sure whether I should post about this on the gcc bug report list
or here. Anyway, it seems that using the -O2 flag with gcc causes a huge
slowdown in memcmp(). However, I don't see the performance drop under linux,
so I suppose it is a cygwin issue.
   
$ gcc memtest.c -O2 -o memtest ; ./memtest.exe
Amount of memory to scan (mbytes)? 100
Memory block size (default 1024)? 1024
Allocating memory
Testing memory - read (1 byte at time)
Complete: 889.73MB/sec
Testing memory - read (4 bytes at time)
Complete: 3313.07MB/sec
Freeing memory
   
$ gcc memtest.c -o memtest ; ./memtest.exe
Amount of memory to scan (mbytes)? 100
Memory block size (default 1024)? 1024
Allocating memory
Testing memory - read (1 byte at time)
Complete: 2517.94MB/sec
Testing memory - read (4 bytes at time)
Complete: 2933.50MB/sec
Freeing memory
   
   
'1 byte at time' is using memcmp() to compare two blocks.
  
   You leave so many relevant considerations unspecified, that anything I
   say must be a stab in the dark.  I assume you have a standard cygwin
   installation, where binutils is built to honor only 4-byte alignments,
   while recent linux configurations provide for 16-byte alignments.  The
   significance of that is different on various CPU families, with code
   alignment being quite important on certain CPU's, and data alignment on
   others.  Do we assume that you are running on a 486, since you have not
   told gcc otherwise?  You may have fallen accidentally into good alignment
   in one case and bad in the other.  You might or might not be using
   similar versions of gcc in cygwin and linux.  If you would provide a test
   case, and mention some hardware parameters, some of the mystery could be
   eliminated; for example, we could find out whether memcmp() is code
   generated by gcc or from a library.  cygwin is not generally considered
   an important target for performance optimization, as you can see from the
   alignment considerations and the differences in the libraries.
   --
   Tim Prince
 
   Sorry that I wasn't specific enough about my system configuration. I'm
  running a standard installation of cygwin on x86 (P4) and WinXP. Both
  tests were run under the same setup; the only difference was the use of
  the -O2 flag. I find it odd that the performance difference is that huge.
  Source is available at:
  http://kotisivu.raketti.net/darkone/memtest/memtest.c
AFAICT there's no reason this should behave differently on linux or cygwin.
You're comparing the speed of memcmp() against the speed of comparing ints in
a loop.  When you don't ask the compiler to in-line memcmp(), you get a
library function which is written with enough smarts to compare 4 bytes at a
time.   Various versions of gcc are interpreting the instruction to use
optimized in-line code as a rep cmpsb, which is slower than the newlib
memcmp() function, even on my P-III.
P4's, particularly early versions, are notorious for various performance
glitches when using rep cmpsb on long strings.  gcc isn't smart enough to
look at the lengths of your strings and second guess your instruction to do
that, nor does it have a crystal ball to second guess your instruction to
generate 486 code, even if you were running a version with P4 optimizations.
In time critical applications, it can be quite important to learn the
particular tricks of your compiler and when to choose a separately compiled
string function, or when to ask for in-line, as well as to acquire a library
of such functions built for the processor of your choice.   On the P4, you
would have available 64-bit integer comparisons if you chose to use them to
speed this up.
--


gcc 3.1+ are supposed to be 'more' intelligent about such things, although
they aren't brilliant.

Regards,
Gareth



--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/




Re: 1.3.10 memcmp() bug

2002-04-25 Thread Tim Prince

On Thursday 25 April 2002 00:22, Gareth Pearce wrote:
 On Tuesday 23 April 2002 23:41, Sami Korhonen wrote:
   On Tue, 23 Apr 2002, Tim Prince wrote:
 
 AFAICT there's no reason this should behave differently on linux or cygwin.
 You're comparing the speed of memcmp() against the speed of comparing ints
 in a loop.  When you don't ask the compiler to in-line memcmp(), you get a
 library function which is written with enough smarts to compare 4 bytes at
 a time.   Various versions of gcc are interpreting the instruction to use
 optimized in-line code as a rep cmpsb, which is slower than the newlib
 memcmp() function, even on my P-III.
 P4's, particularly early versions, are notorious for various performance
 glitches when using rep cmpsb on long strings.  gcc isn't smart enough to
 look at the lengths of your strings and second guess your instruction to
 do that, nor does it have a crystal ball to second guess your instruction
 to generate 486 code, even if you were running a version with P4
 optimizations.
 In time critical applications, it can be quite important to learn the
 particular tricks of your compiler and when to choose a separately
 compiled string function, or when to ask for in-line, as well as to
 acquire a library of such functions built for the processor of your
 choice.   On the P4, you would have available 64-bit integer comparisons
 if you chose to use them to speed this up.
 --

 gcc 3.1+ are supposed to be 'more' intelligent about such things,
 although they aren't brilliant.

 Regards,
 Gareth

At least with -march=pentium3, gcc-3.1 has the same problem: it does not know
when not to do what it is asked.

-- 
Tim Prince
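
One way to keep the compiler from doing "what is asked" and expanding
memcmp() in-line is to call it through a volatile function pointer, so the
separately compiled library routine is what runs. This is a minimal sketch,
not something posted in the thread; the names library_memcmp and
blocks_equal are hypothetical. GCC's documented -fno-builtin option (or
-fno-builtin-memcmp in current releases) is the command-line alternative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Because the pointer is volatile, the compiler must
 * load it and make a real call; it cannot substitute its built-in
 * in-line expansion of memcmp(). */
static int (*volatile library_memcmp)(const void *, const void *, size_t) = memcmp;

/* Return 1 if the first n bytes of a and b are identical, 0 otherwise. */
int blocks_equal(const void *a, const void *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return library_memcmp(a, b, n) == 0;
}
```

Timing blocks_equal() against a plain memcmp() call in the same -O2 build
would show whether the in-line expansion is what costs the time.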





Re: 1.3.10 memcmp() bug

2002-04-24 Thread Sami Korhonen

On Tue, 23 Apr 2002, Tim Prince wrote:

 On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
   I wasn't sure whether I should post about this on the gcc bug report list
  or here. Anyway, it seems that using the -O2 flag with gcc causes a huge
  slowdown in memcmp(). However, I don't see the performance drop under linux,
  so I suppose it is a cygwin issue.
 
  $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
  Amount of memory to scan (mbytes)? 100
  Memory block size (default 1024)? 1024
  Allocating memory
  Testing memory - read (1 byte at time)
  Complete: 889.73MB/sec
  Testing memory - read (4 bytes at time)
  Complete: 3313.07MB/sec
  Freeing memory
 
  $ gcc memtest.c -o memtest ; ./memtest.exe
  Amount of memory to scan (mbytes)? 100
  Memory block size (default 1024)? 1024
  Allocating memory
  Testing memory - read (1 byte at time)
  Complete: 2517.94MB/sec
  Testing memory - read (4 bytes at time)
  Complete: 2933.50MB/sec
  Freeing memory
 
 
  '1 byte at time' is using memcmp() to compare two blocks.
 You leave so many relevant considerations unspecified, that anything I say 
 must be a stab in the dark.  I assume you have a standard cygwin 
 installation, where binutils is built to honor only 4-byte alignments, while 
 recent linux configurations provide for 16-byte alignments.  The significance 
 of that is different on various CPU families, with code alignment being quite 
 important on certain CPU's, and data alignment on others.  Do we assume that 
 you are running on a 486, since you have not told gcc otherwise?  You may 
 have fallen accidentally into good alignment in one case and bad in the 
 other.  You might or might not be using similar versions of gcc in cygwin and 
 linux.  If you would provide a test case, and mention some hardware 
 parameters, some of the mystery could be eliminated; for example, we could 
 find out whether memcmp() is code generated by gcc or from a library.  cygwin 
 is not generally considered an important target for performance optimization, 
 as you can see from the alignment considerations and the differences in the 
 libraries.
 -- 
 Tim Prince
 

 Sorry that I wasn't specific enough about my system configuration. I'm
running a standard installation of cygwin on x86 (P4) and WinXP. Both
tests were run under the same setup; the only difference was the use of
the -O2 flag. I find it odd that the performance difference is that huge.
Source is available at:
http://kotisivu.raketti.net/darkone/memtest/memtest.c






Re: 1.3.10 memcmp() bug

2002-04-24 Thread C. J.

   I wasn't sure whether I should post about this on the gcc bug report list
  or here. Anyway, it seems that using the -O2 flag with gcc causes a huge
  slowdown in memcmp(). However, I don't see the performance drop under linux,
  so I suppose it is a cygwin issue.

cygwin's gcc version may be using an outdated x86 'optimization' for memcmp.
VC++ has a similar problem; see this:

http://groups.google.com/groups?hl=en&threadm=ucZKhyE3BHA.1464%40tkmsftngp02&rnum=3&prev=/groups%3Fq%3Drep%2Bgroup:microsoft.public.dotnet.languages.vc%26hl%3Den%26selm%3DucZKhyE3BHA.1464%2540tkmsftngp02%26rnum%3D3








Re: 1.3.10 memcmp() bug

2002-04-24 Thread Tim Prince

On Tuesday 23 April 2002 23:41, Sami Korhonen wrote:
 On Tue, 23 Apr 2002, Tim Prince wrote:
  On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
    I wasn't sure whether I should post about this on the gcc bug report
   list or here. Anyway, it seems that using the -O2 flag with gcc causes a
   huge slowdown in memcmp(). However, I don't see the performance drop
   under linux, so I suppose it is a cygwin issue.
  
   $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
   Amount of memory to scan (mbytes)? 100
   Memory block size (default 1024)? 1024
   Allocating memory
   Testing memory - read (1 byte at time)
   Complete: 889.73MB/sec
   Testing memory - read (4 bytes at time)
   Complete: 3313.07MB/sec
   Freeing memory
  
   $ gcc memtest.c -o memtest ; ./memtest.exe
   Amount of memory to scan (mbytes)? 100
   Memory block size (default 1024)? 1024
   Allocating memory
   Testing memory - read (1 byte at time)
   Complete: 2517.94MB/sec
   Testing memory - read (4 bytes at time)
   Complete: 2933.50MB/sec
   Freeing memory
  
  
   '1 byte at time' is using memcmp() to compare two blocks.
 
  You leave so many relevant considerations unspecified, that anything I
  say must be a stab in the dark.  I assume you have a standard cygwin
  installation, where binutils is built to honor only 4-byte alignments,
  while recent linux configurations provide for 16-byte alignments.  The
  significance of that is different on various CPU families, with code
  alignment being quite important on certain CPU's, and data alignment on
  others.  Do we assume that you are running on a 486, since you have not
  told gcc otherwise?  You may have fallen accidentally into good alignment
  in one case and bad in the other.  You might or might not be using
  similar versions of gcc in cygwin and linux.  If you would provide a test
  case, and mention some hardware parameters, some of the mystery could be
  eliminated; for example, we could find out whether memcmp() is code
  generated by gcc or from a library.  cygwin is not generally considered
  an important target for performance optimization, as you can see from the
  alignment considerations and the differences in the libraries.
  --
  Tim Prince

  Sorry that I wasn't specific enough about my system configuration. I'm
 running a standard installation of cygwin on x86 (P4) and WinXP. Both
 tests were run under the same setup; the only difference was the use of
 the -O2 flag. I find it odd that the performance difference is that huge.
 Source is available at:
 http://kotisivu.raketti.net/darkone/memtest/memtest.c
AFAICT there's no reason this should behave differently on linux or cygwin.  
You're comparing the speed of memcmp() against the speed of comparing ints in 
a loop.  When you don't ask the compiler to in-line memcmp(), you get a 
library function which is written with enough smarts to compare 4 bytes at a 
time.   Various versions of gcc are interpreting the instruction to use 
optimized in-line code as a rep cmpsb, which is slower than the newlib 
memcmp() function, even on my P-III.  
P4's, particularly early versions, are notorious for various performance 
glitches when using rep cmpsb on long strings.  gcc isn't smart enough to 
look at the lengths of your strings and second guess your instruction to do 
that, nor does it have a crystal ball to second guess your instruction to 
generate 486 code, even if you were running a version with P4 optimizations.
In time critical applications, it can be quite important to learn the 
particular tricks of your compiler and when to choose a separately compiled 
string function, or when to ask for in-line, as well as to acquire a library 
of such functions built for the processor of your choice.   On the P4, you 
would have available 64-bit integer comparisons if you chose to use them to 
speed this up.
-- 
Tim Prince
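
The difference between a byte-at-a-time comparison (what an in-lined
rep cmpsb amounts to) and a library routine that compares 4 bytes at a time
can be sketched as below. This is an illustration only, not the newlib
source; the function names equal_bytewise and equal_wordwise are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare one byte per iteration. Returns 1 if equal, 0 otherwise. */
int equal_bytewise(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i;

    for (i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return 0;
    }
    return 1;
}

/* Compare 4 bytes per iteration, then finish the tail byte by byte.
 * memcpy() into a local sidesteps alignment and aliasing problems;
 * optimizing compilers usually reduce it to a single 32-bit load. */
int equal_wordwise(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i = 0;

    assert(a != NULL && b != NULL);
    for (; i + sizeof(uint32_t) <= n; i += sizeof(uint32_t)) {
        uint32_t wa, wb;
        memcpy(&wa, pa + i, sizeof wa);
        memcpy(&wb, pb + i, sizeof wb);
        if (wa != wb)
            return 0;
    }
    return equal_bytewise(pa + i, pb + i, n - i);
}
```

Replacing uint32_t with uint64_t gives the 64-bit variant of the same loop.
Both functions only report equality; a full memcmp() replacement would also
have to return the sign of the first differing byte.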





1.3.10 memcmp() bug

2002-04-23 Thread Sami Korhonen

 I wasn't sure whether I should post about this on the gcc bug report list
or here. Anyway, it seems that using the -O2 flag with gcc causes a huge
slowdown in memcmp(). However, I don't see the performance drop under linux,
so I suppose it is a cygwin issue.

$ gcc memtest.c -O2 -o memtest ; ./memtest.exe
Amount of memory to scan (mbytes)? 100
Memory block size (default 1024)? 1024
Allocating memory
Testing memory - read (1 byte at time)
Complete: 889.73MB/sec
Testing memory - read (4 bytes at time)
Complete: 3313.07MB/sec
Freeing memory

$ gcc memtest.c -o memtest ; ./memtest.exe
Amount of memory to scan (mbytes)? 100
Memory block size (default 1024)? 1024
Allocating memory
Testing memory - read (1 byte at time)
Complete: 2517.94MB/sec
Testing memory - read (4 bytes at time)
Complete: 2933.50MB/sec
Freeing memory


'1 byte at time' is using memcmp() to compare two blocks.
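
For readers without access to memtest.c, a measurement of this kind might
look roughly like the sketch below. It is not the original program: the
function names, the fill byte, and the passes parameter are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Count the blocks that differ between a and b, one memcmp() per block. */
size_t count_mismatched_blocks(const unsigned char *a, const unsigned char *b,
                               size_t total_bytes, size_t block_size)
{
    size_t mismatches = 0;
    size_t off;

    assert(block_size > 0);
    for (off = 0; off + block_size <= total_bytes; off += block_size) {
        if (memcmp(a + off, b + off, block_size) != 0)
            mismatches++;
    }
    return mismatches;
}

/* Return memcmp() throughput in MB/sec over `passes` scans of two equal
 * buffers. Returns -1.0 if allocation fails and 0.0 if the run was too
 * short for clock() to measure. */
double memcmp_throughput_mb(size_t total_bytes, size_t block_size, int passes)
{
    unsigned char *a = malloc(total_bytes);
    unsigned char *b = malloc(total_bytes);
    volatile size_t sink = 0;   /* keeps the result observably used */
    clock_t start;
    double seconds;
    int p;

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 0x5A, total_bytes);
    memset(b, 0x5A, total_bytes);

    start = clock();
    for (p = 0; p < passes; p++)
        sink += count_mismatched_blocks(a, b, total_bytes, block_size);
    seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    free(a);
    free(b);
    if (seconds <= 0.0)
        return 0.0;
    return ((double)total_bytes * passes / (1024.0 * 1024.0)) / seconds;
}
```

Building this once with -O2 and once without, and calling
memcmp_throughput_mb(100u * 1024 * 1024, 1024, 1) from main(), would
correspond to the two runs shown above.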






Re: 1.3.10 memcmp() bug

2002-04-23 Thread Tim Prince

On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
  I wasn't sure whether I should post about this on the gcc bug report list
 or here. Anyway, it seems that using the -O2 flag with gcc causes a huge
 slowdown in memcmp(). However, I don't see the performance drop under linux,
 so I suppose it is a cygwin issue.

 $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
 Amount of memory to scan (mbytes)? 100
 Memory block size (default 1024)? 1024
 Allocating memory
 Testing memory - read (1 byte at time)
 Complete: 889.73MB/sec
 Testing memory - read (4 bytes at time)
 Complete: 3313.07MB/sec
 Freeing memory

 $ gcc memtest.c -o memtest ; ./memtest.exe
 Amount of memory to scan (mbytes)? 100
 Memory block size (default 1024)? 1024
 Allocating memory
 Testing memory - read (1 byte at time)
 Complete: 2517.94MB/sec
 Testing memory - read (4 bytes at time)
 Complete: 2933.50MB/sec
 Freeing memory


 '1 byte at time' is using memcmp() to compare two blocks.
You leave so many relevant considerations unspecified, that anything I say 
must be a stab in the dark.  I assume you have a standard cygwin 
installation, where binutils is built to honor only 4-byte alignments, while 
recent linux configurations provide for 16-byte alignments.  The significance 
of that is different on various CPU families, with code alignment being quite 
important on certain CPU's, and data alignment on others.  Do we assume that 
you are running on a 486, since you have not told gcc otherwise?  You may 
have fallen accidentally into good alignment in one case and bad in the 
other.  You might or might not be using similar versions of gcc in cygwin and 
linux.  If you would provide a test case, and mention some hardware 
parameters, some of the mystery could be eliminated; for example, we could 
find out whether memcmp() is code generated by gcc or from a library.  cygwin 
is not generally considered an important target for performance optimization, 
as you can see from the alignment considerations and the differences in the 
libraries.
-- 
Tim Prince
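
To rule data alignment in or out, a test case can check where its buffers
actually landed and force a known alignment before timing. The helpers
below are a minimal sketch with hypothetical names (is_aligned, align_up);
they assume the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if p is a multiple of `alignment` (a power of two). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Round p up to the next address that is a multiple of `alignment`
 * (a power of two). The caller must have over-allocated by at least
 * alignment - 1 bytes so the result stays inside the buffer. */
void *align_up(void *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)p;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (void *)((addr + alignment - 1) & ~(uintptr_t)(alignment - 1));
}
```

Running the same comparison on a 16-byte aligned buffer and on one
deliberately offset by 4 bytes would show whether the difference between the
two builds is an alignment accident.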
