Re: 1.3.10 memcmp() bug
On Thursday 25 April 2002 00:22, Gareth Pearce wrote:
> > On Tuesday 23 April 2002 23:41, Sami Korhonen wrote:
> > > On Tue, 23 Apr 2002, Tim Prince wrote:
> >
> > AFAICT there's no reason this should behave differently on linux or
> > cygwin. You're comparing the speed of memcmp() against the speed of
> > comparing ints in a loop. When you don't ask the compiler to
> > in-line memcmp(), you get a library function which is written with
> > enough smarts to compare 4 bytes at a time. Various versions of gcc
> > are interpreting the instruction to use "optimized" in-line code as
> > a rep cmpsb, which is slower than the newlib memcmp() function,
> > even on my P-III.
> >
> > P4s, particularly early versions, are notorious for various
> > performance glitches when using rep cmpsb on long strings. gcc
> > isn't smart enough to look at the lengths of your strings and
> > second-guess your instruction to do that, nor does it have a
> > crystal ball to second-guess your instruction to generate 486 code,
> > even if you were running a version with P4 optimizations.
> >
> > In time-critical applications, it can be quite important to learn
> > the particular tricks of your compiler and when to choose a
> > separately compiled string function, or when to ask for in-line, as
> > well as to acquire a library of such functions built for the
> > processor of your choice. On the P4, you would have available
> > 64-bit integer comparisons if you chose to use them to speed this
> > up.
> > --
>
> gcc 3.1+ are supposed to be 'more' intelligent about such things -
> although they aren't brilliant.
>
> Regards,
> Gareth

At least with -march=pentium3, gcc-3.1 has the same problem of not knowing
not to do what is asked.
--
Tim Prince

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
Re: 1.3.10 memcmp() bug
> On Tuesday 23 April 2002 23:41, Sami Korhonen wrote:
> > On Tue, 23 Apr 2002, Tim Prince wrote:
> > > On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
> > > > I wasn't sure whether I should post about this on the gcc bug
> > > > report list or here. Anyway, it seems that using the -O2 flag with
> > > > gcc causes a huge slowdown in memcmp(). However, I don't see a
> > > > performance drop under linux, so I suppose it is a cygwin issue.
> > > >
> > > > $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
> > > > Amount of memory to scan (mbytes)? 100
> > > > Memory block size (default 1024)? 1024
> > > > Allocating memory
> > > > Testing memory - read (1 byte at time)
> > > > Complete: 889.73MB/sec
> > > > Testing memory - read (4 bytes at time)
> > > > Complete: 3313.07MB/sec
> > > > Freeing memory
> > > >
> > > > $ gcc memtest.c -o memtest ; ./memtest.exe
> > > > Amount of memory to scan (mbytes)? 100
> > > > Memory block size (default 1024)? 1024
> > > > Allocating memory
> > > > Testing memory - read (1 byte at time)
> > > > Complete: 2517.94MB/sec
> > > > Testing memory - read (4 bytes at time)
> > > > Complete: 2933.50MB/sec
> > > > Freeing memory
> > > >
> > > > '1 byte at time' is using memcmp() to compare two blocks.
> > >
> > > You leave so many relevant considerations unspecified that anything
> > > I say must be a stab in the dark. I assume you have a standard
> > > cygwin installation, where binutils is built to honor only 4-byte
> > > alignments, while recent linux configurations provide for 16-byte
> > > alignments. The significance of that is different on various CPU
> > > families, with code alignment being quite important on certain
> > > CPUs, and data alignment on others. Do we assume that you are
> > > running on a 486, since you have not told gcc otherwise? You may
> > > have fallen accidentally into good alignment in one case and bad in
> > > the other. You might or might not be using similar versions of gcc
> > > in cygwin and linux. If you would provide a test case, and mention
> > > some hardware parameters, some of the mystery could be eliminated;
> > > for example, we could find out whether memcmp() is code generated
> > > by gcc or from a library. cygwin is not generally considered an
> > > important target for performance optimization, as you can see from
> > > the alignment considerations and the differences in the libraries.
> > > --
> > > Tim Prince
> >
> > Sorry that I wasn't specific enough about my system configuration.
> > I'm running a standard installation of cygwin on x86 (P4) and
> > WinXP. Both tests were run under the same setup; the only
> > difference was the use of the -O2 flag. I find it odd that the
> > performance difference is that huge. The source is available at:
> > http://kotisivu.raketti.net/darkone/memtest/memtest.c
>
> AFAICT there's no reason this should behave differently on linux or
> cygwin. You're comparing the speed of memcmp() against the speed of
> comparing ints in a loop. When you don't ask the compiler to
> in-line memcmp(), you get a library function which is written with
> enough smarts to compare 4 bytes at a time. Various versions of gcc
> are interpreting the instruction to use "optimized" in-line code as
> a rep cmpsb, which is slower than the newlib memcmp() function,
> even on my P-III.
>
> P4s, particularly early versions, are notorious for various
> performance glitches when using rep cmpsb on long strings. gcc
> isn't smart enough to look at the lengths of your strings and
> second-guess your instruction to do that, nor does it have a
> crystal ball to second-guess your instruction to generate 486 code,
> even if you were running a version with P4 optimizations.
>
> In time-critical applications, it can be quite important to learn
> the particular tricks of your compiler and when to choose a
> separately compiled string function, or when to ask for in-line, as
> well as to acquire a library of such functions built for the
> processor of your choice. On the P4, you would have available
> 64-bit integer comparisons if you chose to use them to speed this
> up.
> --

gcc 3.1+ are supposed to be 'more' intelligent about such things -
although they aren't brilliant.

Regards,
Gareth
Re: 1.3.10 memcmp() bug
On Tuesday 23 April 2002 23:41, Sami Korhonen wrote:
> On Tue, 23 Apr 2002, Tim Prince wrote:
> > On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
> > > I wasn't sure whether I should post about this on the gcc bug
> > > report list or here. Anyway, it seems that using the -O2 flag with
> > > gcc causes a huge slowdown in memcmp(). However, I don't see a
> > > performance drop under linux, so I suppose it is a cygwin issue.
> > >
> > > $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
> > > Amount of memory to scan (mbytes)? 100
> > > Memory block size (default 1024)? 1024
> > > Allocating memory
> > > Testing memory - read (1 byte at time)
> > > Complete: 889.73MB/sec
> > > Testing memory - read (4 bytes at time)
> > > Complete: 3313.07MB/sec
> > > Freeing memory
> > >
> > > $ gcc memtest.c -o memtest ; ./memtest.exe
> > > Amount of memory to scan (mbytes)? 100
> > > Memory block size (default 1024)? 1024
> > > Allocating memory
> > > Testing memory - read (1 byte at time)
> > > Complete: 2517.94MB/sec
> > > Testing memory - read (4 bytes at time)
> > > Complete: 2933.50MB/sec
> > > Freeing memory
> > >
> > > '1 byte at time' is using memcmp() to compare two blocks.
> >
> > You leave so many relevant considerations unspecified that anything
> > I say must be a stab in the dark. I assume you have a standard
> > cygwin installation, where binutils is built to honor only 4-byte
> > alignments, while recent linux configurations provide for 16-byte
> > alignments. The significance of that is different on various CPU
> > families, with code alignment being quite important on certain
> > CPUs, and data alignment on others. Do we assume that you are
> > running on a 486, since you have not told gcc otherwise? You may
> > have fallen accidentally into good alignment in one case and bad in
> > the other. You might or might not be using similar versions of gcc
> > in cygwin and linux. If you would provide a test case, and mention
> > some hardware parameters, some of the mystery could be eliminated;
> > for example, we could find out whether memcmp() is code generated
> > by gcc or from a library. cygwin is not generally considered an
> > important target for performance optimization, as you can see from
> > the alignment considerations and the differences in the libraries.
> > --
> > Tim Prince
>
> Sorry that I wasn't specific enough about my system configuration.
> I'm running a standard installation of cygwin on x86 (P4) and
> WinXP. Both tests were run under the same setup; the only
> difference was the use of the -O2 flag. I find it odd that the
> performance difference is that huge. The source is available at:
> http://kotisivu.raketti.net/darkone/memtest/memtest.c

AFAICT there's no reason this should behave differently on linux or
cygwin. You're comparing the speed of memcmp() against the speed of
comparing ints in a loop. When you don't ask the compiler to
in-line memcmp(), you get a library function which is written with
enough smarts to compare 4 bytes at a time. Various versions of gcc
are interpreting the instruction to use "optimized" in-line code as
a rep cmpsb, which is slower than the newlib memcmp() function,
even on my P-III.

P4s, particularly early versions, are notorious for various
performance glitches when using rep cmpsb on long strings. gcc
isn't smart enough to look at the lengths of your strings and
second-guess your instruction to do that, nor does it have a
crystal ball to second-guess your instruction to generate 486 code,
even if you were running a version with P4 optimizations.

In time-critical applications, it can be quite important to learn
the particular tricks of your compiler and when to choose a
separately compiled string function, or when to ask for in-line, as
well as to acquire a library of such functions built for the
processor of your choice. On the P4, you would have available
64-bit integer comparisons if you chose to use them to speed this
up.
--
Tim Prince
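To make the "compare 4 bytes at a time" point concrete, here is a minimal sketch of a word-at-a-time comparison. It is illustrative only and is not newlib's actual memcmp(); the name memcmp_wordwise is hypothetical, and the sketch assumes a C99 compiler with <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: NOT newlib's memcmp(). The function name is made up. */
static int memcmp_wordwise(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p = s1;
    const unsigned char *q = s2;

    /* Skip ahead 4 bytes at a time while whole words match. Copying into
       locals with memcpy() avoids unaligned or aliasing-unsafe loads;
       compilers typically turn each copy into a single 32-bit load. */
    while (n >= sizeof(uint32_t)) {
        uint32_t a, b;
        memcpy(&a, p, sizeof a);
        memcpy(&b, q, sizeof b);
        if (a != b)
            break;              /* the first differing byte is in this word */
        p += sizeof a;
        q += sizeof b;
        n -= sizeof a;
    }

    /* Finish the tail, or locate the mismatch, one byte at a time. */
    while (n-- > 0) {
        if (*p != *q)
            return (int)*p - (int)*q;
        p++;
        q++;
    }
    return 0;
}
```

Swapping uint32_t for uint64_t gives the 64-bit variant mentioned for the P4; whether that is actually faster on a given CPU is something to measure rather than assume.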
Re: 1.3.10 memcmp() bug
> > I wasn't sure whether I should post about this on the gcc bug
> > report list or here. Anyway, it seems that using the -O2 flag with
> > gcc causes a huge slowdown in memcmp(). However, I don't see a
> > performance drop under linux, so I suppose it is a cygwin issue.

cygwin's gcc version may be using an outdated x86 'optimization' for
memcmp. VC++ has a similar problem, see this:
http://groups.google.com/groups?hl=en&threadm=ucZKhyE3BHA.1464%40tkmsftngp02&rnum=3&prev=/groups%3Fq%3Drep%2Bgroup:microsoft.public.dotnet.languages.vc%26hl%3Den%26selm%3DucZKhyE3BHA.1464%2540tkmsftngp02%26rnum%3D3
Re: 1.3.10 memcmp() bug
On Tue, 23 Apr 2002, Tim Prince wrote:
> On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
> > I wasn't sure whether I should post about this on the gcc bug
> > report list or here. Anyway, it seems that using the -O2 flag with
> > gcc causes a huge slowdown in memcmp(). However, I don't see a
> > performance drop under linux, so I suppose it is a cygwin issue.
> >
> > $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
> > Amount of memory to scan (mbytes)? 100
> > Memory block size (default 1024)? 1024
> > Allocating memory
> > Testing memory - read (1 byte at time)
> > Complete: 889.73MB/sec
> > Testing memory - read (4 bytes at time)
> > Complete: 3313.07MB/sec
> > Freeing memory
> >
> > $ gcc memtest.c -o memtest ; ./memtest.exe
> > Amount of memory to scan (mbytes)? 100
> > Memory block size (default 1024)? 1024
> > Allocating memory
> > Testing memory - read (1 byte at time)
> > Complete: 2517.94MB/sec
> > Testing memory - read (4 bytes at time)
> > Complete: 2933.50MB/sec
> > Freeing memory
> >
> > '1 byte at time' is using memcmp() to compare two blocks.
>
> You leave so many relevant considerations unspecified that anything
> I say must be a stab in the dark. I assume you have a standard
> cygwin installation, where binutils is built to honor only 4-byte
> alignments, while recent linux configurations provide for 16-byte
> alignments. The significance of that is different on various CPU
> families, with code alignment being quite important on certain
> CPUs, and data alignment on others. Do we assume that you are
> running on a 486, since you have not told gcc otherwise? You may
> have fallen accidentally into good alignment in one case and bad in
> the other. You might or might not be using similar versions of gcc
> in cygwin and linux. If you would provide a test case, and mention
> some hardware parameters, some of the mystery could be eliminated;
> for example, we could find out whether memcmp() is code generated
> by gcc or from a library. cygwin is not generally considered an
> important target for performance optimization, as you can see from
> the alignment considerations and the differences in the libraries.
> --
> Tim Prince

Sorry that I wasn't specific enough about my system configuration.
I'm running a standard installation of cygwin on x86 (P4) and
WinXP. Both tests were run under the same setup; the only
difference was the use of the -O2 flag. I find it odd that the
performance difference is that huge. The source is available at:
http://kotisivu.raketti.net/darkone/memtest/memtest.c
Re: 1.3.10 memcmp() bug
On Tuesday 23 April 2002 22:04, Sami Korhonen wrote:
> I wasn't sure whether I should post about this on the gcc bug
> report list or here. Anyway, it seems that using the -O2 flag with
> gcc causes a huge slowdown in memcmp(). However, I don't see a
> performance drop under linux, so I suppose it is a cygwin issue.
>
> $ gcc memtest.c -O2 -o memtest ; ./memtest.exe
> Amount of memory to scan (mbytes)? 100
> Memory block size (default 1024)? 1024
> Allocating memory
> Testing memory - read (1 byte at time)
> Complete: 889.73MB/sec
> Testing memory - read (4 bytes at time)
> Complete: 3313.07MB/sec
> Freeing memory
>
> $ gcc memtest.c -o memtest ; ./memtest.exe
> Amount of memory to scan (mbytes)? 100
> Memory block size (default 1024)? 1024
> Allocating memory
> Testing memory - read (1 byte at time)
> Complete: 2517.94MB/sec
> Testing memory - read (4 bytes at time)
> Complete: 2933.50MB/sec
> Freeing memory
>
> '1 byte at time' is using memcmp() to compare two blocks.

You leave so many relevant considerations unspecified that anything
I say must be a stab in the dark. I assume you have a standard
cygwin installation, where binutils is built to honor only 4-byte
alignments, while recent linux configurations provide for 16-byte
alignments. The significance of that is different on various CPU
families, with code alignment being quite important on certain
CPUs, and data alignment on others. Do we assume that you are
running on a 486, since you have not told gcc otherwise? You may
have fallen accidentally into good alignment in one case and bad in
the other. You might or might not be using similar versions of gcc
in cygwin and linux. If you would provide a test case, and mention
some hardware parameters, some of the mystery could be eliminated;
for example, we could find out whether memcmp() is code generated
by gcc or from a library. cygwin is not generally considered an
important target for performance optimization, as you can see from
the alignment considerations and the differences in the libraries.
--
Tim Prince
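One way to check whether memcmp() comes from gcc's in-line expansion or from the library is to time a direct call against a call that is forced through the library. The sketch below is only an illustration: the helper names are hypothetical, and it relies on the assumption that a call through a volatile function pointer cannot be replaced by an in-line expansion.

```c
#include <stddef.h>
#include <string.h>

/* Calling through a volatile function pointer keeps the compiler from
   substituting its own in-line code, so this always reaches the C
   library's memcmp(). The pointer name is made up for this sketch. */
static int (*volatile library_memcmp)(const void *, const void *, size_t) = memcmp;

static int compare_via_library(const void *a, const void *b, size_t n)
{
    return library_memcmp(a, b, n);
}

/* A direct call, which the compiler is free to expand in-line when
   optimizing (for example as a rep cmpsb sequence on x86). */
static int compare_direct(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

/* To see what was actually generated, compile with "gcc -O2 -S" and look
   in the assembly for a cmpsb instruction versus a call to memcmp.
   Building with -fno-builtin asks gcc not to expand such calls in-line. */
```

If the two functions run at very different speeds when built with the same flags, the direct call is most likely being expanded in-line.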
1.3.10 memcmp() bug
I wasn't sure whether I should post about this on the gcc bug
report list or here. Anyway, it seems that using the -O2 flag with
gcc causes a huge slowdown in memcmp(). However, I don't see a
performance drop under linux, so I suppose it is a cygwin issue.

$ gcc memtest.c -O2 -o memtest ; ./memtest.exe
Amount of memory to scan (mbytes)? 100
Memory block size (default 1024)? 1024
Allocating memory
Testing memory - read (1 byte at time)
Complete: 889.73MB/sec
Testing memory - read (4 bytes at time)
Complete: 3313.07MB/sec
Freeing memory

$ gcc memtest.c -o memtest ; ./memtest.exe
Amount of memory to scan (mbytes)? 100
Memory block size (default 1024)? 1024
Allocating memory
Testing memory - read (1 byte at time)
Complete: 2517.94MB/sec
Testing memory - read (4 bytes at time)
Complete: 2933.50MB/sec
Freeing memory

'1 byte at time' is using memcmp() to compare two blocks.
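For readers without access to memtest.c, the two tests can be approximated as below. This is a hypothetical reconstruction, not the posted source: the function names are made up, and it assumes that the '4 bytes at time' test compares unsigned ints in a loop while the '1 byte at time' test makes one memcmp() call per block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the "1 byte at time" test: one memcmp() call per block. */
static int blocks_equal_memcmp(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

/* Stand-in for the "4 bytes at time" test: compare unsigned ints in a
   loop. Assumes n is a multiple of sizeof(unsigned int) and that both
   blocks are suitably aligned, as malloc()ed memory would be. */
static int blocks_equal_words(const unsigned int *a, const unsigned int *b,
                              size_t n)
{
    size_t i;
    size_t words = n / sizeof(unsigned int);

    assert(n % sizeof(unsigned int) == 0);
    for (i = 0; i < words; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Timing each function over the same pair of blocks, once built with -O2 and once without, would be one way to try to reproduce the figures above.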