Re: Example in the overview
Steven Schveighoffer wrote: A comment to explain the representation of the array may be good. Well, I did add your explanation to the bugzilla report! Thanks.
Re: Example in the overview
On Fri, 14 May 2010 19:37:58 -0400, Walter Bright wrote:
> R.Tenton wrote:
>> At the very bottom of http://digitalmars.com/d/2.0/overview.html there is an example implementation of the Eratosthenes' sieve. That code is broken! It counts 1899 prime numbers, while there are only 1028 primes in the interval [1,8191]!
>
> Are you sure? What's the mistake in the code?

I think the issue is that the expectation is that array index x represents the number x. But it doesn't seem that way. The i + i + 3 is very odd indeed. If we consider each index, it means the first element represents 0 + 0 + 3 = 3; the second element represents 1 + 1 + 3 = 5; the third element represents 2 + 2 + 3 = 7. So it looks like the numbers represented by the array actually go from 3 to (8190 + 8190 + 3), or 16383. According to Wolfram Alpha, the number of primes in that list is 1899: http://www.wolframalpha.com/input/?i=primes+in+3+..+16383

A comment to explain the representation of the array may be good.

-Steve
Re: Example in the overview
Walter Bright wrote:
> bearophile wrote:
>> Walter Bright:
>>> Are you sure? What's the mistake in the code?
>>
>> This is the code in the Overview, it prints 1899: http://codepad.org/lzRtggEL
>> This is the code I have suggested in bugzilla, it prints 1027: http://ideone.com/D9ZqQ
>> Wolfram Alpha says they are 1027 (I have left out the last number because Alpha uses a closed interval): http://www.wolframalpha.com/input/?i=primes+in+2+..+8190
>
> It's been printing 1899 primes since the 80's. Just for fun, google [sieve of eratosthenes 1899 primes]

I wonder if that has to do with there being 1899 primes in 3..16384: http://www.wolframalpha.com/input/?i=primes+in+2+..+16384

An implementation of just that is what I found most of the time, going back to "A High-Level Language Benchmark" by Jim Gilbreath, in BYTE Magazine, September 1981, pages 180-198. Sadly, I cannot seem to find a version of the original article online. Anyone know of one, so that we may get to the bottom of this?

-- Simen
Re: Example in the overview
Ary Borenszweig:
> A small observation: count += 1 is performed for i = 0, hmmm...

It lacks unit tests. Basic unit testing can catch that bug. Recently I have read a quote: "If it has no unit tests then it's broken". Experience shows me that this is more often true than false.

Bye,
bearophile
Re: Example in the overview
Walter Bright wrote:
> bearophile wrote:
>> Walter Bright:
>>> Are you sure? What's the mistake in the code?
>>
>> This is the code in the Overview, it prints 1899: http://codepad.org/lzRtggEL
>> This is the code I have suggested in bugzilla, it prints 1027: http://ideone.com/D9ZqQ
>> Wolfram Alpha says they are 1027 (I have left out the last number because Alpha uses a closed interval): http://www.wolframalpha.com/input/?i=primes+in+2+..+8190
>
> It's been printing 1899 primes since the 80's. Just for fun, google [sieve of eratosthenes 1899 primes]

A small observation: count += 1 is performed for i = 0, hmmm...
Re: Example in the overview
bearophile wrote:
> Walter Bright:
>> Are you sure? What's the mistake in the code?
>
> This is the code in the Overview, it prints 1899: http://codepad.org/lzRtggEL
> This is the code I have suggested in bugzilla, it prints 1027: http://ideone.com/D9ZqQ
> Wolfram Alpha says they are 1027 (I have left out the last number because Alpha uses a closed interval): http://www.wolframalpha.com/input/?i=primes+in+2+..+8190

It's been printing 1899 primes since the 80's. Just for fun, google [sieve of eratosthenes 1899 primes]
Re: Example in the overview
Walter Bright:
> Are you sure? What's the mistake in the code?

This is the code in the Overview, it prints 1899: http://codepad.org/lzRtggEL
This is the code I have suggested in bugzilla, it prints 1027: http://ideone.com/D9ZqQ
Wolfram Alpha says they are 1027 (I have left out the last number because Alpha uses a closed interval): http://www.wolframalpha.com/input/?i=primes+in+2+..+8190

Bye,
bearophile
Re: Example in the overview
R.Tenton wrote:
> At the very bottom of http://digitalmars.com/d/2.0/overview.html there is an example implementation of the Eratosthenes' sieve. That code is broken! It counts 1899 prime numbers, while there are only 1028 primes in the interval [1,8191]!

Are you sure? What's the mistake in the code?

> What is the outermost for loop good for?

It is a translation of the C benchmark that was common in the 1980s. The outermost loop is to make it take longer so the test can be timed better.

The original: http://home.iae.nl/users/mhx/nsieve.c
The more common later version: http://www.paxcompiler.com/js_sieve.htm
Re: Confusing behavior involving array operations.
On 15/05/10 00:14, Pillsy wrote:
> I have the following program on Mac OS X 10.6.3 running dmd version 2.043:
>
> $ cat if.d
> import std.stdio;
>
> void main (string [] args)
> {
>     foreach(arg; args)
>         writeln(arg);
>
>     auto vec = new double[10];
>     foreach(i, ref x; vec)
>     {
>         x = cast(double) i;
>     }
>
>     if (args.length > 1 && args[1] == "bar")
>     {
>         vec[] *= 0.5;
>     }
> }
>
> This program compiles with no warnings. However, when I run it, I get a bus error regardless of the value of the first command-line argument, or whether it's present at all. I also get a bus error if I eliminate the conditional altogether. AFAICT, the array operation I'm doing on vec is legitimate according to http://www.digitalmars.com/d/2.0/arrays.html and the fact that the program bombs out regardless of the outcome of the test in the conditional is particularly mystifying. Any help would be greatly appreciated.
>
> Thanks,
> Pillsy

I'm using dmd 2.045 on Linux here, and that example works fine, so I'd guess it's an OS X specific bug if you can't get it working with newer versions of dmd. There were some issues to do with array operations that have been fixed recently, so I'd try a newer dmd. If that doesn't fix it, report a bug.
Re: compiled gdb
On 15/05/10 00:14, eles wrote:
> hello, since gnu debugger (gdb) is free and now there is a version which works properly with D, could someone host (and make available for download) *compiled* versions for windows and linux? of course, patched versions (so that it would work with D). maybe on dsource? thanks (I am not good when it comes to dependencies and so on). eles

While this is possible, I'm not sure how best to go about this, as the official patches in gdb currently seem buggy and there isn't an exact patch for 7.1. I agree there should be an easy way to get binaries until 7.2 is released, though.
compiled gdb
hello, since gnu debugger (gdb) is free and now there is a version which works properly with D, could someone host (and make available for download) *compiled* versions for windows and linux? of course, patched versions (so that it would work with D). maybe on dsource? thanks (I am not good when it comes to dependencies and so on). eles
Confusing behavior involving array operations.
I have the following program on Mac OS X 10.6.3 running dmd version 2.043:

$ cat if.d
import std.stdio;

void main (string [] args)
{
    foreach(arg; args)
        writeln(arg);

    auto vec = new double[10];
    foreach(i, ref x; vec)
    {
        x = cast(double) i;
    }

    if (args.length > 1 && args[1] == "bar")
    {
        vec[] *= 0.5;
    }
}

This program compiles with no warnings. However, when I run it, I get a bus error regardless of the value of the first command-line argument, or whether it's present at all. I also get a bus error if I eliminate the conditional altogether. AFAICT, the array operation I'm doing on vec is legitimate according to http://www.digitalmars.com/d/2.0/arrays.html and the fact that the program bombs out regardless of the outcome of the test in the conditional is particularly mystifying. Any help would be greatly appreciated.

Thanks,
Pillsy
Re: Loop optimization
bearophile wrote:
> kai:
>> I was scared off by the warning that D 2.0 support is experimental.
>
> LDC is D1 still, mostly :-(
> And at the moment it uses LLVM 2.6.
> LLVM 2.7 contains a new optimization that can improve that code some more.
>
>> Good to know, thanks (thats actually a great feature for scientists!).
>
> In theory D is a bit fit for numerical computations too, but there is a lot of work to do still. And some parts of D design will need to be improved to help numerical code performance.
>
> From my extensive tests, if you use it correctly, D1 code compiled with LDC can be about as efficient as C code compiled with GCC or sometimes a little more efficient.
>
> Steven Schveighoffer:
>> In C/C++, the default value for doubles is 0.
>
> I think in C and C++ the default value for doubles is "uninitialized" (that is anything).

That depends. In C/C++, the default value for any global variable is to have all bits set to 0, whatever that means for the actual data type. The default value for local variables and malloc/new memory is "whatever was in this place in memory before", which can be anything. The default value for calloc is to have all bits set to 0, as for global variables. In the OP's code, the malloc will probably return memory that has never been used before, therefore probably initialized to 0 too (OS dependent).

Jerome
-- 
mailto:jeber...@free.fr
http://jeberger.free.fr
Jabber: jeber...@jabber.fr
Re: Loop optimization
kai:
> I was scared off by the warning that D 2.0 support is experimental.

LDC is D1 still, mostly :-(
And at the moment it uses LLVM 2.6. LLVM 2.7 contains a new optimization that can improve that code some more.

> Good to know, thanks (thats actually a great feature for scientists!).

In theory D is a bit fit for numerical computations too, but there is a lot of work to do still. And some parts of D design will need to be improved to help numerical code performance.

From my extensive tests, if you use it correctly, D1 code compiled with LDC can be about as efficient as C code compiled with GCC or sometimes a little more efficient.

-

Steven Schveighoffer:
> In C/C++, the default value for doubles is 0.

I think in C and C++ the default value for doubles is "uninitialized" (that is anything).

Bye,
bearophile
Re: Loop optimization
== Quote from bearophile (bearophileh...@lycos.com)'s article
> But the bigger problem in your code is that you are performing operations on NaNs (that's the default initialization of FP values in D), and operations on NaNs are usually quite slower.

I didn't know that. Is it the same for inf? I used it as a null for structs.
Re: Loop optimization
Thanks for the help all!

> 2. Can you use vector operations? If the example you gave is representative of your specific problem, then you can't because you are adding overlapping parts of the array. But if you are doing operations on separate arrays, then array operations will be *much* faster.

Unfortunately, I don't think I will be able to. The actual code is computing norms of a sequence of points and then updating their values as needed (MLE smoothing/prediction).

> For that evaluation you probably have to use the LDC compiler, that is able to optimize better.

I was scared off by the warning that D 2.0 support is experimental. I realize D 2 itself is still non-production, but for academic interests industrial strength isn't all that important if it usually works :).

> Using floating point for indexes and lengths is not a good practice. In D large numbers are written like 1_000_000. Use -release too.

Good to know, thanks (that's actually a great feature for scientists!).

> DMD compiler doesn't perform many optimizations, especially on floating point computations. But the bigger problem in your code is that you are performing operations on NaNs (that's the default initialization of FP values in D), and operations on NaNs are usually quite slower.

> in D, the default value for doubles is nan, so you are adding countless scores of nan's which is costly for some reason (not a big floating point guy, so I'm not sure about this).

Ah ha, that was it -- serves me right for trying to boil down a test case and failing miserably. I'll head back to my code now and try to find the real problem :-) Obviously, at some point I removed the initialization data.
Re: Assertion failure: 'fieldi>=0 && fieldi < se->elements->dim' on line 2062 in file 'interpret.c'
or:

module main;

//const S s = S(.5f); // Uncomment to make it compile

struct S
{
    float a;
    static S opCall( float a_ )
    {
        S s = { a_ };
        return s;
    }
    const S _s = S( 1f );
}

void main(){}
Re: Assertion failure: 'fieldi>=0 && fieldi < se->elements->dim' on line 2062 in file 'interpret.c'
== Quote from bearophile (bearophileh...@lycos.com)'s article
> This produces the same errors:
> struct Foo {
>     int bar;
>     static Foo baz() {
>         return Foo();
>     }
>     const Foo one = Foo.baz();
> }
> void main() {}
> Bye,
> bearophile

And this is why my program compiled anyway:

module main;
import s_def;
void main(){}

module s_def;
import e_def; // <---this
struct S
{
    float area;
    static S opCall( float a_ )
    {
        S s = { a_ };
        return s;
    }
    const S _s = S( 1f );
}

module e_def;
import s_def;
const S e = S(1f);

I have import problems quite regularly, but I always fail at tinifying them :)
Re: Loop optimization
On Thu, 13 May 2010 22:38:40 -0400, kai wrote:
> Hello, I was evaluating using D for some numerical stuff. However I was surprised to find that looping & array indexing were not very speedy compared to alternatives (gcc et al). I was using the DMD2 compiler on mac and windows, with -O -release. Here is a boiled down test case:
>
> void main (string[] args)
> {
>     double [] foo = new double [cast(int)1e6];
>     for (int i=0;i<1e3;i++)
>     {
>         for (int j=0;j<1e6-1;j++)
>         {
>             foo[j]=foo[j]+foo[j+1];
>         }
>     }
> }
>
> Any ideas? Am I somehow not hitting a vital compiler optimization? Thanks for your help.

I figured it out. In D, the default value for doubles is nan, so you are adding countless scores of nan's, which is costly for some reason (not a big floating point guy, so I'm not sure about this). In C/C++, the default value for doubles is 0. BTW, without any initialization of the array, what are you expecting the code to do? In the C++ version, I suspect you are simply adding a bunch of 0s together. Equivalent D code which first initializes the array to 0s:

void main (string[] args)
{
    double [] foo = new double [cast(int)1e6];
    foo[] = 0; // probably want to change this to something more meaningful
    for (int i=0;i<1e3;i++)
    {
        for (int j=0;j<1e6-1;j++)
        {
            foo[j]=foo[j]+foo[j+1];
        }
    }
}
Re: Loop optimization
On Fri, 14 May 2010 07:32:54 -0400, Steven Schveighoffer wrote: > On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad > wrote: > >> On Fri, 14 May 2010 02:38:40 +, kai wrote: > > >>> I was using the DMD2 compiler on >>> mac and windows, with -O -release. >> >> 1. Have you tried the -noboundscheck compiler switch? Unlike C, D >> checks that you do not try to read/write beyond the end of an array, >> but you can turn those checks off with said switch. > > -release implies -noboundscheck (in fact, I did not know there was a > noboundscheck flag, I thought you had to use -release). > > -Steve You are right, just checked it now. But it's strange, I thought the whole point of the -noboundscheck switch was that it would be independent of -release. But perhaps I remember wrongly (or perhaps Walter just hasn't gotten around to it yet). Anyway, sorry for the misinformation. -Lars
Re: Loop optimization
On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad wrote:
> On Fri, 14 May 2010 02:38:40 +, kai wrote:
>> I was using the DMD2 compiler on mac and windows, with -O -release.
>
> 1. Have you tried the -noboundscheck compiler switch? Unlike C, D checks that you do not try to read/write beyond the end of an array, but you can turn those checks off with said switch.

-release implies -noboundscheck (in fact, I did not know there was a noboundscheck flag, I thought you had to use -release).

-Steve
Re: Loop optimization
kai:
> I was evaluating using D for some numerical stuff.

For that evaluation you probably have to use the LDC compiler, which is able to optimize better.

> void main (string[] args)
> {
>     double [] foo = new double [cast(int)1e6];
>     for (int i=0;i<1e3;i++)
>     {
>         for (int j=0;j<1e6-1;j++)
>         {
>             foo[j]=foo[j]+foo[j+1];
>         }
>     }
> }

Using floating point for indexes and lengths is not a good practice. In D large numbers are written like 1_000_000. Use -release too.

> Any ideas? Am I somehow not hitting a vital compiler optimization?

The DMD compiler doesn't perform many optimizations, especially on floating point computations. But the bigger problem in your code is that you are performing operations on NaNs (that's the default initialization of FP values in D), and operations on NaNs are usually quite slower.

Your code in C:

#include "stdio.h"
#include "stdlib.h"

#define N 100

int main() {
    double *foo = calloc(N, sizeof(double)); // malloc suffices here
    int i, j;
    for (j = 0; j < N; j++)
        foo[j] = 1.0;
    for (i = 0; i < 1000; i++)
        for (j = 0; j < N-1; j++)
            foo[j] = foo[j] + foo[j + 1];
    printf("%f", foo[N-1]);
    return 0;
}

/*
gcc -O3 -s -Wall test.c -o test
Timings, outer loop=1_000 times: 7.72

-- gcc -Wall -O3 -fomit-frame-pointer -msse3 -march=native test.c -o test (Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.69 s

Just the inner loop:
.L7:
    fldl    8(%edx)
    fadd    %st, %st(1)
    fxch    %st(1)
    fstpl   (%edx)
    addl    $8, %edx
    cmpl    %ecx, %edx
    jne     .L7
*/

Your code in D1:

version (Tango)
    import tango.stdc.stdio: printf;
else
    import std.c.stdio: printf;

void main() {
    const int N = 1_000_000;
    double[] foo = new double[N];
    foo[] = 1.0;
    for (int i = 0; i < 1_000; i++)
        for (int j = 0; j < N-1; j++)
            foo[j] = foo[j] + foo[j + 1];
    printf("%f", foo[N-1]);
}

/*
dmd -O -release -inline test.d (Not running on a VirtualBox)
Timings, outer loop=1_000 times: 9.35 s

Just the inner loop:
L34:
    fld  qword ptr 8[EDX*8][ECX]
    fadd qword ptr [EDX*8][ECX]
    fstp qword ptr [EDX*8][ECX]
    inc  EDX
    cmp  EDX,0F423Fh
    jb   L34

---

ldc -O3 -release -inline test.d (Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.87 s

Just the inner loop:
.LBB1_2:
    movsd (%eax,%ecx,8), %xmm0
    addsd 8(%eax,%ecx,8), %xmm0
    movsd %xmm0, (%eax,%ecx,8)
    incl  %ecx
    cmpl  $99, %ecx
    jne   .LBB1_2

---

ldc -unroll-allow-partial -O3 -release -inline test.d (Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.75 s

Just the inner loop:
.LBB1_2:
    movsd (%eax,%ecx,8), %xmm0
    addsd 8(%eax,%ecx,8), %xmm0
    movsd %xmm0, (%eax,%ecx,8)
    movsd 8(%eax,%ecx,8), %xmm0
    addsd 16(%eax,%ecx,8), %xmm0
    movsd %xmm0, 8(%eax,%ecx,8)
    movsd 16(%eax,%ecx,8), %xmm0
    addsd 24(%eax,%ecx,8), %xmm0
    movsd %xmm0, 16(%eax,%ecx,8)
    movsd 24(%eax,%ecx,8), %xmm0
    addsd 32(%eax,%ecx,8), %xmm0
    movsd %xmm0, 24(%eax,%ecx,8)
    movsd 32(%eax,%ecx,8), %xmm0
    addsd 40(%eax,%ecx,8), %xmm0
    movsd %xmm0, 32(%eax,%ecx,8)
    movsd 40(%eax,%ecx,8), %xmm0
    addsd 48(%eax,%ecx,8), %xmm0
    movsd %xmm0, 40(%eax,%ecx,8)
    movsd 48(%eax,%ecx,8), %xmm0
    addsd 56(%eax,%ecx,8), %xmm0
    movsd %xmm0, 48(%eax,%ecx,8)
    movsd 56(%eax,%ecx,8), %xmm0
    addsd 64(%eax,%ecx,8), %xmm0
    movsd %xmm0, 56(%eax,%ecx,8)
    movsd 64(%eax,%ecx,8), %xmm0
    addsd 72(%eax,%ecx,8), %xmm0
    movsd %xmm0, 64(%eax,%ecx,8)
    addl  $9, %ecx
    cmpl  $99, %ecx
    jne   .LBB1_2
*/

As you see, the code generated by ldc is about as good as the one generated by gcc. There are of course other ways to optimize this code...

Bye,
bearophile