Re: Example in the overview

2010-05-14 Thread Walter Bright

Steven Schveighoffer wrote:

A comment to explain the representation of the array may be good.


Well, I did add your explanation to the bugzilla report! Thanks.


Re: Example in the overview

2010-05-14 Thread Steven Schveighoffer
On Fri, 14 May 2010 19:37:58 -0400, Walter Bright wrote:



R.Tenton wrote:

At the very bottom of http://digitalmars.com/d/2.0/overview.html
there is an example implementation of the Eratosthenes' sieve.
That code is broken! It counts 1899 prime numbers, while there are only
1028 primes in the interval [1,8191]!


Are you sure? What's the mistake in the code?


I think the issue is the expectation that array index x represents the
number x. But it doesn't seem that way.


The i + i + 3 is very odd indeed.

If we consider each index, the first element represents 0 + 0 + 3 = 3;
the second element represents 1 + 1 + 3 = 5;
the third element represents 2 + 2 + 3 = 7.

So it looks like the numbers represented by the array actually go from 3
to (8190 + 8190 + 3), or 16383.


According to Wolfram Alpha, the number of primes in that list is 1899

http://www.wolframalpha.com/input/?i=primes+in+3+..+16383

A comment to explain the representation of the array may be good.

-Steve


Re: Example in the overview

2010-05-14 Thread Simen kjaeraas

Walter Bright  wrote:


bearophile wrote:

Walter Bright:

Are you sure? What's the mistake in the code?

This is the code in the Overview, it prints 1899:
http://codepad.org/lzRtggEL

This is the code I have suggested in bugzilla, it prints 1027:
http://ideone.com/D9ZqQ

Wolfram Alpha says there are 1027 (I have left out the last number
because Alpha uses a closed interval):

http://www.wolframalpha.com/input/?i=primes+in+2+..+8190



It's been printing 1899 primes since the 80's. Just for fun, google
[sieve of eratosthenes 1899 primes]


I wonder if that has to do with there being 1899 primes in 3..16384.
http://www.wolframalpha.com/input/?i=primes+in+2+..+16384

An implementation of just that is what I found most of the time, tracing
back to "A High-Level Language Benchmark" by Jim Gilbreath, in BYTE
Magazine, September 1981, pages 180-198.

Sadly, I cannot seem to find a version of the original article
online. Anyone know of one, that we may get to the bottom of this?

--
Simen


Re: Example in the overview

2010-05-14 Thread bearophile
Ary Borenszweig:
> A small observation: count +=1 is performed for i = 0, hmmm...

It lacks unit tests. Basic unit testing can catch that bug.
Recently I read a quote: "If it has no unit tests then it's broken."
Experience shows me this is more often true than false.

Bye,
bearophile


Re: Example in the overview

2010-05-14 Thread Ary Borenszweig

Walter Bright wrote:

bearophile wrote:

Walter Bright:

Are you sure? What's the mistake in the code?


This is the code in the Overview, it prints 1899:
http://codepad.org/lzRtggEL

This is the code I have suggested in bugzilla, it prints 1027:
http://ideone.com/D9ZqQ

Wolfram Alpha says there are 1027 (I have left out the last number
because Alpha uses a closed interval)

http://www.wolframalpha.com/input/?i=primes+in+2+..+8190



It's been printing 1899 primes since the 80's. Just for fun, google
[sieve of eratosthenes 1899 primes]


A small observation: count +=1 is performed for i = 0, hmmm...


Re: Example in the overview

2010-05-14 Thread Walter Bright

bearophile wrote:

Walter Bright:

Are you sure? What's the mistake in the code?


This is the code in the Overview, it prints 1899:
http://codepad.org/lzRtggEL

This is the code I have suggested in bugzilla, it prints 1027:
http://ideone.com/D9ZqQ

Wolfram Alpha says there are 1027 (I have left out the last number
because Alpha uses a closed interval)
http://www.wolframalpha.com/input/?i=primes+in+2+..+8190



It's been printing 1899 primes since the 80's. Just for fun, google
[sieve of eratosthenes 1899 primes]


Re: Example in the overview

2010-05-14 Thread bearophile
Walter Bright:
> Are you sure? What's the mistake in the code?

This is the code in the Overview, it prints 1899:
http://codepad.org/lzRtggEL

This is the code I have suggested in bugzilla, it prints 1027:
http://ideone.com/D9ZqQ

Wolfram Alpha says there are 1027 (I have left out the last number
because Alpha uses a closed interval)
http://www.wolframalpha.com/input/?i=primes+in+2+..+8190

Bye,
bearophile


Re: Example in the overview

2010-05-14 Thread Walter Bright

R.Tenton wrote:

At the very bottom of http://digitalmars.com/d/2.0/overview.html
there is an example implementation of the Eratosthenes' sieve.
That code is broken! It counts 1899 prime numbers, while there are only
1028 primes in the interval [1,8191]!


Are you sure? What's the mistake in the code?


What is the outermost for loop good for?


It is a translation of the C benchmark that was common in the 1980's. The 
outermost loop is to make it take longer so the test can be timed better.


The original:

http://home.iae.nl/users/mhx/nsieve.c

The more common later version:

http://www.paxcompiler.com/js_sieve.htm


Re: Confusing behavior involving array operations.

2010-05-14 Thread Robert Clipsham

On 15/05/10 00:14, Pillsy wrote:

I have the following program on Mac OS X 10.6.3 running dmd version 2.043:

$ cat if.d
import std.stdio;

void main (string [] args)
{
   foreach(arg; args)
 writeln(arg);

   auto vec = new double[10];

   foreach(i, ref x; vec) { x = cast(double) i; }

   if (args.length > 1 && args[1] == "bar") { vec[] *= 0.5; }
}

This program compiles with no warnings. However, when I run it, I get a
bus error regardless of the value of the first command-line argument, or
whether it's present at all. I also get a bus error if I eliminate the
conditional altogether. AFAICT, the array operation I'm doing on vec is
legitimate according to

http://www.digitalmars.com/d/2.0/arrays.html

and the fact that the program bombs out regardless of the outcome of the
test in the conditional is particularly mystifying.

Any help would be greatly appreciated.

Thanks,
Pillsy


I'm using dmd 2.045 on linux here, and that example works fine, so I'd
guess it's an OS X specific bug if you can't get it working with newer
versions of dmd... There were some issues to do with array operations
that have been fixed recently, so I'd try a newer dmd. If that doesn't
fix it, report a bug.


Re: compiled gdb

2010-05-14 Thread Robert Clipsham

On 15/05/10 00:14, eles wrote:

hello,

since gnu debugger (gdb) is free and now there is a version which works
properly with D, could someone host (and made available for download)
*compiled* versions for windows and linux? of course, patched versions
(so that it would work with D). maybe on dsource?

thanks (I am not good when it comes to dependencies and so on).

eles


While this is possible I'm not sure how best to go about this, as the
official patches in gdb are currently buggy it seems, and there isn't an
exact patch for 7.1. I agree there should be an easy way to get binaries
until 7.2 is released though.


compiled gdb

2010-05-14 Thread eles
hello,

since gnu debugger (gdb) is free and now there is a version which works
properly with D, could someone host (and make available for download)
*compiled* versions for windows and linux? of course, patched versions
(so that it would work with D). maybe on dsource?

thanks (I am not good when it comes to dependencies and so on).

eles


Confusing behavior involving array operations.

2010-05-14 Thread Pillsy
I have the following program on Mac OS X 10.6.3 running dmd version 2.043:

$ cat if.d
import std.stdio;

void main (string [] args)
{
  foreach(arg; args)
writeln(arg);

  auto vec = new double[10];

  foreach(i, ref x; vec) { x = cast(double) i; }

  if (args.length > 1 && args[1] == "bar") { vec[] *= 0.5; }
}

This program compiles with no warnings. However, when I run it, I get a
bus error regardless of the value of the first command-line argument, or
whether it's present at all. I also get a bus error if I eliminate the
conditional altogether. AFAICT, the array operation I'm doing on vec is
legitimate according to

http://www.digitalmars.com/d/2.0/arrays.html

and the fact that the program bombs out regardless of the outcome of the
test in the conditional is particularly mystifying.

Any help would be greatly appreciated.

Thanks,
Pillsy


Re: Loop optimization

2010-05-14 Thread Jérôme M. Berger
bearophile wrote:
> kai:
> 
>> I was scared off by the warning that D 2.0 support is experimental.
> 
> LDC is D1 still, mostly :-(
> And at the moment it uses LLVM 2.6.
> LLVM 2.7 contains a new optimization that can improve that code some more.
> 
> 
>> Good to know, thanks (thats actually a great feature for scientists!).
> 
> In theory D is a good fit for numerical computations too, but there is a lot
> of work to do still. And some parts of D's design will need to be improved to
> help numerical code performance.
> 
> From my extensive tests, if you use it correctly, D1 code compiled with LDC 
> can be about as efficient as C code compiled with GCC or sometimes a little 
> more efficient.
> 
> -
> 
> Steven Schveighoffer:
>> In C/C++, the default value for doubles is 0.
> 
> I think in C and C++ the default value for doubles is "uninitialized" (that 
> is anything).
> 
That depends. In C/C++, the default value for any global variable
is to have all bits set to 0 whatever that means for the actual data
type. The default value for local variables and malloc/new memory is
"whatever was in this place in memory before" which can be anything.
The default value for calloc is to have all bits to 0 as for global
variables.

In the OP code, the malloc will probably return memory that has
never been used before, therefore probably initialized to 0 too (OS
dependent).

Jerome
-- 
mailto:jeber...@free.fr
http://jeberger.free.fr
Jabber: jeber...@jabber.fr





Re: Loop optimization

2010-05-14 Thread bearophile
kai:

> I was scared off by the warning that D 2.0 support is experimental.

LDC is D1 still, mostly :-(
And at the moment it uses LLVM 2.6.
LLVM 2.7 contains a new optimization that can improve that code some more.


> Good to know, thanks (thats actually a great feature for scientists!).

In theory D is a good fit for numerical computations too, but there is a lot
of work to do still. And some parts of D's design will need to be improved to
help numerical code performance.

From my extensive tests, if you use it correctly, D1 code compiled with LDC
can be about as efficient as C code compiled with GCC or sometimes a little
more efficient.

-

Steven Schveighoffer:
> In C/C++, the default value for doubles is 0.

I think in C and C++ the default value for doubles is "uninitialized" (that is 
anything).

Bye,
bearophile


Re: Loop optimization

2010-05-14 Thread strtr
== Quote from bearophile (bearophileh...@lycos.com)'s article
> But the bigger problem in your code is that you are performing operations on
> NaNs (that's the default initialization of FP values in D), and operations
> on NaNs are usually quite slower.

I didn't know that. Is it the same for inf?
I used it as a null for structs.



Re: Loop optimization

2010-05-14 Thread kai
Thanks for the help all!

> 2. Can you use vector operations?  If the example you gave is
> representative of your specific problem, then you can't because you are
> adding overlapping parts of the array.  But if you are doing operations
> on separate arrays, then array operations will be *much* faster.

Unfortunately, I don't think I will be able to. The actual code is
computing norms of a sequence of points and then updating their values as
needed (MLE smoothing/prediction).

> For that evaluation you probably have to use the LDC compiler, which is
> able to optimize better.

I was scared off by the warning that D 2.0 support is experimental. I
realize D 2 itself is still non-production, but for academic interests
industrial-strength isn't all that important if it usually works :).

> Using floating point for indexes and lengths is not a good practice.
> In D large numbers are written like 1_000_000. Use -release too.

Good to know, thanks (that's actually a great feature for scientists!).

> DMD compiler doesn't perform many optimizations, especially on floating
> point computations. But the bigger problem in your code is that you are
> performing operations on NaNs (that's the default initalization of FP
> values in D), and operations on NaNs are usually quite slower.

> in D, the default value for doubles is nan, so you are adding countless
> scores of nan's which is costly for some reason (not a big floating point
> guy, so I'm not sure about this).

Ah ha, that was it-- serves me right for trying to boil down a test case and
failing miserably. I'll head back to my code now and try to find the real
problem :-) At some point I removed the initialization data obviously.


Re: Assertion failure: 'fieldi>=0 && fieldi < se->elements->dim' on line 2062 in file 'interpret.c'

2010-05-14 Thread strtr
or :

module main;
//const S s = S(.5f); // Uncomment to make it compile
struct S
{
  float a;
  static S opCall( float a_ )
  {
S s = { a_ };
return s;
  }
  const S _s = S( 1f );
}
void main(){}


Re: Assertion failure: 'fieldi>=0 && fieldi < se->elements->dim' on line 2062 in file 'interpret.c'

2010-05-14 Thread strtr
== Quote from bearophile (bearophileh...@lycos.com)'s article
> This produces the same errors:
> struct Foo {
> int bar;
> static Foo baz() {
> return Foo();
> }
> const Foo one = Foo.baz();
> }
> void main() {}
> Bye,
> bearophile

And this is why my program compiled anyway:


module main;

import s_def;

void main(){}

module s_def;

import e_def; // <---this

struct S
{
  float area;
  static S opCall( float a_ )
  {
S s = { a_ };
return s;
  }
  const S _s = S( 1f );
}

module e_def;

import s_def;

const S e = S(1f);


I have import problems quite regularly, but I always fail at tinifying them :)





Re: Loop optimization

2010-05-14 Thread Steven Schveighoffer

On Thu, 13 May 2010 22:38:40 -0400, kai  wrote:


Hello,

I was evaluating using D for some numerical stuff. However I was surprised
to find that looping & array indexing was not very speedy compared to
alternatives (gcc et al). I was using the DMD2 compiler on mac and windows,
with -O -release. Here is a boiled down test case:

void main (string[] args)
{
double [] foo = new double [cast(int)1e6];
for (int i=0;i<1e3;i++)
{
for (int j=0;j<1e6-1;j++)
{
foo[j]=foo[j]+foo[j+1];
}
}
}

Any ideas? Am I somehow not hitting a vital compiler optimization? Thanks
for your help.


I figured it out.

In D, the default value for doubles is NaN, so you are adding countless
scores of NaNs, which is costly for some reason (not a big floating point
guy, so I'm not sure about this).


In C/C++, the default value for doubles is 0.

BTW, without any initialization of the array, what are you expecting the
code to do? In the C++ version, I suspect you are simply adding a bunch
of 0s together.


Equivalent D code which first initializes the array to 0s:

void main (string[] args)
{
double [] foo = new double [cast(int)1e6];
foo[] = 0; // probably want to change this to something more meaningful
for (int i=0;i<1e3;i++)
{
for (int j=0;j<1e6-1;j++)
{
foo[j]=foo[j]+foo[j+1];
}
}
}

Re: Loop optimization

2010-05-14 Thread Lars T. Kyllingstad
On Fri, 14 May 2010 07:32:54 -0400, Steven Schveighoffer wrote:

> On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad wrote:
> 
>> On Fri, 14 May 2010 02:38:40 +, kai wrote:
> 
> 
>>> I was using the DMD2 compiler on
>>> mac and windows, with -O -release.
>>
>> 1. Have you tried the -noboundscheck compiler switch?  Unlike C, D
>> checks that you do not try to read/write beyond the end of an array,
>> but you can turn those checks off with said switch.
> 
> -release implies -noboundscheck (in fact, I did not know there was a
> noboundscheck flag, I thought you had to use -release).
> 
> -Steve


You are right, just checked it now.  But it's strange, I thought the 
whole point of the -noboundscheck switch was that it would be independent 
of -release.  But perhaps I remember wrongly (or perhaps Walter just 
hasn't gotten around to it yet).

Anyway, sorry for the misinformation.

-Lars


Re: Loop optimization

2010-05-14 Thread Steven Schveighoffer
On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad wrote:



On Fri, 14 May 2010 02:38:40 +, kai wrote:




I was using the DMD2 compiler on
mac and windows, with -O -release.


1. Have you tried the -noboundscheck compiler switch?  Unlike C, D checks
that you do not try to read/write beyond the end of an array, but you can
turn those checks off with said switch.


-release implies -noboundscheck (in fact, I did not know there was a  
noboundscheck flag, I thought you had to use -release).


-Steve


Re: Loop optimization

2010-05-14 Thread bearophile
kai:

> I was evaluating using D for some numerical stuff.

For that evaluation you probably have to use the LDC compiler, which is able
to optimize better.


>   void main (string[] args)
>   {
>   double [] foo = new double [cast(int)1e6];
>   for (int i=0;i<1e3;i++)
>   {
>   for (int j=0;j<1e6-1;j++)
>   {
>   foo[j]=foo[j]+foo[j+1];
>   }
>   }
>   }

Using floating point for indexes and lengths is not a good practice. In D large 
numbers are written like 1_000_000.
Use -release too.

 
> Any ideas? Am I somehow not hitting a vital compiler optimization?

The DMD compiler doesn't perform many optimizations, especially on floating
point computations.
But the bigger problem in your code is that you are performing operations on
NaNs (that's the default initialization of FP values in D), and operations on
NaNs are usually quite slower.


Your code in C:

#include <stdio.h>
#include <stdlib.h>
#define N 1000000

int main() {
double *foo = calloc(N, sizeof(double)); // malloc suffices here
int i, j;
for (j = 0; j < N; j++)
foo[j] = 1.0;

for (i = 0; i < 1000; i++)
for (j = 0; j < N-1; j++)
foo[j] = foo[j] + foo[j + 1];

printf("%f", foo[N-1]);
return 0;
}

/*
gcc -O3 -s -Wall test.c -o test
Timings, outer loop=1_000 times: 7.72

--

gcc -Wall -O3 -fomit-frame-pointer -msse3 -march=native test.c -o test
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.69 s
Just the inner loop:
.L7:
fldl    8(%edx)
fadd    %st, %st(1)
fxch    %st(1)
fstpl   (%edx)
addl    $8, %edx
cmpl    %ecx, %edx
jne     .L7
*/



Your code in D1:

version (Tango)
import tango.stdc.stdio: printf;
else
import std.c.stdio: printf;

void main() {
const int N = 1_000_000;
double[] foo = new double[N];
foo[] = 1.0;

for (int i = 0; i < 1_000; i++)
for (int j = 0; j < N-1; j++)
foo[j] = foo[j] + foo[j + 1];

printf("%f", foo[N-1]);
}


/*
dmd -O -release -inline test.d
(Not running on a VirtualBox)
Timings, outer loop=1_000 times: 9.35 s
Just the inner loop:
L34:    fld     qword ptr 8[EDX*8][ECX]
        fadd    qword ptr [EDX*8][ECX]
        fstp    qword ptr [EDX*8][ECX]
        inc     EDX
        cmp     EDX,0F423Fh
        jb      L34

---

ldc -O3 -release -inline test.d
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.87 s
Just the inner loop:
.LBB1_2:
movsd   (%eax,%ecx,8), %xmm0
addsd   8(%eax,%ecx,8), %xmm0
movsd   %xmm0, (%eax,%ecx,8)
incl    %ecx
cmpl    $999999, %ecx
jne .LBB1_2

---

ldc -unroll-allow-partial -O3 -release -inline test.d
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.75 s
Just the inner loop:
.LBB1_2:
movsd   (%eax,%ecx,8), %xmm0
addsd   8(%eax,%ecx,8), %xmm0
movsd   %xmm0, (%eax,%ecx,8)
movsd   8(%eax,%ecx,8), %xmm0
addsd   16(%eax,%ecx,8), %xmm0
movsd   %xmm0, 8(%eax,%ecx,8)
movsd   16(%eax,%ecx,8), %xmm0
addsd   24(%eax,%ecx,8), %xmm0
movsd   %xmm0, 16(%eax,%ecx,8)
movsd   24(%eax,%ecx,8), %xmm0
addsd   32(%eax,%ecx,8), %xmm0
movsd   %xmm0, 24(%eax,%ecx,8)
movsd   32(%eax,%ecx,8), %xmm0
addsd   40(%eax,%ecx,8), %xmm0
movsd   %xmm0, 32(%eax,%ecx,8)
movsd   40(%eax,%ecx,8), %xmm0
addsd   48(%eax,%ecx,8), %xmm0
movsd   %xmm0, 40(%eax,%ecx,8)
movsd   48(%eax,%ecx,8), %xmm0
addsd   56(%eax,%ecx,8), %xmm0
movsd   %xmm0, 48(%eax,%ecx,8)
movsd   56(%eax,%ecx,8), %xmm0
addsd   64(%eax,%ecx,8), %xmm0
movsd   %xmm0, 56(%eax,%ecx,8)
movsd   64(%eax,%ecx,8), %xmm0
addsd   72(%eax,%ecx,8), %xmm0
movsd   %xmm0, 64(%eax,%ecx,8)
addl    $9, %ecx
cmpl    $999999, %ecx
jne .LBB1_2
*/

As you see the code generated by ldc is about as good as the one generated by 
gcc. There are of course other ways to optimize this code...

Bye,
bearophile