Which D features to emphasize for academic review article

2012-08-09 Thread TJB

Hello D Users,

The Software Editor for the Journal of Applied Econometrics has 
agreed to let me write a review of the D programming language for 
econometricians (econometrics is where economic theory and 
statistical analysis meet).  I will have only about 6 pages.  I 
have an idea of what I am going to write about, but I thought I 
would ask here what features are most relevant (in your minds) to 
numerical programmers writing codes for statistical inference.


I look forward to your suggestions.

Thanks,

TJB


Re: Which D features to emphasize for academic review article

2012-08-09 Thread dsimcha
Ok, so IIUC the audience is academic BUT consists of people 
interested in using D as a means to an end, not computer 
scientists?  I use D for bioinformatics, which IIUC has similar 
requirements to econometrics.  From my point of view:


I'd emphasize the following:

Native efficiency.  (Important for large datasets and Monte Carlo 
simulations.)


Garbage collection.  (Important because it makes it much easier 
to write non-trivial data structures that don't leak memory, and 
statistical analyses are a lot easier if the data is structured 
well.)


Ranges/std.range/builtin arrays and associative arrays.  (Again, 
these make data handling a pleasure.)


Templates.  (Makes it easier to write algorithms that aren't 
overly specialized to the data structure they operate on.  This 
can also be done with OO containers but requires more boilerplate 
and compromises on efficiency.)


Disclaimer:  These last two are things I'm the primary designer 
and implementer of.  I intentionally put them last so it doesn't 
look like a shameless plug.


std.parallelism  (Important because you can easily parallelize 
your simulation, etc.)
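
For example, parallelizing a simulation loop is close to a 
one-line change (a minimal sketch; 'simulate' is a hypothetical 
stand-in for whatever per-draw computation you have):

import std.parallelism : parallel;
import std.stdio;

// Hypothetical stand-in for an expensive Monte Carlo draw.
double simulate(size_t seed)
{
    double x = seed + 1;
    foreach (i; 0 .. 1_000_000)
        x = (x * 1.0000001) % 1000.0;
    return x;
}

void main()
{
    auto results = new double[1000];
    // Each iteration runs on a worker thread from the default pool.
    foreach (i, ref r; parallel(results))
        r = simulate(i);
    writeln(results[0 .. 3]);
}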


dstats  (https://github.com/dsimcha/dstats  Important because a 
lot of statistical analysis code is already implemented for you.  
It's admittedly very basic compared to e.g. R or Matlab, but it's 
also in many cases better integrated and more efficient.  I'd say 
that it has the 15% of the functionality that covers ~70% of use 
cases.  I welcome contributors to add more stuff to it.  I 
imagine economists would be interested in time series, which is 
currently a big area of missing functionality.)
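
To give a flavor of the dstats side (a hypothetical sketch; the 
module layout and function names here are assumed from memory, so 
check the repo for the real API):

import dstats.summary;   // assumed module name; see the dstats repo
import std.stdio;

void main()
{
    auto returns = [0.01, -0.02, 0.015, 0.03, -0.005];
    // mean/stdev/median are assumed names from dstats' summary
    // statistics; verify against the current source.
    writeln("mean:   ", mean(returns));
    writeln("stdev:  ", stdev(returns));
    writeln("median: ", median(returns));
}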




Re: Which D features to emphasize for academic review article

2012-08-09 Thread Paulo Pinto

On Thursday, 9 August 2012 at 18:20:08 UTC, Justin Whear wrote:

> On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:
>
>> Hello D Users,
>>
>> The Software Editor for the Journal of Applied Econometrics has agreed
>> to let me write a review of the D programming language for
>> econometricians (econometrics is where economic theory and statistical
>> analysis meet).  I will have only about 6 pages.  I have an idea of what
>> I am going to write about, but I thought I would ask here what features
>> are most relevant (in your minds) to numerical programmers writing codes
>> for statistical inference.
>>
>> I look forward to your suggestions.
>>
>> Thanks,
>>
>> TJB
>
> Lazy ranges are a lifesaver when dealing with big data.  E.g. read a
> large csv file, use filter and map to clean and transform the data,
> collect stats as you go, then output to a destination file.  The lazy
> nature of most of the ranges in Phobos means that you don't need to have
> the data in memory, but you can write simple imperative code just as if
> it was.

Ah, the beauty of functional programming and streams.


Re: Which D features to emphasize for academic review article

2012-08-09 Thread Justin Whear
On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:

> Hello D Users,
> 
> The Software Editor for the Journal of Applied Econometrics has agreed
> to let me write a review of the D programming language for
> econometricians (econometrics is where economic theory and statistical
> analysis meet).  I will have only about 6 pages.  I have an idea of what
> I am going to write about, but I thought I would ask here what features
> are most relevant (in your minds) to numerical programmers writing codes
> for statistical inference.
> 
> I look forward to your suggestions.
> 
> Thanks,
> 
> TJB

Lazy ranges are a lifesaver when dealing with big data.  E.g. read a 
large csv file, use filter and map to clean and transform the data, 
collect stats as you go, then output to a destination file.  The lazy 
nature of most of the ranges in Phobos means that you don't need to have 
the data in memory, but you can write simple imperative code just as if 
it was.
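
In code, that pipeline might look something like this (a minimal 
sketch; the file names and the cleaning rule are invented for 
illustration):

import std.stdio;
import std.algorithm : filter, map;
import std.array : split;
import std.conv : to;

void main()
{
    auto outFile = File("clean.csv", "w");
    double sum = 0;
    size_t n = 0;

    // Rows stream through lazily; only one line is in memory at a time.
    foreach (fields; File("data.csv").byLine()
                         .map!(line => line.split(","))
                         .filter!(f => f.length >= 2 && f[1] != "NA"))
    {
        immutable v = fields[1].to!double;
        sum += v;
        ++n;
        outFile.writefln("%s,%s", fields[0], v);
    }
    writefln("mean of column 2: %s over %s rows", sum / n, n);
}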


Re: Which D features to emphasize for academic review article

2012-08-09 Thread Walter Bright

On 8/9/2012 10:40 AM, dsimcha wrote:

I'd emphasize the following:


I'd like to add to that:

1. Proper support for 80 bit floating point types. Many compilers' libraries 
have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 
80 bit floats reduce the incidence of creeping roundoff error.


2. Support for SIMD vectors as native types.

3. Floating point values are default initialized to NaN.

4. Correct support for NaN and infinity values.

5. Correct support for unordered operations.

6. Array types do not degenerate into pointer types whenever passed to a 
function. In other words, array types know their dimension.


7. Array loop operations, i.e.:

for (size_t i = 0; i < a.length; i++)
   a[i] = b[i] + c;

can be written as:

a[] = b[] + c;

8. Global data is thread local by default, lessening the risk of unintentional 
unsynchronized sharing between threads.
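
A small program touching several of these points (a sketch, not a 
definitive implementation):

import std.math : isNaN;
import std.stdio;

double tlsCounter = 0;              // module-level: thread local by default (point 8)
__gshared double sharedCounter = 0; // explicitly shared across threads

void main()
{
    double x;                 // default initialized to NaN (point 3)
    assert(x.isNaN);

    real r = 1.0L / 3.0L;     // 80 bit extended precision (point 1)
    writefln("%.20f", r);

    auto a = new double[3];
    double[] b = [1.0, 2.0, 3.0];
    a[] = b[] * 2.0 + 1.0;    // array loop operations (point 7)
    writeln(a);               // [3, 5, 7]
    writeln(b.length);        // arrays know their length (point 6)
}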


Re: Which D features to emphasize for academic review article

2012-08-10 Thread F i L

Walter Bright wrote:

3. Floating point values are default initialized to NaN.


This isn't a good feature, IMO. C# handles this much more 
conveniently with just as much optimization/debugging benefit 
(arguably more so, because it catches NaN issues at 
compile-time). In C#:


class Foo
{
float x; // defaults to 0.0f

void bar()
{
float y; // doesn't default
y ++; // ERROR: use of unassigned local

float z = 0.0f;
z ++; // OKAY
}
}

This is the same behavior for any local variable, so where in D 
you need to explicitly set variables to 'void' to avoid 
assignment costs, C# automatically benefits and catches your NaN 
mistakes before runtime.


Sorry, I'm not trying to derail this thread. I just think D has 
other, much better advertising points than this one.


Re: Which D features to emphasize for academic review article

2012-08-10 Thread Minas Mina
1) I think compile-time function execution is a very big plus for 
people doing calculations.


For example:

ulong fibonacci(ulong n) {
    ulong a = 0, b = 1;
    foreach (i; 0 .. n) { immutable t = a + b; a = b; b = t; }
    return a;
}

enum x = fibonacci(50); // calculated at compile time! runtime 
cost = 0 !!!


2) It has support for a BigInt structure in its standard library 
(which is really fast!)
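
For example (a minimal sketch using std.bigint):

import std.bigint;
import std.stdio;

void main()
{
    BigInt f = 1;
    foreach (i; 1 .. 51)
        f *= i;        // 50! overflows ulong long before this finishes
    writeln(f);        // exact value of 50!, all 65 digits
}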


Re: Which D features to emphasize for academic review article

2012-08-10 Thread TJB

On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:

On 8/9/2012 10:40 AM, dsimcha wrote:

I'd emphasize the following:


I'd like to add to that:

1. Proper support for 80 bit floating point types. Many 
compilers' libraries have inaccurate 80 bit math functions, or 
don't implement 80 bit floats at all. 80 bit floats reduce the 
incidence of creeping roundoff error.


How unique to D is this feature?  Does this imply that things 
like BLAS and LAPACK, random number generators, statistical 
distribution functions, and other numerical software should be 
rewritten in pure D rather than calling out to external C or 
Fortran codes?


TJB


Re: Which D features to emphasize for academic review article

2012-08-10 Thread Walter Bright

On 8/10/2012 1:38 AM, F i L wrote:

Walter Bright wrote:

3. Floating point values are default initialized to NaN.


This isn't a good feature, IMO. C# handles this much more conveniently with just
as much optimization/debugging benefit (arguably more so, because it catches NaN
issues at compile-time). In C#:

 class Foo
 {
 float x; // defaults to 0.0f

 void bar()
 {
 float y; // doesn't default
 y ++; // ERROR: use of unassigned local

 float z = 0.0f;
 z ++; // OKAY
 }
 }

This is the same behavior for any local variable,


It catches only a subset of these at compile time. I can craft any number of 
ways of getting it to miss diagnosing it. Consider this one:


float z;
if (condition1)
 z = 5;
... lotsa code ...
if (condition2)
 z++;

To diagnose this correctly, the static analyzer would have to determine that 
condition1 produces the same result as condition2, or not. This is impossible to 
prove. So the static analyzer either gives up and lets it pass, or issues an 
incorrect diagnostic. So our intrepid programmer is forced to write:


float z = 0;
if (condition1)
 z = 5;
... lotsa code ...
if (condition2)
 z++;

Now, as it may turn out, for your algorithm the value "0" is an out-of-range, 
incorrect value. Not a problem as it is a dead assignment, right?


But then the maintenance programmer comes along and changes condition1 so it is 
not always the same as condition2, and now the z++ sees the invalid "0" value 
sometimes, and a silent bug is introduced.


This bug will not remain undetected with the default NaN initialization.



so where in D you need to
explicitly set variables to 'void' to avoid assignment costs,


This is incorrect, as the optimizer is perfectly capable of removing dead 
assignments like:


   f = nan;
   f = 0.0f;

The first assignment is optimized away.

> I just think D has other, much better advertising points than this one.

Whether you agree with it being a good feature or not, it is a feature unique to 
D and merits discussion when talking about D's suitability for numerical 
programming.






Re: Which D features to emphasize for academic review article

2012-08-10 Thread Walter Bright

On 8/10/2012 8:31 AM, TJB wrote:

On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:

On 8/9/2012 10:40 AM, dsimcha wrote:

I'd emphasize the following:


I'd like to add to that:

1. Proper support for 80 bit floating point types. Many compilers' libraries
have inaccurate 80 bit math functions, or don't implement 80 bit floats at
all. 80 bit floats reduce the incidence of creeping roundoff error.


How unique to D is this feature?  Does this imply that things like BLAS and
LAPACK, random number generators, statistical distribution functions, and other
numerical software should be rewritten in pure D rather than calling out to
external C or Fortran codes?


I attended a talk given by a physicist a few months ago where he was using C 
transcendental functions. I pointed out to him that those functions were 
unreliable, producing wrong bits in a manner that suggested to me that they were 
internally truncating to double precision.


He expressed astonishment and told me I must be mistaken.

What can I say? I run across this repeatedly, and that's exactly why Phobos 
(with Don's help) has its own implementations, rather than simply calling the 
corresponding C ones.


I encourage you to run your own tests, and draw your own conclusions.



Re: Which D features to emphasize for academic review article

2012-08-10 Thread Jonathan M Davis
On Friday, August 10, 2012 15:10:47 Walter Bright wrote:
> What can I say? I run across this repeatedly, and that's exactly why Phobos
> (with Don's help) has its own implementations, rather than simply calling
> the corresponding C ones.

I think that it's pretty typical for programmers to think that something like 
a standard library function is essentially bug-free - especially for an older 
language like C. And unless you see results that are clearly wrong or someone 
else points out the problem, I don't know why you'd ever think that there was 
one. I certainly had no clue that C implementations had issues with floating 
point arithmetic before it was pointed out here. Regardless though, it's great 
that D gets it right.

- Jonathan M Davis


Re: Which D features to emphasize for academic review article

2012-08-10 Thread TJB

On Friday, 10 August 2012 at 22:11:23 UTC, Walter Bright wrote:

On 8/10/2012 8:31 AM, TJB wrote:
On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright 
wrote:

On 8/9/2012 10:40 AM, dsimcha wrote:

I'd emphasize the following:


I'd like to add to that:

1. Proper support for 80 bit floating point types. Many 
compilers' libraries
have inaccurate 80 bit math functions, or don't implement 80 
bit floats at
all. 80 bit floats reduce the incidence of creeping roundoff 
error.


How unique to D is this feature?  Does this imply that things 
like BLAS and
LAPACK, random number generators, statistical distribution 
functions, and other
numerical software should be rewritten in pure D rather than 
calling out to

external C or Fortran codes?


I attended a talk given by a physicist a few months ago where 
he was using C transcendental functions. I pointed out to him 
that those functions were unreliable, producing wrong bits in a 
manner that suggested to me that they were internally 
truncating to double precision.


He expressed astonishment and told me I must be mistaken.

What can I say? I run across this repeatedly, and that's 
exactly why Phobos (with Don's help) has its own 
implementations, rather than simply calling the corresponding C 
ones.


I encourage you to run your own tests, and draw your own 
conclusions.


Hopefully this will help make the case that D is the best choice 
for numerical programmers. I want to do my part to convince 
economists.


Another reason to implement BLAS and LAPACK in pure D is that the 
old routines like dgemm, cgemm, sgemm, and zgemm (all defined for 
different types) seem ripe for templatization.
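
For instance, the four routines could collapse into a single 
template along these lines (a naive, illustrative sketch; a real 
implementation would block for cache and vectorize):

// One template instead of sgemm/dgemm/cgemm/zgemm:
// computes C = alpha*A*B + beta*C for row-major m x k and k x n inputs.
void gemm(T)(const T[] a, const T[] b, T[] c,
             size_t m, size_t n, size_t k, T alpha, T beta)
{
    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
        {
            T acc = 0;
            foreach (l; 0 .. k)
                acc += a[i * k + l] * b[l * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
}

// gemm!float, gemm!double, gemm!(Complex!float) and
// gemm!(Complex!double) would all come from the same source.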


Almost thou convinceth me ...

TJB



Re: Which D features to emphasize for academic review article

2012-08-10 Thread F i L

Walter Bright wrote:
It catches only a subset of these at compile time. I can craft 
any number of ways of getting it to miss diagnosing it. 
Consider this one:


float z;
if (condition1)
 z = 5;
... lotsa code ...
if (condition2)
 z++;

To diagnose this correctly, the static analyzer would have to 
determine that condition1 produces the same result as 
condition2, or not. This is impossible to prove. So the static 
analyzer either gives up and lets it pass, or issues an 
incorrect diagnostic. So our intrepid programmer is forced to 
write:


float z = 0;
if (condition1)
 z = 5;
... lotsa code ...
if (condition2)
 z++;


Yes, but that's not really an issue since the compiler informs 
the coder of its limitation. You're simply forced to initialize 
the variable in this situation.



Now, as it may turn out, for your algorithm the value "0" is an 
out-of-range, incorrect value. Not a problem as it is a dead 
assignment, right?


But then the maintenance programmer comes along and changes 
condition1 so it is not always the same as condition2, and now 
the z++ sees the invalid "0" value sometimes, and a silent bug 
is introduced.


This bug will not remain undetected with the default NaN 
initialization.


I had a debate on here a few months ago about the merits of 
default-to-NaN and others brought up similar situations. but 
since we can write:


float z = float.nan;
...

explicitly, then this could be thought of as a debugging feature 
available to the programmer. The problem I've always had with 
defaulting to NaN is that it's inconsistent with integer types, 
and while there may be merit to the idea of defaulting all types 
to NaN/Null, it's simply unavailable for half of the number 
spectrum. I can only speak for myself, but I much prefer 
consistency over anything else because it means there are fewer 
discrepancies I need to remember when hacking things together. It 
also steepens the learning curve.


More importantly, what we have now is code where bugs (like the 
one you mentioned above) are still possible with ints, but also 
easy to miss, since "the other number type" behaves differently 
and programmers may accidentally assume a NaN will propagate 
where it will not.



This is incorrect, as the optimizer is perfectly capable of 
removing dead assignments like:


   f = nan;
   f = 0.0f;

The first assignment is optimized away.


I thought there was some optimization by avoiding assignment, but 
IDK enough about memory at that level. Now I'm confused as to the 
point of 'float x = void' type annotations. :-\



Whether you agree with it being a good feature or not, it is a 
feature unique to D and merits discussion when talking about 
D's suitability for numerical programming.


True, and I misspoke by saying it wasn't a "selling point". I 
only meant to raise issue with a feature that has been more of an 
annoyance than a boon to me personally. That said, I also agree 
that this thread was the wrong place to raise the issue.


Re: Which D features to emphasize for academic review article

2012-08-10 Thread Walter Bright

On 8/10/2012 9:01 PM, F i L wrote:

I had a debate on here a few months ago about the merits of default-to-NaN and
others brought up similar situations. but since we can write:

 float z = float.nan;
 ...


That is a good solution, but in my experience programmers just throw in an =0, 
as it is simple and fast, and they don't normally think about NaN's.



explicitly, then this could be thought of as a debugging feature available to
the programmer. The problem I've always had with defaulting to NaN is that it's
inconsistent with integer types, and while there may be merit to the idea of
defaulting all types to NaN/Null, it's simply unavailable for half of the number
spectrum. I can only speak for myself, but I much prefer consistency over
anything else because it means there's less discrepancies I need to remember
when hacking things together. It also steepens the learning curve.


It's too bad that ints don't have a NaN value, but interestingly enough, 
valgrind does default initialize them to some internal NaN, making it a most 
excellent bug detector.




More importantly, what we have now is code where bugs-- like the one you
mentioned above --are still possible with Ints, but also easy to miss since "the
other number type" behaves differently and programmers may accidentally assume a
NaN will propagate where it will not.


Sadly, D has to map onto imperfect hardware :-(

We do have NaN values for chars (0xFF) and pointers (the 
vilified 'null'). Think how many bugs the latter has exposed, 
and then think of all the floating point code with no such 
obvious indicator of bad initialization.
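
For instance (a small sketch of the default initializers at work):

import std.stdio;

void main()
{
    char c;     // defaults to 0xFF, an invalid UTF-8 code unit
    double d;   // defaults to NaN
    int* p;     // defaults to null
    writefln("%d %s %s", c, d, p);  // prints: 255 nan null
}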



I thought there was some optimization by avoiding assignment, but IDK enough
about memory at that level. Now I'm confused as to the point of 'float x = void'
type annotations. :-\


It would be used where the static analysis is not able to detect that the 
initializer is dead.


Re: Which D features to emphasize for academic review article

2012-08-10 Thread Walter Bright

On 8/10/2012 9:32 PM, Walter Bright wrote:

On 8/10/2012 9:01 PM, F i L wrote:

I had a debate on here a few months ago about the merits of default-to-NaN and
others brought up similar situations. but since we can write:

 float z = float.nan;
 ...


That is a good solution, but in my experience programmers just throw in an =0,
as it is simple and fast, and they don't normally think about NaN's.


Let me amend that. I've never seen anyone use float.nan, or whatever NaN is in 
the language they were using. They always use =0. I doubt that yelling at them 
will change anything.


Re: Which D features to emphasize for academic review article

2012-08-10 Thread F i L

F i L wrote:

Walter Bright wrote:
It catches only a subset of these at compile time. I can craft 
any number of ways of getting it to miss diagnosing it. 
Consider this one:


   float z;
   if (condition1)
z = 5;
   ... lotsa code ...
   if (condition2)
z++;

[...]


Yes, but that's not really an issue since the compiler informs 
the coder of its limitation. You're simply forced to 
initialize the variable in this situation.


I just want to clarify something here. In C#, only class/struct 
fields are defaulted to a usable value. Locals have to be 
explicitly set before they're used. So, expanding on your 
example above:


float z;
if (condition1)
z = 5;
else
z = 6; // 'else' required

... lotsa code ...
if (condition2)
z++;

On the first condition, without an 'else z = ...', or if the 
condition was removed at a later time, then you'll get a compiler 
error and be forced to explicitly assign 'z' somewhere above 
using it. So C# and D work in "similar" ways in this respect 
except that C# catches these issues at compile-time, whereas in D 
you need to:


  1. run the program
  2. get bad result
  3. hunt down bug

NaNs in C# are "mostly" (citations needed) set to ensure fields 
are initialized in a constructor:


class Foo
{
float f = float.NaN; // Can't use 'f' unless Foo is
                     // properly constructed.
}


Re: Which D features to emphasize for academic review article

2012-08-10 Thread F i L

Walter Bright wrote:

Sadly, D has to map onto imperfect hardware :-(

We do have NaN values for chars (0xFF) and pointers (the 
vilified 'null'). Think how many bugs the latter has exposed, 
and then think of all the floating point code with no such 
obvious indicator of bad initialization.


Yes, if 'int' had a NaN state it would be great. (Though I 
remember hearing about hardware that did support it... 
somewhere.)





Re: Which D features to emphasize for academic review article

2012-08-11 Thread Walter Bright

On 8/10/2012 9:55 PM, F i L wrote:

On the first condition, without an 'else z = ...', or if the condition was
removed at a later time, then you'll get a compiler error and be forced to
explicitly assign 'z' somewhere above using it. So C# and D work in "similar"
ways in this respect except that C# catches these issues at compile-time,
whereas in D you need to:

   1. run the program
   2. get bad result
   3. hunt down bug


However, and I've seen this happen, people will satisfy the compiler complaint 
by initializing the variable to any old value (usually 0), because that value 
will never get used. Later, after other things change in the code, that value 
suddenly gets used, even though it may be an incorrect value for the use.


Re: Which D features to emphasize for academic review article

2012-08-11 Thread F i L

Walter Bright wrote:
That is a good solution, but in my experience programmers just 
throw in an =0, as it is simple and fast, and they don't 
normally think about NaN's.


See! Programmers just want usable default values :-P


It's too bad that ints don't have a NaN value, but 
interestingly enough, valgrind does default initialize them to 
some internal NaN, making it a most excellent bug detector.


I heard somewhere before there's actually an (Intel?) CPU which 
supports NaN ints... but maybe that's just hearsay.




Sadly, D has to map onto imperfect hardware :-(

We do have NaN values for chars (0xFF) and pointers (the 
vilified 'null'). Think how many bugs the latter has exposed, 
and then think of all the floating point code with no such 
obvious indicator of bad initialization.


Ya, but I don't think pointers/refs and floats are comparable, 
because one has copy semantics and the other does not. 
Conceptually, pointers are only references to data while numbers 
are actual data. It makes sense that they would default to 
different things. Though if int did have a NaN value, I'm not 
sure which way I would side on this issue. I still think I would 
prefer some level of compile-time indication of my errors, simply 
because it saves time when you're making something.



It would be used where the static analysis is not able to 
detect that the initializer is dead.


Good to know.


However, and I've seen this happen, people will satisfy the 
compiler complaint by initializing the variable to any old 
value (usually 0), because that value will never get used. 
Later, after other things change in the code, that value 
suddenly gets used, even though it may be an incorrect value 
for the use.


Maybe the perfect solution is to have the compiler initialize the 
value to NaN, but also do a bit of static analysis and give 
a compiler error when it can determine your variable is being 
used before being assigned, for the sake of productivity.


In fact, for the sake of consistency, you could always enforce 
that (compiler error) rule on every local variable, so even ints 
would be required to have explicit initialization before use.


I still prefer float class members to be defaulted to a usable 
value, for the sake of consistency with ints.


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Era Scarecrow

On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
It's too bad that ints don't have a NaN value, but 
interestingly enough, valgrind does default initialize them to 
some internal NaN, making it a most excellent bug detector.


 The compiler could always have flags specifying whether 
variables were used, and if they are false they are as good as 
NaN. The only downside is a performance hit, unless you mark it 
as a release binary. It really comes down to whether it's worth 
implementing or considered too big a change (unless it's a flag 
you have to specially turn on).


example:

  int a;

  writeln(a++); // compile-time error, or throws an exception at
                // runtime (read access before being set)


internally translated as:

  int a;
  bool _a_is_used = false;

  if (!_a_is_used)
    throw new Exception("a not initialized before use!");
  // passing to functions will throw the exception,
  // unless the signature is 'out'
  writeln(a);

  ++a;
  _a_is_used = true;



Sadly, D has to map onto imperfect hardware :-(


 Not so much imperfect hardware, just the imperfect 'human' 
variable.


We do have NaN values for chars (0xFF) and pointers (the 
vilified 'null'). Think how many bugs the latter has exposed, 
and then think of all the floating point code with no such 
obvious indicator of bad initialization.




Re: Which D features to emphasize for academic review article

2012-08-11 Thread Walter Bright

On 8/11/2012 1:30 AM, Era Scarecrow wrote:

On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:

It's too bad that ints don't have a NaN value, but interestingly enough,
valgrind does default initialize them to some internal NaN, making it a most
excellent bug detector.


  The compiler could always have flags specifying if variables were used, and if
they are false they are as good as NaN. Only downside is a performance hit
unless you Mark it as a release binary. It really comes down to if it's worth
implementing or considered a big change (unless it's a flag you have to
specially turn on)


Not so easy. Suppose you pass a pointer to the variable to another function. 
Does that function set it?


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Walter Bright

On 8/11/2012 1:57 AM, Jakob Ovrum wrote:

The compiler in languages like C# doesn't try to prove that the variable is NOT
set and then emits an error. It tries to prove that the variable IS set, and if
it can't prove that, it's an error.

It's not an incorrect diagnostic, it does exactly what it's supposed to do


Of course it is doing what the language requires, but it is an incorrect 
diagnostic because a dead assignment is required.


And being a dead assignment, it can lead to errors when the code is later 
modified, as I explained. I also dislike on aesthetic grounds meaningless code 
being required.



In D, on the other hand, it's possible to write D code like:

for(size_t i; i < length; ++i)
{
 ...
}

And I've actually seen this kind of code a lot in the wild. It boggles my mind
that you think that this code should be legal. I think it's lazy - the intention
is not clear. Is the default initializer being intentionally relied on, or was
it unintentional? I've seen both cases. The for-loop example is an extreme one
for demonstrative purposes, most examples are less obvious.


That perhaps is your experience with other languages (that do not default 
initialize) showing. I don't think that default initialization is so awful. In 
fact, C++ enables one to specify default initialization for user defined types. 
Are you against that, too?




Saying that most programmers will explicitly initialize floating point numbers
to 0 instead of NaN when taking on initialization responsibility is a cop-out -


You can certainly say it's a copout, but it's what I see them do. I've never 
seen them initialize to NaN, but I've seen the "just throw in a 0" many times.




float.init and float.nan are obviously the values you should be going for. The
benefit is easy for programmers to understand, especially if they already
understand why float.init is NaN. You say yelling at them probably won't help -
why not?


Because experience shows that even the yellers tend to do the short, convenient 
one rather than the longer, correct one. Bruce Eckel wrote an article about this 
years ago in reference to why Java exception specifications were a failure and 
actually caused people to write bad code, including those who knew better.






Re: Which D features to emphasize for academic review article

2012-08-11 Thread Paulo Pinto

On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:

Because experience shows that even the yellers tend to do the 
short, convenient one rather than the longer, correct one. 
Bruce Eckel wrote an article about this years ago in reference 
to why Java exception specifications were a failure and 
actually caused people to write bad code, including those who 
knew better.


I have to agree here.

I spend my work time between JVM and .NET based languages, and
checked exceptions are on my top 5 list of what went wrong with 
Java.


You see lots of

try {
 ...
} catch (Exception e) {
  e.printStackTrace();
}

in enterprise code.

--
Paulo


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Andrei Alexandrescu

On 8/11/12 3:11 AM, F i L wrote:

I still prefer float class members to be defaulted to a usable value,
for the sake of consistency with ints.


Actually there's something that just happened two days ago to me that's 
relevant to this, particularly because it's in a different language 
(SQL) and different domain (Machine Learning).


I was working with an iterative algorithm implemented in SQL, which 
performs some aggregate computation on some 30 billion samples. The 
algorithm is rather intricate, and each iteration takes the previous 
one's result as input.


Somehow at the end there were NaNs in the sample data I was looking at 
(there weren't supposed to be any). So I started investigating; the 
NaNs could appear only in a rare data corruption case. And indeed 
before long I found 4 (four) samples out of 30 billion that were 
corrupt. After one iteration, there were 300K NaNs. After two 
iterations, a few millions. After four, 800M samples were messed up. 
NaNs did save the day.


Although this case is not about default values but about the result of a 
computation (in this case 0.0/0.0), I think it still reveals the 
usefulness of having a singular value in the floating point realm.
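
The effect is easy to reproduce in miniature (a sketch):

import std.math : isNaN;
import std.stdio;

void main()
{
    auto samples = [1.0, 2.0, 0.0 / 0.0, 4.0]; // one corrupt sample
    double total = 0;
    foreach (s; samples)
        total += s;        // the single NaN poisons the aggregate
    writeln(total.isNaN);  // true: the corruption cannot hide
}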



Andrei


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Era Scarecrow

On Saturday, 11 August 2012 at 09:26:42 UTC, Walter Bright wrote:

On 8/11/2012 1:30 AM, Era Scarecrow wrote:


The compiler could always have flags specifying if variables 
were used, and if they are false they are as good as NaN. Only 
downside is a performance hit unless you Mark it as a release 
binary. It really comes down to if it's worth implementing or 
considered a big change (unless it's a flag you have to 
specially turn on)


Not so easy. Suppose you pass a pointer to the variable to 
another function. Does that function set it?


 I suppose there could be a second hidden pointer/bool as part of 
calls, but then it's completely incompatible with any C calling 
convention, meaning that is probably out of the question.


 Either (a) pointers are low-level enough that, like casting, it's 
all up to the programmer; or (b) same as before: unless an 'out' 
parameter is specified, it would likely throw an exception at that 
point (since attempting to read or pass the address of an 
uninitialized variable is the same as accessing it directly). 
After all, having a false positive is better than not being 
involved at all, right?


 Of course with that in mind, specifying a variable to begin as 
void (uninitialized) could be its own form of initialization 
(meaning it wouldn't be checking those, even though they hold 
known garbage).


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Jakob Ovrum

On Friday, 10 August 2012 at 22:01:46 UTC, Walter Bright wrote:
It catches only a subset of these at compile time. I can craft 
any number of ways of getting it to miss diagnosing it. 
Consider this one:


float z;
if (condition1)
 z = 5;
... lotsa code ...
if (condition2)
 z++;

To diagnose this correctly, the static analyzer would have to 
determine that condition1 produces the same result as 
condition2, or not. This is impossible to prove. So the static 
analyzer either gives up and lets it pass, or issues an 
incorrect diagnostic. So our intrepid programmer is forced to 
write:


float z = 0;
if (condition1)
 z = 5;
... lotsa code ...
if (condition2)
 z++;

Now, as it may turn out, for your algorithm the value "0" is an 
out-of-range, incorrect value. Not a problem as it is a dead 
assignment, right?


But then the maintenance programmer comes along and changes 
condition1 so it is not always the same as condition2, and now 
the z++ sees the invalid "0" value sometimes, and a silent bug 
is introduced.


This bug will not remain undetected with the default NaN 
initialization.


The compiler in languages like C# doesn't try to prove that the 
variable is NOT set and then emits an error. It tries to prove 
that the variable IS set, and if it can't prove that, it's an 
error.


It's not an incorrect diagnostic, it does exactly what it's 
supposed to do and the programmer has to be explicit when one 
takes on the responsibility of initialization. I don't see 
anybody complaining about this feature in C#, most experienced C# 
programmers I've talked to love it (I much prefer it too).


Leaving a local variable initially uninitialized (or rather, not 
explicitly initialized) is a good way to portray the intention 
that it's going to be conditionally initialized later. In C#, if 
your program compiles, your variable is guaranteed to be 
initialized later but before use. This is a useful guarantee when 
reading/maintaining code.


In D, on the other hand, it's possible to write D code like:

for(size_t i; i < length; ++i)
{
...
}

And I've actually seen this kind of code a lot in the wild. It 
boggles my mind that you think that this code should be legal. I 
think it's lazy - the intention is not clear. Is the default 
initializer being intentionally relied on, or was it 
unintentional? I've seen both cases. The for-loop example is an 
extreme one for demonstrative purposes, most examples are less 
obvious.


Saying that most programmers will explicitly initialize floating 
point numbers to 0 instead of NaN when taking on initialization 
responsibility is a cop-out - float.init and float.nan are 
obviously the values you should be going for. The benefit is easy 
for programmers to understand, especially if they already 
understand why float.init is NaN. You say yelling at them 
probably won't help - why not? I personally use 
float.init/double.init etc. in my own code, and I'm sure other 
informed programmers do too. I can understand why people don't do 
it in, say, C, with NaN being less defined there afaik. D 
promotes NaN actively and programmers should be eager to leverage 
NaN explicitly too.


It's also important to note that C# works the same as D for 
non-local variables - they all have a defined default initializer 
(the C# equivalent of T.init is default(T)). Another point is 
that the local-variable analysis is limited to the scope of a 
single function body, it does not do inter-procedural analysis.


I think this would be a great thing for D, and I believe that all 
code this change breaks is actually broken to begin with.




Re: Which D features to emphasize for academic review article

2012-08-11 Thread Jakob Ovrum

On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:
Of course it is doing what the language requires, but it is an 
incorrect diagnostic because a dead assignment is required.


And being a dead assignment, it can lead to errors when the 
code is later modified, as I explained. I also dislike on 
aesthetic grounds meaningless code being required.


It is not meaningless, it's declarative. The same resulting code 
as now would be generated, but it's easier for the maintainer to 
understand what's being meant.


That perhaps is your experience with other languages (that do 
not default initialize) showing. I don't think that default 
initialization is so awful. In fact, C++ enables one to specify 
default initialization for user defined types. Are you against 
that, too?


No, because user-defined types can have explicitly initialized 
members. I do think that member fields relying on the default 
initializer are ambiguous and should be explicit, but flow 
analysis on aggregate members is not going to work in any current 
programming language. D already works similarly to C# on this 
point.


And for the record, I have more experience with D than C#. I 
barely use C#, but I'm not afraid to point out its good parts 
even though D is my personal favourite.


You can certainly say it's a copout, but it's what I see them 
do. I've never seen them initialize to NaN, but I've seen the 
"just throw in a 0" many times.


Again, I agree with this - except the examples are not from D, 
and certainly not from the future D that is being proposed. I 
don't blame anyone from steering away from NaN in other C-style 
languages.


I do, however, believe that D programmers are perfectly capable 
of doing the right thing if informed. And let's face it - there's 
a lot that relies on education in D, like whether to receive a 
string parameter as const or immutable, and using scope on a 
subset of callback parameters. Both of these examples require 
more typing than the intuitive/straight-forward choice (always 
receive `string` and no `scope` on delegates), but informed D 
programmers still choose the more lengthy, correct version.


Consider `pure` member functions - turns out most of them are 
actually pure because the implicit `this` parameter is allowed to 
be mutated and it's rare for a member function to mutate global 
state, yet we all strive to correctly decorate our methods `pure` 
when applicable.
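
For example (a sketch of a weakly pure method):

struct Accumulator
{
    double total = 0;

    // Weakly pure: it may mutate `this`, but cannot touch global
    // mutable state, so strongly pure code can still call it.
    void add(double x) pure
    {
        total += x;
    }
}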


Because experience shows that even the yellers tend to do the 
short, convenient one rather than the longer, correct one. 
Bruce Eckel wrote an article about this years ago in reference 
to why Java exception specifications were a failure and 
actually caused people to write bad code, including those who 
knew better.


I don't think the comparison is fair.

Compared to Java exception specifications, the difference between 
'0' and 'float.nan'/'float.init' is negligible, especially in 
generic functions when the desired initializer would typically be 
'T.init'.


Java exception specifications have widespread implications for 
the entire codebase, while the difference between '0' and 
'float.nan' is constant and entirely a local improvement.





Re: Which D features to emphasize for academic review article

2012-08-11 Thread F i L

Andrei Alexandrescu wrote:

[ ... ]

Although this case is not about default values but about the 
result of a computation (in this case 0.0/0.0), I think it 
still reveals the usefulness of having a singular value in the 
floating point realm.


My argument was never against the usefulness of NaN for 
debugging... only that it should be considered a debugging 
feature and explicitly defined, rather than intruding on 
convenience and consistency (with int) by being the default.


I completely agree NaNs are important for debugging floating 
point math; in fact D's default-to-NaN has caught a couple of my 
construction mistakes before. The problem is that this sort of 
construction mistake is bigger than just floating point and NaN. 
You can mis-set a variable, float or not, or you can fail to set 
an int when you should have.


So the question becomes not what benefit NaN has for debugging, 
but what a person's thought process is when creating/debugging 
code, and herein lies the heart of my qualm. In D we have a bit 
of a conceptual double standard within the number community. I 
have to remember these rules when I'm creating something, not 
just when I'm debugging it. As often as D may have caught a 
construction mistake specifically related to floats in my code, 
10x more often it has produced NaNs where I intended a number, 
because I forgot about the double standard when adding a field or 
creating a variable.


A C++ guy might not think twice about this because he's used to 
having to default values all the time (IDK, I'm not that guy), 
but to a C# guy, D's approach feels more like a regression, and 
that's a paper-cut on someone's opinion of the language.




Re: Which D features to emphasize for academic review article

2012-08-11 Thread Walter Bright

On 8/11/2012 12:33 PM, F i L wrote:

In D we have a bit of a conceptual double standard within the
number community. I have to remember these rules when I'm creating something,
not just when I'm debugging it. As often as D may have caught a construction
mistake specifically related to floats in my code, 10x more so it's produced
NaN's where I intended a number, because I forgot about the double standard when
adding a field or creating a variable.


I'd rather have 100 easy-to-find bugs than 1 unnoticed one that went out in 
the field.




A C++ guy might not think twice about this because he's used to having to
default values all the time (IDK, I'm not that guy),


Only if a default constructor is defined for the type, which it often is not, 
in which case you'll get garbage for a default initialization.





Re: Which D features to emphasize for academic review article

2012-08-11 Thread bearophile

F i L:


Walter Bright wrote:

3. Floating point values are default initialized to NaN.


This isn't a good feature, IMO. C# handles this much more 
conveniently


An alternative possibility is to:
1) Default initialize variables just as currently done in D, with 
0s, NaNs, etc;
2) Where the compiler is certain a variable is read before any 
possible initialization, it generates a compile-time error;

3) Warnings for unused variables and unused last assignments.

Where the compiler is not sure, not able to tell, or sees there 
is one or more paths where the variable is initialized, it gives 
no errors, and eventually the code will use the default 
initialized values, as currently done in D.



The D compiler is already doing this a little, if you compile 
this with -O:


class Foo {
  void bar() {}
}
void main() {
  Foo f;
  f.bar();
}

You get at compile-time:
temp.d(6): Error: null dereference in function _Dmain


A side effect of those rules is that this code doesn't compile, 
and similarly lots of current D code:


class Foo {}
void main() {
  Foo f;
  assert(f is null);
}


Bye,
bearophile


Re: Which D features to emphasize for academic review article

2012-08-11 Thread F i L

Walter Bright wrote:
I'd rather have a 100 easy to find bugs than 1 unnoticed one 
that went out in the field.


That's just the thing: bugs are arguably easier to hunt down when 
things default to a consistent, usable value. When variables are 
defaulted to zero, I have a guarantee that any propagated NaN bug 
is _not_ coming from them (directly). With NaN defaults, I only 
have a guarantee that the value _might_ be coming from said 
variable.


Then, I also have more to be aware of when searching through 
code, because my ints behave differently than my floats. 
Arguably, you always have to be aware of this, but at least with 
explicit sets to NaN, I know the potential culprits earlier 
(because they'll have distinct assignment).


With static analysis warning against local scope NaN issues, 
there's really only one situation where setting to NaN catches 
bugs, and that's when you want to guarantee that a member 
variable is specifically assigned a value (of some kind) during 
construction. This is a corner case situation because:


1. It makes no guarantees about what value is actually assigned 
to the variable, only that it's set to something. Which means 
it's either forgotten in favor of an 'if' statement, or used in 
combination with one.


2. Because of its singular debugging potential, NaN safeguards 
are, most often, intentionally put in place (or in D's case, left 
in place).


This is why I think such situations should require an explicit 
assignment to NaN. The "100 easy bugs" you mentioned weren't 
actually "bugs"; they were times I forgot floats defaulted 
_differently_. The 10 times where NaN caught legitimate bugs, I 
would have had to hunt down the mistake either way, and it was 
trivial to do regardless of the NaN. Even if it wasn't 
trivial, I could have very easily assigned NaN to questionable 
variables explicitly.


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Walter Bright

On 8/11/2012 2:41 PM, bearophile wrote:

2) Where the compiler is certain a variable is read before any possible
initialization, it generates a compile-time error;


This has been suggested repeatedly, but it is in utter conflict with the whole 
notion of default initialization, which nobody complains about for user-defined 
types.


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Walter Bright

On 8/11/2012 3:01 PM, F i L wrote:

Walter Bright wrote:

I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in
the field.


That's just the thing, bugs are arguably easier to hunt down when things default
to a consistent, usable value.


Many, many programming bugs trace back to assumptions that floating point 
numbers act like ints. There's just no way to avoid knowing and understanding 
the differences.




When variables are defaulted to zero, I have a
guarantee that any propagated NaN bug is _not_ coming from them (directly). With
NaN defaults, I only have a guarantee that the value _might_ be coming from said
variable.


I don't see why this is a bad thing. The fact is, with NaN you know there is a 
bug. With 0, you may never realize there is a problem. Andrei wrote me about the 
output of a program he is working on having billions of result values, and he 
noticed a few were NaNs, which he traced back to a bug. If the bug had set the 
float value to 0, there's no way he would have ever noticed the issue.


It's all about daubing bugs with day-glo orange paint so you know there's a 
problem. Painting them with camo is not the right solution.





Re: Which D features to emphasize for academic review article

2012-08-11 Thread Chad J

On 08/10/2012 06:01 PM, Walter Bright wrote:

On 8/10/2012 1:38 AM, F i L wrote:

Walter Bright wrote:

3. Floating point values are default initialized to NaN.


This isn't a good feature, IMO. C# handles this much more conveniently
with just
as much optimization/debugging benefit (arguably more so, because it
catches NaN
issues at compile-time). In C#:

class Foo
{
float x; // defaults to 0.0f

void bar()
{
float y; // doesn't default
y ++; // ERROR: use of unassigned local

float z = 0.0f;
z ++; // OKAY
}
}

This is the same behavior for any local variable,


It catches only a subset of these at compile time. I can craft any
number of ways of getting it to miss diagnosing it. Consider this one:

float z;
if (condition1)
z = 5;
... lotsa code ...
if (condition2)
z++;

To diagnose this correctly, the static analyzer would have to determine
that condition1 produces the same result as condition2, or not. This is
impossible to prove. So the static analyzer either gives up and lets it
pass, or issues an incorrect diagnostic. So our intrepid programmer is
forced to write:

float z = 0;
if (condition1)
z = 5;
... lotsa code ...
if (condition2)
z++;

Now, as it may turn out, for your algorithm the value "0" is an
out-of-range, incorrect value. Not a problem as it is a dead assignment,
right?

But then the maintenance programmer comes along and changes condition1
so it is not always the same as condition2, and now the z++ sees the
invalid "0" value sometimes, and a silent bug is introduced.

This bug will not remain undetected with the default NaN initialization.



To address the concern of static analysis being too hard: I wish we 
could have it but limit the amount of static analysis that's done. 
Something like this: the compiler will test branches of if-else 
statements and switch-case statements, but it will not drop into 
function calls with ref parameters nor will it accept initialization in 
looping constructs (foreach, for, while, etc).  A compiler is an 
incorrect implementation if it implements /too much/ static analysis.


The example code you give can be implemented with such limited static 
analysis:


void lotsaCode() {
... lotsa code ...
}

float z;
if ( condition1 )
{
z = 5;
lotsaCode();
z++;
}
else
{
lotsaCode();
}

I will, in advance, concede that this does not prevent people from just 
writing "float z = 0;".  In my dream-world the compiler recognizes a set 
of common mistake-inducing patterns like the one you mentioned and then 
prints helpful error messages suggesting alternative design patterns. 
That way, bugs are prevented and users become better programmers.


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Era Scarecrow

On Saturday, 11 August 2012 at 23:49:18 UTC, Chad J wrote:

On 08/10/2012 06:01 PM, Walter Bright wrote:
It catches only a subset of these at compile time. I can craft 
any number of ways of getting it to miss diagnosing it. 
Consider this one:


float z;
if (condition1)
z = 5;
... lotsa code ...
if (condition2)
z++;

To diagnose this correctly, the static analyzer would have to 
determine that condition1 produces the same result as 
condition2, or not. This is impossible to prove. So the static 
analyzer either gives up and lets it pass, or issues an 
incorrect diagnostic. So our intrepid programmer is forced to 
write:


float z = 0;
if (condition1)
z = 5;
... lotsa code ...
if (condition2)
z++;

Now, as it may turn out, for your algorithm the value "0" is 
an out-of-range, incorrect value. Not a problem as it is a 
dead assignment, right?


But then the maintenance programmer comes along and changes 
condition1 so it is not always the same as condition2, and now 
the z++ sees the invalid "0" value sometimes, and a silent bug 
is introduced.


This bug will not remain undetected with the default NaN 
initialization.


 Let's keep in mind every one of these truths:

1) Programmers are lazy; if you can get away with not 
initializing something then you'll avoid it. In C I've failed to 
initialize variables many times, until a bug crops up, and it's 
sometimes difficult to find, where a NaN or equivalent would have 
quickly cropped them out before running with any real data.


2) There are a lot of inexperienced programmers. I worked for a 
company for a short period of time that did minimal training on a 
language like Java, where I ended up being seen as an utter 
genius (compared to even the teachers).


3) Bugs in a large environment and/or scenario are far more 
difficult, if not impossible, to debug. I've made a program that 
handles merging of various dialogs (using double linked-like 
lists); I can debug them if there are 100 or fewer to work with, 
but after 100 (and often it's tens of thousands) it can become 
such a pain, based on the indirection and how the original 
structure was built, that I refuse based on difficulty vs. end 
results (plus sanity).


 We also need to sometimes laugh at our mistakes, and learn from 
others. I'll recommend everyone read from rinkworks a bit if you 
have the time and refresh yourselves.


 http://www.rinkworks.com/stupid/cs_programming.shtml


Re: Which D features to emphasize for academic review article

2012-08-11 Thread F i L

Walter Bright wrote:

> That's just the thing, bugs are arguably easier to hunt down 
> when things default to a consistent, usable value.

Many, many programming bugs trace back to assumptions that 
floating point numbers act like ints. There's just no way to 
avoid knowing and understanding the differences.


My point was that the majority of the time there wasn't a bug 
introduced. Meaning the code was written and functioned as 
expected after I initialized the value to 0. I was only expecting 
the value to act similarly (in initial value) to its 'int' 
relative, but received a NaN in the output because I forgot to be 
explicit.



I don't see why this is a bad thing. The fact is, with NaN you 
know there is a bug. With 0, you may never realize there is a 
problem. Andrei wrote me about the output of a program he is 
working on having billions of result values, and he noticed a 
few were NaNs, which he traced back to a bug. If the bug had 
set the float value to 0, there's no way he would have ever 
noticed the issue.


It's all about daubing bugs with day-glo orange paint so you 
know there's a problem. Painting them with camo is not the 
right solution.


Yes, and this is an excellent argument for using NaN as a 
debugging practice in general, but I don't see anything in favor 
of defaulting to NaN. If you don't do some kind of check against 
code, especially with such large data sets, bugs of various kinds 
are going to go unchecked regardless.


A bug where an initial data value was accidentally initialized to 
0 (by a third party later on, for instance) could be just as 
hard to miss, or harder if you're expecting a NaN to appear. In 
fact, an explicit set to NaN might discourage a third party from 
assigning without first questioning the original intention. In 
this situation I imagine best practice would be to write:


float dataValue = float.nan; // MUST BE NaN, DO NOT CHANGE!
 // set to NaN to ensure is-set.


Re: Which D features to emphasize for academic review article

2012-08-11 Thread Andrei Alexandrescu

On 8/11/12 7:33 PM, Walter Bright wrote:
[snip]

Allow me to insert an opinion here. This post illustrates quite well how 
opinionated our community is (for better or worse).


The OP has asked a topical question in a matter that is interesting and 
also may influence the impact of the language to the larger community. 
Before long the thread has evolved into the familiar pattern of a debate 
over a minor issue on which reasonable people may disagree and that's 
unlikely to change. We should instead do our best to give a balanced 
high-level view of what D offers for econometrics.


To the OP - here are a few aspects that may deserve interest:

* Modeling power - from what I understand econometrics is 
modeling-heavy, which is more difficult to address in languages such as 
Fortran, C, C++, Java, Python, or the likes of Matlab.


* Efficiency - D generates native code for floating point operations and 
has control over data layout and allocation. Speed of generated code is 
dependent on the compiler, and the reference compiler (dmd) does a 
poorer job at it than the GNU-based compiler (gdc).


* Convenience - D is designed to "do what you mean" wherever possible 
and simplify common programming tasks, numeric or not. That makes the 
language comfortable to use even by a non-specialist, in particular in 
conjunction with appropriate libraries.


A few minuses I can think of:

- Maturity and availability of numeric and econometrics libraries is an 
obvious issue. There are some libraries (e.g. 
https://github.com/kyllingstad/scid/wiki) maintained and extended 
through volunteer effort.


- The language's superior modeling power and level of control comes at 
an increase in complexity compared to languages such as e.g. Python. So 
the statistician would need a larger upfront investment in order to reap 
the associated benefits.



Andrei



Re: Which D features to emphasize for academic review article

2012-08-11 Thread bearophile

Andrei Alexandrescu:

- The language's superior modeling power and level of control 
comes at an increase in complexity compared to languages such 
as e.g. Python. So the statistician would need a larger upfront 
investment in order to reap the associated benefits.


Statisticians often use the R language 
(http://en.wikipedia.org/wiki/R_language).
Python contains much more "computer science" and CS complexity 
compared to R. Not just advanced stuff like coroutines, 
metaclasses, decorators, Abstract Base Classes, operator 
overloading, and so on, but even simpler things, like generators, 
standard library collections like heaps and deques, and so on.
For some statisticians I've seen, even several parts of Python 
are too hard to use or understand. I have rewritten several 
of their Python scripts.


Bye,
bearophile


Re: Which D features to emphasize for academic review article

2012-08-11 Thread TJB
On Sunday, 12 August 2012 at 02:28:44 UTC, Andrei Alexandrescu 
wrote:

[snip]


Andrei,

Thanks for bringing this back to the original topic and for your 
thoughts.


Indeed, a lot of econometricians are using MATLAB, R, Gauss, Ox 
and the like. But there are a number of econometricians who need 
the raw power of a natively compiled language (especially 
financial econometricians whose data are huge) who typically 
program in either Fortran or C/C++.  It is really this group that 
I am trying to reach.  I think D has a lot to offer this group in 
terms of programmer productivity and reliability of code.  I 
think this applies to statisticians as well, as I see a lot of 
them in this latter group too.


I also want to reach the MATLABers because I think they can get a 
lot more modeling power (I like how you put that) without too 
much more difficulty (see Ox - nearly as complicated as C++ but 
without the power).  Many MATLAB and R programmers end up 
recoding a good part of their algorithms in C++ and calling that 
code from the interpreted language.  I have always found this 
kind of mixed language programming to be messy, time consuming, 
and error prone.  Special tools are cropping up to handle this 
(see Rcpp).  This just proves to me the usefulness of a 
productive AND powerful language like D for econometricians!


I am sensitive to the drawbacks you mention (especially lack of 
numeric libraries).  I am so sick of wasting my time in C++ 
though that I have almost decided to just start writing my own 
econometric library in D.  Earlier in this thread there was a 
discussion of extended precision in D and I mentioned the need to 
recode things like BLAS and LAPACK in D.  Templates in D seem 
perfect for this problem.  As an expert in template 
meta-programming what are your thoughts?  How is this different 
than what is being done in SciD?  It seems they are mostly 
concerned about wrapping the old CBLAS and CLAPACK libraries.
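
To make the template point concrete, here is a minimal sketch 
(illustrative code of mine, not SciD's actual API) of a BLAS-style 
kernel written once for every floating point width:

T dot(T)(const T[] x, const T[] y)
in { assert(x.length == y.length); }
body
{
    T sum = 0;
    foreach (i; 0 .. x.length)
        sum += x[i] * y[i];
    return sum;
}

unittest
{
    assert(dot([1.0, 2.0], [3.0, 4.0]) == 11.0);  // instantiated with T = double
}

The same instantiation mechanism would give float, double and real 
versions of every routine from one source file, which is exactly what 
a D rewrite of BLAS could exploit.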


Again, thanks for your thoughts and your TDPL book. Probably the 
best programming book I've ever read!


TJB


Re: Which D features to emphasize for academic review article

2012-08-11 Thread dennis luehring

On 12.08.2012 02:43, F i L wrote:

Yes, and this is an excellent argument for using NaN as a
debugging practice in general, but I don't see anything in favor
of defaulting to NaN. If you don't do some kind of check against
code, especially with such large data sets, bugs of various kinds
are going to go unchecked regardless.



It makes absolutely no sense to have different initialization styles in 
debug and release - and according to Andrei's example, there are many 
situations where slow debug code isn't capable of reproducing the error 
in a human timespan - especially when working with millions or billions 
of data points (as I also do...)




Re: Which D features to emphasize for academic review article

2012-08-12 Thread Walter Bright

On 8/11/2012 7:30 AM, Jakob Ovrum wrote:

On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:

Of course it is doing what the language requires, but it is an incorrect
diagnostic because a dead assignment is required.

And being a dead assignment, it can lead to errors when the code is later
modified, as I explained. I also dislike on aesthetic grounds meaningless code
being required.


It is not meaningless, it's declarative. The same resulting code as now would be
generated, but it's easier for the maintainer to understand what's being meant.


No, it is not easier to understand, because there's no way to determine if the 
intent is to:


1. initialize to a valid value -or-
2. initialize to get the compiler to stop complaining



I do, however, believe that D programmers are perfectly capable of doing the
right thing if informed.


Of course they are capable of it. But experience shows they simply don't.



Consider `pure` member functions - turns out most of them are actually pure
because the implicit `this` parameter is allowed to be mutated and it's rare for
a member function to mutate global state, yet we all strive to correctly
decorate our methods `pure` when applicable.


A better design would be to have pure be the default and impure would require 
annotation. The same for const/immutable. Unfortunately, it's too late for that 
now. My fault.
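
A small sketch of the annotation burden as it stands today (the type 
and method are illustrative):

struct Accumulator
{
    double total = 0;

    // impurity is the default, so 'pure' must be spelled out on each
    // method; under the design above the annotation burden would flip
    double scaled(double factor) const pure
    {
        return total * factor;
    }
}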




Java exception specifications have widespread implications for the entire
codebase, while the difference between '0' and 'float.nan' is constant and
entirely a local improvement.


I believe there's a lot more potential for success when you have a design where 
the easiest way is the correct way, and you've got to make some effort to do it 
wrong. Much of my attitude on that goes back to my experience at Boeing on 
designing things (yes, my boring Boeing anecdotes again), and Boeing's long 
experience with pilots and mechanics and what they actually do vs what they're 
trained to do. (And not only are these people professionals, not fools, but 
their lives depend on doing it right.)


Over and over and over again, the easy way had better be the correct way. I 
could bore you even more with the aviation horror stories I heard that justified 
that attitude.


Re: Which D features to emphasize for academic review article

2012-08-12 Thread simendsjo
On Sun, 12 Aug 2012 12:38:47 +0200, Walter Bright wrote:

On 8/11/2012 7:30 AM, Jakob Ovrum wrote:

Consider `pure` member functions - turns out most of them are actually
pure because the implicit `this` parameter is allowed to be mutated and
it's rare for a member function to mutate global state, yet we all
strive to correctly decorate our methods `pure` when applicable.


A better design would be to have pure be the default and impure would
require annotation. The same for const/immutable. Unfortunately, it's
too late for that now. My fault.




I have thought that many times. The same with default non-null class  
references. I keep adding assert(someClass) everywhere.
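
The pattern in question, sketched with an illustrative class:

class Dataset { double[] values; }

void process(Dataset ds)        // class references default to null in D
{
    assert(ds !is null);        // the manual check that non-null
                                // defaults would make unnecessary
    // ... use ds.values ...
}

void main()
{
    process(new Dataset);       // fine; process(null) would trip the assert
}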


Re: Which D features to emphasize for academic review article

2012-08-12 Thread dennis luehring

On 12.08.2012 12:38, Walter Bright wrote:

On 8/11/2012 7:30 AM, Jakob Ovrum wrote:

Consider `pure` member functions - turns out most of them are actually pure
because the implicit `this` parameter is allowed to be mutated and it's rare for
a member function to mutate global state, yet we all strive to correctly
decorate our methods `pure` when applicable.


A better design would be to have pure be the default and impure would require
annotation. The same for const/immutable. Unfortunately, it's too late for that
now. My fault.


it's never too late - put it back on the list for D3 - please
(and local variables immutable by default - or something like that)



Re: Which D features to emphasize for academic review article

2012-08-12 Thread Era Scarecrow

On Sunday, 12 August 2012 at 11:34:20 UTC, dennis luehring wrote:

On 12.08.2012 12:38, Walter Bright wrote:
A better design would be to have pure be the default and 
impure would require annotation. The same for const/immutable. 
Unfortunately, it's too late for that now. My fault.


it's never too late - put it back on the list for D3 - please 
(and local variables immutable by default - or something 
like that)


 Agreed. If it is only a signature change then it might be 
possible to accept such a change; I'm sure it would simplify 
quite a few signatures and only complicate a few. The defaults 
probably worth trying to include are pure and @safe (others I 
can't think of offhand).


 Make a list of all the issues/mistakes that could be fixed in D3 
(be it ten or fifteen years from now); who knows, maybe the 
future is just around the corner if there's a big enough reason 
for it. The largest reason not to make big changes is so people 
don't get fed up and quit (especially while still trying to write 
library code); that, and this is supposed to be the 'stable' D2 
language right now, with language changes having to be weighed 
heavily.


Re: Which D features to emphasize for academic review article

2012-08-12 Thread Andrei Alexandrescu

On 8/12/12 12:52 AM, TJB wrote:

Thanks for bringing this back to the original topic and for your thoughts.

Indeed, a lot of econometricians are using MATLAB, R, Guass, Ox and the
like. But there are a number of econometricians who need the raw power
of a natively compiled language (especially financial econometricians
whose data are huge) who typically program in either Fortran or C/C++.
It is really this group that I am trying to reach. I think D has a lot
to offer this group in terms of programmer productivity and reliability
of code. I think this applies to statisticians as well, as I see a lot
of them in this latter group too.

I also want to reach the MATLABers because I think they can get a lot
more modeling power (I like how you put that) without too much more
difficulty (see Ox - nearly as complicated as C++ but without the
power). Many MATLAB and R programmers end up recoding a good part of
their algorithms in C++ and calling that code from the interpreted
language. I have always found this kind of mixed language programming to
be messy, time consuming, and error prone. Special tools are cropping up
to handle this (see Rcpp). This just proves to me the usefulness of a
productive AND powerful language like D for econometricians!


I think this is a great angle. In our lab when I was a grad student in 
NLP/ML there was also a very annoying trend going on: people would start 
with Perl for text preprocessing and Matlab for math, and then, after 
the proof of concept, would need to recode most parts in C++. (I recall 
hearing complaints about large overheads in Matlab caused by eager copy 
semantics, is that true?)



I am sensitive to the drawbacks you mention (especially lack of numeric
libraries). I am so sick of wasting my time in C++ though that I have
almost decided to just start writing my own econometric library in D.
Earlier in this thread there was a discussion of extended precision in D
and I mentioned the need to recode things like BLAS and LAPACK in D.
Templates in D seem perfect for this problem. As an expert in template
meta-programming what are your thoughts? How is this different than what
is being done in SciD? It seems they are mostly concerned about wrapping
the old CBLAS and CLAPACK libraries.


There's a large body of experience and many optimizations accumulated in 
these libraries, which are worth exploiting. The remaining matter is 
offering a convenient shell. I think Cristi's work on SciD goes that 
direction.



Andrei


Re: Which D features to emphasize for academic review article

2012-08-12 Thread Jakob Ovrum

On Sunday, 12 August 2012 at 10:39:01 UTC, Walter Bright wrote:
No, it is not easier to understand, because there's no way to 
determine if the intent is to:


1. initialize to a valid value -or-
2. initialize to get the compiler to stop complaining



If there is an explicit initializer, it means that the intent is 
either of those two. The latter case is probably quite rare, and 
might suggest a problem with the code - if the compiler can't 
prove your variable to be initialized, then the programmer 
probably has to spend some time figuring out the real answer. 
Legitimate cases of the compiler being too conservative can be 
annotated with a comment to eliminate the ambiguity.


The interesting part is that you can be sure that variables 
*without* initializers are guaranteed to be initialized at a 
later point, or the program won't compile. Without the guarantee, 
the default value could be intended as a valid initializer or 
there could be a bug in the program.


The current situation is not bad, I just think the one that 
allows for catching more errors at compile-time is much, much 
better.
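
A minimal sketch of the difference (this is valid D today; the 
comments describe the hypothetical check, not current behavior):

void main()
{
    double z;           // no initializer: D quietly gives it NaN
    bool cond = false;
    if (cond)
        z = 1.0;
    // no else branch: a definite-assignment check would reject the
    // next line at compile time; today's D lets it through with NaN
    double w = z * 2;
    assert(w != w);     // NaN != NaN, so the bug is at least visible
}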


Of course they are capable of it. But experience shows they 
simply don't.


If they do it for contagious attributes like const, immutable and 
pure, I'm sure they'll do it for a simple fix like using explicit 
'float.nan' in the rare case the compiler can't prove 
initialization before use.


A better design would be to have pure be the default and impure 
would require annotation. The same for const/immutable. 
Unfortunately, it's too late for that now. My fault.


I agree, but on the flip side it was easier to port D1 code to D2 
this way, and that might have saved D2 from even further 
alienation by some D1 users during its early stages. The most 
common complaints I remember from the IRC channel were complaints 
about const and immutable which was now forced on D programs to 
some degree due to string literals. This made some people really 
apprehensive about moving their code to D2, and I can imagine the 
fallout would be a lot worse if they had to annotate all their 
impure functions etc.


I believe there's a lot more potential for success when you 
have a design where the easiest way is the correct way, and 
you've got to make some effort to do it wrong. Much of my 
attitude on that goes back to my experience at Boeing on 
designing things (yes, my boring Boeing anecdotes again), and 
Boeing's long experience with pilots and mechanics and what 
they actually do vs what they're trained to do. (And not only 
are these people professionals, not fools, but their lives 
depend on doing it right.)


Over and over and over again, the easy way had better be the 
correct way. I could bore you even more with the aviation 
horror stories I heard that justified that attitude.


Problem is, we've pointed out the easy way has issues and is not 
necessarily correct.




Re: Which D features to emphasize for academic review article

2012-08-12 Thread bearophile

Andrei Alexandrescu:

(I recall hearing complaints about large overheads in Matlab 
caused by eager copy semantics, is that true?)


In Matlab there is COW:
http://www.matlabtips.com/copy-on-write-in-subfunctions/

Bye,
bearophile


Re: Which D features to emphasize for academic review article

2012-08-12 Thread dsimcha

On Sunday, 12 August 2012 at 03:30:24 UTC, bearophile wrote:

Andrei Alexandrescu:

- The language's superior modeling power and level of control 
comes at an increase in complexity compared to languages such 
as e.g. Python. So the statistician would need a larger 
upfront investment in order to reap the associated benefits.


Statisticians often use the R language 
(http://en.wikipedia.org/wiki/R_language).
Python contains much more "computer science" and CS complexity 
compared to R. Not just advanced stuff like coroutines, 
metaclasses, decorators, Abstract Base Classes, operator 
overloading, and so on, but even simpler things, like 
generators, standard library collections like heaps and deques, 
and so on.
For some statisticians I've seen, even several parts of Python 
are too hard to use or understand. I have rewritten 
several of their Python scripts.


Bye,
bearophile



For people with more advanced CS/programming knowledge, though, 
this is an advantage of D.  I find Matlab and R incredibly 
frustrating to use for anything but very standard 
matrix/statistics computations on data that's already structured 
the way I like it.  This is mostly because the standard CS 
concepts you mention are at best awkward and at worst impossible 
to express and, being aware of them, I naturally want to take 
advantage of them.


Using Matlab or R feels like being forced to program with half 
the tools in my toolbox either missing or awkwardly misshapen, so 
I avoid it whenever practical.  (Actually, languages like C and 
Java that don't have much modeling power feel the same way to me 
now that I've primarily used D and to a lesser extent Python for 
the past few years.  Ironically, these are the languages that are 
easy to integrate with R and Matlab respectively.  Do most 
serious programmers who work in problem domains relevant to 
Matlab and R feel this way or is it just me?).  This was my 
motivation for writing Dstats and mentoring Cristi's fork of 
SciD.  D's modeling power is so outstanding that I was able to 
replace R and Matlab for a lot of use cases with plain old 
libraries written in D.


Re: Which D features to emphasize for academic review article

2012-08-12 Thread TJB

On Sunday, 12 August 2012 at 17:22:21 UTC, dsimcha wrote:

...  I find Matlab and R incredibly frustrating to use for 
anything but very standard matrix/statistics computations on 
data that's already structured the way I like it.


This is exactly how I feel, and why I am turning to D. My data 
sets are huge (64 TB for just a few years of data), my 
econometric methods are computationally intensive, and the 
limitations of Matlab and R are almost instantly constraining.


Using Matlab or R feels like being forced to program with half 
the tools in my toolbox either missing or awkwardly misshapen, 
so I avoid it whenever practical.  Actually, languages like C 
and Java that don't have much modeling power feel the same way 
to me ...


Very well put - it expresses my feeling precisely.  And C++ is 
such a complicated beast that I feel caught in between.  I'd been 
dreaming of a language that offers modeling power as well as 
efficiency.


...  Do most serious programmers who work in problem domains 
relevant to Matlab and R feel this way or is it just me?.


I certainly feel the same. I only use them when I have to or for 
very simple prototyping.


This was my motivation for writing Dstats and mentoring 
Cristi's fork of SciD.  D's modeling power is so outstanding 
that I was able to replace R and Matlab for a lot of use cases 
with plain old libraries written in D.


Thanks for your work on these packages! I will for sure be 
including them in my write up. I think they offer great 
possibilities for econometrics in D.


TJB




Re: Which D features to emphasize for academic review article

2012-08-12 Thread Adam Wilson
On Sun, 12 Aug 2012 03:38:47 -0700, Walter Bright  
 wrote:



[snip]

I believe there's a lot more potential for success when you have a 
design where the easiest way is the correct way, and you've got to 
make some effort to do it wrong. Much of my attitude on that goes 
back to my experience at Boeing on designing things (yes, my boring 
Boeing anecdotes again), and Boeing's long experience with pilots 
and mechanics and what they actually do vs what they're trained to 
do. (And not only are these people professionals, not fools, but 
their lives depend on doing it right.)


Over and over and over again, the easy way had better be the correct 
way. I could bore you even more with the aviation horror stories I 
heard that justified that attitude.


As a pilot, I completely agree!

--
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/


Re: Which D features to emphasize for academic review article

2012-08-12 Thread F i L

Andrei Alexandrescu wrote:
* Efficiency - D generates native code for floating point 
operations and has control over data layout and allocation. 
Speed of generated code is dependent on the compiler, and the 
reference compiler (dmd) does a poorer job at it than the 
gnu-based compiler (gdc) compiler.


I'd like to add to this. Right now I'm reworking some libraries 
to include SIMD support using DMD on Linux 64-bit. A simple 
benchmark between DMD and GCC of 2 million SIMD vector 
additions/subtractions actually runs faster with my DMD D code 
than the GCC C code. Only by ~0.8 ms, and that could be due to a 
difference between D's std.datetime.StopWatch and C's 
time.h/clock(), but it's consistently faster nonetheless, which 
is impressive.
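
(Roughly the shape of that benchmark, as a sketch - not the actual 
test code, and it assumes DMD 2.060+ with core.simd on x86-64:)

import core.simd;
import std.datetime : StopWatch;
import std.stdio;

void main()
{
    float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 b = [0.5f, 0.5f, 0.5f, 0.5f];

    StopWatch sw;
    sw.start();
    foreach (i; 0 .. 2_000_000)
    {
        a += b;          // one packed SSE add per iteration
        a -= b;          // and one packed subtract
    }
    sw.stop();
    writefln("%s ms", sw.peek().msecs);
    writeln(a.array);    // keep 'a' live so the loop isn't optimized away
}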


That said, it's also much easier to accidentally slow that 
figure down significantly in DMD, whereas GCC almost always 
optimizes very well.



Also, and I'm not sure this isn't just me, but I ran a DMD 
(v2.057 I think) vector test (no SIMD) against Mono C# a few 
months back where DMD got only a ~10 ms improvement over C# 
(~79 ms vs ~88 ms). Now a similar test compiled with DMD 2.060 
runs at ~22 ms vs C#'s 80 ms, so I believe there have been some 
definite optimization improvements in the DMD compiler over the 
last few versions.


Re: Which D features to emphasize for academic review article

2012-08-12 Thread Joseph Rushton Wakeling

On 12/08/12 18:22, dsimcha wrote:

For people with more advanced CS/programming knowledge, though, this is an
advantage of D.  I find Matlab and R incredibly frustrating to use for anything
but very standard matrix/statistics computations on data that's already
structured the way I like it.  This is mostly because the standard CS concepts
you mention are at best awkward and at worst impossible to express and, being
aware of them, I naturally want to take advantage of them.


The main use-case and advantage of both R and MATLAB/Octave seems to me to be 
the plotting functionality -- I've seen some exceptionally beautiful stuff done 
with R in particular, although I've not personally explored its capabilities too 
far.


The annoyance of R in particular is the impenetrable thicket of dependencies 
that can arise among contributed packages; it feels very much like some are 
thrown over the wall and then built on without much concern for organization. :-(


Re: Which D features to emphasize for academic review article

2012-08-12 Thread dsimcha
On Monday, 13 August 2012 at 01:52:28 UTC, Joseph Rushton 
Wakeling wrote:
The main use-case and advantage of both R and MATLAB/Octave 
seems to me to be the plotting functionality -- I've seen some 
exceptionally beautiful stuff done with R in particular, 
although I've not personally explored its capabilities too far.


The annoyance of R in particular is the impenetrable thicket of 
dependencies that can arise among contributed packages; it 
feels very much like some are thrown over the wall and then 
built on without much concern for organization. :-(


I've addressed that, too :).

https://github.com/dsimcha/Plot2kill

Obviously this is a one-man project without nearly the same 
number of features that R and Matlab have, but like Dstats and 
SciD, it has probably the 20% of functionality that handles 80% 
of use cases.  I've used it for the figures in scientific 
articles that I've submitted for publication and in my Ph.D. 
proposal and dissertation.


Unlike SciD and Dstats, Plot2kill doesn't highlight D's modeling 
capabilities that much, but it does get the job done for simple 
2D plots.


Re: Which D features to emphasize for academic review article

2012-08-13 Thread Don Clugston

On 12/08/12 01:31, Walter Bright wrote:

On 8/11/2012 3:01 PM, F i L wrote:

Walter Bright wrote:

I'd rather have a 100 easy to find bugs than 1 unnoticed one that
went out in
the field.


That's just the thing, bugs are arguably easier to hunt down when
things default
to a consistent, usable value.


Many, many programming bugs trace back to assumptions that floating
point numbers act like ints. There's just no way to avoid knowing and
understanding the differences.


Exactly. I have come to believe that there are very few algorithms 
originally designed for integers, which also work correctly for floating 
point.


Integer code nearly always assumes things like, x + 1 != x, x == x,
(x + y) - y == x.


for (y = x; y < x + 10; y = y + 1) {  }

How many times does it loop?
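
Here is one concrete instance (the value of x is chosen purely for 
illustration): at large magnitudes y + 1 rounds back to y and the 
loop never terminates.

import std.stdio;

void main()
{
    float x = 1.0e8f;   // 1e8 exceeds float's 24-bit mantissa: ULP is 8 here
    int count = 0;
    for (float y = x; y < x + 10; y = y + 1)
    {
        if (++count > 100) break;   // guard: at this magnitude y + 1 == y,
                                    // so without it the loop never ends
    }
    writeln(count);     // prints 101 here; with x = 0.0f it would print 10
}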




Re: Which D features to emphasize for academic review article

2012-08-13 Thread Joseph Rushton Wakeling

On 13/08/12 11:11, Don Clugston wrote:

Exactly. I have come to believe that there are very few algorithms originally
designed for integers, which also work correctly for floating point.


  
import std.stdio;

void main()
{
    real x = 1.0/9.0;

    writefln("x = %.128g", x);
    writefln("9x = %.128g", 9.0*x);
}
  

... well, that doesn't work, does it?  Looks like some sort of cheat in place to 
make sure that the successive division and multiplication will revert to the 
original number.



Integer code nearly always assumes things like, x + 1 != x, x == x,
(x + y) - y == x.


There's always good old "if(x==0)" :-)


Re: Which D features to emphasize for academic review article

2012-08-13 Thread bearophile

Don Clugston:

I have come to believe that there are very few algorithms 
originally designed for integers, which also work correctly for 
floating point.


And JavaScript programs that use integers?

Bye,
bearophile


Re: Which D features to emphasize for academic review article

2012-08-13 Thread Walter Bright

On 8/13/2012 5:38 AM, Joseph Rushton Wakeling wrote:

Looks like some sort of cheat in place to
make sure that the successive division and multiplication will revert to the
original number.


That's called "rounding". But rounding always implies some small error, 
and that error can accumulate into a very large one.


Re: Which D features to emphasize for academic review article

2012-08-13 Thread Walter Bright

On 8/12/2012 6:38 PM, F i L wrote:

Also, and I'm not sure this isn't just me, but I ran a DMD (v2.057 I think)
vector test (no SIMD) against Mono C# a few months back where DMD got only a ~10 ms
improvement over C# (~79 ms vs ~88 ms). Now a similar test compiled with DMD 2.060
runs at ~22 ms vs C#'s 80 ms, so I believe there have been some definite optimization
improvements in the DMD compiler over the last few versions.


There's a fair amount of low hanging optimization fruit that D makes possible 
that dmd does not take advantage of. I hope to get to this.


One thing is I suspect that D can generate much better SIMD code than C/C++ can 
without compiler extensions.


Another is that D allows values to be moved without needing a 
copyconstruct/destruct operation.


Re: Which D features to emphasize for academic review article

2012-08-13 Thread Joseph Rushton Wakeling

On 13/08/12 20:04, Walter Bright wrote:

That's called "rounding". But rounding always implies some, small, error that
can accumulate into being a very large error.


Well, yes.  I was just remarking on the choice of rounding and the motivation 
behind it.


After all, you _could_ round it instead as,

x = 1.0/9.0 == 0.111...111  [finite number of decimal places]

but then

9*x == 0.999...999  [i.e. doesn't multiply back to 1.0].

... and this is probably more likely to result in undesirable error than the 
other rounding scheme.  (I think the calculator app on Windows used to have this 
behaviour some years back.)


Re: Which D features to emphasize for academic review article

2012-08-13 Thread TJB

On Monday, 13 August 2012 at 10:11:06 UTC, Don Clugston wrote:

 ... I have come to believe that there are very few algorithms 
originally designed for integers, which also work correctly for 
floating point.


Integer code nearly always assumes things like, x + 1 != x, x 
== x,

(x + y) - y == x.


for (y = x; y < x + 10; y = y + 1) {  }

How many times does it loop?


Don,

I would appreciate your thoughts on the issue of re-implementing 
numeric codes like BLAS and LAPACK in pure D to benefit from the 
many nice features listed in this discussion.  Is it feasible? 
Worthwhile?


Thanks,

TJB


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Don Clugston

On 14/08/12 05:03, TJB wrote:

[snip]


Don,

I would appreciate your thoughts on the issue of re-implementing numeric
codes like BLAS and LAPACK in pure D to benefit from the many nice
features listed in this discussion.  Is it feasible? Worthwhile?

Thanks,

TJB


I found that when converting code for Special Functions from C to D, the 
code quality improved enormously. Having 'static if' and things like 
float.epsilon as built-ins makes a surprisingly large difference. It 
encourages correct code. (For example, it makes any use of magic numbers 
in the code look really ugly and wrong). Unit tests help too.
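
A sketch of that pattern (mine, not the actual converted code): 
T.epsilon replaces magic constants, and 'static if' picks 
precision-dependent tolerances at compile time.

import std.math : fabs;

bool approxEqualRel(T)(T a, T b)
{
    static if (is(T == float))
        enum T tol = 8 * T.epsilon;   // float needs a looser tolerance
    else
        enum T tol = 4 * T.epsilon;   // double and real can be tighter
    return fabs(a - b) <= tol * fabs(b);
}

unittest
{
    assert(approxEqualRel(1.0f / 9.0f * 9.0f, 1.0f));  // survives rounding
}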


That probably doesn't apply so much to LAPACK and BLAS, but it would be 
interesting to see how far we can get with the new SIMD support.




Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Saturday, 11 August 2012 at 05:41:23 UTC, Walter Bright wrote:

On 8/10/2012 9:55 PM, F i L wrote:
On the first condition, without an 'else z = ...', or if the 
condition was removed at a later time, then you'll get a 
compiler error and be forced to explicitly assign 'z' 
somewhere above using it. So C# and D work in "similar" ways 
in this respect except that C# catches these issues at 
compile-time, whereas in D you need to:

  1. run the program
  2. get bad result
  3. hunt down bug


However, and I've seen this happen, people will satisfy the 
compiler complaint by initializing the variable to any old 
value (usually 0), because that value will never get used. 
Later, after other things change in the code, that value 
suddenly gets used, even though it may be an incorrect value 
for the use.



Note to Walter:

You're obviously correct that you can make an arbitrarily complex 
program to make it too difficult for the compiler to enforce 
initialization, the way C# does (and gives up in some cases).


What you seem to be missing is that the issue you describe is 
correct in theory, but too much of a corner case in practice.


C#/Java programmers ___rarely___ run into the sort of issue 
you're mentioning, and even when they do, they don't have nearly 
as much of a problem with fixing it as you seem to think.


The only reason you run into this sort of problem (assuming you 
do, and it's not just a theoretical discussion) is that you're in 
the C/C++ mindset, and using variables in the C/C++ fashion.
If you were a "C#/Java Programmer" instead of a "C++ Programmer", 
you simply _wouldn't_ try to make things so complicated when 
coding, and you simply _wouldn't_ run into these problems the way 
you /think/ you would, as a C++ programmer.



Regardless, it looks to me like you two are arguing for two 
orthogonal issues:


F i L:  The compiler should detect uninitialized variables.
Walter: The compiler should choose to initialize variables with NaN.


What I'm failing to understand is, why can't we have both?

1. Compiler _warns_ about "uninitialized variables" (or scalars, 
at least) the same way C# and Java do, __unless__ the user takes 
the address of the variable, in which case the compiler gives up 
trying to detect the flow (like C#).
Bonus points: Try to detect a couple of common cases (e.g. 
if/else) instead of giving up so easily.


2. In any case, the compiler initializes the variable with 
whatever default value Walter deems useful.



Then you get the best of both worlds:

1. You force the programmer to manually initialize the variable 
in most cases, forcing him to think about the default value. It's 
almost no trouble for the programmer in practice.


2. In the cases where it's not possible, the language helps the 
programmer catch bugs.



Why the heck D avoids #1, I have no idea.

It's one of the _major_ features of C# and Java that help promote 
correctness, and #1 looks orthogonal to #2 to me.




For users who don't like #1: They can suppress the warning. 
Nothing lost, anyway.
For users who DO like #1: They can turn it into an error. A lot 
to be gained.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Michal Minich

On Tuesday, 14 August 2012 at 10:31:30 UTC, Mehrdad wrote:

Note to Walter:

You're obviously correct that you can make an arbitrarily 
complex program to make it too difficult for the compiler to 
enforce initialization, the way C# does (and gives up in some 
cases).


What you seem to be missing is that the issue you describe is 
correct in theory, but too much of a corner case in practice.


C#/Java programmers ___rarely___ run into the sort of issue 
you're mentioning, and even when they do, they don't have 
nearly as much of a problem with fixing it as you seem to think.


Completely agree. I find it quite useful in C#. It helps a lot in 
hairy code (nested if/foreach/try) to make sure all cases are 
handled when initializing a variable. Compilation errors can be 
simply dismissed by assigning a 'default' value to the variable at 
the beginning of the function, but that is generally sloppy 
programming and you lose the useful help of the compiler.


The rules in C# are very simple and can be applied to D almost 
verbatim:

http://msdn.microsoft.com/en-us/library/aa691172%28v=vs.71%29.aspx



Re: Which D features to emphasize for academic review article

2012-08-14 Thread Don Clugston

On 14/08/12 12:31, Mehrdad wrote:

[snip]


Why the heck D avoids #1, I have no idea.


DMD detects uninitialized variables if you compile with -O. It's hard to 
implement the full Monty at the moment, because all that code is in the 
backend rather than the front-end.



It's one of the _major_ features of C# and Java that help promote
correctness, and #1 looks orthogonal to #2 to me.


Completely agree.
I always thought the intention was that assigning to NaN was simply a 
way of catching the difficult cases that slip through compile-time 
checks. Which includes the situation where the compile-time checking 
isn't yet implemented at all.
This is the first time I've heard the suggestion that it might never be 
implemented.


The thing which is really bizarre though, is float.init. I don't know 
what the semantics of it are.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread F i L

Mehrdad wrote:

Note to Walter:

You're obviously correct that you can make an arbitrarily 
complex program to make it too difficult for the compiler to 
enforce initialization, the way C# does (and gives up in some 
cases).


[ ... ]


I think some here are misinterpreting Walter's position 
concerning static analysis from our earlier conversation, so I'll 
share my impression of his thoughts.


I can't speak for Walter, of course, but I'm pretty sure that 
early on in our conversation he agreed that having the compiler 
catch local scope initialization issues was a good idea, or at 
least, wasn't a bad one (again, correct me if I'm wrong). I doubt 
he would be averse to eventually having DMD perform this sort of 
static analysis to help developers, though I doubt it's a high 
priority for him.


The majority of the conversation after that was concerning 
struct/class fields defaults:


  class Foo
  {
      float x; // I think this should be 0.0f
               // Walter thinks it should be NaN
  }

In this situation static analysis can't help catch issues, and 
we're forced to rely on a default value of some kind. Both Walter 
and I have stated our opinion's reasoning previously, so I won't 
repeat them here.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Simen Kjaeraas

On Tue, 14 Aug 2012 16:32:25 +0200, F i L  wrote:


   class Foo
   {
   float x; // I think this should be 0.0f
// Walter thinks it should be NaN
   }

In this situation static analysis can't help catch issues, and we're  
forced to rely on a default value of some kind.


Really? We can catch (or, should be able to) missing initialization
of stuff with @disable this(), but not floats?

Classes have constructors, which lend themselves perfectly to doing
exactly this (just pretend the member is a local variable).

Perhaps there are problems with structs without disabled default
constructors, but even those are trivially solvable by requiring
a default value at declaration time.
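
For structs this already works today; a small sketch with an 
illustrative type:

struct Temperature
{
    float kelvin;

    @disable this();                 // no default construction allowed
    this(float k) { kelvin = k; }
}

void main()
{
    // Temperature t;                // error: default construction disabled
    auto t = Temperature(293.15f);   // initialization is now compulsory
    assert(t.kelvin == 293.15f);
}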

--
Simen


Re: Which D features to emphasize for academic review article

2012-08-14 Thread F i L

On Tuesday, 14 August 2012 at 14:46:30 UTC, Simen Kjaeraas wrote:

On Tue, 14 Aug 2012 16:32:25 +0200, F i L wrote:

  class Foo
  {
      float x; // I think this should be 0.0f
               // Walter thinks it should be NaN
  }

In this situation static analysis can't help catch issues, and 
we're forced to rely on a default value of some kind.


Really? We can catch (or, should be able to) missing initialization 
of stuff with @disable this(), but not floats?

Classes have constructors, which lend themselves perfectly to doing 
exactly this (just pretend the member is a local variable).

Perhaps there are problems with structs without disabled default 
constructors, but even those are trivially solvable by requiring 
a default value at declaration time.

You know, I never actually thought about it much, but I think 
you're right. I guess the same rules could apply to type fields.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Era Scarecrow

On Tuesday, 14 August 2012 at 15:24:30 UTC, F i L wrote:
Really? We can catch (or, should be able to) missing 
initialization of stuff with @disable this(), but not floats?


Classes have constructors, which lend themselves perfectly to 
doing exactly this (just pretend the member is a local 
variable).


Perhaps there are problems with structs without disabled 
default constructors, but even those are trivially solvable by 
requiring a default value at declaration time.


You know, I never actually thought about it much, but I think 
you're right. I guess the same rules could apply to type fields.


Mmmm... What if you added a command that has a file/local scope? 
Perhaps, following @disable this(), it could be @disable init; 
or @disable .init. This would only work for built-in types, and 
possibly structs with variables that aren't explicitly set with 
default values. It sort of already fits with what's there.


@disable init; // global scope in file, like @safe

struct someCipher {
    @disable init; // local scope, in this case the whole struct

    int[][] tables; // now gives a compile-time error unless
                    // @disable this() is used

    ubyte[] key = [1,2,3,4]; // explicitly defined as a default
    this(ubyte[] k, int[][] t) { key = k; tables = t; }
}

void myfun() {
    someCipher x; // compile-time error since the struct fails (but not
                  // at this line unless @disable this() is used)
    someCipher y = someCipher([1,2,3,4], [[1,2],[1,2]]); // should work
                                                         // as expected
}


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Tuesday, 14 August 2012 at 15:24:30 UTC, F i L wrote:
[snip]


You know, I never actually thought about it much, but I think 
you're right. I guess the same rules could apply to type fields.


C# structs, as you might recall, enforce definite initialization. 
:)


We could do the same for structs and classes... what I said 
doesn't just apply to local variables.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Tuesday, 14 August 2012 at 14:32:26 UTC, F i L wrote:

[snip]


I think some here are misinterpreting Walter's position 
concerning static analysis from our earlier conversation, so 
I'll share my impression of his thoughts.


I can't speak for Walter, of course, but I'm pretty sure that 
early on in our conversation he agreed that having the compiler 
catch local scope initialization issues was a good idea, or at 
least, wasn't a bad one (again, correct me if I'm wrong). I 
doubt he would be averse to eventually having DMD perform this 
sort of static analysis to help developers, though I doubt it's 
a high priority for him.



Ah, well if he's for it, then I misunderstood. I read through the 
entire thread (but not too carefully, just 1 read) and my 
impression was that he didn't like the idea because it would fail 
in some cases (and because D doesn't seem to love emitting 
compiler warnings in general), but if he likes it, then great. :)


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Walter Bright

On 8/14/2012 3:31 AM, Mehrdad wrote:

Then you get the best of both worlds:

1. You force the programmer to manually initialize the variable in most cases,
forcing him to think about the default value. It's almost no trouble for

2. In the cases where it's not possible, the language helps the programmer catch
bugs.


Why the heck D avoids #1, I have no idea.


As I've explained before, user defined types have "default constructors". If 
builtin types do not, then you've got a barrier to writing generic code.


Default initialization also applies to static arrays, tuples, structs and 
dynamic allocation. It seems a large inconsistency to complain about them only 
for local variables of basic types, and not for any aggregate type or user 
defined type.
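
(A minimal sketch of that generic-code argument - the template below 
relies on every type, built-in or user defined, having a default value:)

T[] makeBuffer(T)(size_t n)
{
    return new T[n];    // every element starts as T.init:
                        // NaN for floats, 0 for ints, null for classes
}

void main()
{
    auto a = makeBuffer!double(4);
    auto b = makeBuffer!int(4);
    assert(a[0] != a[0]);   // NaN compares unequal to itself
    assert(b[0] == 0);
}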




It's one of the _major_ features of C# and Java that help promote correctness,
and #1 looks orthogonal to #2 to me.


I know Java doesn't have default construction - does C#?

As for the 'rarity' of the error I mentioned, yes, it is unusual. The trouble is 
when it creeps unexpectedly into otherwise working code that has been working 
for a long time.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Tuesday, 14 August 2012 at 21:22:14 UTC, Mehrdad wrote:

C# and Java don't.


Typo, scratch Java, it's N/A for Java.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Tuesday, 14 August 2012 at 21:13:01 UTC, Walter Bright wrote:

On 8/14/2012 3:31 AM, Mehrdad wrote:

[snip]


As I've explained before, user defined types have "default 
constructors". If builtin types do not, then you've got a 
barrier to writing generic code.


Just because they _have_ a default constructor doesn't mean the 
compiler should implicitly _call_ it on your behalf.


C# and Java don't.


It's one of the _major_ features of C# and Java that help 
promote correctness, and #1 looks orthogonal to #2 to me.


I know Java doesn't have default construction - does C#?



Huh? I think you completely misread my post...
I was talking about "definite assignment", i.e. the _lack_ of 
automatic initialization.



As for the 'rarity' of the error I mentioned, yes, it is 
unusual. The trouble is when it creeps unexpectedly into 
otherwise working code that has been working for a long time.


It's no "trouble" in practice, that's what I'm trying to say. It 
only looks like "trouble" if you look at it from the C/C++ 
perspective instead of the C#/Java perspective.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Walter Bright

On 8/14/2012 2:22 PM, Mehrdad wrote:

I was talking about "definite assignment", i.e. the _lack_ of automatic
initialization.


I know. How does that fit in with default construction?


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Tuesday, 14 August 2012 at 21:58:20 UTC, Walter Bright wrote:

On 8/14/2012 2:22 PM, Mehrdad wrote:
I was talking about "definite assignment", i.e. the _lack_ of 
automatic initialization.


I know. How does that fit in with default construction?


They aren't called unless the user calls them.


void Bar<T>(T value) { }

void Foo<T>()
    where T : new()   // generic constraint requiring a default constructor
{
    T uninitialized;
    T initialized = new T();

    Bar(initialized);    // error
    Bar(uninitialized);  // OK
}

void Test() { Foo<int>(); Foo<object>(); }


D could take a similar approach.


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Tuesday, 14 August 2012 at 22:57:26 UTC, Mehrdad wrote:

Bar(initialized);  // error
Bar(uninitialized);  // OK



Er, other way around I mean...


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Walter Bright

On 8/14/2012 3:57 PM, Mehrdad wrote:

I know. How does that fit in with default construction?

They aren't called unless the user calls them.


I guess they aren't really default constructors, then.

So what happens when you allocate an array of them?



D could take a similar approach.


It could, but default construction is better (!).


Re: Which D features to emphasize for academic review article

2012-08-14 Thread Mehrdad

On Wednesday, 15 August 2012 at 00:32:43 UTC, Walter Bright wrote:

On 8/14/2012 3:57 PM, Mehrdad wrote:
I guess they aren't really default constructors, then.


I say potayto, you say potahto...  :P



So what happens when you allocate an array of them?


For arrays, they're called automatically.


Well, OK, that's a bit of a simplification.

It's what happens from the user perspective, not the compiler's 
(or runtime's).


Here's the full story.
And please read it carefully, since I'm __not__ saying D should 
adopt what C# does word for word!


In C#:
- You can define a custom default constructor for classes, but 
not structs.
- Structs _always_ have a zero-initializing default 
(no-parameter) constructor.
- Therefore, there is no such thing as "copy construction"; it's 
bitwise-copied.
- Ctors for _structs_ MUST initialize every field (or call the 
default ctor)

- Ctors for _classes_ don't have this restriction.
- Since initialization is "Cheap", the runtime _always_ does it, 
for _security_.

- The above^ is IRRELEVANT to the compiler!
  * It enforces initialization where it can.
  * It explicitly tells the runtime to auto-initialize when it 
can't.
-- You can ONLY take the address of a variable in unsafe{} 
blocks.
-- This implies you know what you're doing, so it's not a 
problem.



What D would do _ideally_, IMO:

1. Keep the ability to define default (no-args) and postblit 
constructors.


2. _Always_ force the programmer to initialize _all_ variables 
explicitly.

   * No, this is NOT what C++ does.
   * Yes, it is tested & DOES work well in practice. But NOT in 
the C++ mindset.
   * If the programmer _needs_ vars to be uninitialized, he can 
say  = void.
   * If the programmer wants NaNs, he can just say = T.init. 
Bingo.
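
Expressed with today's D syntax, the scheme looks like this (a sketch 
of the proposal above, not current compiler behavior - today none of 
these initializers are actually required):

void main()
{
    double a = 0.0;           // programmer chose a value
    double b = double.init;   // programmer explicitly chose NaN
    double c = void;          // programmer deliberately skipped init...
    c = 1.5;                  // ...and assigns before first use
    assert(b != b);           // double.init is NaN
    assert(a + c == 1.5);
}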



It should work pretty darn well, if you actually give it a try.

(Don't believe me? Put it behind a compiler switch, and see how 
many people start using it, and how many of them [don't] complain 
about it!)




D could take a similar approach.


It could, but default construction is better (!).


Well, that's so convincing, I'm left speechless!