Re: Optionally strongly typed array indexes

2014-06-06 Thread Philippe Sigaud via Digitalmars-d
 Also true, though as a side note, I think a library solution for this could
 be quite nice:

   enum newton = 1.as!kg*m/s^2;  // One possibility.
   enum newton = 1*kg*m/square(s); // Another.

Sorry to intrude, but you can also get:

enum newton = 1.kg/m/s^^2;

Which is quite readable. In this case, 'kg', 'm' and 's' are factory
functions. The '1.kg' part is just UFCS in action.

For the subjacent return type, you can also define the power operator,
at least for integers.


Re: Optionally strongly typed array indexes

2014-06-06 Thread Mason McGill via Digitalmars-d
On Friday, 6 June 2014 at 20:22:35 UTC, Philippe Sigaud via 
Digitalmars-d wrote:

Sorry to intrude, but you can also get:

enum newton = 1.kg/m/s^^2;


Good point. I actually learned D had an exponentiation operator 
just yesterday. It definitely helps readability in this case.


Re: Optionally strongly typed array indexes

2014-06-05 Thread bearophile via Digitalmars-d

Sorry for my answers coming so slowly.

Mason McGill:


Is this considered idiomatic?


Time ago I have asked here to make this index 'i' immutable on 
default, because the literal 0 .. 10 defines an immutable range 
of values:


void main() {
import std.stdio;
foreach (i; 0 .. 10) {
writeln(i);
}
}


Originally code like this was giving weird results, but thanks to 
Kenji this design bug was fixed, and now this code prints the 
first ten integers normally, so now the situation is not as bad 
as it used to be:


void main() {
import std.stdio;
foreach (i; 0 .. 10) {
writeln(i);
i++;
}
}



Currently this prints just the even numbers, I think this is a 
little weird, but I think it sufficiently safe thanks to the 
explicit 'ref' annotation:


void main() {
import std.stdio;
foreach (ref i; 0 .. 10) {
writeln(i);
i++;
}
}


Generally in my D code I write like this, as explained here:
http://blog.knatten.org/2011/11/11/disempower-every-variable/


void main() {
import std.stdio;
foreach (immutable i; 0 .. 10) {
writeln(i);
i++; // Not allowed.
}
}


If your 'i' variable or your function arguments are 
const/immutable, you will have less surprises in your code (and a 
little less bugs). In D it's better to use const/immutable 
everywhere you can. Immutable by default is probably the right 
choice for a modern language. Unfortunately D has missed this 
opportunity.



I think this is one part of the larger problem of representing 
units.


Units of measure introduce a good amount of complexity and logic 
that is absent in the idea of giving strongly types to array 
indexes. You don't need to convert inches to centimetres, you 
don't need to perform a compile-time dimensional analysis for the 
correctness of expressions, you don't need to keep in mind the 
0-based problems of converting Celsius to Fahrenheit, you don't 
need to solve the syntax problems of attaching m / s somehow to 
number literals, and so on. So they are two rather different 
problems.




though I'll have to think about whether it covers all
the cases you're interested in.


Another problem is visible in this code, I have a function bar() 
that must receive a different value for each item of the enum 
Foo. If I use an associative array, the compiler doesn't catch at 
compile-time the bug of the missing Foo.B case:



enum Foo : ubyte { A, B }
void bar(int[Foo] aa) {}
void main() {
bar([Foo.A: 10]);
}


And often using an associative array for this purpose is a waste 
of memory and computation.


The Ada compiler is able to catch this bug at compile-time, using 
a simple fixed-size array (exactly as long as the number of items 
in the enum) with indexes strongly typed as Foo. So the array 
literal must contain all possible indexes only once:



import std.traits: EnumMembers;
enum Foo : ubyte { A, B }
void bar(int[@typed(Foo) EnumMembers!Foo.length] input) {}
void main() {
bar([Foo.A: 10, Foo.B: 20]);
}


If you give a strong type to the array index you avoid that bug 
at compile time. (To be accepted as index type, an enumeration 
must behave nicely, all such tests can be done at compile-time).


Bye,
bearophile


Re: Optionally strongly typed array indexes

2014-06-05 Thread Mason McGill via Digitalmars-d

On Thursday, 5 June 2014 at 09:23:00 UTC, bearophile wrote:

Sorry for my answers coming so slowly.


Hah! I'm on these forums (selfishly) trying to understand/improve 
D for my own use at work, while you seem to be 
helping/teaching/contributing for the sake of it. I'm continually 
surprised by how pleasant and professional this community is; 
nothing at all to be sorry for!



void main() {
import std.stdio;
foreach (i; 0 .. 10) {
writeln(i);
i++;
}
}

Currently this prints just the even numbers, I think this is a 
little weird.


That is weird. It goes counter to the idea of `i` coming from 
the range. Though, I'd say someone writing that code is asking 
for strange behavior, so it's not much of an issue.



Generally in my D code I write like this, as explained here:
http://blog.knatten.org/2011/11/11/disempower-every-variable/


I agree, with the caveat of trying to not add so much noise 
that it becomes unreadable.


Immutable by default is probably the right choice for a modern 
language.


Yes, or at least making operations with side effects look 
different from operations without.


I think this is one part of the larger problem of representing 
units.


Units of measure introduce a good amount of complexity and 
logic that is absent in the idea of giving strongly types to 
array indexes.


You don't need to convert inches to centimetres,


True.

You don't need to perform a compile-time dimensional analysis 
for the correctness of expressions.


I'd argue you do, at least given my understanding of the goals of 
strongly-typed array indices. For example, consider strided 
indexing:


  const dogs = [Rover, Fido, Rex].of!dog;
  const cats = [Fluffy, Mr. Whiskers].of!cat;

  // This should work.
  const i1 = index!dog(1);
  dogs[i1];

  // This should not work.
  const i2 = index!cat(1);
  dogs[i2];

  // This should work.
  const i3 = 2 * i1;
  dogs[i3];

  // This should not work.
  const i4 = i2 * i1;
  dogs[i4];

So, the library needs to do some form of dimensional analysis to 
allow typed indices to be multiplied by unit-less scalars, but 
not scalars with units.


You don't need to keep in mind the 0-based problems of 
converting Celsius to Fahrenheit.


True. This is why I said the problem of strongly typed array 
indexes is *part* of the problem of units, not equivalent to it.


You don't need to solve the syntax problems of attaching m / 
s somehow to number literals.


Also true, though as a side note, I think a library solution for 
this could be quite nice:


  enum newton = 1.as!kg*m/s^2;  // One possibility.
  enum newton = 1*kg*m/square(s); // Another.


though I'll have to think about whether it covers all
the cases you're interested in.


Another problem is visible in this code, I have a function 
bar() that must receive a different value for each item of the 
enum Foo. If I use an associative array, the compiler doesn't 
catch at compile-time the bug of the missing Foo.B case:



enum Foo : ubyte { A, B }
void bar(int[Foo] aa) {}
void main() {
bar([Foo.A: 10]);
}


And often using an associative array for this purpose is a 
waste of memory and computation.


The Ada compiler is able to catch this bug at compile-time, 
using a simple fixed-size array (exactly as long as the number 
of items in the enum) with indexes strongly typed as Foo. So 
the array literal must contain all possible indexes only once:



import std.traits: EnumMembers;
enum Foo : ubyte { A, B }
void bar(int[@typed(Foo) EnumMembers!Foo.length] input) {}
void main() {
bar([Foo.A: 10, Foo.B: 20]);
}


If you give a strong type to the array index you avoid that bug 
at compile time. (To be accepted as index type, an enumeration 
must behave nicely, all such tests can be done at compile-time).


That's quite clever, though couldn't it also be done with a 
library?


  enum Foo : ubyte { A, B }
  enum nFoo = EnumMembers!Foo.length;
  alias AllFoo = WithUnits!(int[nFoo], Foo);

  void bar(AllFoo) {}

  void main() {
  bar(AllFoo([/*A*/ 10, /*B*/ 20]));
  }


Re: Optionally strongly typed array indexes

2014-06-04 Thread bearophile via Digitalmars-d

Mason McGill:

There's definitely a tension between writing generic code 
(NumPy-style, operating on abstract dimensions) and safe code 
(operating on dimensions with semantics;


While you can shove lot of different kind of code inside the 
matrix-based style of coding of APL, NumPy and so on, I find 
there is also plenty of code that is more naturally expressed 
with loops or with higher order functions as map/filter.


And sometimes in D you want to implement the array processing 
functions (the ones of NumPy that are written in Fortran, C, 
C++), using loops. The non-built-in data structures need to be 
implemented in D, so strong typing for indexes helps you write 
those routines more correctly.



It seems to me that the users who may benefit most from this 
feature are likely to be using non-built-in data structures.


Strongly typed indexes are useful not just for handling 2D or 3D 
matrices, but also when you need to use two or more 1D arrays, to 
avoid confusing indexes of an array with the indexes of the 
other. Such situations are common even in non numerical D code. I 
write plenty of code like this.



For example, you mention interpreting the first level of array 
nesting as rows and the second as columns, implying you are 
treating the nested arrays as a matrix


Right.

You can give strong typing to indexes of a matrix library, this 
doesn't need language changes. So a dynamic array library in 
Phobos could fulfil most of the possible desires or needs for 
strongly typed indexes. But you lose some of the convenience of 
the built-in arrays, and you have to import a bulky library that 
you are mostly not using.
I see several places in all kinds of D code where strongly typed 
indexes can help tell things apart more clearly and safely, but I 
think only in some of such situations programmers are willing to 
import a numpy-like library just to have strong typing. In D 
learn I see that people don't even add immutable to the foreach 
variable that spans an interval. So I don't expect people to 
import numpy just to give types to two indexes of two 1D arrays. 
And statistically I think such cases of just two or tree 1D 
arrays are much more common than numerical code that manages 
several matrices with a good matrix library.


Bye,
bearophile


Re: Optionally strongly typed array indexes

2014-06-04 Thread Mason McGill via Digitalmars-d

On Wednesday, 4 June 2014 at 09:43:20 UTC, bearophile wrote
You can give strong typing to indexes of a matrix library, this 
doesn't need language changes. So a dynamic array library in 
Phobos could fulfil most of the possible desires or needs for 
strongly typed indexes. But you lose some of the convenience of 
the built-in arrays, and you have to import a bulky library 
that you are mostly not using.
I see several places in all kinds of D code where strongly 
typed indexes can help tell things apart more clearly and 
safely, but I think only in some of such situations programmers 
are willing to import a numpy-like library just to have strong 
typing.


I'll have to check out Ada to get a sense of how this feature 
works and how people use it.


In D learn I see that people don't even add immutable to the 
foreach variable that spans an interval.


I noticed you did that, though I haven't seen this much on 
Dlang.org. Is this considered idiomatic? Coming mostly from 
JavaScript/Python/CUDA C++ I'm not sure how I feel about the 
terseness/safety trade-off (I can honestly see it both ways).


So I don't expect people to import numpy just to give types to 
two indexes of two 1D arrays. And statistically I think such 
cases of just two or tree 1D arrays are much more common than 
numerical code that manages several matrices with a good matrix 
library.


Good point. Though, I don't think adding compiler support for 
strongly typed array dimensions is the full solution. I think 
this is one part of the larger problem of representing units. 
Other languages have solutions to this (I particularly like F#'s: 
http://en.wikibooks.org/wiki/F_Sharp_Programming/Units_of_Measure), 
and it would be very nice to see something this in Phobos:


  /* User code. */
  import std.awesome_unit_library: Unit, of;
  // Unit: defines new types that can be multiplied  divided.
  // of: wraps a container in a stricter version, enforcing units.

  alias Name = Unit!name;
  alias Food = Unit!food;
  auto names = [Fred, Alice, Sue].of!Name;
  auto foods = [Apple, Orange, Tofu].of!Food;

  foreach (name; [Bill, Ted].of!Name)
  {
  names ~= name; // This compiles happily.
  foods ~= name; // This does not compile, preventing
 // Bill and Ted from being eaten.
  }


Re: Optionally strongly typed array indexes

2014-06-04 Thread Mason McGill via Digitalmars-d

On Wednesday, 4 June 2014 at 18:46:04 UTC, Mason McGill wrote:

  /* User code. */
  import std.awesome_unit_library: Unit, of;
  // Unit: defines new types that can be multiplied  divided.
  // of: wraps a container in a stricter version, enforcing 
units.


  alias Name = Unit!name;
  alias Food = Unit!food;
  auto names = [Fred, Alice, Sue].of!Name;
  auto foods = [Apple, Orange, Tofu].of!Food;

  foreach (name; [Bill, Ted].of!Name)
  {
  names ~= name; // This compiles happily.
  foods ~= name; // This does not compile, preventing
 // Bill and Ted from being eaten.
  }


You could actually just let the `of` template create the `Unit` 
type for you, if you want to be terse:


  import std.awesome_unit_library: of;

  auto names = [Fred, Alice, Sue].of!name;
  auto foods = [Apple, Orange, Tofu].of!food;

  foreach (name; [Bill, Ted].of!name)
  {
  names ~= name; // This compiles happily.
  foods ~= name; // This does not compile, preventing
 // Bill and Ted from being eaten.
  }

It looks like this library solution puts user work at or below 
the level of your proposed syntax, though I'll have to think 
about whether it covers all the cases you're interested in.


Optionally strongly typed array indexes

2014-06-03 Thread bearophile via Digitalmars-d
This language feature is absent in D, and it's present in Ada and 
partially present in ObjectPascal. I think it's significant.


All the following ideas are preliminary (uncooked, primitive), 
they are just ideas (and quite probably there are smart people 
that can invent better things).


This is an example of wrong code, it allocates and initializes a 
matrix with three rows and two columns:


void main() {
import std.stdio: writeln;
auto mat = new int[][](3, 2);
writeln(mat);

foreach (immutable i; 0 .. 2)
foreach (immutable j; 0 .. 3)
mat[i][j] = i * 10 + j; // Wrong.
writeln(mat);
}


The out of bound array access bug is found at run-time with this 
error followed by a stack trace:


core.exception.RangeError@test.d(8): Range violation


The Ada language allows you to spot that bug at compile time.

Note: in D you usually avoid that bug writing the code like this. 
But this is not enough in more complex situations:


foreach (immutable i, row; mat)
foreach (immutable j, ref m; row)
m = i * 10 + j;


If you give strong types to the array indexes, the code becomes 
more self-documenting, and the compiler can catch some more of 
your mistakes. In D the associative array type syntax already 
gives a type to the indexes, so I have to tell apart the the case 
of associative array from the case of normal dynamic/fixed-size 
array with a strongly typed index. This will make the syntax 
worse.


This is a first possible syntax (it looks ugly):


void main() {
auto M = new int[TR = @typed][TC = @typed](3, 2);

foreach (immutable TR i; 0 .. 2)
foreach (immutable TC j; 0 .. 3)
M[i][j] = i * 10 + j; // Wrong.
}



@typed used inside the [] means that array has strongly typed 
(size_t) index. TR = @typed means that TR is the aliased name 
of such type (so you can think of it as [alias TR = @typed]).


Now such program gives two compile time errors (type mismatch on 
the j and i indexes).



Other examples of allocation of 1D arrays with strongly typed 
indexes:


auto v1 = new int[TV = @typed](10);
int[TV = @typed 10] v2; // fixed-size.


An usage example in a simple function that performs a 2D matrix 
transposition:



T[][] transpose(T)(in T[TC = @typed][TR = @typed] m)
pure nothrow @safe {
auto r = new T[@typed(TR)][@typed(TC)](m[0].length, m.length);
foreach (immutable nr, const row; m)
foreach (immutable nc, immutable c; row)
r[nc][nr] = c;
return r;
}


@typed(TR) means that for this array I am using an already 
defined index type named TR.


In theory you can also infer the type of the index with the 
template, but I am not sure how much useful this is:


void foo(TI)(int[@typed(TI)] data) {}


In that transpose function you can also see that you can assign a 
matrix with typed indexes to one without index types:


int[TI = @typed 10] a;
int[10] b = a; // OK.
a = b; // OK.

This transpose returns a matrix with untyped indexes to simplify 
a little the use of the resulting matrix. But if you iterate with 
a foreach on such result, the index types are inferred, so it's 
not a big problem.



Probably there is also some need for a trait to get the type of 
the index of an array (it could return size_t if it's untyped):


static assert(is(__trait(index_type, a) == TI));


(In Ada you can also get the range of an array, so such interval 
types also keep some other compile-time information. But I think 
this is not essential for D, so I have not included this 
information.)


--

I have used both dynamic languages (like Python) and strongly 
typed languages (like D, Haskell). I have seen that both have 
some advantages and disadvantages. In the Haskell world lot of 
people use types to guide their coding, but when I am not using 
Haskell I prefer dynamic typing in small programs or when I 
devise a complex algorithm, and strong static typing in larger 
programs or when I already have the basic code written and I want 
to be more sure of its correctness. So I like a language like D 
that gives me the freedom to use more precise types or less 
precise types according to my current needs. Optional strong 
types for arrays are meant for situations where I want to be more 
sure of the code correctness, or in larger programs, or when the 
complexity of a data structure or the intricacy of a piece of 
code require me to put down lot of precise types to avoid losing 
control of what I am doing.


Strong index types allow you to avoid mixing by mistake index 
variables when you have more than one array, or when you have 2D 
or 3D matrices and you need to not mix rows with columns. In my D 
programming I sometimes mix the indexes, and when I am lucky I 
find the bug at runtime (but if your 2D matrix is a square it's 
less immediate to spot the bug at run-time).


Bye,
bearophile


Re: Optionally strongly typed array indexes

2014-06-03 Thread Mason McGill via Digitalmars-d
Interesting idea. I run into these sorts of bugs often doing 
multimedia processing. There's definitely a tension between 
writing generic code (NumPy-style, operating on abstract 
dimensions) and safe code (operating on dimensions with 
semantics; OpenCV supports this to an extent). I've never used 
Ada so I may be missing the point, but here are my impressions:


It seems to me that the users who may benefit most from this 
feature are likely to be using non-built-in data structures. For 
example, you mention interpreting the first level of array 
nesting as rows and the second as columns, implying you are 
treating the nested arrays as a matrix (rather than a list of 
lists, boxes of kittens, etc.). I'm inclined to believe use-cases 
for strongly-typed indices would often overlap with use-cases for 
strongly-typed containers. If that is true, then the language 
already provides decent support for annotating indices with 
units:


  /* Deep in the belly of some image-processing library... */
  alias Row = Dimension!row;
  alias Column = Dimension!column;
  alias Channel = Dimension!channel;
  alias Image = DenseGrid!(ubyte, row column channel);

  /* In user code... */
  Image image = drawPrettyPicture();
  ubyte r = image[Row(0), Column(0), Channel(0)];
  ubyte g = image[Row(0), Column(0), Channel(1)];
  ubyte b = image[Row(0), Column(0), Channel(2)];

  /* Or if you want to get really fancy... */
  enum red = Channel(0), green = Channel(1), blue = Channel(2);
  ubyte r = image[Row(0), Column(0), red];
  ubyte g = image[Row(0), Column(0), green];
  ubyte b = image[Row(0), Column(0), blue];

  /* Depending on how DenseGrid.opIndex is defined,
 these statements could fail to compile... */
  image[Column(0), Row(0), red];
  image[blue, Row(0), Column(0)];

Speculation: A good behavior for such a `DenseGrid.opIndex` might 
allow untyped (size_t) indices for generic algorithms and users 
who can't be bothered, but reject incorrectly-typed indices (e.g. 
a `BoxOfKittens` should not be indexed with a `Row`, just as an 
`Image` should not be indexed with a `CatName`.