Wider tuple design discussion

bearophile Tue, 02 Aug 2011 06:20:35 -0700

Walter has asked for a more global vision on tuples before implementing their 
unpacking, so need to show the discussion here again:
http://d.puremagic.com/issues/show_bug.cgi?id=6365#c26


Walter:
> My main reservation about this is not that I can see anything obviously wrong
> about it, but I am nervous we are building a special case for tuples. I think
> any expansion of tuple support should be part of a more comprehensive design
> for tuples.
> 
> If we grow tuple support by adding piecemeal special cases for this and that,
> without thinking about them more globally, we are liable to build ourselves
> into a box we can't get out of.
> 
> I can give examples in other languages that did this - the most infamous being
> the great C idea of conflating arrays and pointers. It seems to be a great 
> idea
> in the small, but only much larger experience showed what a disastrous mistake
> it was.
> 
> We should step back and figure out what we want to do with tuples in the much
> more general case.


More stuff I've written:
http://d.puremagic.com/issues/show_bug.cgi?id=6367
http://d.puremagic.com/issues/show_bug.cgi?id=6383


A global design for tuples in D was right what I have tried to avoid, because I 
have thought that Andrei was against a full tuple built-in design and that 
generally there was not so much desire for large changes in D2 now. With that 
syntax sugar suggestion I have tried to find the simpler and more essential D 
change that allows me to use tuples in a more handy way. Now I'm back to the 
start. And I still fear that a better and more comprehensive tuple design will 
be refused in D. So I fear there is no way out: if you design a minimal change 
Walter fears of corner cases and myopic design, and if you design a 
comprehensive tuple, it risks being a too much big change for this stage of D 
evolution. In both cases the end result is of not having good enough tuples in 
D.

In the end Walter is right. Better to not have built-in support for tuples than 
having some wrongly designed built-in syntax sugar for tuples. So let's think 
about the only viable alternative, a wider design, and hope.

When you design something you need first of all to think about what your tool 
will be used for.

Tuples are used all the time in functional-style languages, usually once every 
line of code or so. In Python and Haskell I use them all the time. In D I 
already use tuples, but they are not so handy.

Tuples are not essential in D. You are able to write large programs without 
tuples. But they are sometimes handy, especially if you use a more 
functional-style programming.

Tuples do have some disadvantages. Their fields often are anonymous, this goes 
against the typical well defined style of D/Java/Ada programs, where you know 
the name of what you are using. So tuples are better if you use them locally, 
or if they have only few items. Long tuples become hard to use because you risk 
forgetting what each field is, so the most common tuples have between 2 and 5 
items. In Python you are also allowed to define a tuple with zero items, and a 
tuple with one item, but in my experience they are not so useful (in Python you 
sometimes use longer tuples made of items of the same type, because you are 
using them essentially as immutable arrays. This is considered not pythonic). D 
tuples allow a name for the fields, this is useful at usage point, because 
instead of a "[2]" you use (example) ".isOpen", that's more descriptive.

A tuple isn't an object, it's more like a record, it's almost like a more handy 
POD. So tuples are data structures that you use with code (functions) that they 
don't contain. OOP design is against this. But I am not fond of pure OO design. 
There are many situations where you want something more handy, more functional 
style, or more in "abstract data structure"-style. Python, D, Scala and other 
languages recognize this.

In some languages (like Python) tuples are immutable, while in D they are the 
opposite, you currently can't give them immutable fields. In D tuples are 
values, while in Python they are used by kind of reference (by name). In 
functional languages you can't how they managed, because there is referential 
transparency (so even if there is a reference, it's invisible. So the compiler 
is free to implement them as it wants, and optimize as much as it can).

In some languages tuples are managed with a nominal type system, while in most 
other situations they are managed by a structural type system (that means that 
two tuples are seen of different type only if the types of their fields 
differ). (Time ago I have even suggested a @structural attribute to be used in 
the Phobos code that defines D tuples).

There are many operations you want to perform on tuples, the most basic ones 
are:
0) define the type of a tuple
1) create a tuple
2) copy a tuple
3) Read a tuple field
4) Write a tuple field
5) Unpack tuple fields into variables, this is useful in 3 or more different 
situations:
5.1) The most basic one is unpacking a tuple locally, inside a function.
5.2) Another common situation is unpacking a small tuple inside a foreach 
statement, when you iterate on a collection of tuples.
5.3) Another usage (quite common in Haskell and similar languages) is to unpack 
a tuple in the signature of a function. Haskell even allows to give names at 
the same time to both fields and to the whole tuple.

Other operations are:
6) Slice a tuple like a Python/D array: tup[1..3]. (This is already supported 
in D, but you can't use the normal slice syntax, you need to use tup.slice!(1, 
3) ).
7) Concat two or more tuples (using ~ and ~= operators).

There are many other operations (remove a field, rename a field, etc), but they 
are not commonly useful. If you know a common need that I have not listed 
please show it.

It's better to show some examples of those 5.1, 5.2 and 5.3.

5.1 is useful for functions that return more than one value too:
auto (x, y) = tuple(1, 2);
(int x, int y) = foo(100);
(int x, auto y, auto z) = bar("hello");
The proposal in 6365 was to implement this very common case, that currently is 
not good enough in D.

5.2 (a different syntax is possible, this is just an example):
auto array = [tuple(1,"foo"), tuple(2,"bar")];
foreach (tuple(id, name); array) {}

5.3) This is already possible in D using the typetuple coming from a tuple 
using the (undocumented?) .tupleof. But this has the problem of losing some 
safety, this compiles with no errors:

import std.typecons;
void bar(int i, string s) {}
void main() {
    auto t1 = tuple(1, "foo");
    bar(t1.tupleof);
    auto t2 = tuple(1);
    auto t3 = tuple("foo");
    bar(t2.tupleof, t3.tupleof);
}

This is not allowed in Python2 and Haskell, because tuples are true solid 
tuples at the unpacking point too, in the function signature.

To implement the feature 5.3 well you need something the enforces the tuple 
length at the unpacking point too, something more like this:

void bar(Tuple!(int i, string s)) {
    // here use varibles s and i 
}


In Python there are more features similar to the point (5). Yo are allowed to 
unpack a dynamic array too into variables, and even a lazy generator. This is 
so handy and useful that I have suggested to allow the same thing in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6383


Some usage examples, assign from a dynamic array, in D2:

import std.string;
void main() {
    auto s = " foo bar ";
    auto ss = s.split();
    auto sa = ss[0];
    auto sb = ss[1];
}


In Python2.6:

>>> sa, sb = " foo bar ".split()
>>> sa
'foo'
>>> sb
'bar'


Proposed:

import std.string;
void main() {
    auto s = " foo bar ";
    auto (sa, sb) = s.split();
}




>From lazy range, D2:


import std.algorithm, std.conv, std.string;
void main() {
    string s = " 15 27";

    // splitter doesn't work yet for this
    // (here it doesn't strip the leading space)
    //auto xy = map!(to!int)(splitter(s));

    auto xy = map!(to!int)(split(s));
    int x = xy[0];
    int y = xy[1];
}



In Python:

>>> x, y = map(int, " 15 27".split())
>>> x
15
>>> y
27
>>> from itertools import imap
>>> x, y = imap(int, " 15 27".split())
>>> x
15
>>> y
27


Proposed:

import std.algorithm, std.conv, std.string;
void main() {
    string s = " 15 27";
    (int x, int y) = map!(to!int)(splitter(s));
}


Another example of unpacking a lazy range is to unpack the result of match(x, 
r"...").captures.


In my opinion unpacking a dynamic array in variables is natural in D because D 
already contains something that logically is very similar or almost equal:
foo[] a1 = [1, 2];
int[2] a2 = a1;

This performs a run-time test on the length of a1.

In Python3 there are even ways to unpack only part of a tuple:
a, b, *rest = (1, 2, 3, 4)

More thinking is needed. Please help me answer what Walter was asking for.
And please Walter, if you have more specific questions, it's good moment to ask.

Bye,
bearophile

Wider tuple design discussion

Reply via email to