Here's an idea, going back a few years, for a small extension to
Haskell that would make it a little easier to define datatype
selector functions. I'm curious to see what other people think
about it, but I'm not sure that I want to put this forward as
a proper proposal for 1.3 or 2.0 or ...
I'll start with a brief outline, then go into a little more
detail for those who'd like to see more in the way of motivation
and discussion.
A ROUGH SKETCH:
---------------
Each constructor in a datatype definition takes the form:
    Constr type_1 ... type_n.
My suggestion is to allow the programmer (as an option) to attach the name of
a `selector' function to individual components in a type, by replacing (say)
type_i in the constructor with (name_i :: type_i). The effect is exactly
the same as the original definition except that, in addition to the
constructor functions (for both values and types) introduced by the
definition, we also get a new selector function:
    name_i :: T a1 ... an -> type_i
    name_i (Constr _ ... x_i ... _) = x_i
For a more concrete example:
    data List a = Nil | Cons (hd :: a) (tl :: List a)
would be equivalent to:
    data List a = Nil | Cons a (List a)
    hd :: List a -> a
    hd (Cons x xs) = x
    tl :: List a -> List a
    tl (Cons x xs) = xs
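As a quick check, the expanded form can be written out by hand as ordinary Haskell. This is only a sketch of what the proposed declaration would stand for; the function second is a hypothetical helper added for illustration and is not part of the suggestion. Note that hd and tl have no equation for Nil, so applying either of them to Nil is a run-time error.

```haskell
-- The expanded form of the List example, written out by hand.
data List a = Nil | Cons a (List a)

-- Selector for the first component of Cons (undefined on Nil).
hd :: List a -> a
hd (Cons x _) = x

-- Selector for the second component of Cons (undefined on Nil).
tl :: List a -> List a
tl (Cons _ xs) = xs

-- Hypothetical helper (an assumption for illustration): the second
-- element of a list, obtained by composing the two selectors.
second :: List a -> a
second = hd . tl
```

For instance, second (Cons 1 (Cons 2 Nil)) picks out the 2 without any pattern matching in the definition of second itself.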
Clearly, this doesn't change the language a great deal, but I will now go
on to explain why I think this would be useful.
MOTIVATION:
-----------
The motivation for this proposal comes from the desire to manipulate data
structures that have many components, making pattern matching very painful.
To keep this message short I'll have to use a smaller example, so please
bear with me.
Suppose that we want to define a datatype representing Dates. One simple
possibility would be:
    data Date = Date Int Int Int
The three components here are intended to represent a combination of a day,
month and year, and I might choose to reflect this by adding comments to the
definition:
    data Date = Date Int    -- day
                     Int    -- month
                     Int    -- year
or, by introducing type synonyms:
    data Date  = Date Day Month Year
    type Day   = Int
    type Month = Int
    type Year  = Int
Both of these help to make the program easier to read (i.e. easier for the
programmer to express their intentions about how values will be used), but
the need to access the components of a date by pattern matching causes some
problems:
o Some errors slip past the type checker. All three fields are Ints,
so the compiler cannot tell them apart. For example, an American may
write a program using:
    validDate (Date month day year) = ....
while a European might write:
    validDate (Date day month year) = ....
If both of these usages creep into a single program, then we may find
ourselves with some very subtle bugs.
o Inconvenience. Pattern matching against an object with just a couple
of fields isn't too bad, but imagine writing a definition to extract
the eighth field from a structure with a dozen components:
    f (Struct _ _ _ _ _ _ _ x _ _ _ _) = ...
Not very appealing!
o Extensibility is a pain. Suppose we want to add extra fields to a
structure. Maybe I'll add a field to indicate whether a given day
is a holiday:
    data Date = Date Int Int Int Bool
The problem now is that I have to comb through the whole program
looking for places where I've pattern matched on a Date, adding
an extra dummy field to each one.
The proposal I have outlined above avoids these problems by allowing the
programmer to write:
    data Date = Date (day::Int) (month::Int) (year::Int)
Of course, programmers can already include definitions of these functions
in their code, but the fact that the definitions of selectors day, month
and year are produced automatically means that, should we decide to add
extra fields to a structure, the selectors are updated for us, and code
that uses them in place of pattern matching on Date needs no changes.
OTHER COMMENTS:
---------------
There are several reasons why you might *not* like this proposal:
o Datatype definitions are already rather complicated. In one
concise notation, they allow us to specify:
- a new type constructor,
- a family of new value constructors, and
- (implicitly, and often neglected) a means of pattern
matching against values of the new type.
Now I'm talking about adding in selectors too ... I don't
think seasoned Haskers will have a problem with this, but I
know that some beginners have been confused by the fact that
a constructor definition like Cons a (List a) contains both
values and types; we usually tell people that these two things
live in different worlds ...
o You can't use the same name for components of different types.
The classic examples where both a ColouredPoint and a Point
have x fields cannot be dealt with like this.
o It only solves half the problem. More precisely, we still
run into the same kind of difficulties described above when
we want to construct values; is Christmas day represented
by (Date 12 25 1903), or should that be (Date 25 12 1903)?
Perhaps solving half the problem is better than no solution
at all ... or perhaps there is a more complete solution that
would be worth the wait?
o A proper system of records in Haskell would go quite a long
way to solving this kind of problem (although it might still
be nice to have a simple way to define selectors). Of course,
Haskell doesn't have records at the moment, but it's possible
that some future version of the language will include record
types and values. However, this may require more substantial
changes to the language than the simple proposal outlined here.
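The second objection above, about sharing a name between types, can be seen with hand-written selectors. Because a selector is an ordinary top-level function, one module cannot define x for both Point and ColouredPoint. The sketch below shows one possible workaround, prefixed selector names. The field layouts and the names pointX and cpointX are assumptions made for illustration only.

```haskell
-- Two types that would both like a selector called x.
-- The field layouts are assumed here purely for illustration.
data Point         = Point Int Int
data ColouredPoint = ColouredPoint Int Int String

-- A single top-level name x cannot have both types Point -> Int and
-- ColouredPoint -> Int, so hypothetical prefixed names are used.
pointX :: Point -> Int
pointX (Point x _) = x

cpointX :: ColouredPoint -> Int
cpointX (ColouredPoint x _ _) = x
```

The prefixes work, but they push the bookkeeping back onto the programmer, which is the weakness this objection is pointing at.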
---
So what do you think? Do you spend a lot of time writing selector
functions, or updating pattern matches when you make a change to a
data structure?
Mark