Hi everyone,
This is a proposal that I've been kicking around for maybe a month now,
and last week I talked it over with Dave and Paul.
The idea is to have record types be nominal, just as tags are. In this
scheme, record types are declared using a syntax akin to that used to
declare tags:
rec point3 {
x: int;
y: int;
z: int;
}
Here "point3" becomes the name of the record. To construct an instance
of this record, the user does the following:
import foo::point3; // if not already in scope
auto pt = { x: 10, y: 20, z: 30 };
Or, if the import is not desired or the combination of field names is
ambiguous:
auto a = rec foo::point3 { x: 10, y: 20, z: 30 };
(The leading "rec" makes parsing easier. Might not be needed.)
It's valid to have two records in scope with overlapping field names.
The combination of field names is used to determine which record is
meant when record literal syntax is used.
rec point2 {
x: int;
y: int;
}
import foo::point2; // if not already in scope
auto b = { x: 10, y: 20 }; // constructs a point2
auto c = { x: 10, y: 20, z: 30 }; // constructs a point3
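To make the lookup rule concrete, here is a rough sketch of how a literal's field names could select a record. It is written in present-day Rust rather than the syntax proposed here, and everything in it (RecordDecl, Resolution, resolve_literal) is a hypothetical name made up for illustration, not compiler code. It also assumes that only field names, not field types, take part in the match.

```rust
use std::collections::BTreeSet;

/// A record declaration reduced to its name and its set of field names.
/// (Hypothetical type, for illustration only.)
struct RecordDecl {
    name: &'static str,
    fields: BTreeSet<&'static str>,
}

#[derive(Debug, PartialEq)]
enum Resolution {
    Unique(&'static str),
    Ambiguous(Vec<&'static str>),
    NoMatch,
}

fn decl(name: &'static str, fields: &[&'static str]) -> RecordDecl {
    RecordDecl { name, fields: fields.iter().copied().collect() }
}

/// Find the in-scope records whose field-name set equals the literal's.
/// Using a set makes the order of fields in the literal irrelevant.
fn resolve_literal(in_scope: &[RecordDecl], literal_fields: &[&'static str]) -> Resolution {
    let wanted: BTreeSet<&'static str> = literal_fields.iter().copied().collect();
    let matches: Vec<&'static str> = in_scope
        .iter()
        .filter(|d| d.fields == wanted)
        .map(|d| d.name)
        .collect();
    match matches.len() {
        0 => Resolution::NoMatch,
        1 => Resolution::Unique(matches[0]),
        _ => Resolution::Ambiguous(matches),
    }
}

fn main() {
    let scope = [decl("point2", &["x", "y"]), decl("point3", &["x", "y", "z"])];
    println!("{:?}", resolve_literal(&scope, &["x", "y"]));
    println!("{:?}", resolve_literal(&scope, &["x", "y", "z"]));
}
```

The Ambiguous case is where the explicit "rec foo::point3 { ... }" form would be needed.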
Selecting a field from a record requires neither the record name nor the
module the record is declared in to be specified:
log b.x; // just works
log c.x; // works too
OCaml requires that field names be unique and that record fields be
fully qualified if not in scope so that its type inference engine can
uniquely determine a type for the LHS of a field expression ("b" in
"b.x" above). In Rust, this is not needed because we require that the
LHS of a field expression already have a fully-resolved type by the time
we encounter it during typechecking. Importantly, this is not a new
restriction; automatic dereference demands this rule already.
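As a small illustration of that rule, here is how it looks in present-day Rust (not the syntax of this proposal). Two structs share the field name "x", and field access still needs no qualification because the type of the left-hand side is already known. The struct and function names are made up for the example.

```rust
struct Point2 { x: i32, y: i32 }
struct Point3 { x: i32, y: i32, z: i32 }

// The parameter types are fully known, so `.x` is looked up directly in
// the named struct; nothing is inferred from the field name itself.
fn x_of_point2(b: &Point2) -> i32 { b.x }
fn x_of_point3(c: &Point3) -> i32 { c.x }

fn main() {
    let b = Point2 { x: 10, y: 20 };
    let c = Point3 { x: 10, y: 20, z: 30 };
    println!("{} {}", x_of_point2(&b), x_of_point3(&c));
    println!("{} {} {}", b.y, c.y, c.z);
}
```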
Record constructors wouldn't require that the fields are supplied in the
same order that the record declaration specifies. The declaration of the
record supplies the canonical ordering for memory layout purposes. For
example:
auto pt4 = { z: 30, y: 20, x: 10 }; // constructs a value identical to "pt" above
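For comparison, present-day Rust struct literals behave the same way with respect to field order. The sketch below uses a hypothetical Point3 struct, not the "rec" syntax proposed here. One caveat: in present-day Rust, memory layout is only guaranteed to follow declaration order under #[repr(C)], so the layout half of the rule is assumed rather than demonstrated.

```rust
#[derive(Debug, PartialEq)]
struct Point3 { x: i32, y: i32, z: i32 }

// Fields given in declaration order.
fn declared_order() -> Point3 { Point3 { x: 10, y: 20, z: 30 } }

// Same fields, reversed; the literal's order does not affect the value.
fn reversed_order() -> Point3 { Point3 { z: 30, y: 20, x: 10 } }

fn main() {
    let pt = declared_order();
    let pt4 = reversed_order();
    assert_eq!(pt, pt4);
    println!("{} {} {}", pt4.x, pt4.y, pt4.z);
}
```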
Now there are a few obvious drawbacks with this proposal:
(1) Anonymous records are no longer allowed. All records must have their
types declared up front, potentially increasing programmer burden.
(2) Ad-hoc sharing of records is no longer possible; if module A defines
a "point3" with fields { x: int, y: int, z: int } and module B
independently defines a "point3" with the same fields, the two modules
no longer export compatible types.
(3) If two records are in scope and all their field names and types are
identical, extra work is required to disambiguate them.
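Drawback (2) can be seen in any nominal system. The sketch below uses present-day Rust structs as a stand-in for the proposed records; the modules "a" and "b" and the function norm1 are made up for the example. Two identically shaped types are distinct, and an explicit conversion is needed to pass one where the other is expected.

```rust
mod a {
    pub struct Point3 { pub x: i32, pub y: i32, pub z: i32 }
}

mod b {
    pub struct Point3 { pub x: i32, pub y: i32, pub z: i32 }

    // Accepts only b::Point3, even though a::Point3 has the same fields.
    pub fn norm1(p: &Point3) -> i32 { p.x.abs() + p.y.abs() + p.z.abs() }
}

// The explicit bridge that structural typing would have made unnecessary.
impl From<a::Point3> for b::Point3 {
    fn from(p: a::Point3) -> Self { b::Point3 { x: p.x, y: p.y, z: p.z } }
}

fn main() {
    let p = a::Point3 { x: 1, y: -2, z: 3 };
    // b::norm1(&p) would be rejected: expected b::Point3, found a::Point3.
    let q: b::Point3 = p.into();
    println!("{}", b::norm1(&q));
}
```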
However, there are a number of benefits, roughly in decreasing order of
importance:
(1) Recursive records are now easy to handle without having to create a
tag in between. Paul encountered an issue recently in which a record was
unable to contain a function that took the same record as an argument.
The workaround--to create a singleton tag--is somewhat awkward and
requires the creation of helper functions to make it usable. I imagine
that this isn't the last time we'll be in this situation.
(2) Ordering of fields in record constructors is no longer significant.
This simplifies maintenance; for example, a programmer could experiment
with different memory layouts for a record to see which yields the best
performance without having to rewrite every record literal. It also
means that the cognitive overhead of remembering the right order for
record fields is reduced.
(3) Type errors are more helpful. A record with the wrong types, for
example, generates an error immediately at the site of construction
instead of farther down. Moreover, no complicated diffing logic is
needed to make type mismatches between large record types sensible to
the user.
(4) Typechecking should speed up significantly. Much of the time spent
in typechecking is spent unifying large record types.
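Benefit (1) is the kind of thing a named type makes straightforward. As a sketch, here is the shape of the case described above (a record containing a function that takes the same record) written with a present-day Rust struct. Handler, bump and invoke are hypothetical names, and this is not Paul's actual code.

```rust
// The struct can mention itself in a field's type because it has a name.
struct Handler {
    calls: u32,
    run: fn(&mut Handler) -> u32,
}

// A function with the signature stored in the `run` field.
fn bump(h: &mut Handler) -> u32 {
    h.calls += 1;
    h.calls
}

// Copy the function pointer out first, then pass the record to it.
fn invoke(h: &mut Handler) -> u32 {
    let f = h.run;
    f(h)
}

fn main() {
    let mut h = Handler { calls: 0, run: bump };
    invoke(&mut h);
    println!("{}", invoke(&mut h));
}
```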
And in practice I think that the drawbacks mentioned above are not
significant:
(1) In Rust, truly anonymous record types seem to hardly ever be used in
practice. Every record I know of in the standard library and in rustc
has an associated typedef. This is due to the fact that functions
require type annotations; sooner or later practically every record type
that gets used tends to end up as part of the signature of a function,
at which point its type must be specified in full. So, in practice,
requiring the programmer to specify the types of every field up front is
no more of a burden than the status quo.
(2) Ad-hoc sharing of records seems rare to me, and we have tuples for
that. In fact, I think simple "point"-like types, which are the ones in
which ad-hoc sharing is most common, may well be better specified as
tuples for this exact reason. Tuples are less fragile than records
anyway; in the current scheme, { x: int, y: int } and { x: int, y: int }
exported by two modules happen to be type-compatible, but what if the
two modules used { x: int, y: int } and { xcoord: int, ycoord: int }
instead? Tuples don't have this problem, so it seems to me that most of
the cases in which ad-hoc sharing is desired would be better served by
using tuple types instead.
(3) Having two identically-structured records in scope does require
extra work to disambiguate. But this is no worse than having functions
with identical names in scope. It's a hazard to be sure, but I suspect
it'll be rare enough for the benefits to outweigh this drawback.
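To illustrate the tuple argument in (2): tuples stay structural, so two modules that never heard of each other can still exchange them. A minimal sketch in present-day Rust, with made-up module and function names:

```rust
mod geometry {
    // Knows nothing about `screen`; works on any (i32, i32).
    pub fn midpoint(a: (i32, i32), b: (i32, i32)) -> (i32, i32) {
        ((a.0 + b.0) / 2, (a.1 + b.1) / 2)
    }
}

mod screen {
    // Knows nothing about `geometry`; returns a plain tuple.
    pub fn cursor() -> (i32, i32) {
        (40, 20)
    }
}

fn main() {
    // No conversion is needed: both modules agree on the tuple type.
    let m = geometry::midpoint(screen::cursor(), (0, 0));
    println!("{:?}", m);
}
```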
Anyway, that's quite enough for one email. Opinions?
Patrick
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev