Hi everyone,

This is a proposal that I've been kicking around for maybe a month now, and last week I talked it over with Dave and Paul.

The idea is to have record types be nominal, just as tags are. In this scheme, record types are declared using a syntax akin to that used to declare tags:

rec point3 {
    x: int;
    y: int;
    z: int;
}

Here "point3" becomes the name of the record. To construct an instance of this record, the user does the following:

import foo::point3;     // if not already in scope
auto pt = { x: 10, y: 20, z: 30 };

Or, if either the import is not desired or the combination of field names is ambiguous:

auto a = rec foo::point3 { x: 10, y: 20, z: 30 };

(The leading "rec" makes parsing easier. Might not be needed.)

It's valid to have two records in scope with overlapping field names. The combination of field names is used to determine which record is meant when record literal syntax is used.

rec point2 {
    x: int;
    y: int;
}

import foo::point2;                     // if not already in scope
auto b = { x: 10, y: 20 };              // constructs a point2
auto c = { x: 10, y: 20, z: 30 };       // constructs a point3
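To make the lookup rule concrete, here is a minimal sketch of how a compiler might pick the record for a literal from its set of field names. It is written in present-day Rust rather than the proposed syntax, and `RecordDecl` and `resolve_literal` are hypothetical names. A real implementation would also check field types and honor an explicit `rec foo::point3 { ... }` qualification.

```rust
// A hypothetical in-scope record declaration: the record's name and its
// field names. Field types are omitted to keep the sketch small.
struct RecordDecl {
    name: &'static str,
    fields: &'static [&'static str],
}

// Find the single in-scope record whose field names are exactly those of
// the literal, in any order. No match, or more than one, is an error; the
// way out of an ambiguity is to name the record explicitly.
fn resolve_literal(scope: &[RecordDecl], literal: &[&str]) -> Result<&'static str, String> {
    let mut wanted: Vec<&str> = literal.to_vec();
    wanted.sort();
    let matches: Vec<&'static str> = scope
        .iter()
        .filter(|decl| {
            let mut have: Vec<&str> = decl.fields.to_vec();
            have.sort();
            have == wanted
        })
        .map(|decl| decl.name)
        .collect();
    match matches.as_slice() {
        [name] => Ok(*name),
        [] => Err(format!("no record in scope has fields {:?}", wanted)),
        _ => Err(format!("fields {:?} match several records: {:?}", wanted, matches)),
    }
}

fn main() {
    let scope = [
        RecordDecl { name: "point2", fields: &["x", "y"] },
        RecordDecl { name: "point3", fields: &["x", "y", "z"] },
    ];
    // { x: 10, y: 20 } constructs a point2; { x: 10, y: 20, z: 30 } a point3.
    println!("{:?}", resolve_literal(&scope, &["x", "y"])); // Ok("point2")
    println!("{:?}", resolve_literal(&scope, &["x", "y", "z"])); // Ok("point3")
}
```

Adding a third declaration with the same field names as point3 makes the second lookup fail, which is exactly drawback (3) below.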

Selecting a field from a record requires neither the record name nor the module the record is declared in to be specified:

log b.x;        // just works
log c.x;        // works too

OCaml requires that field names be unique and that record fields be fully qualified if not in scope so that its type inference engine can uniquely determine a type for the LHS of a field expression ("b" in "b.x" above). In Rust, this is not needed because we require that the LHS of a field expression already have a fully-resolved type by the time we encounter it during typechecking. Importantly, this is not a new restriction; automatic dereference demands this rule already.
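A rough sketch of why field selection needs no qualification, assuming a toy type representation (the `Ty` enum, the `Decls` table, and `field_type` are hypothetical, not rustc's): once the LHS has a resolved record type, the record's name alone identifies the declaration to search, so field names never have to be unique across records.

```rust
use std::collections::HashMap;

// A toy type language, hypothetical and not rustc's: a primitive, an
// unsolved inference variable, or a nominal record known only by its name.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Var(u32),
    Rec(&'static str),
}

// Record declarations: record name -> (field name, field type) pairs.
type Decls = HashMap<&'static str, Vec<(&'static str, Ty)>>;

// Typecheck `base.field`. The base's type must already be resolved, so its
// record name picks the one declaration to search.
fn field_type(decls: &Decls, base: &Ty, field: &str) -> Result<Ty, String> {
    match base {
        Ty::Rec(name) => decls
            .get(name)
            .and_then(|fields| fields.iter().find(|(f, _)| *f == field))
            .map(|(_, ty)| ty.clone())
            .ok_or_else(|| format!("record {} has no field {}", name, field)),
        Ty::Var(id) => Err(format!("type of base (?{}) must be known before `.{}`", id, field)),
        other => Err(format!("{:?} is not a record", other)),
    }
}

fn main() {
    let mut decls: Decls = HashMap::new();
    decls.insert("point2", vec![("x", Ty::Int), ("y", Ty::Int)]);
    decls.insert("point3", vec![("x", Ty::Int), ("y", Ty::Int), ("z", Ty::Int)]);
    // `b.x` and `c.x`: both records have an `x`, yet each lookup is unambiguous.
    println!("{:?}", field_type(&decls, &Ty::Rec("point2"), "x")); // Ok(Int)
    println!("{:?}", field_type(&decls, &Ty::Rec("point3"), "x")); // Ok(Int)
}
```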

Record constructors wouldn't require that the fields be supplied in the same order that the record declaration specifies. The declaration of the record supplies the canonical ordering for memory layout purposes. For example:

auto pt4 = { z: 30, y: 20, x: 10 }; // constructs an identical value to "pt" above
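One way to picture the canonical ordering, as a sketch with a hypothetical `canonicalize` helper that works on field names and values rather than real types: the literal's pairs are reordered into declaration order, so both literals above yield the same layout.

```rust
// Hypothetical helper: put a record literal's (field, value) pairs into the
// declaration's canonical order, which is what determines memory layout.
// Wrong arity, unknown fields, and duplicates all end up as an error
// (reported as a field-count mismatch or a missing field).
fn canonicalize<V: Clone>(decl_order: &[&str], literal: &[(&str, V)]) -> Result<Vec<V>, String> {
    if literal.len() != decl_order.len() {
        return Err(format!("expected {} fields, found {}", decl_order.len(), literal.len()));
    }
    decl_order
        .iter()
        .map(|name| {
            literal
                .iter()
                .find(|(f, _)| f == name)
                .map(|(_, v)| v.clone())
                .ok_or_else(|| format!("missing field {}", name))
        })
        .collect()
}

fn main() {
    let point3 = ["x", "y", "z"];
    let pt = canonicalize(&point3, &[("x", 10), ("y", 20), ("z", 30)]);
    let pt4 = canonicalize(&point3, &[("z", 30), ("y", 20), ("x", 10)]);
    println!("{:?} {:?}", pt, pt4); // Ok([10, 20, 30]) Ok([10, 20, 30])
}
```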

Now there are a few obvious drawbacks with this proposal:

(1) Anonymous records are no longer allowed. All records must have their types declared up front, potentially increasing programmer burden.

(2) Ad-hoc sharing of records is no longer possible; if module A defines a "point3" with fields { x: int, y: int, z: int } and module B independently defines a "point3" with the same fields, the two modules no longer export compatible types.

(3) If two records are in scope and all their field names and types are identical, extra work is required to disambiguate them.

However, there are a number of benefits, roughly in decreasing order of importance:

(1) Recursive records are now easy to handle without having to create a tag in between. Paul encountered an issue recently in which a record was unable to contain a function that took the same record as an argument. The workaround--to create a singleton tag--is somewhat awkward and requires the creation of helper functions to make usable. I imagine that this isn't the last time we'll be in this situation.

(2) Ordering of fields in record constructors is no longer significant. This simplifies maintenance; for example, a programmer could experiment with different memory layouts for a record to see which yields the best performance without having to rewrite every record literal. It also means that the cognitive overhead of remembering the right order for record fields is reduced.

(3) Type errors are more helpful. A record literal with fields of the wrong types, for example, generates an error immediately at the site of construction instead of farther down. Moreover, no complicated diffing logic is needed to make type mismatches between large record types sensible to the user.

(4) Typechecking should speed up significantly. Much of the typechecker's time is currently spent unifying large record types, whereas two nominal record types can be compared by name.
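Benefit (1) can be illustrated in present-day Rust, whose structs are nominal. This is today's syntax rather than the syntax proposed here, and `Widget` and `describe` are hypothetical names: a record holds a function that takes that same record, with no singleton tag in between.

```rust
// Present-day Rust. The field `render` is a function that takes the very
// record being declared; no intermediate tag or helper functions are needed.
struct Widget {
    id: i32,
    render: fn(&Widget) -> String,
}

fn describe(w: &Widget) -> String {
    format!("widget #{}", w.id)
}

fn main() {
    let w = Widget { id: 7, render: describe };
    // Call the function stored in the field, passing the record itself.
    println!("{}", (w.render)(&w)); // prints "widget #7"
}
```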

And in practice I think that the drawbacks mentioned above are not significant:

(1) In Rust, truly anonymous record types seem to hardly ever be used in practice. Every record I know of in the standard library and in rustc has an associated typedef. This is due to the fact that functions require type annotations; sooner or later practically every record type that gets used tends to end up as part of the signature of a function, at which point its type must be specified in full. So, in practice, requiring the programmer to specify the types of every field up front is no more of a burden than the status quo.

(2) Ad-hoc sharing of records seems rare to me, and we have tuples for that. In fact, I think simple "point"-like types, which are the ones in which ad-hoc sharing is most common, may well be better specified as tuples for this exact reason. Tuples are less fragile than records anyway; in the current scheme, { x: int, y: int } and { x: int, y: int } exported by two modules happen to be type-compatible, but what if the two modules used { x: int, y: int } and { xcoord: int, ycoord: int } instead? Tuples don't have this problem, so it seems to me that most of the cases in which ad-hoc sharing is desired would be better served by tuple types.

(3) Having two identically-structured records in scope does require extra work to disambiguate. But this is no worse than having functions with identical names in scope. It's a hazard to be sure, but I suspect it'll be rare enough for the benefits to outweigh this drawback.
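As a sketch of point (2), assuming two hypothetical modules written in present-day Rust with no knowledge of each other: a tuple type needs no shared declaration, so one module's output can feed the other directly.

```rust
// Two hypothetical modules written independently of each other.
mod geometry {
    // Module A produces points as plain tuples.
    pub fn point(x: i32, y: i32) -> (i32, i32) {
        (x, y)
    }
}

mod render {
    // Module B consumes points as plain tuples; any (i32, i32) is accepted.
    pub fn manhattan(p: (i32, i32)) -> i32 {
        p.0.abs() + p.1.abs()
    }
}

// If each module had instead declared its own `struct Point { x: i32, y: i32 }`,
// the two would be distinct types under nominal typing, and this call would
// not typecheck without a conversion.
fn main() {
    println!("{}", render::manhattan(geometry::point(3, -4))); // prints 7
}
```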

Anyway, that's quite enough for one email. Opinions?

Patrick
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev
