[rust-dev] Proposal: nominal records

Patrick Walton Sun, 05 Jun 2011 21:45:52 -0700

Hi everyone,

This is a proposal that I've been kicking around for maybe a month now, and last week I talked it over with Dave and Paul.

The idea is to have record types be nominal, just as tags are. In this scheme, record types are declared using a syntax akin to that used to declare tags:


rec point3 {
    x: int;
    y: int;
    z: int;
}

Here "point3" becomes the name of the record. To construct an instance of this record, the user does the following:


import foo::point3;     // if not already in scope
auto pt = { x: 10, y: 20, z: 30 };

Or, if either the import is not desired or the combination of field names is ambiguous:


auto a = rec foo::point3 { x: 10, y: 20, z: 30 };

(The leading "rec" makes parsing easier. Might not be needed.)

It's valid to have two records in scope with overlapping field names. The combination of field names is used to determine which record is meant when record literal syntax is used.


rec point2 {
    x: int;
    y: int;
}

import foo::point2;                     // if not already in scope
auto b = { x: 10, y: 20 };              // constructs a point2
auto c = { x: 10, y: 20, z: 30 };       // constructs a point3

Selecting a field from a record requires neither the record name nor the module the record is declared in to be specified:


log b.x;        // just works
log c.x;        // works too

OCaml requires that field names be unique and that record fields be fully qualified if not in scope so that its type inference engine can uniquely determine a type for the LHS of a field expression ("b" in "b.x" above). In Rust, this is not needed because we require that the LHS of a field expression already have a fully-resolved type by the time we encounter it during typechecking. Importantly, this is not a new restriction; automatic dereference demands this rule already.

Record constructors wouldn't require that the fields be supplied in the same order that the record declaration specifies. The declaration of the record supplies the canonical ordering for memory layout purposes. For example:

auto pt4 = { z: 30, y: 20, x: 10 }; // constructs a value identical to "pt" above
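
For comparison, here is a minimal sketch of the same property written in present-day Rust struct syntax rather than the syntax proposed in this message. The names `Point3`, `in_declared_order`, and `in_reverse_order` are illustrative assumptions, not part of the proposal.

```rust
// Present-day Rust syntax; all names here are hypothetical illustrations.
#[derive(Debug, PartialEq)]
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

// Fields written in the same order as the declaration.
fn in_declared_order() -> Point3 {
    Point3 { x: 10, y: 20, z: 30 }
}

// Fields written in reverse order; the order used in the literal
// has no effect on the value that is constructed.
fn in_reverse_order() -> Point3 {
    Point3 { z: 30, y: 20, x: 10 }
}
```

Comparing the results of the two functions with `==` yields true, since both literals name the same record type and supply the same field values.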


Now there are a few obvious drawbacks with this proposal:

(1) Anonymous records are no longer allowed. All records must have their types declared up front, potentially increasing programmer burden.

(2) Ad-hoc sharing of records is no longer possible; if module A defines a "point3" with fields { x: int, y: int, z: int } and module B independently defines a "point3" with the same fields, the two modules no longer export compatible types.

(3) If two records are in scope and all their field names and types are identical, extra work is required to disambiguate them.

However, there are a number of benefits, roughly in decreasing order of importance:

(1) Recursive records are now easy to handle without having to create a tag in between. Paul encountered an issue recently in which a record was unable to contain a function that took the same record as an argument. The workaround--to create a singleton tag--is somewhat awkward and requires the creation of helper functions to make it usable. I imagine that this isn't the last time we'll be in this situation.

(2) Ordering of fields in record constructors is no longer significant. This simplifies maintenance; for example, a programmer could experiment with different memory layouts for a record to see which yields the best performance without having to rewrite every record literal. It also means that the cognitive overhead of remembering the right order for record fields is reduced.

(3) Type errors are more helpful. A record with the wrong types, for example, generates an error immediately at the site of construction instead of farther down. Moreover, no complicated diffing logic is needed to make type mismatches between large record types sensible to the user.

(4) Typechecking should speed up significantly. Much of the time spent in typechecking is spent unifying large record types.
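
To make benefit (1) concrete, here is a minimal sketch of a record that contains a function taking the same record as an argument, with no intermediate tag. It uses present-day Rust struct syntax rather than the syntax proposed here, and `Widget`, `describe_widget`, and `make_widget` are hypothetical names chosen only for illustration.

```rust
// Present-day Rust syntax; all names here are hypothetical illustrations.
struct Widget {
    id: i32,
    // A function pointer whose argument is the enclosing record type itself.
    // Because the record has a name, the type can refer to itself directly.
    describe: fn(&Widget) -> String,
}

// An ordinary function with the signature the field expects.
fn describe_widget(w: &Widget) -> String {
    format!("widget #{}", w.id)
}

// Build a record whose `describe` field points at the function above.
fn make_widget(id: i32) -> Widget {
    Widget { id, describe: describe_widget }
}
```

Given `let w = make_widget(7);`, the stored function is called as `(w.describe)(&w)`; no singleton wrapper or helper is needed to break the recursion.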

And in practice I think that the drawbacks mentioned above are not significant:

(1) In Rust, truly anonymous record types seem to hardly ever be used in practice. Every record I know of in the standard library and in rustc has an associated typedef. This is due to the fact that functions require type annotations; sooner or later practically every record type that gets used tends to end up as part of the signature of a function, at which point its type must be specified in full. So, in practice, requiring the programmer to specify the types of every field up front is no more of a burden than the status quo.

(2) Ad-hoc sharing of records seems rare to me, and we have tuples for that. In fact, I think simple "point"-like types, which are the ones in which ad-hoc sharing is most common, may well be better specified as tuples for this exact reason. Tuples are less fragile than records anyway; in the current scheme, { x: int, y: int } and { x: int, y: int } exported by two modules happen to be type-compatible, but what if the two modules used { x: int, y: int } and { xcoord: int, ycoord: int } instead? Tuples don't have this problem, so it seems to me that most of the cases in which ad-hoc sharing is desired would be better served by using tuple types instead.

(3) Having two identically-structured records in scope does require extra work to disambiguate. But this is no worse than having functions with identical names in scope. It's a hazard to be sure, but I suspect it'll be rare enough for the benefits to outweigh this drawback.
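
As a rough illustration of point (2), the sketch below contrasts nominal records with tuples across two modules. It is written in present-day Rust syntax rather than the syntax proposed here; the modules `a` and `b` and every function name are assumptions made up for this example.

```rust
// Present-day Rust syntax; all names here are hypothetical illustrations.
mod a {
    pub struct Point2 { pub x: i32, pub y: i32 }

    // A tuple-typed value needs no declaration shared with other modules.
    pub fn origin_pair() -> (i32, i32) { (0, 0) }
}

mod b {
    // Same fields as a::Point2, but a distinct nominal type.
    pub struct Point2 { pub x: i32, pub y: i32 }

    // Accepts any (i32, i32), whichever module produced it.
    pub fn norm1(p: (i32, i32)) -> i32 { p.0.abs() + p.1.abs() }

    // Accepts only b::Point2; passing an a::Point2 is a type error.
    pub fn norm1_rec(p: &Point2) -> i32 { p.x.abs() + p.y.abs() }
}

// Crossing the module boundary with records takes an explicit conversion.
fn a_to_b(p: a::Point2) -> b::Point2 {
    b::Point2 { x: p.x, y: p.y }
}
```

Here `b::norm1(a::origin_pair())` typechecks directly, while `b::norm1_rec` can only be reached from an `a::Point2` by going through `a_to_b` first.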


Anyway, that's quite enough for one email. Opinions?

Patrick
