Hi,

Some of you may have noticed that in the rewrite from rustboot to rustc we're becoming substantially more expression-language-ish. This is mostly a result of me yielding to the preferences of other developers (and LLVM's semantics), as well as some hint that things get much easier in syntax extensions and calculating compile-time-constants if we permit more "statement-ish" forms as expressions. Particularly conditionals.

We've run into a problem along the way that is common to many other languages: some expressions are implicitly ignored (or must be, because they sit in an ignored context) whereas others are not. We have a nil type, (), but we don't always have sensible rules for forcing things to have the nil type by context.

This email is a poll of alternative solutions. I'll give two example cases and ask people for their input on which modification of the rules feels best.

Example case that does compile:

  A:  auto x = if (foo()) { 10; } else { 11; };

Example case that does not compile:

  B:  if (foo()) { 10; } else { "hello"; }

We can write this in rust at the moment, but under the rustc typechecking rules it will fail to compile: 'if' is an expression-statement, expressions have types, and the two branches (each judged by the value of its last statement, if that statement is an expression, or else nil) have different types.

Here are some approaches to solving this example. Please pick the one you like the most:

(1) Kick all branchy expressions out of the expression grammar, put them back in the statement grammar. Case B will compile, and case A must be rewritten like so:

  A:  auto x = { auto t = 11; if (foo()) { t = 10; }; t; };

This is the C-with-GNU-extensions model.
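For comparison, the statement-style shape of case A can be sketched in present-day Rust syntax, which differs from the syntax used in this email. The function names and the `flag` parameter are hypothetical stand-ins chosen for illustration; the point is only that the value leaves the conditional through a mutable temporary instead of being the value of the 'if' itself.

```rust
// Hypothetical stand-in for the foo() used in the examples above.
fn foo(flag: bool) -> bool {
    flag
}

// Option (1) style: the conditional is used purely as a statement, so the
// result is carried out of it through a mutable temporary `t`.
fn case_a_statement_style(flag: bool) -> i32 {
    let mut t = 11;
    if foo(flag) {
        t = 10;
    }
    return t;
}
```

Calling case_a_statement_style(true) takes the branch that assigns 10; with false the temporary keeps its initial value of 11.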

(2) Hoist all statements up into the expression language and make semicolon into a sequencing operator, with a trailing-semi ignored by the parser. Then we need to rewrite only the second case to force unit types in the to-be-ignored differing branches.

  B:  if (foo()) { 10; () } else { "hello"; () }

Though we'd also be *allowed* to rewrite the first case to drop the semicolons:

  A:  auto x = if (foo()) { 10 } else { 11 };

This is the Ocaml approach.

(3) A slightly weaker form of (2), which is to reformulate blocks with the following grammar:

    block ::=  { [ stmt ; ]* expr? }

In other words, every block becomes a brace-enclosed sequence of semicolon-terminated statements, followed by an optional expr. If the expr is missing, it is implied as (). In this case we'd be rewriting only the first case:

  A:  auto x = if (foo()) { 10 } else { 11 };

This is similar to the Ocaml rule in practice, except that the final semicolon in a block becomes significant: semicolon-terminating the last expression is equivalent to ending the block with the nil type. This is a possible hazard (especially during refactoring or editing) to users who want to write a value-producing block but accidentally semicolon-terminate the last expression; but it's not a huge hazard, since the typechecker will tell them the value they produced is of nil type. It just might be hit a lot.
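Blocks in present-day Rust behave like the grammar in (3), so both cases can be sketched in that syntax (again, not the syntax used in this email). The names `foo`, `case_a`, `case_b` and the `flag` parameter are hypothetical and exist only for illustration.

```rust
// Hypothetical stand-in for the foo() used in the examples above.
fn foo(flag: bool) -> bool {
    flag
}

// Case A under rule (3): each branch block ends in an expression with no
// trailing semicolon, so the whole `if` yields an integer.
fn case_a(flag: bool) -> i32 {
    let x = if foo(flag) { 10 } else { 11 };
    x
}

// Case B under rule (3): both branch blocks end with a semicolon, so each
// block has the nil type () and the two branch types agree.
fn case_b(flag: bool) -> () {
    let r: () = if foo(flag) { 10; } else { "hello"; };
    r
}
```

The hazard described above shows up if the semicolons are added inside case_a's branches: the 'if' then has type () and the declared i32 result no longer typechecks.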

(4) Statically determine the contexts in which an expression's value "will be used" in an outer expression, and only typecheck those contexts. This permits both of the examples to compile as-is, but it's the most unorthodox approach, and poses a refactoring hazard as code may become type-invalid when nested into an expression context that "uses" its previously-ignored result. Again, as in (3) the typechecker will catch these cases, but they might happen more or less often than those in (3).

We can't think of any other options. Significant whitespace is not an option :)

Personally my knee-jerk reaction is to embrace (1) since I like statements anyway, but I can see plausible arguments for the other 3. Can I get a show of hands? We have to pick something.

-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
