(This post is a second try of the ideas I discussed in this ER:
http://d.puremagic.com/issues/show_bug.cgi?id=5906 )

It is sometimes handy and useful to give structs some of the capabilities and qualities of built-in values. This makes structs more powerful and handy, and allows built-in-like features to be implemented better in library code.

The bounds of built-in arrays are verified at compile time where possible:


void main() {
    int[5] a;
    immutable idx = 5;
    a[idx] = 1;
}

test.d(4): Error: array index 5 is out of bounds a[0 .. 5]

- - - - - - - - - - - - - - - - - - -

Where the values are not known at compile-time no static check is done, and the problem is found at run-time (here foo() could easily be run at compile-time, but the compiler does not attempt to run it at compile-time):

size_t foo() { return 5; }
void main(string[] args) {
    int[5] a;
    a[5 + args.length] = 1;
    a[foo()] = 2;
}


core.exception.RangeError@test(4): Range violation

- - - - - - - - - - - - - - - - - - -

If I define a library type, using a struct, that mimics/wraps a fixed size array, the compile-time test capability of built-in arrays is lost:


struct IntArray(size_t n) {
    int[n] data;
    ref int opIndex(size_t i) {
        return data[i];
    }
}
void main() {
    IntArray!5 a;
    a[5] = 1;
}


core.exception.RangeError@test(4): Range violation

- - - - - - - - - - - - - - - - - - -

For some time I have wanted to give this basic feature of built-ins to user-defined types too. (I have seen that Ada 2012 has "Static_Predicate" and "Dynamic_Predicate", which are related to this feature.)

A bare-bones library implementation of a ranged value (the interval is closed on the right, to make it easier to include the last representable values of ints, ubytes, etc.):


import std.stdio, std.typecons;

struct RangedValue(T, alias inf, alias sup) {
    T _data;
    alias access this;

    this(T x)
    in {
        assert(x >= inf && x <= sup);
    } body {
        _data = x;
    }

    invariant() { /*...*/ }

    @property T access() { return _data; }

    @property T access(T x)
    in {
        assert(x >= inf && x <= sup);
    } body {
        return _data = x;
    }
}

void main() {
    alias Month = RangedValue!(int, 1, 12);
    auto m = Month(12);
    m = 13; // line 29, core.exception.AssertError@test(11)
}

- - - - - - - - - - - - - - - - - - -

An idea is to introduce compile-time preconditions. Here I have used the syntax:
"enum in(arg1, arg2, ...)":


    this(T x)
    enum in(x) {
        assert(x >= inf && x <= sup);
    } in {
        assert(x >= inf && x <= sup);
    } body {
        _data = x;
    }

- - - - - - - - - - - - - - - - - - -

Some of the rules:
- A function with one or more arguments is allowed to have a (regular, dynamic) precondition, an enum ("static") precondition, or both.
- An enum pre-condition is run only at compile-time, or never.
- An enum pre-condition lists one or more arguments that will be "verified". If all the arguments listed inside the "in()" are literals or explicit compile-time constants, then the enum precondition is run, otherwise it is ignored. No attempt is made to run the enum pre-condition in any other case.
- This analysis of "literalness" or "near-literalness" is conservative. This means a D compiler is free to assume a value is not a literal and to ignore the enum pre-condition.
- enum pre-conditions are run at the calling point. This means the code they contain is never present inside the binary, so it never causes code slowdown or binary bloat. Errors and violations are reported at the calling point, so in the code above the compile-time error message of the violation shows line number 29 instead of 11. So enum preconditions are a bit like default values of function arguments: they are relative to the calling point and not to the function itself.
- Like CTFE, enum preconditions should be pure (but they can call __ctWrite, so purity is not fully enforced).
- The D programmer who uses enum preconditions has to keep in mind that the run-time correctness of a program can't rely on an enum precondition. An enum precondition is just meant to catch at compile time some violations and bugs that would otherwise be caught at run-time. So typically there is some code duplication between the precondition and the enum precondition. But this is not terrible.
- D programmers who use enum preconditions should remember that enum preconditions must contain very light code, to keep compilation times low. (Given the performance of "interpreters" like JavaScript V8 or LuaJIT, future D compilers will run compile-time code much faster even if they keep not using a JIT, so this problem will become much smaller.)
- Future smart IDEs for D are allowed to run an enum precondition while you type code, to catch and underline some bugs in red even before compile-time.

[Note: to speed up compilation, a compiler switch could be added that makes the D compiler ignore all enum preconditions. But probably this is not necessary.]

Data in external files loaded at compile-time with mixin(import("data.txt")) can be verified with an enum precondition (currently you can call a test function at compile-time and assign its result to an enum).
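
A minimal sketch of that current workaround (the file name "data.txt", the -J import path and the isValidData() function are assumptions of this example, not part of the proposal):

bool isValidData(in string s) pure {
    foreach (c; s)
        assert(c == '0' || c == '1' || c == '\n',
               "data.txt contains an invalid character");
    return true;
}

enum rawData = import("data.txt");    // Requires compiling with -J<path>.
enum dataIsOk = isValidData(rawData); // Runs with CTFE; a bad file stops compilation.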

[I am not sure, but I think enum preconditions of class members follow inheritance rules similar to those of the normal dynamic pre-conditions.]

- - - - - - - - - - - - - - - - - - -

Alternative syntax, one more usage of the "static" keyword:

"static in(arg1, arg2, ...)"

- - - - - - - - - - - - - - - - - - -

Some use cases for enum preconditions:

1-2) IntArray() and RangedValue!(int, 1, 12): enum preconditions can catch some bugs at compile-time; they make the usage of such simple data structures more similar to built-ins, giving compile-time errors like "Error: array index 5 is out of bounds a[0 .. 5]".
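
As a comparison, a minimal sketch of what is already possible today (the at() member is an invention of this sketch, not part of the IntArray above): a template member gives the compile-time check, but only when the caller opts in to a call syntax different from a[5].

struct IntArray(size_t n) {
    int[n] data;

    // Run-time checked path, used by a[i].
    ref int opIndex(size_t i) {
        return data[i];
    }

    // Compile-time checked path, used by a.at!i().
    ref int at(size_t i)() if (i < n) {
        return data[i];
    }
}

void main() {
    IntArray!5 a;
    a.at!4() = 1;    // OK
    // a.at!5() = 1; // Rejected at compile time by the template constraint.
}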

- - - - - - - - -

3) EnumSubset() is an idea to define enumeration subsets in library code, something that is sometimes useful to me. enum preconditions allow catching some wrong assignments at compile time (I think this is a use case for the "Static_Predicate" of Ada 2012):

Given a struct EnumSubset used like this:

enum Foo { A, B, C }
alias Foo2 = EnumSubset!(Foo.A, Foo.B);

an enum pre-condition can verify:

Foo2 f = Foo.A; // OK
Foo2 g = Foo.C; // Compile-time error.
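
A minimal sketch of what such an EnumSubset wrapper could look like today; with current D the membership test can only be a run-time assert, while the proposed enum precondition would move it to compile time for literal arguments:

import std.algorithm.searching: canFind;

struct EnumSubset(members...) {
    alias E = typeof(members[0]);
    E _data;
    alias _data this;

    this(E x) {
        assert([members].canFind(x), "value is not in this enum subset");
        _data = x;
    }
}

enum Foo { A, B, C }
alias Foo2 = EnumSubset!(Foo.A, Foo.B);

void main() {
    Foo2 f = Foo.A; // OK
    Foo2 g = Foo.C; // Currently only a run-time AssertError.
}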

- - - - - - - - -

4) enum array literals, for table games, etc.:

Sometimes I like to use enums of chars:

enum Cell : char { A='A', B='B', C='C' }

Then later in the program you can use a with() to catch mistakes in literals:

with (Cell)
    foo([A,B,A,A,B,B,A,A,D], 3);

But putting them in a string is more convenient when such tables become larger (the Ada language allows something like this, and verifies at compile-time that such an array of enumerated values is correct!):

foo(ToEnumArray!"ABAABBAAD", 3);

ToEnumArray verifies that its string input is correct and turns it into an array, at compile-time (std.conv.to can't be used directly here because it handles strings differently when you want to convert them to enums).
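
A minimal sketch of what such a ToEnumArray helper can look like with current D (its body here is an assumption of this sketch, hard-coded to the Cell enum above):

enum Cell : char { A='A', B='B', C='C' } // The enum from above.

Cell[] toCells(in string s) pure {
    Cell[] result;
    foreach (c; s) {
        assert(c == 'A' || c == 'B' || c == 'C', "invalid Cell character");
        result ~= cast(Cell)c;
    }
    return result;
}

// The enum template forces the conversion (and the checks) to run with CTFE.
enum ToEnumArray(string s) = toCells(s);

void foo(in Cell[] cells, in size_t nCols) { /*...*/ }

void main() {
    foo(ToEnumArray!"ABAABBAA", 3);     // OK
    // foo(ToEnumArray!"ABAABBAAD", 3); // The 'D' is caught at compile time.
}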

With an enum pre-condition I can simplify the code (this foo will still need to convert the input string to an array of enums, perhaps even using just a cast):

foo("ABAABBAAD", 3);

- - - - - - - - -

5) In high-reliability D code you want to move as many tests of the input data as possible from run-time to compile-time, and to minimize the number of run-time asserts needed in the code. I think that D could (someday) be used to replace some of the usages of the Ada language for medium-integrity programs, like some of the control units used in cars. In such programs you call functions with various kinds of literals. Verifying such literals at compile-time could improve the run-time integrity of the program.

Here you can see a small D program that accepts complex literal data:
http://rosettacode.org/wiki/Universal_Turing_machine#Nearly_Strongly_Typed_Version

    enum States2 : ubyte { A, B, C, H }
    alias Symbols2 = Symbols1;
    alias M2 = UTM!(States2, Symbols2);
    M2.TuringMachine tm2;
    with (tm2) with (States2) with (Symbols2) with (M2.Direction) {
        alias R = M2.Rule;
        initialState = A;
        rules = [A: [s0: R(s1, right, B), s1: R(s1, left,  C)],
                 B: [s0: R(s1, left,  A), s1: R(s1, right, B)],
                 C: [s0: R(s1, left,  B), s1: R(s1, stay,  H)]];
        symbolMap = ["0", "1"];
    }


The Ada entry of this program is able to verify its whole input data at compile-time:
http://rosettacode.org/wiki/Universal_Turing_machine#Ada


type States is (A, B, C, Stop);
type Symbols is range 0 .. 1;
package UTM is new Turing(States, Symbols); use UTM;

Map: Symbol_Map := (1 => '1', 0 => '0');

Rules: Rules_Type :=
  (A => (0 => (New_State => B, Move_To => Right, New_Symbol => 1),
         1 => (New_State => C, Move_To => Left,  New_Symbol => 1)),
   B => (0 => (New_State => A, Move_To => Left,  New_Symbol => 1),
         1 => (New_State => B, Move_To => Right, New_Symbol => 1)),
   C => (0 => (New_State => B, Move_To => Left,  New_Symbol => 1),
         1 => (New_State => Stop, Move_To => Stay, New_Symbol => 1)));

Tape: Tape_Type := To_Tape("", Map);


enum pre-conditions could be used in D (with some changes in the D entry) to verify all the input at compile-time (again, currently you can write compile-time functions in D to perform similar tests, but the code becomes less nice and less natural, you need two functions, etc).

- - - - - - - - -

6) writef("%d", 5.5), format("%d", 5.5):

void writef(Char, A...)(in Char[] fmt, A args) ...
enum in (fmt) {
    // Verify fmt correctness
} ...


This would stop most of the writefln calls in my D code from giving format string errors at run-time, and it causes no binary bloat, unlike this template function call (which also can't be used with a run-time format string, unlike writef with an enum precondition, which needs only one function for both cases and allows writef to be retrofitted with no changes to its API):

cWritef!"%d"(5.5);
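
For comparison, a minimal sketch of a cWritef-style template with current D (the name cWritef and the helper are assumptions of this example, not Phobos API; only the number of % specifiers is checked here, while the enum precondition on writef could also verify the specifier/argument type pairs):

import std.stdio: writef;

size_t countSpecifiers(in string fmt) pure {
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; i++) {
        if (fmt[i] == '%') {
            if (i + 1 < fmt.length && fmt[i + 1] == '%')
                i++; // "%%" is an escaped percent sign, not a specifier.
            else
                n++;
        }
    }
    return n;
}

void cWritef(string fmt, A...)(A args) {
    static assert(countSpecifiers(fmt) == A.length,
                  "format string doesn't match the number of arguments");
    writef(fmt, args);
}

void main() {
    cWritef!"%d %s\n"(10, "x"); // OK
    // cWritef!"%d %s\n"(10);   // Rejected at compile time.
}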

(Rust language uses a different strategy to verify those formatting strings at compile-time.)

- - - - - - - - -

I am sure more use cases could be found.

The ideas discussed here are also related to the __builtin_constant_p() of GCC:
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

DMD is already able to test some array bounds at compile-time, so a related feature is present inside the compiler. enum preconditions expose this compiler capability to some user-code usages, just as __traits often does.

Bye,
bearophile
