My Reference Safety System (DIP???)

Zach the Mystic via Digitalmars-d Tue, 24 Feb 2015 17:16:13 -0800

So I've been thinking about how to do safety for a while, and this is how I would do it if I got to start from scratch. I think it can be harnessed to D, but I'm worried that people will be confused by it, that there might be a show-stopping use case I haven't thought of, or that it is simply too cumbersome to be taken seriously. I'll make a DIP once it overcomes these three obstacles.

I'm feeding off the momentum built by the approval of DIP25, and off of other recent `scope` proposals:

http://wiki.dlang.org/DIP25
http://wiki.dlang.org/User:Schuetzm/scope
http://wiki.dlang.org/DIP69

This system goes farther than either DIP25 or DIP69 toward complete safety, but is (I think) simpler and easier to implement than Mark Schutz's and deadalnix's proposal. It is not an ownership or reference counting system, but it can serve as the foundation for one. Which leads to...

Principle 1: Memory safety is indispensable to ownership, but not the other way around. Memory safety focuses on all the things which *might* happen, and casts a wide net, akin to an algebraic union, whereas ownership targets specific things, focuses on what *will* happen, and is akin to the algebraic intersection of things. I will therefore present the memory safety system first, and leave grafting an ownership system on top of it for later.

Principle 2: The function is the key unit of memory safety. The compiler must never need to leave the function it is compiling to verify that it is safe. This means that no information important to safety can be left out of the signatures of the functions that the function being compiled calls. This principle has already been conceded in part by Walter and Andrei's acceptance of `return ref` parameters in DIP25, which simply implements the most common use case where safety is needed. Here I am taking this principle to the extreme, in the interest of total safety. But speaking of function signatures...

Principle 3: Extra function and parameter attributes are the tradeoff for great memory safety. There is no other way to support both encapsulation of control flow (Principle 2) and the separate-compilation model (indispensable to D). Function signatures pay the price for this with their expanding size. I try to create the new attributes for the rare case, as opposed to the common one, so that they don't appear very often.

Principle 4: Scopes. My system has its own notion of scopes. They are compile-time information, used by the compiler to ensure safety. Every declaration which holds data at runtime must have a scope, called its "declaration scope". Every reference type (defined below in Principle 6) will have an additional scope called its "reference scope". A scope consists of a very short bit array, with a minimum of approximately 16 bits and a reasonable maximum of, let's say, 32. For this proposal I'm using 16, in order to emphasize this system's memory efficiency. 32 bits would not change anything fundamental; they would only allow the compiler to be a little more precise about what's safe and what's not, which is not a big deal since it conservatively defaults to @system when it doesn't know.

So what are these bits? Reserve 4 bits for an unsigned integer (range 0-15) I call "scopedepth". Scopedepth is easier for me to think about than lifetime, of which it is simply the inverse, with scopedepth (0) being infinite lifetime, 1 being a lifetime at function scope, etc. Anyway, a declaration's scopedepth is determined according to logic similar to that found in DIP69 and Mark Schutz's proposal:


int r; // declaration scopedepth(0)

void fun(int a /*scopedepth(0)*/) {
  int b; // depth(1)
  {
    int c; // depth(2)
    {
      int d; // (3)
    }
    {
      int e; // (3)
    }
  }
  int f; // (1)
}
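To make the depth rule concrete, here is a minimal D sketch of how a compiler pass might track declaration scopedepth while walking nested blocks. `DepthTracker` and its methods are hypothetical names, not part of any compiler; the saturation at 15 follows the 4-bit limit described in this post (all scopedepths above 15 are treated the same).

```d
// Hypothetical sketch: tracking declaration scopedepth while walking blocks.
// Depth 0 = module scope and parameters, 1 = function body, and every
// nested `{ }` adds one. Depth is stored in 4 bits, so it saturates at 15.
enum ubyte maxDepth = 15;

struct DepthTracker
{
    ubyte depth = 0;     // scopedepth of the block currently being walked
    ubyte overflow = 0;  // blocks entered beyond maxDepth

    void enter()
    {
        if (depth < maxDepth)
            ++depth;
        else
            ++overflow;  // still nested deeper, but recorded as 15
    }

    void leave()
    {
        if (overflow > 0)
        {
            --overflow;
        }
        else
        {
            assert(depth > 0, "leave() without matching enter()");
            --depth;
        }
    }
}
```

Fed the block structure of `fun()` above, it gives depth 1 for `b` and `f`, 2 for `c`, and 3 for both `d` and `e`.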

Principle 5: It's always un@safe to copy a reference to a declaration at a higher scopedepth into a reference variable stored at a lower scopedepth. DIP69 tries to banish this type of thing only in `scope` variables, but I'm not afraid to banish it in all @safe code, period:


void gun() @safe {
  T* t; // t's declaration depth: 1
  T u;
  {
    T* uu = &u; // fine, this is normal
    T tt;
    t = &tt; // t's reference depth: 2, error, un@safe
  }
  // now t is corrupted
}

So you'd have to enclose "t = &tt;" above in a @trusted lambda or a @system block. The truth is, it is absurd to copy the address of something with a shorter lifetime into something with a longer lifetime... what use would you ever have for it in the longer-lived variable? I'm therefore simplifying the system by making all instances of this unsafe.
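Principle 5 reduces to a single comparison. A minimal sketch, assuming each variable carries its declaration depth as a plain integer and that taking an address yields a reference whose depth equals the target's declaration depth (`Var`, `canStore`, and `addressOf` are illustrative names, not compiler APIs):

```d
// Hypothetical sketch of Principle 5: storing a reference is rejected in
// @safe code when it may point at something declared deeper (shorter-lived)
// than the variable receiving it.
struct Var
{
    ubyte declDepth; // scopedepth where the variable itself is declared
    ubyte refDepth;  // deepest declaration it may point into
}

/// True if `target = <reference with depth srcRefDepth>` is allowed.
bool canStore(const Var target, ubyte srcRefDepth)
{
    return srcRefDepth <= target.declDepth;
}

/// `&v` yields a reference whose depth is v's declaration depth.
ubyte addressOf(const Var v)
{
    return v.declDepth;
}
```

In `gun()`, `T* uu = &u;` stores depth 1 into a variable at depth 2 and passes; `t = &tt;` stores depth 2 into a variable at depth 1 and fails.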


Looking at Principle 5, I realize I forgot:

Principle 6: Reference variables. Any data which stores a reference is a "reference variable". That includes any pointer, class instance, array/slice, `ref` parameter, or any struct containing any of those. For the sake of simplicity, I boil _all_ of these down to "T*" in this proposal. All reference types are effectively the _same_ in this regard. DIP25 does not indicate that it has any interest in expanding beyond `ref` parameters, but all reference types are unsafe in exactly the same way as `ref` is. (By the way, see footnote [1] for why I think `ref` is very different from `scope`.) I don't understand the restriction of DIP25 to `ref` parameters only. Part of my system is to extend `return` parameters to all reference types.

Principle 7: In this system, all scopes are *transitive*: any reference type with double indirections inherits the scope of the outermost reference. Think of it this way:


T** grun() {
  T** tpp = new T*; // reference scopedepth(0)
  return tpp; // fine, safe

  static T st; // decl depth(0)
  T* tp = &st; // ref depth(0)
  *tpp = tp;
  return tpp; // safe, all depths still 0

  T t; // decl depth(1)
  tp = &t; // tp reference depth now (1)
  *tpp = tp; // safe, depths all 1
  return tpp; // un@safe
}

If a reference type contains *any* pointer, no matter how indirect, to a local scope, the *whole* type is corrupted when the scope finishes.
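A minimal sketch of the transitivity rule, assuming a write through an indirection simply raises the outer reference's depth to cover whatever was stored (the names `Ref`, `storeThrough`, and `canReturn` are hypothetical):

```d
// Hypothetical sketch of Principle 7: after `*outer = inner`, everything
// reachable from `outer` is at most as long-lived as `inner`, so `outer`'s
// reference depth absorbs `inner`'s.
struct Ref
{
    ubyte refDepth;
}

void storeThrough(ref Ref outer, const Ref inner)
{
    if (inner.refDepth > outer.refDepth)
        outer.refDepth = inner.refDepth;
}

/// Returning is only safe for references that cannot reach the
/// function's own locals (depth 0 = heap, static, or parameters).
bool canReturn(const Ref r)
{
    return r.refDepth == 0;
}
```

Replaying `grun()` step by step, the first two returns pass and the third is rejected, because storing `tp` (depth 1) through `tpp` drags `tpp` to depth 1.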

Principle 8: Any time a reference is copied, the reference scope inherits the *maximum* of the two scope depths:


T* gru() {
  static T st; // decl depth(0)
  T t; // decl depth(1)
  T* tp = &t; // ref depth(1)
  tp = &st; // ref depth STILL (1)
  return tp; // error!
}

If you have ever loaded a reference with a local scope, it retains that scope level permanently, ensuring the safety of the reference.
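The copy rule is a running maximum. A sketch, with `RefVar` as a hypothetical stand-in for whatever per-variable bookkeeping a compiler would keep:

```d
// Hypothetical sketch of Principle 8: copying into a reference variable
// never lowers its reference depth; it keeps the maximum it has seen.
struct RefVar
{
    ubyte refDepth;

    void assign(ubyte srcDepth)
    {
        if (srcDepth > refDepth)
            refDepth = srcDepth;
    }
}
```

The price of this conservatism shows in `gru()`: after `tp = &st` the pointer really does point at static data, yet its depth stays at 1 and `return tp` is still rejected.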

Whatever your worries about scopedepth, I want to introduce the purpose of the other 12 bits in a scope.

I said a scope consists of 16 bits, and I've only used 4 so far. What are the other 12 for, then? Simple: we need one bit for each of the function's parameters. Let's reserve 8 bits for them. All references copied to or from the 8th parameter or above are treated as if they were copied to *all* of them. Very few functions will do this, so we paint them all with a broad brush, for safety reasons. (Likewise, all scopedepths above 15 are treated the same.)

We have 4 bits left. These are for the "special" parameters: one for the implicit `this` parameter of member functions, one for the context of a nested function, one special bit to symbolize access to or from global or heap variables, and one bit left over in case I missed something. Remember, the "luxury" version would have a whole 32, or even 64, bits to play around with, but 16 will suffice in most cases.
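One possible packing of the 16-bit scope in D. The post fixes only the field sizes; the bit order below is an assumption, and `Scope`, `setDepth`, and `paramBit` are hypothetical names:

```d
// Hypothetical packing of a 16-bit scope (bit order is an assumption):
//   bits 0-3   scopedepth (0-15, saturating)
//   bits 4-11  one "mystery scope" bit per parameter; the 8th bit also
//              stands for every parameter after the 8th
//   bit 12     implicit `this`
//   bit 13     context pointer of a nested function
//   bit 14     global or heap data
//   bit 15     spare
struct Scope
{
    ushort bits;

    enum ushort depthMask  = 0x000F;
    enum ushort thisBit    = 1 << 12;
    enum ushort contextBit = 1 << 13;
    enum ushort globalBit  = 1 << 14;

    @property ubyte depth() const
    {
        return cast(ubyte)(bits & depthMask);
    }

    void setDepth(uint d)
    {
        if (d > 15)
            d = 15; // all depths above 15 are treated the same
        bits = cast(ushort)((bits & ~depthMask) | d);
    }

    /// Bit for a 0-based parameter index.
    static ushort paramBit(size_t index)
    {
        if (index > 7)
            index = 7; // parameters past the 8th share the last bit
        return cast(ushort)(1 << (4 + index));
    }
}
```

Keeping the depth in the low nibble means the parameter and special bits can be ORed together freely without disturbing it.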

Each of the function's parameters is initialized with its own bit set. All these bits represent "mystery scopes" -- that is, we don't know what their scope is in the calling function, but:

Principle 8: We don't need to know! For all intents and purposes, a reference parameter has infinite lifetime for the duration of the function it is compiled in. Whenever we copy any reference, we do a bitwise OR on *all* of the mystery scopes. The new reference accumulates every scope it has ever had access to, directly or indirectly.


T* fun(T* a, T* b, T** c) {
  // the function's "return scope" accumulates `a` here
  return a;
  T* d = b; // `d`'s reference scope accumulates `b`

  // the return scope now accumulates `b` from `d`
  return d;

  *c = d; // now mutable parameter `c` gets `d`

  static T* t;
  t = b; // this might be safe, but only the caller can know
}

All this accumulation results in the implicit function signature:

T* fun(return T* a, // DIP25
       return noscope T* b, // DIP25 and DIP71
       out!b T** c  // from DIP71
       ) @safe;

(See footnote [2] for a comment on the `out!` and `noscope` attributes.)
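The accumulation and the resulting signature can be sketched together. This models only the three sinks that appear in `fun()` (the return value, a global, and a write through a parameter), assumes one mystery-scope bit per parameter, and uses hypothetical names throughout:

```d
import std.array : join;

// Hypothetical sketch: accumulate mystery scopes with bitwise OR and turn
// the final sets into the attributes of the implicit signature.
struct FlowInfo
{
    string[] names;       // parameter names, in declaration order
    ushort returnScope;   // parameters that may escape via `return`
    ushort globalScope;   // parameters that may be stored in globals
    ushort[] storedInto;  // storedInto[i]: parameters written through param i

    this(string[] names)
    {
        this.names = names;
        storedInto = new ushort[names.length];
    }

    static ushort bit(size_t i)
    {
        return cast(ushort)(1 << i);
    }

    /// Attribute prefix inferred for parameter i.
    string attributes(size_t i) const
    {
        string[] attrs;
        if (returnScope & bit(i))
            attrs ~= "return";
        if (globalScope & bit(i))
            attrs ~= "noscope";
        foreach (j, name; names)
            if (storedInto[i] & bit(j))
                attrs ~= "out!" ~ name;
        return attrs.join(" ");
    }
}
```

Replaying the statements of `fun()` (`return a`, `d = b`, `return d`, `*c = d`, `t = b`) produces `return` for `a`, `return noscope` for `b`, and `out!b` for `c`, matching the implicit signature above.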

Principle 9: When calling a function, DIP25 (expanded to all reference types) in combination with DIP71 gives you everything you need to know to ensure total memory safety. If we have a function signature:


T* gun(return T* a, noscope T* b, out!b T** c) @safe;

T* hun(return T* a1, T** b2) {
  T t;
  T* tp, tp2;
  tp = new T; // depth zero
  tp2 = gun(a1,  // tp2 accumulates a1 based on gun()'s signature
           tp, // okay to copy a new T to a global pointer
           b2); // b2 now loaded with tp's global only scope
  return tp2; // okay, all we have so far is a1, marked `return`

  tp = &t; // tp now loaded with local t's scope
  return gun(tp, // error, gun() inherits tp's local scope
             tp2, // tp2 has a1 only right now
             b2); // error, b2 not marked `out!a1`
}

The point is that there's nothing gun() can do to corrupt hun() on its own, since all its exits are blocked.
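A sketch of the caller's side, under stated assumptions: a scope is a depth plus the set of the caller's mystery-scope bits, and the callee's signature is reduced to a list of which parameters are `return`. `Scope`, `join`, `resultScope`, and `canReturn` are hypothetical, and the `noscope`/`out!` checks are left out for brevity:

```d
// Hypothetical sketch of Principle 9: the caller never looks inside the
// callee; it joins the scopes of the arguments bound to `return` params.
struct Scope
{
    ubyte depth;    // 0 = heap, static, or the caller's parameters
    ushort mystery; // caller parameters this value may come from
}

Scope join(Scope x, Scope y)
{
    return Scope(x.depth > y.depth ? x.depth : y.depth,
                 cast(ushort)(x.mystery | y.mystery));
}

/// Scope of `callee(args)`, given which callee parameters are `return`.
Scope resultScope(const bool[] isReturnParam, const Scope[] args)
{
    Scope result;
    foreach (i, arg; args)
        if (isReturnParam[i])
            result = join(result, arg);
    return result;
}

/// The caller may return a value only if it cannot reach the caller's
/// locals and every mystery scope it carries is a `return` parameter
/// of the caller itself.
bool canReturn(Scope s, ushort callerReturnParams)
{
    return s.depth == 0 && (s.mystery & ~callerReturnParams) == 0;
}
```

Replayed on `hun()`, the first `gun()` result carries only `a1` at depth 0 and may be returned; after `tp = &t` the second result has depth 1 and is rejected.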

Principle 10: You'll probably have noticed that all scopes accumulate each other according to lexical ordering, and that's good news, because any sane person assigns and returns references in lexical order. The fun part of this proposal is that for 99.99% of uses the safety mechanism will catch the load ordering accurately on the first pass, with hardly any compiler effort. It's safe because it accumulates and never loses information. But there is a way to break this system, although there are only two types of people who would ever do it: malicious programmers trying to break the safety system, and fools. This is how you do it:


T* what() {
  T t;
  T* yay;
  foreach(i; 1..4) {
    if (i == 3)
      yay = new T;
    else if (i == 2)
      return yay;
    else if (i == 1)
      yay = &t;
  }
}

The good news is that even this kind of malicious coding can be detected. The bad news is that checking for this 0.01% of code may take up an unfriendly amount of compile time. Here's the way I thought of to check even for this malicious code:

The lexical ordering can only be different from the logical order of execution when one is inside a branching conditional which is inside a "jumpback" situation, where the code can be revisited. A jumpback can only occur after a jump label has been found (rare), or inside a loop (common). Any time a reference is copied under the potentially dangerous condition, push the statement that copied it onto a stack. When the end of the conditional has been reached, revisit each statement in reverse order and "reheat" the relevant scopes.
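The reheating step can be sketched for a single reference variable, assuming the loop body has already been flattened into a list of copies and returns. The `Stmt` encoding and `checkLoopBody` are hypothetical, and this is one reading of the stack-and-replay idea rather than a finished algorithm:

```d
// Hypothetical sketch of the jumpback fix-up for one reference variable.
enum Kind { assign, ret }

struct Stmt
{
    Kind kind;
    ubyte depth;      // for `assign`: depth of the reference being stored
    bool inJumpback;  // inside a conditional inside a loop (or after a label)
}

/// True if no `return` can hand out a reference to a local.
bool checkLoopBody(const Stmt[] stmts)
{
    ubyte refDepth = 0;
    const(Stmt)[] pending; // copies that may run again on a later iteration
    bool ok = true;

    // First pass: plain lexical order.
    foreach (s; stmts)
    {
        if (s.kind == Kind.assign)
        {
            if (s.depth > refDepth)
                refDepth = s.depth;
            if (s.inJumpback)
                pending ~= s; // push onto the stack for reheating
        }
        else if (refDepth != 0)
            ok = false; // returning a reference to a local
    }

    // Reheat: replay the stacked copies in reverse order...
    foreach_reverse (s; pending)
        if (s.depth > refDepth)
            refDepth = s.depth;

    // ...then recheck every return that a later iteration can reach.
    foreach (s; stmts)
        if (s.kind == Kind.ret && s.inJumpback && refDepth != 0)
            ok = false;

    return ok;
}
```

For the loop body of `what()`, the lexical pass accepts `return yay`, but replaying `yay = &t` raises the depth to 1 and the recheck rejects it. Because depths only grow, the replay order does not change the result for a single variable; it would matter once several variables feed each other.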

Aside from this unfortunate "gotcha", D would be 100% memory safe with this system (at least in single-threaded code -- exceptions and thread safety are different issues I haven't fully thought through).


Conclusion

1. With this system as a foundation, an effective ownership system is easily within reach. Just confine the outgoing scopes to a single parameter and no globals, and you have your ownership. You might need another (rare) function attribute to help with this, and a storage class (e.g. `scope`, `unique`) to give you an error when you do something wrong, but the groundwork is 90% laid.

2. Do I realize that it's weird dressing up function parameters with so much information about what they do? Yes, I do. But I think it's important to see what 100% safety would actually look like, even if it's rejected on account of being too burdensome. And it wouldn't even *be* burdensome if attribute inference were made uniform throughout the language. The function signatures could then appear dressed up in their full glory typically only in compiler-generated interface files, and other places where programmers, not compilers, wanted them. Anyway, this is my reference safety system. Pop it with your needles!
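The ownership restriction from point 1 can be phrased as a check on a reference's accumulated outgoing scope. A sketch, assuming a 16-bit layout with parameter bits in positions 4-11 and the global/heap bit at position 14 (both the layout and `isOwned` are hypothetical):

```d
import core.bitop : popcnt;

// Hypothetical sketch: a reference qualifies as "owned" if its outgoing
// scope reaches at most one parameter and never reaches global/heap data.
enum ushort paramMask = 0x0FF0;   // assumed parameter bits 4-11
enum ushort globalBit = 1 << 14;  // assumed global/heap bit

bool isOwned(ushort outgoing)
{
    return (outgoing & globalBit) == 0
        && popcnt(cast(uint)(outgoing & paramMask)) <= 1;
}
```

A reference flowing only into one parameter passes; flowing into two parameters, or into any global, fails.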

[1] The problems with `ref` come from the fact that it is the only storage class which changes the way a program works without giving you an error:


void notRef(/*ref*/ int a) { ++a; }
void yesRef(  ref   int a) { ++a; }

void test() {
  int a = 0;
  yesRef(a); // a == 1
  notRef(a); // a still 1
}

Both yesRef() and notRef() are accepted, but what happens depends on which one you use. Adding or removing any other attribute will at most give you an error, but won't silently change things. `ref`, an "immutable pointer with value semantics," is a complicated beast, a type but not a type. I say this because `scope` and its variants are not so complicated. `scope` is like most other attributes. All it does is help the compiler optimize things and generate errors when misused. Its presence or absence will never change what the program actually does, and therefore it should not be lumped together with the problems associated with `ref`. [End 1]


[2] Since the discussion of DIP71:

http://forum.dlang.org/post/xjhvpmjrlwhhgeqyo...@forum.dlang.org

...which proposes `out!` and `noscope` parameters as a way of warning the caller what is done inside the function, I have started to consider the issue of ownership in addition to reference safety. I'm not wedded to the name `noscope` in the role I proposed for it. Mark Schutz suggested reusing the keyword `static` instead, to indicate that a reference is copied to a global variable. This may be wise, in light of the fact that an ownership system may require something like `noscope` for a subtly different purpose. But there's no point in discussing details unless the whole proposal gains traction first. [End 2]
