Re: Interesting Research Paper on Constructors in OO Languages

Regan Heath Fri, 19 Jul 2013 03:36:15 -0700

On Thu, 18 Jul 2013 19:00:44 +0100, H. S. Teoh <hst...@quickfur.ath.cx> wrote:

On Thu, Jul 18, 2013 at 10:13:58AM +0100, Regan Heath wrote:

On Wed, 17 Jul 2013 18:58:53 +0100, H. S. Teoh
<hst...@quickfur.ath.cx> wrote:

[...]

>I guess my point was that if we boil this down to the essentials,
>it's basically the same idea as a builder pattern, just implemented
>slightly differently. In the builder pattern, a separate object (or
>struct, or whatever) is used to encapsulate the state of the object
>that we'd like it to be in, which we then pass to the ctor to create
>the object in that state. The idea is the same, though: set up a
>bunch of values representing the desired initial state of the object,
>then, to borrow Perl's terminology, "bless" it into a full-fledged
>class instance.


It achieves the same ends, but does it differently.  My idea requires
compiler support (which makes it unlikely to happen) and doesn't
require separate objects (which I think is a big plus).


Why would requiring separate objects be a problem?

It's not a problem, it's just better not to, if at all possible. K.I.S.S. :)

In my case, the derived class ctor could manually set some of the fields
in Args before handing to the superclass. Of course, it's not as ideal,
since if user code already sets said fields, then they get silently
overridden.


That's the problem I was imagining.

Also, in your approach there isn't currently any enforcement that
the user sets all the mandatory parameters of Args, and this is
kinda the main issue my idea solves.


True. One workaround is to use Nullable and check that in the ctor. But
I suppose it's not as great as a compile-time check.
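
A minimal sketch of that Nullable workaround, assuming an Args struct
nested in C with the field names from the example in this thread (which
fields count as mandatory is an assumption made for illustration). The
check only happens at run time, inside the ctor:

```d
import std.exception : assertThrown, enforce;
import std.typecons : Nullable;

class C
{
    // Hypothetical Args struct; Nullable marks the fields assumed mandatory.
    static struct Args
    {
        Nullable!string name;   // mandatory (assumption)
        Nullable!int age;       // mandatory (assumption)
        string school;          // optional
    }

    string name;
    int age;
    string school;

    this(Args args)
    {
        // Run-time check only: an unset mandatory field throws here.
        enforce(!args.name.isNull, "Args.name must be set");
        enforce(!args.age.isNull, "Args.age must be set");
        name = args.name.get;
        age = args.age.get;
        school = args.school;
    }
}

void main()
{
    C.Args args;
    args.name = "test1";
    args.age = 12;
    auto obj = new C(args);
    assert(obj.name == "test1" && obj.age == 12);

    // 'age' was never set, so construction fails, but only when run.
    C.Args incomplete;
    incomplete.name = "test2";
    assertThrown(new C(incomplete));
}
```

Nothing stops the incomplete case from compiling, which is the gap a
static check would close.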


Yeah, I was angling for a static/compile time check, if at all possible.

>Whereas using my approach, you can simply reuse the Args struct
>several times:
>
>    C.Args args;
>    args.name = "test1";
>    args.age = 12;
>    args.school = "D Burg High School";
>    auto obj1 = new C(args);
>
>    args.name = "test2";
>    auto obj2 = new C(args);
>
>    args.name = "test3";
>    auto obj3 = new C(args);
>
>    ... // etc.

Or.. you use a mixin, or better still you add a copy-constructor or
.dup method to your class to duplicate it :)


But then you end up with the problem of needing to call set methods
after the .dup

Which is no different to setting args.name beforehand, the same number of
assignments. In the example above it's N+1 assignments, N args or dup'ed
members and 1 more for 'name' before or after the construction.

which may complicate things if the set methods need to
do non-trivial initialization of internal structures (caches or internal
representations, etc.).

Ahh, yes, and in this case you'd want to use the idea below, where you
call a method to set the common parts and manually set the differences.

Whereas if you hadn't needed to .dup, you could
have gotten by without writing any set methods for your class, but now
you have to.


create-set-call <- 'set' is kinda an integral part of the whole thing :P

[...]

In my case you can call different functions in the initialisation
block, e.g.

void defineObject(C c)
{
  c.school = "...";
}

C c = new C() {
  defineObject()
}

:)


So the compiler has to recursively traverse function calls in the
initialization block in order to check that all required fields are set?

Yes. This was an off the cuff idea, but it /is/ a natural extension of
the idea for the compiler to traverse the setters called inside the
initialisation block, and ctors in the hierarchy, etc.

That could entail some implementation issues, if said function
calls can be arbitrarily complex. (If you have complex control logic in
said functions, the compiler can't in general determine whether or not
some paths that contain assignment statements to the object's fields
will be taken, since that would be equivalent to the halting problem.


All true.  The compiler has a couple of options to (re)solve these issues:
1. It could simply baulk at the complexity and error.

2. It could take the safe route and assume those member assignments it
cannot verify are uninitialised, forcing manual init.

In fact, erroring at complexity might make for better code in many ways.
You would have to perform your complex initialisation beforehand, store
the result in a variable, and then construct/initblock your object.

It does limit your choice of style, but create-set-call already does
that.. and I'm not immediately against style limitations assuming they
actually result in better code.

Worse, the compiler would have to track aliases of the object being set,
in order to know which assignment statements are setting fields in the
object, and which are just computations on the side.)

No, aliasing would simply be ignored. In fact, calling a setter on
another object in an initblock should probably be an error. Part of the
whole "don't mix initialisation" goal I started with. It does require
strict properties.

Furthermore, what if defineObject tries to do something with C other
than setting up fields? The object would be in an illegal state since it
hasn't been fully constructed yet.

That's an error. This is why in my initial post I stated that we'd need
explicit/well defined properties. All you would be allowed to call in an
initialisation block, on the object being initialised, are setter
properties.. and possibly methods or free functions which only call setter
properties.

>>I think another interesting idea is using the builder pattern with
>>create-set-call objects.
>>
>>For example, a builder template class could inspect the object for
>>UDA's indicating a data member which is required during
>>initialisation.  It would contain a bool[] to flag each member as
>>not/initialised and expose a setMember() method which would call the
>>underlying object setMember() and return a reference to itself.
>>
>>At some point, these setMember() methods would want to return another
>>template class which contained just a build() member.  I'm not sure
>>how/if this is possible in D.
>[...]
>
>Hmm, this is an interesting idea indeed. I think it may be possible to
>implement in the current language.

The issue I think is the step where you want to mutate the return
type from the type with setX members to the type with build().


I'm not sure I understand that sentence. Could you rephrase it?

I am imagining using a template to create a type which wraps the original
object. The created type would expose setter properties for all the
mandatory members, and nothing else. The user would call these setters,
using UFCS/chain style, however, only after setting all the mandatory
properties do we want to expose an additional member called build() which
returns the constructed/initialised object.


So, an example:

class Foo {...}

auto f = Builder!(Foo)().setName("Regan").setAge(33).build();

The type of the object returned from the Builder!(Foo) is our first
created type, which exposes setName() and setAge(), however the type
returned from setAge (or whichever member assignment is done last) is the
second created type, which either has all the set.. members plus build()
or only build(). The build() method returns a Foo.


So, the type of 'f' above is Foo.

The goal here is to make build() statically available when Foo is
completely initialised and not before. Of course we could simplify all
this by making it available immediately and throwing if some members are
uninitialised - but that is a runtime check and I was angling for a
compile time one.

If you wanted to enforce a specific init ordering you could even produce a
separate type containing only the next member to init, and from each
setter return the next type in sequence - like a type state machine :p
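
A rough sketch of how the statically available build() might look in
current D. Everything here is an assumption made for illustration: Foo's
members, the hard-coded setName()/setAge() (instead of generating them by
inspecting UDAs), and the two bool template parameters that record in the
type which mandatory members have been set:

```d
// Hypothetical class with two mandatory members and a default ctor.
class Foo
{
    string name;
    int age;
}

// Each setter returns a different instantiation of Builder, so the set
// of initialised members is tracked in the type, not at run time.
struct Builder(T, bool hasName = false, bool hasAge = false)
{
    private T obj;

    Builder!(T, true, hasAge) setName(string name)
    {
        auto o = obj is null ? new T : obj;   // create lazily on first set
        o.name = name;
        return Builder!(T, true, hasAge)(o);
    }

    Builder!(T, hasName, true) setAge(int age)
    {
        auto o = obj is null ? new T : obj;
        o.age = age;
        return Builder!(T, hasName, true)(o);
    }

    // build() only exists once every mandatory member has been set.
    static if (hasName && hasAge)
    {
        T build() { return obj; }
    }
}

void main()
{
    auto f = Builder!(Foo)().setName("Regan").setAge(33).build();
    assert(f.name == "Regan" && f.age == 33);

    // The order of the setters does not matter in this sketch.
    auto g = Builder!(Foo)().setAge(40).setName("Other").build();
    assert(g.age == 40);

    // Calling build() before all members are set does not compile.
    static assert(!__traits(compiles,
        Builder!(Foo)().setName("Regan").build()));
}
```

With N mandatory members this can instantiate up to 2^N builder types,
one per combination of set members.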


The template bloat however..

The problem with the struct approach is, what if you need a complex
setup process, say constructing a graph with complex interconnections
between nodes? In order to express such a thing, you have to essentially
already create the object before you can pass the struct to the ctor,
which kinda defeats the purpose. Similarly, your approach of an
initialization block suffers from the limitation that the initialization
is confined to that block, and you can't allow arbitrary code in that
block (otherwise you could end up using an object that hasn't been fully
constructed yet -- like the defineObject problem I pointed out above).

Yes, neither idea works for all possible use-cases. Yours is naturally
broader and less limiting because I was starting from a limited
create-set-call style and imposing further limitation on how it can be
used.

Keeping in mind the create-set-call pattern and Perl's approach of
"blessing" an object into a full-fledged class instance, I wonder if a
more radical approach might be to have the language acknowledge that
objects have two phases, a preinitialized state, and a fully-initialized
state. These two would have distinct types *in the type system*, such
that you cannot, for example, call post-init methods on a
pre-initialization object, and you can't call an init method on a
post-initialization object.

That is essentially the same idea as the builder template solution I talk
about above :)

The ctor would be the unique transition
point which takes a preinitialized object, verifies compliance with
class invariants, and returns a post-initialization object.


AKA build() above :)

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
