Re: Cloning in D

2010-09-07 Thread Jacob Carlborg

On 2010-09-07 04:16, Michel Fortin wrote:

On 2010-09-06 20:55:16 -0400, dsimcha  said:


== Quote from Michel Fortin (michel.for...@michelf.com)'s article

I'm under the impression that a too permissive generic implementation
of cloning is going to break things in various scenarios.


In general you raise some very good issues, but IMHO the right way to
do cloning
is to have permissive generic cloning that works in the 90% of cases
and can be
easily overridden in the 10% of cases, not to require writing tons of
boilerplate
in the 90% of cases just to make sure it doesn't do the wrong thing by
default in
the 10% of cases.


To me automatic cloning of everything (physical cloning in your
parlance) looks more like 50/50 work/doesn't-work ratio. I can only
guess, but I'm probably used to different use cases than you are.



A second point is that the thing that brought this whole cloning issue
to my mind
was making std.concurrency's message passing model less obtuse. Right
now it's
hard to use for non-trivial things because there's no safe way to pass
complex
state between threads. If we start allowing all kinds of exceptions to
the "clone
the **entire** object graph" rule, cloning will rapidly become useless
for safely
passing complex object graphs between threads.


This I agree with. I'm not arguing against automatic cloning per-see,
I'm just trying to show cases where it doesn't work well.

Personally, I'm rather skeptical that we can make it safe and efficient
at the same time without better support from the language, something
akin the mythical "unique" type modifier representing a reference with
no aliasing.



What if your
object or structure is part of a huge hierarchy where things contains
pointers to their parent (and indirectly to the whole hierarchy), will
the whole hierarchy be cloned?


Isn't that kind of the point?


Well, that depends. If you send each leaves of a tree as a message to
various threads presumably to perform something concurrently with the
data in that leaf, then you may want only the leaf to be copied. You may
not want every parent down to the root and then up to every other leaf
to be copied alongside with each message just because the leaf you send
has a pointer to the parent.

In fact, it depends on the situation. If what you want to do with the
leaf in the other thread requires the leaf to know its parent and
everything else, then sure you need to copy the whole hierarchy. But
otherwise it's a horrible waste of memory and CPU to clone the whole
object graph for each message, even though it won't affect the program's
correctness.

And it's basically the same thing with observers. If your observer is a
controller in charge of updating a window when something changes, you
don't want to clone the observer, then clone the window and everything
in it just because you're sending some piece of data to another thread.
Perhaps the program architecture is just wrong, or perhaps that observer
is a synchronized class capable of handling function calls from multiple
threads so it doesn't really need to be copied.



What happens if your object or structure
maintains a reference to a singleton, will we get two instances of a
singleton?


Very good point. I guess the reasonable use case for holding a
reference to a
singleton (instead of just using the globally accessible one) would be
if it's
polymorphic with some other object type? If you're using message passing
concurrency, most of your mutable singletons are probably
thread-local, and what
you probably really want to do is use the thread-local singleton of
the thread
you're passing to.


What intrigues me is how such a mechanism would work... although in my
mind it's probably not even worth supporting at all, singletons be damned!



My understanding is that a data structure containing a pointer cannot
be cloned safely unless it contains some specific code to perform the
cloning. That's because the type system can't tell you which pointers
point to things owned by the struct/class and which one need to be
discarded when cloning (such as a list of observers, or the parents of
a hierarchy).


This discussion is making me think we really need two kinds of
cloning: Physical
cloning would clone the entire object graph no matter what, such that
the cloned
object could be safely passed to another thread via std.concurrency
and be given a
unique type. Logical cloning would be more like what you describe. In
general,
this discussion has been incredibly useful because I had previously only
considered physical cloning.


This is an interesting and valid observation. But I think you need to
leave a door open to customization of the "physical cloning" case too.
The ability to avoid cloning unnecessary data is as necessary as the
ability to easily copying an entire object graph.


I think everything would be a lot easier if we had support for cloning 
in the standard library. Something like a clone method in Object that as 
a default implementation

Re: Cloning in D

2010-09-06 Thread Michel Fortin

On 2010-09-06 20:55:16 -0400, dsimcha  said:


== Quote from Michel Fortin (michel.for...@michelf.com)'s article

I'm under the impression that a too permissive generic implementation
of cloning is going to break things in various scenarios.


In general you raise some very good issues, but IMHO the right way to 
do cloning

is to have permissive generic cloning that works in the 90% of cases and can be
easily overridden in the 10% of cases, not to require writing tons of 
boilerplate
in the 90% of cases just to make sure it doesn't do the wrong thing by 
default in

the 10% of cases.


To me automatic cloning of everything (physical cloning in your 
parlance) looks more like 50/50 work/doesn't-work ratio. I can only 
guess, but I'm probably used to different use cases than you are.



A second point is that the thing that brought this whole cloning issue 
to my mind

was making std.concurrency's message passing model less obtuse.  Right now it's
hard to use for non-trivial things because there's no safe way to pass complex
state between threads.  If we start allowing all kinds of exceptions to 
the "clone
the **entire** object graph" rule, cloning will rapidly become useless 
for safely

passing complex object graphs between threads.


This I agree with. I'm not arguing against automatic cloning per-see, 
I'm just trying to show cases where it doesn't work well.


Personally, I'm rather skeptical that we can make it safe and efficient 
at the same time without better support from the language, something 
akin the mythical "unique" type modifier representing a reference with 
no aliasing.




What if your
object or structure is part of a huge hierarchy where things contains
pointers to their parent (and indirectly to the whole hierarchy), will
the whole hierarchy be cloned?


Isn't that kind of the point?


Well, that depends. If you send each leaves of a tree as a message to 
various threads presumably to perform something concurrently with the 
data in that leaf, then you may want only the leaf to be copied. You 
may not want every parent down to the root and then up to every other 
leaf to be copied alongside with each message just because the leaf you 
send has a pointer to the parent.


In fact, it depends on the situation. If what you want to do with the 
leaf in the other thread requires the leaf to know its parent and 
everything else, then sure you need to copy the whole hierarchy. But 
otherwise it's a horrible waste of memory and CPU to clone the whole 
object graph for each message, even though it won't affect the 
program's correctness.


And it's basically the same thing with observers. If your observer is a 
controller in charge of updating a window when something changes, you 
don't want to clone the observer, then clone the window and everything 
in it just because you're sending some piece of data to another thread. 
Perhaps the program architecture is just wrong, or perhaps that 
observer is a synchronized class capable of handling function calls 
from multiple threads so it doesn't really need to be copied.




What happens if your object or structure
maintains a reference to a singleton, will we get two instances of a
singleton?


Very good point.  I guess the reasonable use case for holding a reference to a
singleton (instead of just using the globally accessible one) would be if it's
polymorphic with some other object type?  If you're using message passing
concurrency, most of your mutable singletons are probably thread-local, 
and what

you probably really want to do is use the thread-local singleton of the thread
you're passing to.


What intrigues me is how such a mechanism would work... although in my 
mind it's probably not even worth supporting at all, singletons be 
damned!




My understanding is that a data structure containing a pointer cannot
be cloned safely unless it contains some specific code to perform the
cloning. That's because the type system can't tell you which pointers
point to things owned by the struct/class and which one need to be
discarded when cloning (such as a list of observers, or the parents of
a hierarchy).


This discussion is making me think we really need two kinds of cloning: 
 Physical
cloning would clone the entire object graph no matter what, such that 
the cloned
object could be safely passed to another thread via std.concurrency and 
be given a
unique type.  Logical cloning would be more like what you describe.  In 
general,

this discussion has been incredibly useful because I had previously only
considered physical cloning.


This is an interesting and valid observation. But I think you need to 
leave a door open to customization of the "physical cloning" case too. 
The ability to avoid cloning unnecessary data is as necessary as the 
ability to easily copying an entire object graph.



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: Cloning in D

2010-09-06 Thread dsimcha
== Quote from Michel Fortin (michel.for...@michelf.com)'s article
> I'm under the impression that a too permissive generic implementation
> of cloning is going to break things in various scenarios.

In general you raise some very good issues, but IMHO the right way to do cloning
is to have permissive generic cloning that works in the 90% of cases and can be
easily overridden in the 10% of cases, not to require writing tons of 
boilerplate
in the 90% of cases just to make sure it doesn't do the wrong thing by default 
in
the 10% of cases.

A second point is that the thing that brought this whole cloning issue to my 
mind
was making std.concurrency's message passing model less obtuse.  Right now it's
hard to use for non-trivial things because there's no safe way to pass complex
state between threads.  If we start allowing all kinds of exceptions to the 
"clone
the **entire** object graph" rule, cloning will rapidly become useless for 
safely
passing complex object graphs between threads.

> What if your
> object or structure is part of a huge hierarchy where things contains
> pointers to their parent (and indirectly to the whole hierarchy), will
> the whole hierarchy be cloned?

Isn't that kind of the point?

> What happens if your object or structure
> maintains a reference to a singleton, will we get two instances of a
> singleton?

Very good point.  I guess the reasonable use case for holding a reference to a
singleton (instead of just using the globally accessible one) would be if it's
polymorphic with some other object type?  If you're using message passing
concurrency, most of your mutable singletons are probably thread-local, and what
you probably really want to do is use the thread-local singleton of the thread
you're passing to.

> What if your object or structure implements the observer
> pattern and contains a list of objects to notify when something
> happens, will all the observers be cloned as well?

Yes.  There's no other way to make message passing safe.

> My understanding is that a data structure containing a pointer cannot
> be cloned safely unless it contains some specific code to perform the
> cloning. That's because the type system can't tell you which pointers
> point to things owned by the struct/class and which one need to be
> discarded when cloning (such as a list of observers, or the parents of
> a hierarchy).

This discussion is making me think we really need two kinds of cloning:  
Physical
cloning would clone the entire object graph no matter what, such that the cloned
object could be safely passed to another thread via std.concurrency and be 
given a
unique type.  Logical cloning would be more like what you describe.  In general,
this discussion has been incredibly useful because I had previously only
considered physical cloning.

> As for pointers to immutable things, the clone can keep pointing to the
> same immutable data without affecting semantics, so there's no problem
> there.

Right.  There aren't too many good reasons to clone immutable data.


Re: Cloning in D

2010-09-06 Thread Michel Fortin
On 2010-09-06 17:00:26 -0400, Andrei Alexandrescu 
 said:



On 09/06/2010 02:56 PM, BLS wrote:

On 06/09/2010 05:15, dsimcha wrote:

I've started playing around with Orange a little to see whether it
would meet
D's cloning needs.


Clone respective member-wise clone ( I prefer copy and deep copy) should
be part of object. no std._tricks period.


Though this might be the case in this particular instance, ideally a 
generally useful operation should be made available generically instead 
of encapsulated.


I'm under the impression that a too permissive generic implementation 
of cloning is going to break things in various scenarios. What if your 
object or structure is part of a huge hierarchy where things contains 
pointers to their parent (and indirectly to the whole hierarchy), will 
the whole hierarchy be cloned? What happens if your object or structure 
maintains a reference to a singleton, will we get two instances of a 
singleton? What if your object or structure implements the observer 
pattern and contains a list of objects to notify when something 
happens, will all the observers be cloned as well?


My understanding is that a data structure containing a pointer cannot 
be cloned safely unless it contains some specific code to perform the 
cloning. That's because the type system can't tell you which pointers 
point to things owned by the struct/class and which one need to be 
discarded when cloning (such as a list of observers, or the parents of 
a hierarchy).


As for pointers to immutable things, the clone can keep pointing to the 
same immutable data without affecting semantics, so there's no problem 
there.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: Cloning in D

2010-09-06 Thread Andrei Alexandrescu

On 09/06/2010 02:56 PM, BLS wrote:

On 06/09/2010 05:15, dsimcha wrote:

I've started playing around with Orange a little to see whether it
would meet
D's cloning needs.


Clone respective member-wise clone ( I prefer copy and deep copy) should
be part of object. no std._tricks period.


Though this might be the case in this particular instance, ideally a 
generally useful operation should be made available generically instead 
of encapsulated.


Andrei


Re: Cloning in D

2010-09-06 Thread BLS

On 06/09/2010 05:15, dsimcha wrote:

I've started playing around with Orange a little to see whether it would meet
D's cloning needs.


Clone respective member-wise clone ( I prefer copy and deep copy) should 
be part of object. no std._tricks  period.




Re: Cloning in D

2010-09-06 Thread Jacob Carlborg

On 2010-09-06 16:01, dsimcha wrote:

== Quote from Jacob Carlborg (d...@me.com)'s article

I've looked at this problem now but I don't know I can detect the
aliasing. Suggestions ?


Well, here's what I was thinking of doing when I was thinking of rolling my own:

Traverse the object graph once.  Create some data structure of all pointer
indirection, including both the start address and the length.  Then, sort this 
by
starting address and find all overlapping regions.  Whenever you find an
overlapping region, duplicate/serialize it as a single block and make 
everything a
slice into it.


I will give this a try.

--
/Jacob Carlborg


Re: Cloning in D

2010-09-06 Thread Jacob Carlborg

On 2010-09-06 16:12, Andrei Alexandrescu wrote:

On 09/05/2010 10:15 PM, dsimcha wrote:

I've started playing around with Orange a little to see whether it
would meet
D's cloning needs. IMHO one must-have feature for proper cloning that
truly
"just works" is full aliasing preservation.


What do you mean by aliasing preservation? I'd think you want no
aliasing between the clone and the original. (Perhaps you mean that the
object graph of the clone and original look the same?)

First off, let's see what cloning methods we have:

1. We know how to clone built-in types except pointers. (Arrays are
cloned by duplicating the array proper and cloning each element.)

2. Structs can be cloned with either this(this) or by cloning each of
their fields.

3. For classes we can define a conventional method/interface, e.g.
clone/Cloneable.

Now, given some arbitrary value v, we know how to obtain a clone of it
by applying the rules above. The problem is, there are two opaque calls:
this(this) for structs and obj.clone() for classes. If improperly
implemented, the whole thing comes unglued. The problem is the standard
library (and others) must rely on correct deep cloning without being
able to check it.

Although it's difficult to implement deep cloning, it's relatively easy
to _verify_ it. So I'm thinking, a generic deepdup() routine would apply
cloning per the rules above and then would verify that opaque calls
obj.clone() and this(this) did not produce any aliasing.

Bottom line, this clarifies we need dynamic field information for
classes. Otherwise we wouldn't be able to "see" through polymorphism.


Andrei


That would be getMembers in ClassInfo, but it doesn't currently work: 
issue 2844. It also returns an empty array, I think, can't find the 
ticket for that though. We also need to be able to set and get the value 
of a field, don't know if this would be possible by using the offset 
field in the MemberInfo_field class. This is also needed for a 
completely transparent serialization implementation (i.e. the user 
doesn't have to implement or register any methods/functions).


--
/Jacob Carlborg


Re: Cloning in D

2010-09-06 Thread dsimcha
== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
> On 09/05/2010 10:15 PM, dsimcha wrote:
> > I've started playing around with Orange a little to see whether it would 
> > meet
> > D's cloning needs.  IMHO one must-have feature for proper cloning that truly
> > "just works" is full aliasing preservation.
> What do you mean by aliasing preservation? I'd think you want no
> aliasing between the clone and the original. (Perhaps you mean that the
> object graph of the clone and original look the same?)

Right.  This means if you had, for example, two slices that point to overlapping
regions of memory anywhere in your original object graph, they'd overlap instead
of pointing to distinct regions in your cloned object graph.

> First off, let's see what cloning methods we have:
> 1. We know how to clone built-in types except pointers. (Arrays are
> cloned by duplicating the array proper and cloning each element.)

I think we could handle even pointers in some limited cases.  If it's allocated 
on
the GC heap (so we can easily find the bounds of the memory region) and the type
pointed to doesn't have any additional indirection (so we don't need to follow
possibly invalid pointers), then we just duplicate the entire GC memory block.

For pointers to non-primitives, we could arbitrarily assume that the pointer
points to exactly one instance, and is being used simply as a reference.

> 2. Structs can be cloned with either this(this) or by cloning each of
> their fields.

Yeah, structs with postblits are a huge, hairy PITA.  If the struct is using
reference counting, cloning the entire object graph reachable from the struct 
will
yield an incorrect reference count field.  I think that for structs we should
always assume that transitively duplicating the entire object graph field by 
field
is a safe way to clone the struct, unless it comes with its own clone() method.
this(this) can be used for too many other things, like the aforementioned ref
counting.

> 3. For classes we can define a conventional method/interface, e.g.
> clone/Cloneable.

Right, and there should be a mixin to implement this for the simple cases, 
rather
than making the programmer do it him/herself.

One concern w/ opaque clone() calls is establishing a protocol by which aliasing
(array slice overlapping, etc.) information can be shared between the clone()
function and the main deepdup() function, as well as among different clone 
functions.



Re: Cloning in D

2010-09-06 Thread Andrei Alexandrescu

On 09/05/2010 10:15 PM, dsimcha wrote:

I've started playing around with Orange a little to see whether it would meet
D's cloning needs.  IMHO one must-have feature for proper cloning that truly
"just works" is full aliasing preservation.


What do you mean by aliasing preservation? I'd think you want no 
aliasing between the clone and the original. (Perhaps you mean that the 
object graph of the clone and original look the same?)


First off, let's see what cloning methods we have:

1. We know how to clone built-in types except pointers. (Arrays are 
cloned by duplicating the array proper and cloning each element.)


2. Structs can be cloned with either this(this) or by cloning each of 
their fields.


3. For classes we can define a conventional method/interface, e.g. 
clone/Cloneable.


Now, given some arbitrary value v, we know how to obtain a clone of it 
by applying the rules above. The problem is, there are two opaque calls: 
this(this) for structs and obj.clone() for classes. If improperly 
implemented, the whole thing comes unglued. The problem is the standard 
library (and others) must rely on correct deep cloning without being 
able to check it.


Although it's difficult to implement deep cloning, it's relatively easy 
to _verify_ it. So I'm thinking, a generic deepdup() routine would apply 
cloning per the rules above and then would verify that opaque calls 
obj.clone() and this(this) did not produce any aliasing.


Bottom line, this clarifies we need dynamic field information for 
classes. Otherwise we wouldn't be able to "see" through polymorphism.



Andrei


Re: Cloning in D

2010-09-06 Thread dsimcha
== Quote from Jacob Carlborg (d...@me.com)'s article
> I've looked at this problem now but I don't know I can detect the
> aliasing. Suggestions ?

Well, here's what I was thinking of doing when I was thinking of rolling my own:

Traverse the object graph once.  Create some data structure of all pointer
indirection, including both the start address and the length.  Then, sort this 
by
starting address and find all overlapping regions.  Whenever you find an
overlapping region, duplicate/serialize it as a single block and make 
everything a
slice into it.


Re: Cloning in D

2010-09-06 Thread Jacob Carlborg

On 2010-09-06 10:44, Jacob Carlborg wrote:

On 2010-09-06 05:15, dsimcha wrote:

I've started playing around with Orange a little to see whether it
would meet
D's cloning needs. IMHO one must-have feature for proper cloning that
truly
"just works" is full aliasing preservation. For example, the following
code
modified slightly from the Orange example doesn't work properly:

import orange._; // import the whole library

class A
{
int[] arr1;
int[] arr2;

equals_t opEquals (Object other)
{
if (auto a = cast(A) other)
return a.arr1 == this.arr1&& a.arr2 == this.arr2;

return false;
}
}

void main ()
{
auto a = new A; // create something to serialize
a.arr1 = [1,2,3,4,5];
a.arr2 = a.arr1[1..$];

auto serializer = new Serializer!(XMLArchive!());
auto data = serializer.serialize(a);

println(data);

auto a2 = serializer.deserialize!(A)(data);
assert(a == a2);

a2.arr2[0] = 0;
println(a2.arr1); // [1,2,3,4,5]
}

Note that Orange gets this right for class references that point to
the same
object, but not for arrays that overlap.

A few questions:

1. Are most serialization libraries in other languages capable of getting
this right?

2. Do others agree that full aliasing preservation is essential with
regard
to array slices?

3. Jacob, do you think you could fix Orange to support this properly?


I can have a look and see what I can do. Could you please report a
ticket so I don't forget.


I've looked at this problem now but I don't know I can detect the 
aliasing. Suggestions ?


--
/Jacob Carlborg


Re: Cloning in D

2010-09-06 Thread Jacob Carlborg

On 2010-09-06 05:15, dsimcha wrote:

I've started playing around with Orange a little to see whether it would meet
D's cloning needs.  IMHO one must-have feature for proper cloning that truly
"just works" is full aliasing preservation.  For example, the following code
modified slightly from the Orange example doesn't work properly:

import orange._; // import the whole library

class A
{
 int[] arr1;
 int[] arr2;

 equals_t opEquals (Object other)
 {
 if (auto a = cast(A) other)
 return a.arr1 == this.arr1&&  a.arr2 == this.arr2;

 return false;
 }
}

void main ()
{
 auto a = new A; // create something to serialize
 a.arr1 = [1,2,3,4,5];
 a.arr2 = a.arr1[1..$];

 auto serializer = new Serializer!(XMLArchive!());
 auto data = serializer.serialize(a);

 println(data);

 auto a2 = serializer.deserialize!(A)(data);
 assert(a == a2);

 a2.arr2[0] = 0;
 println(a2.arr1);  // [1,2,3,4,5]
}

Note that Orange gets this right for class references that point to the same
object, but not for arrays that overlap.

A few questions:

1.  Are most serialization libraries in other languages capable of getting
this right?

2.  Do others agree that full aliasing preservation is essential with regard
to array slices?

3.  Jacob, do you think you could fix Orange to support this properly?


I can have a look and see what I can do. Could you please report a 
ticket so I don't forget.


--
/Jacob Carlborg


Cloning in D

2010-09-05 Thread dsimcha
I've started playing around with Orange a little to see whether it would meet
D's cloning needs.  IMHO one must-have feature for proper cloning that truly
"just works" is full aliasing preservation.  For example, the following code
modified slightly from the Orange example doesn't work properly:

import orange._; // import the whole library

class A
{
int[] arr1;
int[] arr2;

equals_t opEquals (Object other)
{
if (auto a = cast(A) other)
return a.arr1 == this.arr1 && a.arr2 == this.arr2;

return false;
}
}

void main ()
{
auto a = new A; // create something to serialize
a.arr1 = [1,2,3,4,5];
a.arr2 = a.arr1[1..$];

auto serializer = new Serializer!(XMLArchive!());
auto data = serializer.serialize(a);

println(data);

auto a2 = serializer.deserialize!(A)(data);
assert(a == a2);

a2.arr2[0] = 0;
println(a2.arr1);  // [1,2,3,4,5]
}

Note that Orange gets this right for class references that point to the same
object, but not for arrays that overlap.

A few questions:

1.  Are most serialization libraries in other languages capable of getting
this right?

2.  Do others agree that full aliasing preservation is essential with regard
to array slices?

3.  Jacob, do you think you could fix Orange to support this properly?