Re: assumeSafeAppend inconsistent when multiple slices in use

2011-04-07 Thread simendsjo

On 05.04.2011 22:04, Steven Schveighoffer wrote:

On Tue, 05 Apr 2011 15:24:46 -0400, simendsjo simen.end...@pandavre.com
wrote:


I don't know if this is an actual problem, but I don't understand the
behavior.
When one slice calls assumeSafeAppend, both slices are given control,
that is, get the parent's capacity.


You have to stop thinking of array slices as unique :)



Ok, done.



If you look at the dissection of a slice, it is a pointer and length.
That is all. So if b.ptr == c.ptr && b.length == c.length, then both
slices not only are the same, but are identical. That is, the runtime
cannot tell the difference.

It might be easier to think about it this way. First, there are no
arrays in the traditional definition, only slices. Every array slice
which references the heap is backed by a memory block. The block has a
fixed size (e.g. 16 bytes, 32 bytes, 64 bytes, etc.). However, encoded
in the block itself is the 'used length', which is basically the number
of bytes in the block which are actually being used. If you took a slice
of the block from the beginning of the block and with length equivalent
to the 'used length', we can call this the 'master slice'. The array
appending code allows appending ONLY when the slice being appended ends
at the same point that the master slice ends (and when there is space
available).

All assumeSafeAppend does is adjust that used length so it ends at the
same point as the given slice. So it's not actually blessing anything
about the specific slice you use to call it with, it's just adjusting
the 'master slice' to make it so the array you give will be appendable.



(snip)



If I have thoroughly confused you now, please ask more questions :)


A very thorough explanation which changes my view of arrays. I thought 
slices, array stomping protection and reallocation had very complicated 
rules (and perhaps they do under the hood..?)


I tried to look at assumeSafeAppend's code, and although I didn't 
understand everything, I think that together with your great explanation 
I got it, but let's see...
I'm having some problems explaining it as I'm not sure about the correct 
terminology (I should have drawn some pictures..), but I'll try.


The runtime doesn't track the original array and the slices (for this 
example at least - it obviously does for the sake of the GC).
Internally it keeps an address to the current endpoint in the chunk of 
memory, like array.ptr + array.length.
Any slice that ends on this address is allowed to append into the memory 
without reallocating, since it points to the end of the used data. But 
whenever a slice that's allowed to append/change length does so, it moves 
the endpoint address for the chunk of memory, and thus the other slices 
probably point at an earlier element and cannot append.

Any slice not ending at the endpoint will return capacity == 0.
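For instance (a sketch; since exact capacity values depend on the runtime's 
block sizes, only zero vs. non-zero is checked):

```d
void main()
{
    int[] a;
    a.reserve(8);             // make sure there is room to grow in place
    a ~= [1, 2, 3];
    int[] b = a;              // same endpoint as a: both appendable
    assert(a.capacity > 0 && b.capacity > 0);

    a ~= 4;                   // fits in the reserved block; the endpoint moves
    assert(a.capacity > 0);   // a still ends at the endpoint
    assert(b.capacity == 0);  // b now ends before it; cannot append in place
}
```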

There is one thing I don't understand though..
 However, a does not end on the master slice, so its capacity is 0. Note
 that technically using a[1..$] is undefined at this point since the
 runtime considers those elements 'unused'.

What do you mean by undefined in this context? Do you mean because it 
can be overwritten like this?


int[] a = [1,2];
int[] b = a[0..1];
b.assumeSafeAppend();
b.length += 1;
assert(a == [1,0]); // a[1] overwritten: the new element is set to int.init

Or is it because it's outside what the runtime considers part of the 
used array and is free to do what it wants with it?
Or does the GC not track slices with an endpoint greater than the current 
endpoint, so it might return memory to the OS?


Re: assumeSafeAppend inconsistent when multiple slices in use

2011-04-07 Thread Steven Schveighoffer
On Thu, 07 Apr 2011 15:22:10 -0400, simendsjo simen.end...@pandavre.com  
wrote:



On 05.04.2011 22:04, Steven Schveighoffer wrote:
 However, a does not end on the master slice, so its capacity is 0. Note
 that technically using a[1..$] is undefined at this point since the
 runtime considers those elements 'unused'.

What do you mean by undefined in this context? Do you mean because it  
can be overwritten like this?


int[] a = [1,2];
int[] b = a[0..1];
b.assumeSafeAppend();
b.length += 1;
assert(a == [1,0]); // a[1] overwritten: the new element is set to int.init


yes, although the above does not represent technically undefined behavior  
-- you didn't access a[1] when a[1] was not considered part of the used  
space.


Or is it because it's outside what the runtime considers part of the  
used array and is free to do what it wants with it?


Yes on this count too.

For example, it might be that some runtime implementations, when you call  
assumeSafeAppend, could write 0xdeadbeef into the elements that are  
no-longer valid.  This is perfectly acceptable, and as long as you don't  
access those elements, the behavior should be predictable.
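One way to stay on the safe side, then, is to copy anything you still need 
out of the soon-to-be-'unused' tail before calling assumeSafeAppend. A 
sketch (the stomped value itself is not asserted, since whether the append 
happens in place is up to the runtime):

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 2];

    // Save the tail first: after assumeSafeAppend these elements are
    // considered unused and may be overwritten by the next append.
    int[] tail = a[2 .. $].dup;

    b.assumeSafeAppend();
    b ~= 99;                  // may stomp the old a[2]

    assert(tail == [3, 4]);   // the copy is unaffected
}
```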


I'm not a language lawyer, so maybe the better term is 'implementation  
defined'.  I'm not sure.


Or does the GC not track slices with an endpoint greater than the current  
endpoint, so it might return memory to the OS?


The GC does not partially deallocate a block.  These slices will still  
point to the block, so it won't be collected.  Note that the GC proper  
does not really know about the used length. It just knows that there is  
an allocated block of size X.  Only the runtime knows what data is 'used'  
and what is 'free' inside the block (specifically the array management  
part of the runtime).
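This can be seen with core.memory's GC.addrOf, which maps any interior 
pointer back to the base address of its block (a sketch):

```d
import core.memory : GC;

void main()
{
    int[] a = new int[100];
    int[] tiny = a[10 .. 11]; // interior slice; different pointer than a.ptr

    // Both pointers resolve to the same GC block: as long as `tiny`
    // is alive, the whole 100-int block stays allocated.
    assert(a.ptr !is tiny.ptr);
    assert(GC.addrOf(a.ptr) is GC.addrOf(tiny.ptr));
}
```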


-Steve


Re: assumeSafeAppend inconsistent when multiple slices in use

2011-04-07 Thread Steven Schveighoffer
On Thu, 07 Apr 2011 15:22:10 -0400, simendsjo simen.end...@pandavre.com  
wrote:


I'm having some problems explaining it as I'm not sure about the correct  
terminology (I should have drawn some pictures..), but I'll try.


The runtime doesn't track the original array and the slices (for this  
example at least - it obviously does for the sake of the GC).
Internally it keeps an address to the current endpoint in the chunk of  
memory, like array.ptr + array.length.
Any slice that ends on this address is allowed to append into the memory  
without reallocating, since it points to the end of the used data. But  
whenever a slice that's allowed to append/change length does so, it moves  
the endpoint address for the chunk of memory, and thus the other slices  
probably point at an earlier element and cannot append.

Any slice not ending at the endpoint will return capacity == 0.


This all looks correct.

-Steve


Re: assumeSafeAppend inconsistent when multiple slices in use

2011-04-05 Thread Steven Schveighoffer

On Tue, 05 Apr 2011 15:24:46 -0400, simendsjo simen.end...@pandavre.com
wrote:


I don't know if this is an actual problem, but I don't understand the
behavior.
When one slice calls assumeSafeAppend, both slices are given control,
that is, get the parent's capacity.


You have to stop thinking of array slices as unique :)

If you look at the dissection of a slice, it is a pointer and length.   
That is all.  So if b.ptr == c.ptr && b.length == c.length, then both  
slices not only are the same, but are identical.  That is, the runtime  
cannot tell the difference.
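In fact, the `is` operator on two slices compares exactly this (pointer, 
length) pair, so a quick sketch:

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 2];
    int[] c = a[0 .. 2];

    assert(b.ptr == c.ptr && b.length == c.length);
    assert(b is c);   // `is` on slices compares pointer and length
    assert(a !is b);  // same pointer, different length: not identical
}
```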


It might be easier to think about it this way.  First, there are no arrays  
in the traditional definition, only slices.  Every array slice which  
references the heap is backed by a memory block.  The block has a fixed  
size (e.g. 16 bytes, 32 bytes, 64 bytes, etc.).  However, encoded in the  
block itself is the 'used length', which is basically the number of bytes  
in the block which are  actually being used.  If you took a slice of the  
block from the beginning of the block and with length equivalent to the  
'used length', we can call this the 'master slice'.  The array appending  
code allows appending ONLY when the slice being appended ends at the same  
point that the master slice ends (and when there is space available).


All assumeSafeAppend does is adjust that used length so it ends at the  
same point as the given slice.  So it's not actually blessing anything  
about the specific slice you use to call it with, it's just adjusting the  
'master slice' to make it so the array you give will be appendable.
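A minimal sketch of that adjustment (only zero vs. non-zero capacity is 
checked, since exact values depend on the runtime's block sizes):

```d
void main()
{
    int[] a = [1, 2, 3, 4];   // heap block; master slice covers all 4 ints
    int[] b = a[0 .. 2];

    assert(a.capacity > 0);   // a ends where the master slice ends
    assert(b.capacity == 0);  // b ends before it

    b.assumeSafeAppend();     // master slice now ends where b ends

    assert(b.capacity > 0);   // b is appendable in place...
    assert(a.capacity == 0);  // ...and a no longer is
}
```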


Keeping all this in mind, let's go through your example to see how  
assumeSafeAppend actually works



int[] a;
a.length = 4;


At this point, the array allocates a block large enough to hold 4  
integers.  You'd think that was 16 bytes, but we need to have a byte to  
store the 'used length', so this actually turns out to be 32 bytes (of  
which 31 can be used to store data).  So a.capacity should be 7.


So what we see here is: a points to the beginning of the block, its  
length is 4, and the master slice's length is 16 (remember the master  
slice is always in bytes).  For simplicity's sake, we'll assume the master  
slice is an int[] slice, so we'll call the length 4.




int[] b = a[0..1];
int[] c = a[0..1];


Each of these slices' lengths is 1, which means they don't end at the same  
point as the master slice.  Therefore, they are not appendable in-place  
(capacity of 0).




// appending to b or c reallocates
assert(a.capacity > 0);
assert(b.capacity == 0);
assert(c.capacity == 0);


ok so far.



// gives a's capacity to b AND c
b.assumeSafeAppend();


This now sets the master slice to be the same length as b (which is 1).


assert(a.capacity == 0);
assert(b.capacity > 0);
assert(c.capacity > 0); // why does c get capacity?


So now, even though you called assumeSafeAppend on b, both b and c end at  
the same place as the master slice.  So they are both appendable.  The  
runtime cannot tell the difference between b and c, so calling capacity  
with either should be absolutely equivalent.


However, a does not end on the master slice, so its capacity is 0.  Note  
that technically using a[1..$] is undefined at this point since the  
runtime considers those elements 'unused'.




// .. until one of the slices modifies. Just calling
// assumeSafeAppend does not change this behavior
b ~= 1;


This increments the master slice's length by 1 and sets the value of the  
new element to 1.



// now b has full control
assert(a.capacity == 0);
assert(b.capacity > 0);
assert(c.capacity == 0);


Since b's length still matches the master slice's length (2), it's still  
appendable.  However, a's length is still 4 and c's length is still 1, so  
neither is appendable.




// c full control
c.assumeSafeAppend();
assert(a.capacity == 0);
assert(b.capacity == 0);
assert(c.capacity > 0);


Sets the master slice length to 1.  a's length is 4, b's length is 2, so  
neither are appendable.




// a full control
a.assumeSafeAppend();
assert(a.capacity > 0);
assert(b.capacity == 0);
assert(c.capacity == 0);


Sets the master slice length to 4.  b's length is 2, c's length is 1, so  
neither are appendable.




// but now ONLY b gets full control. Not both b and c as with the
// first assumeSafeAppend
b.assumeSafeAppend();
assert(a.capacity == 0);
assert(b.capacity > 0);
assert(c.capacity == 0);


Right, because b and c are no longer identical.  The call to  
assumeSafeAppend sets the master slice's length to match b's length of 2.   
So a and c, which don't have a length of 2, are not appendable.


Note, if you did b ~= 1; b ~=1; then the master slice's length would equal