Re: D array expansion and non-deterministic re-allocation

Steven Schveighoffer Tue, 24 Nov 2009 04:50:24 -0800

On Mon, 23 Nov 2009 18:34:48 -0500, Leandro Lucarella <llu...@gmail.com> wrote:

> Steven Schveighoffer, on 23 November at 15:18, you wrote to me:

>> On Mon, 23 Nov 2009 11:10:48 -0500, Leandro Lucarella <llu...@gmail.com> wrote:
>>>
>>> The thing is, with realloc() it is less likely that you forget that the data
>>> can be copied, because it returns the new pointer (which can be the same as
>>> the original pointer). And even then, I have seen a lot of bugs related
>>> to realloc() misuse (and I made a couple myself).
>>>
>>> With slices it is much worse.
>> realloc is worse. If I have multiple aliases to the same data and then
>> I realloc one of them, the others can all become dangling pointers
>> if the runtime decides to move the data.


> Well, you are comparing GC vs. no-GC, not realloc() vs. slices. I have no
> intention of following that discussion; I'm just saying that realloc() is
> less error-prone than slices (and realloc() is error-prone already).

I was originally saying that realloc is similar to appending because it may or may not choose to move the data. The only difference from realloc with slices is that your other aliases to the old data don't become invalidated, so there are no dangling pointers. The GC is partly the cause of that, but so is the fact that slices don't deallocate the original data (deallocating it would be possible, even in a GC'd environment).

Your assertion that realloc is less error-prone doesn't make any sense to me. realloc is as error-prone as appending to slices *and* it can also leave dangling pointers.
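To make the realloc side concrete, here is a minimal C sketch of the careful idiom, assuming a hypothetical `struct buf` that owns a malloc'd block (`buf_grow` is an illustrative name, not a library function). It keeps the old pointer alive on failure and adopts whatever pointer realloc returns on success:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of a malloc'd buffer. */
struct buf {
    char *data;
    size_t len;
};

/* Grow b->data to new_len bytes, zero-filling the new tail.
   realloc may move the block, so the pointer it returns replaces b->data;
   any other copy of the old pointer must be treated as dangling afterwards.
   Returns 0 on success, -1 on failure (the old block is then left intact). */
static int buf_grow(struct buf *b, size_t new_len)
{
    assert(new_len >= b->len);
    char *tmp = realloc(b->data, new_len);
    if (tmp == NULL)
        return -1;                 /* b->data is still valid and owned */
    memset(tmp + b->len, 0, new_len - b->len);
    b->data = tmp;
    b->len = new_len;
    return 0;
}
```

Even done this carefully, `buf_grow` can only refresh `b->data`. If some other pointer (say `char *alias = b.data;`) was taken before the call, nothing updates it, which is exactly the dangling-alias case above.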

>> You also cannot realloc data that's not malloc'd, but you can append to
>> a slice of non-heap data without issue.


> How is that? AFAIK slices use gc_realloc() to do the actual realloc; if
> that's done on a piece of malloc'ed data, it will fail.

I believe step 1 of the append process is to find the GC heap block that contains the slice. If there is no such block, the runtime simply copies the data to a new block. If the original data is stack-allocated or allocated outside the GC, no errors occur, AFAIK.
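A minimal C model of that decision, assuming a toy table of blocks in place of the real GC heap (the names `slice`, `block`, `find_block`, and `slice_append` are hypothetical, not druntime or Tango APIs). Memory the model did not allocate, such as a stack buffer, always takes the copy path:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a D slice: pointer plus length. */
typedef struct {
    char *ptr;
    size_t len;
} slice;

/* Toy stand-in for the GC heap: blocks this model allocated itself.
   Blocks are never freed here; a real GC would collect them. */
typedef struct {
    char *base;
    size_t cap;
} block;

#define MAX_BLOCKS 16
static block heap[MAX_BLOCKS];
static size_t nblocks;

/* Step 1: find the heap block containing p, or NULL if p is not in the heap
   (stack memory, malloc'd memory, static data, ...). */
static block *find_block(const char *p)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < nblocks; i++) {
        uintptr_t lo = (uintptr_t)heap[i].base;
        if (a >= lo && a < lo + heap[i].cap)
            return &heap[i];
    }
    return NULL;
}

/* Append one byte. Extend in place only if the slice is in a known heap block
   with room after it; otherwise copy into a new block. Non-heap memory always
   takes the copy path, so it is never handed to realloc or free.
   (This model shows only the ownership check; it does not prevent stomping.) */
static int slice_append(slice *s, char c)
{
    block *b = find_block(s->ptr);
    if (b != NULL) {
        size_t off = (size_t)(s->ptr - b->base);
        if (off + s->len < b->cap) {
            s->ptr[s->len++] = c;
            return 0;
        }
    }
    assert(nblocks < MAX_BLOCKS);
    size_t cap = 2 * (s->len + 1);
    char *mem = malloc(cap);
    if (mem == NULL)
        return -1;
    if (s->len > 0)
        memcpy(mem, s->ptr, s->len);
    mem[s->len] = c;
    heap[nblocks].base = mem;
    heap[nblocks].cap = cap;
    nblocks++;
    s->ptr = mem;
    s->len++;
    return 0;
}
```

Appending to a slice of a four-byte stack array moves it into a new block and leaves the stack bytes untouched; a second append then fits in that block's spare room and stays in place.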

> And even if it were true, I don't really see this as a big source of bugs.
> I have never had a bug because I tried to realloc() a non-heap piece of
> memory, or because I appended to a slice of non-GC-heap memory.

You probably didn't have such bugs because you probably didn't think of doing this in C:


void foo(char *arg)
{
   realloc(arg, 10000);
}

Oops, what if arg isn't malloc'd? But in D the equivalent is safe code, and works as you expect no matter what size arg was to begin with. In fact, Tango uses this all the time for buffer optimization. Basically, you pass in a stack buffer; if it's big enough, you save the penalty of heap allocation. If it's not, no big deal: just resize the buffer and the GC takes care of the rest.


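void foo(char[] arg)
{
   // If arg's memory can't be extended in place (e.g. it is a stack buffer),
   // the runtime copies it into a new GC block; either way this is safe.
   arg.length = 10000;
}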
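For comparison, a C version of the same buffer trick has to track ownership by hand. This is a minimal sketch under stated assumptions: `fill` and its `on_heap` out-parameter are hypothetical, chosen only to show the bookkeeping the D version leaves to the GC. Like the buffer functions that return a slice, the caller uses the returned pointer rather than the buffer it passed in:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: produce n copies of c. The caller offers a buffer
   (often on the stack). If it is big enough it is used as-is and no heap
   allocation happens; otherwise the result is malloc'd. *on_heap tells the
   caller whether it must free the result. */
static char *fill(char *buf, size_t cap, size_t n, char c, int *on_heap)
{
    assert(on_heap != NULL);
    char *out = buf;
    *on_heap = 0;
    if (n > cap) {
        out = malloc(n);
        if (out == NULL)
            return NULL;
        *on_heap = 1;
    }
    memset(out, c, n);
    return out;
}
```

The extra flag (and the conditional free it forces on every caller) is the part that disappears when the GC owns the resized buffer.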

>> No matter what you do with slice appending in D, you cannot access
>> dangling pointers unless you access the slice's ptr field.


> Again, that's only because D is GCed, not because slices are great.

We are comparing realloc to slices. You assert slices are worse, and I gave you a reason why they are not. You can't just dismiss that reason because it involves the GC; comparing slices to realloc necessarily means taking the GC into account.

>> Yes, you can run into trouble if you append to a slice and then
>> change the original data in the slice, but that is a rare event, and
>> it won't result in corrupting memory you didn't already have access
>> to (at least, with the MRU cache).

> I'm a little lost now. I don't know which hypothetical D you are talking
> about.

Here's the deal. You have two very common operations on an array: appending and modification. In practice you either append data or you modify data. Rarely do you append data *and* modify the original data, and the only case I've seen for that is the buffer optimization trick above. This is the only case where slices might give you "non-deterministic" behavior (and only if you still want to use the original, pre-append array afterwards). I have no proof of this except my experience reading Tango code.
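A small C model of that non-determinism, assuming the code can see how much spare room follows the slice in its block (in D the runtime hides this). `append_then_modify` is a hypothetical name: an alias appends one byte and then writes to the "original part", and whether the original slice sees the write depends only on whether the append happened in place:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: `a` is the original slice, `b` an alias of it (e.g. a
   function parameter). b appends one byte, then writes to b[0]. Returns what
   a[0] looks like afterwards. */
static char append_then_modify(size_t block_cap)
{
    assert(block_cap >= 3);
    char *block = malloc(block_cap);
    if (block == NULL)
        return '\0';
    memcpy(block, "abc", 3);

    char *a = block;                 /* original slice: first 3 bytes */
    char *b = a;                     /* alias with the same extent */
    size_t blen = 3;

    if (blen < block_cap) {          /* room after the slice: grow in place */
        b[blen++] = 'd';
    } else {                         /* no room: copy to a new block */
        char *nb = malloc(blen + 1);
        if (nb == NULL) {
            free(block);
            return '\0';
        }
        memcpy(nb, b, blen);
        nb[blen++] = 'd';
        b = nb;
    }

    b[0] = 'X';                      /* modify the "original part" via b */
    char seen = a[0];
    if (b != block)
        free(b);
    free(block);
    return seen;
}
```

With spare room the original slice sees `'X'`; with an exactly sized block it still sees `'a'`. Code that only appends, or only modifies, never observes the difference.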

The two major use cases for appending are building an array of items using append, and using an array as a buffer. In the first case, you start out with an empty or fully owned array, so there's no harm. In the second case, there are actually two types of functions that accept buffers: ones that return a slice of the buffer, and ones that don't. With the ones that return a slice, you use the slice, not the original buffer. With the ones that don't return a slice, you use the original buffer if you expect important data to be there.

I don't think there's ever a case where you see a function that takes an array, appends to it, then modifies the original part of the array *without* returning the result.

> I can't see how the MRU cache can provide any safety. The cache is
> finite, and not all slices will fit in it, so for the slices that are
> not cached, I don't see how it helps.

In those cases, the slice is reallocated because the runtime isn't sure whether appending in place would result in stomping. The MRU cache is an optimization that defaults to reallocation for safety.

Andrei has mentioned that he thinks we can store the allocated length in the GC block, which I think would also work. You also wouldn't need an MRU cache in that case, but he says it's in *addition* to the MRU cache, so I'm not sure what he means.

> Safety can be provided if a ~= b is defined to be semantically the same as
> a = a ~ b, leaving the MRU cache as an optimization.

Yes, exactly. That is what the MRU cache does. It's no good if the current behavior becomes the fallback; the fallback must be a reallocation.
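A minimal C sketch of that rule, assuming a used-length field in the block header along the lines of storing the allocated length in the GC block (the MRU cache, one way to find that information quickly, is not modeled). `gc_block`, `slice`, and `safe_append` are hypothetical names, not the actual runtime implementation. In-place growth is only an optimization; every other case reallocates:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical GC block header recording how much of the block is in use. */
typedef struct {
    size_t cap;
    size_t used;
    char data[];                     /* flexible array member (C99) */
} gc_block;

/* Hypothetical slice: owning block (NULL for non-heap memory), ptr, len. */
typedef struct {
    gc_block *blk;
    char *ptr;
    size_t len;
} slice;

static gc_block *block_new(size_t cap)
{
    gc_block *b = malloc(sizeof *b + cap);
    if (b != NULL) {
        b->cap = cap;
        b->used = 0;
    }
    return b;
}

/* s ~= c with stomping protection. Extend in place only when the slice ends
   exactly at the block's used length (so no other slice can already be using
   the bytes after it) and there is room. Otherwise reallocate, which is
   observably the same as s = s ~ c. Old blocks are left for the (absent) GC. */
static int safe_append(slice *s, char c)
{
    assert(s != NULL);
    gc_block *b = s->blk;
    if (b != NULL) {
        size_t end = (size_t)(s->ptr - b->data) + s->len;
        if (end == b->used && end < b->cap) {
            s->ptr[s->len++] = c;
            b->used++;
            return 0;
        }
    }
    gc_block *nb = block_new(2 * (s->len + 1));
    if (nb == NULL)
        return -1;
    if (s->len > 0)
        memcpy(nb->data, s->ptr, s->len);
    nb->data[s->len] = c;
    nb->used = s->len + 1;
    s->blk = nb;
    s->ptr = nb->data;
    s->len++;
    return 0;
}
```

With two aliases `a` and `b` of the same two bytes, appending to `a` extends the block in place; appending to `b` then sees that its slice no longer ends at the used length and copies, instead of overwriting the byte `a` just appended.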

> In that case we agree slices are predictable and a little safer (because
> stomping is not possible). But they are still error-prone if you expect them
> to be a full reference type. Since slices are the only entity in D with such
> semantics, this is something one can forget very easily, introducing subtle
> bugs. In this case, I really think providing ~= is a bad idea; it's just too
> error-prone and doesn't give you anything.

Providing simple syntax for simple cases is what this does. I think it's worth having in the language.

Let's all face it: slices and arrays being the same thing is a new concept that most of us have only seen in D (I don't know many languages, but it's probably been done before). And the hybrid value/reference nature of a slice is definitely new to me. I think it provides great power and flexibility that make code just work the way you want it to *in most cases*. If you read a lot of the Tango code, you see how elegant things can be with slices. But it's always going to be confusing to newcomers. It's like explaining pointers to someone for the first time.


> I still think the best option is to just make slices immutable value types
> (you can mutate the data they point to, you just can't modify the slices
> themselves, i.e. ptr and length), and provide a proper dynamic array type
> in the library or something.

I think you mean slices should only be *shrinkable*; immutable makes no sense, since you need to be able to rebind a slice.


I don't see the harm in allowing appending as long as it's safe.

Having a separate type in the library that optimizes appending and has complete reference semantics is fine with me; just leave slices and arrays alone (well, except for the stomping fix). They *can* live together in the same language.


-Steve
