On Mon, 23 Nov 2009 18:34:48 -0500, Leandro Lucarella <llu...@gmail.com> wrote:

Steven Schveighoffer, el 23 de noviembre a las 15:18 me escribiste:
On Mon, 23 Nov 2009 11:10:48 -0500, Leandro Lucarella <llu...@gmail.com>
wrote:
>
>The thing is, with realloc() is less likely that you forget that the data >can be copied because it returns the new pointer (that can be the same as >the original pointer). And even in this case, I saw a lot of bugs related
>to realloc() misuse (and I made a couple myself).
>
>With slices is much worse.

realloc is worse.  If I have multiple aliases to the same data, then
I realloc one of those, the others all can become dangling pointers
if the runtime decides to move the data.

Well, you are comparing GC vs no-GC, not realloc() vs. slices. I have no
intention on following that discussion, I'm just saying that realloc() is
less error-prone than slices (and realloc() is error-prone already).

I was originally saying that realloc is similar to appending because it may or may not choose to move the data. The only difference from realloc in slices is that your other aliases to the old data don't become invalidated, so there are no dangling pointers. The GC is partly the cause for that, but also the fact that slices don't deallocate the original data (that would be possible, even in a GC'd environment).

Your assertion that realloc is less error prone doesn't make any sense to me. realloc is as error prone as appending slices *and* it can also leave dangling pointers.

You also cannot realloc data that's not malloc'd but you can append to
a slice of non-heap data without issue.

How is that? AFAIK slices uses gc_realloc() to do the actual realloc, if
that's done in a piece of malloc'ed data it will fail.

I believe step 1 of the append process is to find the GC heap block that contains the slice. If it doesn't, it simply copies the data to a new block. If the original data is stack-allocated or allocated outside the GC, no errors occur AFAIK.

And even if it were
true, I don't really see this as a big source of bugs, I really never had
a bug because I tried to realloc() a non-heap piece of memory or appending
to a slice of non-GC-heap memory either.

You probably didn't have such bugs because you probably didn't think of doing this in C:

void foo(char *arg)
{
   realloc(arg, 10000);
}

Oops, what if arg isn't malloc'd? But in D this is safe code, and works as you expect, no matter what size arg was to begin with. In fact, tango uses this all the time for buffer optimization. Basically, you pass in a stack buffer, if it's big enough, you save the penalty of heap allocation. If it's not, no big deal, just resize the buffer and the GC takes care of the rest.

void foo(char[] arg)
{
   arg.length = 10000;
}


No matter what you do with slice appending in D, you cannot access
dangling pointers unless you access the slice's ptr field.

Again, that's only because D is GCed, not because slices are great.

We are comparing realloc to slices. You assert slices are worse and I gave you a reason why they are not. You can't just ignore that because comparing slices to realloc requires taking the GC into account.


Yes, you can run into trouble if you append to a slice and then
change the original data in the slice, but that is a rare event, and
it won't result in corrupting memory you didn't already have access
to (at least, with the MRU cache).

I'm a little lost now. I don't know of what hypothetical D are you talking
about.

Here's the deal. You have two very common operations on an array: appending and modification. In practice you either append data or you modify data. Rarely do you append data *and* modify the original data, and the only case I've seen for that is for the buffer optimization trick above. This is the only case where slices might get you "non-deterministic" behavior (and that's only if you still want to use the original array before appending). I have no proof of this except for my experience reading tango code.

The two major use cases for appending are, building an array of items using append, and using an array as a buffer. In the first case, you start out with an empty or fully owned array, so no harm. In the second case, there are actually two types of functions that accept buffers, ones that return a slice of the buffer, and ones which don't. The ones that return a slice, you use the slice, not the original buffer. The ones which don't return a slice, you use the original buffer if you expect important data to be there.

I don't think there's ever a case where you see a function that takes an array append to the array, then modify the original part of the array *without* returning the result.

I can't see how the MRU cache can provide any safety. The cache is
finite, and not all the slices will fit in it, so for those slices that
are not cached, I don't see how the cache can provide any safety.

In those cases, the slice is reallocated because the runtime isn't sure whether it will result in stomping. The MRU cache is an optimization which defaults to reallocation for safety.

Andrei has mentioned that he thinks we can store the allocated length in the GC block, which I think would also work. You also wouldn't need an MRU cache in that case, but he says it's in *addition* to the MRU cache, so I'm not sure what he means.

Safety can be provided if a ~= b is defined to be semantically the same as
a = a ~ b, and leaving the MRU cache as an optimization.

Yes, exactly. That is what the MRU cache does. It's no good if the current behavior becomes the fallback, the fallback must be a reallocation.

In that case we
agree slices are predictable and a little safer (because stomping is not
possible). But they are still error prone if you expect them to be a full
reference type. Being the only entity in D with such semantics, is
something one can forget very easily and introduce subtle bugs. In this
case, I really think providing ~= is a bad idea, it's just too error prone
and doesn't give you anything.

Providing the simple syntax for simple cases is what this does. I think it's worth having in the language.

Let's all face it: slices and arrays being the same thing is a new concept that most of us have only seen in D (I don't know many languages, but probably it's been done before). And the hybrid value/reference nature of a slice is definitely new to me. I think it provides great power and flexibility that makes code just work the way you want it to *in most cases*. If you read a lot of the tango code, you see how elegant things can be with slices. But it's always going to be confusing to newcomers. It's like explaining pointers for the first time to someone.


I still think the best is to just make slices immutable value types (you
can mutate the data they point to, you just can't modify slices; ptr and
length), and provide a proper dynamic array type in the library or
something.


I think you mean slices should only be *shrinkable*, immutable makes no sense, you need to be able to rebind a slice.

I don't see the harm in allowing appending as long as it's safe.

Having a separate type in the library that optimizes appending and has complete reference semantics is fine with me, just leave slices and arrays alone (well, except the fix for stomping). They *can* live together in the same language.

-Steve

Reply via email to