On Tue, 09 Nov 2010 15:13:55 -0500, Pillsy <pillsb...@gmail.com> wrote:

Steven Schveighoffer Wrote:

On Tue, 09 Nov 2010 08:14:40 -0500, Pillsy <pillsb...@gmail.com> wrote:
[...]
> Ah! This is a lot of what was confusing me about arrays; I still thought
> they had this behavior. The fact that they don't makes me a good deal
> more comfortable with them, though I still don't like the
> non-deterministic way that they may copy their elements or they may
> share structure after you append stuff to them.

As I said before, this rarely affects code.  The common cases I've seen:

1. You append to an array and return it.
2. You modify data in the array.
3. You use a passed-in array as a buffer, which means you overwrite the
array, and then start appending when it runs out of space.

I don't ever remember seeing:

You append to an array, then go back and modify the first few bytes of the
array.
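For concreteness, here is a minimal sketch of that uncommon pattern. The function name appendThenPoke is made up for illustration. Whether the caller sees the write depends on whether the append happened in place, which the code checks rather than assumes:

```d
// Hypothetical helper: append to a slice, then go back and modify element 0.
int[] appendThenPoke(int[] a)
{
    int[] b = a;   // b starts out referring to the same elements as a
    b ~= 4;        // may extend in place, or may move b to a new block
    b[0] = 99;     // the "go back and modify" step
    return b;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = appendThenPoke(a);

    assert(b == [99, 2, 3, 4]);
    assert(a.length == 3);  // the caller's length is never changed

    // The caller observes the write only if b still aliases a.
    bool stillShared = a.ptr is b.ptr;
    assert((a[0] == 99) == stillShared);
}
```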

I've certainly encountered situations in at least one other language where standard library functions return mutable arrays that may or may not share structure with their inputs. This has been such a frequent source of pain when using that language that I tend to react very negatively to the possibility in any context.

Care to name names? I want to understand this dislike of D arrays, because out of all the languages I've ever used, D arrays are by far the easiest and most intuitive to use. I don't expect to be convinced, but at least we can have some debate on this, and maybe we can avoid mistakes made by other languages.

Let's assume this is a very common thing and absolutely needs to be
addressed.  What would you like the behavior to be?

Using a different, library type for a buffer you can append to. I think of "a buffer or abstract list you can cheaply append to" as a different sort of type from a fixed size buffer anyway, since it so often is a different type. Arrays/slices are a very basic type in D, and I'm generally thinking that giving your basic types simpler, easier to understand semantics is worth paying a modest cost.

There was a time when the T[new] idea was expected to be part of the language. Both Andrei and Walter were behind it, and seldom does something not make it into the language when that happens.

It turns out that, after all the academic and theoretical discussions were finished and it came time to implement, it was a clunky and confusing feature. Andrei said that for TDPL he had a whole table dedicated to what type to use in which cases (T[] or T[new]) and he didn't even know how to fill out the table.

The beauty of D's arrays is that the slice and the array are both the same type, so you only need to define one function to handle both, and appending "just works". I feel like this is simply a case of 'not well enough understood.'
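A small sketch of the "one function handles both" point. The name withSentinel is hypothetical; the same function is called with a whole array and with a slice from the middle of it:

```d
// Hypothetical helper: appends a marker value and returns the result.
int[] withSentinel(int[] data)
{
    data ~= -1;   // works whether data is a whole array or a slice of one
    return data;
}

void main()
{
    int[] whole = [1, 2, 3, 4];
    int[] part  = whole[1 .. 3];   // a slice into the middle of whole

    assert(withSentinel(whole) == [1, 2, 3, 4, -1]);
    assert(withSentinel(part)  == [2, 3, -1]);

    // Appending to the slice did not overwrite whole[3].
    assert(whole == [1, 2, 3, 4]);
}
```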

BTW, you can allocate a fixed buffer by doing:

T[BUFSIZE] buffer;

This cannot be appended to. It is still difficult to allocate one of these on the heap, which is a language shortcoming, but it can be fixed.
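As a rough illustration (BUFSIZE's value and the function name growFromFixed are chosen only for the example): the fixed-size buffer itself rejects ~= at compile time, while a slice of it can still be appended to, at the cost of a copy to the GC heap:

```d
enum BUFSIZE = 4;   // illustrative size

// Appending directly to a fixed-size array does not compile.
static assert(!__traits(compiles, { int[BUFSIZE] b; b ~= 5; }));

// Hypothetical helper: slice a stack buffer, then append to the slice.
int[] growFromFixed()
{
    int[BUFSIZE] buffer = [1, 2, 3, 4];

    int[] s = buffer[];          // slice of stack memory
    assert(s.capacity == 0);     // no room to append in place

    s ~= 5;                      // elements are copied to a new GC block
    assert(s.ptr !is buffer.ptr);
    assert(buffer == [1, 2, 3, 4]);   // the fixed buffer is untouched

    return s;                    // safe: s no longer points at the stack
}

void main()
{
    assert(growFromFixed() == [1, 2, 3, 4, 5]);
}
```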

[...]
IMO, the benefits of just being able to append to an array any time you
want without having to set up some special type far outweigh this little
quirk that almost nobody encounters.  You can append to *any* array, no
matter where the data is located, or whether the data is a slice, and it
just works.  I can't see how anyone would prefer another solution!

There's a difference between appending and appending in place. The problem with not appending in place (and, of course, with arrays not being able to hold a reserve larger than their current length) is one of efficiency. Having

auto s = "foo";
s ~= "bar";

result in a new array being allocated that is of length 6 and contains "foobar", and assigning that array to `s`, is obviously useful and desirable behavior. If the expansion can happen in place, that's a perfectly reasonable performance optimization to have in the case of strings or other immutable arrays. Indeed, one of the reasons that functional programming and GC go together like peanut butter and jelly is that together they let you get all sorts of wins in terms of efficiency from shared structure.
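A minimal sketch of the string case (appendBar is a made-up name). Because the characters are immutable, the result is the same whether or not the runtime extends in place: other references to "foo" never see a change.

```d
// Hypothetical helper: appends "bar" to its own copy of the slice.
string appendBar(string s)
{
    s ~= "bar";
    return s;
}

void main()
{
    string original = "foo";
    string result = appendBar(original);

    assert(result == "foobar");
    assert(result.length == 6);
    assert(original == "foo");   // the original slice still reads "foo"
}
```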

However, I've found from working with languages that mix a lot of imperative and functional constructs (Lisp is one, but not the only one) that if you're going to do this, it's really very important that there not be any doubt about when mutable state is shared and when it isn't. D is trying to be that same kind of multi-paradigm language. This means that, for mutable arrays, having

int[] x = [1, 2, 3];
x ~= [4, 5, 6];

To leave no doubt about whether this reallocates or not, try:

bool willReallocate = x.length + 3 > x.capacity;
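Here is that check wrapped into a small sketch. The helper appendStaysInPlace is hypothetical, and reserve is used only to make the demonstration deterministic. It asserts just the direction that is certain: if the capacity is sufficient, the data does not move.

```d
// Hypothetical helper: appends and reports whether the data stayed put.
bool appendStaysInPlace(ref int[] x, const(int)[] extra)
{
    immutable willReallocate = x.length + extra.length > x.capacity;
    auto before = x.ptr;

    x ~= extra;

    // With enough capacity, the append is done in place.
    if (!willReallocate)
        assert(x.ptr is before);
    return x.ptr is before;
}

void main()
{
    int[] x = [1, 2, 3];
    x.reserve(16);               // ask for room for at least 16 elements
    assert(x.capacity >= 16);

    assert(appendStaysInPlace(x, [4, 5, 6]));
    assert(x == [1, 2, 3, 4, 5, 6]);
}
```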

But I still don't understand this concept. If you find out it's not going to reallocate, what are you going to do? I mean, you have three cases here:

1. You *don't* want it to reallocate -- well, you can't enforce this, but you can use ref to ensure the original is always affected
2. You *want* it to reallocate -- use dup or ~
3. You don't care -- just use the array directly

I don't see how these three options aren't enough.
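The three options might look like this in code. The function names are invented for the example, and ~ is relied on to always produce a fresh array:

```d
// Case 1: the caller's array must be the one that grows -- take it by ref.
void appendShared(ref int[] a, int v) { a ~= v; }

// Case 2: you want a separate array -- ~ always allocates a new one.
int[] appendCopy(int[] a, int v) { return a ~ v; }

// Case 3: you don't care -- append and use whatever comes back.
int[] appendDontCare(int[] a, int v) { a ~= v; return a; }

void main()
{
    int[] a = [1, 2, 3];

    appendShared(a, 4);
    assert(a == [1, 2, 3, 4]);       // the caller's slice grew

    int[] c = appendCopy(a, 5);
    assert(c.ptr !is a.ptr);         // distinct storage
    c[0] = 99;
    assert(a[0] == 1);               // so writes to c never reach a

    int[] d = appendDontCare(a, 6);
    assert(d == [1, 2, 3, 4, 6]);
    assert(a.length == 4);           // the caller's length is unchanged
}
```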

maybe reallocate and maybe not seems like it's only really there to protect people from doing inefficient things by accident when they append onto the back of an array repeatedly (or to make that admittedly common case more convenient). This really doesn't strike me as worth the trouble. Like I said elsewhere, the uncertainty gives me the screaming willies.

I hear you, but at the same time, we are talking about common and uncommon cases here. D (at least in my mind) tries to be a practical language -- make the common things easy as long as they are safe. And the cases where D's arrays may surprise you are pretty uncommon IMO.

-Steve
