On 8/14/06, Smylers wrote:
> David Green writes:
> > I guess my problem is that [1,2] *feels* like it should === [1,2].
> > You can explain that there's this mutable object stuff going on, and I
> > can follow that (sort of...), but it seems like an implementation
> > detail leaking out.

> The currently defined behaviour seems intuitive to me, from a starting point of Perl 5.

But is Perl 5 the best place to start? It's something many of us are used to, but that doesn't mean it's the best solution conceptually, even if it was the most reasonable way to implement it in P5.

The reason I think it's an implementation wart is that an array -- thought of as a single, self-contained lump -- is different from a reference or pointer to some other variable. Old versions of Perl always eagerly exploded arrays, so there was no way to refer to an array as a whole; put two arrays together and P5 (or P4, etc.) treats them as one big array or list. When references were introduced later, "array-refs" provided a way to encapsulate arrays so we could work with them as single lumps. It's not the most elegant solution, but being able to nest data structures at all was a tremendous benefit, and it was backwards-compatible.

P6 doesn't have to be that backwards-compatible -- it already isn't. P6 more naturally treats arrays as lumps; this may or may not be *implemented* using references as in P5, but it doesn't have to -- or at least, it doesn't have to *look* as though that's how it's doing it. Conceptually, an array consisting only of constant literals, like (1,2,3), isn't referring to anything, so it doesn't need to behave that way.

> The difference between:
>   my $new = [EMAIL PROTECTED];
> and:
>   my $new = [EMAIL PROTECTED];
>
> is that the second one is a copy; square brackets always create a new anonymous array rather than merely referring to an existing one, and that's the same thing that's happening here. Think of square brackets as meaning something like Array->new and each one is obviously distinct.

I agree that [EMAIL PROTECTED] should be distinct from [EMAIL PROTECTED] -- in the former case, we're deliberately taking a reference to the @orig variable. What I don't like is that [EMAIL PROTECTED] is distinct from [EMAIL PROTECTED] -- sure, I'm doing something similar to Array->new(1,2) followed by another Array->new(1,2), but I still want them to be the same, just as I want Str->new("foo") to be the same as Str->new("foo"). They're just constants, they should compare equally regardless of how I created them. (And arrays should work a lot like strings, because at some conceptual level, a string is an array [of characters].)

> > And I feel this way because [1,2] looks like it should be platonically
> > unique.

> I'd say that C< (1, 2) > looks like that. But C< [1, 2] > looks like it's its own thing that won't be equal to another one.

Except [1,2] can look like (1,2) in P6 because it automatically refs/derefs stuff so that things Just Work. That's good, because you shouldn't have to be referencing arrays yourself (hence my point above about an array conceptually being a single lump). But if we're going to hide the [implementational] distinction in some places, we should hide it everywhere.

Actually, my point isn't even about arrays per se; that's just the implementation/practical side of it. You can refer to a scalar constant too:
        perl -e 'print \1, \1'
        SCALAR(0x8104980)SCALAR(0x810806c)

They're different because the *references* are different, but I don't care about that. A reference to a constant value is kind of pointless, because the value isn't going to change. References to *variables* are useful, because you never know what value that variable might have, and refs give you a pointer to the current value of the variable at any time.

The fact that it's even possible to take a reference to a literal is kind of funny, really; but since in P5 you had to be explicit about (de)referencing, it didn't hurt, and you could maybe even find some cute ways to take advantage of it (such as an easy way to get unique IDs out of the str/numification of a ref?). P6 just lets you gloss over certain ref/deref distinctions that in a perfect world wouldn't have existed in the first place.

Leibniz's "identity of indiscernibles" is a perfectly practical principle to pursue in programming. Now [EMAIL PROTECTED] may be discernible from [EMAIL PROTECTED] or [1, @orig] from [1, @other], but \1 is completely the same as \1 in all ways -- all ways except for being able to get a representation of its memory location. And that's not anything about "1", that's a bit of metadata about the reference itself -- something that definitely is based on the implementation.

(I can imagine some other implementation where in a ridiculous attempt to optimise for minimal memory footprint, everything with a value of 1 points to the same address. When I say "$a=1; $a++", $a first points to 0x1234567, and when I increment it, I don't change the bits in that location, instead $a changes to point to address 0x3456789, where my unique 2 value is stored. Then the only way to differentiate \1 from \1 is to generate some arbitrary unique ID. Which would be silly.)

Anyway, I hope I'm making sense about why \1 !=== \1, etc. seems a bit unnatural to me.


-David
