On Monday, December 20, 2010 10:44:12 doubleagent wrote:
> > Are you 100% sure that you are running this version
>
> I have to be. There are no other versions of phobos on this box and 'which
> dmd' points to the correct binary.
>
> > dictionary[word.idup] = newId;
>
> That fixes it.
>
> > The 'word' array is mutable and reused by byLine() on each iteration. By
> > doing the above you use an immutable copy of it as the key instead.
>
> I REALLY don't understand this explanation. Why does the mutability of
> 'word' matter when the associative array 'dictionary' assigns keys by
> value...it's got to assign them by value, right? Otherwise we would only
> get one entry in 'dictionary' and the key would be constantly changing.

Okay. I don't know what the actual code looks like, but word is obviously a
dynamic array, and if it's from byLine(), then that dynamic array is mutable -
both the array itself and its elements. Using idup gets you an immutable copy.

When copying dynamic arrays, you really get a slice of that array. So, you get
an array that points to the same data as the original. Any changes to the
elements in one affect the other. If you append to one of them and it doesn't
have the space to resize in place, or you do anything else which could cause it
to reallocate, then that array is reallocated, they no longer point to the
same data, and changing one will not change the other.

If the elements of the array are const or immutable, then the fact that the two
arrays point to the same data isn't a problem because the elements can't be
changed (except in cases where you're dealing with const rather than immutable
and another array points to the same data but doesn't have const elements). So,
assigning one string to another, for instance (string being an alias for
immutable(char)[]), will never result in one string altering another. However,
if you're dealing with char[] rather than string, one array _can_ affect the
elements of another. I believe that byLine() deals with a char[], not a string.

Now, as for associative arrays, they don't really deal with const correctly. I
believe that they're actually implemented with void* and you can actually do
things like put const elements in them in spite of the fact that toHash() on
Object is not currently const (there is an open bug on the fact that Object is
not const-correct). So, it does not surprise me in the least if it will take
mutable types as its key and then allow them to be altered (assuming that
they're pointers or reference types and you can therefore have other references
to them).
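
To make the sharing concrete, here is a minimal sketch. It is not the original program: Key is a hypothetical struct that stands in for a mutable char[] key, and the write to word[0] only simulates byLine() overwriting its buffer on the next iteration.

```d
import std.stdio : writeln;

// Hypothetical key type wrapping a mutable slice. It stands in for the
// char[] key of the original program, which I have not seen.
struct Key
{
    char[] data;

    // Hash the characters the slice currently points at.
    size_t toHash() const @safe pure nothrow
    {
        size_t h = 0;
        foreach (c; data)
            h = h * 31 + c;
        return h;
    }

    // Compare by contents, as == on arrays does.
    bool opEquals(ref const Key rhs) const
    {
        return data == rhs.data;
    }
}

void main()
{
    char[] word = "apple".dup;    // mutable buffer, like the one byLine() hands out
    char[] sameData = word;       // copies pointer and length, not the elements
    string snapshot = word.idup;  // independent immutable copy

    int[Key] sharedKeys;
    sharedKeys[Key(word)] = 1;    // the stored key still points into word's buffer

    int[string] dictionary;
    dictionary[snapshot] = 1;     // the stored key can no longer change

    word[0] = 'A';                // simulates the buffer being reused

    writeln(sameData);            // Apple: the other slice sees the write
    writeln(snapshot);            // apple: the idup copy does not

    auto probe = Key("apple".dup);
    // The entry was filed under the hash of "apple", but the stored key now
    // compares as "Apple", so the == step of the lookup fails.
    writeln((probe in sharedKeys) !is null);     // false
    writeln(("apple" in dictionary) !is null);   // true
}
```

The entry in sharedKeys is still there, but it can no longer be found under the value it was inserted with, which matches the symptom being described.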
But to fix the problem in this case would require immutability rather than
const, because you're dealing with a reference type (well, pseudo-reference
type, since dynamic arrays share their elements such that changes to their
elements affect all arrays which point to those elements, but other changes -
such as altering their length - don't affect other arrays and will even likely
result in the arrays then being completely separate).

> The behavior itself seems really unpredictable prior to testing, and really
> unintended after testing. I suspect it's due to some sort of a bug. The
> program, on my box anyway, only fails when we give it identical strings,
> except one is prefixed with a space. That should tell us that 'splitter'
> and 'strip' didn't do their job properly. The fly in the ointment is that
> when we output the strings, they appear as we would expect.
>
> I suspect D does string comparisons (when the 'in' keyword is used) based
> on some kind of a hash, and that hash doesn't get correctly updated when
> 'strip' or 'splitter' is applied, or upon the next comparison or whatever.
> Calling 'idup' must force the hash to get recalculated. Obviously, you
> guys would know if there's any merit to this, but it seems to explain the
> problem.

'in' should use toHash() (or whatever built-in functions exist for built-in
types if you're not dealing with a struct or class) followed by ==. I'd be
stunned if there were any caching involved. The problem is that byLine() is
using a mutable array, so the elements pointed to by the array that you just
put in the associative array changed, which means that the hash for them is
wrong, and == will fail when used to compare the array to what it was before.

> > The advantage with splitter is that it is lazy and therefore more
> > efficient. split() is eager and allocates memory to hold the string
> > fragments.
>
> Yeah, that's what I thought would be the answer. Kudos to you guys for
> thinking of laziness out of the box.
> This is a major boon for D.
>
> You know, there's something this touches on which I was curious about. If
> D defaults to 'safety first', and with some work you can get
> down-to-the-metal, why doesn't the language default to immutable
> variables, with an explicit modifier for mutable ones? C compatibility?

C compatibility would be one reason. Familiarity would be another. Also, it
would be _really_ annoying to have to mark variables mutable all over the place
as you would inevitably have to do. The way that const and immutable are
designed in D, to some extent, you can pretty much ignore them if you don't
want to use them, which some folks like Andrei deem important. Making immutable
the default would force it on everyone.

- Jonathan M Davis