Thanks for the feedback guys.
Based on the discussion I propose a different set of changes:
1.
shuffle
^ self shuffleBy: Random new
2.
shuffled
^ self copy shuffle
3. remove #shuffledBy: (if you're specific enough to use a custom Random
instance you can also create a copy of the collection yourself if you want to)
I leave the matter of renaming the methods for further discussion. Personally
I'm for more intention revealing selectors; I quite like #shuffle and
#shuffleInPlace. If we can agree on a different naming scheme we should then
apply it to other methods aswell of course (e.g. #sort, #sortInPlace).
Also open for discussion is the use of Collection>>randomForPicking and
Collection>>mutexForPicking in other methods (such as #atRandom). I think it
shouldn't be too big a problem to make those methods use individual Random
instances and to remove the two class variables from Collection.
Cheers,
Max
On 24.09.2013, at 02:33, [email protected] wrote:
> Nicolas Cellier wrote:
>>
>> The reason why I don't like the explicit "copy", is because this is the
>> default behaviour.
>> In place mutation is an (evil) exception.
>> I'm personnally fine with current conventions:
>> - sort, shuffle, reverse, replace: are imperative, thus you are telling
>> collection, sort yourself !
>> - sorted clearly sounds as a sorted copy for me.
>> If ever a change was to be made, I would much prefer this to happen on
>> opposite naming, sort -> inPlaceSort, or sortYourself :)
>> It's like telling the programmer, hey - a stateful mutation is really what
>> you want, is it?
>>
>
> Conventions should encourage good programming practice. Although it makes
> perfect sense on hearing it, I was explicitly not aware of the convention
> between 'sort' and 'sorted'. For me, the small difference in letters used
> between the two doesn't provide a wide enough distinction. Also, regarding
> not mutating state, I guess I knew that in the back of my head, but it has
> not generally been an explicit concern of mine while programming. Convention
> should encourage good programming. So I like the suggestion of inPlaceSort &
> sortYourself. As well as being more intention revealing, the less desirable
> methods are made less attractive by using a longer name. Perhaps another
> naming convention could be mutatedSort, or mutSort as a hint that the
> programmer should consider using these in conjunction with mutexes, or at own
> risk.
>
> cheers -ben
>
>
>
>>
>> 2013/9/23 Stéphane Ducasse <[email protected]>
>>
>>
>>> +1
>>> Now I understand the point of max with the name sort/sorted because this
>>> is not explicit to me too :)
>>> I laways have to think.
>>>
>>> On Sep 23, 2013, at 6:28 PM, Nicolas Cellier <
>>> [email protected]> wrote:
>>>
>>> 100% agree.
>>> Do it right > do it fast.
>>> We must not turn usage of the library into something fragile for the sake
>>> of speed.
>>> We already make the code itself fragile more often than not in term of
>>> complexity (harder to understand/test/change).
>>>
>>> Especially, introduction of shared mutable states (global, class vars,
>>> singleton or any other form) should ring an alarm in reviewers head
>>> (This is some very old Squeak code in this case, so Squeakers are to
>>> blame, but we're all in same bath).
>>>
>>>
>>>
>>> 2013/9/23 Henrik Johansen <[email protected]>
>>>
>>>
>>>> On Sep 23, 2013, at 3:33 , Max Leske <[email protected]> wrote:
>>>>
>>>>
>>>>>> If a user is going to modify the same object concurrently, he/she
>>>>>>
>>>> takes care of mutual exclusion.
>>>>
>>>>> Agreed. BUT: the Random object used by these methods is the same one
>>>>>
>>>> that is used by #atRandom for instance, hence the race condition. There is
>>>> no way anyone can safely use these methods without the mutex, single
>>>> threaded or not. Calls to methods using that same Random object can be all
>>>> over the place and also in the base system.
>>>>
>>>>
>>>> It seems to me an existing Random instance is used in this case mostly*
>>>> for performance.
>>>> One could argue that since the Random in this case is used for a bulk
>>>> operation, for which the object creation cost is largely amortized for
>>>> collection sizes > 20, it's acceptable to instead use Random new by
>>>> default, which wouldn't suffer from the same race conditions.
>>>> While still slower than a mutex-protected version for single-threaded
>>>> code, it would also scale correctly if the users (and vm) are actually
>>>> multi-threaded.
>>>>
>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffle] bench '208,000 per second.'
>>>> '222,000 per second.' '223,000 per second.'
>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleWithMutex] bench '188,000 per
>>>> second.' '186,000 per second.''184,000 per second.'
>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleNewRandom] bench '167,000 per
>>>> second.' '166,000 per second.' '167,000 per second.'
>>>>
>>>> Cheers,
>>>> Henry
>>>>
>>>> * Low seed entropy is another issue, but if purely random shuffling is a
>>>> critical requirement, one shouldn't use the default Random generator
>>>> anyways...
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>