shuffle

Max Leske Sat, 28 Sep 2013 02:01:26 -0700

Thanks for the feedback guys.

Based on the discussion I propose a different set of changes:


1.
shuffle
        ^ self shuffleBy: Random new

2.
shuffled
        ^ self copy shuffle

3. remove #shuffledBy: (if you're specific enough to use a custom Random 
instance you can also create a copy of the collection yourself if you want to)


I leave the matter of renaming the methods for further discussion. Personally 
I'm for more intention revealing selectors; I quite like #shuffle and 
#shuffleInPlace. If we can agree on a different naming scheme we should then 
apply it to other methods aswell of course (e.g. #sort, #sortInPlace).

Also open for discussion is the use of Collection>>randomForPicking and 
Collection>>mutexForPicking in other methods (such as #atRandom). I think it 
shouldn't be too big a problem to make those methods use individual Random 
instances and to remove the two class variables from Collection.

Cheers,
Max


On 24.09.2013, at 02:33, [email protected] wrote:

> Nicolas Cellier wrote:
>> 
>> The reason why I don't like the explicit "copy", is because this is the
>> default behaviour.
>> In place mutation is an (evil) exception.
>> I'm personnally fine with current conventions:
>> - sort, shuffle, reverse, replace: are imperative, thus you are telling
>> collection, sort yourself !
>> - sorted clearly sounds as a sorted copy for me.
>> If ever a change was to be made, I would much prefer this to happen on
>> opposite naming, sort -> inPlaceSort, or sortYourself :)
>> It's like telling the programmer, hey - a stateful mutation is really what
>> you want, is it?
>>   
> 
> Conventions should encourage good programming practice.  Although it makes 
> perfect sense on hearing it, I was explicitly not aware of the convention 
> between 'sort' and 'sorted'.  For me, the small difference in letters used 
> between the two doesn't provide a wide enough distinction.  Also, regarding 
> not mutating state, I guess I knew that in the back of my head, but it has 
> not generally been an explicit concern of mine while programming.  Convention 
> should encourage good programming.  So I like the suggestion of inPlaceSort & 
> sortYourself.  As well as being more intention revealing, the less desirable 
> methods are made less attractive by using a longer name. Perhaps another 
> naming convention could be mutatedSort, or mutSort as a hint that the 
> programmer should consider using these in conjunction with mutexes, or at own 
> risk.
> 
> cheers -ben
> 
> 
> 
>> 
>> 2013/9/23 Stéphane Ducasse <[email protected]>
>> 
>>   
>>> +1
>>> Now I understand the point of max with the name sort/sorted because this
>>> is not explicit to me too :)
>>> I laways have to think.
>>> 
>>> On Sep 23, 2013, at 6:28 PM, Nicolas Cellier <
>>> [email protected]> wrote:
>>> 
>>> 100% agree.
>>> Do it right > do it fast.
>>> We must not turn usage of the library into something fragile for the sake
>>> of speed.
>>> We already make the code itself fragile more often than not in term of
>>> complexity (harder to understand/test/change).
>>> 
>>> Especially, introduction of shared mutable states (global, class vars,
>>> singleton or any other form) should ring an alarm in reviewers head
>>> (This is some very old Squeak code in this case, so Squeakers are to
>>> blame, but we're all in same bath).
>>> 
>>> 
>>> 
>>> 2013/9/23 Henrik Johansen <[email protected]>
>>> 
>>>     
>>>> On Sep 23, 2013, at 3:33 , Max Leske <[email protected]> wrote:
>>>> 
>>>>       
>>>>>> If a user is going to modify the same object concurrently, he/she
>>>>>>           
>>>> takes care of mutual exclusion.
>>>>       
>>>>> Agreed. BUT: the Random object used by these methods is the same one
>>>>>         
>>>> that is used by #atRandom for instance, hence the race condition. There is
>>>> no way anyone can safely use these methods without the mutex, single
>>>> threaded or not. Calls to methods using that same Random object can be all
>>>> over the place and also in the base system.
>>>> 
>>>> 
>>>> It seems to me an existing Random instance is used in this case mostly*
>>>> for performance.
>>>> One could argue that since the Random in this case is used for a bulk
>>>> operation, for which the object creation cost is largely amortized for
>>>> collection sizes > 20, it's acceptable to instead use Random new by
>>>> default, which wouldn't suffer from the same race conditions.
>>>> While still slower than a mutex-protected version for single-threaded
>>>> code, it would also scale correctly if the users (and vm) are actually
>>>> multi-threaded.
>>>> 
>>>> [#(1 2  3 4 5 6 7 8 9 10 11 12) shuffle] bench '208,000 per second.'
>>>>  '222,000 per second.' '223,000 per second.'
>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleWithMutex] bench '188,000 per
>>>> second.'  '186,000 per second.''184,000 per second.'
>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleNewRandom] bench '167,000 per
>>>> second.' '166,000 per second.'  '167,000 per second.'
>>>> 
>>>> Cheers,
>>>> Henry
>>>> 
>>>> * Low seed entropy is another issue, but if purely random shuffling is a
>>>> critical requirement, one shouldn't use the default Random generator
>>>> anyways...
>>>> 
>>>> 
>>>> 
>>>>       
>>>     
>> 
>>   
>

Re: [Pharo-dev] 11635: Race condition in SequenceableCollection>>shuffle

Reply via email to