shuffle

Max Leske Tue, 01 Oct 2013 08:03:09 -0700

bump...


> Thanks for the feedback guys.
> 
> Based on the discussion I propose a different set of changes:
> 
> 1.
> shuffle
>       ^ self shuffleBy: Random new
> 
> 2.
> shuffled
>       ^ self copy shuffle
> 
> 3. remove #shuffledBy: (if you're specific enough to use a custom Random 
> instance you can also create a copy of the collection yourself if you want to)
> 
> 
> I leave the matter of renaming the methods for further discussion. Personally 
> I'm for more intention revealing selectors; I quite like #shuffle and 
> #shuffleInPlace. If we can agree on a different naming scheme we should then 
> apply it to other methods aswell of course (e.g. #sort, #sortInPlace).
> 
> Also open for discussion is the use of Collection>>randomForPicking and 
> Collection>>mutexForPicking in other methods (such as #atRandom). I think it 
> shouldn't be too big a problem to make those methods use individual Random 
> instances and to remove the two class variables from Collection.
> 
> Cheers,
> Max
> 
> 
> On 24.09.2013, at 02:33, [email protected] wrote:
> 
>> Nicolas Cellier wrote:
>>> 
>>> The reason why I don't like the explicit "copy", is because this is the
>>> default behaviour.
>>> In place mutation is an (evil) exception.
>>> I'm personnally fine with current conventions:
>>> - sort, shuffle, reverse, replace: are imperative, thus you are telling
>>> collection, sort yourself !
>>> - sorted clearly sounds as a sorted copy for me.
>>> If ever a change was to be made, I would much prefer this to happen on
>>> opposite naming, sort -> inPlaceSort, or sortYourself :)
>>> It's like telling the programmer, hey - a stateful mutation is really what
>>> you want, is it?
>>>   
>> 
>> Conventions should encourage good programming practice.  Although it makes 
>> perfect sense on hearing it, I was explicitly not aware of the convention 
>> between 'sort' and 'sorted'.  For me, the small difference in letters used 
>> between the two doesn't provide a wide enough distinction.  Also, regarding 
>> not mutating state, I guess I knew that in the back of my head, but it has 
>> not generally been an explicit concern of mine while programming.  
>> Convention should encourage good programming.  So I like the suggestion of 
>> inPlaceSort & sortYourself.  As well as being more intention revealing, the 
>> less desirable methods are made less attractive by using a longer name. 
>> Perhaps another naming convention could be mutatedSort, or mutSort as a hint 
>> that the programmer should consider using these in conjunction with mutexes, 
>> or at own risk.
>> 
>> cheers -ben
>> 
>> 
>> 
>>> 2013/9/23 Stéphane Ducasse <[email protected]>
>>> 
>>>   
>>>> +1
>>>> Now I understand the point of max with the name sort/sorted because this
>>>> is not explicit to me too :)
>>>> I laways have to think.
>>>> 
>>>> On Sep 23, 2013, at 6:28 PM, Nicolas Cellier <
>>>> [email protected]> wrote:
>>>> 
>>>> 100% agree.
>>>> Do it right > do it fast.
>>>> We must not turn usage of the library into something fragile for the sake
>>>> of speed.
>>>> We already make the code itself fragile more often than not in term of
>>>> complexity (harder to understand/test/change).
>>>> 
>>>> Especially, introduction of shared mutable states (global, class vars,
>>>> singleton or any other form) should ring an alarm in reviewers head
>>>> (This is some very old Squeak code in this case, so Squeakers are to
>>>> blame, but we're all in same bath).
>>>> 
>>>> 
>>>> 
>>>> 2013/9/23 Henrik Johansen <[email protected]>
>>>> 
>>>>     
>>>>> On Sep 23, 2013, at 3:33 , Max Leske <[email protected]> wrote:
>>>>> 
>>>>>       
>>>>>>> If a user is going to modify the same object concurrently, he/she
>>>>>>>           
>>>>> takes care of mutual exclusion.
>>>>>       
>>>>>> Agreed. BUT: the Random object used by these methods is the same one
>>>>>>         
>>>>> that is used by #atRandom for instance, hence the race condition. There is
>>>>> no way anyone can safely use these methods without the mutex, single
>>>>> threaded or not. Calls to methods using that same Random object can be all
>>>>> over the place and also in the base system.
>>>>> 
>>>>> 
>>>>> It seems to me an existing Random instance is used in this case mostly*
>>>>> for performance.
>>>>> One could argue that since the Random in this case is used for a bulk
>>>>> operation, for which the object creation cost is largely amortized for
>>>>> collection sizes > 20, it's acceptable to instead use Random new by
>>>>> default, which wouldn't suffer from the same race conditions.
>>>>> While still slower than a mutex-protected version for single-threaded
>>>>> code, it would also scale correctly if the users (and vm) are actually
>>>>> multi-threaded.
>>>>> 
>>>>> [#(1 2  3 4 5 6 7 8 9 10 11 12) shuffle] bench '208,000 per second.'
>>>>>  '222,000 per second.' '223,000 per second.'
>>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleWithMutex] bench '188,000 per
>>>>> second.'  '186,000 per second.''184,000 per second.'
>>>>> [#(1 2 3 4 5 6 7 8 9 10 11 12) shuffleNewRandom] bench '167,000 per
>>>>> second.' '166,000 per second.'  '167,000 per second.'
>>>>> 
>>>>> Cheers,
>>>>> Henry
>>>>> 
>>>>> * Low seed entropy is another issue, but if purely random shuffling is a
>>>>> critical requirement, one shouldn't use the default Random generator
>>>>> anyways...
>>>>> 
>>>>> 
>>>>> 
>>>>>       
>>>>     
>>> 
>>>   
>> 
>

Re: [Pharo-dev] 11635: Race condition in SequenceableCollection>>shuffle

Reply via email to