Tim Converse wrote:
> --- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote:
> 
>>Dan Kalowsky wrote:
>>
>>>Anyone able to confirm or deny the validity of this
>>
>>patch?
>>
>>>
>>>---------- Forwarded message ----------
>>>
>>>[2002-07-18 01:03:51] [EMAIL PROTECTED]
>>>
>>>The shuffle() function in the CVS is (still) broken. It
>>
>>does not
>>
>>>correctly generate all possible combinations of the
>>
>>array. (I know this
>>
>>>function was recently updated, this test is with new
>>
>>version.)
>>
>>>Here is a diff to fix the problem:
>>>
>>>--- array.c  Thu Jul 18 00:50:54 2002
>>>+++ array.new.c      Thu Jul 18 00:49:35 2002
>>>@@ -1441,7 +1441,7 @@
>>> {
>>>     Bucket **elems, *temp;
>>>     HashTable *hash;
>>>-    int j, n_elems, cur_elem = 0, rnd_idx, n_left;
>>>+    int j, n_elems, rnd_idx, n_left;
>>>
>>>     n_elems = zend_hash_num_elements(Z_ARRVAL_P(array));
>>>
>>>@@ -1457,13 +1457,12 @@
>>>             elems[j++] = temp;
>>>     while (--n_left) {
>>>             rnd_idx = php_rand(TSRMLS_C);
>>>-            RAND_RANGE(rnd_idx, cur_elem, n_left, PHP_RAND_MAX);
>>>-            if (rnd_idx != cur_elem) {
>>>-                    temp = elems[cur_elem];
>>>-                    elems[cur_elem] = elems[rnd_idx];
>>>+            RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX);
>>>+            if (rnd_idx != n_left) {
>>>+                    temp = elems[n_left];
>>>+                    elems[n_left] = elems[rnd_idx];
>>>                     elems[rnd_idx] = temp;
>>>             }
>>>-            cur_elem++;
>>>     }
>>>
>>>     HANDLE_BLOCK_INTERRUPTIONS();
>>>
>>
>>I've send the message to Dan, already.
>>Just posting php-dev again.
>>
>>Top elements have more chance than end elements.
>>All elements should have equal chance to be shuffled.
> 
> 
> Actually, I think the algorithm of the patched version is
> correct.  It's counterintuitive, but it doesn't matter how
> many times any slot has an element shuffled out of it. 

It may not be intuitive that the algorithm is not perfectly
correct. It may be enough for most purposes, though.

> What matters is the invariant that whenever elems[n_left]
> is set, it's being set to an element randomly chosen from
> the subset of elements that are left (including
> elems[n_left]), and also that elems[n_left] is not messed
> with thereafter. 

For example, if the last element isn't shuffled, it's stays
in as the last element since the last element has only 1 chance
to be shuffled.

Current code has tendency that last elements stay as
last.

i.e. Probability of the last element is shuffled is
   (total_num-1)/total_num.
while n th element to be shuffled is obviously has much
more chance than last elements. It may be acceptable
for most purposes.

> 
> You would think that the right thing to do is to set
> elems[n_left] to a random choice from _all_ the elements in
> the array, but this leads to some orderings being favored
> over others.  (Check out Intro. to Algo., Cormen, 2nd ed.,
> p 103)

elemes[n_left] to a random choice does not make sense, of course.

To give equal chances to be shuffled for all elements, we
can write something like

for (current=0; current < num_of_elems; current++) {
   random_index = rand();
   RAND_RANGE(random_index, 0, num_of_elems-1, PHP_RAND_MAX);
   if (array[current] != array[random_index])
     swap(array[current], array[random_index]);
}

Current code is almost ok, but this way is correct.
This does not require any additional overhead at all, also.

--
Yasuo Ohgaki


-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to