Tim Converse wrote: > --- Yasuo Ohgaki <[EMAIL PROTECTED]> wrote: > >>Dan Kalowsky wrote: >> >>>Anyone able to confirm or deny the validity of this >> >>patch? >> >>> >>>---------- Forwarded message ---------- >>> >>>[2002-07-18 01:03:51] [EMAIL PROTECTED] >>> >>>The shuffle() function in the CVS is (still) broken. It >> >>does not >> >>>correctly generate all possible combinations of the >> >>array. (I know this >> >>>function was recently updated, this test is with new >> >>version.) >> >>>Here is a diff to fix the problem: >>> >>>--- array.c Thu Jul 18 00:50:54 2002 >>>+++ array.new.c Thu Jul 18 00:49:35 2002 >>>@@ -1441,7 +1441,7 @@ >>> { >>> Bucket **elems, *temp; >>> HashTable *hash; >>>- int j, n_elems, cur_elem = 0, rnd_idx, n_left; >>>+ int j, n_elems, rnd_idx, n_left; >>> >>> n_elems = zend_hash_num_elements(Z_ARRVAL_P(array)); >>> >>>@@ -1457,13 +1457,12 @@ >>> elems[j++] = temp; >>> while (--n_left) { >>> rnd_idx = php_rand(TSRMLS_C); >>>- RAND_RANGE(rnd_idx, cur_elem, n_left, PHP_RAND_MAX); >>>- if (rnd_idx != cur_elem) { >>>- temp = elems[cur_elem]; >>>- elems[cur_elem] = elems[rnd_idx]; >>>+ RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX); >>>+ if (rnd_idx != n_left) { >>>+ temp = elems[n_left]; >>>+ elems[n_left] = elems[rnd_idx]; >>> elems[rnd_idx] = temp; >>> } >>>- cur_elem++; >>> } >>> >>> HANDLE_BLOCK_INTERRUPTIONS(); >>> >> >>I've send the message to Dan, already. >>Just posting php-dev again. >> >>Top elements have more chance than end elements. >>All elements should have equal chance to be shuffled. > > > Actually, I think the algorithm of the patched version is > correct. It's counterintuitive, but it doesn't matter how > many times any slot has an element shuffled out of it.
It may not be intuitive that the algorithm is not perfectly correct. It may be enough for most purposes, though. > What matters is the invariant that whenever elems[n_left] > is set, it's being set to an element randomly chosen from > the subset of elements that are left (including > elems[n_left]), and also that elems[n_left] is not messed > with thereafter. For example, if the last element isn't shuffled, it's stays in as the last element since the last element has only 1 chance to be shuffled. Current code has tendency that last elements stay as last. i.e. Probability of the last element is shuffled is (total_num-1)/total_num. while n th element to be shuffled is obviously has much more chance than last elements. It may be acceptable for most purposes. > > You would think that the right thing to do is to set > elems[n_left] to a random choice from _all_ the elements in > the array, but this leads to some orderings being favored > over others. (Check out Intro. to Algo., Cormen, 2nd ed., > p 103) elemes[n_left] to a random choice does not make sense, of course. To give equal chances to be shuffled for all elements, we can write something like for (current=0; current < num_of_elems; current++) { random_index = rand(); RAND_RANGE(random_index, 0, num_of_elems-1, PHP_RAND_MAX); if (array[current] != array[random_index]) swap(array[current], array[random_index]); } Current code is almost ok, but this way is correct. This does not require any additional overhead at all, also. -- Yasuo Ohgaki -- PHP Development Mailing List <http://www.php.net/> To unsubscribe, visit: http://www.php.net/unsub.php