I apologize for the long delay in replying, but I've been very busy
lately.

I believe your example code is not measuring the right thing. It only
records how many distinct permutations the shuffle functions create, and
your method does fine by that measure. However, it does not factor in
the requirement that the permutations must be equally distributed among
the buckets, and that is where str_shuffle() fails.

Here is some PHP to do a better measurement:

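<?php
// Print each permutation, its count, and the share of runs it accounts for.
function stats($f, $a) {
    $times = array_sum($a);  // total number of shuffles recorded

    print "$f\n";

    ksort($a);
    foreach ($a as $k => $v) {
        // $v / $times is a fraction (0.167 means 16.7%), despite the "%" suffix.
        print "$k: $v: " . sprintf('%0.3f', $v / $times) . "%\n";
    }
}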

$a = array();
$times = 100000;
for ($i = 0; $i < $times; $i++) {
    $p = range(1,3);
    shuffle($p);
    $s = join('', $p);
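    // Initialize the bucket on first use to avoid an undefined-index notice.
    $a[$s] = isset($a[$s]) ? $a[$s] + 1 : 1;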
}

stats('shuffle', $a);

$a = array();
$times = 100000;
for ($i = 0; $i < $times; $i++) {
    $p = '123';
    $s = str_shuffle($p);
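    // Initialize the bucket on first use to avoid an undefined-index notice.
    $a[$s] = isset($a[$s]) ? $a[$s] + 1 : 1;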
}

stats('str_shuffle', $a);

I ran this a few times and here are my results (the last column is the
fraction of runs, so 0.167 means 16.7%, despite the % sign):

shuffle
123: 16799: 0.168%
132: 16654: 0.167%
213: 16715: 0.167%
231: 16412: 0.164%
312: 16783: 0.168%
321: 16637: 0.166%

str_shuffle
123: 51942: 0.519%
132: 7384: 0.074%
213: 9930: 0.099%
231: 4952: 0.050%
312: 14784: 0.148%
321: 11008: 0.110%

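As you can see, shuffle() gives each of the six permutations close to
the expected 1/6 (about 16.7%) of the runs, while your algorithm really
likes the original permutation -- it keeps it over 50% of the time!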
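
One way to put a number on "equally distributed" is Pearson's chi-square
statistic against a uniform expectation. The following is only a rough
sketch: chi_square_uniform() is a hypothetical helper name, the counts
are copied from the results above, and 11.07 is the standard 5% critical
value for 5 degrees of freedom (six buckets).

```php
<?php
// Hypothetical helper: Pearson's chi-square statistic for a set of
// bucket counts, assuming every bucket should be equally likely.
function chi_square_uniform($counts) {
    $total = array_sum($counts);
    $expected = $total / count($counts);  // same expectation for each bucket
    $chi = 0.0;
    foreach ($counts as $observed) {
        $chi += pow($observed - $expected, 2) / $expected;
    }
    return $chi;
}

// Counts copied from the results in this message.
$shuffle = array('123' => 16799, '132' => 16654, '213' => 16715,
                 '231' => 16412, '312' => 16783, '321' => 16637);
$str_shuffle = array('123' => 51942, '132' => 7384, '213' => 9930,
                     '231' => 4952, '312' => 14784, '321' => 11008);

$critical = 11.07;  // chi-square, 5 degrees of freedom, 5% level

printf("shuffle:     %.2f (critical %.2f)\n", chi_square_uniform($shuffle), $critical);
printf("str_shuffle: %.2f (critical %.2f)\n", chi_square_uniform($str_shuffle), $critical);
```

The shuffle() counts come out well under the critical value and the
str_shuffle() counts far above it. Note that this sketch assumes every
permutation shows up at least once; a permutation that never appears
would have to be added as a bucket with a count of 0.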

If you cannot come up with a better one, I suggest you remove this
function. If I find some free time, I will try to provide one, but I
don't know when that will be.
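
For reference, the usual candidate for a correct algorithm is the
Fisher-Yates shuffle. Here is a minimal sketch in PHP, not a patch
against string.c: fisher_yates_str_shuffle() is a hypothetical name, and
it assumes mt_rand() is uniform enough over the requested range.

```php
<?php
// Hypothetical helper: a Fisher-Yates shuffle over the bytes of a string.
// This is an illustration only, not the source of PHP's str_shuffle().
function fisher_yates_str_shuffle($s) {
    // Walk from the last position down to 1, swapping each character
    // with one picked uniformly from positions 0..$i (inclusive).
    for ($i = strlen($s) - 1; $i > 0; $i--) {
        $j = mt_rand(0, $i);
        $tmp = $s[$i];
        $s[$i] = $s[$j];
        $s[$j] = $tmp;
    }
    return $s;
}

print fisher_yates_str_shuffle('abcdef') . "\n";
```

Because position $i can also be swapped with itself, each of the n!
orderings corresponds to exactly one sequence of random choices, which
is what should make the buckets come out even.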

-adam

On Wed, 2002-09-25 at 15:43, Andrey Hristov wrote:
> Consider this code:
> <?php
> $j = 0;
> while ($j++<10) {
>         $i = 0;
>         $s = "abcdef";
>         $a = array();
>         while ($i++<720) {
>                 $a[] = str_shuffle($s);
>         }
>         var_dump(count(array_unique($a)));
>         $i = 0;
>         $ar = array('a','b','c','d','e','f');
>         $c = array();
>         while ($i++<720) {
>                 $b = $ar;
>                 shuffle($b);
>                 $c[] = implode('',$b);
>         }
>         var_dump(count(array_unique($c)));
>         echo "\n";
> }
> ?>
> I am trying to compare the two. On my machine I got this:
> int(444)
> int(453)
> 
> int(447)
> int(454)
> 
> int(455)
> int(451)
> 
> int(444)
> int(464)
> 
> int(454)
> int(451)
> 
> int(455)
> int(465)
> 
> int(454)
> int(464)
> 
> int(450)
> int(455)
> 
> int(455)
> int(450)
> 
> int(458)
> int(446)
> 
> In 4 of 10 runs, str_shuffle() shuffles better than shuffle().
> It looks like the number of unique combinations depends on the machine,
> because this test was performed on my work machine under VMware.
> I said that the shuffling method is like shuffle()'s in 4.2.1, but only
> like it, not identical. In 4.2.1 it was just
> php_rand(TSRMLS_C) % 2 ? 1 : -1
> I tried that first, and every 3-4 runs I got the same string, so I decided
> to modify it. As far as I can see from my tests, the result is not bad
> compared to shuffle().
> 
> Best regards
> Andrey Hristov
> 
> 
> ----- Original Message -----
> From: "Adam Maccabee Trachtenberg" <[EMAIL PROTECTED]>
> To: "Andrey Hristov" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Wednesday, September 25, 2002 10:11 PM
> Subject: Re: [PHP-CVS] cvs: php4 /ext/standard basic_functions.c
> php_string.h string.c
> 
> 
> > I think it would be better to either use a correct algorithm or not to
> > add this function at all. I think it's a reasonable expectation on a
> > user's part to assume that foo_shuffle() actually does a proper
> > shuffle. We got all sorts of complaints about the array shuffling code
> > being broken.
> >
> > -adam
> >
> 



-- 
PHP CVS Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
