ID:               45184
 User updated by:  kala at sankya dot com
 Reported By:      kala at sankya dot com
 Status:           Open
 Bug Type:         Math related
 Operating System: Linux
 PHP Version:      5.2.6
 New Comment:

Sorry the line 

    int val = (i/divisor)*10;

in the example program should be :

    int val = (i/divisor)*REQUESTED_RANGE;

for the program to work correctly for any other range other than 10.


Previous Comments:
------------------------------------------------------------------------

[2008-06-05 11:30:09] kala at sankya dot com

@pajoye : Yes I'm aware of the limitations of rand() (and LCG
generators in general), and have used MT extensively. The problem here
isn't with the RNG but with the algorithm used in converting the raw 31
bit random number to one within a given range. This algorithm is common
to both the rand() and mt_rand() functions that take min/max arguments
(both uses the RANGE_RANDOM macro defined in the file
"ext/standard/php_rand.h") . The only work around is to use the no
argument variant to get a 31 bit number and and do the scaling yourself.
(But obviously the better solution would be to fix the code itself :) )

------------------------------------------------------------------------

[2008-06-05 10:37:10] [EMAIL PROTECTED]

Have you tried to use mt_rand instead?

Rand is known for having "issues".

------------------------------------------------------------------------

[2008-06-05 10:22:18] kala at sankya dot com

Description:
------------
In the implementation of functions rand(min,max) and mt_rand(min,max),
the algorithm used to reduce the number to the requisite range is (min +
unscaled number*(max-min+1)/(RAND_MAX+1)). This is an inherently flawed
implementation since it will result in some numbers occurring more
frequently than others when the range of the unscaled numbers
(RAND_MAX+1) is not an exact multiple of the requested range
(max-min+1). The author seems to be aware of the standard problem
associated with using a modulus function (as per the comments in the
source code). But this implementation also suffers from exactly the same
problem, the only difference is that numbers that can occur more
frequently are different from the ones (the lower portions of the range)
if one takes a simple modulus. See the C program given below for a
demonstration of the same.

The correct way to implement this is to find a maximum value that is 
an exact multiple of the requested range, and when drawing an unscaled
number discard numbers above this range and draw a new number instead.
This way, the pigeonhole principle that results in this "modulo bias"
will be eliminated. Check out the URL
http://java.sun.com/javase/6/docs/api/java/util/Random.html#nextInt(int)
for a description of this method (as implemented correctly by Java).



Reproduce code:
---------------
/* This is a C program that demonstrates the problem with the PHP
implementation of random number scaling - The size of the unscaled
random function has been reduced to 16 bits so that the program runs in
reasonable time. Assuming that the raw rand() function produces every
number in the range with equal probability, this program counts the
number of occurrences of a scaled call with range of 10 (i.e rand(0,9);
) when each number in the unscaled range occurs exactly once. 
*/

#include <stdio.h>
#include <string.h>

/* Test numbers in the range 0 to 9 - so max-min+1 = 10  */
#define REQUESTED_RANGE 10

/* This should really be 31, but would take too long to run, so for
demonstration purposes use 16 bits */
#define UNSCALED_BITS 16

int counts[REQUESTED_RANGE];
unsigned long long MAX_RANGE = 1L << UNSCALED_BITS; // Same as
RAND_MAX+1

void print_results(char *msg)
{
        int i;
        printf(msg);
        for (i=0; i < REQUESTED_RANGE; i++)
        {
                printf("%d ",counts[i]);
        }
        printf("\n");   
}

int main()
{
        unsigned int i;
        double divisor = MAX_RANGE;
        // Modulo method
        memset(counts,0,sizeof(counts));
        for (i=0; i < MAX_RANGE; i++)
        {
                int val = i % REQUESTED_RANGE;
                counts[val]++;
        }
        print_results("Output using Modulo : ");
        memset(counts,0,sizeof(counts));
        for (i=0; i < MAX_RANGE; i++)
        {
                int val = (i/divisor)*10;
                counts[val]++;
        }
        print_results("Output with scaling : ");
}


Expected result:
----------------
Results of this run :
Output using Modulo : 6554 6554 6554 6554 6554 6554 6553 6553 6553 6553

Output with scaling : 6554 6554 6553 6554 6553 6554 6554 6553 6554 6553


As can be noticed, the PHP implementation (line 2) results in exactly
the same problem as the simple modulus function (line 1) , the only
difference being that the numbers that have higher frequency have
changed. (0,1,3,5,6,8 instead of 0,1,2,3,4,5).




------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=45184&edit=1

Reply via email to