From:             kala at sankya dot com
Operating system: Linux
PHP version:      5.2.6
PHP Bug Type:     Math related
Bug description:  algorithm to limit random numbers to a certain range is 
flawed 

Description:
------------
In the implementation of functions rand(min,max) and mt_rand(min,max), the
algorithm used to reduce the number to the requisite range is (min +
unscaled number*(max-min+1)/(RAND_MAX+1)). This is an inherently flawed
implementation since it will result in some numbers occurring more
frequently than others when the range of the unscaled numbers (RAND_MAX+1)
is not an exact multiple of the requested range (max-min+1). The author
seems to be aware of the standard problem associated with using a modulus
function (as per the comments in the source code). But this implementation
also suffers from exactly the same problem, the only difference is that
numbers that can occur more frequently are different from the ones (the
lower portions of the range) if one takes a simple modulus. See the C
program given below for a demonstration of the same.

The correct way to implement this is to find a maximum value that is  an
exact multiple of the requested range, and when drawing an unscaled number
discard numbers above this range and draw a new number instead. This way,
the pigeonhole principle that results in this "modulo bias" will be
eliminated. Check out the URL
http://java.sun.com/javase/6/docs/api/java/util/Random.html#nextInt(int)
for a description of this method (as implemented correctly by Java).



Reproduce code:
---------------
/* This is a C program that demonstrates the problem with the PHP
implementation of random number scaling - The size of the unscaled random
function has been reduced to 16 bits so that the program runs in reasonable
time. Assuming that the raw rand() function produces every number in the
range with equal probability, this program counts the number of occurrences
of a scaled call with range of 10 (i.e rand(0,9); ) when each number in the
unscaled range occurs exactly once. 
*/

#include <stdio.h>
#include <string.h>

/* Test numbers in the range 0 to 9 - so max-min+1 = 10  */
#define REQUESTED_RANGE 10

/* This should really be 31, but would take too long to run, so for
demonstration purposes use 16 bits */
#define UNSCALED_BITS 16

int counts[REQUESTED_RANGE];
unsigned long long MAX_RANGE = 1L << UNSCALED_BITS; // Same as RAND_MAX+1

void print_results(char *msg)
{
        int i;
        printf(msg);
        for (i=0; i < REQUESTED_RANGE; i++)
        {
                printf("%d ",counts[i]);
        }
        printf("\n");   
}

int main()
{
        unsigned int i;
        double divisor = MAX_RANGE;
        // Modulo method
        memset(counts,0,sizeof(counts));
        for (i=0; i < MAX_RANGE; i++)
        {
                int val = i % REQUESTED_RANGE;
                counts[val]++;
        }
        print_results("Output using Modulo : ");
        memset(counts,0,sizeof(counts));
        for (i=0; i < MAX_RANGE; i++)
        {
                int val = (i/divisor)*10;
                counts[val]++;
        }
        print_results("Output with scaling : ");
}


Expected result:
----------------
Results of this run :
Output using Modulo : 6554 6554 6554 6554 6554 6554 6553 6553 6553 6553 
Output with scaling : 6554 6554 6553 6554 6553 6554 6554 6553 6554 6553 

As can be noticed, the PHP implementation (line 2) results in exactly the
same problem as the simple modulus function (line 1) , the only difference
being that the numbers that have higher frequency have changed.
(0,1,3,5,6,8 instead of 0,1,2,3,4,5).



-- 
Edit bug report at http://bugs.php.net/?id=45184&edit=1
-- 
Try a CVS snapshot (PHP 5.2): 
http://bugs.php.net/fix.php?id=45184&r=trysnapshot52
Try a CVS snapshot (PHP 5.3): 
http://bugs.php.net/fix.php?id=45184&r=trysnapshot53
Try a CVS snapshot (PHP 6.0): 
http://bugs.php.net/fix.php?id=45184&r=trysnapshot60
Fixed in CVS:                 http://bugs.php.net/fix.php?id=45184&r=fixedcvs
Fixed in release:             
http://bugs.php.net/fix.php?id=45184&r=alreadyfixed
Need backtrace:               http://bugs.php.net/fix.php?id=45184&r=needtrace
Need Reproduce Script:        http://bugs.php.net/fix.php?id=45184&r=needscript
Try newer version:            http://bugs.php.net/fix.php?id=45184&r=oldversion
Not developer issue:          http://bugs.php.net/fix.php?id=45184&r=support
Expected behavior:            http://bugs.php.net/fix.php?id=45184&r=notwrong
Not enough info:              
http://bugs.php.net/fix.php?id=45184&r=notenoughinfo
Submitted twice:              
http://bugs.php.net/fix.php?id=45184&r=submittedtwice
register_globals:             http://bugs.php.net/fix.php?id=45184&r=globals
PHP 4 support discontinued:   http://bugs.php.net/fix.php?id=45184&r=php4
Daylight Savings:             http://bugs.php.net/fix.php?id=45184&r=dst
IIS Stability:                http://bugs.php.net/fix.php?id=45184&r=isapi
Install GNU Sed:              http://bugs.php.net/fix.php?id=45184&r=gnused
Floating point limitations:   http://bugs.php.net/fix.php?id=45184&r=float
No Zend Extensions:           http://bugs.php.net/fix.php?id=45184&r=nozend
MySQL Configuration Error:    http://bugs.php.net/fix.php?id=45184&r=mysqlcfg

Reply via email to