My "trial" function,  listed earlier (and below) was
not quite correct,  as it failed to count the
successful person.

So it should be:

    trialb =: ([: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0

So we get, for example (but it's very slow!  My variant
triala discussed with Linda is somewhat better):

    (mean, stdev) mean trialb 5000 100 $ _1
24.6133 0.180788

Linda thinks the mean should be somewhat lower,  and
Brian thinks it's a lot lower.  However, the standard
deviation suggests it's close to the true value.

I think this is the way to find the true expected number
of people.  We don't need Markov after all:

    Probability that (n-1) arrivals all have different
b/days:

    q =: Prod (1 - i%Y),  0<: i <: n-2,  Y =~ 365

    Probability that the nth arrival's b/day is one of
those present, ie is one of n-1 distinct bdays:

    p =: (n-1) % Y

Expected value of number of arrivals for "success":

    Sum (2+i) pi * qi, 0 <: i <: n-2

In J:
    5{. q =: */\(1 - (365 %~(i.))) 1000
1 0.99726 0.991796 0.983644 0.972864

    5 {. p =: (365 %~>:@i. )1000
0.00273973 0.00547945 0.00821918 0.0109589 0.0136986

    +/(2+i.1000) * p * q   NB. expected value
24.6166

This is not the same as the median, where the
probability q moves below 0.5,

    21 22 { q
0.524305 0.492703

As Roger observes, the index origin comes into play;
we should add one as the first person is 1, not zero (!)
and the median group size is therefore just below 23.

This last is dealing with a slightly different problem:
what is the probability that a certain sized group of
people do (not) share a birthday?  So we shouldn't be
surprised at the difference.

Mike

On 18/01/2012 3:17 PM, Mike Day wrote:
> People seem to be tackling two different problems.
>
> Variations on the Birthday Problem as I remember them:
>  (a) what is the probability that two (or more) people
> share a birthday in a group of N people?
>  (b) what should N be for the probability to be (say) 0.5 ?
> The somewhat counter-intuitive answers are dealt with in
> Roger's Wiki Essay,  among many treatments, and also
> Pablo's message, below.  The essential point is to
> consider the probability that there are no matches.
>
> However, Linda's single trial as stated is a random
> process with a stopping condition:
>  take one person at a time until the new person shares a
> birthday with those already present. The result is the
> number of people including the new arrival.
>
> I expect you need a Markov Process approach to get the
> exact expected value for the stopping number. Not proved!
>
> Here's a stab at the required simulation, avoiding @ and @:
> though using [:
>
> NB. I use _1 as seed, so need to decrement the count
>
>    trial =: (_1 + [: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0
>
>    trial 10#_1  NB. eg conduct 10 trials
> 27 19 29 2 24 42 30 9 34 33
>
>    mean =: +/%#    NB. ok for vectors or columns of matrix
>
>    ([:(;~mean) mean) TRIALS =: trial 500 10 $ _1
> +-------+-------------------------------------------------------------------+ 
>
> |23.5882|22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81 
> 23.654|
> +-------+-------------------------------------------------------------------+ 
>
>
> These means are indeed close to N in problems
> (a) & (b) where the probability is ~0.50,  namely
> 21 for 0.475695 and 22 for 0.507297,  but not the
> same.
>
> I used 365 rather than Pablo's 365.25 .  The simulation
> could be done for 365.25,  using the integer 1461 (say).
> The stopping condition would be a bit more complicated.
>
> The deviation of trials is quite large:
>    SS  =:  [: *: (-"1 mean)  NB. squared deviations from mean
>    stdev=: [: %: [: mean SS  NB. Observed Standard deviation
> NB. not necessarily recommended for real, large sets of data
>
>    (mean,:stdev) TRIALS
>  22.696  23.676  23.894  24.044 23.874   23.56  24.258 23.416   22.81  
> 23.654
> 11.9378 12.6587 12.6917 12.5288 12.281 11.9741 12.1957 11.442 12.0969 
> 12.8718
>
> NB. standard deviation of the means:
>
>    (mean, stdev) mean TRIALS
> 23.5882 0.477041
>
> Mike
>
>
>
>

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to