Regarding the original task, one can proceed (knowing the actual densities) via
a shifted multinomial simulation,
multinomial=. +/\ o [ <: o ((0: , [) I. ]) ? o $&0 o ] NB. dyadic verb
7 (samples=. ((densities o [) (2+multinomial) ]) ("0)) 10 10 10 10 10 NB.
day of the week samples
2 3 4 6 5 6 2 6 2 6
3 4 2 8 2 5 4 5 3 5
3 2 2 6 2 3 4 5 2 3
4 5 5 6 2 6 5 7 2 3
5 2 3 5 5 5 4 5 4 7
mean=. +/ % #
365 (samplesmeans=. (mean"1 o samples)) 10000000 NB. day of the year 10
million sample mean
24.6141
10 (] , mean) o (365 &samplesmeans) o # 500 NB. the original task
24.264 23.782 24.334 24.516 25.016 24.704 25.05 24.514 23.93 25.25 24.536
(] , mean) o (365 &samplesmeans) o # f. NB. according to the "simple" rules?
(] , +/ % #)@:(365&((+/ % #)"1@:((+/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.))@:[ (2 +
+/\@:[ <:@:((0: , [) I. ]) ?@:$&0@:]) ])"0)))@:#
Regarding accuracy, among other things, it can be argued that the distribution
could even depend whether the experiment is conducted in the northern or the
southern hemisphere (see http://www.panix.com/~murphy/bday.html and
http://answers.google.com/answers/threadview/id/280242.html). Models, maps,
and other representations are ultimately doomed to be inaccurate; the subject
matter is not only too complex but also evolving; above all, of course, my
representation of "the world" that is my mind is affected as well :)
________________________________________
From: [email protected] [[email protected]] On
Behalf Of Jose Mario Quintana [[email protected]]
Sent: Tuesday, January 24, 2012 2:23 PM
To: Programming forum
Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays
> +/(2+i.1000) * p * q NB. expected value
> 24.6166
I found the same solution in a slightly different way,
((2 + i.) +/ .* +/\^:_1@:((1 - */\)@:(1 - ] %~ 1 + i.))) 365
24.6166
The outline follows:
It is easier to start dealing with the day of the week birthday process first,
(outcomes=. 2 + i.) 7 NB. all other outcomes have zero densities; thus, they
are irrelevant
2 3 4 5 6 7 8
o=. @:
(cp=. 1 - ] %~ 1 + i.) 7 NB. conditional probabilities the process will not
stop at each outcome given that it did not stop at its predecessor
0.857143 0.714286 0.571429 0.428571 0.285714 0.142857 0
cdf=. 1 - */\ o cp NB. cumulative distribution function
load'plot'
plot (0 0 , cdf) 7 NB. ploting the (smoothed) cdf
densities=. +/\^:_1 o cdf NB. since cdf -: +/\ densities
(mean=. outcomes +/ .* densities) 7 NB. formula for discrete densities
4.01814
This generalizes to the day of the year birthday process,
plot (0 0 , cdf) 365
mean 365
24.6166
mean f.
(2 + i.) +/ .* +/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.))
________________________________________
From: [email protected] [[email protected]] On
Behalf Of Mike Day [[email protected]]
Sent: Friday, January 20, 2012 7:54 PM
To: Programming forum
Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays
My "trial" function, listed earlier (and below) was
not quite correct, as it failed to count the
successful person.
So it should be:
trialb =: ([: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0
So we get, for example (but it's very slow! My variant
triala discussed with Linda is somewhat better):
(mean, stdev) mean trialb 5000 100 $ _1
24.6133 0.180788
Linda thinks the mean should be somewhat lower, and
Brian thinks it's a lot lower. However, the standard
deviation suggests it's close to the true value.
I think this is the way to find the true expected number
of people. We don't need Markov after all:
Probability that (n-1) arrivals all have different
b/days:
q =: Prod (1 - i%Y), 0<: i <: n-2, Y =~ 365
Probability that the nth arrival's b/day is one of
those present, ie is one of n-1 distinct bdays:
p =: (n-1) % Y
Expected value of number of arrivals for "success":
Sum (2+i) pi * qi, 0 <: i <: n-2
In J:
5{. q =: */\(1 - (365 %~(i.))) 1000
1 0.99726 0.991796 0.983644 0.972864
5 {. p =: (365 %~>:@i. )1000
0.00273973 0.00547945 0.00821918 0.0109589 0.0136986
+/(2+i.1000) * p * q NB. expected value
24.6166
This is not the same as the median, where the
probability q moves below 0.5,
21 22 { q
0.524305 0.492703
As Roger observes, the index origin comes into play;
we should add one as the first person is 1, not zero (!)
and the median group size is therefore just below 23.
This last is dealing with a slightly different problem:
what is the probability that a certain sized group of
people do (not) share a birthday? So we shouldn't be
surprised at the difference.
Mike
On 18/01/2012 3:17 PM, Mike Day wrote:
> People seem to be tackling two different problems.
>
> Variations on the Birthday Problem as I remember them:
> (a) what is the probability that two (or more) people
> share a birthday in a group of N people?
> (b) what should N be for the probability to be (say) 0.5 ?
> The somewhat counter-intuitive answers are dealt with in
> Roger's Wiki Essay, among many treatments, and also
> Pablo's message, below. The essential point is to
> consider the probability that there are no matches.
>
> However, Linda's single trial as stated is a random
> process with a stopping condition:
> take one person at a time until the new person shares a
> birthday with those already present. The result is the
> number of people including the new arrival.
>
> I expect you need a Markov Process approach to get the
> exact expected value for the stopping number. Not proved!
>
> Here's a stab at the required simulation, avoiding @ and @:
> though using [:
>
> NB. I use _1 as seed, so need to decrement the count
>
> trial =: (_1 + [: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0
>
> trial 10#_1 NB. eg conduct 10 trials
> 27 19 29 2 24 42 30 9 34 33
>
> mean =: +/%# NB. ok for vectors or columns of matrix
>
> ([:(;~mean) mean) TRIALS =: trial 500 10 $ _1
> +-------+-------------------------------------------------------------------+
>
> |23.5882|22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81
> 23.654|
> +-------+-------------------------------------------------------------------+
>
>
> These means are indeed close to N in problems
> (a) & (b) where the probability is ~0.50, namely
> 21 for 0.475695 and 22 for 0.507297, but not the
> same.
>
> I used 365 rather than Pablo's 365.25 . The simulation
> could be done for 365.25, using the integer 1461 (say).
> The stopping condition would be a bit more complicated.
>
> The deviation of trials is quite large:
> SS =: [: *: (-"1 mean) NB. squared deviations from mean
> stdev=: [: %: [: mean SS NB. Observed Standard deviation
> NB. not necessarily recommended for real, large sets of data
>
> (mean,:stdev) TRIALS
> 22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81
> 23.654
> 11.9378 12.6587 12.6917 12.5288 12.281 11.9741 12.1957 11.442 12.0969
> 12.8718
>
> NB. standard deviation of the means:
>
> (mean, stdev) mean TRIALS
> 23.5882 0.477041
>
> Mike
>
>
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm