> +/(2+i.1000) * p * q NB. expected value > 24.6166 I found the same solution in a slightly different way,
((2 + i.) +/ .* +/\^:_1@:((1 - */\)@:(1 - ] %~ 1 + i.))) 365 24.6166 The outline follows: It is easier to start dealing with the day of the week birthday process first, (outcomes=. 2 + i.) 7 NB. all other outcomes have zero densities; thus, they are irrelevant 2 3 4 5 6 7 8 o=. @: (cp=. 1 - ] %~ 1 + i.) 7 NB. conditional probabilities the process will not stop at each outcome given that it did not stop at its predecessor 0.857143 0.714286 0.571429 0.428571 0.285714 0.142857 0 cdf=. 1 - */\ o cp NB. cumulative distribution function load'plot' plot (0 0 , cdf) 7 NB. ploting the (smoothed) cdf densities=. +/\^:_1 o cdf NB. since cdf -: +/\ densities (mean=. outcomes +/ .* densities) 7 NB. formula for discrete densities 4.01814 This generalizes to the day of the year birthday process, plot (0 0 , cdf) 365 mean 365 24.6166 mean f. (2 + i.) +/ .* +/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.)) ________________________________________ From: programming-boun...@jsoftware.com [programming-boun...@jsoftware.com] On Behalf Of Mike Day [mike_liz....@tiscali.co.uk] Sent: Friday, January 20, 2012 7:54 PM To: Programming forum Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays My "trial" function, listed earlier (and below) was not quite correct, as it failed to count the successful person. So it should be: trialb =: ([: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0 So we get, for example (but it's very slow! My variant triala discussed with Linda is somewhat better): (mean, stdev) mean trialb 5000 100 $ _1 24.6133 0.180788 Linda thinks the mean should be somewhat lower, and Brian thinks it's a lot lower. However, the standard deviation suggests it's close to the true value. I think this is the way to find the true expected number of people. We don't need Markov after all: Probability that (n-1) arrivals all have different b/days: q =: Prod (1 - i%Y), 0<: i <: n-2, Y =~ 365 Probability that the nth arrival's b/day is one of those present, ie is one of n-1 distinct bdays: p =: (n-1) % Y Expected value of number of arrivals for "success": Sum (2+i) pi * qi, 0 <: i <: n-2 In J: 5{. q =: */\(1 - (365 %~(i.))) 1000 1 0.99726 0.991796 0.983644 0.972864 5 {. p =: (365 %~>:@i. )1000 0.00273973 0.00547945 0.00821918 0.0109589 0.0136986 +/(2+i.1000) * p * q NB. expected value 24.6166 This is not the same as the median, where the probability q moves below 0.5, 21 22 { q 0.524305 0.492703 As Roger observes, the index origin comes into play; we should add one as the first person is 1, not zero (!) and the median group size is therefore just below 23. This last is dealing with a slightly different problem: what is the probability that a certain sized group of people do (not) share a birthday? So we shouldn't be surprised at the difference. Mike On 18/01/2012 3:17 PM, Mike Day wrote: > People seem to be tackling two different problems. > > Variations on the Birthday Problem as I remember them: > (a) what is the probability that two (or more) people > share a birthday in a group of N people? > (b) what should N be for the probability to be (say) 0.5 ? > The somewhat counter-intuitive answers are dealt with in > Roger's Wiki Essay, among many treatments, and also > Pablo's message, below. The essential point is to > consider the probability that there are no matches. > > However, Linda's single trial as stated is a random > process with a stopping condition: > take one person at a time until the new person shares a > birthday with those already present. The result is the > number of people including the new arrival. > > I expect you need a Markov Process approach to get the > exact expected value for the stopping number. Not proved! > > Here's a stab at the required simulation, avoiding @ and @: > though using [: > > NB. I use _1 as seed, so need to decrement the count > > trial =: (_1 + [: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0 > > trial 10#_1 NB. eg conduct 10 trials > 27 19 29 2 24 42 30 9 34 33 > > mean =: +/%# NB. ok for vectors or columns of matrix > > ([:(;~mean) mean) TRIALS =: trial 500 10 $ _1 > +-------+-------------------------------------------------------------------+ > > |23.5882|22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81 > 23.654| > +-------+-------------------------------------------------------------------+ > > > These means are indeed close to N in problems > (a) & (b) where the probability is ~0.50, namely > 21 for 0.475695 and 22 for 0.507297, but not the > same. > > I used 365 rather than Pablo's 365.25 . The simulation > could be done for 365.25, using the integer 1461 (say). > The stopping condition would be a bit more complicated. > > The deviation of trials is quite large: > SS =: [: *: (-"1 mean) NB. squared deviations from mean > stdev=: [: %: [: mean SS NB. Observed Standard deviation > NB. not necessarily recommended for real, large sets of data > > (mean,:stdev) TRIALS > 22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81 > 23.654 > 11.9378 12.6587 12.6917 12.5288 12.281 11.9741 12.1957 11.442 12.0969 > 12.8718 > > NB. standard deviation of the means: > > (mean, stdev) mean TRIALS > 23.5882 0.477041 > > Mike > > > > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm