Re: [Jprogramming] Challenge 4 Bountiful Birthdays

Raul Miller Sun, 29 Jan 2012 08:49:56 -0800

Note that you sent this in response to my response on an issue which
was only tangentially related to what I suspect you are asking about.
So I am confused (which seems to be fairly typical for me, recently.)
But you did respond specifically to me, so I feel I should respond to
you.


Additionally, when asking for help with a problem (in email forums,
and perhaps in other contexts), it's best to state what you expected,
what you did, and how what you got differed from what you expected.
I see some of what you did here, but not enough to know what you
expected or how your result differed from it.

I see from your followup message that you have found the answer that
you are looking for, so I will not attempt a deeper investigation at
this time.

But I thought I would attempt to answer the general gist of the
questions you posed, within my understanding at this time.  It's
undoubtedly not the sort of answer you were looking for, but I can
hope this will still be helpful.

Thanks,

-- 
Raul

On Fri, Jan 27, 2012 at 8:23 PM, Linda Alvord <lindaalv...@verizon.net> wrote:
> Please point out what I have done that does not represent the problem 
> correctly. I am trying for a simple solution and would like to know what is 
> incorrect in this solution.
>
>    bmd=:([:I.[: -. [: ~: [: i.~ [: ? #~)"1
>   all=:bmd ,.4$366
>   all
>   |."1 all
>   ALL=:bmd ,.500$366
>   NB. The first element of each row is the first match
>   NB. Note the fill, as the lists are of different lengths
>   NB. ALL is 500 samples of 366 people
>   1{."1 all
>   mean=:+/ % #
>   data=:(mean ,1{."1 bmd ,.500$366)"1
>   ]DATA=:data ,.10#10
>   ]MOM=:mean DATA
>
> Thanks.
>
> Linda
>
> -----Original Message-----
> From: programming-boun...@jsoftware.com 
> [mailto:programming-boun...@jsoftware.com] On Behalf Of Raul Miller
> Sent: Friday, January 27, 2012 3:53 PM
> To: Programming forum
> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays
>
> You must have missed Jose Mario Quintana's post:
>
> http://jsoftware.com/pipermail/programming/2012-January/026885.html
>
> --
> Raul
>
> On Fri, Jan 27, 2012 at 2:53 PM, Devon McCormick <devon...@gmail.com> wrote:
>> One thing not considered so far is that birthdays are not evenly
>> distributed - see http://www.panix.com/~murphy/bday.html - and this affects
>> the probability.
>>
>> On Fri, Jan 27, 2012 at 9:50 AM, Linda Alvord <lindaalv...@verizon.net> wrote:
>>
>>> I'm glad that we have finally covered most of the conditions that could
>>> occur. I'll enjoy pondering your work.  I'm working on a "narrower"
>>> challenge for next time that will not take us so far afield.
>>>
>>> Linda
>>>
>>> -----Original Message-----
>>> From: programming-boun...@jsoftware.com
>>> [mailto:programming-boun...@jsoftware.com] On Behalf Of Jose Mario
>>> Quintana
>>> Sent: Thursday, January 26, 2012 11:08 PM
>>> To: Programming forum
>>> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays
>>>
>>> Regarding the original task, one can proceed (knowing the actual densities)
>>> via a shifted multinomial simulation,
>>>
>>>   multinomial=. +/\ o [ <: o ((0: , [) I. ]) ? o $&0 o ] NB. dyadic verb
>>>
>>>   7 (samples=. ((densities o [) (2+multinomial) ]) ("0)) 10 10 10 10 10 NB. day of the week samples
>>> 2 3 4 6 5 6 2 6 2 6
>>> 3 4 2 8 2 5 4 5 3 5
>>> 3 2 2 6 2 3 4 5 2 3
>>> 4 5 5 6 2 6 5 7 2 3
>>> 5 2 3 5 5 5 4 5 4 7
>>>
>>>   mean=. +/ % #
>>>
>>>   365 (samplesmeans=. (mean"1 o samples)) 10000000 NB. day of the year 10 million sample mean
>>> 24.6141
>>>
>>>   10 (] , mean) o (365 &samplesmeans) o # 500 NB. the original task
>>> 24.264 23.782 24.334 24.516 25.016 24.704 25.05 24.514 23.93 25.25 24.536
>>>
>>>   (] , mean) o (365 &samplesmeans) o # f. NB. according to the "simple" rules?
>>> (] , +/ % #)@:(365&((+/ % #)"1@:((+/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.))@:[ (2 + +/\@:[ <:@:((0: , [) I. ]) ?@:$&0@:]) ])"0)))@:#
>>>
>>> Regarding accuracy, among other things, it can be argued that the
>>> distribution could even depend on whether the experiment is conducted in the
>>> northern or the southern hemisphere (see
>>> http://www.panix.com/~murphy/bday.html and
>>> http://answers.google.com/answers/threadview/id/280242.html).  Models,
>>> maps, and other representations are ultimately doomed to be inaccurate; the
>>> subject matter is not only too complex but also evolving; above all, of
>>> course, my representation of "the world" that is my mind is affected as
>>> well :)
>>>
>>> ________________________________________
>>> From: programming-boun...@jsoftware.com [programming-boun...@jsoftware.com]
>>> On Behalf Of Jose Mario Quintana [josemarioquint...@2bestsystems.com]
>>> Sent: Tuesday, January 24, 2012 2:23 PM
>>> To: Programming forum
>>> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays
>>>
>>> >  +/(2+i.1000) * p * q NB. expected value
>>> > 24.6166
>>>
>>> I found the same solution in a slightly different way,
>>>
>>>   ((2 + i.) +/ .* +/\^:_1@:((1 - */\)@:(1 - ] %~ 1 + i.))) 365
>>> 24.6166
>>>
>>> The outline follows:
>>>
>>> It is easier to start dealing with the day of the week birthday process
>>> first,
>>>
>>>   (outcomes=. 2 + i.) 7 NB. all other outcomes have zero densities; thus, they are irrelevant
>>> 2 3 4 5 6 7 8
>>>
>>>   o=. @:
>>>
>>>   (cp=. 1 - ] %~ 1 + i.) 7 NB. conditional probabilities the process will not stop at each outcome given that it did not stop at its predecessor
>>> 0.857143 0.714286 0.571429 0.428571 0.285714 0.142857 0
>>>
>>>   cdf=. 1 - */\ o cp NB. cumulative distribution function
>>>
>>>   load'plot'
>>>   plot (0 0 , cdf) 7 NB. plotting the (smoothed) cdf
>>>
>>>   densities=. +/\^:_1 o cdf NB. since cdf -: +/\ densities
>>>
>>>    (mean=. outcomes +/ .* densities) 7 NB. formula for discrete densities
>>> 4.01814
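>>>
>>> A hedged sanity check of the step above, assuming the definitions of
>>> o, cp, cdf and densities given in this message (repeated here so the
>>> lines can be pasted into a fresh session): the densities should sum
>>> to 1, and their running sum should reproduce the cdf.
>>>
>>>   o=. @:
>>>   cp=. 1 - ] %~ 1 + i.
>>>   cdf=. 1 - */\ o cp
>>>   densities=. +/\^:_1 o cdf
>>>   +/ densities 7 NB. total probability
>>> 1
>>>   (cdf -: +/\ o densities) 7 NB. tolerant match
>>> 1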
>>>
>>> This generalizes to the day of the year birthday process,
>>>
>>>   plot (0 0 , cdf) 365
>>>   mean 365
>>> 24.6166
>>>
>>>   mean f.
>>> (2 + i.) +/ .* +/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.))
>>>
>>>
>>>
>>> ________________________________________
>>> From: programming-boun...@jsoftware.com [programming-boun...@jsoftware.com]
>>> On Behalf Of Mike Day [mike_liz....@tiscali.co.uk]
>>> Sent: Friday, January 20, 2012 7:54 PM
>>> To: Programming forum
>>> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays
>>>
>>> My "trial" function,  listed earlier (and below) was
>>> not quite correct,  as it failed to count the
>>> successful person.
>>>
>>> So it should be:
>>>
>>>    trialb =: ([: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0
>>>
>>> So we get, for example (but it's very slow!  My variant
>>> triala discussed with Linda is somewhat better):
>>>
>>>    (mean, stdev) mean trialb 5000 100 $ _1
>>> 24.6133 0.180788
>>>
>>> Linda thinks the mean should be somewhat lower,  and
>>> Brian thinks it's a lot lower.  However, the standard
>>> deviation suggests it's close to the true value.
>>>
>>> I think this is the way to find the true expected number
>>> of people.  We don't need Markov after all:
>>>
>>>    Probability that (n-1) arrivals all have different
>>> b/days:
>>>
>>>    q =: Prod (1 - i%Y),  0<: i <: n-2,  Y =~ 365
>>>
>>>    Probability that the nth arrival's b/day is one of
>>> those present, ie is one of n-1 distinct bdays:
>>>
>>>    p =: (n-1) % Y
>>>
>>> Expected value of number of arrivals for "success":
>>>
>>>    Sum (2+i) pi * qi, 0 <: i <: n-2
>>>
>>> In J:
>>>    5{. q =: */\(1 - (365 %~(i.))) 1000
>>> 1 0.99726 0.991796 0.983644 0.972864
>>>
>>>    5 {. p =: (365 %~>:@i. )1000
>>> 0.00273973 0.00547945 0.00821918 0.0109589 0.0136986
>>>
>>>    +/(2+i.1000) * p * q   NB. expected value
>>> 24.6166
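>>>
>>> As a hedged cross-check of that value (the tail-sum identity
>>> E[N] = Sum P(N > k), k >: 0, is an assumption brought in here, not
>>> something used above): P(N > k) is the chance that the first k
>>> arrivals all differ, which is 1 for k = 0 and the running product
>>> of 1 - i%Y after that.
>>>
>>>    +/ 1 , */\ 1 - (i.366) % 365   NB. sum of P(N > k), k = 0 .. 366
>>> 24.6166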
>>>
>>> This is not the same as the median, where the
>>> probability q moves below 0.5,
>>>
>>>    21 22 { q
>>> 0.524305 0.492703
>>>
>>> As Roger observes, the index origin comes into play;
>>> we should add one as the first person is 1, not zero (!)
>>> and the median group size is therefore just below 23.
>>>
>>> This last is dealing with a slightly different problem:
>>> what is the probability that a certain sized group of
>>> people do (not) share a birthday?  So we shouldn't be
>>> surprised at the difference.
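>>>
>>> A minimal sketch of that index-origin adjustment, assuming q as
>>> defined above (restated here in an equivalent form): find the first
>>> index at which q falls below 0.5, then add one to turn the index
>>> into a group size.
>>>
>>>    q =: */\ 1 - (i.1000) % 365
>>>    1 + (q < 0.5) i. 1   NB. smallest group size with q below 0.5
>>> 23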
>>>
>>> Mike
>>>
>>> On 18/01/2012 3:17 PM, Mike Day wrote:
>>> > People seem to be tackling two different problems.
>>> >
>>> > Variations on the Birthday Problem as I remember them:
>>> >  (a) what is the probability that two (or more) people
>>> > share a birthday in a group of N people?
>>> >  (b) what should N be for the probability to be (say) 0.5 ?
>>> > The somewhat counter-intuitive answers are dealt with in
>>> > Roger's Wiki Essay,  among many treatments, and also
>>> > Pablo's message, below.  The essential point is to
>>> > consider the probability that there are no matches.
>>> >
>>> > However, Linda's single trial as stated is a random
>>> > process with a stopping condition:
>>> >  take one person at a time until the new person shares a
>>> > birthday with those already present. The result is the
>>> > number of people including the new arrival.
>>> >
>>> > I expect you need a Markov Process approach to get the
>>> > exact expected value for the stopping number. Not proved!
>>> >
>>> > Here's a stab at the required simulation, avoiding @ and @:
>>> > though using [:
>>> >
>>> > NB. I use _1 as seed, so need to decrement the count
>>> >
>>> >    trial =: (_1 + [: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0
>>> >
>>> >    trial 10#_1  NB. eg conduct 10 trials
>>> > 27 19 29 2 24 42 30 9 34 33
>>> >
>>> >    mean =: +/%#    NB. ok for vectors or columns of matrix
>>> >
>>> >    ([:(;~mean) mean) TRIALS =: trial 500 10 $ _1
>>> >
>>> > +-------+-------------------------------------------------------------------+
>>> > |23.5882|22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81 23.654|
>>> > +-------+-------------------------------------------------------------------+
>>> >
>>> >
>>> > These means are indeed close to N in problems
>>> > (a) & (b) where the probability is ~0.50,  namely
>>> > 21 for 0.475695 and 22 for 0.507297,  but not the
>>> > same.
>>> >
>>> > I used 365 rather than Pablo's 365.25 .  The simulation
>>> > could be done for 365.25,  using the integer 1461 (say).
>>> > The stopping condition would be a bit more complicated.
>>> >
>>> > The deviation of trials is quite large:
>>> >    SS  =:  [: *: (-"1 mean)  NB. squared deviations from mean
>>> >    stdev=: [: %: [: mean SS  NB. Observed Standard deviation
>>> > NB. not necessarily recommended for real, large sets of data
>>> >
>>> >    (mean,:stdev) TRIALS
>>> >  22.696  23.676  23.894  24.044 23.874   23.56  24.258 23.416   22.81  23.654
>>> > 11.9378 12.6587 12.6917 12.5288 12.281 11.9741 12.1957 11.442 12.0969 12.8718
>>> >
>>> > NB. standard deviation of the means:
>>> >
>>> >    (mean, stdev) mean TRIALS
>>> > 23.5882 0.477041
>>> >
>>> > Mike
>>> >
>>> >
>>> >
>>> >
>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>>
>>
>> --
>> Devon McCormick, CFA
>> ^me^ at acm.
>> org is my
>> preferred e-mail
