Re: [R] Sum of Bernoullis with varying probabilities

2006-10-09 Thread Thomas Lumley
On Fri, 6 Oct 2006, Deepayan Sarkar wrote:

 On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
 Hi Folks,

 Given a series of n independent Bernoulli trials with
 outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want

   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)



 OK, some abbreviation of the above can be achieved with
 the 'apply' function (and I've spelt out the details for
 clarity). But I feel the basis of the approach (i.e. 'combn')
 is crude (also tending to cause problems if n is large);
 and I'm wondering if there is a nicer way.

 I don't see how. Someone has to evaluate the n-choose-r distinct terms
 to be added, and your code seems to be doing that as directly as
 possible.

Not necessarily. It could be quicker to numerically invert the 
characteristic function, which is easy to compute.

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-08 Thread Gabor Grothendieck
Or perhaps its clearer (and saves a bit of space) to use apply...prod
here instead of exp...log:

fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5


On 10/7/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 One can get a one-line solution by taking the product of the FFTs.
 For example, let p - 1:4/8 be the probabilities.  Then the solution is:

 fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)), inverse = TRUE)/5

 On 10/7/06, Ted Harding [EMAIL PROTECTED] wrote:
  Hi again.
  I had suspected that doing the calculation by a convolution
  method might be both straightforward and efficient in R.
 
  I've now located convolve() (in 'base'!!), and have a solution
  using this function. Details below. Original statement of
  problem, and method originally proposed, repeated below for
  reference.
 
  On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
   Hi Folks,
  
   Given a series of n independent Bernoulli trials with
   outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
  
 P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
  
   I can certainly find a way to do it:
  
   Let p be the vector c(P1,P2,...,Pn).
   The cases r=0 and r=n are trivial (and also are exceptions
   for the following routine).
  
   For a given value of r in (1:(n-1)),
  
 library(combinat)
 Set - (1:n)
 Subsets - combn(Set,r)
 P - 0
 for(i in (1:dim(Subsets)[2])){
   ix - numeric(n)
   ix[Subsets[,i]] - 1
   P - P + prod((p^ix) * ((1-p)^(1-ix)))
 }
 
  The convolution method implemented in convolve computes, for
  two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
  series (k = 1, 2, ... , n+m-1):
 
   Z[k] = sum( X[k-m+i]*Y[i] )
 
  over valid values of i, i.e. products of terms of X matched
  with terms of a shifted Y, using the open method (see
  ?convolve).
 
  This is not quite what is wanted for the type of convolution
  needed to compute the distribution of the sum of two integer
  random variables, since one needs to reverse one of the series
  being convolved. This is the basis of the following:
 
  Given a vector p of probabilities P1, P2, P3, for Y=1 in
  successive trials, I need to convolve the successive Bernoulli
  distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:
 
   ps-cbind(1-p,p); u-ps[1,]; u1-u
   for(i in (2:16)){
 u-convolve(u,rev(ps[i,]),type=o)
   }
 
  In the case I was looking at, I had already obtained the vector
  P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
  previously described (repeated above); and here n=16.
 
  Having obtained u by the above routine, I can now compare the
  results, u with P:
 
  cbind((0:16),u,P)
r uP
0  1.353353e-01 1.353353e-01
1  3.007903e-01 3.007903e-01
2  2.976007e-01 2.976007e-01
3  1.747074e-01 1.747074e-01
4  6.826971e-02 6.826971e-02
5  1.884969e-02 1.884969e-02
6  3.804371e-03 3.804371e-03
7  5.720398e-04 5.720398e-04
8  6.463945e-05 6.463945e-05
9  5.489454e-06 5.489454e-06
   10  3.473997e-07 3.473997e-07
   11  1.607822e-08 1.607822e-08
   12  5.262533e-10 5.262532e-10
   13  1.148626e-11 1.148618e-11
   14  1.494650e-13 1.493761e-13
   15  9.008700e-16 8.764298e-16
   16 -1.896716e-17 1.313261e-19
 
  so, apart from ignorable differences in the very small
  probabilities for r11, they agree. I have to admit, though,
  that I had to tread carefully (and experiment with Binomials)
  to make sure of exactly how to introduce the reversal
  (at rev(ps[i,])).
 
  And the convolution method is *very* much faster than the
  selection of all subsets method!
 
  However, I wonder whether the subsets method may be the more
  accurate of the two (not that it really matters).
 
  Best wishes to all,
  Ted.
 
  
  E-Mail: (Ted Harding) [EMAIL PROTECTED]
  Fax-to-email: +44 (0)870 094 0861
  Date: 07-Oct-06   Time: 22:03:27
  -- XFMail --
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-08 Thread Ted Harding
On 08-Oct-06 Gabor Grothendieck wrote:
 Or perhaps its clearer (and saves a bit of space) to use apply...prod
 here instead of exp...log:
 
 fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5

Yes, that's neat (in either version). With the example I have,
where length(p)=16, I applied your suggestion above in the form

  v-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))),
   1,prod), inverse = TRUE)/17

(making the number of columns for cbind equal to 17 = 16+1). Now,
comparing this result with the P I got by brute force (subsets
selection method), and removing the imaginary parts from v:

cbind(r=(0:16),v=Re(v),P)
   rvP
   0 1.353353e-01 1.353353e-01
   1 3.007903e-01 3.007903e-01
   2 2.976007e-01 2.976007e-01
   3 1.747074e-01 1.747074e-01
   4 6.826971e-02 6.826971e-02
   5 1.884969e-02 1.884969e-02
   6 3.804371e-03 3.804371e-03
   7 5.720398e-04 5.720398e-04
   8 6.463945e-05 6.463945e-05
   9 5.489454e-06 5.489454e-06
  10 3.473997e-07 3.473997e-07
  11 1.607822e-08 1.607822e-08
  12 5.262532e-10 5.262532e-10
  13 1.148620e-11 1.148618e-11
  14 1.494031e-13 1.493761e-13
  15 8.887779e-16 8.764298e-16
  16 1.434973e-17 1.313261e-19

so this calculation gives a better result than convolve().
The only fly in the ointment (which comes down to how one
sets up the arguments to cbind(), so is quite easily handled
in general application) is the variable number of columns
required according to the length of p.

Your suggestion worked nicely!
Thanks,
ted.

 
 
 On 10/7/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 One can get a one-line solution by taking the product of the FFTs.
 For example, let p - 1:4/8 be the probabilities.  Then the solution
 is:

 fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)), inverse = TRUE)/5

 On 10/7/06, Ted Harding [EMAIL PROTECTED] wrote:
  Hi again.
  I had suspected that doing the calculation by a convolution
  method might be both straightforward and efficient in R.
 
  I've now located convolve() (in 'base'!!), and have a solution
  using this function. Details below. Original statement of
  problem, and method originally proposed, repeated below for
  reference.
 
  On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
   Hi Folks,
  
   Given a series of n independent Bernoulli trials with
   outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
  
 P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
  
   I can certainly find a way to do it:
  
   Let p be the vector c(P1,P2,...,Pn).
   The cases r=0 and r=n are trivial (and also are exceptions
   for the following routine).
  
   For a given value of r in (1:(n-1)),
  
 library(combinat)
 Set - (1:n)
 Subsets - combn(Set,r)
 P - 0
 for(i in (1:dim(Subsets)[2])){
   ix - numeric(n)
   ix[Subsets[,i]] - 1
   P - P + prod((p^ix) * ((1-p)^(1-ix)))
 }
 
  The convolution method implemented in convolve computes, for
  two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
  series (k = 1, 2, ... , n+m-1):
 
   Z[k] = sum( X[k-m+i]*Y[i] )
 
  over valid values of i, i.e. products of terms of X matched
  with terms of a shifted Y, using the open method (see
  ?convolve).
 
  This is not quite what is wanted for the type of convolution
  needed to compute the distribution of the sum of two integer
  random variables, since one needs to reverse one of the series
  being convolved. This is the basis of the following:
 
  Given a vector p of probabilities P1, P2, P3, for Y=1 in
  successive trials, I need to convolve the successive Bernoulli
  distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:
 
   ps-cbind(1-p,p); u-ps[1,]; u1-u
   for(i in (2:16)){
 u-convolve(u,rev(ps[i,]),type=o)
   }
 
  In the case I was looking at, I had already obtained the vector
  P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
  previously described (repeated above); and here n=16.
 
  Having obtained u by the above routine, I can now compare the
  results, u with P:
 
  cbind((0:16),u,P)
r uP
0  1.353353e-01 1.353353e-01
1  3.007903e-01 3.007903e-01
2  2.976007e-01 2.976007e-01
3  1.747074e-01 1.747074e-01
4  6.826971e-02 6.826971e-02
5  1.884969e-02 1.884969e-02
6  3.804371e-03 3.804371e-03
7  5.720398e-04 5.720398e-04
8  6.463945e-05 6.463945e-05
9  5.489454e-06 5.489454e-06
   10  3.473997e-07 3.473997e-07
   11  1.607822e-08 1.607822e-08
   12  5.262533e-10 5.262532e-10
   13  1.148626e-11 1.148618e-11
   14  1.494650e-13 1.493761e-13
   15  9.008700e-16 8.764298e-16
   16 -1.896716e-17 1.313261e-19
 
  so, apart from ignorable differences in the very small
  probabilities for r11, they agree. I have to admit, though,
  that I had to tread carefully (and experiment with Binomials)
  to make sure of exactly how to introduce the reversal
  (at rev(ps[i,])).
 
  And the convolution method is *very* much faster than the
  selection of all subsets method!
 
  

Re: [R] Sum of Bernoullis with varying probabilities

2006-10-08 Thread Gabor Grothendieck
On 10/8/06, Ted Harding [EMAIL PROTECTED] wrote:
 On 08-Oct-06 Gabor Grothendieck wrote:
  Or perhaps its clearer (and saves a bit of space) to use apply...prod
  here instead of exp...log:
 
  fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5

 Yes, that's neat (in either version). With the example I have,
 where length(p)=16, I applied your suggestion above in the form

  v-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))),
   1,prod), inverse = TRUE)/17

 (making the number of columns for cbind equal to 17 = 16+1). Now,
 comparing this result with the P I got by brute force (subsets
 selection method), and removing the imaginary parts from v:

 cbind(r=(0:16),v=Re(v),P)
   rvP
   0 1.353353e-01 1.353353e-01
   1 3.007903e-01 3.007903e-01
   2 2.976007e-01 2.976007e-01
   3 1.747074e-01 1.747074e-01
   4 6.826971e-02 6.826971e-02
   5 1.884969e-02 1.884969e-02
   6 3.804371e-03 3.804371e-03
   7 5.720398e-04 5.720398e-04
   8 6.463945e-05 6.463945e-05
   9 5.489454e-06 5.489454e-06
  10 3.473997e-07 3.473997e-07
  11 1.607822e-08 1.607822e-08
  12 5.262532e-10 5.262532e-10
  13 1.148620e-11 1.148618e-11
  14 1.494031e-13 1.493761e-13
  15 8.887779e-16 8.764298e-16
  16 1.434973e-17 1.313261e-19

 so this calculation gives a better result than convolve().
 The only fly in the ointment (which comes down to how one
 sets up the arguments to cbind(), so is quite easily handled
 in general application) is the variable number of columns
 required according to the length of p.

Try this:

p - seq(4)/8 # test data
Re(fft(apply(mvfft(t(cbind(1-p, p, 0*p%o%p[-1]))), 1, prod), inv =
TRUE)/(length(p)+1))

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-08 Thread Gabor Grothendieck
On 10/8/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 On 10/8/06, Ted Harding [EMAIL PROTECTED] wrote:
  On 08-Oct-06 Gabor Grothendieck wrote:
   Or perhaps its clearer (and saves a bit of space) to use apply...prod
   here instead of exp...log:
  
   fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5
 
  Yes, that's neat (in either version). With the example I have,
  where length(p)=16, I applied your suggestion above in the form
 
   v-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))),
1,prod), inverse = TRUE)/17
 
  (making the number of columns for cbind equal to 17 = 16+1). Now,
  comparing this result with the P I got by brute force (subsets
  selection method), and removing the imaginary parts from v:
 
  cbind(r=(0:16),v=Re(v),P)
rvP
0 1.353353e-01 1.353353e-01
1 3.007903e-01 3.007903e-01
2 2.976007e-01 2.976007e-01
3 1.747074e-01 1.747074e-01
4 6.826971e-02 6.826971e-02
5 1.884969e-02 1.884969e-02
6 3.804371e-03 3.804371e-03
7 5.720398e-04 5.720398e-04
8 6.463945e-05 6.463945e-05
9 5.489454e-06 5.489454e-06
   10 3.473997e-07 3.473997e-07
   11 1.607822e-08 1.607822e-08
   12 5.262532e-10 5.262532e-10
   13 1.148620e-11 1.148618e-11
   14 1.494031e-13 1.493761e-13
   15 8.887779e-16 8.764298e-16
   16 1.434973e-17 1.313261e-19
 
  so this calculation gives a better result than convolve().
  The only fly in the ointment (which comes down to how one
  sets up the arguments to cbind(), so is quite easily handled
  in general application) is the variable number of columns
  required according to the length of p.

 Try this:

 p - seq(4)/8 # test data
 Re(fft(apply(mvfft(t(cbind(1-p, p, 0*p%o%p[-1]))), 1, prod), inv =
 TRUE)/(length(p)+1))


And here is one minor improvement (rbind() rather than t(cbind()):

p - 1:4/8
Re(fft(apply(mvfft(rbind(1-p, p, 0 * p[-1] %o% p)), 1, prod), inverse
= TRUE) / (length(p) + 1))

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-08 Thread Ted Harding
On 08-Oct-06 Gabor Grothendieck wrote:
 And here is one minor improvement (rbind() rather than t(cbind()):
 
 p - 1:4/8
 Re(fft(apply(mvfft(rbind(1-p, p, 0 * p[-1] %o% p)), 1, prod), inverse
 = TRUE) / (length(p) + 1))

Really nice! Much better than any solution I might have found
for myself. It really is worth posting to the list!

This all still leaves me with one nagging question in the back
of my mind -- Surely this is not the first time someone has
tackled the problem (in R) of the distribution of the sum of
Bernoulli RVs with unequal probabilities? R Site Search on
Bernoulli did not seem to throw anything up.

(Or maybe they just did it, and didn't think it worth saying
anything about--but it's not really trivial).

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 08-Oct-06   Time: 14:36:43
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-07 Thread Ted Harding
Hi again.
I had suspected that doing the calculation by a convolution
method might be both straightforward and efficient in R.

I've now located convolve() (in 'base'!!), and have a solution
using this function. Details below. Original statement of
problem, and method originally proposed, repeated below for
reference.

On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
 Hi Folks,

 Given a series of n independent Bernoulli trials with
 outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want

   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)

 I can certainly find a way to do it:

 Let p be the vector c(P1,P2,...,Pn).
 The cases r=0 and r=n are trivial (and also are exceptions
 for the following routine).

 For a given value of r in (1:(n-1)),

   library(combinat)
   Set - (1:n)
   Subsets - combn(Set,r)
   P - 0
   for(i in (1:dim(Subsets)[2])){
 ix - numeric(n)
 ix[Subsets[,i]] - 1
 P - P + prod((p^ix) * ((1-p)^(1-ix)))
   }

The convolution method implemented in convolve computes, for
two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
series (k = 1, 2, ... , n+m-1):

  Z[k] = sum( X[k-m+i]*Y[i] )

over valid values of i, i.e. products of terms of X matched
with terms of a shifted Y, using the open method (see
?convolve).

This is not quite what is wanted for the type of convolution
needed to compute the distribution of the sum of two integer
random variables, since one needs to reverse one of the series
being convolved. This is the basis of the following:

Given a vector p of probabilities P1, P2, P3, for Y=1 in
successive trials, I need to convolve the successive Bernoulli
distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:

  ps-cbind(1-p,p); u-ps[1,]; u1-u
  for(i in (2:16)){
u-convolve(u,rev(ps[i,]),type=o)
  }

In the case I was looking at, I had already obtained the vector
P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
previously described (repeated above); and here n=16.

Having obtained u by the above routine, I can now compare the
results, u with P:

cbind((0:16),u,P)
   r uP
   0  1.353353e-01 1.353353e-01
   1  3.007903e-01 3.007903e-01
   2  2.976007e-01 2.976007e-01
   3  1.747074e-01 1.747074e-01
   4  6.826971e-02 6.826971e-02
   5  1.884969e-02 1.884969e-02
   6  3.804371e-03 3.804371e-03
   7  5.720398e-04 5.720398e-04
   8  6.463945e-05 6.463945e-05
   9  5.489454e-06 5.489454e-06
  10  3.473997e-07 3.473997e-07
  11  1.607822e-08 1.607822e-08
  12  5.262533e-10 5.262532e-10
  13  1.148626e-11 1.148618e-11
  14  1.494650e-13 1.493761e-13
  15  9.008700e-16 8.764298e-16
  16 -1.896716e-17 1.313261e-19

so, apart from ignorable differences in the very small
probabilities for r11, they agree. I have to admit, though,
that I had to tread carefully (and experiment with Binomials)
to make sure of exactly how to introduce the reversal
(at rev(ps[i,])).

And the convolution method is *very* much faster than the
selection of all subsets method!

However, I wonder whether the subsets method may be the more
accurate of the two (not that it really matters).

Best wishes to all,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 07-Oct-06   Time: 22:03:27
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-07 Thread Gabor Grothendieck
One can get a one-line solution by taking the product of the FFTs.
For example, let p - 1:4/8 be the probabilities.  Then the solution is:

fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)), inverse = TRUE)/5

On 10/7/06, Ted Harding [EMAIL PROTECTED] wrote:
 Hi again.
 I had suspected that doing the calculation by a convolution
 method might be both straightforward and efficient in R.

 I've now located convolve() (in 'base'!!), and have a solution
 using this function. Details below. Original statement of
 problem, and method originally proposed, repeated below for
 reference.

 On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
  Hi Folks,
 
  Given a series of n independent Bernoulli trials with
  outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
 
P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
 
  I can certainly find a way to do it:
 
  Let p be the vector c(P1,P2,...,Pn).
  The cases r=0 and r=n are trivial (and also are exceptions
  for the following routine).
 
  For a given value of r in (1:(n-1)),
 
library(combinat)
Set - (1:n)
Subsets - combn(Set,r)
P - 0
for(i in (1:dim(Subsets)[2])){
  ix - numeric(n)
  ix[Subsets[,i]] - 1
  P - P + prod((p^ix) * ((1-p)^(1-ix)))
}

 The convolution method implemented in convolve computes, for
 two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the
 series (k = 1, 2, ... , n+m-1):

  Z[k] = sum( X[k-m+i]*Y[i] )

 over valid values of i, i.e. products of terms of X matched
 with terms of a shifted Y, using the open method (see
 ?convolve).

 This is not quite what is wanted for the type of convolution
 needed to compute the distribution of the sum of two integer
 random variables, since one needs to reverse one of the series
 being convolved. This is the basis of the following:

 Given a vector p of probabilities P1, P2, P3, for Y=1 in
 successive trials, I need to convolve the successive Bernoulli
 distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence:

  ps-cbind(1-p,p); u-ps[1,]; u1-u
  for(i in (2:16)){
u-convolve(u,rev(ps[i,]),type=o)
  }

 In the case I was looking at, I had already obtained the vector
 P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method
 previously described (repeated above); and here n=16.

 Having obtained u by the above routine, I can now compare the
 results, u with P:

 cbind((0:16),u,P)
   r uP
   0  1.353353e-01 1.353353e-01
   1  3.007903e-01 3.007903e-01
   2  2.976007e-01 2.976007e-01
   3  1.747074e-01 1.747074e-01
   4  6.826971e-02 6.826971e-02
   5  1.884969e-02 1.884969e-02
   6  3.804371e-03 3.804371e-03
   7  5.720398e-04 5.720398e-04
   8  6.463945e-05 6.463945e-05
   9  5.489454e-06 5.489454e-06
  10  3.473997e-07 3.473997e-07
  11  1.607822e-08 1.607822e-08
  12  5.262533e-10 5.262532e-10
  13  1.148626e-11 1.148618e-11
  14  1.494650e-13 1.493761e-13
  15  9.008700e-16 8.764298e-16
  16 -1.896716e-17 1.313261e-19

 so, apart from ignorable differences in the very small
 probabilities for r11, they agree. I have to admit, though,
 that I had to tread carefully (and experiment with Binomials)
 to make sure of exactly how to introduce the reversal
 (at rev(ps[i,])).

 And the convolution method is *very* much faster than the
 selection of all subsets method!

 However, I wonder whether the subsets method may be the more
 accurate of the two (not that it really matters).

 Best wishes to all,
 Ted.

 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 07-Oct-06   Time: 22:03:27
 -- XFMail --

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-06 Thread Deepayan Sarkar
On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
 Hi Folks,

 Given a series of n independent Bernoulli trials with
 outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want

   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)

 I can certainly find a way to do it:

 Let p be the vector c(P1,P2,...,Pn).
 The cases r=0 and r=n are trivial (and also are exceptions
 for the following routine).

 For a given value of r in (1:(n-1)),

   library(combinat)
   Set - (1:n)
   Subsets - combn(Set,r)
   P - 0
   for(i in (1:dim(Subsets)[2])){
 ix - numeric(n)
 ix[Subsets[,i]] - 1
 P - P + prod((p^ix) * ((1-p)^(1-ix)))
   }

Don't you want to multiply this with factorial(r) * factorial(n-r)? I
assume combn() gives all combinations and not all permutations, in
which case you are only counting

prod((p^ix) * ((1-p)^(1-ix)))

once instead of all the different ways in which ix could be permuted
without changing it.

 OK, some abbreviation of the above can be achieved with
 the 'apply' function (and I've spelt out the details for
 clarity). But I feel the basis of the approach (i.e. 'combn')
 is crude (also tending to cause problems if n is large);
 and I'm wondering if there is a nicer way.

I don't see how. Someone has to evaluate the n-choose-r distinct terms
to be added, and your code seems to be doing that as directly as
possible. Just for fun, here's a cute but very inefficient
implementation using recursion (even n=8 is very slow):

u -
function(n, r, p)
{
if (r == 0) return(prod(1-p))
if (r == n) return(prod(p))
ans - 0
for (i in 1:n)
{
ans -
ans +
p[i] * u(n-1, r-1, p[-i]) +
(1 - p[i]) * u(n-1, r, p[-i])
}
ans / n
}

-Deepayan


 And also wondering if someone has implemented it!

 Statutory Declaration: I have been on to R Site Search.

 Best wishes to all,
 Ted.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-06 Thread Ted Harding
Many thanks for your comments, Deepayan; and I liked your
recursive solution! Fun indeed.

Just a comment (below) on one of your comments (the rest
snipped).

On 06-Oct-06 Deepayan Sarkar wrote:
 On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
 Hi Folks,

 Given a series of n independent Bernoulli trials with
 outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want

   P = Prob[sum(Yi) = r]  (r = 0,1,...,n)

 I can certainly find a way to do it:

 Let p be the vector c(P1,P2,...,Pn).
 The cases r=0 and r=n are trivial (and also are exceptions
 for the following routine).

 For a given value of r in (1:(n-1)),

   library(combinat)
   Set - (1:n)
   Subsets - combn(Set,r)
   P - 0
   for(i in (1:dim(Subsets)[2])){
 ix - numeric(n)
 ix[Subsets[,i]] - 1
 P - P + prod((p^ix) * ((1-p)^(1-ix)))
   }
 
 Don't you want to multiply this with factorial(r) * factorial(n-r)? I
 assume combn() gives all combinations and not all permutations, in
 which case you are only counting
 
 prod((p^ix) * ((1-p)^(1-ix)))
 
 once instead of all the different ways in which ix could be permuted
 without changing it.

You had me worried for a moment -- the interplay between perms
and combs is always a head-spinner -- but since I'd verified
already that I get sum(P)=1 when I do this over all values of r,
I had to conclude that your comment must be incorrect.

In fact, if you consider the event r out of the n gave Y=1,
this can be decomposed into disjoint sub-events

  subset {1,2,3,...,r} gave Y=1, the rest 0
  subset {1,2,3,...,(r-1),(r=1)} gave Y=1, the rest 0.
  

and so on over all choose(n,r) subsets, and the probability of
r out of n is the sum of the probabilities of the sub-events.
For any one of these (say the first above), the probability is

  P(Y1=1  Y2=1  ...  Yr=1  Y[r+1]=0  ...  Y[n]=0)

which is simply the product of their probabilities. No further
permutation is involved.

I hope I've got this right (we'll soon find out if not)!

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 06-Oct-06   Time: 22:15:44
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of Bernoullis with varying probabilities

2006-10-06 Thread Deepayan Sarkar
On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
 Many thanks for your comments, Deepayan; and I liked your
 recursive solution! Fun indeed.

 Just a comment (below) on one of your comments (the rest
 snipped).

 On 06-Oct-06 Deepayan Sarkar wrote:
  On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote:
  Hi Folks,
 
  Given a series of n independent Bernoulli trials with
  outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want
 
P = Prob[sum(Yi) = r]  (r = 0,1,...,n)
 
  I can certainly find a way to do it:
 
  Let p be the vector c(P1,P2,...,Pn).
  The cases r=0 and r=n are trivial (and also are exceptions
  for the following routine).
 
  For a given value of r in (1:(n-1)),
 
library(combinat)
Set - (1:n)
Subsets - combn(Set,r)
P - 0
for(i in (1:dim(Subsets)[2])){
  ix - numeric(n)
  ix[Subsets[,i]] - 1
  P - P + prod((p^ix) * ((1-p)^(1-ix)))
}
 
  Don't you want to multiply this with factorial(r) * factorial(n-r)? I
  assume combn() gives all combinations and not all permutations, in
  which case you are only counting
 
  prod((p^ix) * ((1-p)^(1-ix)))
 
  once instead of all the different ways in which ix could be permuted
  without changing it.

 You had me worried for a moment -- the interplay between perms
 and combs is always a head-spinner -- but since I'd verified
 already that I get sum(P)=1 when I do this over all values of r,
 I had to conclude that your comment must be incorrect.

 In fact, if you consider the event r out of the n gave Y=1,
 this can be decomposed into disjoint sub-events

   subset {1,2,3,...,r} gave Y=1, the rest 0
   subset {1,2,3,...,(r-1),(r=1)} gave Y=1, the rest 0.
   

 and so on over all choose(n,r) subsets, and the probability of
 r out of n is the sum of the probabilities of the sub-events.
 For any one of these (say the first above), the probability is

   P(Y1=1  Y2=1  ...  Yr=1  Y[r+1]=0  ...  Y[n]=0)

 which is simply the product of their probabilities. No further
 permutation is involved.

You are right. The key point I was missing is that the number of
distinct _permutations_ of r 1's and (n-r) 0's is choose(n, r) (the
places to put the 1's). Sorry for the red herring.

Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.