Re: [R] Sum of Bernoullis with varying probabilities
On Fri, 6 Oct 2006, Deepayan Sarkar wrote: On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) OK, some abbreviation of the above can be achieved with the 'apply' function (and I've spelt out the details for clarity). But I feel the basis of the approach (i.e. 'combn') is crude (also tending to cause problems if n is large); and I'm wondering if there is a nicer way. I don't see how. Someone has to evaluate the n-choose-r distinct terms to be added, and your code seems to be doing that as directly as possible. Not necessarily. It could be quicker to numerically invert the characteristic function, which is easy to compute. -thomas Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
Or perhaps its clearer (and saves a bit of space) to use apply...prod here instead of exp...log: fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5 On 10/7/06, Gabor Grothendieck [EMAIL PROTECTED] wrote: One can get a one-line solution by taking the product of the FFTs. For example, let p - 1:4/8 be the probabilities. Then the solution is: fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)), inverse = TRUE)/5 On 10/7/06, Ted Harding [EMAIL PROTECTED] wrote: Hi again. I had suspected that doing the calculation by a convolution method might be both straightforward and efficient in R. I've now located convolve() (in 'base'!!), and have a solution using this function. Details below. Original statement of problem, and method originally proposed, repeated below for reference. On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } The convolution method implemented in convolve computes, for two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the series (k = 1, 2, ... , n+m-1): Z[k] = sum( X[k-m+i]*Y[i] ) over valid values of i, i.e. products of terms of X matched with terms of a shifted Y, using the open method (see ?convolve). This is not quite what is wanted for the type of convolution needed to compute the distribution of the sum of two integer random variables, since one needs to reverse one of the series being convolved. This is the basis of the following: Given a vector p of probabilities P1, P2, P3, for Y=1 in successive trials, I need to convolve the successive Bernoulli distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence: ps-cbind(1-p,p); u-ps[1,]; u1-u for(i in (2:16)){ u-convolve(u,rev(ps[i,]),type=o) } In the case I was looking at, I had already obtained the vector P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method previously described (repeated above); and here n=16. Having obtained u by the above routine, I can now compare the results, u with P: cbind((0:16),u,P) r uP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262533e-10 5.262532e-10 13 1.148626e-11 1.148618e-11 14 1.494650e-13 1.493761e-13 15 9.008700e-16 8.764298e-16 16 -1.896716e-17 1.313261e-19 so, apart from ignorable differences in the very small probabilities for r11, they agree. I have to admit, though, that I had to tread carefully (and experiment with Binomials) to make sure of exactly how to introduce the reversal (at rev(ps[i,])). And the convolution method is *very* much faster than the selection of all subsets method! However, I wonder whether the subsets method may be the more accurate of the two (not that it really matters). Best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Oct-06 Time: 22:03:27 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
On 08-Oct-06 Gabor Grothendieck wrote: Or perhaps its clearer (and saves a bit of space) to use apply...prod here instead of exp...log: fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5 Yes, that's neat (in either version). With the example I have, where length(p)=16, I applied your suggestion above in the form v-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))), 1,prod), inverse = TRUE)/17 (making the number of columns for cbind equal to 17 = 16+1). Now, comparing this result with the P I got by brute force (subsets selection method), and removing the imaginary parts from v: cbind(r=(0:16),v=Re(v),P) rvP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262532e-10 5.262532e-10 13 1.148620e-11 1.148618e-11 14 1.494031e-13 1.493761e-13 15 8.887779e-16 8.764298e-16 16 1.434973e-17 1.313261e-19 so this calculation gives a better result than convolve(). The only fly in the ointment (which comes down to how one sets up the arguments to cbind(), so is quite easily handled in general application) is the variable number of columns required according to the length of p. Your suggestion worked nicely! Thanks, ted. On 10/7/06, Gabor Grothendieck [EMAIL PROTECTED] wrote: One can get a one-line solution by taking the product of the FFTs. For example, let p - 1:4/8 be the probabilities. Then the solution is: fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)), inverse = TRUE)/5 On 10/7/06, Ted Harding [EMAIL PROTECTED] wrote: Hi again. I had suspected that doing the calculation by a convolution method might be both straightforward and efficient in R. I've now located convolve() (in 'base'!!), and have a solution using this function. Details below. Original statement of problem, and method originally proposed, repeated below for reference. On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } The convolution method implemented in convolve computes, for two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the series (k = 1, 2, ... , n+m-1): Z[k] = sum( X[k-m+i]*Y[i] ) over valid values of i, i.e. products of terms of X matched with terms of a shifted Y, using the open method (see ?convolve). This is not quite what is wanted for the type of convolution needed to compute the distribution of the sum of two integer random variables, since one needs to reverse one of the series being convolved. This is the basis of the following: Given a vector p of probabilities P1, P2, P3, for Y=1 in successive trials, I need to convolve the successive Bernoulli distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence: ps-cbind(1-p,p); u-ps[1,]; u1-u for(i in (2:16)){ u-convolve(u,rev(ps[i,]),type=o) } In the case I was looking at, I had already obtained the vector P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method previously described (repeated above); and here n=16. Having obtained u by the above routine, I can now compare the results, u with P: cbind((0:16),u,P) r uP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262533e-10 5.262532e-10 13 1.148626e-11 1.148618e-11 14 1.494650e-13 1.493761e-13 15 9.008700e-16 8.764298e-16 16 -1.896716e-17 1.313261e-19 so, apart from ignorable differences in the very small probabilities for r11, they agree. I have to admit, though, that I had to tread carefully (and experiment with Binomials) to make sure of exactly how to introduce the reversal (at rev(ps[i,])). And the convolution method is *very* much faster than the selection of all subsets method!
Re: [R] Sum of Bernoullis with varying probabilities
On 10/8/06, Ted Harding [EMAIL PROTECTED] wrote: On 08-Oct-06 Gabor Grothendieck wrote: Or perhaps its clearer (and saves a bit of space) to use apply...prod here instead of exp...log: fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5 Yes, that's neat (in either version). With the example I have, where length(p)=16, I applied your suggestion above in the form v-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))), 1,prod), inverse = TRUE)/17 (making the number of columns for cbind equal to 17 = 16+1). Now, comparing this result with the P I got by brute force (subsets selection method), and removing the imaginary parts from v: cbind(r=(0:16),v=Re(v),P) rvP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262532e-10 5.262532e-10 13 1.148620e-11 1.148618e-11 14 1.494031e-13 1.493761e-13 15 8.887779e-16 8.764298e-16 16 1.434973e-17 1.313261e-19 so this calculation gives a better result than convolve(). The only fly in the ointment (which comes down to how one sets up the arguments to cbind(), so is quite easily handled in general application) is the variable number of columns required according to the length of p. Try this: p - seq(4)/8 # test data Re(fft(apply(mvfft(t(cbind(1-p, p, 0*p%o%p[-1]))), 1, prod), inv = TRUE)/(length(p)+1)) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
On 10/8/06, Gabor Grothendieck [EMAIL PROTECTED] wrote: On 10/8/06, Ted Harding [EMAIL PROTECTED] wrote: On 08-Oct-06 Gabor Grothendieck wrote: Or perhaps its clearer (and saves a bit of space) to use apply...prod here instead of exp...log: fft(apply(mvfft(t(cbind(1-p,p,0,0,0))), 1, prod), inverse = TRUE)/5 Yes, that's neat (in either version). With the example I have, where length(p)=16, I applied your suggestion above in the form v-fft(apply(mvfft(t(cbind(1-p,p,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))), 1,prod), inverse = TRUE)/17 (making the number of columns for cbind equal to 17 = 16+1). Now, comparing this result with the P I got by brute force (subsets selection method), and removing the imaginary parts from v: cbind(r=(0:16),v=Re(v),P) rvP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262532e-10 5.262532e-10 13 1.148620e-11 1.148618e-11 14 1.494031e-13 1.493761e-13 15 8.887779e-16 8.764298e-16 16 1.434973e-17 1.313261e-19 so this calculation gives a better result than convolve(). The only fly in the ointment (which comes down to how one sets up the arguments to cbind(), so is quite easily handled in general application) is the variable number of columns required according to the length of p. Try this: p - seq(4)/8 # test data Re(fft(apply(mvfft(t(cbind(1-p, p, 0*p%o%p[-1]))), 1, prod), inv = TRUE)/(length(p)+1)) And here is one minor improvement (rbind() rather than t(cbind()): p - 1:4/8 Re(fft(apply(mvfft(rbind(1-p, p, 0 * p[-1] %o% p)), 1, prod), inverse = TRUE) / (length(p) + 1)) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
On 08-Oct-06 Gabor Grothendieck wrote: And here is one minor improvement (rbind() rather than t(cbind()): p - 1:4/8 Re(fft(apply(mvfft(rbind(1-p, p, 0 * p[-1] %o% p)), 1, prod), inverse = TRUE) / (length(p) + 1)) Really nice! Much better than any solution I might have found for myself. It really is worth posting to the list! This all still leaves me with one nagging question in the back of my mind -- Surely this is not the first time someone has tackled the problem (in R) of the distribution of the sum of Bernoulli RVs with unequal probabilities? R Site Search on Bernoulli did not seem to throw anything up. (Or maybe they just did it, and didn't think it worth saying anything about--but it's not really trivial). Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-06 Time: 14:36:43 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
Hi again. I had suspected that doing the calculation by a convolution method might be both straightforward and efficient in R. I've now located convolve() (in 'base'!!), and have a solution using this function. Details below. Original statement of problem, and method originally proposed, repeated below for reference. On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } The convolution method implemented in convolve computes, for two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the series (k = 1, 2, ... , n+m-1): Z[k] = sum( X[k-m+i]*Y[i] ) over valid values of i, i.e. products of terms of X matched with terms of a shifted Y, using the open method (see ?convolve). This is not quite what is wanted for the type of convolution needed to compute the distribution of the sum of two integer random variables, since one needs to reverse one of the series being convolved. This is the basis of the following: Given a vector p of probabilities P1, P2, P3, for Y=1 in successive trials, I need to convolve the successive Bernoulli distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence: ps-cbind(1-p,p); u-ps[1,]; u1-u for(i in (2:16)){ u-convolve(u,rev(ps[i,]),type=o) } In the case I was looking at, I had already obtained the vector P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method previously described (repeated above); and here n=16. Having obtained u by the above routine, I can now compare the results, u with P: cbind((0:16),u,P) r uP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262533e-10 5.262532e-10 13 1.148626e-11 1.148618e-11 14 1.494650e-13 1.493761e-13 15 9.008700e-16 8.764298e-16 16 -1.896716e-17 1.313261e-19 so, apart from ignorable differences in the very small probabilities for r11, they agree. I have to admit, though, that I had to tread carefully (and experiment with Binomials) to make sure of exactly how to introduce the reversal (at rev(ps[i,])). And the convolution method is *very* much faster than the selection of all subsets method! However, I wonder whether the subsets method may be the more accurate of the two (not that it really matters). Best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Oct-06 Time: 22:03:27 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
One can get a one-line solution by taking the product of the FFTs. For example, let p - 1:4/8 be the probabilities. Then the solution is: fft(exp(rowSums(log(mvfft(t(cbind(1-p,p,0,0,0)), inverse = TRUE)/5 On 10/7/06, Ted Harding [EMAIL PROTECTED] wrote: Hi again. I had suspected that doing the calculation by a convolution method might be both straightforward and efficient in R. I've now located convolve() (in 'base'!!), and have a solution using this function. Details below. Original statement of problem, and method originally proposed, repeated below for reference. On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } The convolution method implemented in convolve computes, for two series X[1],X[2],...,X[n] and Y[1],Y[2],...,Y[m] the series (k = 1, 2, ... , n+m-1): Z[k] = sum( X[k-m+i]*Y[i] ) over valid values of i, i.e. products of terms of X matched with terms of a shifted Y, using the open method (see ?convolve). This is not quite what is wanted for the type of convolution needed to compute the distribution of the sum of two integer random variables, since one needs to reverse one of the series being convolved. This is the basis of the following: Given a vector p of probabilities P1, P2, P3, for Y=1 in successive trials, I need to convolve the successive Bernoulli distributions (1-P1,P1), (1-P2,P2), ... , (1-Pn,Pn). Hence: ps-cbind(1-p,p); u-ps[1,]; u1-u for(i in (2:16)){ u-convolve(u,rev(ps[i,]),type=o) } In the case I was looking at, I had already obtained the vector P of probabilities P(Sum(Y)=0), P(sum(Y)=1), ... by the method previously described (repeated above); and here n=16. Having obtained u by the above routine, I can now compare the results, u with P: cbind((0:16),u,P) r uP 0 1.353353e-01 1.353353e-01 1 3.007903e-01 3.007903e-01 2 2.976007e-01 2.976007e-01 3 1.747074e-01 1.747074e-01 4 6.826971e-02 6.826971e-02 5 1.884969e-02 1.884969e-02 6 3.804371e-03 3.804371e-03 7 5.720398e-04 5.720398e-04 8 6.463945e-05 6.463945e-05 9 5.489454e-06 5.489454e-06 10 3.473997e-07 3.473997e-07 11 1.607822e-08 1.607822e-08 12 5.262533e-10 5.262532e-10 13 1.148626e-11 1.148618e-11 14 1.494650e-13 1.493761e-13 15 9.008700e-16 8.764298e-16 16 -1.896716e-17 1.313261e-19 so, apart from ignorable differences in the very small probabilities for r11, they agree. I have to admit, though, that I had to tread carefully (and experiment with Binomials) to make sure of exactly how to introduce the reversal (at rev(ps[i,])). And the convolution method is *very* much faster than the selection of all subsets method! However, I wonder whether the subsets method may be the more accurate of the two (not that it really matters). Best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Oct-06 Time: 22:03:27 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } Don't you want to multiply this with factorial(r) * factorial(n-r)? I assume combn() gives all combinations and not all permutations, in which case you are only counting prod((p^ix) * ((1-p)^(1-ix))) once instead of all the different ways in which ix could be permuted without changing it. OK, some abbreviation of the above can be achieved with the 'apply' function (and I've spelt out the details for clarity). But I feel the basis of the approach (i.e. 'combn') is crude (also tending to cause problems if n is large); and I'm wondering if there is a nicer way. I don't see how. Someone has to evaluate the n-choose-r distinct terms to be added, and your code seems to be doing that as directly as possible. Just for fun, here's a cute but very inefficient implementation using recursion (even n=8 is very slow): u - function(n, r, p) { if (r == 0) return(prod(1-p)) if (r == n) return(prod(p)) ans - 0 for (i in 1:n) { ans - ans + p[i] * u(n-1, r-1, p[-i]) + (1 - p[i]) * u(n-1, r, p[-i]) } ans / n } -Deepayan And also wondering if someone has implemented it! Statutory Declaration: I have been on to R Site Search. Best wishes to all, Ted. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
Many thanks for your comments, Deepayan; and I liked your recursive solution! Fun indeed. Just a comment (below) on one of your comments (the rest snipped). On 06-Oct-06 Deepayan Sarkar wrote: On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } Don't you want to multiply this with factorial(r) * factorial(n-r)? I assume combn() gives all combinations and not all permutations, in which case you are only counting prod((p^ix) * ((1-p)^(1-ix))) once instead of all the different ways in which ix could be permuted without changing it. You had me worried for a moment -- the interplay between perms and combs is always a head-spinner -- but since I'd verified already that I get sum(P)=1 when I do this over all values of r, I had to conclude that your comment must be incorrect. In fact, if you consider the event r out of the n gave Y=1, this can be decomposed into disjoint sub-events subset {1,2,3,...,r} gave Y=1, the rest 0 subset {1,2,3,...,(r-1),(r=1)} gave Y=1, the rest 0. and so on over all choose(n,r) subsets, and the probability of r out of n is the sum of the probabilities of the sub-events. For any one of these (say the first above), the probability is P(Y1=1 Y2=1 ... Yr=1 Y[r+1]=0 ... Y[n]=0) which is simply the product of their probabilities. No further permutation is involved. I hope I've got this right (we'll soon find out if not)! Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 06-Oct-06 Time: 22:15:44 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sum of Bernoullis with varying probabilities
On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Many thanks for your comments, Deepayan; and I liked your recursive solution! Fun indeed. Just a comment (below) on one of your comments (the rest snipped). On 06-Oct-06 Deepayan Sarkar wrote: On 10/6/06, Ted Harding [EMAIL PROTECTED] wrote: Hi Folks, Given a series of n independent Bernoulli trials with outcomes Yi (i=1...n) and Prob[Yi = 1] = Pi, I want P = Prob[sum(Yi) = r] (r = 0,1,...,n) I can certainly find a way to do it: Let p be the vector c(P1,P2,...,Pn). The cases r=0 and r=n are trivial (and also are exceptions for the following routine). For a given value of r in (1:(n-1)), library(combinat) Set - (1:n) Subsets - combn(Set,r) P - 0 for(i in (1:dim(Subsets)[2])){ ix - numeric(n) ix[Subsets[,i]] - 1 P - P + prod((p^ix) * ((1-p)^(1-ix))) } Don't you want to multiply this with factorial(r) * factorial(n-r)? I assume combn() gives all combinations and not all permutations, in which case you are only counting prod((p^ix) * ((1-p)^(1-ix))) once instead of all the different ways in which ix could be permuted without changing it. You had me worried for a moment -- the interplay between perms and combs is always a head-spinner -- but since I'd verified already that I get sum(P)=1 when I do this over all values of r, I had to conclude that your comment must be incorrect. In fact, if you consider the event r out of the n gave Y=1, this can be decomposed into disjoint sub-events subset {1,2,3,...,r} gave Y=1, the rest 0 subset {1,2,3,...,(r-1),(r=1)} gave Y=1, the rest 0. and so on over all choose(n,r) subsets, and the probability of r out of n is the sum of the probabilities of the sub-events. For any one of these (say the first above), the probability is P(Y1=1 Y2=1 ... Yr=1 Y[r+1]=0 ... Y[n]=0) which is simply the product of their probabilities. No further permutation is involved. You are right. The key point I was missing is that the number of distinct _permutations_ of r 1's and (n-r) 0's is choose(n, r) (the places to put the 1's). Sorry for the red herring. Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.