Re: [R] CDF of Sample Quantile
On Mar 7, 2011, at 19:12, Bentley Coffey wrote:

> Just to tie up this thread, I wanted to report my solution:
>
> When (n-1)p is an integer, there is a closed form solution:
> pbinom(j-1,n,...)
>
> When it is not an integer, it's fairly easy to approximate the solution by
> interpolating between the closed-form solutions: fitting log(1 - probability
> from closed form solution) on an orthogonal polynomial in n. This is a
> _very_ fast and fairly accurate solution.
>
> Thanks to all who offered their help...

If you have too much time on your hands, Wikipedia has the joint density of two order statistics, from which you could probably proceed to find the marginal density of a linear combination of two neighboring order stats. Just take a large piece of paper and a couple of spare days. Numerical integration might do the job, with some care.

-p

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
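One way the numerical-integration idea might be sketched in R, assuming the default (type 7) definition used by quantile(), where the sample quantile is (1 - g) * X_(j) + g * X_(j+1) with j = floor((n-1)p + 1). Rather than deriving the marginal density, this sketch integrates the joint density of the two neighboring order statistics over the region where the weighted average exceeds the threshold; the inner integral over X_(j+1) has a closed form, which leaves a one-dimensional integral for integrate(). The function name and arguments are hypothetical, and this is not code posted in the thread.

```r
# Sketch: P(quantile(x, prob) > t) for an iid normal sample of size n,
# assuming the type 7 rule Q = (1 - g) * X_(j) + g * X_(j+1).
p_quantile_exceeds <- function(t, n, prob, mean = 0, sd = 1) {
  h <- (n - 1) * prob + 1          # fractional index of the quantile
  j <- floor(h + 1e-9)             # lower order statistic (small fuzz for rounding)
  g <- max(h - j, 0)               # weight on the upper order statistic

  # Part 1: X_(j) > t, i.e. at most j - 1 of the n draws are <= t.
  p_lower <- pbinom(j - 1, n, pnorm(t, mean, sd))
  if (g < 1e-9) return(p_lower)    # single order statistic: closed form only

  # Part 2: X_(j) = u <= t, but the weighted average still exceeds t.
  # That requires X_(j+1) > v0(u) = (t - (1 - g) * u) / g. Integrating the
  # joint density over v > v0(u) analytically leaves
  #   j * choose(n, j) * F(u)^(j - 1) * f(u) * (1 - F(v0(u)))^(n - j),
  # evaluated on the log scale to limit underflow.
  integrand <- function(u) {
    v0 <- (t - (1 - g) * u) / g
    exp(log(j) + lchoose(n, j) +
          (j - 1) * pnorm(u, mean, sd, log.p = TRUE) +
          dnorm(u, mean, sd, log = TRUE) +
          (n - j) * pnorm(v0, mean, sd, lower.tail = FALSE, log.p = TRUE))
  }
  p_lower + integrate(integrand, lower = -Inf, upper = t)$value
}

# Illustrative call: 0.9 quantile of 25 standard normal draws, threshold 1.5.
p_quantile_exceeds(t = 1.5, n = 25, prob = 0.9)
```

For n = 2 and prob = 0.5 the sample quantile is the mean of the two draws, so the result can be checked against pnorm(t * sqrt(2), lower.tail = FALSE). The "with some care" caveat still applies: for thresholds far in the tails or very large n, integrate() may need tighter tolerances or finite limits.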
[R] CDF of Sample Quantile
Just to tie up this thread, I wanted to report my solution:

When (n-1)p is an integer, there is a closed form solution:
pbinom(j-1,n,...)

When it is not an integer, it's fairly easy to approximate the solution by interpolating between the closed-form solutions: fitting log(1 - probability from closed form solution) on an orthogonal polynomial in n. This is a _very_ fast and fairly accurate solution.

Thanks to all who offered their help...

On Thu, Feb 17, 2011 at 11:11 PM, Bentley Coffey wrote:

> When (n-1)p is not an integer then R uses a
> weighted average of 2 order statistics, in which case I'm left with my
> standing problem...
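A minimal sketch of one reading of the approach described above, for a normal sample. The closed form assumes j = (n-1)p + 1 is the index of the single order statistic that quantile() returns, and that the elided pbinom() argument is the normal CDF at the threshold. The values of p and the threshold, the grid of sample sizes, and the polynomial degree are all illustrative assumptions, not values from the thread.

```r
# Closed form when (n - 1) * p is an integer: quantile() returns the order
# statistic X_(j) with j = (n - 1) * p + 1, and
#   P(X_(j) > threshold) = pbinom(j - 1, n, F(threshold)).
exceed_closed_form <- function(n, p, threshold, mean = 0, sd = 1) {
  j <- round((n - 1) * p) + 1
  pbinom(j - 1, n, pnorm(threshold, mean, sd))
}

p <- 0.75          # illustrative quantile level (assumption)
threshold <- 0.8   # illustrative threshold (assumption)

# Sample sizes for which (n - 1) * p is an integer when p = 0.75.
grid <- data.frame(n = seq(5, 101, by = 4))
grid$log_cdf <- log(1 - exceed_closed_form(grid$n, p, threshold))

# Fit log(1 - probability) on an orthogonal polynomial in n.
# Degree 3 is a guess; the thread does not state the degree used.
fit <- lm(log_cdf ~ poly(n, 3), data = grid)

# Interpolate at a sample size where (n - 1) * p is not an integer.
approx_exceed <- 1 - exp(predict(fit, newdata = data.frame(n = 50)))
approx_exceed
```

The fit only uses pbinom() and one small regression, which is why it is fast. How close the interpolated value is to the true probability at a given n is not verified here and would need a check against simulation.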
Re: [R] CDF of Sample Quantile
Duncan,

I'm not sure how I missed your message. Sorry. What you describe is what I do when (n-1)p is an integer so that R computes the sample quantile using a single order statistic. My later posting has that exact binomial expression in there as a special case. When (n-1)p is not an integer then R uses a weighted average of 2 order statistics, in which case I'm left with my standing problem...

On Mon, Feb 14, 2011 at 2:26 PM, Duncan Murdoch wrote:

> I think the answer to your question is no, but I think it's the wrong
> question. Suppose Xm:n is the mth sample quantile in a sample of size n,
> and you want to calculate P(Xm:n > x). Let X be a draw from the underlying
> distribution, and suppose P(X > x) = p. Then the event Xm:n > x
> is the same as the event that out of n independent draws of X, at least
> n-m+1 are bigger than x: a binomial probability. And R can calculate
> binomial probabilities using pbinom().
>
> Duncan Murdoch
Re: [R] CDF of Sample Quantile
Yeah, I think that you don't understand me. You suggest:

1 - pnorm(Threshold,mean,sd) = Probability that rnorm(1,mean,sd) > Threshold

I want to know:

Probability that quantile(rnorm(n,mean,sd),prob) > Threshold

I use rnorm() to simulate a sample of size n and then I compute the statistic from that sample using quantile(). Like all statistics, that quantile stat (which is a weighted average of 2 order statistics) is a function of the realized data and hence has a sampling distribution. I want to compute the cdf of that sampling distribution. Even the David and Nagaraja _Order Statistics_ text in my own library does not have a closed-form cdf for that statistic...

On Mon, Feb 14, 2011 at 2:20 PM, Jonathan P Daily wrote:

> If I understand this, you have a value x, or a vector of values x, and you
> want to know the CDF that this value is drawn from a normal distribution?
>
> I assume you are drawing from rnorm for your simulations, so look at the
> other functions listed when you ?rnorm.
Re: [R] CDF of Sample Quantile
On 14/02/2011 9:58 AM, Bentley Coffey wrote:

> So, again, my question: is there code in R for quickly evaluating the CDF of
> a Sample Quantile given the sample size and the parameters governing the
> distribution of each iid point in the sample?

I think the answer to your question is no, but I think it's the wrong question. Suppose Xm:n is the mth sample quantile in a sample of size n, and you want to calculate P(Xm:n > x). Let X be a draw from the underlying distribution, and suppose P(X > x) = p. Then the event Xm:n > x is the same as the event that out of n independent draws of X, at least n-m+1 are bigger than x: a binomial probability. And R can calculate binomial probabilities using pbinom().

Duncan Murdoch
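The counting argument above might be written in R as follows, for normal draws. This is a minimal sketch: the function name and its arguments are hypothetical, and Xm:n is taken to be the mth smallest value in the sample.

```r
# P(X_{m:n} > x) for n iid normal draws: the mth smallest value exceeds x
# exactly when at least n - m + 1 of the n draws are bigger than x.
p_order_stat_exceeds <- function(x, m, n, mean = 0, sd = 1) {
  p <- pnorm(x, mean, sd, lower.tail = FALSE)   # P(X > x) for a single draw
  # P(B >= n - m + 1) for B ~ Binomial(n, p); lower.tail = FALSE gives P(B > q).
  pbinom(n - m, size = n, prob = p, lower.tail = FALSE)
}

# Probability that the 8th smallest of 10 standard normal draws exceeds 1.
p_order_stat_exceeds(x = 1, m = 8, n = 10)
```

Counting the draws at or below x instead gives the equivalent form pbinom(m - 1, n, pnorm(x, mean, sd)). For m = n the result reduces to 1 - pnorm(x)^n, the probability that the sample maximum exceeds x.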
Re: [R] CDF of Sample Quantile
If I understand this, you have a value x, or a vector of values x, and you want to know the CDF that this value is drawn from a normal distribution?

I assume you are drawing from rnorm for your simulations, so look at the other functions listed when you ?rnorm.

HTH

--
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly
[R] CDF of Sample Quantile
I need to calculate the probability that a sample quantile will exceed a threshold given the size of the iid sample and the parameters describing the distribution of each observation (normal, in my case). I can compute the probability with brute force simulation: simulate a size N sample, apply R's quantile() function on it, compare it to the threshold, replicate this MANY times, and count the number of times the sample quantile exceeded the threshold (dividing by the total number of replications yields the probability of interest). The problem is that the number of replications required to get sufficient precision (3 digits, say) is so HUGE that this takes FOREVER. I have to perform this task so much in my script (searching over the sample size and repeated for several different distribution parameters) that it takes too many hours to run.

I've searched for pre-existing code to do this in R and haven't found anything. Perhaps I'm missing something. Is anyone aware of an R function to compute this probability?

I've tried writing my own code using the fact that R's quantile() function is a linear combination of 2 order statistics. Basically, I wrote down the mathematical form of the joint pdf for the 2 order statistics (a function of the sample size and the distribution parameters) then performed a pseudo-Monte Carlo integration (i.e. using Halton draws rather than R's random draws) over the region where the sample quantile exceeds the threshold. In theory, this should work and it takes about 1000 times fewer clock cycles to compute than the Brute Force approach. My problem is that there is a significant discrepancy between the results using Brute Force and using this more efficient approach that I have coded up. I believe that the problem is numerical error but it could be some programming bug; regardless, I have been unable to locate the source of this problem and have spent over 20 hours trying to identify it this weekend. Please, somebody help!!!

So, again, my question: is there code in R for quickly evaluating the CDF of a Sample Quantile given the sample size and the parameters governing the distribution of each iid point in the sample?

Grateful for any help,

Bentley
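For reference, the brute-force approach described in this post might look like the sketch below. The function name, the sample size, the quantile level, and the threshold are illustrative assumptions. The Monte Carlo standard error sqrt(p(1-p)/reps) shows why three-digit precision is so expensive.

```r
# Brute-force estimate of P(quantile(sample, prob) > threshold) for an iid
# normal sample of size n, with its Monte Carlo standard error.
sim_exceed_prob <- function(n, prob, threshold, mean = 0, sd = 1, reps = 5000) {
  q <- replicate(reps, quantile(rnorm(n, mean, sd), probs = prob, names = FALSE))
  est <- mean(q > threshold)
  c(estimate = est, std_error = sqrt(est * (1 - est) / reps))
}

set.seed(1)
res <- sim_exceed_prob(n = 50, prob = 0.75, threshold = 0.8)
res

# Replications needed for a standard error of about 0.0005 (roughly 3 digits),
# using the worst case p * (1 - p) <= 1/4.
reps_needed <- 0.25 / 0.0005^2
reps_needed
```

Because the standard error shrinks only with the square root of the number of replications, each extra digit of precision costs about 100 times more simulated samples, and that cost is paid again for every sample size and parameter setting in the search.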