Re: [R] Rank-based p-value on large dataset

Sean Davis Fri, 04 Mar 2005 07:20:00 -0800

On 3/3/05 17:40, "Deepayan Sarkar" <[EMAIL PROTECTED]> wrote:

> On Thursday 03 March 2005 16:32, Deepayan Sarkar wrote:
>> On Thursday 03 March 2005 16:22, Sean Davis wrote:
>>> I have a fairly simple problem--I have about 80,000 values (call
>>> them y) that I am using as an empirical distribution and I want to
>>> find the p-value (never mind the multiple testing issues here, for
>>> the time being) of 130,000 points (call them x) from the empirical
>>> distribution. I typically do that (for one-sided test) something
>>> like
>>> 
>>> p.val <- numeric(length(x))
>>> for (i in seq_along(x))
>>>     p.val[i] <- sum(y > x[i]) / length(y)
>>> 
>>> However, length(x) is large here, as is length(y), so this
>>> process takes quite a long time.  Any suggestions?
>> 
>> The obvious thing to do would be
>> 
>> p.val = 1 - ecdf(x)(y)
> 
> or rather: p.val = 1 - ecdf(y)(x)
> 

Deepayan,

Thanks (and to Martin, also).  This works wonderfully.  I didn't expect such
a function to exist, but knowing of it will simplify matters significantly
for me.  
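
A minimal sketch, on simulated data, of why the one-liner gives the same
answer as the loop: ecdf(y)(x) is the fraction of y values <= x, so
1 - ecdf(y)(x) is the fraction of y values > x.  The sample sizes, the seed,
and the names p.loop, p.ecdf and p.fi are assumptions made for illustration
and are not from the thread; the findInterval() variant is an extra
alternative, not something suggested above.

```r
set.seed(1)            # simulated data, much smaller than the real problem
y <- rnorm(2000)       # stands in for the empirical distribution
x <- rnorm(3000)       # stands in for the points to be scored

# Loop-style calculation from the original question:
# one pass over y for every element of x.
p.loop <- vapply(x, function(xi) sum(y > xi) / length(y), numeric(1))

# Vectorised version: ecdf(y) returns a step function that is
# then evaluated at every x in one call.
p.ecdf <- 1 - ecdf(y)(x)

# Assumed alternative: sort y once, then count the y values <= x
# with findInterval().
p.fi <- 1 - findInterval(x, sort(y)) / length(y)

# All three agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(p.loop, p.ecdf)),
          isTRUE(all.equal(p.loop, p.fi)))
```

Both vectorised forms sort y once and then look up each x, which is where
the speed-up over the loop should come from at sizes like 80,000 by 130,000.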

Sean

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
