Re: [R] Permutation or Bootstrap to obtain p-value for one sample

francesca casalino Sun, 09 Oct 2011 06:38:14 -0700

Thank you very much to both Ken and Peter for the very helpful
explanations.


Just to check my understanding (sorry for repeating, but I am also new to
statistics, so please correct me where I am wrong):

Ken's method:
Repeatedly draw random samples, take the mean of each, and use these means to
construct a distribution of means (the 'null' distribution). I can then treat
this distribution as normal and compare my mean to the population mean using,
for example, a z-score.
Of note: the initial distributions are not normal, so I thought I needed to
base my calculations on the median, but I can use the mean because the
distribution of means is approximately normal.
This would be called a bootstrap test.
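
A minimal sketch of the z-score comparison described above. The data here are simulated stand-ins: `population_y` and `observed` are hypothetical names, not objects from this thread. One assumption worth flagging: the standard deviation of the resampled means is already the standard error of a sample mean, so it is not divided by sqrt(n) again.

```r
set.seed(1)

# Hypothetical stand-ins: a skewed population of N = 2000 and one sample of n = 250
population_y <- rexp(2000, rate = 1)
observed <- sample(population_y, 250)

nreps <- 10000

# Null distribution: the mean of each random sample of 250 drawn from the population
null_means <- replicate(nreps, mean(sample(population_y, 250, replace = FALSE)))

# sd(null_means) is the standard error of a sample mean, so use it directly
z_stat <- (mean(observed) - mean(null_means)) / sd(null_means)

# Two-sided p-value under a normal approximation to the null distribution
p_value <- 2 * pnorm(-abs(z_stat))
```

The normal approximation can be checked with `hist(null_means)` or `shapiro.test()` on a subset of the means (`shapiro.test` accepts at most 5000 values).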

Peter's method:
Repeatedly draw random samples, take the mean of each, and check whether each
sampled mean's difference from the population mean is greater than or equal to
the difference between my sample mean and the population mean. This is a
permutation test, but to actually get a CI and a p-value I would need a
bootstrap method.
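
A minimal sketch of the counting approach described above, again on simulated data (`population_y` and `observed` are hypothetical names). It assumes a two-sided comparison on absolute differences. The `+ 1` in the numerator and denominator is a common convention that counts the observed sample itself, so the estimated p-value is never exactly zero; it is a choice made here, not something stated in the thread.

```r
set.seed(2)

# Hypothetical stand-ins for the real data
population_y <- rexp(2000, rate = 1)
observed <- sample(population_y, 250)

pop_mean <- mean(population_y)
obs_diff <- abs(mean(observed) - pop_mean)

nreps <- 10000

# Absolute difference from the population mean for each random sample of 250
null_diffs <- replicate(
  nreps,
  abs(mean(sample(population_y, 250, replace = FALSE)) - pop_mean)
)

# Proportion of resampled differences at least as large as the observed one
p_value <- (sum(null_diffs >= obs_diff) + 1) / (nreps + 1)
```

No normality assumption is needed here, because the p-value comes directly from the resampled differences.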

Did I understand this correctly?

I tried to start with Ken's approach for now, and followed his steps, but:

1) I get a lot of NaN in the sampling distribution, is this normal?
2) I think I am again doing something wrong when I try to find a
p-value. This is what I did:

nreps=10000
mean.dist=rep(NA,nreps)

for(replication in 1:nreps)
{
my.sample=sample(population$Y, 250, replace=F)
#Peter mentioned that this sampling should be without replacement,
#so I went for that

mean.for.rep=mean(my.sample) #mean for this replication
mean.dist[replication]=mean.for.rep #store the mean
}

hist(mean.dist,main="Null Dist of Means", col="chartreuse")
 #Show the means in a nifty color

mean_dist= mean(mean.dist, na.rm=TRUE)
sd_pop= sd(mean.dist, na.rm=TRUE)

mean_sample= mean(population$Y, na.rm=TRUE)

z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
p_value= 2*pnorm(-abs(z_stat))

Is this correct?
THANK YOU SO MUCH FOR ALL YOUR HELP!!
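
On the missing values in the sampling distribution: one possible cause, assuming the Y column contains some NA or NaN entries, is that `mean()` returns a missing value whenever the sample it is given includes one. The sketch below uses a simulated data frame (a hypothetical stand-in for `population`) and drops the missing values once, before resampling.

```r
set.seed(42)

# Hypothetical stand-in for `population`: 1990 observed Y values and 10 missing
population <- data.frame(Y = c(rnorm(1990, mean = 50, sd = 10), rep(NA, 10)))

# A single missing value makes the mean of the whole sample missing
mean(c(1, NA, 3))                # NA
mean(c(1, NA, 3), na.rm = TRUE)  # 2

# Drop missing values once, then resample from the complete values only
y <- population$Y[!is.na(population$Y)]
mean.dist <- replicate(1000, mean(sample(y, 250, replace = FALSE)))

anyNA(mean.dist)  # FALSE
```

Passing `na.rm=TRUE` to `mean()` inside the loop would also remove the missing means, but each replication would then average slightly fewer than 250 values.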

2011/10/9 Ken Hutchison <vicvoncas...@gmail.com>

> Hi Francy,
>   A bootstrap test would likely be sufficient for this problem, but a
> one-sample t-test isn't advisable or necessary in my opinion. If you use a
> t-test multiple times you are making assumptions about the distribution of
> your data; more importantly, your probability of Type 1 error will be
> increased with each test. So, a valid thing to do would be to sample
> (computation for this problem won't be expensive, so do a lot of reps) and
> compare your mean to the null distribution of means, i.e.:
>
> nreps=10000
> mean.dist=rep(NA,nreps)
>
> for(replication in 1:nreps)
> {
> my.sample=sample(population, 2500, replace=T)
> #replace could be false, depends on preference
> mean.for.rep=mean(my.sample) #mean for this replication
> mean.dist[replication]=mean.for.rep #store the mean
> }
>
> hist(mean.dist,main="Null Dist of Means", col="chartreuse")
>  #Show the means in a nifty color
>
> You can then perform various tests given the null distribution, or infer
> from where your sample mean lies within the distribution or something to
> that effect. Also, if the distribution is normal, which is somewhat likely
> since it is a distribution of means (shapiro.test, or ad.test from the
> nortest package, will let you know), you should be able to make inferences
> from that using parametric methods (once), which will fit the truth a bit
> better than a t.test.
>         Hope that's helpful,
>            Ken Hutchison
>
>
> On Sat, Oct 8, 2011 at 10:04 AM, francy <francy.casal...@gmail.com> wrote:
>
>> Hi,
>>
>> I am having trouble understanding how to approach a simulation:
>>
>> I have a sample of n=250 from a population of N=2,000 individuals, and I
>> would like to use either permutation test or bootstrap to test whether this
>> particular sample is significantly different from the values of any other
>> random samples of the same population. I thought I needed to take random
>> samples (but I am not sure how many simulations I need to do) of n=250 from
>> the N=2,000 population and maybe do a one-sample t-test to compare the mean
>> score of all the simulated samples, + the one sample I am trying to prove
>> that is different from any others, to the mean value of the population. But
>> I don't know:
>> (1) whether this one-sample t-test would be the right way to do it, and how
>> to go about doing this in R
>> (2) whether a permutation test or bootstrap methods are more appropriate
>> This is the data frame that I have, which is to be sampled:
>> df<-
>> i.e.
>> x y
>> 1 2
>> 3 4
>> 5 6
>> 7 8
>> . .
>> . .
>> . .
>> 2,000
>>
>> I have this sample from df, and would like to test whether it has extreme
>> values of y.
>> sample1<-
>> i.e.
>> x y
>> 3 4
>> 7 8
>> . .
>> . .
>> . .
>> 250
>>
>> For now I only have this:
>>
>> R=999 #Number of simulations, but I don't know how many...
>> t.values =numeric(R)     #creates a numeric vector with 999 elements,
>>                          #which will hold the results of each simulation.
>> for (i in 1:R) {
>> sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
>>
>> But I don't know how to continue the loop: do I calculate the mean for each
>> simulation and compare it to the population mean?
>> Any help you could give me would be much appreciated,
>> Thank you.
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Permutation-or-Bootstrap-to-obtain-p-value-for-one-sample-tp3885118p3885118.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

