Re: [R] Speed up studentized confidence intervals ?

David Winsemius Wed, 29 Dec 2021 17:23:10 -0800


On 12/29/21 11:08 AM, varin sacha via R-help wrote:

Dear David,
Dear Rui,

Many thanks for your response. It perfectly works for the mean. Now I have a 
problem with my R code for the median. Because I always get 1 (100%) coverage 
probability that is more than very strange. Indeed, considering that an 
interval whose lower limit is the smallest value in the sample and whose upper 
limit is the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, 
implying that the confidence of such an interval is 15/16 rather than 1 (100%), 
I suspect that the confidence interval I use for the median is not correctly 
defined for n=5 observations, and likely contains all observations in the 
sample ? What is wrong with my R code ?

Seems to me that doing a bootstrap within a `replicate` call is notneeded. (Use one or the other as a mechanism for replication.

Here's what I would consider to be a "bootstrap" operation forestimating a 95% CI on the Gamma distributed population you created:


Used a sample size of 10000 rather than 100000


> quantile( replicate( 1000, {median(sample(s,5))}) , .5+c(-0.475,0.475))
     2.5%     97.5%
0.1343071 0.6848352

This is using boot::boot to calculate medians of samples of size 5

> med <- function( data, indices) {
+     d <- data[indices[1:5]] # allows boot to select sample
+     return( median(d))
+ }
> res <- boot(data=s, med, 1000)

> str(res)
List of 11
 $ t0       : num 0.275
 $ t        : num [1:1000, 1] 0.501 0.152 0.222 0.11 0.444 ...
 $ R        : num 1000
 $ data     : num [1:10000] 0.7304 0.4062 0.1901 0.0275 0.2748 ...

$ seed : int [1:626] 10403 431 -118115842 -603122380 -2026881868758139796 1148648893 -1161368223 1814605964 -1456558535 ...

 $ statistic:function (data, indices)
  ..- attr(*, "srcref")= 'srcref' int [1:8] 1 8 4 1 8 1 1 4

.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'<environment: 0x5562fcb434e8>

 $ sim      : chr "ordinary"
 $ call     : language boot(data = s, statistic = med, R = 1000)
 $ stype    : chr "i"
 $ strata   : num [1:10000] 1 1 1 1 1 1 1 1 1 1 ...

$ weights : num [1:10000] 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-041e-04 1e-04 1e-04 ...

 - attr(*, "class")= chr "boot"
 - attr(*, "boot_type")= chr "boot"

> quantile( res$t , .5+c(-0.475,0.475))
     2.5%     97.5%
0.1283309 0.6821874


########################################
library(boot)

s=rgamma(n=100000,shape=2,rate=5)
median(s)

N <- 100
out <- replicate(N, {
a<- sample(s,size=5)
median(a)

dat<-data.frame(a)
med<-function(d,i) {
temp<-d[i,]
median(temp)
}

   boot.out <- boot(data = dat, statistic = med, R = 10000)
   boot.ci(boot.out, type = "bca")$bca[, 4:5]
})

#coverage probability
median(out[1, ] < median(s) & median(s) < out[2, ])
########################################




Le jeudi 23 décembre 2021, 14:10:36 UTC+1, Rui Barradas <ruipbarra...@sapo.pt> 
a écrit :





Hello,

The code is running very slowly because you are recreating the function
in the replicate() loop and because you are creating a data.frame also
in the loop.

And because in the bootstrap statistic function med() you are computing
the variance of yet another loop. This is probably statistically wrong
but like David says, without a problem description it's hard to say.

Also, why compute variances if they are never used?

Here is complete code executing in much less than 2:00 hours. Note that
it passes the vector a directly to med(), not a df with just one column.


library(boot)

set.seed(2021)
s <- sample(178:798, 100000, replace = TRUE)
mean(s)

med <- function(d, i) {
   temp <- d[i]
   f <- mean(temp)
   g <- var(temp)
   c(Mean = f, Var = g)
}

N <- 1000
out <- replicate(N, {
   a <- sample(s, size = 5)
   boot.out <- boot(data = a, statistic = med, R = 10000)
   boot.ci(boot.out, type = "stud")$stud[, 4:5]
})
mean(out[1, ] < mean(s) & mean(s) < out[2, ])
#[1] 0.952



Hope this helps,

Rui Barradas

Às 11:45 de 19/12/21, varin sacha via R-help escreveu:

Dear R-experts,

Here below my R code working but really really slowly ! I need 2 hours with my 
computer to finally get an answer ! Is there a way to improve my R code to 
speed it up ? At least to win 1 hour ;=)

Many thanks

########################################################
library(boot)

s<- sample(178:798, 100000, replace=TRUE)
mean(s)

N <- 1000
out <- replicate(N, {
a<- sample(s,size=5)
mean(a)
dat<-data.frame(a)

med<-function(d,i) {
temp<-d[i,]
f<-mean(temp)
g<-var(replicate(50,mean(sample(temp,replace=T))))
return(c(f,g))

}

     boot.out <- boot(data = dat, statistic = med, R = 10000)
     boot.ci(boot.out, type = "stud")$stud[, 4:5]
})
mean(out[1,] < mean(s) & mean(s) < out[2,])
########################################################

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Speed up studentized confidence intervals ?

Reply via email to