Re: [R] Speed up studentized confidence intervals ?
Dear John, Dear Rui, I really thank you a lot for your R code. Best, SV Le jeudi 30 décembre 2021, 05:25:11 UTC+1, Fox, John a écrit : Dear varin sacha, You didn't correctly adapt the code to the median. The outer call to mean() in the last line shouldn't be replaced with median() -- it computes the proportion of intervals that include the population median. As well, you can't rely on the asymptotics of the bootstrap for a nonlinear statistic like the median with an n as small as 5, as your example, properly implemented (and with the code slightly cleaned up), illustrates: > library(boot) > set.seed(123) > s <- rgamma(n=10, shape=2, rate=5) > (m <- median(s)) [1] 0.3364465 > N <- 1000 > n <- 5 > set.seed(321) > out <- replicate(N, { + dat <- data.frame(sample(s, size=n)) + med <- function(d, i) { + median(d[i, ]) + } + boot.out <- boot(data = dat, statistic = med, R = 1) + boot.ci(boot.out, type = "bca")$bca[, 4:5] + }) > #coverage probability > mean(out[1, ] < m & m < out[2, ]) [1] 0.758 You do get the expected coverage, however, for a larger sample, here with n = 100: > N <- 1000 > n <- 100 > set.seed(321) > out <- replicate(N, { + dat <- data.frame(sample(s, size=n)) + med <- function(d, i) { + median(d[i, ]) + } + boot.out <- boot(data = dat, statistic = med, R = 1) + boot.ci(boot.out, type = "bca")$bca[, 4:5] + }) > #coverage probability > mean(out[1, ] < m & m < out[2, ]) [1] 0.952 I hope this helps, John -- John Fox, Professor Emeritus McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/ On 2021-12-29, 2:09 PM, "R-help on behalf of varin sacha via R-help" wrote: Dear David, Dear Rui, Many thanks for your response. It perfectly works for the mean. Now I have a problem with my R code for the median. Because I always get 1 (100%) coverage probability that is more than very strange. Indeed, considering that an interval whose lower limit is the smallest value in the sample and whose upper limit is the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, implying that the confidence of such an interval is 15/16 rather than 1 (100%), I suspect that the confidence interval I use for the median is not correctly defined for n=5 observations, and likely contains all observations in the sample ? What is wrong with my R code ? library(boot) s=rgamma(n=10,shape=2,rate=5) median(s) N <- 100 out <- replicate(N, { a<- sample(s,size=5) median(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] median(temp) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "bca")$bca[, 4:5] }) #coverage probability median(out[1, ] < median(s) & median(s) < out[2, ]) Le jeudi 23 décembre 2021, 14:10:36 UTC+1, Rui Barradas a écrit : Hello, The code is running very slowly because you are recreating the function in the replicate() loop and because you are creating a data.frame also in the loop. And because in the bootstrap statistic function med() you are computing the variance of yet another loop. This is probably statistically wrong but like David says, without a problem description it's hard to say. Also, why compute variances if they are never used? Here is complete code executing in much less than 2:00 hours. Note that it passes the vector a directly to med(), not a df with just one column. library(boot) set.seed(2021) s <- sample(178:798, 10, replace = TRUE) mean(s) med <- function(d, i) { temp <- d[i] f <- mean(temp) g <- var(temp) c(Mean = f, Var = g) } N <- 1000 out <- replicate(N, { a <- sample(s, size = 5) boot.out <- boot(data = a, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1, ] < mean(s) & mean(s) < out[2, ]) #[1] 0.952 Hope this helps, Rui Barradas Às 11:45 de 19/12/21, varin sacha via R-help escreveu: > Dear R-experts, > > Here below my R code working but really really slowly ! I need 2 hours with my computer to finally get an answer ! Is there a way to improve my R code to speed it up ? At least to win 1 hour ;=) > > Many thanks > > > library(boot) > > s<- sample(178:798, 10, replace=TRUE) > mean(s) > > N <- 1000 > out <- replicate(N, { > a<- sample(s,size=5) > mean(a) > dat<-data.frame(a) > > med<-function(d,i) { > temp<-d[i,] > f<-mean(temp) > g<-var(replicate(50,mean(sample(temp,replace=T > return(c(f,g)) > > } > > boot.out <- boot(data = dat, statistic = med, R = 1) >
Re: [R] Speed up studentized confidence intervals ?
Dear varin sacha, You didn't correctly adapt the code to the median. The outer call to mean() in the last line shouldn't be replaced with median() -- it computes the proportion of intervals that include the population median. As well, you can't rely on the asymptotics of the bootstrap for a nonlinear statistic like the median with an n as small as 5, as your example, properly implemented (and with the code slightly cleaned up), illustrates: > library(boot) > set.seed(123) > s <- rgamma(n=10, shape=2, rate=5) > (m <- median(s)) [1] 0.3364465 > N <- 1000 > n <- 5 > set.seed(321) > out <- replicate(N, { + dat <- data.frame(sample(s, size=n)) + med <- function(d, i) { + median(d[i, ]) + } + boot.out <- boot(data = dat, statistic = med, R = 1) + boot.ci(boot.out, type = "bca")$bca[, 4:5] + }) > #coverage probability > mean(out[1, ] < m & m < out[2, ]) [1] 0.758 You do get the expected coverage, however, for a larger sample, here with n = 100: > N <- 1000 > n <- 100 > set.seed(321) > out <- replicate(N, { + dat <- data.frame(sample(s, size=n)) + med <- function(d, i) { + median(d[i, ]) + } + boot.out <- boot(data = dat, statistic = med, R = 1) + boot.ci(boot.out, type = "bca")$bca[, 4:5] + }) > #coverage probability > mean(out[1, ] < m & m < out[2, ]) [1] 0.952 I hope this helps, John -- John Fox, Professor Emeritus McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/ On 2021-12-29, 2:09 PM, "R-help on behalf of varin sacha via R-help" wrote: Dear David, Dear Rui, Many thanks for your response. It perfectly works for the mean. Now I have a problem with my R code for the median. Because I always get 1 (100%) coverage probability that is more than very strange. Indeed, considering that an interval whose lower limit is the smallest value in the sample and whose upper limit is the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, implying that the confidence of such an interval is 15/16 rather than 1 (100%), I suspect that the confidence interval I use for the median is not correctly defined for n=5 observations, and likely contains all observations in the sample ? What is wrong with my R code ? library(boot) s=rgamma(n=10,shape=2,rate=5) median(s) N <- 100 out <- replicate(N, { a<- sample(s,size=5) median(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] median(temp) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "bca")$bca[, 4:5] }) #coverage probability median(out[1, ] < median(s) & median(s) < out[2, ]) Le jeudi 23 décembre 2021, 14:10:36 UTC+1, Rui Barradas a écrit : Hello, The code is running very slowly because you are recreating the function in the replicate() loop and because you are creating a data.frame also in the loop. And because in the bootstrap statistic function med() you are computing the variance of yet another loop. This is probably statistically wrong but like David says, without a problem description it's hard to say. Also, why compute variances if they are never used? Here is complete code executing in much less than 2:00 hours. Note that it passes the vector a directly to med(), not a df with just one column. library(boot) set.seed(2021) s <- sample(178:798, 10, replace = TRUE) mean(s) med <- function(d, i) { temp <- d[i] f <- mean(temp) g <- var(temp) c(Mean = f, Var = g) } N <- 1000 out <- replicate(N, { a <- sample(s, size = 5) boot.out <- boot(data = a, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1, ] < mean(s) & mean(s) < out[2, ]) #[1] 0.952 Hope this helps, Rui Barradas Às 11:45 de 19/12/21, varin sacha via R-help escreveu: > Dear R-experts, > > Here below my R code working but really really slowly ! I need 2 hours with my computer to finally get an answer ! Is there a way to improve my R code to speed it up ? At least to win 1 hour ;=) > > Many thanks > > > library(boot) > > s<- sample(178:798, 10, replace=TRUE) > mean(s) > > N <- 1000 > out <- replicate(N, { > a<- sample(s,size=5) > mean(a) > dat<-data.frame(a) > > med<-function(d,i) { > temp<-d[i,] > f<-mean(temp) > g<-var(replicate(50,mean(sample(temp,replace=T > return(c(f,g)) > > } > >boot.out <- boot(data = dat, statistic = med, R = 1) >boot.ci(boot.out, type = "stud")$stud[, 4:5] > }) > mean(out[1,] < mean(s) & mean(s) < out[2,]) >
Re: [R] Speed up studentized confidence intervals ?
On 12/29/21 11:08 AM, varin sacha via R-help wrote: Dear David, Dear Rui, Many thanks for your response. It perfectly works for the mean. Now I have a problem with my R code for the median. Because I always get 1 (100%) coverage probability that is more than very strange. Indeed, considering that an interval whose lower limit is the smallest value in the sample and whose upper limit is the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, implying that the confidence of such an interval is 15/16 rather than 1 (100%), I suspect that the confidence interval I use for the median is not correctly defined for n=5 observations, and likely contains all observations in the sample ? What is wrong with my R code ? Seems to me that doing a bootstrap within a `replicate` call is not needed. (Use one or the other as a mechanism for replication. Here's what I would consider to be a "bootstrap" operation for estimating a 95% CI on the Gamma distributed population you created: Used a sample size of 1 rather than 10 > quantile( replicate( 1000, {median(sample(s,5))}) , .5+c(-0.475,0.475)) 2.5% 97.5% 0.1343071 0.6848352 This is using boot::boot to calculate medians of samples of size 5 > med <- function( data, indices) { + d <- data[indices[1:5]] # allows boot to select sample + return( median(d)) + } > res <- boot(data=s, med, 1000) > str(res) List of 11 $ t0 : num 0.275 $ t : num [1:1000, 1] 0.501 0.152 0.222 0.11 0.444 ... $ R : num 1000 $ data : num [1:1] 0.7304 0.4062 0.1901 0.0275 0.2748 ... $ seed : int [1:626] 10403 431 -118115842 -603122380 -2026881868 758139796 1148648893 -1161368223 1814605964 -1456558535 ... $ statistic:function (data, indices) ..- attr(*, "srcref")= 'srcref' int [1:8] 1 8 4 1 8 1 1 4 .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' $ sim : chr "ordinary" $ call : language boot(data = s, statistic = med, R = 1000) $ stype : chr "i" $ strata : num [1:1] 1 1 1 1 1 1 1 1 1 1 ... $ weights : num [1:1] 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 ... - attr(*, "class")= chr "boot" - attr(*, "boot_type")= chr "boot" > quantile( res$t , .5+c(-0.475,0.475)) 2.5% 97.5% 0.1283309 0.6821874 library(boot) s=rgamma(n=10,shape=2,rate=5) median(s) N <- 100 out <- replicate(N, { a<- sample(s,size=5) median(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] median(temp) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "bca")$bca[, 4:5] }) #coverage probability median(out[1, ] < median(s) & median(s) < out[2, ]) Le jeudi 23 décembre 2021, 14:10:36 UTC+1, Rui Barradas a écrit : Hello, The code is running very slowly because you are recreating the function in the replicate() loop and because you are creating a data.frame also in the loop. And because in the bootstrap statistic function med() you are computing the variance of yet another loop. This is probably statistically wrong but like David says, without a problem description it's hard to say. Also, why compute variances if they are never used? Here is complete code executing in much less than 2:00 hours. Note that it passes the vector a directly to med(), not a df with just one column. library(boot) set.seed(2021) s <- sample(178:798, 10, replace = TRUE) mean(s) med <- function(d, i) { temp <- d[i] f <- mean(temp) g <- var(temp) c(Mean = f, Var = g) } N <- 1000 out <- replicate(N, { a <- sample(s, size = 5) boot.out <- boot(data = a, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1, ] < mean(s) & mean(s) < out[2, ]) #[1] 0.952 Hope this helps, Rui Barradas Às 11:45 de 19/12/21, varin sacha via R-help escreveu: Dear R-experts, Here below my R code working but really really slowly ! I need 2 hours with my computer to finally get an answer ! Is there a way to improve my R code to speed it up ? At least to win 1 hour ;=) Many thanks library(boot) s<- sample(178:798, 10, replace=TRUE) mean(s) N <- 1000 out <- replicate(N, { a<- sample(s,size=5) mean(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] f<-mean(temp) g<-var(replicate(50,mean(sample(temp,replace=T return(c(f,g)) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1,] < mean(s) & mean(s) < out[2,]) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible
Re: [R] Speed up studentized confidence intervals ?
Dear David, Dear Rui, Many thanks for your response. It perfectly works for the mean. Now I have a problem with my R code for the median. Because I always get 1 (100%) coverage probability that is more than very strange. Indeed, considering that an interval whose lower limit is the smallest value in the sample and whose upper limit is the largest value has 1/32 + 1/32 = 1/16 probability of non-coverage, implying that the confidence of such an interval is 15/16 rather than 1 (100%), I suspect that the confidence interval I use for the median is not correctly defined for n=5 observations, and likely contains all observations in the sample ? What is wrong with my R code ? library(boot) s=rgamma(n=10,shape=2,rate=5) median(s) N <- 100 out <- replicate(N, { a<- sample(s,size=5) median(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] median(temp) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "bca")$bca[, 4:5] }) #coverage probability median(out[1, ] < median(s) & median(s) < out[2, ]) Le jeudi 23 décembre 2021, 14:10:36 UTC+1, Rui Barradas a écrit : Hello, The code is running very slowly because you are recreating the function in the replicate() loop and because you are creating a data.frame also in the loop. And because in the bootstrap statistic function med() you are computing the variance of yet another loop. This is probably statistically wrong but like David says, without a problem description it's hard to say. Also, why compute variances if they are never used? Here is complete code executing in much less than 2:00 hours. Note that it passes the vector a directly to med(), not a df with just one column. library(boot) set.seed(2021) s <- sample(178:798, 10, replace = TRUE) mean(s) med <- function(d, i) { temp <- d[i] f <- mean(temp) g <- var(temp) c(Mean = f, Var = g) } N <- 1000 out <- replicate(N, { a <- sample(s, size = 5) boot.out <- boot(data = a, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1, ] < mean(s) & mean(s) < out[2, ]) #[1] 0.952 Hope this helps, Rui Barradas Às 11:45 de 19/12/21, varin sacha via R-help escreveu: > Dear R-experts, > > Here below my R code working but really really slowly ! I need 2 hours with > my computer to finally get an answer ! Is there a way to improve my R code to > speed it up ? At least to win 1 hour ;=) > > Many thanks > > > library(boot) > > s<- sample(178:798, 10, replace=TRUE) > mean(s) > > N <- 1000 > out <- replicate(N, { > a<- sample(s,size=5) > mean(a) > dat<-data.frame(a) > > med<-function(d,i) { > temp<-d[i,] > f<-mean(temp) > g<-var(replicate(50,mean(sample(temp,replace=T > return(c(f,g)) > > } > > boot.out <- boot(data = dat, statistic = med, R = 1) > boot.ci(boot.out, type = "stud")$stud[, 4:5] > }) > mean(out[1,] < mean(s) & mean(s) < out[2,]) > > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Speed up studentized confidence intervals ?
Hello, The code is running very slowly because you are recreating the function in the replicate() loop and because you are creating a data.frame also in the loop. And because in the bootstrap statistic function med() you are computing the variance of yet another loop. This is probably statistically wrong but like David says, without a problem description it's hard to say. Also, why compute variances if they are never used? Here is complete code executing in much less than 2:00 hours. Note that it passes the vector a directly to med(), not a df with just one column. library(boot) set.seed(2021) s <- sample(178:798, 10, replace = TRUE) mean(s) med <- function(d, i) { temp <- d[i] f <- mean(temp) g <- var(temp) c(Mean = f, Var = g) } N <- 1000 out <- replicate(N, { a <- sample(s, size = 5) boot.out <- boot(data = a, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1, ] < mean(s) & mean(s) < out[2, ]) #[1] 0.952 Hope this helps, Rui Barradas Às 11:45 de 19/12/21, varin sacha via R-help escreveu: Dear R-experts, Here below my R code working but really really slowly ! I need 2 hours with my computer to finally get an answer ! Is there a way to improve my R code to speed it up ? At least to win 1 hour ;=) Many thanks library(boot) s<- sample(178:798, 10, replace=TRUE) mean(s) N <- 1000 out <- replicate(N, { a<- sample(s,size=5) mean(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] f<-mean(temp) g<-var(replicate(50,mean(sample(temp,replace=T return(c(f,g)) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1,] < mean(s) & mean(s) < out[2,]) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Speed up studentized confidence intervals ?
I’m wondering if this is an X-Y problem. (A request to do X when the real problem should be doing Y. ) You haven’t explained the goals in natural or mathematical language which is leaving me to wonder why you are doing either sampling or replication (much less doing both within each iteration in the the function given to boot. ) — David Sent from my iPhone > On Dec 19, 2021, at 3:50 AM, varin sacha via R-help > wrote: > > Dear R-experts, > > Here below my R code working but really really slowly ! I need 2 hours with > my computer to finally get an answer ! Is there a way to improve my R code to > speed it up ? At least to win 1 hour ;=) > > Many thanks > > > library(boot) > > s<- sample(178:798, 10, replace=TRUE) > mean(s) > > N <- 1000 > out <- replicate(N, { > a<- sample(s,size=5) > mean(a) > dat<-data.frame(a) > > med<-function(d,i) { > temp<-d[i,] > f<-mean(temp) > g<-var(replicate(50,mean(sample(temp,replace=T > return(c(f,g)) > > } > > boot.out <- boot(data = dat, statistic = med, R = 1) > boot.ci(boot.out, type = "stud")$stud[, 4:5] > }) > mean(out[1,] < mean(s) & mean(s) < out[2,]) > > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Speed up studentized confidence intervals ?
Dear R-experts, Here below my R code working but really really slowly ! I need 2 hours with my computer to finally get an answer ! Is there a way to improve my R code to speed it up ? At least to win 1 hour ;=) Many thanks library(boot) s<- sample(178:798, 10, replace=TRUE) mean(s) N <- 1000 out <- replicate(N, { a<- sample(s,size=5) mean(a) dat<-data.frame(a) med<-function(d,i) { temp<-d[i,] f<-mean(temp) g<-var(replicate(50,mean(sample(temp,replace=T return(c(f,g)) } boot.out <- boot(data = dat, statistic = med, R = 1) boot.ci(boot.out, type = "stud")$stud[, 4:5] }) mean(out[1,] < mean(s) & mean(s) < out[2,]) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.