Matt Oliver wrote:

Dear R Help List,

I have a vector of n and a vector of n-1 and I want to use boot() to
bootstrap the ratio of their respective medians. I want to eventually
use boot.ci() to generate confidence intervals. Because the vectors
are not equal in length, I tried a few things, but have yet to be
successful.

Try 1:


x <- runif(20)

y <- c(runif(19), NA)

median(x)

[1] 0.522284

median(y[1:19])

[1] 0.488046

median(x)/median((y)[1:19])

[1] 1.070153

t <- as.data.frame(cbind(x, y))

ratio <- function(t, i) median(t$x[i])/median((t$y[1:19])[i])

boot(t, ratio, R = 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call: boot(data = t, statistic = ratio, R = 1000)


Bootstrap Statistics : original bias std. error t1* NA NA 0.4603294


I thought this might be successful because median(x)/median((y)[1:19]) gives a result, and not an NA.

I also tried to use a regular list (even though boot() technically
doesn't accept them) so I didn't have to use NA.

Try 2:


x <- runif(20)

y <- runif(19)

median(x)

[1] 0.732906

median(y)

[1] 0.5596225

median(x)/median(y)

[1] 1.309644

t <- list(x = x, y = y)

ratio <- function(t, i) median(t$x[i])/median(t$y[i])

boot(t, ratio, R = 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call: boot(data = t, statistic = ratio, R = 1000)


Bootstrap Statistics : original bias std. error t1* 1.153598 -0.004907764 0.08266257


At first glance this seemed to work, but median(x)/median(y) is not equal to the "original" in the Bootstrap Statistics (which is a bit odd to me, but it may be because boot() doesn't accept this kind of list.)


Is there a way to do this type of bootstrap with the boot() function?

Thanks in advance

Matt Oliver

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


The question is always how to handle NA, obviously you do NOT want to throw away whole observations including NAs from your data.frame, but only throw away NAs for each vector separately.

Your idea specifying "median((t$y[1:19])[i])" does not work, because the statistic is evaluated on subsets of t, hence you get for the subset of observations 1:5: t[1:5,]$y[1:19], and you have even more NAs

Instead, you want to use median()'s na.rm argument as in:

ratio <- function(t, i)
    median(t$x[i], na.rm=TRUE) / median(t$y[i], na.rm=TRUE)

Uwe Ligges

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to