On Fri, May 18, 2012 at 06:37:03AM -0400, Axel Urbiz wrote:
> Would I be able to accomplish the same if x.sample was created from x
> instead of x.sorted. The problem is that in my real problem, I have to sort
> with respect to many variables and thus keep the sample indexes consistent
> across variables. So I need to first take the sample and then sort it
> with respect to potentially any variable.

The suggestion

  set.seed(12345)
  x <- sample(0:100, 10)
  x.order <- order(x)
  x.sorted <- x[x.order]
  # sample of half the size, with replacement
  sample.ind <- sample(1:length(x), 5, replace = TRUE)
  x.sample <- x.sorted[sample.ind]
  freq <- tabulate(sample.ind, nbins=length(x))
  x.sample.sorted <- rep(x.sorted, times=freq)

uses the fact that rep(x.sorted, times=freq) keeps the order in x.sorted.
This x.sorted can be a data frame, in which case we should use

  sample.ind <- sample(1:nrow(x), 5, replace = TRUE)
  x.sample <- x.sorted[sample.ind, ]
  freq <- tabulate(sample.ind, nbins=nrow(x))
  x.sample.sorted <- x.sorted[rep(1:nrow(x.sorted), times=freq), ]

It is possible to have several x.sorted data frames, each sorted according
to a different variable. In this case, we generate pairs x.sample and
x.sample.sorted, which are the same sample once unsorted and once sorted.
However, we get a different sample for each sorting variable.

If the same sample should be sortable by different variables, try the
following, which may also save CPU time. Calculate the order of the
original data according to each relevant variable and store the results
as rank vectors determining the order of cases. Then, instead of sorting
a data frame representing a sample, determine the order from the
corresponding subset of the rank vector. This may be faster and produces
the same order.

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
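
The rank-vector idea described above could be sketched roughly as follows.
This is only an illustration under some assumptions: the data frame x with
columns a and b, the list ranks and the helper sort.sample.by() are
hypothetical names made up for the example, and ties.method = "first" is
one possible choice for getting distinct ranks. The sample is taken once
from the unsorted data, and each sorted view of it is obtained from the
subset of the precomputed ranks.

```r
set.seed(12345)

# Hypothetical example data: two variables we may want to sort by.
x <- data.frame(a = sample(0:100, 10), b = sample(0:100, 10))

# Rank of every case with respect to each variable, computed once.
# ties.method = "first" makes each rank vector a permutation of 1..nrow(x).
ranks <- lapply(x, rank, ties.method = "first")

# One sample of row indices, drawn from the original (unsorted) data.
sample.ind <- sample(seq_len(nrow(x)), 5, replace = TRUE)
x.sample <- x[sample.ind, ]

# Sort the same sample by any variable, using only the subset of the
# corresponding rank vector (hypothetical helper).
sort.sample.by <- function(var) {
  x.sample[order(ranks[[var]][sample.ind]), ]
}

x.sample.by.a <- sort.sample.by("a")
x.sample.by.b <- sort.sample.by("b")
```

Here x.sample.by.a and x.sample.by.b contain the same sampled cases, so
the sample indexes stay consistent across the sorting variables. Whether
this is actually faster than sorting each sample directly would need to
be measured on the real data.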