Re: [R] sampling rows with values never sampled before

Daniel Nordlund Mon, 22 Jun 2015 11:22:12 -0700

On 6/22/2015 9:42 AM, C W wrote:

Hello R list,


I have a question about sampling unique coordinate values.

Here's what my data looks like

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
dat

       x1  x2
  [1,]  1 3.7
  [2,]  2 3.7
  [3,]  3 3.7
  [4,]  4 3.7
  [5,]  5 3.7
  [6,]  1 2.9
  [7,]  2 2.9
  [8,]  3 2.9
  [9,]  4 2.9
[10,]  5 2.9
[11,]  1 5.2
[12,]  2 5.2
[13,]  3 5.2
[14,]  4 5.2
[15,]  5 5.2


If I sampled (1, 3.7), then, I don't want (1, 2.9) or (2, 3.7).

I want to avoid repeating either the first or the second coordinate, because
that leads to an undefined matrix inversion.

I thought of using sample(), but I'm not sure how to apply it to a data
frame.

Thanks in advance,

Mike

I am not sure you gave us enough information to solve your real-world problem. But I have a few comments and a potential solution.

1. In your example, the unique values in x1 are completely crossed with the unique values in x2.
2. Since you don't want duplicates of either number, the maximum number of samples that you can take is the minimum number of unique values in either vector, x1 or x2 (in this case x2, with 3 unique values).
3. Sample without replacement from the smallest set of unique values first.
4. Sample without replacement from the larger set second.

> x <- 1:5
> xx <- c(3.7, 2.9, 5.2)
> s2 <- sample(xx,2, replace=FALSE)
> s1 <- sample(x,2, replace=FALSE)
> samp <- cbind(s1,s2)
>
> samp
     s1  s2
[1,]  5 3.7
[2,]  1 5.2
>
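
As a rough sketch, steps 3 and 4 can also be applied directly to the dat matrix from the question, so that the unique values and the maximum sample size come from the data itself. This assumes, as in the example, that the values are completely crossed. The function name sample_crossed is made up for illustration.

```r
# Hypothetical helper: sample coordinate pairs with no repeated x1 or x2,
# assuming every unique x1 occurs with every unique x2 (completely crossed).
sample_crossed <- function(dat, size) {
  u1 <- unique(dat[, 1])
  u2 <- unique(dat[, 2])
  max_size <- min(length(u1), length(u2))
  if (size > max_size) {
    stop("size cannot exceed ", max_size, " without repeating a value")
  }
  # index-based sampling avoids sample()'s special case for a single number
  s1 <- u1[sample.int(length(u1), size)]
  s2 <- u2[sample.int(length(u2), size)]
  cbind(x1 = s1, x2 = s2)
}

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))
samp <- sample_crossed(dat, 3)
samp
```

With the example data the largest allowed size is 3, because x2 has only three unique values; asking for 4 stops with an error.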

Your actual data is probably larger, and the unique values in each vector may not be completely crossed, in which case the task is a little harder. In that case, you could remove values from your data as you sample. This may not be efficient, but it will work.


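smpl <- function(dat, size){
  mysamp <- numeric(0)
  for(i in seq_len(size)) {
    # stop early once no eligible rows remain
    if (nrow(dat) == 0) break
    # draw one of the remaining rows at random
    s <- dat[sample(nrow(dat),1),]
    mysamp <- rbind(mysamp,s, deparse.level=0)
    # remove every row sharing either coordinate with the drawn row;
    # drop=FALSE keeps dat a matrix when only one row is left
    dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
    }
  mysamp
}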

This is just an example of how you might approach your real-world problem. There is no error checking, and for large samples it may not scale well.
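
One possible way to add some error checking, offered only as a sketch: the variant below (smpl_checked is a hypothetical name, not something from the code above) validates its inputs, keeps the matrix shape when a single row is left, and warns when the data run out before size rows have been drawn. It assumes dat is a two-column numeric matrix like the one in the question.

```r
# Hypothetical variant of smpl() with basic checks; assumes dat is a
# two-column numeric matrix.
smpl_checked <- function(dat, size) {
  stopifnot(is.matrix(dat), ncol(dat) == 2, size >= 0)
  picked <- matrix(numeric(0), nrow = 0, ncol = 2,
                   dimnames = list(NULL, colnames(dat)))
  for (i in seq_len(size)) {
    if (nrow(dat) == 0) {
      warning("only ", nrow(picked), " rows could be sampled")
      break
    }
    # draw one remaining row by index
    s <- dat[sample.int(nrow(dat), 1), ]
    picked <- rbind(picked, s, deparse.level = 0)
    # drop = FALSE keeps dat a matrix even when one row remains
    dat <- dat[!(dat[, 1] == s[1] | dat[, 2] == s[2]), , drop = FALSE]
  }
  picked
}

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))
res <- smpl_checked(dat, 3)
res
```

With the example dat, at most three rows can be drawn because x2 has only three unique values. When the data are not completely crossed, the number of rows obtained can depend on the order of the random draws, so a result shorter than size does not by itself mean that no larger valid sample exists.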



Hope this is helpful,

Dan

--
Daniel Nordlund
Bothell, WA USA

