Hi Jean, Thanks!
Daniel,

Yes, you are absolutely right. I want the sampled vectors to be as different as possible. I added a third column, x3, to the earlier data set:

      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1

If I sampled rows 1, 6, and 11, it would not be possible to solve the system of equations. So I am avoiding "similar vectors".

Thanks,
Mike

On Mon, Jun 22, 2015 at 2:19 PM, Daniel Nordlund <djnordl...@frontier.com> wrote:

> On 6/22/2015 9:42 AM, C W wrote:
>
>> Hello R list,
>>
>> I have a question about sampling unique coordinate values.
>>
>> Here's what my data looks like:
>>
>> dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))
>> dat
>>       x1  x2
>>  [1,]  1 3.7
>>  [2,]  2 3.7
>>  [3,]  3 3.7
>>  [4,]  4 3.7
>>  [5,]  5 3.7
>>  [6,]  1 2.9
>>  [7,]  2 2.9
>>  [8,]  3 2.9
>>  [9,]  4 2.9
>> [10,]  5 2.9
>> [11,]  1 5.2
>> [12,]  2 5.2
>> [13,]  3 5.2
>> [14,]  4 5.2
>> [15,]  5 5.2
>>
>> If I sampled (1, 3.7), then I don't want (1, 2.9) or (2, 3.7).
>>
>> I want to avoid repeating either the first or the second coordinate. It leads to an undefined matrix inversion.
>>
>> I thought of using sample(), but I'm not sure how to apply it to a data frame.
>>
>> Thanks in advance,
>>
>> Mike
>
> I am not sure you gave us enough information to solve your real-world problem, but I have a few comments and a potential solution.
>
> 1. In your example, the unique values in x1 are completely crossed with the unique values in x2.
> 2. Since you don't want duplicates of either number, the maximum number of samples you can take is the smaller of the two counts of unique values in x1 and x2 (in this case x2, with 3 unique values).
> 3. Sample without replacement from the smaller set of unique values first.
> 4. Sample without replacement from the larger set second.
>
> > x <- 1:5
> > xx <- c(3.7, 2.9, 5.2)
> > s2 <- sample(xx, 2, replace = FALSE)
> > s1 <- sample(x, 2, replace = FALSE)
> > samp <- cbind(s1, s2)
> >
> > samp
>      s1  s2
> [1,]  5 3.7
> [2,]  1 5.2
>
> Your actual data is probably larger, and the unique values in each vector may not be completely crossed, in which case the task is a little harder. In that case, you could remove values from your data as you sample. This may not be efficient, but it will work.
>
> smpl <- function(dat, size){
>   mysamp <- numeric(0)
>   for(i in 1:size) {
>     s <- dat[sample(nrow(dat),1),]
>     mysamp <- rbind(mysamp,s, deparse.level=0)
>     # drop = FALSE keeps dat a matrix even when only one row remains
>     dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
>   }
>   mysamp
> }
>
> This is just an example of how you might approach your real-world problem. There is no error checking, and for large samples it may not scale well.
>
> Hope this is helpful,
>
> Dan
>
> --
> Daniel Nordlund
> Bothell, WA USA
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
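A minimal sketch (not from the thread) of the two checks implied above, assuming the goal is a square, invertible matrix built from sampled rows. The helpers is_full_rank() and sample_distinct_rows() are hypothetical names. The sampler generalizes Dan's remove-as-you-go idea so that a drawn row excludes every remaining row sharing a value in any column, not just the first two; the rank check uses base R's qr(). Note that distinct values in every column do not by themselves guarantee a full-rank matrix, so the rank still has to be checked after sampling.

```r
# Mike's extended three-column data, as in his table.
dat <- cbind(x1 = rep(1:5, 3),
             x2 = rep(c(3.7, 2.9, 5.2), each = 5),
             x3 = rep(c(2.1, 5.3, 6.2, 8.9, 4.1), 3))

# Hypothetical helper: TRUE if m has full column rank, so for a square m the
# system m %*% b = y has a unique solution. qr() estimates rank numerically.
is_full_rank <- function(m) qr(m)$rank == ncol(m)

# Hypothetical helper: draw `size` rows so that no value repeats within any
# column of the result. After each draw, every remaining candidate row that
# shares a value with the drawn row in ANY column is discarded.
sample_distinct_rows <- function(dat, size) {
  candidates <- seq_len(nrow(dat))
  picked <- integer(0)
  for (i in seq_len(size)) {
    if (length(candidates) == 0L)
      stop("no rows with unused values left after ", i - 1L, " draws")
    # Index by position: sample(7, 1) would draw from 1:7, not return 7.
    r <- candidates[sample.int(length(candidates), 1L)]
    picked <- c(picked, r)
    clash <- apply(dat[candidates, , drop = FALSE], 1L,
                   function(row) any(row == dat[r, ]))
    candidates <- candidates[!clash]  # also removes row r itself
  }
  dat[picked, , drop = FALSE]
}

# Rows 1, 6 and 11 share x1 (and x3), so this matrix is rank-deficient.
is_full_rank(dat[c(1, 6, 11), ])  # FALSE

set.seed(42)
s <- sample_distinct_rows(dat, 3)
s
is_full_rank(s)  # not guaranteed TRUE; distinct values alone are not enough
```

Because x2 has only three unique values, at most three rows can be drawn here; asking for four stops with an error instead of silently returning fewer rows.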